2D Face Alignment Net Trained on 300W Large Pose Data

Determine the locations of keypoints from a facial image

Developed in 2017 by the Computer Vision Laboratory at the University of Nottingham, this net predicts the locations of 68 2D keypoints (17 for the face contour, 10 for the eyebrows, 9 for the nose, 12 for the eyes and 20 for the mouth) from a facial image. For each keypoint, the net produces a heat map of its location. The architecture combines hourglass modules with multiscale parallel blocks.

Number of layers: 967 | Parameter count: 23,874,320 | Trained size: 97 MB

Training Set Information

Examples

Resource retrieval

Retrieve the resource object:

In[1]:=
ResourceObject["2D Face Alignment Net Trained on 300W Large Pose \
Data"]
Out[1]=

Get the pre-trained net:

In[2]:=
NetModel["2D Face Alignment Net Trained on 300W Large Pose Data"]
Out[2]=

Basic usage

This net outputs a 64x64 heat map for each of the 68 landmarks:

In[3]:=
Out[3]=
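
The input cell above embeds a face photograph directly, so its code is not shown here. As a minimal sketch, assuming img is any cropped facial image, the evaluation is (the variable name heatmaps is the one used by the cells that follow):

(* img is assumed to be a facial crop, e.g. obtained with FindFaces and ImageTrim as in the Preprocessing section below *)
heatmaps = 
  NetModel["2D Face Alignment Net Trained on 300W Large Pose Data"][img];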

Obtain the dimensions of the heat map:

In[4]:=
Dimensions[heatmaps]
Out[4]=

Visualize heat maps 1, 12 and 29:

In[5]:=
MatrixPlot /@ heatmaps[[{1, 12, 29}]]
Out[5]=

Evaluation function

Write an evaluation function that picks the maximum position of each heat map and returns a list of landmark positions:

In[6]:=
netevaluation[img_] := Block[
  {heatmaps, posFlattened, posMat},
  (* evaluate the net to obtain the 68 heat maps *)
  heatmaps = 
   NetModel["2D Face Alignment Net Trained on 300W Large Pose Data"][img];
  (* flat index of the maximum of each 64x64 heat map *)
  posFlattened = 
   Map[First@Ordering[#, -1] &, Flatten[heatmaps, {{1}, {2, 3}}]];
  (* convert flat indices to {row, column} positions in the heat map *)
  posMat = QuotientRemainder[posFlattened - 1, 64] + 1;
  (* convert matrix positions to {x, y} coordinates in the unit square, flipping the vertical axis *)
  1/64.*Map[{#[[2]], 64 - #[[1]] + 1} - 0.5 &, posMat]
  ]

Landmark positions

Get the landmarks using the evaluation function. The coordinates are rescaled between 0 and 1, with {0, 0} at the bottom-left corner and {1, 1} at the top-right corner of the input image:

In[7]:=
Out[7]=
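
The input cell above again operates on an embedded face photograph. A minimal sketch, assuming img is the same facial crop as in the basic usage example (the variable name landmarks is chosen here and reused in the visualization sketch below):

(* landmark positions in the unit square, bottom-left origin *)
landmarks = netevaluation[img]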

Group landmarks associated with different facial features by colors:

In[8]:=
groupings = 
  Span @@@ {{1, 17}, {18, 22}, {23, 27}, {28, 36}, {37, 42}, {43, 48}, {49, 60}, {61, 68}};

Visualize the landmarks:

In[9]:=
Out[9]=
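
A sketch of one way to produce that visualization, assuming landmarks holds the output of netevaluation from the previous sketch; it follows the same HighlightImage idiom used in the Preprocessing section below:

(* alternate one Hue per facial-feature group with the corresponding landmark points *)
HighlightImage[img, 
 Graphics@Riffle[Thread@Hue[Range[8]/8.], 
   Map[Point, Function[p, Part[landmarks, p]] /@ groupings]], 
 DataRange -> {{0, 1}, {0, 1}}]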

Preprocessing

The net must be evaluated on facial crops only. Get an image with multiple faces:

In[10]:=
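
The original cell embeds a photograph, so its code is not shown here. As a placeholder sketch, any picture containing several faces will do, imported from a hypothetical local file:

(* hypothetical path; substitute any image containing multiple faces *)
img = Import["group-photo.jpg"];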

Write an evaluation function that crops the input image around faces and returns the crops and facial landmarks:

In[11]:=
findFacialLandmarks[img_Image] := Block[
  {crops, points},
  (* crop the input image around each detected face *)
  crops = ImageTrim[img, #] & /@ FindFaces[img];
  (* compute landmarks for each crop; return {} if no faces are found *)
  points = If[Length[crops] > 0, netevaluation /@ crops, {}];
  MapThread[<|"Crop" -> #1, "Landmarks" -> #2|> &, {crops, points}]
  ]

Evaluate the function on the image:

In[12]:=
output = findFacialLandmarks[img]
Out[12]=

Visualize the landmarks:

In[13]:=
HighlightImage[#Crop, 
   Graphics@
    Riffle[Thread@Hue[Range[8]/8.], 
     Map[Point, Function[p, Part[#Landmarks, p]] /@ groupings]], 
   DataRange -> {{0, 1}, {0, 1}}, ImageSize -> 300] & /@ output
Out[13]=

Robustness to facial crop size

Get an image:

In[14]:=
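
The original cell embeds a photograph, so its code is not shown here. As a placeholder sketch, any single-face portrait will do, imported from a hypothetical local file:

(* hypothetical path; substitute any single-face photograph *)
img = Import["portrait.jpg"];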

Crop the image at various sizes:

In[15]:=
crops = Table[ImageCrop[img, s, Bottom], {s, 290, 500, 70}]
Out[15]=

Inspect the network performance across the crops:

In[16]:=
HighlightImage[#, 
   Graphics@
    Riffle[Thread@Hue[Range[8]/8.], 
     Map[Point, Function[p, Part[netevaluation[#], p]] /@ groupings]],
    DataRange -> {{0, 1}, {0, 1}}, ImageSize -> 250] & /@ crops
Out[16]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[17]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["2D Face Alignment Net Trained on 300W Large Pose Data"], 
  "MXNet"]
Out[17]=

Export also creates a net.params file containing parameters:

In[18]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[18]=

Get the size of the parameter file:

In[19]:=
FileByteCount[paramPath]
Out[19]=

The size is similar to the byte count of the resource object:

In[20]:=
ResourceObject["2D Face Alignment Net Trained on 300W Large Pose Data"]["ByteCount"]
Out[20]=

Requirements

Wolfram Language 11.2 (September 2017) or above

Reference