Wolfram Neural Net Repository
Immediate Computable Access to Neural Net Models
Detect and localize human joints and objects in an image
Get the pre-trained net:
In[1]:= | ![]() |
Out[1]= | ![]() |
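The retrieval step above presumably uses NetModel; a minimal sketch (the repository entry name is elided on this page, so the string below is a placeholder):

```wl
(* placeholder: substitute the actual repository entry name for this model *)
net = NetModel["<model name>"]
```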
This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:
In[2]:= | ![]() |
Out[2]= | ![]() |
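Parameter inspection like this typically uses the "ParametersInformation" property of NetModel; a sketch with the model name left as a placeholder:

```wl
(* list the available parameters and their allowed values *)
NetModel["<model name>", "ParametersInformation"]
```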
Pick a non-default net by specifying the parameters:
In[3]:= | ![]() |
Out[3]= | ![]() |
Pick a non-default uninitialized net:
In[4]:= | ![]() |
Out[4]= | ![]() |
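Non-default and uninitialized variants are generally requested as sketched below; the parameter name and value here are placeholders, not the model's actual options:

```wl
(* pick a specific parameter combination *)
NetModel[{"<model name>", "Architecture" -> "<value>"}]

(* the same combination, but with uninitialized weights *)
NetModel[{"<model name>", "Architecture" -> "<value>"}, "UninitializedEvaluationNet"]
```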
Define the label list for this model:
In[5]:= | ![]() |
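A sketch of what the label list might look like, assuming MS-COCO-style object classes (the later examples mention "dog" and "cat"); the list below is truncated for illustration:

```wl
(* assumption: MS-COCO-style class labels; only a few entries shown *)
labels = {"person", "bicycle", "car", "motorcycle", "airplane",
   "bus", "train", "truck", "boat", "cat", "dog"};
```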
Define helper utilities for netevaluate:
In[6]:= | ![]() |
Write an evaluation function to estimate the locations of the objects and human keypoints:
In[7]:= | ![]() |
Obtain the detected bounding boxes with their corresponding classes and confidences as well as the locations of human joints for a given image:
In[8]:= | ![]() |
In[9]:= | ![]() |
Inspect the prediction keys:
In[10]:= | ![]() |
Out[10]= | ![]() |
The "ObjectDetection" key contains the coordinates of the detected objects as well as its confidences and classes:
In[11]:= | ![]() |
Out[11]= | ![]() |
Inspect which classes are detected:
In[12]:= | ![]() |
Out[12]= | ![]() |
The "KeypointEstimation" key contains the locations of top predicted keypoints as well as their confidences for each person:
In[13]:= | ![]() |
Out[13]= | ![]() |
Inspect the predicted keypoint locations:
In[14]:= | ![]() |
Visualize the keypoints:
In[15]:= | ![]() |
Out[15]= | ![]() |
Visualize the keypoints grouped by person:
In[16]:= | ![]() |
Out[16]= | ![]() |
Visualize the keypoints grouped by keypoint type:
In[17]:= | ![]() |
Out[17]= | ![]() |
Define a function to combine the keypoints into a skeleton shape:
In[18]:= | ![]() |
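One way such a skeleton-combining function might look: store the joint pairs to connect as index pairs and map Line over them. The edge list below is a placeholder topology, not the model's actual one:

```wl
(* placeholder skeleton topology: pairs of keypoint indices to connect *)
skeletonEdges = {{1, 2}, {1, 3}, {2, 4}, {3, 5}};

(* keypoints: list of {x, y} joint positions for one person *)
toSkeleton[keypoints_] := Line[keypoints[[#]]] & /@ skeletonEdges
```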
Visualize the pose keypoints, object detections and human skeletons:
In[19]:= | ![]() |
Out[19]= | ![]() |
In[20]:= | ![]() |
Obtain the detected bounding boxes with their corresponding classes and confidences as well as the locations of human joints for a given image:
In[21]:= | ![]() |
Visualize the pose keypoints, object detections and human skeletons. Note that some of the keypoints are misaligned:
In[22]:= | ![]() |
Out[22]= | ![]() |
Inspect the effect of varying the radius set by the optional parameter "NeighborhoodRadius":
In[23]:= | ![]() |
Out[23]= | ![]() |
For the default input size of 512x512, the net produces 128x128 bounding boxes whose centers mostly follow a square grid. For each bounding box, the net produces the box size and the offset of the box center with respect to the square grid:
In[24]:= | ![]() |
In[25]:= | ![]() |
Out[25]= | ![]() |
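The decoding described above can be sketched as follows: the centers sit on a 128x128 grid, displaced by the predicted offsets and scaled by the net stride of 512/128 = 4. The offset array below is a placeholder for the actual net output:

```wl
stride = 4;  (* 512-pixel input over a 128-cell output grid *)
grid = Table[{j, i}, {i, 128}, {j, 128}];     (* cell indices *)
offsets = ConstantArray[0.5, {128, 128, 2}];  (* placeholder for the net's offset output *)
centers = stride*(grid + offsets);            (* pixel-space box centers *)
```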
Convert the coordinates to the graphics coordinate system:
In[26]:= | ![]() |
Compute the box center positions:
In[27]:= | ![]() |
Visualize the box center positions. They follow a square grid with offsets:
In[28]:= | ![]() |
Out[28]= | ![]() |
Compute the boxes' coordinates:
In[29]:= | ![]() |
In[30]:= | ![]() |
Out[30]= | ![]() |
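Given a center and a predicted size, the box corners follow directly; a sketch with placeholder values standing in for the net outputs:

```wl
center = {256., 256.}; size = {80., 40.};  (* placeholders for net outputs *)
{min, max} = {center - size/2, center + size/2};
box = Rectangle[min, max]
```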
Define a function to rescale the box coordinates to the original image size:
In[31]:= | ![]() |
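A sketch of such a rescaling function, assuming the coordinates live in the 512x512 padded domain and the padding preserves aspect ratio, so the scale factor is the image's longer side divided by 512:

```wl
(* assumption: net coordinates are in the 512x512 letterboxed domain *)
rescaleBox[coords_, imageDims_] := coords*Max[imageDims]/512.
```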
Visualize all the boxes predicted by the net scaled by their "objectness" measures:
In[32]:= | ![]() |
Out[32]= | ![]() |
Visualize all the boxes scaled by the probability that they contain a dog:
In[33]:= | ![]() |
Out[33]= | ![]() |
In[34]:= | ![]() |
Out[34]= | ![]() |
Superimpose the cat prediction on top of the scaled input received by the net:
In[35]:= | ![]() |
Out[35]= | ![]() |
In[36]:= | ![]() |
Every box is associated with a scalar strength value indicating the likelihood that the patch contains an object:
In[37]:= | ![]() |
In[38]:= | ![]() |
Out[38]= | ![]() |
The strength of each patch is the maximal score across all classes. Obtain the strength of each patch:
In[39]:= | ![]() |
Out[40]= | ![]() |
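The per-patch maximum described above can be sketched with a level-2 Map over the class dimension; the score array here is a placeholder for the net output:

```wl
classScores = RandomReal[1, {128, 128, 80}];  (* placeholder for the net's class scores *)
strengths = Map[Max, classScores, {2}];       (* 128x128 array of patch strengths *)
```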
Visualize the strength of each patch as a heat map:
In[41]:= | ![]() |
Out[41]= | ![]() |
Stretch and unpad the heat map to the original image domain:
In[42]:= | ![]() |
Out[42]= | ![]() |
Overlay the heat map on the image:
In[43]:= | ![]() |
Out[43]= | ![]() |
Obtain and visualize the strength of each patch for the "dog" class:
In[44]:= | ![]() |
Out[47]= | ![]() |
Overlay the heat map on the image:
In[48]:= | ![]() |
Out[48]= | ![]() |
Define a general function to visualize a heat map on an image:
In[49]:= | ![]() |
In[50]:= | ![]() |
Out[50]= | ![]() |
Automatic image resizing can be avoided by replacing the NetEncoder. First get the NetEncoder:
In[51]:= | ![]() |
Out[51]= | ![]() |
Note that the NetEncoder resizes the image while keeping the aspect ratio and then pads the result to a fixed shape of 512x512. Visualize the output of the NetEncoder, adjusting for brightness:
In[52]:= | ![]() |
Out[52]= | ![]() |
Create a new NetEncoder with the desired dimensions:
In[53]:= | ![]() |
Out[53]= | ![]() |
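Creating an image NetEncoder with custom dimensions generally looks like the sketch below; the 320x320 size is an arbitrary illustrative choice, not the one used on this page:

```wl
(* assumption: 320x320 is just an example size *)
newEncoder = NetEncoder[{"Image", {320, 320}}]
```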
Attach the new NetEncoder:
In[54]:= | ![]() |
Out[54]= | ![]() |
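Attaching an encoder to a net's input port is typically done with NetReplacePart; a sketch, with net and newEncoder standing for the model and encoder obtained above:

```wl
(* replace the input encoder of the hypothetical net from earlier *)
resizedNet = NetReplacePart[net, "Input" -> newEncoder]
```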
Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
In[55]:= | ![]() |
Visualize the detection:
In[56]:= | ![]() |
Out[56]= | ![]() |
Note that even though the localization results and the box confidences are slightly worse compared to the original net, the resized network runs significantly faster:
In[57]:= | ![]() |
Out[57]= | ![]() |
In[58]:= | ![]() |
Out[58]= | ![]() |
Inspect the number of parameters of all arrays in the net:
In[59]:= | ![]() |
Out[59]= | ![]() |
Obtain the total number of parameters:
In[60]:= | ![]() |
Out[60]= | ![]() |
Obtain the layer type counts:
In[61]:= | ![]() |
Out[61]= | ![]() |
Display the summary graphic:
In[62]:= | ![]() |
Out[62]= | ![]() |
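The inspection steps in this final part correspond to standard NetInformation properties; a sketch, with net as the hypothetical model from earlier:

```wl
NetInformation[net, "ArraysElementCounts"]      (* parameter count per array *)
NetInformation[net, "ArraysTotalElementCount"]  (* total number of parameters *)
NetInformation[net, "LayerTypeCounts"]          (* counts of each layer type *)
NetInformation[net, "SummaryGraphic"]           (* summary graphic of the net *)
```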