Resource retrieval
Get the pre-trained net:
NetModel parameters
This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:
Pick a non-default net by specifying the parameters:
Pick a non-default uninitialized net:
Evaluation function
Define the label list for this model:
Define helper utilities for netevaluate:
Write an evaluation function to estimate the locations of the objects and human keypoints:
Basic usage
Obtain the detected bounding boxes with their corresponding classes and confidences as well as the locations of human joints for a given image:
Inspect the prediction keys:
The "ObjectDetection" key contains the coordinates of the detected objects as well as their confidences and classes:
Inspect which classes are detected:
The "KeypointEstimation" key contains the locations of the top predicted keypoints as well as their confidences for each person:
Inspect the predicted keypoint locations:
Visualize the keypoints:
Visualize the keypoints grouped by person:
Visualize the keypoints grouped by keypoint type:
Define a function to combine the keypoints into a skeleton shape:
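The idea of joining keypoints into a skeleton can be illustrated with a minimal Python sketch. It assumes COCO-style keypoint names and limb pairs, which this page does not confirm for the model; the names `SKELETON_EDGES` and `skeleton_segments` and the confidence threshold are hypothetical:

```python
# Assumption: COCO-style keypoint names; the model's actual label list may differ.
SKELETON_EDGES = [
    ("left_ankle", "left_knee"), ("left_knee", "left_hip"),
    ("right_ankle", "right_knee"), ("right_knee", "right_hip"),
    ("left_hip", "right_hip"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
]

def skeleton_segments(keypoints, min_confidence=0.3):
    """Turn one person's keypoints into line segments.

    keypoints: dict mapping keypoint name -> (x, y, confidence).
    Returns a list of ((x1, y1), (x2, y2)) pairs, one per limb whose two
    endpoints are both present and at least min_confidence (hypothetical cutoff).
    """
    segments = []
    for a, b in SKELETON_EDGES:
        if a in keypoints and b in keypoints:
            xa, ya, ca = keypoints[a]
            xb, yb, cb = keypoints[b]
            if ca >= min_confidence and cb >= min_confidence:
                segments.append(((xa, ya), (xb, yb)))
    return segments

person = {
    "left_shoulder": (100, 200, 0.9),
    "left_elbow": (110, 150, 0.8),
    "left_wrist": (120, 100, 0.1),  # too uncertain, so the forearm is dropped
}
segments = skeleton_segments(person)
```

Skipping low-confidence endpoints keeps unreliable joints from drawing stray limbs.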
Visualize the pose keypoints, object detections and human skeletons:
Advanced visualization
Obtain the detected bounding boxes with their corresponding classes and confidences as well as the locations of human joints for a given image:
Visualize the pose keypoints, object detections and human skeletons. Note that some of the keypoints are misaligned:
Inspect the effect of varying the radius set by the optional parameter "NeighborhoodRadius":
Network object detection result
For the default input size of 512x512, the net produces 128x128 bounding boxes whose centers mostly follow a square grid. For each bounding box, the net produces the box size and the offset of the box center with respect to the square grid:
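The grid-plus-offset decoding can be sketched in a few lines of Python. This is only an illustration: it assumes each grid cell covers 512/128 = 4 input pixels and that offsets and box sizes are expressed in grid units, neither of which is stated on this page; `decode_box` is a hypothetical name:

```python
GRID = 128             # grid cells per side (from the text)
INPUT = 512            # input side length in pixels (from the text)
STRIDE = INPUT / GRID  # assumption: each grid cell spans 4x4 input pixels

def decode_box(col, row, offset, size):
    """Return (x_min, y_min, x_max, y_max) in input-pixel coordinates.

    col, row: integer grid indices of the cell
    offset:   (dx, dy) shift of the center from the grid point, in grid units (assumed)
    size:     (w, h) of the box, in grid units (assumed)
    """
    cx = (col + offset[0]) * STRIDE
    cy = (row + offset[1]) * STRIDE
    half_w = size[0] * STRIDE / 2
    half_h = size[1] * STRIDE / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

box = decode_box(10, 20, (0.5, 0.25), (8, 4))
# box is (26.0, 73.0, 58.0, 89.0)
```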
Change the coordinate system into a graphics domain:
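Image arrays place the origin at the top-left corner with y increasing downward, while graphics place it at the bottom-left with y increasing upward. A minimal Python sketch of that flip for a single box follows; `to_graphics_coords` is a hypothetical helper, and the (x_min, y_top, x_max, y_bottom) box layout is an assumption:

```python
def to_graphics_coords(box, height):
    """Flip a box from image coordinates (y down) to graphics coordinates (y up).

    box:    (x_min, y_top, x_max, y_bottom) in image coordinates (assumed layout)
    height: height of the image in pixels
    Returns (x_min, y_min, x_max, y_max) in graphics coordinates.
    """
    x_min, y_top, x_max, y_bottom = box
    return (x_min, height - y_bottom, x_max, height - y_top)

flipped = to_graphics_coords((26, 73, 58, 89), 512)
# flipped is (26, 423, 58, 439)
```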
Compute and visualize the box center positions:
Visualize the box center positions. They follow a square grid with offsets:
Compute the boxes' coordinates:
Define a function to rescale the box coordinates to the original image size:
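The rescaling has to undo both the aspect-preserving resize and the padding applied to the input. A minimal Python sketch, assuming the padding is split evenly on both sides (not confirmed on this page); `rescale_box` is a hypothetical name:

```python
def rescale_box(box, orig_w, orig_h, input_size=512):
    """Map a box from the padded square input back to the original image.

    box: (x_min, y_min, x_max, y_max) in input-pixel coordinates
    """
    scale = input_size / max(orig_w, orig_h)   # aspect-preserving resize factor
    pad_x = (input_size - orig_w * scale) / 2  # assumption: centered padding
    pad_y = (input_size - orig_h * scale) / 2
    x0, y0, x1, y1 = box
    return ((x0 - pad_x) / scale, (y0 - pad_y) / scale,
            (x1 - pad_x) / scale, (y1 - pad_y) / scale)

# A 1024x512 image is shrunk by 0.5 and padded by 128 pixels above and below,
# so the box covering the unpadded region maps back to the full image.
full = rescale_box((0, 128, 512, 384), 1024, 512)
# full is (0.0, 0.0, 1024.0, 512.0)
```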
Visualize all the boxes predicted by the net scaled by their "objectness" measures:
Visualize all the boxes scaled by the probability that they contain a dog:
Superimpose the cat prediction on top of the scaled input received by the net:
Heat map visualization
Every box is associated with a scalar strength value indicating the likelihood that the patch contains an object:
The strength of each patch is the maximum score across all classes. Obtain the strength of each patch:
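The reduction over classes can be sketched in plain Python. The nested-list layout (rows, then columns, then one score per class) and the names `patch_strengths` and `class_heatmap` are assumptions made for illustration:

```python
def patch_strengths(class_scores):
    """Maximum score over all classes for each patch.

    class_scores[row][col] is a list with one score per class (assumed layout).
    """
    return [[max(cell) for cell in row] for row in class_scores]

def class_heatmap(class_scores, class_index):
    """Score of a single class for each patch."""
    return [[cell[class_index] for cell in row] for row in class_scores]

# Toy 2x2 grid with three classes per patch.
scores = [
    [[0.1, 0.7, 0.2], [0.05, 0.01, 0.02]],
    [[0.3, 0.3, 0.9], [0.0, 0.4, 0.1]],
]
strengths = patch_strengths(scores)
# strengths is [[0.7, 0.05], [0.9, 0.4]]
```

The same grid restricted to one class (for example, index 1 here) is what a per-class heat map shows.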
Visualize the strength of each patch as a heat map:
Stretch and unpad the heat map to the original image domain:
Overlay the heat map on the image:
Obtain and visualize the strength of each patch for the "dog" class:
Overlay the heat map on the image:
Define a general function to visualize a heat map on an image:
Adapt to any size
Automatic image resizing can be avoided by replacing the NetEncoder. First get the NetEncoder:
Note that the NetEncoder resizes the image while preserving its aspect ratio and then pads the result to a fixed shape of 512x512. Visualize the output of the NetEncoder, adjusting for brightness:
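The resize-then-pad geometry can be worked out with a short Python sketch. It assumes the longer side is scaled to 512, sizes are rounded to whole pixels, and padding is centered; these details and the name `letterbox_geometry` are assumptions, not taken from the NetEncoder itself:

```python
def letterbox_geometry(orig_w, orig_h, input_size=512):
    """Return (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom)."""
    longest = max(orig_w, orig_h)
    # Scale both sides by the same factor so the longer side fills the input.
    new_w = round(orig_w * input_size / longest)
    new_h = round(orig_h * input_size / longest)
    # Assumption: leftover space is split as evenly as possible on both sides.
    pad_left = (input_size - new_w) // 2
    pad_top = (input_size - new_h) // 2
    pad_right = input_size - new_w - pad_left
    pad_bottom = input_size - new_h - pad_top
    return (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom)

geometry = letterbox_geometry(640, 480)
# geometry is (512, 384, 0, 64, 0, 64)
```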
Create a new NetEncoder with the desired dimensions:
Attach the new NetEncoder:
Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
Visualize the detection:
Note that although the localization results and the box confidences are slightly worse than with the original net, the resized network runs significantly faster:
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
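The total is the sum, over all arrays, of the product of each array's dimensions. A minimal Python sketch of that arithmetic follows; the array names and shapes are hypothetical placeholders, not values from this net:

```python
from math import prod

def count_parameters(array_shapes):
    """Sum the element counts of all arrays.

    array_shapes: dict mapping array name -> tuple of dimensions.
    """
    return sum(prod(shape) for shape in array_shapes.values())

# Hypothetical example: one convolution with 64 kernels of shape 3x7x7 plus biases.
example_shapes = {
    "conv.Weights": (64, 3, 7, 7),
    "conv.Biases": (64,),
}
total = count_parameters(example_shapes)
# total is 9472 (9408 weights + 64 biases)
```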
Obtain the layer type counts:
Display the summary graphic: