# Wolfram Neural Net Repository

Immediate Computable Access to Neural Net Models

Detect and localize objects in an image

Released in 2019, this family of object detection models detects objects by their center points instead of directly computing axis-aligned boxes. The models use keypoint estimation techniques to locate center points on generated heat maps and then regress the box sizes and offsets. Center locations are predicted per class, while box sizes and offsets are class-agnostic. Compared to anchor-based approaches, CenterNet does not suffer from the extremely large number of box candidates, which require complicated labeling schemes as well as additional post-processing such as non-maximum suppression. The detector also makes no implicit assumptions about object scales and aspect ratios, unlike other popular approaches that encode them in the anchors.
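The decoding pipeline described above can be sketched numerically (an illustrative NumPy version, not the repository's Wolfram Language code; the array shapes, stride and threshold are assumptions):

```python
import numpy as np

def decode_centernet(heatmaps, sizes, offsets, stride=4, threshold=0.3):
    """Decode CenterNet outputs into (class, score, box) detections.

    heatmaps: (C, H, W) per-class center scores in [0, 1]
    sizes:    (2, H, W) class-agnostic box width/height in input pixels
    offsets:  (2, H, W) class-agnostic sub-grid center offsets
    """
    C, H, W = heatmaps.shape
    # Keep only 3x3 local maxima on the heat maps; this peak extraction
    # replaces the non-maximum suppression used by anchor-based detectors.
    padded = np.pad(heatmaps, ((0, 0), (1, 1), (1, 1)),
                    constant_values=-np.inf)
    neighborhoods = np.stack([padded[:, dy:dy + H, dx:dx + W]
                              for dy in range(3) for dx in range(3)])
    peaks = heatmaps >= neighborhoods.max(axis=0)
    detections = []
    for c, y, x in zip(*np.nonzero(peaks & (heatmaps > threshold))):
        w, h = sizes[:, y, x]
        # Center = grid position plus predicted offset, scaled to input size
        cx = (x + offsets[0, y, x]) * stride
        cy = (y + offsets[1, y, x]) * stride
        detections.append((int(c), float(heatmaps[c, y, x]),
                           (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)))
    return detections
```

Thresholding on the peak score plays the role of the confidence filtering done later in the evaluation function.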

- Microsoft COCO, a dataset for image recognition, segmentation and captioning, consisting of more than three hundred thousand images in 80 different object classes.

Get the pre-trained net:

In[1]:= |

Out[1]= |

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:= |

Out[2]= |

Pick a non-default net by specifying the parameters:

In[3]:= |

Out[3]= |

Pick a non-default uninitialized net:

In[4]:= |

Out[4]= |

Define the label list for this model:

In[5]:= |

Write an evaluation function to scale the result to the input image size and suppress the least probable detections:

In[6]:= |

Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:

In[7]:= |

In[8]:= |

Out[8]= |

Inspect which classes are detected:

In[9]:= |

Out[9]= |

Visualize the detection:

In[10]:= |

Out[10]= |

For the default input size of 512x512, the net produces a 128x128 grid of bounding boxes whose centers mostly follow a square grid. For each bounding box, the net produces the box's size and the offset of the box's center with respect to the square grid:

In[11]:= |

In[12]:= |

Out[12]= |

Change the coordinate system to the graphics domain:

In[13]:= |

Compute the box center positions:

In[14]:= |

Visualize the box center positions. They follow a square grid with offsets:

In[15]:= |

Out[15]= |
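The grid-plus-offset geometry can be reproduced numerically (an illustrative NumPy sketch; the zero offset map is a placeholder for the net's actual output):

```python
import numpy as np

grid = 128                 # output resolution for a 512x512 input
stride = 512 // grid       # 4: each cell covers a 4x4 input patch

# Placeholder offset map; the net predicts these values per cell
offsets = np.zeros((2, grid, grid))

ys, xs = np.mgrid[0:grid, 0:grid]
# Center = cell index plus offset, mapped back to input coordinates
centers_x = (xs + offsets[0]) * stride
centers_y = (ys + offsets[1]) * stride
# With zero offsets the centers form an exact square grid with spacing 4
```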

Compute the box coordinates:

In[16]:= |

In[17]:= |

Out[17]= |

Define a function to rescale the box coordinates to the original image size:

In[18]:= |
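One possible form of such a rescaling function, assuming the encoder letterboxes the image with symmetric padding (an assumption; the actual padding convention may differ):

```python
def rescale_box(box, width, height, target=512):
    """Map a box from the 512x512 letterboxed frame back to the
    original image's coordinates (inverse of resize + pad)."""
    scale = target / max(width, height)
    pad_x = (target - width * scale) / 2   # assumed symmetric padding
    pad_y = (target - height * scale) / 2
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```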

Visualize all the boxes predicted by the net, scaled by their "objectness" measures:

In[19]:= |

Out[19]= |

Visualize all the boxes scaled by the probability that they contain a cat:

In[20]:= |

Out[20]= |

In[21]:= |

Out[21]= |

Superimpose the cat prediction on top of the scaled input received by the net:

In[22]:= |

Out[22]= |

Every box is associated with a scalar strength value indicating the likelihood that the patch contains an object:

In[23]:= |

In[24]:= |

Out[24]= |

The strength of each patch is the maximal element across all classes. Obtain the strength of each patch:

In[25]:= |

Out[26]= |
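The aggregation is a single max over the class axis (illustrative NumPy; the small array stands in for the model's 80-class heat maps):

```python
import numpy as np

# Hypothetical heat maps for 3 classes on a 2x2 patch grid
heatmaps = np.array([[[0.1, 0.8],
                      [0.2, 0.0]],
                     [[0.4, 0.3],
                      [0.9, 0.1]],
                     [[0.0, 0.5],
                      [0.1, 0.6]]])

# Per-patch strength: maximal score across all classes
strength = heatmaps.max(axis=0)
```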

Visualize the strength of each patch as a heat map:

In[27]:= |

Out[27]= |

Stretch and unpad the heat map to the original image domain:

In[28]:= |

Out[28]= |

Overlay the heat map on the image:

In[29]:= |

Out[29]= |

Obtain and visualize the strength of each patch for the "cat" class:

In[30]:= |

Out[32]= |

Overlay the heat map on the image:

In[33]:= |

Out[33]= |

Automatic image resizing can be avoided by replacing the NetEncoder. First get the NetEncoder:

In[34]:= |

Out[34]= |

Note that the NetEncoder resizes the image, preserving its aspect ratio, and then pads the result to a fixed shape of 512x512. Visualize the output of the NetEncoder, adjusting for brightness:

In[35]:= |

Out[35]= |
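The resize-and-pad geometry can be sketched as follows (plain Python; symmetric padding and the rounding rule are assumptions about the encoder's behavior):

```python
def letterbox_geometry(width, height, target=512):
    """Compute the scale and padding applied when an image is resized
    (keeping its aspect ratio) and padded to a target x target square."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2  # assumed split evenly on both sides
    pad_y = (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)
```

Replacing the NetEncoder with one whose target size matches the image avoids this padding entirely, at some cost in accuracy.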

Create a new NetEncoder with the desired dimensions:

In[36]:= |

Out[36]= |

Attach the new NetEncoder:

In[37]:= |

Out[37]= |

Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:

In[38]:= |

Out[38]= |

Visualize the detection:

In[39]:= |

Out[39]= |

Note that even though the localization results and box confidences are slightly worse than those of the original net, the resized net runs significantly faster:

In[40]:= |

Out[40]= |

In[41]:= |

Out[41]= |

Inspect the number of parameters of all arrays in the net:

In[42]:= |

Out[42]= |

Obtain the total number of parameters:

In[43]:= |

Out[43]= |

Obtain the layer type counts:

In[44]:= |

Out[44]= |

Display the summary graphic:

In[45]:= |

Out[45]= |

- X. Zhou, D. Wang, P. Krähenbühl, "Objects as Points," arXiv:1904.07850 (2019)
- Available from: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
- Rights: Copyright 2022 Google LLC. All rights reserved. Apache License 2.0