RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data
RetinaNet is a single-stage object detection model that maps image pixels directly to bounding box coordinates and class probabilities. Its authors report accuracy exceeding that of the best two-stage detectors of the time, at speeds comparable to other single-stage detectors. The architecture is a Feature Pyramid Network built on top of a feedforward ResNet-101 backbone. The model was trained with the focal loss, a loss function introduced with RetinaNet to address the imbalance between foreground and background classes that arises in single-stage detectors.
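To make the idea behind the focal loss concrete, the following is a minimal, language-neutral sketch in Python; it is not the code used to train this net. It assumes the binary form from the RetinaNet paper, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with the paper's commonly quoted defaults alpha = 0.25 and gamma = 2. The function names are illustrative only.

```python
import math

def cross_entropy(p, y):
    """Standard binary cross-entropy for one prediction.

    p: predicted foreground probability in (0, 1)
    y: 1 for foreground, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction (illustrative sketch).

    alpha and gamma default to the values reported in the RetinaNet paper;
    they are assumptions here, not settings read from this resource.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy background example (p = 0.01) contributes far less than under
# plain cross-entropy, while a hard foreground example (p = 0.1) still counts.
easy_background = focal_loss(0.01, 0)
hard_foreground = focal_loss(0.1, 1)
```

Because almost all candidate boxes in a single-stage detector are easy background, this down-weighting keeps them from dominating the total loss.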
Trained size: 337 MB
Examples
Resource retrieval
Get the pre-trained net:
Label list
Define the label list for this model. Integers in the model's output correspond to elements in the label list:
Evaluation function
Write an evaluation function to scale the result to the input image size and suppress the least probable detections:
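As a rough illustration of what such a post-processing step does, here is a language-neutral sketch in Python; it is not the Wolfram Language evaluation function from this resource. It assumes that boxes arrive as (x1, y1, x2, y2) in the pixel coordinates of the net's input, that the image was resized to that input by plain stretching, and that detections below a confidence threshold are simply dropped. The name `postprocess` and all of its parameters are hypothetical.

```python
def postprocess(boxes, scores, labels, net_size, image_size, threshold=0.5):
    """Rescale boxes to the original image and drop low-confidence detections.

    boxes:      list of (x1, y1, x2, y2) in net-input pixel coordinates (assumed)
    scores:     list of confidences in [0, 1]
    labels:     list of class labels, one per box
    net_size:   (width, height) of the net's input (assumed, hypothetical)
    image_size: (width, height) of the original image
    threshold:  minimum confidence to keep (hypothetical default)
    """
    net_w, net_h = net_size
    img_w, img_h = image_size
    sx, sy = img_w / net_w, img_h / net_h  # per-axis scale factors

    kept = []
    for (x1, y1, x2, y2), score, label in zip(boxes, scores, labels):
        if score < threshold:
            continue  # suppress the least probable detections
        kept.append(((x1 * sx, y1 * sy, x2 * sx, y2 * sy), label, score))
    return kept

detections = postprocess(
    boxes=[(0, 0, 100, 50), (10, 10, 20, 20)],
    scores=[0.9, 0.2],
    labels=["dog", "cat"],
    net_size=(200, 100),
    image_size=(400, 300),
)
```

If the actual preprocessing pads or crops the image rather than stretching it, the scale factors would need to account for that offset as well.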
Basic usage
Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
Inspect which classes are detected:
Visualize the detection:
Advanced visualization
Write a function to apply custom styling to the detection result:
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
Obtain the layer type counts:
Display the summary graphic:
Export to MXNet
Export the net into a format that can be opened in MXNet:
Export also creates a net.params file containing parameters:
Get the size of the parameter file:
The size is similar to the byte count of the resource object:
Requirements
Wolfram Language 12.1 (March 2020) or above
Reference