RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data

Contributed by: Julian W. Francis

Detect and localize objects in an image

RetinaNet is a single-stage object detection model that goes directly from image pixels to bounding box coordinates and class probabilities. It exceeds the accuracy of the best two-stage detectors while offering speed comparable to that of previous single-stage detectors. The architecture is a Feature Pyramid Network built on top of a feedforward ResNet-101 backbone. The model was trained with a new loss function, "Focal Loss," which down-weights easy, well-classified examples to counter the extreme imbalance between foreground and background classes that arises in single-stage detectors.
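For reference, the focal loss as defined in the cited paper (Lin et al., 2017) is reproduced below. Here p is the model's estimated probability for the class with label y = 1, and α is a class-balancing weight. The factor (1 − p_t)^γ shrinks the loss of easy examples. For example, with γ = 2, an example classified with p_t = 0.9 contributes (1 − 0.9)² = 0.01 times its cross-entropy loss. The paper uses γ = 2 and α = 0.25 as its default setting. This page does not state which values were used to train this particular net.

```latex
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}
\qquad
\alpha_t = \begin{cases} \alpha & \text{if } y = 1 \\ 1 - \alpha & \text{otherwise} \end{cases}
\qquad
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)
```

With γ = 0, FL reduces to the α-balanced cross-entropy loss −α_t log(p_t).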

Trained size: 337 MB

Training Set Information

Microsoft COCO (http://mscoco.org) is a dataset for image recognition, segmentation and captioning. It contains more than 300,000 images in total and covers 80 object classes.

Performance

This model achieves an average precision (AP) of 37.8% on the MS-COCO test set.
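The formula below follows the standard COCO evaluation protocol and is not taken from this page. COCO AP averages the per-category average precision over the 80 categories and over ten IoU thresholds t, from 0.50 to 0.95 in steps of 0.05. At threshold t, a detection counts as correct only if its IoU with a ground-truth box of the same class is at least t.

```latex
\mathrm{AP} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \; \frac{1}{80} \sum_{c=1}^{80} \mathrm{AP}_{c,t}
```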

Examples


Resource retrieval

Get the pre-trained net:

NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"]

Label list

Define the label list for this model. Integers in the model's output correspond to elements in the label list:
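A toy example can illustrate how integers in the output map to labels. The sketch uses made-up data: three anchors and a three-element stand-in for the label list. The names toyLabels, toyProbs and positions are hypothetical placeholders, not this model's label list or outputs. It mirrors how the decoder in the evaluation function defined below uses Position and Extract on the net's "ClassProb" output. The first index of each position selects an anchor (and hence a box), and the second index selects a label.

```wolfram
(* Hypothetical toy data: 3 anchors x 3 classes; these labels are placeholders, not the model's label list *)
toyLabels = {"person", "bicycle", "car"};
toyProbs = {
   {0.1, 0.7, 0.0},   (* anchor 1 *)
   {0.05, 0.1, 0.2},  (* anchor 2 *)
   {0.9, 0.0, 0.65}}; (* anchor 3 *)

(* {anchor, class} positions whose probability exceeds the threshold *)
positions = Position[toyProbs, p_ /; p > 0.6]
(* {{1, 2}, {3, 1}, {3, 3}} *)

(* The second index of each position selects the label *)
Extract[toyLabels, positions[[All, 2 ;; 2]]]
(* {"bicycle", "person", "car"} *)

(* The full position selects the confidence *)
Extract[toyProbs, positions]
(* {0.7, 0.9, 0.65} *)
```

A single anchor can yield several detections (anchor 3 above), one per class whose probability clears the threshold.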


Evaluation function

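Write an evaluation function that keeps detections whose class probability exceeds a threshold, rescales their boxes to the input image's coordinates and applies per-class non-maximum suppression to discard overlapping, lower-confidence detections: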

(* Greedy non-maximum suppression: sort detections by decreasing confidence (the last element of each detection) and keep a detection only if its IoU with every detection kept so far does not exceed overlapThreshold *)
nonMaxSuppression[overlapThreshold_][detection_] :=
 Fold[
  Function[{kept, new},
   If[NoneTrue[kept[[All, 1]], IoU[#, new[[1]]] > overlapThreshold &],
    Append[kept, new],
    kept]],
  Sequence @@ TakeDrop[Reverse@SortBy[detection, Last], 1]]

(* Intersection over union of two Rectangle[{xmin, ymin}, {xmax, ymax}] objects; the compiled function is built on first use and then cached *)
ClearAll[IoU]
IoU := IoU = With[{c = Compile[{{box1, _Real, 2}, {box2, _Real, 2}},
     Module[{area1, area2, x1, y1, x2, y2, w, h, int},
      area1 = (box1[[2, 1]] - box1[[1, 1]]) (box1[[2, 2]] - box1[[1, 2]]);
      area2 = (box2[[2, 1]] - box2[[1, 1]]) (box2[[2, 2]] - box2[[1, 2]]);
      x1 = Max[box1[[1, 1]], box2[[1, 1]]];
      y1 = Max[box1[[1, 2]], box2[[1, 2]]];
      x2 = Min[box1[[2, 1]], box2[[2, 1]]];
      y2 = Min[box1[[2, 2]], box2[[2, 2]]];
      w = Max[0., x2 - x1];
      h = Max[0., y2 - y1];
      int = w*h;
      int/(area1 + area2 - int)],
     RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed"]},
   c @@ Replace[{##}, Rectangle -> List, Infinity, Heads -> True] &]
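To check the IoU arithmetic and the greedy suppression rule by hand, here is a small uncompiled sketch on toy boxes. The names iouReference, greedyNMS and toyDetections are hypothetical and are not part of the original code. greedyNMS applies the same rule as nonMaxSuppression above, but it starts the fold from an empty list instead of using TakeDrop.

```wolfram
(* Hypothetical helper iouReference: plain (uncompiled) IoU of two axis-aligned rectangles *)
iouReference[Rectangle[{ax1_, ay1_}, {ax2_, ay2_}], Rectangle[{bx1_, by1_}, {bx2_, by2_}]] :=
 Module[{w, h, inter},
  w = Max[0, Min[ax2, bx2] - Max[ax1, bx1]]; (* overlap width, 0 if disjoint *)
  h = Max[0, Min[ay2, by2] - Max[ay1, by1]]; (* overlap height, 0 if disjoint *)
  inter = w h;
  inter/((ax2 - ax1) (ay2 - ay1) + (bx2 - bx1) (by2 - by1) - inter)]

(* Hypothetical greedyNMS: visit detections by decreasing confidence, keep one only if it overlaps no kept box by more than threshold *)
greedyNMS[dets_List, threshold_] := Fold[
  Function[{kept, d},
   If[AllTrue[kept, iouReference[#[[1]], d[[1]]] <= threshold &], Append[kept, d], kept]],
  {},
  Reverse@SortBy[dets, Last]]

toyDetections = {
   {Rectangle[{0, 0}, {10, 10}], "cat", 0.9},
   {Rectangle[{1, 1}, {11, 11}], "cat", 0.8},
   {Rectangle[{20, 20}, {30, 30}], "cat", 0.7}};

iouReference[Rectangle[{0, 0}, {2, 2}], Rectangle[{1, 1}, {3, 3}]]
(* 1/7: intersection 1, union 4 + 4 - 1 = 7 *)

greedyNMS[toyDetections, 0.45]
(* keeps the 0.9 and 0.7 boxes; the 0.8 box has IoU 81/119 (about 0.68) with the 0.9 box *)
```

Because the evaluation function groups detections by label before suppression, boxes of different classes never suppress each other.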

(* labels is the label list defined in the "Label list" section *)
netevaluate[img_Image, detectionThreshold_ : .6, overlapThreshold_ : .45] :=
 Module[{netOutputDecoder, net, imageConformer, deconformRectangles, detectionsDeconformer},
  (* Resize and pad the image to the net's input size, preserving its aspect ratio *)
  imageConformer[dims_, fitting_][image_] :=
   First@ConformImages[{image}, dims, fitting, Padding -> 0.5];
  (* Map boxes from net input coordinates back to the original image coordinates *)
  deconformRectangles[{}, _, _, _] := {};
  deconformRectangles[rboxes_List, image_Image, netDims_List, "Fit"] :=
   With[{netAspectRatio = netDims[[2]]/netDims[[1]]},
    With[{
      boxes = Map[{#[[1]], #[[2]]} &, rboxes],
      padding = If[ImageAspectRatio[image] < netAspectRatio,
        {0, (ImageDimensions[image][[1]]*netAspectRatio - ImageDimensions[image][[2]])/2},
        {(ImageDimensions[image][[2]]*(1/netAspectRatio) - ImageDimensions[image][[1]])/2, 0}],
      scale = If[ImageAspectRatio[image] < netAspectRatio,
        ImageDimensions[image][[1]]/netDims[[1]],
        ImageDimensions[image][[2]]/netDims[[2]]]},
     Map[Rectangle[Round[#[[1]]], Round[#[[2]]]] &,
      Transpose[Transpose[boxes, {2, 3, 1}]*scale - padding, {3, 1, 2}]]]];
  (* Attach labels and confidences to the rescaled boxes, giving {box, label, confidence} triples *)
  detectionsDeconformer[image_Image, netDims_List, fitting_String][objects_] :=
   Transpose[{
     deconformRectangles[objects[[All, 1]], image, netDims, fitting],
     objects[[All, 2]],
     objects[[All, 3]]}];
  (* Keep every (anchor, class) pair whose probability exceeds the threshold *)
  netOutputDecoder[threshold_ : .5][netOutput_] :=
   Module[{detections = Position[netOutput["ClassProb"], x_ /; x > threshold]},
    Transpose[{
      Rectangle @@@ Extract[netOutput["Boxes"], detections[[All, 1 ;; 1]]],
      Extract[labels, detections[[All, 2 ;; 2]]],
      Extract[netOutput["ClassProb"], detections]}]];
  net = NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"];
  (* Run non-maximum suppression separately for each class, then merge the surviving detections *)
  (Flatten[nonMaxSuppression[overlapThreshold] /@ GatherBy[#, #[[2]] &], 1] &)@
   detectionsDeconformer[img, {1152, 896}, "Fit"]@
   netOutputDecoder[detectionThreshold]@
   net@imageConformer[{1152, 896}, "Fit"]@img
 ]
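The "Fit" branch of deconformRectangles multiplies net-input coordinates by a scale factor and subtracts the padding that ConformImages added. The sketch below applies that arithmetic to a single point. toImageCoordinates is a hypothetical helper, not part of the original code. Like the evaluation function, it takes {width, height} pairs and uses the 1152 × 896 input size. ImageAspectRatio is height/width, so the test h/w < netAR is the same comparison the original code makes.

```wolfram
(* Hypothetical helper: map a point from net-input coordinates back to the original image,
   mirroring the "Fit" branch of deconformRectangles *)
toImageCoordinates[{x_, y_}, {w_, h_}, {netW_, netH_}] :=
 Module[{netAR = netH/netW, scale, padding},
  If[h/w < netAR,
   (* image is relatively wider: it fills the input width, padding goes above and below *)
   scale = w/netW; padding = {0, (w netAR - h)/2},
   (* image is relatively taller: it fills the input height, padding goes left and right *)
   scale = h/netH; padding = {(h/netAR - w)/2, 0}];
  {x, y} scale - padding]

toImageCoordinates[{100, 200}, {640, 480}, {1152, 896}]
(* {500/9, 920/9}, i.e. about {55.6, 102.2} *)
```

For a 640 × 480 image, which is wider than the net's 9:7 input, the image fills the full input width. That gives scale = 640/1152 = 5/9 and a vertical padding of 80/9 pixels per side in original-image units.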

Basic usage

Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:

(* Evaluate this cell to get the example input *) testImage = CloudGet["https://www.wolframcloud.com/obj/841876e5-9610-47c2-bf85-f3d2e52332fc"]

detection = netevaluate[testImage]

Inspect which classes are detected:


Visualize the detection:


HighlightImage[testImage, {#[[1]], Text[Style[#[[2]], White, 12], {20, 20} + #[[1, 1]], Background -> Transparent]} & /@ detection]


Advanced visualization

Write a function to apply a custom styling to the result of the detection:

(* Give every class its own random color and label each box with its class name *)
styleDetection[detection_] :=
 {RandomColor[],
    {#[[1]], Text[Style[#[[2]], White, 12], {20, 20} + #[[1, 1]], Background -> Black]} & /@ #} & /@
  GatherBy[detection, #[[2]] &]

HighlightImage[testImage, styleDetection[detection]]

Net information

Inspect the number of parameters of all arrays in the net:

Information[NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"], "ArraysElementCounts"]

Obtain the total number of parameters:

Information[NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"], "ArraysTotalElementCount"]

Obtain the layer type counts:

Information[NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"], "LayerTypeCounts"]

Display the summary graphic:

Information[NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"], "SummaryGraphic"]

Export to MXNet

Export the net into a format that can be opened in MXNet:

jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], NetModel["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"], "MXNet"]

Export also creates a net.params file containing parameters:

paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]

Get the size of the parameter file:

FileByteCount[paramPath]

The size is similar to the byte count of the resource object:

ResourceObject["RetinaNet-101 Feature Pyramid Net Trained on MS-COCO Data"]["ByteCount"]


Requirements

Wolfram Language 12.1 (March 2020) or above

Resource History

Date Created: 19 May 2020

Reference

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, "Focal Loss for Dense Object Detection," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2999–3007 (2017)
Available from: https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md
Rights: Apache 2.0 License