Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data

Segment an image into various semantic component classes

Released in 2016, this is the first model featuring a systematic use of dilated convolutions for pixel-wise classification. A context aggregation module featuring convolutions with exponentially increasing dilations is appended to a VGG-style front end.

Number of layers: 53 | Parameter count: 141,149,720 | Trained size: 565 MB

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"]
Out[1]=

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[2]:=
netevaluate[img_, device_ : "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, zoom, imgPad, imgSize, takeSpecs, tiles, marginTile, prob},
  (* Parameters *)
  net = NetModel["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"];
  marginImg = 186;
  inputSize = 900;
  zoom = 8;
  windowSize = inputSize - 2*marginImg;
  (* Pad and tile input *)
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
  takeSpecs = Table[
    {{i, i + inputSize - 1}, {j, j + inputSize - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize}
    ];
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];
  (* Make all tiles 900x900. Mod[-n, windowSize] adds no extra padding when a
     dimension is an exact multiple of windowSize, where
     windowSize - Mod[n, windowSize] would add a full extra window *)
  marginTile = Mod[-(imgSize - 2*marginImg), windowSize];
  tiles = MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, tiles, {All, -1}];
  tiles = MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, tiles, {-1, All}];
  (* Run net on tiles *)
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@ArrayReshape[prob, Join[Dimensions@tiles, {66, 66, 21}]];
  (* Resample probs by zoom factor and trim additional tile margin *)
  prob = ArrayResample[prob, Dimensions[prob]*{zoom, zoom, 1}, Resampling -> "Linear"];
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];
  (* Predict classes *)
  NetExtract[net, "Output"]@prob
  ]
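
The constants in the evaluation function are related by simple arithmetic, which the following sketch spells out. It reuses the values from the function (margin 186, input size 900, zoom 8); the helper name tileCounts is hypothetical and only illustrates how many tiles the Table iterators produce for a given image size.

```wolfram
marginImg = 186; inputSize = 900; zoom = 8;

(* Each 900x900 tile contributes a central window of useful pixels *)
windowSize = inputSize - 2*marginImg  (* 528 *)

(* The net output is downsampled by the zoom factor, which matches the
   {66, 66, 21} reshape in the evaluation function *)
windowSize/zoom  (* 66 *)

(* Hypothetical helper: number of tiles as {rows, columns} for an image
   of {width, height} pixels *)
tileCounts[{w_, h_}] := Ceiling[{h, w}/windowSize]
tileCounts[{1000, 600}]  (* {2, 2} *)
```

An image no larger than 528 pixels in either dimension is therefore processed as a single tile.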

Label list

Define the label list for this model. Each integer in the model’s output is a position in this list, so 1 corresponds to "background":

In[3]:=
labels = {"background", "airplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "table", "dog", "horse", "motorcycle", "person", "plant", "sheep", "sofa", "train",
    "television"};

Basic usage

Obtain a segmentation mask for a given image:

In[4]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/6eb96e08-03ce-4757-95e2-0e0688422a07"]
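
The cells that follow use a variable named mask, but its assignment is not visible on this page. A minimal sketch of the call, assuming mask is meant to hold the result of netevaluate; the built-in test image is only a stand-in for the example input, and the variable name img is an assumption:

```wolfram
(* Assumption: a built-in test image stands in for the example input *)
img = ExampleData[{"TestImage", "Airplane"}];

(* netevaluate returns a matrix of class indices, one per pixel *)
mask = netevaluate[img];

(* Sanity check: the mask should have one row per pixel row and one
   column per pixel column of the image *)
Dimensions[mask] == Reverse[ImageDimensions[img]]
```

If the fetched example input also assigns mask, whichever cell is evaluated last determines its value.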

Inspect which classes are detected:

In[5]:=
detected = DeleteDuplicates@Flatten@mask
Out[5]=
In[6]:=
labels[[detected]]
Out[6]=
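
Beyond listing the detected classes, it can be useful to see how much of the image each one covers. The following is a sketch that assumes mask and labels are defined as in the cells above; the names counts and fractions are introduced here for illustration.

```wolfram
(* Number of pixels assigned to each class index *)
counts = Counts[Flatten[mask]];

(* Convert to fractions of the whole image *)
fractions = N[counts/Total[counts]];

(* Replace class indices with label names, largest share first *)
KeyMap[labels[[#]] &, ReverseSort[fractions]]
```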

Visualize the mask:

In[7]:=
Colorize[mask]
Out[7]=

Advanced visualization

Associate classes to colors:

In[8]:=
indexToColor = Prepend[Thread[
    Range[2, 21] -> ColorData["Atoms", "ColorList"][[1 ;; 20]]], 1 -> Black];

Write a function to overlay the mask on the image and add a legend:

In[9]:=
result[img_, device_ : "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  Legended[
   Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}], SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]]
  ]

Inspect the results:

In[10]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/fc0e1a53-11c1-4e9e-8475-e888794db737"]
Out[10]=
In[11]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/cbcaa152-6757-4247-b844-b821fd390f5c"]
Out[11]=
In[12]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/6b421a16-dd08-4fc8-9b9d-d46a7509f9ee"]
Out[12]=

Net information

Inspect the number of parameters of all arrays in the net:

In[13]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"], "ArraysElementCounts"]
Out[13]=

Obtain the total number of parameters:

In[14]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"], "ArraysTotalElementCount"]
Out[14]=

Obtain the layer type counts:

In[15]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"], "LayerTypeCounts"]
Out[15]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[16]:=
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}],
  NetModel["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"], "MXNet"]
Out[16]=

Export also creates a net.params file containing parameters:

In[17]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[17]=

Get the size of the parameter file:

In[18]:=
FileByteCount[paramPath]
Out[18]=

The size is similar to the byte count of the resource object:

In[19]:=
ResourceObject["Multi-scale Context Aggregation Net Trained on PASCAL VOC2012 Data"]["ByteCount"]
Out[19]=
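
Both sizes can be roughly anticipated from the parameter count stated at the top of this page. The sketch below assumes every parameter is stored as a 32-bit (4-byte) real, which this page does not state explicitly; the variable names are introduced for illustration.

```wolfram
paramCount = 141149720;  (* parameter count stated on this page *)
bytesPerParam = 4;       (* assumption: 32-bit reals *)

(* Estimated size in megabytes (10^6 bytes), approximately 564.6 *)
N[paramCount*bytesPerParam/10^6]
```

Under that assumption the estimate agrees with the stated trained size of 565 MB, with any small difference attributable to file metadata.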

Represent the MXNet net as a graph:

In[20]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[20]=

Requirements

Wolfram Language 11.3 (March 2018) or above
