Multi-scale Context Aggregation Net Trained on CamVid Data

Segment an image of a driving scenario into semantic component classes

Released in 2016, this was the first model to make systematic use of dilated convolutions for pixel-wise classification. A context aggregation module, built from convolutions with exponentially increasing dilation factors, is appended to a VGG-style front end.
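
To see why exponentially increasing dilations gather context quickly, the following sketch tracks the receptive field of a stack of 3x3 convolutions whose dilation doubles at each layer. The dilation schedule is an illustrative assumption, not the schedule read from this net; the only fact used is that a 3x3 convolution with dilation d widens the receptive field by 2d pixels.

```wolfram
(* Hypothetical dilation schedule: doubles at every layer *)
dilations = 2^Range[0, 4];
(* A 3x3 kernel with dilation d adds 2*d pixels to the receptive field *)
receptiveField = 1 + 2*Accumulate[dilations]
(* {3, 7, 15, 31, 63} *)
```

The receptive field roughly doubles per layer while the number of weights per layer stays fixed, which is the property the context module relies on.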

Number of layers: 53 | Parameter count: 134,313,443 | Trained size: 537 MB

Training Set Information

Performance

Examples

Resource retrieval

Retrieve the resource object:

In[1]:=
ResourceObject["Multi-scale Context Aggregation Net Trained on CamVid \
Data"]
Out[1]=

Get the pre-trained net:

In[2]:=
NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"]
Out[2]=

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[3]:=
netevaluate[img_, device_: "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, zoom, imgPad, imgSize, 
   takeSpecs, tiles, marginTile, prob},
  (* Parameters *)
  
  net = NetModel[
    "Multi-scale Context Aggregation Net Trained on CamVid Data"];
  marginImg = 186;
  inputSize = {900, 1100};
  zoom = 8;
  windowSize = inputSize - 2*marginImg;
  (* Pad and tile input *)
  
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
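  (* ImageTake specs are {rows, columns}: step rows by the window 
     height and columns by the window width *)
  takeSpecs = Table[
    {{i, i + inputSize[[1]] - 1}, {j, j + inputSize[[2]] - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize[[1]]},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize[[2]]}
    ];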
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];
  (* Make all tiles 900x1100 *)
  
  marginTile = 
   Reverse[windowSize] - 
    Mod[imgSize - 2*marginImg, Reverse@windowSize];
  tiles = 
   MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, 
    tiles, {All, -1}];
  tiles = 
   MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, 
    tiles, {-1, All}];
  (* Run net on tiles *)
  
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@
    ArrayReshape[prob, Join[Dimensions@tiles, {66, 91, 11}]];
  (* Resample probs by zoom factor and trim additional tile margin *)
  prob = ArrayResample[prob, Dimensions[prob]*{zoom, zoom, 1}, 
    Resampling -> "Linear"];
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];
  (* Predict classes *)
  NetExtract[net, "Output"]@prob
  ]
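
The tiling arithmetic inside netevaluate can be checked on its own. The sketch below assumes a hypothetical 960x720 input (imgDims and tileCounts are illustrative names, not part of the original code) and reproduces the window size, the per-tile output size that appears as {66, 91, 11} in the reshape step, and the number of tiles.

```wolfram
(* Values copied from netevaluate *)
marginImg = 186; inputSize = {900, 1100}; zoom = 8;
(* Each tile contributes only its central window, given as {rows, columns} *)
windowSize = inputSize - 2*marginImg
(* {528, 728} *)
(* Spatial size of the net output for one tile *)
windowSize/zoom
(* {66, 91} *)
(* Hypothetical input size, given as {width, height} like ImageDimensions *)
imgDims = {960, 720};
(* Number of tiles along {rows, columns} *)
tileCounts = Ceiling[Reverse[imgDims]/windowSize]
(* {2, 2} *)
```

Because the window is not square, rows and columns must be stepped by different amounts (528 and 728 pixels respectively) for the tile outputs to line up when they are joined.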

Label list

Define the label list for this model. Integers in the model’s output correspond to elements in the label list:

In[4]:=
labels = {"building", "tree", "sky", "car", "sign, symbol", "road", 
   "pedestrian", "fence", "column, pole", "sidewalk", "bicycle"};

Basic usage

Obtain a segmentation mask for a given image:

In[5]:=
CloudGet["https://www.wolframcloud.com/objects/0bd063c5-ad92-4f34-869c-aa840771c29f"] (* Evaluate this cell to copy the example input from a cloud object *)

Inspect which classes are detected:

In[6]:=
detected = DeleteDuplicates@Flatten@mask
Out[6]=
In[7]:=
labels[[detected]]
Out[7]=
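
Beyond listing the detected classes, it can be useful to see how much of the image each class covers. A minimal sketch, using a small hypothetical mask (toyMask) in place of a real netevaluate result and repeating the label list so the block runs on its own:

```wolfram
labels = {"building", "tree", "sky", "car", "sign, symbol", "road", 
   "pedestrian", "fence", "column, pole", "sidewalk", "bicycle"};
(* Hypothetical 2x3 mask of class indices; a real mask comes from netevaluate *)
toyMask = {{3, 3, 1}, {6, 6, 6}};
(* Fraction of pixels assigned to each detected class, keyed by label *)
coverage = KeyMap[labels[[#]] &, Counts[Flatten[toyMask]]]/
  Length[Flatten[toyMask]]
(* <|"sky" -> 1/3, "building" -> 1/6, "road" -> 1/2|> *)
```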

Visualize the mask:

In[8]:=
Colorize[mask]
Out[8]=

Advanced visualization

Associate classes with colors:

In[9]:=
colors = Apply[
  RGBColor, {{70, 70, 70}, {107, 142, 35}, {70, 130, 180}, {0, 0, 
     142}, {220, 220, 0}, {128, 64, 128}, {220, 20, 60}, {190, 153, 
     153}, {250, 170, 30}, {244, 35, 232}, {119, 11, 32}}/255., {1}]
Out[9]=
In[10]:=
indexToColor = Thread[Range[11] -> colors];

Write a function to overlay the mask on the image and add a legend:

In[11]:=
result[img_, device_: "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  Legended[
   Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}], 
   SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]]
  ]

Inspect the results:

In[12]:=
CloudGet["https://www.wolframcloud.com/objects/741fd3c1-40cb-456c-a51c-e0f403c92705"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[12]=
In[13]:=
CloudGet["https://www.wolframcloud.com/objects/9b02ab5a-455a-4087-8432-f04868a40bf2"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[13]=

Net information

Inspect the number of parameters of all arrays in the net:

In[14]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid \
Data"], "ArraysElementCounts"]
Out[14]=

Obtain the total number of parameters:

In[15]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid \
Data"], "ArraysTotalElementCount"]
Out[15]=

Obtain the layer type counts:

In[16]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid \
Data"], "LayerTypeCounts"]
Out[16]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[17]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid \
Data"], "MXNet"]
Out[17]=

Export also creates a net.params file containing parameters:

In[18]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[18]=

Get the size of the parameter file:

In[19]:=
FileByteCount[paramPath]
Out[19]=

The size is similar to the byte count of the resource object:

In[20]:=
ResourceObject[
  "Multi-scale Context Aggregation Net Trained on CamVid \
Data"]["ByteCount"]
Out[20]=
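
The agreement between the two sizes follows from the parameter count. A rough check, assuming every parameter is stored as a 32-bit (4-byte) real, which is an assumption rather than something stated on this page:

```wolfram
(* Parameter count reported for this model *)
paramCount = 134313443;
(* Assumed storage: 4 bytes per parameter *)
paramBytes = paramCount*4
(* 537253772 *)
(* Expressed in megabytes (10^6 bytes), for comparison with the listed 537 MB *)
N[paramBytes/10^6]
(* 537.254 *)
```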

Represent the MXNet net as a graph:

In[21]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[21]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference