
Multi-scale Context Aggregation Net Trained on Cityscapes Data

Segment an image of a driving scenario into semantic component classes

Released in 2016, this model was the first to make systematic use of dilated convolutions for pixel-wise classification. A context aggregation module, consisting of convolutions with exponentially increasing dilation factors, is appended to a VGG-style front end.
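As a minimal illustration (not part of this model's code), a dilated convolution can be constructed in the Wolfram Language with the "Dilation" option of ConvolutionLayer; a 3×3 kernel with dilation factor 2 covers a 5×5 receptive field while keeping only nine weights per filter:

```wolfram
(* A 3x3 convolution with dilation factor 2; its receptive field is 5x5.
   The input shape {3, 64, 64} is chosen arbitrarily for illustration. *)
ConvolutionLayer[16, {3, 3}, "Dilation" -> {2, 2}, "Input" -> {3, 64, 64}]
```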

Number of layers: 58 | Parameter count: 134,460,595 | Trained size: 538 MB
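As a quick consistency check, the trained size follows from the parameter count stored at single precision (4 bytes per parameter):

```wolfram
(* 134,460,595 parameters at 4 bytes each, approximately 538 MB *)
UnitConvert[Quantity[134460595*4, "Bytes"], "Megabytes"]
```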

Training Set Information

Performance

Examples

Resource retrieval

Retrieve the resource object:

In[1]:=
ResourceObject["Multi-scale Context Aggregation Net Trained on \
Cityscapes Data"]
Out[1]=

Get the pre-trained net:

In[2]:=
NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"]
Out[2]=

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[3]:=
netevaluate[img_, device_: "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, imgPad, imgSize, takeSpecs, 
   tiles, marginTile, prob},
  (* Parameters *)
  
  net = NetModel[
    "Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"];
  marginImg = 186;
  inputSize = 1396;
  windowSize = inputSize - 2*marginImg;
  (* Pad and tile input *)
  
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
  takeSpecs = Table[
    {{i, i + inputSize - 1}, {j, j + inputSize - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize}
    ];
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];
  (* Make all tiles 1396x1396 *)
  
  marginTile = windowSize - Mod[imgSize - 2*marginImg, windowSize];
  tiles = 
   MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, 
    tiles, {All, -1}];
  tiles = 
   MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, 
    tiles, {-1, All}];
  (* Run net on tiles *)
  
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@
    ArrayReshape[prob, Join[Dimensions@tiles, {1024, 1024, 19}]];
  (* Trim additional tile margin *)
  
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];
  (* Predict classes *)
  NetExtract[net, "Output"]@prob
  ]

Label list

Define the label list for this model. Integers in the model’s output correspond to elements in the label list:

In[4]:=
labels = {"road", "sidewalk", "building", "wall", "fence", "pole", 
   "traffic light", "traffic sign", "vegetation", "terrain", "sky", 
   "person", "rider", "car", "truck", "bus", "train", "motorcycle", 
   "bicycle"};

Basic usage

Obtain a segmentation mask for a given image:

In[5]:=
CloudGet["https://www.wolframcloud.com/objects/f691e775-d97f-4c4a-a32d-ab43f0114ed7"] (* Evaluate this cell to copy the example input from a cloud object *)
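If the cloud object is not accessible, a mask can equivalently be obtained by applying the evaluation function defined above to any image of a driving scene (here img stands for such an image, which is an assumption of this sketch):

```wolfram
(* img is assumed to be an image of a driving scenario *)
mask = netevaluate[img];
```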

Inspect which classes are detected:

In[6]:=
detected = DeleteDuplicates@Flatten@mask
Out[6]=
In[7]:=
labels[[detected]]
Out[7]=

Visualize the mask:

In[8]:=
Colorize[mask]
Out[8]=

Advanced visualization

Associate classes with colors using the standard Cityscapes palette:

In[9]:=
colors = Apply[
  RGBColor, {{128, 64, 128}, {244, 35, 232}, {70, 70, 70}, {102, 102, 
     156}, {190, 153, 153}, {153, 153, 153}, {250, 170, 30}, {220, 
     220, 0}, {107, 142, 35}, {152, 251, 152}, {70, 130, 180}, {220, 
     20, 60}, {255, 0, 0}, {0, 0, 142}, {0, 0, 70}, {0, 60, 100}, {0, 
     80, 100}, {0, 0, 230}, {119, 11, 32}}/255., {1}]
Out[9]=
In[10]:=
indexToColor = Thread[Range[19] -> colors];
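As a quick sanity check, the full palette can be previewed alongside the class names:

```wolfram
(* Preview the Cityscapes palette next to the label list *)
SwatchLegend[colors, labels]
```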

Write a function to overlay the mask on the image, with a legend:

In[11]:=
result[img_, device_: "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  Legended[
   Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}], 
   SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]]
  ]

Inspect the results:

In[12]:=
CloudGet["https://www.wolframcloud.com/objects/166663cf-7781-4296-9609-4df11687d6f7"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[12]=
In[13]:=
CloudGet["https://www.wolframcloud.com/objects/5bca17c4-8068-4c0b-bccc-10ce051051a1"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[13]=
In[14]:=
CloudGet["https://www.wolframcloud.com/objects/611cdb1d-0b63-440c-8155-51478912c39a"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[14]=

Net information

Inspect the number of parameters of all arrays in the net:

In[15]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "ArraysElementCounts"]
Out[15]=

Obtain the total number of parameters:

In[16]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "ArraysTotalElementCount"]
Out[16]=

Obtain the layer type counts:

In[17]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "LayerTypeCounts"]
Out[17]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[18]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "MXNet"]
Out[18]=

Export also creates a net.params file containing parameters:

In[19]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[19]=

Get the size of the parameter file:

In[20]:=
FileByteCount[paramPath]
Out[20]=

The size is similar to the byte count of the resource object:

In[21]:=
ResourceObject[
  "Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"]["ByteCount"]
Out[21]=

Represent the MXNet net as a graph:

In[22]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[22]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Reference

F. Yu, V. Koltun, "Multi-Scale Context Aggregation by Dilated Convolutions," ICLR 2016