Multi-scale Context Aggregation Net Trained on Cityscapes Data

Segment an image of a driving scenario into semantic component classes

Released in 2016, this was the first model to make systematic use of dilated convolutions for pixel-wise classification. It appends a context aggregation module, built from convolutions with exponentially increasing dilations, to a VGG-style front end.
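
To see why exponentially increasing dilations aggregate context quickly, the receptive field of a stack of dilated convolutions can be computed directly. The sketch below assumes 3x3 kernels with stride 1 and the illustrative dilations 1, 2, 4, 8, 16; these values are chosen for the example and are not read from this model.

```wolfram
(* Assumption: illustrative dilations, not the actual configuration of this net *)
dilations = {1, 2, 4, 8, 16};
kernelSize = 3;

(* Each stride-1 layer widens the receptive field by (kernelSize - 1)*dilation *)
receptiveField = 1 + Accumulate[(kernelSize - 1)*dilations]
(* {3, 7, 15, 31, 63} *)
```

Five such layers already cover a 63-pixel window, whereas five undilated 3x3 layers would cover only 11 pixels. In the Wolfram Language, a single dilated layer of this kind can be written as ConvolutionLayer[n, {3, 3}, "Dilation" -> d].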

Number of layers: 58 | Parameter count: 134,460,595 | Trained size: 538 MB

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"]
Out[1]=

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[2]:=
netevaluate[img_, device_ : "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, imgPad, imgSize, takeSpecs, tiles, marginTile, prob},
  (* Parameters *)
  net = NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes Data"];
  marginImg = 186;
  inputSize = 1396;
  windowSize = inputSize - 2*marginImg;
  (* Pad and tile input *)
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
  takeSpecs = Table[
    {{i, i + inputSize - 1}, {j, j + inputSize - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize}
    ];
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];
  (* Make all tiles 1396x1396. The offset 1 in Mod yields zero extra padding
     when an image dimension is an exact multiple of windowSize *)
  marginTile = windowSize - Mod[imgSize - 2*marginImg, windowSize, 1];
  tiles = MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, tiles, {All, -1}];
  tiles = MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, tiles, {-1, All}];
  (* Run net on tiles, skipping the output decoder to get class probabilities *)
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@
    ArrayReshape[prob, Join[Dimensions@tiles, {1024, 1024, 19}]];
  (* Trim additional tile margin *)
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];
  (* Predict classes *)
  NetExtract[net, "Output"]@prob
  ]
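
The tiling arithmetic used by the evaluation function can be checked on its own. The following sketch uses the margin and input size from the function above; the image dimensions {2000, 1000} are a hypothetical example, not one of the images on this page.

```wolfram
marginImg = 186;
inputSize = 1396;
windowSize = inputSize - 2*marginImg;      (* 1024 output pixels per tile side *)

dims = {2000, 1000};                       (* hypothetical image {width, height} *)
tileCounts = Ceiling[dims/windowSize];     (* tiles along {width, height} *)
extraPad = tileCounts*windowSize - dims;   (* extra padding for the last column and row *)

{windowSize, tileCounts, extraPad}
(* {1024, {2, 1}, {48, 24}} *)
```

Each 1396x1396 tile therefore contributes a 1024x1024 block of predictions, and the 186-pixel margin on every side only supplies context to the net.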

Label list

Define the label list for this model. Integers in the model’s output correspond to elements in the label list:

In[3]:=
labels = {"road", "sidewalk", "building", "wall", "fence", "pole", "traffic light", "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"};

Basic usage

Obtain a segmentation mask for a given image:

In[4]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/af44f3b3-06e6-4a26-a6fb-a3e9c55ba5bf"]
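
The cells below refer to a symbol named mask, so the example input presumably assigns the result of netevaluate to it. A minimal sketch of the equivalent call on a local file follows; the file name street.jpg is a hypothetical placeholder, and netevaluate is the function defined above.

```wolfram
(* Assumption: "street.jpg" is a placeholder for any driving-scene image on disk *)
img = Import["street.jpg"];

(* Matrix of class indices with one entry per pixel; indices refer to the label list *)
mask = netevaluate[img];

(* The mask has as many rows and columns as the image has pixel rows and columns *)
Dimensions[mask] === Reverse[ImageDimensions[img]]
```

Pass "GPU" as the second argument, as in netevaluate[img, "GPU"], to run the tiles on a supported GPU.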

Inspect which classes are detected:

In[5]:=
detected = DeleteDuplicates@Flatten@mask
Out[5]=
In[6]:=
labels[[detected]]
Out[6]=
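
Beyond listing the detected classes, the number of pixels per class can be tallied. A minimal sketch on a toy matrix standing in for mask follows; toyMask is made up for illustration, and labels repeats the list defined above so that the snippet runs on its own.

```wolfram
labels = {"road", "sidewalk", "building", "wall", "fence", "pole",
   "traffic light", "traffic sign", "vegetation", "terrain", "sky",
   "person", "rider", "car", "truck", "bus", "train", "motorcycle",
   "bicycle"};

(* Toy 2x3 stand-in for a real mask: 1 = road, 11 = sky, 14 = car *)
toyMask = {{1, 1, 14}, {1, 11, 11}};

(* Count pixels per class index, sort by index, then replace indices with names *)
pixelCounts = KeyMap[labels[[#]] &, KeySort[Counts[Flatten[toyMask]]]]
(* <|"road" -> 3, "sky" -> 2, "car" -> 1|> *)
```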

Visualize the mask:

In[7]:=
Colorize[mask]
Out[7]=

Advanced visualization

Associate classes with colors using the standard Cityscapes palette:

In[8]:=
colors = Apply[
  RGBColor, {{128, 64, 128}, {244, 35, 232}, {70, 70, 70}, {102, 102, 156}, {190, 153, 153}, {153, 153, 153}, {250, 170, 30}, {220, 220, 0}, {107, 142, 35}, {152, 251, 152}, {70, 130, 180}, {220, 20, 60}, {255, 0, 0}, {0, 0, 142}, {0, 0, 70}, {0, 60, 100}, {0, 80, 100}, {0, 0, 230}, {119, 11, 32}}/255., {1}]
Out[8]=
In[9]:=
indexToColor = Thread[Range[19] -> colors];

Write a function to overlay the mask on the image and add a legend:

In[10]:=
result[img_, device_ : "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  Legended[
   Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}], SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]]
  ]

Inspect the results:

In[11]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/8bc73a3e-85f3-41b9-ba3f-5bcc30d5c51a"]
Out[11]=
In[12]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/2ccc2de2-f781-4451-9519-d230aa76f764"]
Out[12]=
In[13]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/62d0a32e-c5f7-4499-a7d1-1e6a3de95bb6"]
Out[13]=

Net information

Inspect the number of parameters of all arrays in the net:

In[14]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "ArraysElementCounts"]
Out[14]=

Obtain the total number of parameters:

In[15]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "ArraysTotalElementCount"]
Out[15]=
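
The relation between the two properties can be checked without downloading the model. The sketch below uses a small hypothetical net, tinyNet, which is unrelated to this model, and verifies that the per-array counts sum to the total.

```wolfram
(* Hypothetical toy net: 3 inputs -> 4 hidden units -> 2 outputs *)
tinyNet = NetInitialize@
   NetChain[{LinearLayer[4], Ramp, LinearLayer[2]}, "Input" -> 3];

(* Per-array counts: 4*3 weights + 4 biases, then 2*4 weights + 2 biases *)
perArray = NetInformation[tinyNet, "ArraysElementCounts"];

(* Both expressions give 12 + 4 + 8 + 2 = 26 *)
{Total[perArray], NetInformation[tinyNet, "ArraysTotalElementCount"]}
```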

Obtain the layer type counts:

In[16]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "LayerTypeCounts"]
Out[16]=

Display the summary graphic:

In[17]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "SummaryGraphic"]
Out[17]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[18]:=
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "MXNet"]
Out[18]=

Export also creates a net.params file containing parameters:

In[19]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[19]=

Get the size of the parameter file:

In[20]:=
FileByteCount[paramPath]
Out[20]=

The size is similar to the byte count of the resource object:

In[21]:=
ResourceObject[
  "Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"]["ByteCount"]
Out[21]=
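
Both sizes follow from the parameter count. As a rough check, assuming every parameter is stored as a 32-bit float (an assumption, not something stated on this page):

```wolfram
paramCount = 134460595;   (* parameter count listed at the top of this page *)
bytesPerParam = 4;        (* assumption: single-precision storage *)

(* Approximate size in megabytes, using 10^6 bytes per MB *)
sizeMB = N[paramCount*bytesPerParam/10^6]
(* 537.842 *)
```

This agrees with the listed trained size of 538 MB.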

Represent the MXNet net as a graph:

In[22]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[22]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference