Multi-scale Context Aggregation Net Trained on CamVid Data

Segment an image of a driving scenario into semantic component classes

Released in 2016, this model was the first to make systematic use of dilated convolutions for pixel-wise classification. A context aggregation module, consisting of convolutions with exponentially increasing dilation factors, is appended to a VGG-style front end.

Number of layers: 53 | Parameter count: 134,313,443 | Trained size: 537 MB
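The principle behind the context module can be sketched in a few lines: stacking 3x3 convolutions whose dilation factors double at every layer grows the receptive field exponentially while keeping the spatial resolution fixed. The toy module below only illustrates the idea; it is not the exact architecture of this model, and the layer sizes are chosen arbitrarily:

(* toy context module: 3x3 convolutions with dilations 1, 2, 4, 8; padding keeps the spatial size fixed *)
toyContextModule = NetChain[
  Flatten@Table[
    {ConvolutionLayer[11, {3, 3}, "Dilation" -> 2^k, PaddingSize -> 2^k], Ramp},
    {k, 0, 3}],
  "Input" -> {11, 66, 91}]

With dilations 1, 2, 4 and 8, each output value depends on a 31x31 window of the input, far larger than the 9x9 window a plain stack of four 3x3 convolutions would cover.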

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"]
Out[1]=

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[2]:=
netevaluate[img_, device_ : "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, zoom, imgPad, imgSize, takeSpecs, tiles, marginTile, prob},
  (* Parameters *)
  net = NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"];
  marginImg = 186;
  inputSize = {900, 1100};
  zoom = 8;
  windowSize = inputSize - 2*marginImg;
  (* Pad and tile the input *)
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
  takeSpecs = Table[
    {{i, i + inputSize[[1]] - 1}, {j, j + inputSize[[2]] - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize[[1]]},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize[[2]]}
    ];
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];
  (* Pad the rightmost and bottom tiles so that every tile is 900x1100 *)
  marginTile = Reverse[windowSize] - Mod[imgSize - 2*marginImg, Reverse@windowSize];
  tiles = MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, tiles, {All, -1}];
  tiles = MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, tiles, {-1, All}];
  (* Run the net on all tiles and reassemble the probability maps *)
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@ArrayReshape[prob, Join[Dimensions@tiles, {66, 91, 11}]];
  (* Resample the probabilities by the zoom factor and trim the extra tile margin *)
  prob = ArrayResample[prob, Dimensions[prob]*{zoom, zoom, 1}, Resampling -> "Linear"];
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];
  (* Decode the per-pixel class probabilities into class indices *)
  NetExtract[net, "Output"]@prob
  ]
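The constants in netevaluate fit together: each 900x1100 tile yields a 66x91 map of class probabilities, and upsampling by the zoom factor of 8 covers exactly the 528x728 central window that remains after trimming the 186-pixel margin on every side. A quick arithmetic check:

(* the central window of each tile, measured in net-output pixels *)
({900, 1100} - 2*186)/8
(* = {66, 91}, the dimensions used in the ArrayReshape above *)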

Label list

Define the label list for this model. Integers in the model’s output correspond to elements in the label list:

In[3]:=
labels = {"building", "tree", "sky", "car", "sign, symbol", "road", "pedestrian", "fence", "column, pole", "sidewalk", "bicycle"};
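For example, a pixel predicted as class 6 is labeled "road". A lookup association makes the correspondence explicit (indexToLabel is an illustrative helper, not used elsewhere on this page):

indexToLabel = AssociationThread[Range[11] -> labels];
indexToLabel[6]
(* "road" *)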

Basic usage

Obtain a segmentation mask for a given image:

In[4]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/eba0964d-1078-4a01-b7b1-ea8b9afee76c"]

Inspect which classes are detected:

In[5]:=
detected = DeleteDuplicates@Flatten@mask
Out[5]=
In[6]:=
labels[[detected]]
Out[6]=

Visualize the mask:

In[7]:=
Colorize[mask]
Out[7]=
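The example cells above fetch an embedded test image from the cloud. To segment your own image, call the evaluation function directly; the file name below is a placeholder:

(* segment a local image; "street.png" is a hypothetical path *)
img = Import["street.png"];
mask = netevaluate[img];
Colorize[mask]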

Advanced visualization

Associate classes to colors:

In[8]:=
colors = Apply[
  RGBColor, {{70, 70, 70}, {107, 142, 35}, {70, 130, 180}, {0, 0, 142}, {220, 220, 0}, {128, 64, 128}, {220, 20, 60}, {190, 153, 153}, {250, 170, 30}, {244, 35, 232}, {119, 11, 32}}/255., {1}]
Out[8]=
In[9]:=
indexToColor = Thread[Range[11] -> colors];
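To check the mapping at a glance, display each label next to its assigned color (a small illustrative helper, not part of the workflow below):

Grid[Transpose[{labels, Graphics[{#, Rectangle[]}, ImageSize -> 15] & /@ colors}], Alignment -> Left]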

Write a function to overlay the mask on the image and display it with a legend:

In[10]:=
result[img_, device_ : "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  (* compute the segmentation mask and the classes it contains *)
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  (* color the mask and blend it with the input image *)
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  (* display mask and composition side by side with a class legend *)
  Legended[
   Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}],
   SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]]
  ]

Inspect the results:

In[11]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/77c48e7d-651a-47dc-b45a-3955ef330620"]
Out[11]=
In[12]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/ef029a30-5ea5-40b0-938a-0186d21dded0"]
Out[12]=

Net information

Inspect the number of parameters of all arrays in the net:

In[13]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
 "ArraysElementCounts"]
Out[13]=

Obtain the total number of parameters:

In[14]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
 "ArraysTotalElementCount"]
Out[14]=

Obtain the layer type counts:

In[15]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
 "LayerTypeCounts"]
Out[15]=

Display the summary graphic:

In[16]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
 "SummaryGraphic"]
Out[16]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[17]:=
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}],
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"], "MXNet"]
Out[17]=

Export also creates a net.params file containing parameters:

In[18]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[18]=

Get the size of the parameter file:

In[19]:=
FileByteCount[paramPath]
Out[19]=
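The result is consistent with the parameter count quoted at the top of the page: 134,313,443 parameters stored as 32-bit floats (the standard precision for these nets) take about 537 MB:

(* 4 bytes per single-precision parameter *)
134313443*4
(* 537253772 bytes, roughly 537 MB *)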

The size is similar to the byte count of the resource object:

In[20]:=
ResourceObject[
  "Multi-scale Context Aggregation Net Trained on CamVid Data"]["ByteCount"]
Out[20]=

Represent the MXNet net as a graph:

In[21]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[21]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference

F. Yu, V. Koltun, "Multi-Scale Context Aggregation by Dilated Convolutions," ICLR 2016. Available at arXiv:1511.07122