Multi-scale Context Aggregation Net Trained on CamVid Data

Segment an image of a driving scenario into semantic component classes

Released in 2016, this is the first model featuring a systematic use of dilated convolutions for pixel-wise classification. A context aggregation module featuring convolutions with exponentially increasing dilations is appended to a VGG-style front end.
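
To see why exponentially increasing dilations aggregate context quickly, the following minimal sketch tracks the receptive field of a stack of 3x3 convolutions. The helper name `receptiveField` and both dilation schedules are assumptions chosen for illustration; they are not the net's actual configuration.

```wolfram
(* Receptive field after each stacked 3x3 convolution: a layer with
   dilation d widens the field by 2*d pixels. The schedules below are
   illustrative only. *)
receptiveField[dilations_List] := FoldList[#1 + 2*#2 &, 1, dilations]

receptiveField[{1, 2, 4, 8}]  (* dilated:   {1, 3, 7, 15, 31} *)
receptiveField[{1, 1, 1, 1}]  (* undilated: {1, 3, 5, 7, 9} *)
```

With the same number of layers and parameters, the dilated stack covers a 31-pixel span where the undilated one covers 9.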

Number of layers: 53 | Parameter count: 134,313,443 | Trained size: 537 MB

Training Set Information

Cambridge-driving Labeled Video Database (CamVid), a collection of videos of driving scenarios with pixel-wise labels for 32 classes.

Performance

This model achieves a mean intersection-over-union (IoU) score of 65.3% on the CamVid dataset.
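
For reference, IoU for a single class is the number of pixels labeled with that class in both the prediction and the ground truth, divided by the number labeled with it in either. The sketch below is a minimal illustration with a hypothetical helper `classIoU` and toy label matrices; it is not the evaluation code behind the reported figure.

```wolfram
(* Intersection over union for one class index, given two integer label
   matrices of equal dimensions. Returns Missing[...] when the class
   appears in neither matrix. *)
classIoU[pred_, truth_, class_Integer] := Module[
  {p = Flatten[pred], t = Flatten[truth], inter, union},
  inter = Count[Transpose[{p, t}], {class, class}];
  union = Count[p, class] + Count[t, class] - inter;
  If[union == 0, Missing["NotPresent"], inter/union]
]

(* Toy 2x2 example with two classes *)
pred  = {{1, 1}, {2, 2}};
truth = {{1, 2}, {2, 2}};
classIoU[pred, truth, 1]  (* 1/2 *)
classIoU[pred, truth, 2]  (* 2/3 *)
```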

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=

Out[1]=
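
A minimal sketch of the retrieval call, using the same `NetModel` name that appears in the evaluation function in the next section. The variable name `net` is an assumption, and the first evaluation downloads the trained net.

```wolfram
(* Fetch the pre-trained net from the Wolfram Neural Net Repository *)
net = NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"]
```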

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[2]:=

netevaluate[img_, device_ : "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, zoom, imgPad, imgSize,
   takeSpecs, tiles, marginTile, prob},

  (* Parameters *)
  net = NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"];
  marginImg = 186;
  inputSize = {900, 1100};
  zoom = 8;
  windowSize = inputSize - 2*marginImg;

  (* Pad and tile input; rows step by the window height, columns by the window width *)
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
  takeSpecs = Table[
    {{i, i + inputSize[[1]] - 1}, {j, j + inputSize[[2]] - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize[[1]]},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize[[2]]}
  ];
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];

  (* Make all tiles 900x1100 by padding the last column and last row of tiles *)
  marginTile = Reverse[windowSize] - Mod[imgSize - 2*marginImg, Reverse@windowSize];
  tiles = MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, tiles, {All, -1}];
  tiles = MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, tiles, {-1, All}];

  (* Run net on tiles and stitch the per-tile outputs back together *)
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@ArrayReshape[prob, Join[Dimensions@tiles, {66, 91, 11}]];

  (* Resample probs by zoom factor and trim additional tile margin *)
  prob = ArrayResample[prob, Dimensions[prob]*{zoom, zoom, 1}, Resampling -> "Linear"];
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];

  (* Predict classes *)
  NetExtract[net, "Output"]@prob
]
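
The constants in the function are linked by simple arithmetic, which the following sketch works through. The 960x720 image size is a hypothetical example, and the tile count assumes each tile contributes exactly one window of size `windowSize`.

```wolfram
marginImg = 186; inputSize = {900, 1100}; zoom = 8;

(* Central window of each tile for which predictions are kept, {rows, columns} *)
windowSize = inputSize - 2*marginImg   (* {528, 728} *)

(* Spatial size of the net output per tile, before upsampling by zoom *)
windowSize/zoom                        (* {66, 91} *)

(* Tile grid {rows, columns} for a hypothetical 960x720 (width x height) image *)
imgDims = {960, 720};
Ceiling[Reverse[imgDims]/windowSize]   (* {2, 2} *)
```

The `{66, 91, 11}` reshape in the function is therefore the per-tile window at 1/8 resolution, with one channel for each of the 11 output classes.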

Label list

Define the label list for this model. Integers in the model’s output correspond to elements in the label list:

In[3]:=

Basic usage

Obtain a segmentation mask for a given image:

In[4]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/eba0964d-1078-4a01-b7b1-ea8b9afee76c"]

Inspect which classes are detected:

In[5]:=

Out[5]=

In[6]:=

Out[6]=

Visualize the mask:

In[7]:=

Out[7]=
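
The three steps of this subsection can be sketched as follows, assuming `netevaluate` and `labels` are defined as in the preceding sections. The names `img`, `mask` and `classes` are illustrative, and `Colorize` with default colors is only one possible way to display the mask.

```wolfram
(* img, mask and classes are illustrative names; netevaluate and labels
   are assumed to be defined as in the preceding sections *)
img = CloudGet["https://www.wolframcloud.com/obj/eba0964d-1078-4a01-b7b1-ea8b9afee76c"];
mask = netevaluate[img];

(* Class indices present in the mask, then their names *)
classes = DeleteDuplicates[Flatten[mask]]
labels[[classes]]

(* Default colorization of the integer mask *)
Colorize[mask]
```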

Advanced visualization

Associate classes to colors:

In[8]:=

colors = Apply[
RGBColor, {{70, 70, 70}, {107, 142, 35}, {70, 130, 180}, {0, 0, 142}, {220, 220, 0}, {128, 64, 128}, {220, 20, 60}, {190, 153, 153}, {250, 170, 30}, {244, 35, 232}, {119, 11, 32}}/255., {1}]

Out[8]=

In[9]:=
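
The function in the next cell relies on a symbol `indexToColor`. One definition consistent with how it is used there (as a `ColorRules` value and via `indexToColor[[classes, 2]]`) is a list of rules from class index to color; this is an assumption and requires `colors` from the previous cell.

```wolfram
(* Assumed definition: one rule i -> colors[[i]] per class index *)
indexToColor = Thread[Range[Length[colors]] -> colors]
```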

Write a function to overlay the mask on the image, with a legend:

In[10]:=

result[img_, device_ : "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  Legended[
    Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}],
    SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]
  ]
]

Inspect the results:

In[11]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/77c48e7d-651a-47dc-b45a-3955ef330620"]

Out[11]=

In[12]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/ef029a30-5ea5-40b0-938a-0186d21dded0"]

Out[12]=

Net information

Inspect the number of parameters of all arrays in the net:

In[13]:=

NetInformation[
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
  "ArraysElementCounts"]

Out[13]=

Obtain the total number of parameters:

In[14]:=

NetInformation[
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
  "ArraysTotalElementCount"]

Out[14]=

Obtain the layer type counts:

In[15]:=

NetInformation[
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
  "LayerTypeCounts"]

Out[15]=

Display the summary graphic:

In[16]:=

NetInformation[
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
  "SummaryGraphic"]

Out[16]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[17]:=

jsonPath = Export[
  FileNameJoin[{$TemporaryDirectory, "net.json"}],
  NetModel["Multi-scale Context Aggregation Net Trained on CamVid Data"],
  "MXNet"]

Out[17]=

Export also creates a net.params file containing parameters:

In[18]:=

Out[18]=

Get the size of the parameter file:

In[19]:=

Out[19]=
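
A minimal sketch of the two steps above, assuming `jsonPath` holds the path returned by the `Export` call and that the parameter file sits next to it under the name net.params. The variable name `paramPath` is illustrative.

```wolfram
(* Path of the parameter file written alongside net.json *)
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]

(* Size of the parameter file in bytes *)
FileByteCount[paramPath]
```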

The size is similar to the byte count of the resource object:

In[20]:=

ResourceObject[
  "Multi-scale Context Aggregation Net Trained on CamVid Data"]["ByteCount"]

Out[20]=

Represent the MXNet net as a graph:

In[21]:=

Out[21]=
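
One way to produce such a graph, sketched under the assumption that `jsonPath` points to the exported net.json and that the "MXNet" import format provides a "NodeGraphPlot" element in the Wolfram Language version in use:

```wolfram
(* Import the exported MXNet symbol file as a plot of its node graph *)
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
```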

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Date Created: 15 May 2018
Latest Update: 21 June 2018

Reference

F. Yu, V. Koltun, "Multi-scale Context Aggregation by Dilated Convolutions," arXiv:1511.07122 (2016)
Available from: https://github.com/fyu/dilation
Rights: MIT License