
Multi-scale Context Aggregation Net Trained on Cityscapes Data

Segment an image of a driving scenario into semantic component classes

Released in 2016, this model was the first to make systematic use of dilated convolutions for pixel-wise classification. A context aggregation module, consisting of convolutions with exponentially increasing dilation factors, is appended to a VGG-style front end.
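As a minimal illustration (not part of this model's code), a dilated convolution can be constructed in the Wolfram Language with the "Dilation" option of ConvolutionLayer; a 3×3 kernel with dilation factor 2 covers a 5×5 receptive field while keeping only nine weights per filter:

```wolfram
(* A 3x3 convolution with dilation factor 2; its receptive field is 5x5.
   The input shape {3, 64, 64} is chosen arbitrarily for illustration. *)
ConvolutionLayer[16, {3, 3}, "Dilation" -> {2, 2}, "Input" -> {3, 64, 64}]
```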

Number of layers: 58 | Parameter count: 134,460,595 | Trained size: 538 MB
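As a quick consistency check, the trained size follows from the parameter count stored at single precision (4 bytes per parameter):

```wolfram
(* 134,460,595 parameters at 4 bytes each, approximately 538 MB *)
UnitConvert[Quantity[134460595*4, "Bytes"], "Megabytes"]
```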

Training Set Information

Performance

Examples

Resource retrieval

Retrieve the resource object:

In[1]:=
ResourceObject["Multi-scale Context Aggregation Net Trained on \
Cityscapes Data"]
Out[1]=

Get the pre-trained net:

In[2]:=
NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"]
Out[2]=

Evaluation function

Write an evaluation function to handle padding and tiling of the input image:

In[3]:=
netevaluate[img_, device_: "CPU"] := Block[
  {net, marginImg, inputSize, windowSize, imgPad, imgSize, takeSpecs, 
   tiles, marginTile, prob},
  (* Parameters *)
  
  net = NetModel[
    "Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"];
  marginImg = 186;
  inputSize = 1396;
  windowSize = inputSize - 2*marginImg;
  (* Pad and tile input *)
  
  imgPad = ImagePad[img, marginImg, "Reflected"];
  imgSize = ImageDimensions[imgPad];
  takeSpecs = Table[
    {{i, i + inputSize - 1}, {j, j + inputSize - 1}},
    {i, 1, imgSize[[2]] - 2*marginImg, windowSize},
    {j, 1, imgSize[[1]] - 2*marginImg, windowSize}
    ];
  tiles = Map[ImageTake[imgPad, Sequence @@ #] &, takeSpecs, {2}];
  (* Make all tiles 1396x1396 *)
  
  marginTile = windowSize - Mod[imgSize - 2*marginImg, windowSize];
  tiles = 
   MapAt[ImagePad[#, {{0, marginTile[[1]]}, {0, 0}}, "Reflected"] &, 
    tiles, {All, -1}];
  tiles = 
   MapAt[ImagePad[#, {{0, 0}, {marginTile[[2]], 0}}, "Reflected"] &, 
    tiles, {-1, All}];
  (* Run net on tiles *)
  
  prob = net[Flatten@tiles, None, TargetDevice -> device];
  prob = ArrayFlatten@
    ArrayReshape[prob, Join[Dimensions@tiles, {1024, 1024, 19}]];
  (* Trim additional tile margin *)
  
  prob = Take[prob, Sequence @@ Reverse[ImageDimensions@img], All];
  (* Predict classes *)
  NetExtract[net, "Output"]@prob
  ]

Label list

Define the label list for this model. Integers in the model’s output correspond to elements in the label list:

In[4]:=
labels = {"road", "sidewalk", "building", "wall", "fence", "pole", 
   "traffic light", "traffic sign", "vegetation", "terrain", "sky", 
   "person", "rider", "car", "truck", "bus", "train", "motorcycle", 
   "bicycle"};

Basic usage

Obtain a segmentation mask for a given image:

In[5]:=
CloudGet["https://www.wolframcloud.com/objects/f691e775-d97f-4c4a-a32d-ab43f0114ed7"] (* Evaluate this cell to copy the example input from a cloud object *)
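If the cloud object is not accessible, a mask can equivalently be obtained by applying the evaluation function defined above to any image of a driving scene (here img stands for such an image, which is an assumption of this sketch):

```wolfram
(* img is assumed to be an image of a driving scenario *)
mask = netevaluate[img];
```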

Inspect which classes are detected:

In[6]:=
detected = DeleteDuplicates@Flatten@mask
Out[6]=
In[7]:=
labels[[detected]]
Out[7]=

Visualize the mask:

In[8]:=
Colorize[mask]
Out[8]=

Advanced visualization

Associate classes with colors using the standard Cityscapes palette:

In[9]:=
colors = Apply[
  RGBColor, {{128, 64, 128}, {244, 35, 232}, {70, 70, 70}, {102, 102, 
     156}, {190, 153, 153}, {153, 153, 153}, {250, 170, 30}, {220, 
     220, 0}, {107, 142, 35}, {152, 251, 152}, {70, 130, 180}, {220, 
     20, 60}, {255, 0, 0}, {0, 0, 142}, {0, 0, 70}, {0, 60, 100}, {0, 
     80, 100}, {0, 0, 230}, {119, 11, 32}}/255., {1}]
Out[9]=
In[10]:=
indexToColor = Thread[Range[19] -> colors];
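As a quick sanity check, the full palette can be previewed alongside the class names:

```wolfram
(* Preview the Cityscapes palette next to the label list *)
SwatchLegend[colors, labels]
```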

Write a function to overlay the mask on the image, with a legend:

In[11]:=
result[img_, device_: "CPU"] := Block[
  {mask, classes, maskPlot, composition},
  mask = netevaluate[img, device];
  classes = DeleteDuplicates[Flatten@mask];
  maskPlot = Colorize[mask, ColorRules -> indexToColor];
  composition = ImageCompose[img, {maskPlot, 0.5}];
  Legended[
   Row[Image[#, ImageSize -> Large] & /@ {maskPlot, composition}], 
   SwatchLegend[indexToColor[[classes, 2]], labels[[classes]]]]
  ]

Inspect the results:

In[12]:=
CloudGet["https://www.wolframcloud.com/objects/166663cf-7781-4296-9609-4df11687d6f7"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[12]=
In[13]:=
CloudGet["https://www.wolframcloud.com/objects/5bca17c4-8068-4c0b-bccc-10ce051051a1"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[13]=
In[14]:=
CloudGet["https://www.wolframcloud.com/objects/611cdb1d-0b63-440c-8155-51478912c39a"] (* Evaluate this cell to copy the example input from a cloud object *)
Out[14]=

Net information

Inspect the number of parameters of all arrays in the net:

In[15]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "ArraysElementCounts"]
Out[15]=

Obtain the total number of parameters:

In[16]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "ArraysTotalElementCount"]
Out[16]=

Obtain the layer type counts:

In[17]:=
NetInformation[
 NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "LayerTypeCounts"]
Out[17]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[18]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"], "MXNet"]
Out[18]=

Export also creates a net.params file containing parameters:

In[19]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[19]=

Get the size of the parameter file:

In[20]:=
FileByteCount[paramPath]
Out[20]=

The size is similar to the byte count of the resource object:

In[21]:=
ResourceObject[
  "Multi-scale Context Aggregation Net Trained on Cityscapes \
Data"]["ByteCount"]
Out[21]=

Represent the MXNet net as a graph:

In[22]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[22]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Reference

F. Yu, V. Koltun, "Multi-Scale Context Aggregation by Dilated Convolutions," ICLR 2016