U2-Net Trained on DUTS-TR Data

Segment objects in an image

The architecture of this model features a two-level nesting of U structures, where each node of the top-level U-Net is itself a U-Net. This design captures more contextual information from different scales thanks to the mixture of receptive fields of different sizes in the proposed ReSidual U-blocks (RSU). It also increases the depth of the whole architecture without significantly increasing the computational cost, thanks to the pooling operations used in the RSU blocks. Such an architecture enables the training of a deep network from scratch, without using backbones pre-trained on image classification tasks.
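The depth resulting from this nesting can be checked directly on the net object (a quick inspection sketch; the exact count depends on the chosen "Size" parameter):

```wolfram
(* fetch the default net and count its layers; the two-level
   nesting of U structures makes this much deeper than a plain U-Net *)
net = NetModel["U2-Net Trained on DUTS-TR Data"];
Information[net, "LayersCount"]
```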

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["U2-Net Trained on DUTS-TR Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[3]:=
NetModel["U2-Net Trained on DUTS-TR Data", "ParametersInformation"]
Out[3]=

Pick a non-default net by specifying the parameters:

In[5]:=
NetModel[{"U2-Net Trained on DUTS-TR Data", "Size" -> "Small"}]
Out[5]=

Pick a non-default uninitialized net:

In[7]:=
NetModel[{"U2-Net Trained on DUTS-TR Data", "Size" -> "Small"}, "UninitializedEvaluationNet"]
Out[7]=

Evaluation function

Define an evaluation function to resize the net output to the input image dimensions and round it to obtain the segmentation mask:

In[9]:=
netevaluate[net_, img_, device_ : "CPU"] := Round@ArrayResample[net[img, TargetDevice -> device], Reverse@ImageDimensions[img], Resampling -> "Bilinear"];

Basic usage

Obtain the segmentation mask for the most salient object in the image:

In[10]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/b7f483cb-28ed-4d6c-b946-620347985106"]
In[11]:=
mask = netevaluate[NetModel["U2-Net Trained on DUTS-TR Data"], img];

Visualize the mask:

In[12]:=
ArrayPlot[mask]
Out[12]=

The mask is a matrix of 0s and 1s whose dimensions match those of the input image:

In[13]:=
DeleteDuplicates@Flatten[mask]
Out[13]=
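As an additional check (not part of the original example set), the mean of the mask gives the fraction of the image covered by the salient object:

```wolfram
(* fraction of pixels labeled as salient: the mean of a 0/1 matrix *)
N[Mean[Flatten[mask]]]
```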

Overlay the mask on the input image:

In[14]:=
HighlightImage[img, {Opacity[0.75], mask}]
Out[14]=

Convert the mask to an image:

In[15]:=
maskImg = Image[mask]
Out[15]=

Crop the object from the image:

In[16]:=
ImageAdd[img, ColorNegate@maskImg]
Out[16]=

Delete the object from the image:

In[17]:=
ImageAdd[img, maskImg]
Out[17]=
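Beyond deleting the object, the emptied region can also be filled with synthesized content. A possible follow-up (not part of the original example set) using the built-in Inpaint function, where the nonzero pixels of the mask image mark the region to retouch:

```wolfram
(* fill the deleted object's area with content synthesized
   from the surrounding pixels *)
Inpaint[img, maskImg]
```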

Results showcase

Take a list of images and obtain their segmentation masks:

In[18]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/23e7cc8d-9a5e-477b-b26b-6e8870bab8a7"]
In[19]:=
results = Transpose@{imgs, Map[ArrayPlot[netevaluate[NetModel["U2-Net Trained on DUTS-TR Data"], #], Frame -> False] &, imgs]};

Inspect the results. Some images are more challenging than others, and salient object identification is an inherently ambiguous task, so the results may not always be as expected:

In[20]:=
GraphicsGrid[ArrayReshape[results, {4, 6}], ImageSize -> Large]
Out[20]=

Net information

Inspect the number of parameters of all arrays in the net:

In[21]:=
Information[
 NetModel["U2-Net Trained on DUTS-TR Data"], "ArraysElementCounts"]
Out[21]=

Obtain the total number of parameters:

In[22]:=
Information[
 NetModel[
  "U2-Net Trained on DUTS-TR Data"], "ArraysTotalElementCount"]
Out[22]=

Obtain the layer type counts:

In[23]:=
Information[
 NetModel["U2-Net Trained on DUTS-TR Data"], "LayerTypeCounts"]
Out[23]=

Display the summary graphic:

In[24]:=
Information[
 NetModel["U2-Net Trained on DUTS-TR Data"], "SummaryGraphic"]
Out[24]=

Resource History

Reference