U2-Net Trained on DUTS-TR Data

Segment objects in an image

The architecture of this model features a two-level nesting of U structures, where each node of the top-level U-Net is itself a U-Net. This design captures more contextual information from different scales thanks to the mixture of receptive fields of different sizes in the proposed ReSidual U-blocks (RSU). It also increases the depth of the whole architecture without significantly increasing the computational cost, thanks to the pooling operations used in the RSU blocks. Such an architecture enables the training of a deep network from scratch, without using backbones pre-trained on image classification tasks.
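The depth resulting from this nesting can be checked directly on the net object (a quick inspection sketch; the exact count depends on the chosen "Size" parameter):

```wolfram
(* fetch the default net and count its layers; the two-level
   nesting of U structures makes this much deeper than a plain U-Net *)
net = NetModel["U2-Net Trained on DUTS-TR Data"];
Information[net, "LayersCount"]
```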

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["U2-Net Trained on DUTS-TR Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[3]:=
NetModel["U2-Net Trained on DUTS-TR Data", "ParametersInformation"]
Out[3]=

Pick a non-default net by specifying the parameters:

In[5]:=
NetModel[{"U2-Net Trained on DUTS-TR Data", "Size" -> "Small"}]
Out[5]=

Pick a non-default uninitialized net:

In[7]:=
NetModel[{"U2-Net Trained on DUTS-TR Data", "Size" -> "Small"}, "UninitializedEvaluationNet"]
Out[7]=

Evaluation function

Define an evaluation function to resize the net output to the input image dimensions and round it to obtain the segmentation mask:

In[9]:=
netevaluate[net_, img_, device_ : "CPU"] := Round@ArrayResample[net[img, TargetDevice -> device], Reverse@ImageDimensions[img], Resampling -> "Bilinear"];

Basic usage

Obtain the segmentation mask for the most salient object in the image:

In[10]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/b7f483cb-28ed-4d6c-b946-620347985106"]
In[11]:=
mask = netevaluate[NetModel["U2-Net Trained on DUTS-TR Data"], img];

Visualize the mask:

In[12]:=
ArrayPlot[mask]
Out[12]=

The mask is a matrix of 0s and 1s whose dimensions match those of the input image:

In[13]:=
DeleteDuplicates@Flatten[mask]
Out[13]=
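As an additional check (not part of the original example set), the mean of the mask gives the fraction of the image covered by the salient object:

```wolfram
(* fraction of pixels labeled as salient: the mean of a 0/1 matrix *)
N[Mean[Flatten[mask]]]
```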

Overlay the mask on the input image:

In[14]:=
HighlightImage[img, {Opacity[0.75], mask}]
Out[14]=

Convert the mask to an image:

In[15]:=
maskImg = Image[mask]
Out[15]=

Crop the object from the image:

In[16]:=
ImageAdd[img, ColorNegate@maskImg]
Out[16]=

Delete the object from the image:

In[17]:=
ImageAdd[img, maskImg]
Out[17]=
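Beyond deleting the object, the emptied region can also be filled with synthesized content. A possible follow-up (not part of the original example set) using the built-in Inpaint function, where the nonzero pixels of the mask image mark the region to retouch:

```wolfram
(* fill the deleted object's area with content synthesized
   from the surrounding pixels *)
Inpaint[img, maskImg]
```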

Results showcase

Take a list of images and obtain their segmentation masks:

In[18]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/23e7cc8d-9a5e-477b-b26b-6e8870bab8a7"]
In[19]:=
results = Transpose@{imgs, Map[ArrayPlot[netevaluate[NetModel["U2-Net Trained on DUTS-TR Data"], #], Frame -> False] &, imgs]};

Inspect the results. Some images are more challenging than others, and salient object identification is an inherently ambiguous task, so the results may not always be as expected:

In[20]:=
GraphicsGrid[ArrayReshape[results, {4, 6}], ImageSize -> Large]
Out[20]=

Net information

Inspect the number of parameters of all arrays in the net:

In[21]:=
Information[
 NetModel["U2-Net Trained on DUTS-TR Data"], "ArraysElementCounts"]
Out[21]=

Obtain the total number of parameters:

In[22]:=
Information[
 NetModel[
  "U2-Net Trained on DUTS-TR Data"], "ArraysTotalElementCount"]
Out[22]=

Obtain the layer type counts:

In[23]:=
Information[
 NetModel["U2-Net Trained on DUTS-TR Data"], "LayerTypeCounts"]
Out[23]=

Display the summary graphic:

In[24]:=
Information[
 NetModel["U2-Net Trained on DUTS-TR Data"], "SummaryGraphic"]
Out[24]=

Resource History

Reference