MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets

Estimate the depth map of an image

Released in 2020, these ResNet-based models are trained to predict the relative depth map from a single image. MiDaS V2.1 was trained on multiple datasets, some of which have incompatible annotations. The authors leveraged the diversity of the training data and developed a robust multi-objective training loss to achieve state-of-the-art results in monocular depth estimation.
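
For intuition, here is a minimal sketch (not the authors' implementation) of the scale- and shift-invariant alignment at the core of the MiDaS loss: the prediction is aligned to each ground-truth annotation by a least-squares scale and shift before the error is measured, which is what allows datasets with incompatible depth annotations to be trained on jointly. The published loss additionally uses trimming and a multi-scale gradient-matching term; ssiLoss is a hypothetical helper name:

(* Sketch of a scale- and shift-invariant error: align the predicted \
disparities to the ground truth with a least-squares scale s and shift t, \
then measure the mean absolute error on the aligned values *)
ssiLoss[pred_List, gt_List] := Module[{s, t},
  {s, t} = LeastSquares[
    Transpose[{pred, ConstantArray[1., Length[pred]]}], gt];
  Mean[Abs[s pred + t - gt]]
  ]

ssiLoss[{1., 2., 3.}, {10., 20., 30.}] (* 0: the prediction matches the \
ground truth up to scale and shift *)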

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:=
NetModel["MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets", "Size" -> "Large"}]
Out[3]=

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets", "Size" -> "Large"}, "UninitializedEvaluationNet"]
Out[4]=

Basic usage

Define a test image:

In[5]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/4bd7b1eb-424d-41d8-b189-dcf0e584f339"]

Obtain the depth map of an image:

In[6]:=
depthMap = NetModel[
    "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"][
   testImage];

Show the depth map:

In[7]:=
ImageAdjust[depthMap]
Out[7]=
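
The raw values are relative inverse depths (disparities), so larger values indicate surfaces closer to the camera; they are not metric distances. As a quick check (not part of the original example), inspect the range of the raw output:

(* The output range is relative, not in physical units *)
MinMax[ImageData[depthMap]]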

Visualize a 3D model

Get an image:

In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/fa4013e8-8644-41d9-bdd8-4f8a21785852"]

Obtain the depth map:

In[9]:=
depthMap = ImageResize[
   NetModel[
     "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"]@
    img,
   ImageDimensions@img
   ];

Visualize a 3D model using the depth map:

In[10]:=
ListPlot3D[ImageData[depthMap], PlotStyle -> Texture[ImageReflect@img], PlotTheme -> {"Minimal", "NoAxes"}, ViewPoint -> {0.002, 0.8, 1.2}]
Out[10]=
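
As a further visualization (a variation not in the original example, reusing img and depthMap from above), the depth map can also be rendered as a colored point cloud; the downsampling size and point size below are arbitrary choices:

(* Downsample so the point cloud stays manageable *)
small = ImageResize[img, 200];
dm = ImageResize[depthMap, ImageDimensions[small]];
(* One point per pixel: {column, -row, depth}, colored by the image pixel *)
pts = Flatten[
   MapIndexed[{#2[[2]], -#2[[1]], #1} &, ImageData[dm], {2}], 1];
cols = RGBColor @@@ Flatten[ImageData[ColorConvert[small, "RGB"]], 1];
Graphics3D[{PointSize[0.004], Point[pts, VertexColors -> cols]},
 BoxRatios -> {1, 1, 0.4}, Boxed -> False]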

Adapt to any size

The net resizes the input image to 256×256 pixels and produces a depth map of the same size:

In[11]:=
NetExtract[
 NetModel[
  "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"], {"Input", "ImageSize"}]
Out[11]=

The recommended way to obtain a depth map with the same dimensions as the input image is to resample the depth map after the net evaluation. Get an image:

In[12]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/19297e82-8fa0-4b9d-b0c8-5163e68eb120"]
In[13]:=
imgDims = ImageDimensions[img]
Out[13]=

Obtain the depth map and resize it to match the original image dimensions:

In[14]:=
{time1, depthMap1} = RepeatedTiming@ImageResize[
    NetModel[
      "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"][img],
    imgDims,
    Resampling -> "Cubic"
    ];
In[15]:=
depthMap1 = Colorize[ImageAdjust@depthMap1, ColorFunction -> "ThermometerColors"]
Out[15]=

Now modify the net, changing the image size in the NetEncoder. The new net natively produces a depth map of the same size as the original image:

In[16]:=
resizedNet = NetReplacePart[
  NetModel[
   "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"], {"Input", "ImageSize"} -> imgDims]
Out[16]=

Obtain the depth map from the new net and visualize it:

In[17]:=
{time2, depthMap2} = RepeatedTiming@resizedNet[img];
In[18]:=
depthMap2 = Colorize[ImageAdjust@depthMap2, ColorFunction -> "ThermometerColors"]
Out[18]=

Compare the results. Notice that the depth map obtained by resizing the net output (top-right corner, depthMap1) more accurately predicts the depth of the roof lamp and the carpet but is less accurate on the background furniture:

In[19]:=
ImageCollage[{
  0.8 -> img,
  0.2 -> depthMap1,
  0.2 -> depthMap2
  }]
Out[19]=

The first pipeline is also faster, since its net evaluation runs at the fixed 256×256 input size while the resized net must process the image at full resolution:

In[20]:=
{time1, time2}
Out[20]=

Net information

Inspect the number of parameters of all arrays in the net:

In[21]:=
Information[
 NetModel[
  "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"], "ArraysElementCounts"]
Out[21]=

Obtain the total number of parameters:

In[22]:=
Information[
 NetModel[
  "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"], "ArraysTotalElementCount"]
Out[22]=

Obtain the layer type counts:

In[23]:=
Information[
 NetModel[
  "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"], "LayerTypeCounts"]
Out[23]=

Display the summary graphic:

In[24]:=
Information[
 NetModel[
  "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"], "SummaryGraphic"]
Out[24]=

Export to ONNX

Export the net to the ONNX format:

In[25]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel[
   "MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"]]
Out[25]=

Get the size of the ONNX file:

In[26]:=
FileByteCount[onnxFile]
Out[26]=

The size is similar to the byte count of the resource object:

In[27]:=
NetModel["MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets", "ByteCount"]
Out[27]=

Check some metadata of the ONNX model:

In[28]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[28]=

Import the model back into the Wolfram Language. Note that the NetEncoder and NetDecoder will be absent, because they are not supported by ONNX:

In[29]:=
Import[onnxFile]
Out[29]=
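
If the coders are needed after importing, they can be reattached manually. The sketch below copies the original image encoder onto the imported net; the input port name "Input" on the imported net is an assumption and should be checked with Information:

imported = Import[onnxFile];
(* Copy the image NetEncoder from the original model... *)
encoder = NetExtract[
   NetModel["MiDaS V2.1 Depth Perception Nets Trained on Multiple-Datasets"],
   "Input"];
(* ...and attach it to the imported net (assumes the port is named "Input") *)
NetReplacePart[imported, "Input" -> encoder]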

Resource History

Reference