Wolfram Research

D2-Net Trained on MegaDepth Data

Find generic keypoints and their feature vectors in an image

Released in 2019 by Mihai Dusmanu et al., this VGG-like model finds generic keypoints in an image and describes each keypoint with a feature vector. Such feature vectors can be used to find correspondences between different images of the same scene, tracking how keypoints move from one image to the other. The model performs local feature extraction with a describe-and-detect approach, jointly optimizing the detection and description objectives during training. The joint objective minimizes the feature-space distance between corresponding keypoints while maximizing the distance to other, confounding points in either image. This objective is similar to the triplet margin ranking loss with an additional detection term.
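The following is a minimal sketch of that kind of objective, written under simplifying assumptions: descriptors are plain vectors, the hardest negative is the closest non-corresponding descriptor in either image, and the margin defaults to 1. The detection term is modeled as a weighting of each correspondence by the product of its detection scores in the two images, in the spirit of the D2-Net paper rather than as an exact reproduction of it. The names tripletMargin and detectionWeightedLoss are hypothetical and are not part of the model or of the Wolfram Language:

```wolfram
(* Hypothetical helper: triplet margin term for one correspondence (dA, dB).
   negsA: non-corresponding descriptors from image A (compared with dB)
   negsB: non-corresponding descriptors from image B (compared with dA) *)
tripletMargin[dA_, dB_, negsA_, negsB_, margin_ : 1] := Module[{p, n},
  (* distance between the two descriptors of the correspondence *)
  p = Norm[dA - dB];
  (* hardest negative: the closest confounding descriptor in either image *)
  n = Min[Norm[dA - #] & /@ negsB, Norm[dB - #] & /@ negsA];
  Max[0, margin + p^2 - n^2]
  ]

(* Hypothetical helper: average of the margin terms, weighted by the product
   of the detection scores sA, sB of each correspondence in the two images *)
detectionWeightedLoss[margins_List, sA_List, sB_List] :=
  Total[sA*sB*margins]/Total[sA*sB]

(* A well-separated correspondence contributes nothing; a confused one is penalized *)
m1 = tripletMargin[{1, 0}, {1, 0}, {{0, 1}}, {{0, 1}}]  (* 0 *)
m2 = tripletMargin[{1, 0}, {0, 1}, {{0, 1}}, {{1, 0}}]  (* 3 *)
detectionWeightedLoss[{m1, m2}, {1, 0.5}, {1, 1}]        (* 1. *)
```

In this sketch, lowering the detection score of the poorly separated correspondence reduces its share of the loss, so minimizing the weighted loss favors high detection scores on distinctive keypoints.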

Number of layers: 22 | Parameter count: 7,635,264 | Trained size: 31 MB
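Assuming the backbone consists of the ten 3⨯3 convolutions of VGG16 up to conv4_3 (the truncation described in the D2-Net paper), the parameter count above can be reproduced by summing the weights and biases of each layer. The variable names are illustrative, and the 4-bytes-per-parameter figure assumes 32-bit floating-point weights:

```wolfram
(* {input channels, output channels} of VGG16 conv1_1 ... conv4_3 (assumed backbone) *)
vggConvChannels = {{3, 64}, {64, 64}, {64, 128}, {128, 128},
   {128, 256}, {256, 256}, {256, 256},
   {256, 512}, {512, 512}, {512, 512}};

(* 3x3 kernel weights plus one bias per output channel *)
convParameters[{in_, out_}] := in*out*3*3 + out

paramCount = Total[convParameters /@ vggConvChannels]  (* 7635264 *)

(* Size in MB if every parameter is stored as a 32-bit float (assumption) *)
N[4*paramCount/10^6]  (* about 30.5 *)
```

The total matches the stated parameter count, and at 4 bytes per parameter it is consistent with the trained size of about 31 MB.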

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["D2-Net Trained on MegaDepth Data"]
Out[1]=

Evaluation function

Write an evaluation function that post-processes the net output to obtain the keypoint positions, strengths and feature vectors:

In[2]:=
Options[netevaluate] = {MaxFeatures -> 50};
netevaluate[img_Image, opts : OptionsPattern[]] := Module[
  {dims, featureMap, c, h, w, transposed, normalized, strengthArray, pos, scalex, scaley, keypointStr, keypointPos, keypointFeats},
  dims = ImageDimensions[img];
  featureMap = NetModel["D2-Net Trained on MegaDepth Data"][img];
  {c, h, w} = Dimensions[featureMap];
  (* Reorder from {c, h, w} to {h, w, c}: one c-dimensional feature vector per patch *)
  transposed = Transpose[featureMap, {3, 1, 2}];
  (* L2-normalize each feature vector *)
  normalized = transposed/Map[Norm, transposed, {2}];
  (* Matrix containing the strength of each patch *)
  strengthArray = Map[Max, normalized, {2}];
  (* Find the 0-based flat positions of the (up to) MaxFeatures strongest patches *)
  pos = Ordering[Flatten@strengthArray, -Min[OptionValue[MaxFeatures], w*h]] - 1;
  pos = QuotientRemainder[#, w] + {1, 1} & /@ pos; (* {row, column} matrix position *)
  (* From array positions to image keypoint positions (patch centers, origin at bottom left) *)
  {scalex, scaley} = N[dims/{w, h}];
  keypointPos = {scalex*(#[[1]] - 0.5), scaley*(h - #[[2]] + 0.5)} & /@ Reverse /@ pos;
  (* Extract the features and strengths *)
  keypointFeats = Extract[normalized, pos];
  keypointStr = Extract[strengthArray, pos];
  {keypointPos, keypointStr, keypointFeats}
  ]
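The index bookkeeping in netevaluate can be checked in isolation. The sketch below, with hypothetical helper names, repeats its two conversions: a 1-based index into the flattened h⨯w strength matrix becomes a {row, column} patch position, and a patch position becomes the image coordinates of the patch center, with the origin at the bottom-left corner as used by HighlightImage:

```wolfram
(* Hypothetical helper: 1-based index into Flatten[strengthArray] -> {row, column} *)
flatToPatch[k_Integer, w_Integer] := QuotientRemainder[k - 1, w] + {1, 1}

(* Hypothetical helper: {row, column} of a w x h patch grid -> center of the patch
   in image coordinates {x, y} for an image of size {width, height};
   rows are counted from the top, y from the bottom *)
patchToPixel[{row_, col_}, {w_, h_}, {width_, height_}] :=
  {width/w*(col - 0.5), height/h*(h - row + 0.5)}

flatToPatch[56, 55]                           (* {2, 1}: first patch of the second row *)
patchToPixel[{1, 1}, {55, 55}, {224, 224}]    (* top-left patch, near {2.04, 221.96} *)
patchToPixel[{55, 55}, {55, 55}, {224, 224}]  (* bottom-right patch, near {221.96, 2.04} *)
```

Because rows are counted from the top of the strength matrix while image y coordinates grow upward, the row index is flipped with h - row.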

Basic usage

Obtain the keypoints of a given image:

In[3]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/ae785a2c-eb63-4af1-ad3c-70c6946b852b"]
In[4]:=
{keypointPos, keypointStr, keypointFeats} = netevaluate[testImage];

Visualize the keypoints:

In[5]:=
HighlightImage[testImage, keypointPos]
Out[5]=

Specify a maximum of 15 keypoints and visualize the new detection:

In[6]:=
{keypointPos, keypointStr, keypointFeats} = netevaluate[testImage, MaxFeatures -> 15];
In[7]:=
HighlightImage[testImage, keypointPos]
Out[7]=

Network result

For the default input size of 224⨯224, the net divides the input image into 55⨯55 patches and computes a 512-dimensional feature vector for each patch:

In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/085bf1db-e081-4a91-b16e-dc13c2f24cb7"]
In[9]:=
netResult = NetModel["D2-Net Trained on MegaDepth Data"][testImage];
In[10]:=
Dimensions[netResult]
Out[10]=

Every patch is associated with a scalar strength value indicating how likely the patch is to contain a keypoint. The strength of a patch is the maximum element of its feature vector after L2 normalization. Obtain the strength of each patch:
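To see this computation on numbers small enough to check by hand, here is a minimal sketch on a made-up feature map with c = 2 channels and a 1⨯2 grid of patches; patchStrengths is a hypothetical name that wraps the same steps used in netevaluate:

```wolfram
(* Hypothetical helper: strength of every patch of a {c, h, w} feature map.
   Same steps as in netevaluate: reorder to {h, w, c}, L2-normalize each
   c-dimensional vector, then keep its largest component *)
patchStrengths[featureMap_] := With[{t = Transpose[featureMap, {3, 1, 2}]},
  Map[Max, t/Map[Norm, t, {2}], {2}]
  ]

(* Made-up map with c = 2, h = 1, w = 2:
   patch (1,1) has features {3, 4}, patch (1,2) has features {0, 1} *)
toyMap = {{{3, 0}}, {{4, 1}}};
patchStrengths[toyMap]  (* {{4/5, 1}} *)
```

The normalized vectors are {3/5, 4/5} and {0, 1}, so the second patch, whose response is concentrated in a single channel, gets the higher strength even though its raw feature vector has the smaller norm.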

In[11]:=
strengthArray = With[{transposed = Transpose[netResult, {3, 1, 2}]},
   Map[Max, transposed/Map[Norm, transposed, {2}], {2}]
   ];

Visualize the strength of each patch as a heat map:

In[12]:=
heatmap = ImageApply[{#, 1 - #, 1 - #} &, ImageAdjust@Image[strengthArray]]
Out[12]=

Overlay the heat map on the image:

In[13]:=
overlayed = ImageCompose[
  testImage, {ImageResize[heatmap, ImageDimensions@testImage], 0.5}]
Out[13]=

Keypoints are selected in order of decreasing strength, starting from the patch with the highest strength, up to MaxFeatures keypoints. Highlight the top 10 keypoints:
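The selection step can be sketched on a made-up 2⨯3 strength matrix. topPatches is a hypothetical helper that follows the same logic as netevaluate: Ordering with a negative count returns the positions of the n largest values, listed from weakest to strongest, and these flat positions are then converted to {row, column} positions:

```wolfram
(* Hypothetical helper: {row, column} positions of the (up to) n strongest patches *)
topPatches[strength_?MatrixQ, n_Integer] := Module[{h, w, flatPos},
  {h, w} = Dimensions[strength];
  (* positions of the n largest values in the flattened matrix *)
  flatPos = Ordering[Flatten[strength], -Min[n, h*w]];
  (* flattened 1-based index -> {row, column} *)
  QuotientRemainder[# - 1, w] + {1, 1} & /@ flatPos
  ]

strength = {{0.1, 0.9, 0.3}, {0.7, 0.2, 0.8}};
topPatches[strength, 2]  (* {{2, 3}, {1, 2}}: values 0.8 and 0.9 *)
```

Capping the count with Min keeps Ordering from failing when more keypoints are requested than there are patches.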

In[14]:=
{keypointPos, keypointStr, keypointFeats} = netevaluate[testImage, MaxFeatures -> 10];
In[15]:=
HighlightImage[overlayed, keypointPos]
Out[15]=

Find correspondences between images

The main application of keypoint feature vectors is finding correspondences between different images of the same scene. Compute 200 keypoints and their features for each of two images:

In[16]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/e0808678-8175-4e4d-89c9-6e3205aabbd9"]
In[17]:=
{keypointPos2, keypointStr2, keypointFeats2} = netevaluate[img2, MaxFeatures -> 200];

Define a function to find the n nearest pairs of keypoints (in feature space) and use it to find the five nearest pairs:

In[18]:=
findKeypointPairs[feats1_, feats2_, n_] := Module[
   {distances, nearestPairs, nearestDistances},
   distances = DistanceMatrix[feats1, feats2];
   nearestPairs = MapIndexed[Flatten@{#2, Ordering[#1, 1]} &, distances];
   nearestDistances = Extract[distances, nearestPairs];
   nearestPairs[[Ordering[nearestDistances, n]]]
   ];
In[19]:=
pairs = findKeypointPairs[keypointFeats1, keypointFeats2, 5]
Out[19]=
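findKeypointPairs keeps, for each keypoint of the first image, only its nearest neighbor in the second image, so several keypoints can share the same partner. A common refinement, shown here as an alternative rather than as part of this model, is mutual nearest-neighbor matching, which keeps a pair only if each keypoint is the other's nearest neighbor. A minimal sketch with a hypothetical function name and made-up 2D features:

```wolfram
(* Hypothetical helper: pairs {i, j} such that feats2[[j]] is the nearest neighbor
   of feats1[[i]] and feats1[[i]] is the nearest neighbor of feats2[[j]] *)
mutualNearestPairs[feats1_, feats2_] := Module[{d, nn12, nn21},
  d = DistanceMatrix[feats1, feats2];
  nn12 = First@Ordering[#, 1] & /@ d;             (* image 1 -> image 2 *)
  nn21 = First@Ordering[#, 1] & /@ Transpose[d];  (* image 2 -> image 1 *)
  Select[Transpose[{Range@Length@feats1, nn12}], nn21[[#[[2]]]] == #[[1]] &]
  ]

f1 = {{0., 0.}, {1., 0.}, {5., 5.}};
f2 = {{0.1, 0.}, {1.1, 0.}, {1.2, 0.}};
mutualNearestPairs[f1, f2]  (* {{1, 1}, {2, 2}}; {3, 3} is rejected *)
```

The result uses the same {i, j} format as findKeypointPairs, so mutualNearestPairs[keypointFeats1, keypointFeats2] could be used in its place; note that its pairs are ordered by keypoint index, not by distance.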

Get the keypoint positions associated with each pair and visualize them on the respective images:

In[20]:=
{pos1, pos2} = Transpose@
   Map[{keypointPos1[[First@#]], keypointPos2[[Last@#]]} &, pairs];
In[21]:=
GraphicsRow@MapThread[
  Function[{img, keypoints},
   Show[img, Graphics@
     MapIndexed[Inset[Style[First@#2, 12, Yellow, Bold], #1] &, keypoints]]
   ],
  {{img1, img2}, {pos1, pos2}}
  ]
Out[21]=

Net information

Inspect the number of parameters of all arrays in the net:

In[22]:=
Information[
 NetModel["D2-Net Trained on MegaDepth Data"], "ArraysElementCounts"]
Out[22]=

Obtain the total number of parameters:

In[23]:=
Information[
 NetModel["D2-Net Trained on MegaDepth Data"], "ArraysTotalElementCount"]
Out[23]=

Obtain the layer type counts:

In[24]:=
Information[
 NetModel["D2-Net Trained on MegaDepth Data"], "LayerTypeCounts"]
Out[24]=

Display the summary graphic:

In[25]:=
Information[
 NetModel["D2-Net Trained on MegaDepth Data"], "SummaryGraphic"]
Out[25]=

Export to ONNX

Export the net to the ONNX format:

In[26]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel["D2-Net Trained on MegaDepth Data"]]
Out[26]=

Get the size of the ONNX file:

In[27]:=
FileByteCount[onnxFile]
Out[27]=

The byte count of the resource object is similar to the size of the ONNX file:

In[28]:=
NetModel["D2-Net Trained on MegaDepth Data", "ByteCount"]
Out[28]=

Check some metadata of the ONNX model:

In[29]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[29]=

Import the model back into the Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[30]:=
Import[onnxFile]
Out[30]=
