# Wolfram Neural Net Repository

Immediate Computable Access to Neural Net Models

Find generic keypoints and their feature vectors in an image

Released in 2019 by Mihai Dusmanu et al., this VGG-like model finds generic keypoints in an image and describes each keypoint with a feature vector. Such feature vectors can be used to find correspondences between different images of the same scene, tracking the movement of keypoints from one image to the other. The model performs local feature extraction using a describe-and-detect methodology, jointly optimizing the detection and description objectives during training. The joint objective is to minimize the distance between corresponding keypoints in feature space while maximizing the distance to other, confounding points in either image. This objective is similar to the triplet margin ranking loss, with an additional detection term.

Number of layers: 22 | Parameter count: 7,635,264 | Trained size: 31 MB

- MegaDepth, consisting of 196 different locations reconstructed from COLMAP SfM/MVS with about 130,000 images. Of these, around 100,000 images are used for Euclidean depth data, and the remaining 30,000 are used to derive ordinal depth data.

This model achieves 74.2% accuracy for correctly localized queries within a distance threshold of one meter on the InLoc dataset.

Get the pre-trained net:

In[1]:= |

Out[1]= |
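A minimal sketch of the evaluation, assuming the resource name below (taken from this page's description) is the one registered in the repository:

```wl
(* resource name is an assumption based on this page's description *)
net = NetModel["D2-Net Trained on MegaDepth Data"]
```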

Write an evaluation function to post-process the net output in order to obtain keypoint position, strength and features:

In[2]:= |
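A simplified sketch of such a post-processing function, assuming the raw net output is a 512⨯55⨯55 feature array as described below; the name `netevaluate`, the argument order and the returned keys are illustrative choices, and the real repository function also refines keypoint positions:

```wl
netevaluate[net_, img_, maxFeatures_ : 500] :=
 Module[{raw, feats, strength, top, scale},
  raw = net[img];                                        (* assumed 512 x 55 x 55 *)
  feats = Map[Normalize, Transpose[raw, {3, 1, 2}], {2}]; (* 55 x 55 x 512, L2-normalized *)
  strength = Map[Max, feats, {2}];                        (* per-patch strength *)
  top = TakeLargest[
    Association@Flatten[MapIndexed[#2 -> #1 &, strength, {2}]], maxFeatures];
  scale = ImageDimensions[img]/55.;                       (* grid -> image coordinates *)
  <|"Position" -> (Reverse[#] scale & /@ Keys[top]),
    "Strength" -> Values[top],
    "Features" -> Extract[feats, Keys[top]]|>]
```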

Obtain the keypoints of a given image:

In[3]:= |

In[4]:= |
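For example, assuming the evaluation function above is named `netevaluate` and returns an association with "Position", "Strength" and "Features" keys:

```wl
net = NetModel["D2-Net Trained on MegaDepth Data"]; (* assumed resource name *)
img = ExampleData[{"TestImage", "House"}];
res = netevaluate[net, img];
keypoints = res["Position"]
```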

Visualize the keypoints:

In[5]:= |

Out[5]= |
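Assuming `keypoints` is the list of positions obtained in the previous step, a sketch of the visualization:

```wl
HighlightImage[img, keypoints]
```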

Specify a maximum of 15 keypoints and visualize the new detection:

In[6]:= |

In[7]:= |

Out[7]= |
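A sketch, assuming the evaluation function accepts a maximum keypoint count as its third argument (as in the `netevaluate` sketch above):

```wl
HighlightImage[img, netevaluate[net, img, 15]["Position"]]
```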

For the default input size of 224⨯224, the net divides the input image in 55⨯55 patches and computes a feature vector of size 512 for each patch:

In[8]:= |

In[9]:= |

In[10]:= |

Out[10]= |
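This can presumably be checked directly on the raw net output:

```wl
net = NetModel["D2-Net Trained on MegaDepth Data"]; (* assumed resource name *)
img = ExampleData[{"TestImage", "House"}];
Dimensions[net[img]] (* per the text, {512, 55, 55}: one 512-vector per patch *)
```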

Every patch is associated to a scalar strength value indicating the likelihood that the patch contains a keypoint. The strength of each patch is the maximal element of its feature vector after an L2 normalization. Obtain the strength of each patch:

In[11]:= |
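A sketch of this computation on the raw output, transposing 512⨯55⨯55 to 55⨯55⨯512 so each patch's vector can be normalized and reduced:

```wl
(* L2-normalize each patch's 512-vector, then take its maximal element *)
feats = Map[Normalize, Transpose[net[img], {3, 1, 2}], {2}];
strength = Map[Max, feats, {2}];
Dimensions[strength] (* a 55 x 55 grid of scalar strengths *)
```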

Visualize the strength of each patch as a heat map:

In[12]:= |

Out[12]= |

Overlay the heat map on the image:

In[13]:= |

Out[13]= |
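A sketch of both visualizations, assuming the 55⨯55 `strength` matrix computed as above; the color scheme is an arbitrary choice:

```wl
net = NetModel["D2-Net Trained on MegaDepth Data"]; (* assumed resource name *)
img = ExampleData[{"TestImage", "House"}];
strength = Map[Max, Map[Normalize, Transpose[net[img], {3, 1, 2}], {2}], {2}];
(* heat map of the 55 x 55 strength grid *)
ArrayPlot[strength, ColorFunction -> "TemperatureMap"]
(* overlay: colorize the grid, upsample to the image size and blend *)
heat = ImageResize[Colorize[Image[strength], ColorFunction -> "TemperatureMap"],
   ImageDimensions[img]];
ImageCompose[img, SetAlphaChannel[heat, 0.5]]
```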

Keypoints are selected in order of decreasing patch strength, starting from the patch with the highest strength, up to the specified maximum number of keypoints. Highlight the top 10 keypoints:

In[14]:= |

In[15]:= |

Out[15]= |
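A sketch of this selection, assuming the `strength` matrix from the previous steps; grid indices are mapped to image coordinates by a simple rescaling:

```wl
(* pick the 10 strongest patches and map their grid positions to image coordinates *)
top = Keys@TakeLargest[
    Association@Flatten[MapIndexed[#2 -> #1 &, strength, {2}]], 10];
scale = ImageDimensions[img]/55.;
HighlightImage[img, Reverse[#] scale & /@ top]
```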

The main application of computing feature vectors for image keypoints is finding correspondences between different images of the same scene. Get 200 keypoint features from two images:

In[16]:= |

In[17]:= |
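A sketch using the `netevaluate` function assumed earlier; the second image here is just a rotated stand-in for a second view of the scene:

```wl
net = NetModel["D2-Net Trained on MegaDepth Data"]; (* assumed resource name *)
img1 = ExampleData[{"TestImage", "House"}];
img2 = ImageRotate[img1, 3 Degree]; (* stand-in for a second view *)
res1 = netevaluate[net, img1, 200];
res2 = netevaluate[net, img2, 200];
```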

Define a function to find the *n* nearest pairs of keypoints (in feature space) and use it to find the five nearest pairs:

In[18]:= |

In[19]:= |

Out[19]= |
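A brute-force sketch of such a function, assuming the "Features" lists returned by the evaluation function sketched earlier; it ranks all index pairs by Euclidean distance in feature space:

```wl
(* the n pairs {i, j} of keypoints closest to each other in feature space *)
nearestPairs[f1_, f2_, n_] := TakeSmallestBy[
   Tuples[{Range[Length[f1]], Range[Length[f2]]}],
   EuclideanDistance[f1[[First[#]]], f2[[Last[#]]]] &, n];
pairs = nearestPairs[res1["Features"], res2["Features"], 5]
```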

Get the keypoint positions associated with each pair and visualize them on the respective images:

In[20]:= |

In[21]:= |

Out[21]= |
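A sketch, assuming `pairs`, `res1` and `res2` from the previous steps:

```wl
p1 = res1["Position"][[pairs[[All, 1]]]];
p2 = res2["Position"][[pairs[[All, 2]]]];
GraphicsRow[{HighlightImage[img1, p1], HighlightImage[img2, p2]}]
```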

Inspect the number of parameters of all arrays in the net:

In[22]:= |

Out[22]= |

Obtain the total number of parameters:

In[23]:= |

Out[23]= |

Obtain the layer type counts:

In[24]:= |

Out[24]= |

Display the summary graphic:

In[25]:= |

Out[25]= |
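The inspections above presumably use `NetInformation` properties along these lines (property names as documented for `NetInformation`):

```wl
NetInformation[net, "ArraysElementCounts"]     (* parameter count per array *)
NetInformation[net, "ArraysTotalElementCount"] (* should match 7,635,264 *)
NetInformation[net, "LayerTypeCounts"]
NetInformation[net, "SummaryGraphic"]
```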

Export the net to the ONNX format:

In[26]:= |

Out[26]= |
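A sketch of the export; the file path is an arbitrary choice:

```wl
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], net, "ONNX"]
```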

Get the size of the ONNX file:

In[27]:= |

Out[27]= |

The byte count of the resource object is comparable to the size of the ONNX file:

In[28]:= |

Out[28]= |
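A sketch of the comparison, assuming the export path above; the "ByteCount" property name on the resource object is an assumption:

```wl
onnxFile = FileNameJoin[{$TemporaryDirectory, "net.onnx"}]; (* as exported above *)
FileByteCount[onnxFile]
ResourceObject["D2-Net Trained on MegaDepth Data"]["ByteCount"] (* assumed property *)
```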

Check some metadata of the ONNX model:

In[29]:= |

Out[29]= |
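A sketch; the specific metadata property names supported by the ONNX importer are assumptions, so listing "Elements" first shows what is actually available:

```wl
onnxFile = FileNameJoin[{$TemporaryDirectory, "net.onnx"}]; (* as exported above *)
Import[onnxFile, "Elements"]
Import[onnxFile, "OpsetVersion"] (* property name assumed; check "Elements" first *)
```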

Import the model back into the Wolfram Language; the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[30]:= |

Out[30]= |
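A sketch of the round trip, assuming the export path above:

```wl
onnxFile = FileNameJoin[{$TemporaryDirectory, "net.onnx"}]; (* as exported above *)
netImported = Import[onnxFile] (* a net without the original NetEncoder/NetDecoder *)
```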

- M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, T. Sattler, "D2-Net: A Trainable CNN for Joint Detection and Description of Local Features," arXiv:1905.03561 (2019)
- (available from https://github.com/mihaidusmanu/d2-net)
- Rights: D2-net BSD license