EfficientNet-V2 Trained on ImageNet-21K

Identify the main object in an image

Released in 2021, this family of image classification models is trained on the full ImageNet-21K dataset, a superset of the ImageNet-1K dataset containing more than 21 thousand classes of objects. Models pretrained on ImageNet-21K and fine-tuned on ImageNet-1K are also available and achieve high test accuracy on the ImageNet ILSVRC2012 classification benchmark.

Number of models: 8

Training Set Information

The models use two datasets: the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) classification dataset, consisting of 1.2 million training images with one thousand classes of objects, and ImageNet-21K, consisting of 14,197,122 training images with 21,841 classes of objects.

Performance

The fine-tuned models achieve the following accuracies on the original ImageNet validation set.

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=

Out[1]=
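
The retrieval step might look like the following minimal sketch. It assumes the resource name passed to NetModel matches the title of this page; the variable name net is chosen here for illustration.

```wolfram
(* Resource name assumed to match the page title *)
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"]
```

The first evaluation downloads the net, so it needs network access; later evaluations use the local cache.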

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:=

Out[2]=
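
As a sketch of the inspection step, the "ParametersInformation" property of NetModel returns the parameter names, their allowed values and their defaults. The resource name is again assumed to match the page title.

```wolfram
(* Lists each parameter of the family with its allowed values and default *)
NetModel["EfficientNet-V2 Trained on ImageNet-21K", "ParametersInformation"]
```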

Pick a non-default net by specifying the parameters:

In[3]:=

Out[3]=

Pick a non-default uninitialized net:

In[4]:=

Out[4]=
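
The two selection steps above could be sketched as follows. The parameter names "Architecture" and "ImageNet1KFinetuning" and their values are hypothetical placeholders, not taken from this page; substitute the names and values reported by "ParametersInformation". The "UninitializedEvaluationNet" property requests the architecture without trained weights.

```wolfram
(* Hypothetical parameter names and values; replace with those from "ParametersInformation" *)
spec = {"EfficientNet-V2 Trained on ImageNet-21K",
   "Architecture" -> "M", "ImageNet1KFinetuning" -> True};

(* A non-default trained net *)
NetModel[spec]

(* The same architecture without trained weights *)
NetModel[spec, "UninitializedEvaluationNet"]
```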

Basic usage

In[5]:=

Classify an image:

In[6]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/3ef3e8a7-4a3e-4a18-8b4e-c7968df739d2"]

Out[6]=

The prediction is an Entity object, which can be queried:

In[7]:=

Out[7]=

Get a list of available properties of the predicted Entity:

In[8]:=

Out[8]=
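
A minimal sketch of classification and entity queries follows. It uses a built-in test image as a stand-in for the page's example input, and it assumes the predicted entity type exposes a "Definition" property, which may not hold for every class.

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Stand-in input; the page's own example image comes from the CloudGet cell *)
img = ExampleData[{"TestImage", "Mandrill"}];

(* The decoder returns the most likely class as an Entity *)
pred = net[img]

(* Query one property, assuming the entity type provides "Definition" *)
pred["Definition"]

(* List every property available for this entity *)
pred["Properties"]
```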

Obtain the probabilities of the 10 most likely entities predicted by the net. Note that the top 10 predictions are not mutually exclusive:

In[9]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/bdb04381-54f3-4341-8c63-4bef344cba54"]

Out[9]=
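
The top-10 query might be written as in this sketch, assuming the net's output decoder is a class decoder that supports the "TopProbabilities" property. The test image is a stand-in for the page's example input.

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];
img = ExampleData[{"TestImage", "Mandrill"}];  (* stand-in input image *)

(* Returns a list of entity -> probability rules for the 10 highest-scoring classes *)
net[img, {"TopProbabilities", 10}]
```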

Obtain the list of names of all available classes:

In[10]:=

Out[10]=
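
One way to read the class list is to extract it from the output decoder, as sketched below. This assumes the decoder is attached to a port named "Output" and stores its classes under "Labels", as class decoders usually do.

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Pull the class list out of the decoder; port and key names are assumptions *)
labels = NetExtract[net, "Output"][["Labels"]];

(* Number of classes and a small sample *)
Length[labels]
RandomSample[labels, 5]
```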

Feature extraction

Remove the last two layers of the trained net so that the net produces a vector representation of an image:

In[11]:=

Out[11]=

Get a set of images:

In[12]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/edb00a7e-2aaf-4098-9867-726b2e936973"]

Use the net as a feature extractor to build a clustering tree of the images:

In[13]:=

Out[13]=
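
The feature extraction steps above can be sketched as follows. The layer span {1, -3} assumes the net is a chain whose last two layers are the classification layers, and the built-in test images are stand-ins for the page's image set.

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Drop the last two layers so the net outputs a feature vector *)
extractor = NetTake[net, {1, -3}];

(* Stand-in image set: the first six built-in test images *)
imgs = ExampleData /@ Take[ExampleData["TestImage"], 6];

(* Cluster the images hierarchically in the extracted feature space *)
ClusteringTree[imgs, FeatureExtractor -> extractor]
```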

Transfer learning

Use the pre-trained model to build a classifier for telling apart indoor and outdoor photos. Create a test set and a training set:

In[14]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/cf9bc77c-7015-41a9-a924-ff974a14cd3a"]

In[15]:=

Remove the last linear layer from the pre-trained net:

In[16]:=

Out[16]=

Create a new net composed of the pre-trained net followed by a linear layer and a softmax layer:

In[17]:=
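
The truncation and rebuilding steps might look like this sketch. The layer span {1, -3} assumes the linear layer is second to last in the pre-trained chain, and the class labels "indoor" and "outdoor" are assumed to match the labels used in the training set. The layer name "linearNew" is the one referenced by the training call below.

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Keep everything up to, but excluding, the final linear layer (assumed position) *)
tempNet = NetTake[net, {1, -3}];

(* Append a fresh linear layer and softmax; output size is inferred from the decoder *)
newNet = NetChain[
  <|"pretrainedNet" -> tempNet,
    "linearNew" -> LinearLayer[],
    "softmax" -> SoftmaxLayer[]|>,
  "Output" -> NetDecoder[{"Class", {"indoor", "outdoor"}}]
]
```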

Train on the dataset, freezing all the weights except for those in the "linearNew" layer (use TargetDevice -> "GPU" for training on a GPU):

In[18]:=

trainedNet = NetTrain[newNet, trainSet, LearningRateMultipliers -> {"linearNew" -> 1, _ -> 0}]

Out[18]=

Perfect accuracy is obtained on the test set:

In[19]:=

Out[19]=

Net information

Inspect the number of parameters of all arrays in the net:

In[20]:=

Out[20]=

Obtain the total number of parameters:

In[21]:=

Out[21]=

Obtain the layer type counts:

In[22]:=

Out[22]=
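
The three queries above can be sketched with Information, assuming the resource name matches the page title:

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Parameter count of each array in the net *)
Information[net, "ArraysElementCounts"]

(* Total parameter count *)
Information[net, "ArraysTotalElementCount"]

(* How many layers of each type the net contains *)
Information[net, "LayerTypeCounts"]
```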

Export to ONNX

Export the net to the ONNX format:

In[23]:=

Out[23]=

Get the size of the ONNX file:

In[24]:=

Out[24]=

Check some metadata of the ONNX model:

In[25]:=

Out[25]=

Import the model back into the Wolfram Language. The NetEncoder and NetDecoder will be absent because ONNX does not support them:

In[26]:=

Out[26]=
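
The export and re-import round trip could be sketched as below. The file location is arbitrary, and the import element names "OperatorSetVersion" and "IRVersion" are assumptions about what the ONNX importer exposes; check the ONNX format documentation if they are rejected.

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Write the net to a temporary ONNX file; the path is an arbitrary choice *)
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], net]

(* File size in bytes *)
FileByteCount[onnxFile]

(* Selected metadata; element names are assumptions *)
{Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}

(* Re-import; the result has no NetEncoder or NetDecoder attached *)
Import[onnxFile]
```

After re-import, the net takes and returns raw arrays, so image preprocessing and class decoding have to be reattached or done by hand.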

Resource History

Date Created: 20 January 2022

Reference

M. Tan, Q. V. Le, "EfficientNetV2: Smaller Models and Faster Training," arXiv:2104.00298 (2021)
Available from:
- https://github.com/google/automl/tree/master/efficientnetv2
Rights: Copyright 2020 Google Research. All rights reserved. Apache License 2.0