EfficientNet-V2 Trained on ImageNet-21K

Identify the main object in an image

Released in 2021, this family of image classification models is trained on the full ImageNet-21K dataset, a superset of the ImageNet dataset containing more than 21,000 object classes. Models pretrained on ImageNet-21K and then fine-tuned on ImageNet-1K are also available and achieve high test accuracy on ImageNet ILSVRC2012.

Number of models: 8

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["EfficientNet-V2 Trained on ImageNet-21K"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:=
NetModel["EfficientNet-V2 Trained on ImageNet-21K", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"EfficientNet-V2 Trained on ImageNet-21K", "Architecture" -> "S", "ImageNet1KFinetuned" -> False}]
Out[3]=
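
Members of the family can differ in size. A minimal sketch that compares the total parameter count of the "S" variant selected above with that of the default net; the variable names smallNet and defaultNet are illustrative, and the two nets may turn out to be the same if "S" is the default architecture:

```wolfram
(* The non-default net selected above *)
smallNet = NetModel[{"EfficientNet-V2 Trained on ImageNet-21K",
    "Architecture" -> "S", "ImageNet1KFinetuned" -> False}];

(* The default member of the family *)
defaultNet = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Total number of array elements (parameters) in each net *)
Information[#, "ArraysTotalElementCount"] & /@
 <|"S, not fine-tuned" -> smallNet, "default" -> defaultNet|>
```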

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"EfficientNet-V2 Trained on ImageNet-21K", "Architecture" -> "S", "ImageNet1KFinetuned" -> False}, "UninitializedEvaluationNet"]
Out[4]=

Basic usage

In[5]:=

Classify an image:

In[6]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/3ef3e8a7-4a3e-4a18-8b4e-c7968df739d2"]
Out[6]=
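
The example input above is fetched with CloudGet, so the call itself is not shown. A minimal sketch of the same step, assuming a stock test image as a stand-in for the original input (img and net are illustrative names; pred matches the name used in the following cells):

```wolfram
(* Assumption: any Image works as input; this stock test image is only a stand-in *)
img = ExampleData[{"TestImage", "Mandrill"}];

(* Load the default net of the family *)
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Apply the net directly to the image to get the predicted class *)
pred = net[img]
```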

The prediction is an Entity object, which can be queried:

In[7]:=
pred["Definition"]
Out[7]=

Get a list of available properties of the predicted Entity:

In[8]:=
pred["Properties"]
Out[8]=

Obtain the probabilities of the 10 most likely entities predicted by the net. Note that the top 10 predictions are not mutually exclusive:

In[9]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/bdb04381-54f3-4341-8c63-4bef344cba54"]
Out[9]=
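
The call behind the cell above is hidden in the CloudGet input. One way to request the top predictions is through the class decoder's "TopProbabilities" property; this is a sketch with a stand-in image, not necessarily the exact call used for the output above:

```wolfram
(* Assumption: a stock test image stands in for the original example input *)
img = ExampleData[{"TestImage", "Mandrill"}];
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Ask the decoder for the 10 most likely classes as a list of class -> probability rules *)
top10 = net[img, {"TopProbabilities", 10}]

(* The classes alone, most likely first *)
Keys[top10]
```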

Obtain the list of names of all available classes:

In[10]:=
EntityValue[
 NetExtract[
  NetModel["EfficientNet-V2 Trained on ImageNet-21K"], {"Output", "Labels"}], "Name"]
Out[10]=

Feature extraction

Remove the last two layers of the trained net so that the net produces a vector representation of an image:

In[11]:=
extractor = NetDrop[NetModel["EfficientNet-V2 Trained on ImageNet-21K"], -2]
Out[11]=

Get a set of images:

In[12]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/edb00a7e-2aaf-4098-9867-726b2e936973"]

Use the net as a feature extractor to build a clustering tree of the images:

In[13]:=
ClusteringTree[imgs, FeatureExtractor -> extractor, ImageSize -> Large]
Out[13]=
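
The extractor can also be applied to a single image to inspect the feature vector directly. A minimal sketch, assuming a stock test image as input (img, mirrored, and features are illustrative names):

```wolfram
(* Same extractor as above: the trained net without its last two layers *)
extractor = NetDrop[NetModel["EfficientNet-V2 Trained on ImageNet-21K"], -2];

(* Assumption: a stock test image and its mirror image serve as inputs *)
img = ExampleData[{"TestImage", "Mandrill"}];
mirrored = ImageReflect[img];

(* Feature vector of one image and its length *)
features = extractor[img];
Dimensions[features]

(* Distance between the two feature vectors; smaller values mean more similar features *)
CosineDistance[features, extractor[mirrored]]
```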

Transfer learning

Use the pre-trained model to build a classifier for telling apart indoor and outdoor photos. Create a test set and a training set:

In[14]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/cf9bc77c-7015-41a9-a924-ff974a14cd3a"]
In[15]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/5658e5fd-c4dc-433c-941a-f222f45e896e"]
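
The two cells above load ready-made data. As a sketch of the expected format, a dataset for this task can be written as a list of image -> label rules. The random images below are placeholders for real photos, and the names indoorImages, outdoorImages, and labeled are hypothetical:

```wolfram
(* Placeholders: replace with lists of real indoor and outdoor photos *)
indoorImages = Table[RandomImage[1, {64, 64}, ColorSpace -> "RGB"], 6];
outdoorImages = Table[RandomImage[1, {64, 64}, ColorSpace -> "RGB"], 6];

(* Attach a class label to every image, then shuffle the combined list *)
labeled = RandomSample[
   Join[Thread[indoorImages -> "indoor"], Thread[outdoorImages -> "outdoor"]]];

(* Hold out the first quarter of the shuffled examples for testing *)
{testSet, trainSet} = TakeDrop[labeled, Quotient[Length[labeled], 4]];
```

A random split like this does not guarantee that both classes appear in the test set; with small datasets it may be preferable to split each class separately.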

Remove the last two layers, including the final linear layer, from the pre-trained net:

In[16]:=
tempNet = Take[NetModel["EfficientNet-V2 Trained on ImageNet-21K"], {1, -3}]
Out[16]=

Create a new net composed of the pre-trained net followed by a linear layer and a softmax layer:

In[17]:=
newNet = NetAppend[
   tempNet, {"linearNew" -> LinearLayer[], "softmax" -> SoftmaxLayer[]}, "Output" -> NetDecoder[{"Class", {"indoor", "outdoor"}}]];

Train on the dataset, freezing all the weights except for those in the "linearNew" layer (use TargetDevice -> "GPU" for training on a GPU):

In[18]:=
trainedNet = NetTrain[newNet, trainSet, LearningRateMultipliers -> {"linearNew" -> 1, _ -> 0}]
Out[18]=

Perfect accuracy is obtained on the test set:

In[19]:=
ClassifierMeasurements[trainedNet, testSet, "Accuracy"]
Out[19]=
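
Beyond aggregate accuracy, individual predictions can be inspected. A minimal sketch, assuming trainedNet from the training step above and that testSet is a list of image -> label rules (the variable sample is illustrative):

```wolfram
(* Pick one held-out example, a rule of the form image -> label *)
sample = First[testSet];

(* Predicted class for the image *)
trainedNet[First[sample]]

(* Class probabilities for the same image *)
trainedNet[First[sample], "Probabilities"]

(* True label, for comparison *)
Last[sample]
```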

Net information

Inspect the number of parameters of all arrays in the net:

In[20]:=
Information[
 NetModel["EfficientNet-V2 Trained on ImageNet-21K"], "ArraysElementCounts"]
Out[20]=

Obtain the total number of parameters:

In[21]:=
Information[
 NetModel["EfficientNet-V2 Trained on ImageNet-21K"], "ArraysTotalElementCount"]
Out[21]=
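
The total is expected to agree with the sum of the per-array counts from the previous cell. A quick consistency check (net is an illustrative name):

```wolfram
net = NetModel["EfficientNet-V2 Trained on ImageNet-21K"];

(* Sum the element counts of the individual arrays and compare with the reported total *)
Total[Values[Information[net, "ArraysElementCounts"]]] ==
 Information[net, "ArraysTotalElementCount"]
```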

Obtain the layer type counts:

In[22]:=
Information[
 NetModel["EfficientNet-V2 Trained on ImageNet-21K"], "LayerTypeCounts"]
Out[22]=

Export to ONNX

Export the net to the ONNX format:

In[23]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel["EfficientNet-V2 Trained on ImageNet-21K"]]
Out[23]=

Get the size of the ONNX file:

In[24]:=
FileByteCount[onnxFile]
Out[24]=
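
The byte count is easier to read in larger units. A small sketch that converts it to megabytes, assuming onnxFile from the export step above:

```wolfram
(* Wrap the raw byte count in a Quantity and convert it to megabytes *)
N[UnitConvert[Quantity[FileByteCount[onnxFile], "Bytes"], "Megabytes"]]
```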

Check some metadata of the ONNX model:

In[25]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[25]=

Import the model back into the Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[26]:=
Import[onnxFile]
Out[26]=
