3D-Inflated ResNet-50 Trained on Kinetics 400 Data
This model applies a 3D-inflation technique to bootstrap the kernels of a 3D convolutional network from a 2D ResNet-50 architecture, directly leveraging years of progress in image-domain architectures for video applications. The weights of the 3D convolutional filters were initialized by replicating the 2D filters of an ImageNet-trained ResNet-50 along the time dimension. This can be seen as an implicit pre-training on a video dataset consisting of static ImageNet images replicated across time.
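The inflation step can be illustrated on a single channel in plain Python. This is a minimal sketch, not the code used to build this model. It assumes the rescaling described in the original I3D work: the 2D kernel is repeated kt times along time and divided by kt, so that a static video (one image repeated across frames) gives the same response as the 2D filter applied to that image. The text above only states that the filters were replicated, so the division by kt is an assumption here. The helper names (`inflate`, `conv2d_valid`, `conv3d_valid`) are hypothetical.

```python
import math
from typing import List

Kernel2D = List[List[float]]        # [kh][kw]
Kernel3D = List[List[List[float]]]  # [kt][kh][kw]
Frame = List[List[float]]           # [height][width]


def inflate(kernel2d: Kernel2D, kt: int) -> Kernel3D:
    """Repeat a 2D kernel kt times along time and divide by kt (I3D-style rescaling)."""
    return [[[w / kt for w in row] for row in kernel2d] for _ in range(kt)]


def conv2d_valid(frame: Frame, k: Kernel2D) -> Frame:
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [[sum(frame[i + di][j + dj] * k[di][dj] for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def conv3d_valid(video: List[Frame], k: Kernel3D) -> List[Frame]:
    """Single-channel 3D cross-correlation: at each time step, sum the 2D responses
    of kt consecutive frames to the kt temporal slices of the kernel."""
    kt = len(k)
    out = []
    for t in range(len(video) - kt + 1):
        slices = [conv2d_valid(video[t + s], k[s]) for s in range(kt)]
        out.append([[sum(sl[i][j] for sl in slices) for j in range(len(slices[0][0]))]
                    for i in range(len(slices[0]))])
    return out


# A "static video": one image repeated over 4 frames.
image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 2.0],
         [2.0, 1.0, 0.0, 1.0]]
kernel = [[1.0, -1.0],
          [0.5, 2.0]]
static_video = [image] * 4

resp2d = conv2d_valid(image, kernel)
resp3d = conv3d_valid(static_video, inflate(kernel, 3))

# Every output frame of the inflated filter matches the 2D response (up to rounding).
matches = all(math.isclose(a, b)
              for frame in resp3d
              for row3, row2 in zip(frame, resp2d)
              for a, b in zip(row3, row2))
```

On a static video, each inflated filter reproduces the response of its 2D counterpart, which is what makes the "implicit pre-training on static images" reading precise.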
Examples
Resource retrieval
Get the pre-trained net:
Basic usage
Classify a video:
Obtain the probabilities predicted by the net:
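The net's final softmax turns raw class scores into probabilities over the Kinetics classes. A minimal Python sketch of that last step, with hypothetical scores for three illustrative labels (not actual output of this net), looks like this:

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels: List[str], scores: List[float], k: int) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["playing guitar", "juggling balls", "surfing water"]  # illustrative subset
scores = [2.0, 0.5, -1.0]                                       # hypothetical raw scores
best = top_k(labels, scores, 2)
```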
Feature extraction
Remove the last two layers of the trained net so that the net produces a vector representation of a video:
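Dropping the last two layers can be pictured as truncating a pipeline. The sketch below assumes, as is common for classifiers of this kind, that those layers are a linear classification layer and a softmax, so what remains ends with global average pooling over time and space. Every stage here, including the pass-through `backbone` placeholder and the 2-channel feature map, is a hypothetical stand-in for the real network.

```python
import math
from typing import Callable, List, Tuple

FeatureMap = List[List[List[List[float]]]]  # [channel][time][height][width]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over time, height and width: one number per channel."""
    pooled = []
    for channel in fmap:
        values = [v for frame in channel for row in frame for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def make_linear(weights: List[List[float]]) -> Callable[[List[float]], List[float]]:
    """Hypothetical classification layer: one weight row per class, no bias."""
    return lambda x: [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def run(stages: List[Tuple[str, Callable]], x):
    """Apply each stage in order."""
    for _name, fn in stages:
        x = fn(x)
    return x


# Hypothetical classifier: stand-in backbone, pooling, linear classifier, softmax.
stages = [
    ("backbone", lambda video: video),  # placeholder for the 3D convolutional trunk
    ("pool", global_avg_pool),
    ("linear", make_linear([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])),
    ("softmax", softmax),
]
feature_extractor = stages[:-2]  # drop the last two layers

fmap = [
    [[[1.0, 3.0]], [[5.0, 7.0]]],  # channel 0: 2 frames of 1x2
    [[[0.0, 0.0]], [[2.0, 2.0]]],  # channel 1
]
features = run(feature_extractor, fmap)  # pooled vector, one entry per channel
probs = run(stages, fmap)                # class probabilities from the full pipeline
```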
Get a set of videos:
Visualize the features of a set of videos:
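Feature-space plots place videos with similar feature vectors close together. One common way to get such 2D coordinates is principal component analysis. The following pure-Python sketch uses power iteration with deflation on hypothetical 4-dimensional features (the real vectors are much longer); it illustrates the idea and is not necessarily the projection used to produce the plot on this page.

```python
import math
import random
from typing import List

Vector = List[float]


def pca_2d(features: List[Vector], iters: int = 200, seed: int = 0) -> List[Vector]:
    """Project feature vectors onto their top two principal components."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components: List[Vector] = []
    for _ in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Power iteration step: multiply by the covariance matrix ...
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # ... then remove the part along components already found (deflation).
            for comp in components:
                dot = sum(a * b for a, b in zip(w, comp))
                w = [a - dot * b for a, b in zip(w, comp)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        components.append(v)

    # 2D coordinates: dot product of each centered vector with each component.
    return [[sum(a * b for a, b in zip(c, comp)) for comp in components] for c in centered]


# Hypothetical features for two groups of videos.
cluster_a = [[1.0, 0.9, 0.0, 0.1], [1.1, 1.0, 0.1, 0.0], [0.9, 1.1, 0.0, 0.0]]
cluster_b = [[0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 1.1, 1.0], [0.0, 0.0, 0.9, 1.1]]
points = pca_2d(cluster_a + cluster_b)
```

The two groups land on opposite sides of the first axis, which is the kind of separation a feature-space plot makes visible.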
Transfer learning
Use the pre-trained model to build a classifier for telling apart videos from two action classes not present in the Kinetics 400 dataset. Create a test set and a training set:
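A training/test split can be sketched as a seeded shuffle followed by a cut. The clip names, labels and the 25% test fraction below are hypothetical:

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(examples: List[T], test_fraction: float,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples with a fixed seed and split off a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


# Hypothetical labelled clips: (file name, class label)
examples = [(f"clip_{i:02d}.mp4", "class_a" if i % 2 == 0 else "class_b") for i in range(20)]
train, test = train_test_split(examples, 0.25)
```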
Remove the last two layers from the pre-trained net:
Create a new net composed of the pre-trained net followed by a linear layer, an aggregation layer and a softmax layer:
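The new head can be sketched in Python under the assumption that the truncated net yields several feature vectors (for example, one per temporal position) that the head must reduce to a single prediction. The shapes and weights below are hypothetical:

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map: one weight row and one bias per class."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def mean_aggregate(vectors: List[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classifier_head(features: List[Vector], w: Matrix, b: Vector) -> Vector:
    """Linear layer on each feature vector, mean aggregation, then softmax."""
    return softmax(mean_aggregate([linear(f, w, b) for f in features]))


features = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]  # hypothetical: two positions, 3 features each
w = [[0.5, -1.0, 0.0], [0.0, 1.0, 0.5]]         # 2 classes x 3 features
b = [0.0, 0.1]
probs = classifier_head(features, w, b)
```

Because the linear layer is affine, averaging its outputs gives the same result as applying it to the averaged features; the order of these two layers only matters once a nonlinearity sits between them.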
Train on the dataset, freezing all the weights except for those in the "Linear" layer (use TargetDevice -> "GPU" for training on a GPU):
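Freezing every layer except "Linear" means the pre-trained features act as fixed inputs and only the linear layer's weights and biases receive gradient updates. The sketch below shows that idea as softmax regression trained by full-batch gradient descent on hypothetical 2-dimensional frozen features. The learning rate, epoch count and data are assumptions; this is a language-neutral illustration, not the training procedure used on this page.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_linear_head(data: List[Tuple[Vector, int]], n_classes: int,
                      lr: float = 0.5, epochs: int = 200):
    """Fit only W and b of a softmax classifier; the input features stay fixed (frozen)."""
    d = len(data[0][0])
    w = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gw = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in data:
            p = softmax([sum(wk * xk for wk, xk in zip(row, x)) + bk for row, bk in zip(w, b)])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                gb[c] += err
                for j in range(d):
                    gw[c][j] += err * x[j]
        n = len(data)
        for c in range(n_classes):  # gradient step on the linear layer only
            b[c] -= lr * gb[c] / n
            for j in range(d):
                w[c][j] -= lr * gw[c][j] / n
    return w, b


def predict(w, b, x: Vector) -> int:
    logits = [sum(wk * xk for wk, xk in zip(row, x)) + bk for row, bk in zip(w, b)]
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical frozen features (outputs of the truncated pre-trained net) with labels.
train = [([1.0, 0.1], 0), ([0.9, 0.2], 0), ([0.2, 1.0], 1), ([0.1, 0.8], 1)]
test = [([0.8, 0.0], 0), ([0.0, 0.9], 1)]
w, b = train_linear_head(train, 2)
accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
```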
Perfect accuracy is obtained on the test set:
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
Obtain the layer type counts:
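Parameter and layer-type counts amount to multiplying out each array's shape and tallying layer types. The layer names, types and shapes below are a hypothetical excerpt chosen for illustration (the 5x7x7 first convolution in particular is an assumption, not this model's documented kernel size). Note that an inflated kernel with temporal size kt holds kt times as many weights as its 2D source.

```python
from collections import Counter
from math import prod
from typing import Dict, Tuple

# Hypothetical excerpt of a net: layer name -> (layer type, {array name: shape})
layers: Dict[str, Tuple[str, Dict[str, Tuple[int, ...]]]] = {
    "conv1": ("Convolution", {"Weights": (64, 3, 5, 7, 7)}),  # out, in, kt, kh, kw
    "bn1": ("BatchNormalization", {"Scaling": (64,), "Biases": (64,),
                                   "MovingMean": (64,), "MovingVariance": (64,)}),
    "conv2": ("Convolution", {"Weights": (64, 64, 1, 1, 1)}),
    "fc": ("Linear", {"Weights": (400, 2048), "Biases": (400,)}),
}

# Number of elements in every array (includes non-trainable batch-norm statistics).
array_sizes = {(layer, name): prod(shape)
               for layer, (_type, arrays) in layers.items()
               for name, shape in arrays.items()}
total = sum(array_sizes.values())

# How many layers of each type appear.
type_counts = Counter(layer_type for layer_type, _ in layers.values())

# Inflation multiplies a 2D kernel's weight count by its temporal size kt = 5.
inflation_factor = array_sizes[("conv1", "Weights")] // (64 * 3 * 7 * 7)
```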
Display the summary graphic: