# Channel-Separated Video Action Classification Net Trained on Kinetics-400 Data

Identify the main action in a video

Inspired by 2D separable convolutions in image classification, the authors propose 3D Channel-Separated Networks (CSNs), in which all convolutional operations are separated into either pointwise 1×1×1 or depthwise 3×3×3 convolutions. This results in a significant accuracy improvement on the Sports-1M, Kinetics and Something-Something datasets while being two to three times faster.
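To make the effect of channel separation concrete, the following is a minimal sketch that counts weights for a single layer. It assumes a pointwise 1×1×1 convolution followed by a depthwise 3×3×3 convolution and ignores biases; the function names and the 256-channel example are illustrative and do not come from the model itself.

```wolfram
(* Weights in a standard k×k×k 3D convolution, biases ignored *)
standardParams[cin_, cout_, k_ : 3] := cin*cout*k^3

(* Pointwise 1×1×1 convolution (cin -> cout) followed by a
   depthwise k×k×k convolution over the cout channels *)
separatedParams[cin_, cout_, k_ : 3] := cin*cout + cout*k^3

standardParams[256, 256]   (* 1769472 *)
separatedParams[256, 256]  (* 72448 *)
```

This per-layer weight ratio is not the same quantity as the two-to-three-times speed-up quoted above, which refers to whole networks.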

## Training Set Information

• Kinetics-400 human action video data, consisting of four hundred human action classes, with at least four hundred video clips for each action. Each clip lasts around 10 seconds and is taken from a different YouTube video.

## Examples

### Resource retrieval

Get the pre-trained net:

 In[1]:=
 Out[1]=
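A minimal sketch of the retrieval step, assuming the model is registered in the repository under this page's title:

```wolfram
(* The model name is assumed to match the page title *)
net = NetModel["Channel-Separated Video Action Classification Net Trained on Kinetics-400 Data"]
```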

### Basic usage

Classify a video:

 In[2]:=
 Out[2]=
 In[3]:=
 Out[3]=

Obtain the probabilities predicted by the net:

 In[4]:=
 Out[4]=
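The two steps above might look like the following sketch. The file path is a placeholder, the model name is assumed to match the page title, and the net is assumed to end in a class decoder so that the standard "Probabilities" and "TopProbabilities" properties are available.

```wolfram
net = NetModel["Channel-Separated Video Action Classification Net Trained on Kinetics-400 Data"];

(* Hypothetical local file; replace with any short clip *)
video = Video["path/to/clip.mp4"];

(* Most likely action class *)
net[video]

(* Association of all classes to their predicted probabilities *)
net[video, "Probabilities"]

(* Only the five most likely classes *)
net[video, {"TopProbabilities", 5}]
```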

### Feature extraction

Remove the last three layers of the trained net so that the net produces a vector representation of a video:

 In[5]:=
 Out[5]=

Get a set of videos:

 In[6]:=

Visualize the features of a set of videos:

 In[7]:=
 Out[7]=
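A hedged sketch of the feature extraction workflow follows. It assumes the model name matches the page title, that the last three layers can be cut with NetTake, and that a folder of clips exists at the placeholder path. The features are computed first and then plotted with file names as labels, which is one possible way to produce the visualization.

```wolfram
net = NetModel["Channel-Separated Video Action Classification Net Trained on Kinetics-400 Data"];

(* Keep everything except the last three layers *)
extractor = NetTake[net, {1, -4}];

(* Hypothetical folder of clips *)
files = FileNames["*.mp4", "path/to/videos"];
videos = Video /@ files;

(* One feature vector per video *)
features = extractor /@ videos;

(* Project the vectors to 2D, labeling each point with its file name *)
FeatureSpacePlot[features -> (FileBaseName /@ files)]
```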

### Transfer learning

Use the pre-trained model to build a classifier for telling apart videos from two action classes not present in the dataset. Create a test set and a training set:

 In[8]:=
 In[9]:=
 In[10]:=
 In[11]:=

Remove the linear and the softmax layers from the pre-trained net:

 In[12]:=
 Out[12]=

Create a new net composed of the pre-trained net followed by a LinearLayer and a SoftmaxLayer:

 In[13]:=
 Out[13]=

Train on the dataset, freezing all the weights except for those in the "Linear" layer (use TargetDevice -> "GPU" for training on a GPU):

 In[15]:=
 Out[15]=

Perfect accuracy is obtained on the test set:

 In[16]:=
 Out[16]=
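The transfer learning steps above can be sketched end to end as follows. The class names, folder paths and 80/20 split are hypothetical, the model name is assumed to match the page title, and the accuracy you obtain depends on your own data.

```wolfram
net = NetModel["Channel-Separated Video Action Classification Net Trained on Kinetics-400 Data"];

(* Hypothetical folders, each holding clips of one action class *)
classDirs = <|"classA" -> "path/to/classA", "classB" -> "path/to/classB"|>;

(* Build a list of video -> label rules *)
dataset = Flatten@KeyValueMap[
    Function[{label, dir},
     (Video[#] -> label) & /@ FileNames["*.mp4", dir]],
    classDirs];

(* Random split; the 80/20 ratio is an arbitrary choice *)
{trainSet, testSet} =
  TakeDrop[RandomSample[dataset], Floor[0.8*Length[dataset]]];

(* Drop the final linear and softmax layers *)
tempNet = NetTake[net, {1, -3}];

(* Attach a fresh classification head for the two new classes *)
newNet = NetChain[
   <|"pretrainedNet" -> tempNet,
     "Linear" -> LinearLayer[],
     "SoftMax" -> SoftmaxLayer[]|>,
   "Output" -> NetDecoder[{"Class", Keys[classDirs]}]];

(* Train only the "Linear" layer; add TargetDevice -> "GPU" if available *)
trainedNet = NetTrain[newNet, trainSet,
   LearningRateMultipliers -> {"Linear" -> 1, _ -> 0}];

(* Fraction of test clips classified correctly *)
NetMeasurements[trainedNet, testSet, "Accuracy"]
```

Freezing the pre-trained layers keeps training cheap, since only the small linear head is updated.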

### Net information

Inspect the number of parameters of all arrays in the net:

 In[17]:=
 Out[17]=

Obtain the total number of parameters:

 In[18]:=
 Out[18]=

Obtain the layer type counts:

 In[19]:=
 Out[19]=

Display the summary graphic:

 In[20]:=
 Out[20]=
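The four queries in this section correspond to standard net properties of Information. A minimal sketch, assuming the model name matches the page title:

```wolfram
net = NetModel["Channel-Separated Video Action Classification Net Trained on Kinetics-400 Data"];

(* Number of parameters in each array of the net *)
Information[net, "ArraysElementCounts"]

(* Total number of parameters *)
Information[net, "ArraysTotalElementCount"]

(* How many layers of each type the net contains *)
Information[net, "LayerTypeCounts"]

(* Summary graphic of the architecture *)
Information[net, "SummaryGraphic"]
```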