Resource retrieval
Get the pre-trained net:
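A sketch of the corresponding input; the resource name used here, "Wolfram AudioIdentify V1 Trained on AudioSet Data", is an assumption and should be replaced by the actual name of this model:
In[1]:= NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"]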
Out[1]= |  |
NetModel parameters
This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:
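Likely via the "ParametersInformation" property of NetModel (same resource-name assumption as above):
In[2]:= NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data", "ParametersInformation"]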
Out[2]= |  |
Pick a non-default model by specifying the parameters:
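A sketch with a hypothetical parameter setting; the real parameter names and values are the ones reported by the previous evaluation:
In[3]:= NetModel[{"Wolfram AudioIdentify V1 Trained on AudioSet Data", "Size" -> "Small"}]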
Out[3]= |  |
Pick a non-default untrained net:
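Likely via the "UninitializedEvaluationNet" property (the parameter value is hypothetical, as above):
In[4]:= NetModel[{"Wolfram AudioIdentify V1 Trained on AudioSet Data", "Size" -> "Small"}, "UninitializedEvaluationNet"]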
Out[4]= |  |
Basic usage
Identify an Audio object:
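A sketch of the evaluation; the recording "sound.wav" is a hypothetical placeholder for any audio file:
In[5]:= audio = Audio["sound.wav"];
        pred = NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"][audio]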
Out[5]= |  |
The prediction is an Entity object, which can be queried:
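For instance, one can ask for its canonical name (the choice of property here is only an example):
In[6]:= pred["Name"]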
Out[6]= |  |
Get a list of available properties of the predicted Entity:
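Likely via the "Properties" query on the Entity:
In[7]:= pred["Properties"]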
Out[7]= |  |
Obtain the probabilities of the ten most likely entities predicted by the net:
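Presumably via the "TopProbabilities" output specification (the exact spec is an assumption):
In[8]:= NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"][audio, {"TopProbabilities", 10}]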
Out[8]= |  |
The probabilities do not sum to 1, since the net was trained as a collection of independent binary classifiers, one for each class. This reflects the possibility that multiple sound classes are present in a single recording.
The network was trained on the AudioSet dataset, in which each audio signal is annotated with the sound classes/sources present in the recording. The labels are organized in an ontology of about 632 classes spanning a very wide range of sound types and sources, from musical instruments and music genres to animal, mechanical and human sounds. Obtain the list of names of all available classes:
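A sketch of how the class list could be recovered from the net's output decoder; extracting "Labels" from the decoder is an assumption about how the classes are stored:
In[9]:= EntityValue[NetExtract[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "Output"][["Labels"]], "Name"]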
Out[9]= |  |
Feature extraction
The core of the network takes a fixed-size chunk of the mel-spectrogram of the input signal and is mapped over overlapping chunks using NetMapOperator. Extract the core net:
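A sketch, assuming the top-level net contains a NetMapOperator whose inner net is the core; the layer path {"map", "Net"} is a guess and should be adapted after inspecting the net:
In[10]:= coreNet = NetExtract[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], {"map", "Net"}]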
Out[10]= |  |
Remove the last few layers, which are in charge of the classification:
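For instance with NetDrop; the number of layers to drop (three) is an assumption:
In[11]:= featureNet = NetDrop[coreNet, -3]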
Out[11]= |  |
This net takes a single chunk of the input signal and outputs a tensor of semantically meaningful features. Reconstruct the whole variable-length net using NetMapOperator to compute the features on each chunk and AggregationLayer to aggregate them over the time dimension:
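A sketch of the reconstruction; the aggregation function (Max over the time dimension) and the reuse of the original net's input encoder are assumptions:
In[12]:= extractor = NetChain[
          {NetMapOperator[featureNet], AggregationLayer[Max, 1]},
          "Input" -> NetExtract[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "Input"]]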
Out[12]= |  |
Get a set of Audio objects:
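A sketch; the directory "recordings" and its WAV files are hypothetical:
In[13]:= audios = Audio /@ FileNames["*.wav", "recordings"];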
Visualize the features of a set of recordings:
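Likely done with FeatureSpacePlot, using the feature extractor built above:
In[14]:= FeatureSpacePlot[audios, FeatureExtractor -> extractor]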
Out[14]= |  |
Transfer learning
Use the pre-trained model to build a classifier that tells apart recordings of cows and birds. Create a training set and a test set:
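A sketch of how such sets could be assembled; the directories, file names and split size are hypothetical:
In[15]:= cows = Audio /@ FileNames["*.wav", "cows"];
         birds = Audio /@ FileNames["*.wav", "birds"];
         {trainSet, testSet} = TakeDrop[RandomSample[Join[Thread[cows -> "cow"], Thread[birds -> "bird"]]], 40];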
Remove the classification layers from the pre-trained net:
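One possibility, assuming the top-level net is a NetChain whose last three layers perform the classification:
In[16]:= featureExtractor = NetDrop[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], -3]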
Out[16]= |  |
Create a classifier net using a simple LinearLayer:
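A sketch of the classifier; the class labels match the hypothetical data above, and the input dimensions are left to be inferred at training time:
In[17]:= classifier = NetChain[{LinearLayer[2], SoftmaxLayer[]},
          "Output" -> NetDecoder[{"Class", {"bird", "cow"}}]]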
Out[17]= |  |
Precompute the output of the feature net on the training and test data to avoid redundant evaluations during training. This is equivalent to freezing all the weights except those in the new classifier net:
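A sketch of the precomputation, mapping the feature extractor over the audio in both sets:
In[18]:= trainFeatures = Thread[featureExtractor[Keys[trainSet]] -> Values[trainSet]];
         testFeatures = Thread[featureExtractor[Keys[testSet]] -> Values[testSet]];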
Train on the dataset (use TargetDevice -> "GPU" for training on a GPU):
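For instance (add TargetDevice -> "GPU" to the NetTrain call to train on a GPU):
In[19]:= trained = NetTrain[classifier, trainFeatures]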
Out[19]= |  |
Perfect accuracy is obtained on the test set:
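Presumably measured with NetMeasurements on the precomputed test features:
In[20]:= NetMeasurements[trained, testFeatures, "Accuracy"]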
Out[20]= |  |
Net information
Inspect the number of parameters of all arrays in the net:
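Likely via NetInformation (same resource-name assumption as above):
In[21]:= NetInformation[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "ArraysElementCounts"]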
Out[21]= |  |
Obtain the total number of parameters:
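Likely via the "ArraysTotalElementCount" property:
In[22]:= NetInformation[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "ArraysTotalElementCount"]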
Out[22]= |  |
Obtain the layer type counts:
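Likely via the "LayerTypeCounts" property:
In[23]:= NetInformation[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "LayerTypeCounts"]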
Out[23]= |  |
Display the summary graphic:
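Likely via the "SummaryGraphic" property:
In[24]:= NetInformation[NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "SummaryGraphic"]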
Out[24]= |  |
Export to MXNet
Export the net into a format that can be opened in MXNet:
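A sketch of the export; the destination path under $TemporaryDirectory is a choice, not part of the original:
In[25]:= jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}],
          NetModel["Wolfram AudioIdentify V1 Trained on AudioSet Data"], "MXNet"]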
Out[25]= |  |
Export also creates a net.params file containing parameters:
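The companion file can be located next to the exported JSON (path construction assumed):
In[26]:= paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]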
Out[26]= |  |
Get the size of the parameter file:
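For instance with FileByteCount:
In[27]:= FileByteCount[paramPath]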
Out[27]= |  |