CREPE Pitch Detection Net Trained on Monophonic Signal Data
Released in 2018, CREPE is a state-of-the-art pitch estimation system based on a deep convolutional neural network that operates directly on the time-domain waveform. The architecture consists of a chain of six convolutional blocks followed by a classifier. The net outputs a vector giving the probability of the pitch falling into each of 360 nonlinearly spaced frequency classes (logarithmic in frequency, 20 cents apart).
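The 360-class frequency grid can be sketched numerically. A minimal NumPy illustration follows; the bin-origin constant (cents relative to a 10 Hz reference) is taken from the open-source CREPE implementation and should be treated as an assumption here:

```python
import numpy as np

# 360 pitch classes, 20 cents apart, logarithmically spaced in frequency,
# covering roughly 32 Hz to 2 kHz. The origin constant (cents relative to
# a 10 Hz reference) follows the open-source CREPE code (assumption).
CENTS_PER_BIN = 20
F_REF = 10.0  # Hz
bin_cents = 1997.3794084376191 + CENTS_PER_BIN * np.arange(360)
bin_hz = F_REF * 2.0 ** (bin_cents / 1200)

print(len(bin_hz), bin_hz[0], bin_hz[-1])
```

Because the spacing is constant in cents, consecutive bin centers differ by a fixed frequency ratio of 2^(20/1200).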
Number of layers: 41 | Parameter count: 22,244,328 | Trained size: 89 MB
Examples
Resource retrieval
Get the pre-trained net:
Evaluation function
Define a Hidden Markov process that will be used for decoding the output of the net:
This net takes a monophonic audio signal and outputs an estimate of the pitch of the signal on a logarithmic pitch scale. Write an evaluation function to convert the result to a TimeSeries containing the predicted frequency in Hz and the confidence of the prediction:
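The conversion itself can be sketched in NumPy, following the scheme described in the CREPE paper: the confidence is the maximum activation, and the frequency comes from a local weighted average of cents around the peak. The bin-center constant is taken from the open-source CREPE implementation and is an assumption here:

```python
import numpy as np

# Bin centers in cents relative to 10 Hz (constant from the open-source
# CREPE implementation; treated as an assumption in this sketch).
CENTS_PER_BIN = 20
cents_mapping = 1997.3794084376191 + CENTS_PER_BIN * np.arange(360)

def activation_to_pitch(act):
    """act: (n_frames, 360) activations in [0, 1].
    Returns per-frame frequency in Hz and prediction confidence."""
    conf = act.max(axis=1)                     # confidence = peak activation
    freq = np.empty(len(act))
    for t, frame in enumerate(act):
        i = frame.argmax()
        lo, hi = max(0, i - 4), min(360, i + 5)
        w = frame[lo:hi]                       # local weighted average
        cents = (w * cents_mapping[lo:hi]).sum() / w.sum()
        freq[t] = 10 * 2 ** (cents / 1200)     # cents -> Hz
    return freq, conf
```

For a one-hot activation the weighted average reduces to the corresponding bin center.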
Basic usage
Detect the pitch of a monophonic signal:
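The notebook's Wolfram Language cell is not reproduced here. As a self-contained stand-in (not the CREPE net), this sketch detects the pitch of a synthetic monophonic signal with plain autocorrelation, illustrating the task's input and output:

```python
import numpy as np

# Simple autocorrelation pitch detector on a synthetic monophonic signal.
sr = 16000                                   # CREPE also expects 16 kHz audio
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220.0 * t)       # 220 Hz sine (A3)

frame = signal[:1024]
ac = np.correlate(frame, frame, mode="full")[1023:]  # nonnegative lags
lag_min = sr // 2000                         # ignore pitches above 2 kHz
lag_max = sr // 50                           # ...and below 50 Hz
lag = lag_min + ac[lag_min:lag_max].argmax() # lag of the autocorrelation peak
estimate = sr / lag
print(estimate)
```

The peak lag corresponds to the period of the waveform, so the estimate lands close to the true 220 Hz.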
Plot the predicted frequency with the confidence mapped to the opacity:
Performance evaluation
Generate a signal using a sinusoidal oscillator:
Compare the frequency predicted by the net with the ground truth:
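Such comparisons are typically reported in cents (1/100 of a semitone). A small NumPy sketch of the metric follows; with the 20-cent class spacing, the nearest class center is at most 10 cents from any in-range ground-truth pitch. The bin constant again follows the open-source CREPE implementation (assumption):

```python
import numpy as np

def cents_error(f_pred, f_true):
    """Signed pitch error in cents (1/100 of a semitone)."""
    return 1200 * np.log2(f_pred / f_true)

# Class centers in Hz (origin constant from the open-source CREPE code).
bin_centers = 10 * 2 ** ((1997.3794084376191 + 20 * np.arange(360)) / 1200)

f_true = 440.0                                  # ground-truth sine frequency
errors = cents_error(bin_centers, f_true)
f_pred = bin_centers[np.abs(errors).argmin()]   # best single-class prediction
print(abs(cents_error(f_pred, f_true)))
```

This quantization error (at most half the 20-cent bin width) is why the evaluation function above averages over neighboring bins rather than taking the argmax alone.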
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
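The per-array counts sum to the 22,244,328 parameters quoted above. As a hedged aside on where such numbers come from, a 1-D convolution layer contributes weights plus biases; the layer shape below is illustrative, not read from the net:

```python
# Parameters of a 1-D convolution layer: weights plus biases, i.e.
# out_channels * in_channels * kernel_size + out_channels.
# (The shape below is an illustrative example, not a value from the net.)
def conv1d_params(in_ch, out_ch, kernel):
    return out_ch * in_ch * kernel + out_ch

first = conv1d_params(1, 1024, 512)   # a wide first layer on mono audio
print(first)
```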
Obtain the layer type counts:
Display the summary graphic for the main net:
Export to MXNet
Export the net into a format that can be opened in MXNet:
Export also creates a net.params file containing parameters:
Get the size of the parameter file:
The size is similar to the byte count of the resource object:
Represent the MXNet net as a graph:
Resource History
Reference