Wav2Vec2 Trained on Multiple Datasets
These models are derived from the "Wav2Vec2 Trained on LibriSpeech Data" family. They explore a more general setup in which the domain of the unlabeled data used for pre-training differs from the domain of the labeled data used for fine-tuning, which may in turn differ from the test domain. The results show that pre-training on multiple domains improves generalization to domains not seen during training. A single large Wav2Vec2 model is pre-trained on four domains (Libri-Light, Switchboard, Fisher and Common Voice) and then fine-tuned on the LibriSpeech and Switchboard datasets.
Examples
Resource retrieval
Get the pre-trained net:
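Assuming the model is published under this name in the Wolfram Neural Net Repository, a single NetModel call retrieves it:

    net = NetModel["Wav2Vec2 Trained on Multiple Datasets"]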
NetModel parameters
This model consists of a family of individual nets, each identified by a specific parameter. Inspect the available parameters:
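The standard "ParametersInformation" property of NetModel lists the parameters and their allowed values:

    NetModel["Wav2Vec2 Trained on Multiple Datasets", "ParametersInformation"]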
Pick a non-default net by specifying the parameters:
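For example, a variant fine-tuned on Switchboard might be selected as follows; the parameter name "FineTuningData" and its values are assumptions that should be checked against the output above:

    NetModel[{"Wav2Vec2 Trained on Multiple Datasets", "FineTuningData" -> "Switchboard"}]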
Pick a non-default uninitialized net:
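The standard "UninitializedEvaluationNet" property returns the architecture without the trained weights (the parameter specification is again hypothetical):

    NetModel[{"Wav2Vec2 Trained on Multiple Datasets", "FineTuningData" -> "Switchboard"}, "UninitializedEvaluationNet"]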
Evaluation function
Define an evaluation function that runs the net and produces the final transcribed text:
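A minimal sketch, assuming the net maps an audio clip to one character token per frame, so that greedy CTC-style decoding (collapsing repeated tokens, then dropping the blank token) recovers the transcription; the blank-token name "<blank>" is an assumption:

    netevaluate[audio_Audio] := Module[{tokens},
        tokens = net[audio]; (* one predicted token per audio frame *)
        (* collapse repeated tokens, drop CTC blanks, join into text *)
        StringJoin[DeleteCases[First /@ Split[tokens], "<blank>"]]
    ]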
Basic usage
Record an audio sample and transcribe it:
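For instance, using AudioCapture to record from the default input device:

    audio = AudioCapture[];
    netevaluate[audio]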
Try it on different audio samples. The output can contain spelling mistakes, especially with noisy audio, so a spellchecker is often useful as a post-processing step, as sketched below:
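A rough post-processing sketch built on the built-in SpellingCorrectionList, replacing each word with its top suggestion when one exists; a production pipeline would handle casing and punctuation more carefully:

    fixSpelling[text_String] := StringRiffle[
        Map[First[SpellingCorrectionList[#], #] &, StringSplit[ToLowerCase[text]]]
    ]
    fixSpelling[netevaluate[audio]]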
Feature extraction
Take the feature extractor from the trained net and aggregate the output so that the net produces a vector representation of an audio clip:
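One possible construction, assuming the model is a NetChain whose per-frame features come out of a layer named "embeddings" (a hypothetical name; inspect net for the actual layer names). An AggregationLayer then averages over the time dimension to yield a fixed-size vector:

    extractor = NetAppend[
        NetTake[net, "embeddings"], (* "embeddings" is a placeholder layer name *)
        "pooling" -> AggregationLayer[Mean, 1] (* average over the time dimension *)
    ]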
Get a set of utterances in various languages:
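For illustration, hypothetical local recordings (any short speech clips in different languages will do):

    utterances = Map[Import, {"english.wav", "spanish.wav", "french.wav", "german.wav"}];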
Visualize the features of a set of audio clips:
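FeatureSpacePlot can use the extractor directly through its FeatureExtractor option:

    FeatureSpacePlot[utterances, FeatureExtractor -> extractor]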
Net information
Inspect the sizes of all arrays in the net:
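Using the standard NetInformation property:

    NetInformation[net, "ArraysElementCounts"]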
Obtain the total number of parameters:
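    NetInformation[net, "ArraysTotalElementCount"]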
Obtain the layer type counts:
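    NetInformation[net, "LayerTypeCounts"]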
Display the summary graphic:
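    NetInformation[net, "SummaryGraphic"]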
Requirements
Wolfram Language 13.2 (December 2022) or above
Resource History
Reference
W.-N. Hsu, A. Sriram, A. Baevski, T. Likhomanenko, Q. Xu, V. Pratap, J. Kahn, A. Lee, R. Collobert, G. Synnaeve, M. Auli, "Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-training," arXiv:2104.01027 (2021)
Available from: https://github.com/facebookresearch/fairseq
Rights: MIT License