DistilBERT Trained on BookCorpus and English Wikipedia Data

Represent text as a sequence of vectors

Released in 2019, this model uses the technique of knowledge distillation during pre-training to reduce the size of a BERT model by 40% and make it 60% faster, while retaining 97% of its language understanding capabilities.
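
As a rough illustration of the distillation idea (not code from this resource), the student network is trained to match the teacher's temperature-softened output distribution; a minimal sketch of that soft-target loss, with hypothetical logits and temperature:

(* Minimal sketch of a soft-target distillation loss: the student is trained to
   match the teacher's temperature-softened probabilities. The logits and the
   temperature below are hypothetical, for illustration only. *)
softmax[v_] := Exp[v - Max[v]]/Total[Exp[v - Max[v]]];
distillLoss[studentLogits_, teacherLogits_, temperature_: 2.] :=
  Module[{p = softmax[teacherLogits/temperature], q = softmax[studentLogits/temperature]},
   -Total[p Log[q]] temperature^2];
distillLoss[{1.2, 0.3, -0.5}, {2.0, 0.1, -1.0}]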

Number of models: 3

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["DistilBERT Trained on BookCorpus and English Wikipedia \
Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:=
NetModel["DistilBERT Trained on BookCorpus and English Wikipedia \
Data", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"DistilBERT Trained on BookCorpus and English Wikipedia \
Data", "Type" -> "BaseMultilingualCased", "InputType" -> "ListOfStrings"}]
Out[3]=

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"DistilBERT Trained on BookCorpus and English Wikipedia \
Data", "Type" -> "BaseUncased", "InputType" -> "ListOfStrings"}, "UninitializedEvaluationNet"]
Out[4]=
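
An uninitialized net has the full architecture but no trained weights; to train it from scratch, random weights can be assigned with NetInitialize (a sketch reusing the parameter choice above):

(* Sketch: assign random initial weights to the uninitialized architecture. *)
uninitialized = NetModel[{"DistilBERT Trained on BookCorpus and English Wikipedia Data",
   "Type" -> "BaseUncased", "InputType" -> "ListOfStrings"}, "UninitializedEvaluationNet"];
NetInitialize[uninitialized]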

Basic usage

Given a piece of text, the DistilBERT net produces a sequence of feature vectors of size 768, which correspond to the sequence of input words or subwords:

In[5]:=
input = "Hello world! I am here";
embeddings = NetModel["DistilBERT Trained on BookCorpus and English Wikipedia \
Data"][input];

Obtain dimensions of the embeddings:

In[6]:=
Dimensions@embeddings
Out[6]=
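
Check how many subword tokens the net's input encoder produces for the same text; this count should track the length of the embedding sequence (a sketch, assuming the tokenizer is the NetEncoder attached to the "Input" port; whether special start/end tokens are counted depends on the encoder):

(* Sketch: count the subword tokens the input encoder produces for the same text. *)
net = NetModel["DistilBERT Trained on BookCorpus and English Wikipedia Data"];
encoder = NetExtract[net, "Input"];       (* the subword tokenizer attached to the input *)
Length[encoder["Hello world! I am here"]] (* compare with First@Dimensions@embeddings *)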

Visualize the embeddings:

In[7]:=
MatrixPlot@embeddings
Out[7]=
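
Pool the token-level vectors into a single fixed-size vector per text, for example by averaging, and compare two texts (a common usage sketch, not part of this resource):

(* Sketch: mean-pool the token vectors into one 768-dimensional vector per text
   and compare two texts with a cosine distance. *)
net = NetModel["DistilBERT Trained on BookCorpus and English Wikipedia Data"];
textVector[text_String] := Mean[net[text]];
CosineDistance[textVector["I enjoy reading books."], textVector["Reading novels is fun."]]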

Requirements

Wolfram Language 12.1 (March 2020) or above

Resource History

Reference

V. Sanh, L. Debut, J. Chaumond, T. Wolf, "DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter," arXiv:1910.01108 (2019)