BERT Trained on BookCorpus and Wikipedia Data

Represent text as a sequence of vectors

This model is also available through the built-in function FindTextualAnswer
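For illustration (a hedged sketch, not part of the original page; the passage and question are made up), a call to that built-in might look like:

FindTextualAnswer["BERT was released by Google in 2018.", "When was BERT released?"]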

Released in 2018, Bidirectional Encoder Representations from Transformers (BERT) is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right contexts in all layers. The pre-trained model can be fine-tuned with an additional output layer to create state-of-the-art models for a wide range of tasks. It uses bidirectional self-attention, often referred to as a "transformer encoder".

Trained size: 436 MB | Number of models: 7

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["BERT Trained on BookCorpus and Wikipedia Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

In[2]:=
NetModel["BERT Trained on BookCorpus and Wikipedia Data", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"BERT Trained on BookCorpus and Wikipedia Data", "Type" -> "LargeUncased", "InputType" -> "ListOfStrings"}]
Out[3]=
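As a hedged sketch (not one of the original examples), the "ListOfStrings" variant is assumed to accept a list of sentences, for instance for sentence-pair tasks, and again returns one feature vector per subword token:

(* assumption: the "ListOfStrings" input type takes a list of sentences *)
bertPair = NetModel[{"BERT Trained on BookCorpus and Wikipedia Data", "Type" -> "LargeUncased", "InputType" -> "ListOfStrings"}];
pairEmbeddings = bertPair[{"What is the capital of France?", "Paris is the capital of France."}];
Dimensions[pairEmbeddings]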

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"BERT Trained on BookCorpus and Wikipedia Data", "Type" -> "BaseCased", "InputType" -> "ListOfStrings"}, "UninitializedEvaluationNet"]
Out[4]=

Basic usage

Given a piece of text, the BERT net produces a sequence of feature vectors of size 768, one for each input word or subword token:

In[5]:=
input = "Hello world! I am here";
embeddings = NetModel["BERT Trained on BookCorpus and Wikipedia Data"][input];

Obtain the dimensions of the embeddings:

In[6]:=
Dimensions@embeddings
Out[6]=
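The number of feature vectors matches the number of subword tokens produced by the net's tokenizer. A hedged way to inspect the tokenization, assuming the tokenizing NetEncoder is attached to the "Input" port:

(* assumption: the subword tokenizer can be extracted as the input NetEncoder *)
netEncoder = NetExtract[NetModel["BERT Trained on BookCorpus and Wikipedia Data"], "Input"];
netEncoder[input] (* token codes; their count matches First@Dimensions@embeddings *)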

Visualize the embeddings:

In[7]:=
MatrixPlot@embeddings
Out[7]=
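As mentioned in the description, the net can be fine-tuned for a downstream task by attaching an additional output layer. A minimal sketch, assuming a two-class text classifier with hypothetical labels and simple last-token pooling via SequenceLastLayer (not any particular recommended pooling strategy):

(* sketch only: pooling choice, class labels and layer sizes are illustrative *)
classifier = NetChain[{
   NetModel["BERT Trained on BookCorpus and Wikipedia Data"], (* token vectors of size 768 *)
   SequenceLastLayer[], (* keep the last token's vector as a pooled representation *)
   LinearLayer[2],      (* hypothetical two-class output head *)
   SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", {"negative", "positive"}}]]
(* such a chain could then be trained end to end with NetTrain on labeled text *)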

Requirements

Wolfram Language 12.1 (March 2020) or above

Resource History

Reference