Wolfram Research

Wolfram English Character-Level Language Model V1

Generate text in English

This language model is based on a simple stack of gated recurrent layers. It was trained by Wolfram Research in 2018 using teacher forcing on sequences of length 80. English-language models are often used to improve the performance of other systems, such as speech-to-text applications.

Number of layers: 8 | Parameter count: 4,144,930 | Trained size: 17 MB
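Teacher forcing means the network is trained to predict character t+1 from the ground-truth characters up to position t, rather than from its own earlier predictions. A minimal Python sketch of how fixed-length training pairs could be assembled (illustrative only; `make_pairs` is a hypothetical helper, not Wolfram Research's training code):

```python
# Build (input, target) pairs for teacher forcing:
# the target sequence is the input shifted forward by one character.
def make_pairs(text, seq_len=80):
    pairs = []
    for i in range(0, len(text) - seq_len, seq_len):
        chunk = text[i : i + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))  # input, next-char target
    return pairs

pairs = make_pairs("the quick brown fox jumps over the lazy dog " * 4, seq_len=10)
x, y = pairs[0]
# Each target character is the ground-truth character that follows its input position
assert all(b == a for a, b in zip(x[1:], y))
```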

Training Set Information

Examples

Resource retrieval

Get the pre-trained network:

In[1]:=
NetModel["Wolfram English Character-Level Language Model V1"]
Out[1]=

Basic usage

Predict the next character of a given sequence:

In[2]:=
NetModel["Wolfram English Character-Level Language Model V1"]["hello worl"]
Out[2]=

Get the top 15 probabilities:

In[3]:=
topProbs = 
 NetModel["Wolfram English Character-Level Language Model V1"][
  "hello worl", {"TopProbabilities", 15}]
Out[3]=

Plot the top 15 probabilities:

In[4]:=
BarChart[Thread@
  Labeled[Values@topProbs, 
   Keys[topProbs] /. {"\n" -> "\\n", "\t" -> "\\t"}], 
 ScalingFunctions -> "Log"]
Out[4]=

Generation

Generate text efficiently with NetStateObject. Starting in Wolfram Language 12.0, temperature sampling is available as a built-in option; in earlier versions it must be implemented explicitly.

In[5]:=
generateSample[start_, len_, temp_: 1] := 
 Block[{net, score, sampler, obj},
  net = NetModel[
    "Wolfram English Character-Level Language Model V1"];
  If[$VersionNumber < 12.0,
   (* pre-12.0: split off the final sampling layer and scale the logits by 1/temp explicitly *)
   score = NetTake[net, 7];
   sampler = NetTake[net, -1];
   obj = NetStateObject[score];
   StringJoin@
    NestList[sampler[obj[#]/temp, "RandomSample"] &, start, len],
   (* 12.0 and later: use the built-in "Temperature" option *)
   obj = NetStateObject[net];
   StringJoin@
    NestList[obj[#, {"RandomSample", "Temperature" -> temp}] &, start,
      len]
   ]
  ]

Generate for 100 steps using “hello” as an initial string:

In[6]:=
generateSample["hello", 100]
Out[6]=

The third optional argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of sampling less likely characters:

In[7]:=
generateSample["hello", 100, 1.1]
Out[7]=

Decreasing the temperature sharpens the peaks of the sampling distribution, further decreasing the probability of sampling less likely characters:

In[8]:=
generateSample["hello", 100, 0.4]
Out[8]=

Very high temperature settings are equivalent to sampling characters uniformly at random:

In[9]:=
generateSample["hello", 100, 10]
Out[9]=

Very low temperature settings are equivalent to always picking the character with maximum probability, so sampling typically “gets stuck in a loop”:

In[10]:=
generateSample["hello", 100, 0.01]
Out[10]=
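The limiting behaviors above follow from dividing the logits by the temperature before applying the softmax. A minimal numerical sketch in Python (illustrative only; `softmax_with_temperature` is a hypothetical helper, not part of the Wolfram workflow):

```python
import math

def softmax_with_temperature(logits, temp):
    """Scale logits by 1/temp, then apply a numerically stable softmax."""
    scaled = [x / temp for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# Low temperature concentrates nearly all mass on the argmax (greedy decoding);
# high temperature approaches the uniform distribution (random sampling).
```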

Inspection of predictions

Define a function that takes a string and guesses the next character as it reads, showing the predictions in a grid. The input string is shown on top, and the top 5 predictions are aligned below each character, ordered from most to least likely. For each prediction, the color intensity is proportional to the probability:

In[11]:=
inspectPredictions[string_] := Block[
  {obj, chars, pred, predItems, charItems},
  obj = NetStateObject[
    NetModel["Wolfram English Character-Level Language Model V1"]];
  chars = Characters[string];
  pred = Map[obj[#, {"TopProbabilities", 5}] &, chars];
  predItems = 
   Map[Item[First[#], 
      Background -> Opacity[Last[#], Darker[Green]]] &, pred, {2}];
  predItems = 
   Prepend[Most[predItems], Table[Item["", Background -> Gray], 5]];
  charItems = Item[#, Background -> LightBlue] & /@ chars;
  Grid[
   Prepend[Transpose[predItems], charItems],
   Spacings -> {0.6, 0.2}, Dividers -> All, FrameStyle -> Gray
   ]
  ]
In[12]:=
inspectPredictions["It was the best of times, it was the worst of times"]
Out[12]=
In[13]:=
inspectPredictions@
 StringTake[ResourceData["The Jungle Book"], {840, 905}]
Out[13]=
In[14]:=
inspectPredictions@
 StringTake[ResourceData["Oliver Twist"], {6652, 6714}]
Out[14]=

Word completion

Define a function to complete a partial word by sampling with the model, generating characters until a non-letter character is produced:

In[15]:=
autocomplete[word_] := Block[
  {obj = NetStateObject[
     NetModel["Wolfram English Character-Level Language Model V1"]]},
  StringJoin@
   Most@NestWhileList[obj, word, 
     StringMatchQ[#, LetterCharacter | word] &]
  ]

Autocomplete a list of words:

In[16]:=
autocomplete /@ {"absolu", "incred", "attach", "perspicuo", 
  "arbitrari", "durabi", "magne", "acclima"}
Out[16]=

Create “fantasy” words:

In[17]:=
autocomplete /@ {"foob", "cromu", "embig", "mestiph", "bazi", 
  "promip", "qwe", "tato", "salh"}
Out[17]=

Net information

Inspect the sizes of all arrays in the net:

In[18]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "ArraysElementCounts"]
Out[18]=

Obtain the total number of parameters:

In[19]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "ArraysTotalElementCount"]
Out[19]=

Obtain the layer type counts:

In[20]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "LayerTypeCounts"]
Out[20]=

Display the summary graphic:

In[21]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "FullSummaryGraphic"]
Out[21]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[22]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["Wolfram English Character-Level Language Model V1"], 
  "MXNet"]
Out[22]=

Export also creates a net.params file containing parameters:

In[23]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[23]=

Get the size of the parameter file:

In[24]:=
FileByteCount[paramPath]
Out[24]=

The size is similar to the byte count of the resource object:

In[25]:=
ResourceObject[
  "Wolfram English Character-Level Language Model V1"]["ByteCount"]
Out[25]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference

  • Wolfram Research (2018)