Wolfram Research

Wolfram English Character-Level Language Model V1

Generate text in English

This language model is based on a simple stack of gated recurrent layers. It was trained by Wolfram Research in 2018 using teacher forcing on sequences of length 80. English-language models are often used to improve the performance of other systems, such as speech-to-text applications.

Number of layers: 8 | Parameter count: 4,144,930 | Trained size: 17 MB
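Teacher forcing means the network is trained to predict character t+1 from the ground-truth characters up to position t, rather than from its own earlier predictions. A minimal Python sketch of how fixed-length training pairs could be assembled (illustrative only; `make_pairs` is a hypothetical helper, not Wolfram Research's training code):

```python
# Build (input, target) pairs for teacher forcing:
# the target sequence is the input shifted forward by one character.
def make_pairs(text, seq_len=80):
    pairs = []
    for i in range(0, len(text) - seq_len, seq_len):
        chunk = text[i : i + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))  # input, next-char target
    return pairs

pairs = make_pairs("the quick brown fox jumps over the lazy dog " * 4, seq_len=10)
x, y = pairs[0]
# Each target character is the ground-truth character that follows its input position
assert all(b == a for a, b in zip(x[1:], y))
```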

Training Set Information

Examples

Resource retrieval

Get the pre-trained network:

In[1]:=
NetModel["Wolfram English Character-Level Language Model V1"]
Out[1]=

Basic usage

Predict the next character of a given sequence:

In[2]:=
NetModel["Wolfram English Character-Level Language Model V1"]["hello worl"]
Out[2]=

Get the top 15 probabilities:

In[3]:=
topProbs = 
 NetModel["Wolfram English Character-Level Language Model V1"][
  "hello worl", {"TopProbabilities", 15}]
Out[3]=

Plot the top 15 probabilities:

In[4]:=
BarChart[Thread@
  Labeled[Values@topProbs, 
   Keys[topProbs] /. {"\n" -> "\\n", "\t" -> "\\t"}], 
 ScalingFunctions -> "Log"]
Out[4]=

Generation

Generate text efficiently with NetStateObject. Starting in Wolfram Language 12.0, temperature sampling is available as a built-in option; in earlier versions it must be implemented explicitly.

In[5]:=
generateSample[start_, len_, temp_: 1] := 
 Block[{net, score, sampler, obj},
  net = NetModel[
    "Wolfram English Character-Level Language Model V1"];
  If[$VersionNumber < 12.0,
   (* pre-12.0: split off the final sampling layer and scale the logits by 1/temp explicitly *)
   score = NetTake[net, 7];
   sampler = NetTake[net, -1];
   obj = NetStateObject[score];
   StringJoin@
    NestList[sampler[obj[#]/temp, "RandomSample"] &, start, len],
   (* 12.0 and later: use the built-in "Temperature" option *)
   obj = NetStateObject[net];
   StringJoin@
    NestList[obj[#, {"RandomSample", "Temperature" -> temp}] &, start,
      len]
   ]
  ]

Generate for 100 steps using “hello” as an initial string:

In[6]:=
generateSample["hello", 100]
Out[6]=

The third optional argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of sampling less likely characters:

In[7]:=
generateSample["hello", 100, 1.1]
Out[7]=

Decreasing the temperature sharpens the peaks of the sampling distribution, further decreasing the probability of sampling less likely characters:

In[8]:=
generateSample["hello", 100, 0.4]
Out[8]=

Very high temperature settings are equivalent to sampling characters uniformly at random:

In[9]:=
generateSample["hello", 100, 10]
Out[9]=

Very low temperature settings are equivalent to always picking the character with maximum probability, so sampling typically “gets stuck in a loop”:

In[10]:=
generateSample["hello", 100, 0.01]
Out[10]=
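The limiting behaviors above follow from dividing the logits by the temperature before applying the softmax. A minimal numerical sketch in Python (illustrative only; `softmax_with_temperature` is a hypothetical helper, not part of the Wolfram workflow):

```python
import math

def softmax_with_temperature(logits, temp):
    """Scale logits by 1/temp, then apply a numerically stable softmax."""
    scaled = [x / temp for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# Low temperature concentrates nearly all mass on the argmax (greedy decoding);
# high temperature approaches the uniform distribution (random sampling).
```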

Inspection of predictions

Define a function that takes a string and guesses the next character as it reads, showing the predictions in a grid. The input string is shown on top, and the top 5 predictions are aligned below each character, ordered from most to least likely. For each prediction, the color intensity is proportional to the probability:

In[11]:=
inspectPredictions[string_] := Block[
  {obj, chars, pred, predItems, charItems},
  obj = NetStateObject[
    NetModel["Wolfram English Character-Level Language Model V1"]];
  chars = Characters[string];
  pred = Map[obj[#, {"TopProbabilities", 5}] &, chars];
  predItems = 
   Map[Item[First[#], 
      Background -> Opacity[Last[#], Darker[Green]]] &, pred, {2}];
  predItems = 
   Prepend[Most[predItems], Table[Item["", Background -> Gray], 5]];
  charItems = Item[#, Background -> LightBlue] & /@ chars;
  Grid[
   Prepend[Transpose[predItems], charItems],
   Spacings -> {0.6, 0.2}, Dividers -> All, FrameStyle -> Gray
   ]
  ]
In[12]:=
inspectPredictions["It was the best of times, it was the worst of times"]
Out[12]=
In[13]:=
inspectPredictions@
 StringTake[ResourceData["The Jungle Book"], {840, 905}]
Out[13]=
In[14]:=
inspectPredictions@
 StringTake[ResourceData["Oliver Twist"], {6652, 6714}]
Out[14]=

Word completion

Define a function to complete a partial word by sampling with the model, generating characters until a non-letter character is produced:

In[15]:=
autocomplete[word_] := Block[
  {obj = NetStateObject[
     NetModel["Wolfram English Character-Level Language Model V1"]]},
  StringJoin@
   Most@NestWhileList[obj, word, 
     StringMatchQ[#, LetterCharacter | word] &]
  ]

Autocomplete a list of words:

In[16]:=
autocomplete /@ {"absolu", "incred", "attach", "perspicuo", 
  "arbitrari", "durabi", "magne", "acclima"}
Out[16]=

Create “fantasy” words:

In[17]:=
autocomplete /@ {"foob", "cromu", "embig", "mestiph", "bazi", 
  "promip", "qwe", "tato", "salh"}
Out[17]=

Net information

Inspect the sizes of all arrays in the net:

In[18]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "ArraysElementCounts"]
Out[18]=

Obtain the total number of parameters:

In[19]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "ArraysTotalElementCount"]
Out[19]=

Obtain the layer type counts:

In[20]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "LayerTypeCounts"]
Out[20]=

Display the summary graphic:

In[21]:=
NetInformation[
 NetModel["Wolfram English Character-Level Language Model V1"],
 "FullSummaryGraphic"]
Out[21]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[22]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["Wolfram English Character-Level Language Model V1"], 
  "MXNet"]
Out[22]=

Export also creates a net.params file containing parameters:

In[23]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[23]=

Get the size of the parameter file:

In[24]:=
FileByteCount[paramPath]
Out[24]=

The size is similar to the byte count of the resource object:

In[25]:=
ResourceObject[
  "Wolfram English Character-Level Language Model V1"]["ByteCount"]
Out[25]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference

  • Wolfram Research (2018)