Wolfram LaTeX Character-Level Language Model V1

Generate LaTeX code

This language model is based on a simple stack of gated recurrent layers. It was trained by Wolfram Research in 2018 using teacher forcing on sequences of length 100.

Number of layers: 7 | Parameter count: 7,896,330 | Trained size: 32 MB

Training Set Information

• Internal Wolfram training set, consisting of over 5 GB of LaTeX code scraped from 150,000 arXiv articles.

Examples

Resource retrieval

Get the pre-trained network:

 In[1]:=
 Out[1]=
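The input cell is not preserved above. A minimal sketch of the retrieval, assuming the model is registered in the Wolfram Neural Net Repository under the title of this page (the variable name net is only illustrative):

```wolfram
(* Downloads the model on first use, then loads it from the local cache *)
net = NetModel["Wolfram LaTeX Character-Level Language Model V1"]
```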

Basic usage

Predict the next character of a given sequence:

 In[2]:=
 Out[2]=
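As a sketch of this step, the net can be applied directly to a string. The input string below is an arbitrary example, not necessarily the one used originally:

```wolfram
net = NetModel["Wolfram LaTeX Character-Level Language Model V1"];

(* A literal backslash must be escaped inside a Wolfram Language string *)
net["\\begin{equa"]
```

The result is the single character the net considers most likely to follow the input.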

Get the top 15 probabilities:

 In[3]:=
 Out[3]=

Plot the top 15 probabilities:

 In[4]:=
 Out[4]=
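The two steps above could look roughly like the following. The input string and the name topProbs are assumptions made for illustration; the "TopProbabilities" property returns character -> probability rules:

```wolfram
net = NetModel["Wolfram LaTeX Character-Level Language Model V1"];

(* The 15 most probable next characters, as a list of rules *)
topProbs = net["\\begin{equa", {"TopProbabilities", 15}]

(* One bar per candidate character, labeled with the character itself *)
BarChart[Values[topProbs], ChartLabels -> Keys[topProbs]]
```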

Generation

Generate text efficiently with NetStateObject. Wolfram Language 12.0 and later provide a built-in option for temperature sampling; in earlier versions, temperature sampling has to be implemented explicitly.

 In[5]:=
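The definition itself is not preserved above. A minimal sketch of such a generator, assuming Wolfram Language 12.0 or later (for the built-in "Temperature" option) and using generateSample as a hypothetical function name whose optional third argument is the temperature:

```wolfram
net = NetModel["Wolfram LaTeX Character-Level Language Model V1"];

(* generateSample is an illustrative name; temp defaults to 1 (unscaled softmax) *)
generateSample[start_String, len_Integer, temp_ : 1] := Module[{obj},
  (* NetStateObject keeps the recurrent state between calls, so after the
     initial string each step only processes the newly sampled character *)
  obj = NetStateObject[net];
  StringJoin @ NestList[
    obj[#, {"RandomSample", "Temperature" -> temp}] &,
    start, len]
]

(* Example call: 100 sampled characters appended to the initial string *)
generateSample["\\begin", 100]
```

Because sampling is random, repeated calls with the same arguments generally produce different text.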

Generate for 100 steps using “\begin” as an initial string:

 In[6]:=
 Out[6]=

The third optional argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of extracting less likely characters:

 In[7]:=
 Out[7]=

Decreasing the temperature sharpens the peaks of the sampling distribution, making less likely characters even less probable:

 In[8]:=
 Out[8]=

Very high temperature settings approach uniform random sampling over all characters:

 In[9]:=
 Out[9]=

Very low temperature settings are equivalent to always picking the character with maximum probability. In this regime, generation typically gets stuck in a loop:

 In[10]:=
 Out[10]=
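The effect of temperature can be checked without the network. Dividing the softmax input by a temperature t is equivalent to raising each probability to the power 1/t and renormalizing. The helper name applyTemperature and the three-element distribution below are made up for illustration:

```wolfram
(* softmax(Log[p]/t) is the same as p^(1/t) renormalized to sum to 1 *)
applyTemperature[p_List, t_?Positive] := Module[{w = p^(1/t)}, w/Total[w]]

p = {0.7, 0.2, 0.1};        (* made-up next-character distribution *)

applyTemperature[p, 1]      (* t = 1 leaves the distribution unchanged *)
applyTemperature[p, 5]      (* high t: flatter, closer to uniform *)
applyTemperature[p, 0.2]    (* low t: mass concentrates on the most likely entry *)
```

As t grows, every entry tends to 1/3; as t approaches 0, the first entry tends to 1, which corresponds to always picking the most probable character.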

Inspection of predictions

Define a function that takes a string and guesses the next character as it reads, showing the predictions in a grid. The input string is shown on top, while the top 5 predictions are aligned below each character, ordered from most to least likely. For each prediction, the intensity of the color is proportional to the probability:

 In[11]:=
 In[12]:=
 Out[12]=
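The original definition is not preserved above. One way such a function might be written is sketched below; the name inspectPredictions, the colors and the grid styling are all assumptions:

```wolfram
inspectPredictions[string_String] := Module[
  {obj, chars, pred, predItems, charItems},
  obj = NetStateObject[
    NetModel["Wolfram LaTeX Character-Level Language Model V1"]];
  chars = Characters[string];
  (* After reading each character, ask for the 5 most likely next
     characters and order them from most to least probable *)
  pred = Map[SortBy[obj[#, {"TopProbabilities", 5}], -Last[#] &] &, chars];
  (* Color each guess with an opacity equal to its probability *)
  predItems = Map[
    Item[First[#], Background -> Opacity[Last[#], Darker[Green]]] &,
    pred, {2}];
  (* Shift by one column so that each guess sits under the character
     it is trying to predict; the first character has no prediction *)
  predItems = Prepend[Most[predItems], Table[Item["", Background -> Gray], 5]];
  charItems = Map[Item[#, Background -> Gray] &, chars];
  Grid[Prepend[Transpose[predItems], charItems], Dividers -> All]
]

inspectPredictions["\\begin{equation}"]
```

The state object is fed one character at a time, so each column reflects what the net expected given everything read so far.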

Net information

Inspect the sizes of all arrays in the net:

 In[13]:=
 Out[13]=

Obtain the total number of parameters:

 In[14]:=
 Out[14]=

Obtain the layer type counts:

 In[15]:=
 Out[15]=

Display the summary graphic:

 In[16]:=
 Out[16]=
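The four queries in this section can be sketched with NetInformation, assuming the property names available since Wolfram Language 11.3:

```wolfram
net = NetModel["Wolfram LaTeX Character-Level Language Model V1"];

NetInformation[net, "ArraysElementCounts"]      (* size of every array *)
NetInformation[net, "ArraysTotalElementCount"]  (* total number of parameters *)
NetInformation[net, "LayerTypeCounts"]          (* how many layers of each type *)
NetInformation[net, "SummaryGraphic"]           (* graphical overview of the net *)
```

The total element count should agree with the parameter count quoted at the top of this page.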

Export to MXNet

Export the net into a format that can be opened in MXNet:

 In[17]:=
 Out[17]=

Export also creates a net.params file containing parameters:

 In[18]:=
 Out[18]=

Get the size of the parameter file:

 In[19]:=
 Out[19]=

The size is similar to the byte count of the resource object:

 In[20]:=
 Out[20]=
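A sketch of the export steps in this section, assuming the files are written to the temporary directory (the variable names jsonPath and paramPath are illustrative):

```wolfram
net = NetModel["Wolfram LaTeX Character-Level Language Model V1"];

(* Writes net.json (the architecture) and, next to it, net.params *)
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], net, "MXNet"]

(* The parameter file is created in the same directory as the JSON file *)
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]

(* Size of the parameter file in bytes *)
FileByteCount[paramPath]

(* Byte count reported by the repository resource, for comparison *)
ResourceObject["Wolfram LaTeX Character-Level Language Model V1"]["ByteCount"]
```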

Requirements

Wolfram Language 11.3 (March 2018) or above

Reference

• Wolfram Research (2018)