# Wolfram Neural Net Repository

Immediate Computable Access to Neural Net Models

Generate text in English and analyze sentiment

Released in 2017, this language model uses a single multiplicative LSTM (mLSTM) and byte-level UTF-8 encoding. After training, its authors found a "sentiment unit" in the mLSTM hidden state whose value directly corresponds to the sentiment of the text.

Number of layers: 27 | Parameter count: 86,245,112 | Trained size: 345 MB

- Training data: Amazon Product Review dataset, consisting of 82.83 million unique reviews from around 20 million users, dating from May 1996 to July 2014.

When used for sentiment analysis, fitting a threshold on the sentiment unit achieves 92.3% accuracy on the Large Movie Review Dataset. Using the full 4096-dimensional hidden state instead increases accuracy to 92.88%.

Get the pre-trained net:

In[1]:= |

Out[1]= |

Predict the next character in a piece of text:

In[2]:= |

Out[2]= |

The output values correspond to byte values in the UTF-8 encoding after subtracting 1. Decode the prediction:

In[3]:= |

Out[3]= |

Note that since UTF-8 is a variable-length encoding, a single byte value does not always decode to a valid character on its own:

In[4]:= |

Out[4]= |
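The decoding step above can be illustrated outside the Wolfram Language with a minimal Python sketch. It assumes that the net's output is a 1-based class index and that the byte value is that index minus 1; the helper name `classes_to_bytes` is hypothetical and not part of the repository.

```python
def classes_to_bytes(class_indices):
    """Map 1-based class indices to raw byte values (assumed: byte = index - 1)."""
    return bytes(i - 1 for i in class_indices)

# "é" is two bytes in UTF-8 (0xC3 0xA9), i.e. class indices 196 and 170.
indices = [196, 170]
print(classes_to_bytes(indices).decode("utf-8"))

# Decoding only the first byte fails: a lone lead byte is an incomplete sequence.
try:
    classes_to_bytes(indices[:1]).decode("utf-8")
except UnicodeDecodeError as err:
    print("cannot decode a single byte of a multi-byte character:", err.reason)
```

A multi-byte character therefore takes several prediction steps before it can be decoded.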

This model features a non-standard multiplicative LSTM (mLSTM), which can be implemented using NetFoldOperator:

In[5]:= |

Out[5]= |

Inspect the inner structure of the multiplicative LSTM:

In[6]:= |

Out[6]= |
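As a rough illustration of what a multiplicative LSTM computes, the following Python sketch implements one mLSTM step and folds it over a sequence, which is the role NetFoldOperator plays here. It follows the commonly published mLSTM formulation; the weight names, the omission of biases, and the toy dimensions are assumptions, and the exact structure of the repository net may differ.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlstm_step(params, x, h, c):
    """One mLSTM step on input x with previous hidden state h and cell state c."""
    # Multiplicative intermediate state: elementwise product of two projections.
    m = [a * b for a, b in zip(matvec(params["Wmx"], x), matvec(params["Wmh"], h))]

    def pre_activation(name):
        # Gates see the input x and the intermediate state m (instead of h).
        wx = matvec(params["W" + name + "x"], x)
        wm = matvec(params["W" + name + "m"], m)
        return [a + b for a, b in zip(wx, wm)]

    i = [sigmoid(z) for z in pre_activation("i")]    # input gate
    f = [sigmoid(z) for z in pre_activation("f")]    # forget gate
    o = [sigmoid(z) for z in pre_activation("o")]    # output gate
    u = [math.tanh(z) for z in pre_activation("u")]  # candidate update
    c_new = [fj * cj + ij * uj for fj, cj, ij, uj in zip(f, c, i, u)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def fold(params, inputs, hidden_size):
    """Apply mlstm_step over a sequence, collecting the hidden state at each step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    states = []
    for x in inputs:
        h, c = mlstm_step(params, x, h, c)
        states.append(h)
    return states

def random_params(input_size, hidden_size, seed=0):
    """Hypothetical random weights, for demonstration only."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params = {"Wmx": mat(hidden_size, input_size), "Wmh": mat(hidden_size, hidden_size)}
    for g in "ifou":
        params["W" + g + "x"] = mat(hidden_size, input_size)
        params["W" + g + "m"] = mat(hidden_size, hidden_size)
    return params

params = random_params(input_size=3, hidden_size=4)
inputs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
states = fold(params, inputs, hidden_size=4)
print(len(states), len(states[0]))
```

The only difference from a standard LSTM in this sketch is the intermediate state `m`, which lets the recurrent transition depend multiplicatively on the current input.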

This net encodes its input string into a sequence of byte values corresponding to its UTF-8 encoding. Inspect the input encoder:

In[7]:= |

Out[7]= |
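For readers less familiar with byte-level encoding, this short Python sketch shows how a string becomes a sequence of UTF-8 byte values. It is only an illustration of the encoding itself; whether the net's encoder applies any further index offset is not modeled here, and the function name is hypothetical.

```python
def encode_utf8_bytes(text):
    """Return the UTF-8 byte values of a string as a list of integers."""
    return list(text.encode("utf-8"))

# Five characters, six bytes: "ï" occupies two bytes (195, 175).
print(encode_utf8_bytes("naïve"))
```

Non-ASCII characters produce more than one byte, so the sequence the net processes can be longer than the string.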

The net predicts the next byte of the sequence. UTF-8 allows single byte values in the range 0-247; hence, there are 248 possible outputs. Inspect the output decoder:

In[8]:= |

Out[8]= |

Write a function to generate text efficiently using NetStateObject. Note that FromCharacterCode[…,"UTF-8"] is used at the end, but the output byte values are not guaranteed to form a valid UTF-8 sequence. When they do not, FromCharacterCode will issue messages:

In[9]:= |
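The general shape of such a generation loop can be sketched in Python. This is not the Wolfram Language function from this page: `ToyByteModel` is a hypothetical stand-in (a bigram counter) for a stateful next-byte predictor, and the point is only the pattern of feeding the prompt once, then feeding each sampled byte back while the model keeps its own state.

```python
import math
import random

class ToyByteModel:
    """Hypothetical stateful next-byte predictor built from bigram counts."""

    def __init__(self, corpus):
        data = corpus.encode("utf-8")
        self.counts = {}
        for prev, nxt in zip(data, data[1:]):
            following = self.counts.setdefault(prev, {})
            following[nxt] = following.get(nxt, 0) + 1
        self.state = None  # in this toy model the state is just the last byte seen

    def step(self, byte):
        """Consume one byte, update the state, return {next_byte: log-score}."""
        self.state = byte
        following = self.counts.get(byte, {32: 1})  # fall back to a space
        return {b: math.log(n) for b, n in following.items()}

def generate(model, prompt, n_steps, temperature=1.0, seed=0):
    prompt_bytes = prompt.encode("utf-8")
    if not prompt_bytes:
        raise ValueError("prompt must be non-empty")
    rng = random.Random(seed)
    # Feed the prompt once; afterwards only the newly sampled byte is fed.
    for b in prompt_bytes:
        scores = model.step(b)
    out = bytearray(prompt_bytes)
    for _ in range(n_steps):
        candidates = list(scores)
        weights = [math.exp(scores[b] / temperature) for b in candidates]
        nxt = rng.choices(candidates, weights=weights)[0]
        out.append(nxt)
        scores = model.step(nxt)
    # errors="replace" substitutes U+FFFD instead of raising on invalid bytes.
    return out.decode("utf-8", errors="replace")

model = ToyByteModel("This product is great. This product works.")
print(generate(model, "This produc", 20))
```

Keeping the state inside the model means each step costs one evaluation on a single byte, rather than re-reading the whole text generated so far.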

Generate for 100 steps using “This produc” as an initial string:

In[10]:= |

Out[10]= |

The third optional argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of extracting less likely characters:

In[11]:= |

Out[11]= |

Decreasing the temperature sharpens the peaks of the sampling distribution, further decreasing the probability of extracting less likely characters:

In[12]:= |

Out[12]= |
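The effect of temperature can be checked numerically with a small Python sketch. It assumes the usual convention of dividing the softmax input by the temperature; the logit values are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate bytes
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Comparing the three printed rows shows the probability of the top candidate shrinking as the temperature rises and growing as it falls.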

Very low temperature settings are equivalent to always picking the character with maximum probability. With such settings, generation typically “gets stuck in a loop”:

In[13]:= |

Out[13]= |

Very high temperature settings are equivalent to uniform random sampling. Since the output classes are byte values to be decoded using UTF-8, a very high temperature will almost certainly generate invalid sequences:

In[14]:= |

Out[14]= |
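To get a feel for how rarely uniformly random bytes form valid UTF-8, the following Python sketch draws random sequences over the byte range 0-247 and counts how many decode cleanly. The trial count and sequence length are arbitrary choices for illustration.

```python
import random

def is_valid_utf8(data):
    """Return True if the byte string decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def valid_fraction(n_trials, length, seed=0):
    """Fraction of uniformly random byte sequences that are valid UTF-8."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(n_trials):
        data = bytes(rng.randrange(248) for _ in range(length))
        valid += is_valid_utf8(data)
    return valid / n_trials

print(valid_fraction(n_trials=2000, length=20))
```

Roughly half of the byte values are not ASCII, and each of those must be followed by the right number of continuation bytes, so validity becomes very unlikely even for short sequences.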

In this model, a single unit of the output state (unit #2389) turns out to directly reflect the sentiment of the text. The final value of this unit can be used as a feature for sentiment analysis. Define a function to extract it:

In[15]:= |

Obtain sentiment scores:

In[16]:= |

Out[16]= |

In[17]:= |

Out[17]= |

Get a subset of the movie review dataset:

In[18]:= |

In[19]:= |

Out[19]= |

Obtain the review scores. If available, GPU evaluation (TargetDevice -> "GPU") is recommended:

In[20]:= |

Positive and negative reviews are mostly separated by this single score value:

In[21]:= |

Out[21]= |

Classify the reviews using the score:

In[22]:= |

Out[22]= |
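Classifying from a single score amounts to choosing a cut point. The Python sketch below fits such a threshold by exhaustive search on made-up data; the scores, the labels, and the convention that a higher score means positive are all assumptions for illustration, not values from this model.

```python
def fit_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy of the rule `score > threshold`."""
    ordered = sorted(set(scores))
    # Candidate cut points: one below all scores, midpoints, one above all scores.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)
    best = None
    for t in candidates:
        correct = sum((s > t) == y for s, y in zip(scores, labels))
        accuracy = correct / len(scores)
        if best is None or accuracy > best[1]:
            best = (t, accuracy)
    return best

# Hypothetical sentiment scores with True = positive review.
scores = [-0.9, -0.4, -0.1, 0.2, 0.3, 0.8]
labels = [False, False, True, False, True, True]
threshold, accuracy = fit_threshold(scores, labels)
print(threshold, accuracy)
```

In practice the threshold would be fitted on a training split and the accuracy reported on held-out reviews.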

Keeping in mind that the classifier was trained on a scalar feature from a model not explicitly trained for sentiment analysis, the obtained accuracy is remarkable:

In[23]:= |

Out[23]= |

Display the evolution of the sentiment cell as the input is read, with red background corresponding to negative sentiment and green to positive:

In[24]:= |

As the net reads a positive review, the sentiment cell evolves toward values in the positive score range:

In[25]:= |

Out[25]= |

In[26]:= |

Out[26]= |

As the net reads a negative review, the sentiment cell evolves toward values in the negative score range:

In[27]:= |

Out[27]= |

In[28]:= |

Out[28]= |

Inspect the number of parameters of all arrays in the net:

In[29]:= |

Out[29]= |

Obtain the total number of parameters:

In[30]:= |

Out[30]= |
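The stated total of 86,245,112 parameters can be reproduced with simple arithmetic. In the Python sketch below, the 4096-dimensional hidden state and the 248 classes come from this page, while the embedding size of 64 and the grouping of weights into arrays are assumptions; the actual array layout of the net may differ.

```python
from math import prod

HIDDEN = 4096   # hidden state size, from the text
CLASSES = 248   # number of byte classes, from the text
EMBED = 64      # assumed embedding size (not stated on this page)

# Hypothetical array shapes; only the total is checked against the page.
shapes = {
    "embedding": (CLASSES, EMBED),
    "mlstm_Wmx": (HIDDEN, EMBED),
    "mlstm_Wmh": (HIDDEN, HIDDEN),
    "mlstm_gates_Wx": (4 * HIDDEN, EMBED),
    "mlstm_gates_Wm": (4 * HIDDEN, HIDDEN),
    "mlstm_gates_bias": (4 * HIDDEN,),
    "output_weights": (CLASSES, HIDDEN),
    "output_bias": (CLASSES,),
}

counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())
print(total)            # 86245112
print(total * 4 / 1e6)  # size in MB, assuming 4 bytes per parameter
```

Under these assumptions, 4-byte parameters give about 345 MB, which agrees with the trained size listed at the top of the page.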

Obtain the layer type counts:

In[31]:= |

Out[31]= |

Requires Wolfram Language 12.0 (April 2019) or above

- A. Radford, R. Jozefowicz, I. Sutskever, "Learning to Generate Reviews and Discovering Sentiment," arXiv:1704.01444 (2017)
- Available from: https://github.com/guillitte/pytorch-sentiment-neuron
- Rights: MIT License