Wolfram Research

Sentiment Language Model Trained on Amazon Product Review Data

Generate text in English and analyze sentiment

Released in 2017, this language model uses a single multiplicative LSTM (mLSTM) and byte-level UTF-8 encoding. After training, a "sentiment unit" in the mLSTM hidden state was discovered, whose value directly corresponds to the sentiment of the text.

Number of layers: 27 | Parameter count: 86,245,112 | Trained size: 345 MB

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"]
Out[1]=

Basic usage

Predict the next character in a piece of text:

In[2]:=
result = NetModel[
   "Sentiment Language Model Trained on Amazon Product Review Data"][
  "This produc"]
Out[2]=

The output values correspond to bytes in the UTF-8 encoding, offset by 1: class n corresponds to byte value n - 1. Decode the prediction:

In[3]:=
FromCharacterCode[result - 1, "UTF-8"]
Out[3]=

Note that since UTF-8 is a variable-length encoding, decoding single byte values may not always make sense:

In[4]:=
FromCharacterCode[240, "UTF-8"]
Out[4]=
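The byte-level representation can also be reproduced outside the notebook. The following Python sketch (an illustration only, not part of the Wolfram Language workflow) shows that a multi-byte character decomposes into several byte values, and that a lone lead byte such as 240 is not a valid UTF-8 sequence on its own:

```python
# Encode a string to its UTF-8 byte values, as the net's input encoder does;
# the net then shifts each byte by +1 to form class indices.
text = "caffè"
byte_values = list(text.encode("utf-8"))
print(byte_values)  # 'è' occupies two bytes: 195, 168

# Class indices as used by the net: byte value + 1
class_indices = [b + 1 for b in byte_values]

# A lone lead byte of a multi-byte sequence does not decode on its own:
try:
    bytes([240]).decode("utf-8")
except UnicodeDecodeError as e:
    print("invalid UTF-8:", e.reason)
```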

Multiplicative LSTM and UTF-8 encoding

This model features a non-standard multiplicative LSTM (mLSTM), which can be implemented using NetFoldOperator:

In[5]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "mLSTM"]
Out[5]=
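The distinguishing feature of the mLSTM is an intermediate multiplicative state that replaces the plain hidden-state contribution to the gates. A minimal Python sketch of one step, following the Krause et al. (2016) formulation with scalar states and illustrative weights (not the model's learned parameters):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlstm_step(x, h, c, w):
    """One multiplicative LSTM step with scalar states for readability.
    The hidden-state contribution to the gates goes through an
    intermediate multiplicative state m = (Wmx*x) * (Wmh*h)
    instead of h itself."""
    m = (w["Wmx"] * x) * (w["Wmh"] * h)            # multiplicative state
    i = sigmoid(w["Wix"] * x + w["Wim"] * m)       # input gate
    f = sigmoid(w["Wfx"] * x + w["Wfm"] * m)       # forget gate
    o = sigmoid(w["Wox"] * x + w["Wom"] * m)       # output gate
    chat = math.tanh(w["Wcx"] * x + w["Wcm"] * m)  # candidate cell state
    c_new = f * c + i * chat
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Toy weights; the real net learns these during training.
w = {k: 0.5 for k in ["Wmx", "Wmh", "Wix", "Wim", "Wfx",
                      "Wfm", "Wox", "Wom", "Wcx", "Wcm"]}
h, c = 0.1, 0.0
for x in [1.0, -0.5, 0.25]:  # a short input sequence
    h, c = mlstm_step(x, h, c, w)
print(h, c)
```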

Inspect the inner structure of the multiplicative LSTM:

In[6]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], {"mLSTM", "Net"}]
Out[6]=

The net encodes its input string as the sequence of byte values of its UTF-8 encoding. Inspect the input encoder:

In[7]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "Input"]
Out[7]=

The net predicts the next byte of the sequence. UTF-8 allows single byte values in the range 0-247; hence, there are 248 possible output classes. Inspect the output decoder:

In[8]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "Output"]
Out[8]=

Generation

Write a function to generate text efficiently using NetStateObject. Note that FromCharacterCode[…,"UTF-8"] is applied at the end, but the generated byte values are not guaranteed to form a valid UTF-8 sequence; in that case, FromCharacterCode will issue messages:

In[9]:=
generateSample[start_, len_, temp_: 1, device_: "CPU"] := 
 Block[{enc, obj, generated, bytes},
  enc = NetExtract[
    NetModel[
     "Sentiment Language Model Trained on Amazon Product Review \
Data"], "Input"];
  obj = NetStateObject@
    NetReplacePart[
     NetModel[
      "Sentiment Language Model Trained on Amazon Product Review \
Data"], "Input" -> {"Varying", "Integer"}];
  generated = 
   NestList[{obj[#, {"RandomSample", "Temperature" -> temp}, 
       TargetDevice -> device]} &, enc[start], len];
  bytes = Flatten[generated] - 1;
  FromCharacterCode[bytes, "UTF-8"]
  ]

Generate for 300 steps using “This produc” as an initial string:

In[10]:=
generateSample[ "This produc", 300]
Out[10]=

The third optional argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of extracting less likely characters:

In[11]:=
generateSample[ "This produc", 300, 1.3]
Out[11]=

Decreasing the temperature sharpens the peaks of the sampling distribution, decreasing the probability of extracting less likely characters:

In[12]:=
generateSample[ "This produc", 300, 0.4]
Out[12]=

Very low temperature settings are equivalent to always picking the character with maximum probability, and sampling typically “gets stuck in a loop”:

In[13]:=
generateSample[ "This produc", 300, 0.0001]
Out[13]=

Very high temperature settings are equivalent to random sampling. Since the output classes are byte values to be decoded using UTF-8, a very high temperature will almost certainly generate invalid sequences:

In[14]:=
generateSample[ "This produc", 300, 10]
Out[14]=
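The effect of the temperature parameter can be sketched numerically. A small Python illustration (toy logits, not the model's) of how scaling logits by 1/temperature sharpens or flattens the softmax distribution:

```python
import math

def softmax_with_temperature(logits, temp):
    """Scale logits by 1/temp before the softmax; low temp sharpens the
    distribution toward the argmax, high temp flattens it toward uniform."""
    scaled = [l / temp for l in logits]
    mx = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - mx) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.1]  # toy scores for three classes
for t in (0.4, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# As temp -> 0 the mass concentrates on the most likely class; as
# temp -> inf the distribution approaches uniform, which is why very
# high temperatures almost certainly produce invalid byte sequences.
```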

Sentiment analysis

In this model, a single unit of the output state (unit #2389) turns out to directly reflect the sentiment of the text. The final value of this unit can be used as a feature for sentiment analysis. Define a function to extract it:

In[15]:=
sentimentScore[text_, device_: "CPU"] := With[
  {output = 
    NetModel[
      "Sentiment Language Model Trained on Amazon Product Review \
Data"][text, NetPort[{"LastState", "Output"}], 
     TargetDevice -> device]},
  Switch[Depth[output], 2, output[[2389]], 3, output[[All, 2389]]]
  ]

Obtain sentiment scores:

In[16]:=
sentimentScore["This movie is great"]
Out[16]=
In[17]:=
sentimentScore["This movie is awful"]
Out[17]=

Get a subset of the movie review dataset:

In[18]:=
trainData = 
  RandomSample[
   ResourceData["Sample Data: Movie Review Sentence Polarity", 
    "TrainingData"], 1000];
testData = 
  RandomSample[
   ResourceData["Sample Data: Movie Review Sentence Polarity", 
    "TestData"], 100];
In[19]:=
RandomSample[trainData, 4]
Out[19]=

Obtain the review scores. If available, GPU evaluation (TargetDevice -> “GPU”) is recommended:

In[20]:=
scoresTrainData = 
  Thread[sentimentScore[trainData[[All, 1]], "CPU"] -> 
    trainData[[All, 2]]];
scoresTestData = 
  Thread[sentimentScore[testData[[All, 1]], "CPU"] -> 
    testData[[All, 2]]];

Positive and negative reviews are mostly separated by this single score value:

In[21]:=
With[{gathered = GatherBy[scoresTrainData, Last]},
 Histogram@
  MapThread[Legended, {gathered[[All, All, 1]], gathered[[All, 1, 2]]}]
 ]
Out[21]=

Classify the reviews using the score:

In[22]:=
classifier = Classify[scoresTrainData]
Out[22]=

Keeping in mind that the classifier was trained on a single scalar feature from a model that was never explicitly trained for sentiment analysis, the accuracy obtained is remarkable:

In[23]:=
ClassifierMeasurements[classifier, scoresTestData]["Accuracy"]
Out[23]=
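Why a single scalar feature can classify so well is easy to see in isolation: when the two classes form well-separated clusters along one axis, even a simple threshold suffices. A Python sketch on synthetic scores (the cluster means and spreads are illustrative, not measured from the model):

```python
import random

random.seed(0)
# Synthetic stand-in for the sentiment scores: positive reviews cluster
# above zero, negative ones below (values are illustrative only).
pos = [random.gauss(0.05, 0.02) for _ in range(500)]
neg = [random.gauss(-0.05, 0.02) for _ in range(500)]
data = [(s, "positive") for s in pos] + [(s, "negative") for s in neg]

def accuracy(threshold):
    """Fraction of points on the correct side of the threshold."""
    correct = sum((s > threshold) == (label == "positive")
                  for s, label in data)
    return correct / len(data)

# Sweep candidate thresholds across the score range.
best = max(accuracy(t / 1000) for t in range(-100, 101))
print(best)  # near 1.0 when the two clusters barely overlap
```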

Sentiment visualization

Display the evolution of the sentiment cell as the input is read, with red background corresponding to negative sentiment and green to positive:

In[24]:=
visualizeSentiment[text_String, device_: "CPU"] := With[
  {sentiment = 
    NetModel[
       "Sentiment Language Model Trained on Amazon Product Review \
Data"][text, NetPort[{"mLSTM", "OutputState"}]][[All, 2389]]},
  Grid[
   {MapThread[
     Item[Style[#1, Bold], 
       Background -> Opacity[0.75, #2]] &, {Characters[text], 
      RGBColor[1 - #, #, 0] & /@ 
       Rescale[sentiment, {-0.1, 0.1}, {0, 1}]}]},
   ItemSize -> Small
   ]
  ]
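The color mapping above linearly rescales the sentiment value from [-0.1, 0.1] to [0, 1] and blends red into green accordingly. A minimal Python equivalent of that mapping (no clipping is applied, matching the Wolfram Language Rescale default):

```python
def rescale(x, lo=-0.1, hi=0.1):
    """Map x linearly so that lo -> 0 and hi -> 1 (no clipping)."""
    return (x - lo) / (hi - lo)

def sentiment_color(x):
    """Red-to-green blend: negative sentiment -> red, positive -> green."""
    t = rescale(x)
    return (1 - t, t, 0)  # (R, G, B)

print(sentiment_color(-0.1))  # fully red: (1.0, 0.0, 0)
print(sentiment_color(0.1))   # fully green: (0.0, 1.0, 0)
```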

As the net reads a positive review, the sentiment cell evolves toward values in the positive score range:

In[25]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", 
   "TrainingData"][[6, 1]]]
Out[25]=
In[26]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", 
   "TrainingData"][[3236, 1]]]
Out[26]=

As the net reads a negative review, the sentiment cell evolves toward values in the negative score range:

In[27]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", 
   "TrainingData"][[3742, 1]]]
Out[27]=
In[28]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", 
   "TrainingData"][[7455, 1]]]
Out[28]=

Net information

Inspect the number of parameters of all arrays in the net:

In[29]:=
NetInformation[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "ArraysElementCounts"]
Out[29]=

Obtain the total number of parameters:

In[30]:=
NetInformation[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "ArraysTotalElementCount"]
Out[30]=

Obtain the layer type counts:

In[31]:=
NetInformation[
 NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "LayerTypeCounts"]
Out[31]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[32]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["Sentiment Language Model Trained on Amazon Product Review \
Data"], "MXNet"]
Out[32]=

Export also creates a net.params file containing parameters:

In[33]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[33]=

Get the size of the parameter file:

In[34]:=
FileByteCount[paramPath]
Out[34]=

The size is similar to the byte count of the resource object:

In[35]:=
ResourceObject[
  "Sentiment Language Model Trained on Amazon Product Review \
Data"]["ByteCount"]
Out[35]=

Requirements

Wolfram Language 12.0 (April 2019) or above

Resource History

Reference