Sentiment Language Model Trained on Amazon Product Review Data

Generate text in English and analyze sentiment

Released in 2017, this language model uses a single multiplicative LSTM (mLSTM) and byte-level UTF-8 encoding. After training, a "sentiment unit" was discovered in the mLSTM hidden state; its value directly corresponds to the sentiment of the text.

Number of layers: 27 | Parameter count: 86,245,112 | Trained size: 345 MB

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["Sentiment Language Model Trained on Amazon Product Review Data"]
Out[1]=

Basic usage

Predict the next character in a piece of text:

In[2]:=
result = NetModel[
   "Sentiment Language Model Trained on Amazon Product Review Data"][
  "This produc"]
Out[2]=

The output values correspond to bytes in the UTF-8 encoding, offset by one (the class index is the byte value plus 1). Decode the prediction:

In[3]:=
FromCharacterCode[result - 1, "UTF-8"]
Out[3]=
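
The full set of likely next bytes can also be examined. For instance, the top predictions and their probabilities can be requested from the class decoder and mapped back to characters (a quick sketch using the same offset convention; predictions above the ASCII range would again need their full multi-byte sequences):

probs = NetModel[
    "Sentiment Language Model Trained on Amazon Product Review Data"][
   "This produc", {"TopProbabilities", 5}];
(* decode each predicted class index back to a character *)
KeyMap[FromCharacterCode[# - 1, "UTF-8"] &, Association[probs]]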

Note that since UTF-8 is a variable-length encoding, decoding single byte values may not always make sense:

In[4]:=
FromCharacterCode[240, "UTF-8"]
Out[4]=
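
The same byte value does decode when given its complete sequence. For example, 240 is the lead byte of the four-byte UTF-8 encoding of U+1F600:

FromCharacterCode[{240, 159, 152, 128}, "UTF-8"]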

Multiplicative LSTM and UTF-8 encoding

This model features a non-standard multiplicative LSTM (mLSTM), which can be implemented using NetFoldOperator:

In[5]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "mLSTM"]
Out[5]=

Inspect the inner structure of the multiplicative LSTM:

In[6]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], {"mLSTM", "Net"}]
Out[6]=
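
For reference, the mLSTM recurrence replaces the recurrent contribution of a standard LSTM with an intermediate multiplicative state (Krause et al., 2016). The following is a minimal standalone sketch of a single step, with illustrative weight names and biases omitted; it is not extracted from the model:

mLSTMStep[{h_, c_}, x_, w_Association] := Module[{m, i, f, o, g, cNew, hNew},
  m = (w["Wmx"] . x) (w["Wmh"] . h);                (* multiplicative intermediate state *)
  i = LogisticSigmoid[w["Wix"] . x + w["Wim"] . m]; (* input gate *)
  f = LogisticSigmoid[w["Wfx"] . x + w["Wfm"] . m]; (* forget gate *)
  o = LogisticSigmoid[w["Wox"] . x + w["Wom"] . m]; (* output gate *)
  g = Tanh[w["Wgx"] . x + w["Wgm"] . m];            (* candidate memory *)
  cNew = f c + i g;
  hNew = o Tanh[cNew];
  {hNew, cNew}
  ]

Run one step on random weights and a random input:

n = 4;
weights = AssociationMap[
   RandomReal[{-1, 1}, {n, n}] &, {"Wmx", "Wmh", "Wix", "Wim", "Wfx", "Wfm", "Wox", "Wom", "Wgx", "Wgm"}];
mLSTMStep[{ConstantArray[0., n], ConstantArray[0., n]}, RandomReal[{-1, 1}, n], weights]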

The net encodes its input string as the sequence of byte values of its UTF-8 encoding. Inspect the input encoder:

In[7]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "Input"]
Out[7]=
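
Applying the encoder confirms the offset convention: each class index is the UTF-8 byte value plus one. For example, “abc” has byte values 97, 98, 99:

enc = NetExtract[
   NetModel[
    "Sentiment Language Model Trained on Amazon Product Review Data"], "Input"];
enc["abc"]  (* expected: {98, 99, 100} *)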

The net predicts the next byte of the sequence. UTF-8 allows byte values in the range 0-247; hence, there are 248 possible outputs. Inspect the output decoder:

In[8]:=
NetExtract[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "Output"]
Out[8]=

Generation

Write a function to generate text efficiently using NetStateObject. Note that FromCharacterCode[…, "UTF-8"] is applied at the end, but the generated byte values are not guaranteed to form a valid UTF-8 sequence; in that case, FromCharacterCode will issue messages:

In[9]:=
generateSample[start_, len_, temp_ : 1, device_ : "CPU"] := Block[{enc, obj, generated, bytes},
  (* the input encoder maps a string to its UTF-8 byte values plus one *)
  enc = NetExtract[
    NetModel[
     "Sentiment Language Model Trained on Amazon Product Review Data"], "Input"];
  (* NetStateObject keeps the recurrent state across evaluations *)
  obj = NetStateObject@
    NetReplacePart[
     NetModel[
      "Sentiment Language Model Trained on Amazon Product Review Data"], "Input" -> {"Varying", "Integer"}];
  (* prime the state with the start string, then sample one byte at a time *)
  generated = NestList[{obj[#, {"RandomSample", "Temperature" -> temp}, TargetDevice -> device]} &, enc[start], len];
  bytes = Flatten[generated] - 1;
  FromCharacterCode[bytes, "UTF-8"]
  ]

Generate for 300 steps using “This produc” as an initial string:

In[10]:=
generateSample[ "This produc", 300]
Out[10]=

The optional third argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of extracting less likely characters:

In[11]:=
generateSample[ "This produc", 300, 1.3]
Out[11]=
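
The flattening is easy to verify on a toy distribution (a standalone illustration; softmaxT is a hypothetical helper, not part of the model):

(* higher temperature flattens the softmax output; lower temperature sharpens it *)
softmaxT[logits_, t_] := With[{e = Exp[logits/t]}, e/Total[e]]
Table[t -> softmaxT[{2., 1., 0.1}, t], {t, {0.4, 1., 10.}}]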

Decreasing the temperature sharpens the peaks of the sampling distribution, further decreasing the probability of extracting less likely characters:

In[12]:=
generateSample[ "This produc", 300, 0.4]
Out[12]=

Very low temperature settings are equivalent to always picking the character with maximum probability. It is typical for sampling to “get stuck in a loop”:

In[13]:=
generateSample[ "This produc", 300, 0.0001]
Out[13]=

Very high temperature settings are equivalent to random sampling. Since the output classes are byte values to be decoded using UTF-8, a very high temperature will almost certainly generate invalid sequences:

In[14]:=
generateSample[ "This produc", 300, 10]
Out[14]=
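
The messages produced by invalid byte sequences can be silenced with Quiet if the raw decoding is acceptable:

Quiet[generateSample["This produc", 300, 10]]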

Sentiment analysis

In this model, a single unit of the output state (unit #2389) turns out to directly reflect the sentiment of the text. The final value of this unit can be used as a feature for sentiment analysis. Define a function to extract it:

In[15]:=
sentimentScore[text_, device_ : "CPU"] := With[
  {output = NetModel[
      "Sentiment Language Model Trained on Amazon Product Review Data"][text, NetPort[{"LastState", "Output"}], TargetDevice -> device]},
  (* a single string yields a vector (depth 2); a list of strings yields a matrix (depth 3) *)
  Switch[Depth[output], 2, output[[2389]], 3, output[[All, 2389]]]
  ]

Obtain sentiment scores:

In[16]:=
sentimentScore["This movie is great"]
Out[16]=
In[17]:=
sentimentScore["This movie is awful"]
Out[17]=
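
Since the Switch in sentimentScore also handles a depth-3 output, several texts can be scored in a single call:

sentimentScore[{"This movie is great", "This movie is awful"}]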

Get a subset of the movie review dataset:

In[18]:=
trainData = RandomSample[
   ResourceData["Sample Data: Movie Review Sentence Polarity", "TrainingData"], 1000];
testData = RandomSample[
   ResourceData["Sample Data: Movie Review Sentence Polarity", "TestData"], 100];
In[19]:=
RandomSample[trainData, 4]
Out[19]=

Obtain the review scores. If available, GPU evaluation is recommended: pass “GPU” as the second argument of sentimentScore to set TargetDevice -> “GPU”:

In[20]:=
scoresTrainData = Thread[sentimentScore[trainData[[All, 1]], "CPU"] -> trainData[[All, 2]]];
scoresTestData = Thread[sentimentScore[testData[[All, 1]], "CPU"] -> testData[[All, 2]]];

Positive and negative reviews are mostly separated by this single score value:

In[21]:=
With[{gathered = GatherBy[scoresTrainData, Last]},
 Histogram@
  MapThread[Legended, {gathered[[All, All, 1]], gathered[[All, 1, 2]]}]
 ]
Out[21]=

Classify the reviews using the score:

In[22]:=
classifier = Classify[scoresTrainData]
Out[22]=

Keeping in mind that training used a single scalar feature from a model never explicitly trained for sentiment analysis, the obtained accuracy is remarkable:

In[23]:=
ClassifierMeasurements[classifier, scoresTestData]["Accuracy"]
Out[23]=
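
Further measurements are available from the same ClassifierMeasurements object, for example the confusion matrix:

ClassifierMeasurements[classifier, scoresTestData]["ConfusionMatrixPlot"]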

Sentiment visualization

Display the evolution of the sentiment cell as the input is read, with red background corresponding to negative sentiment and green to positive:

In[24]:=
visualizeSentiment[text_String, device_ : "CPU"] := With[
  {(* value of the sentiment unit after each character (byte) is read *)
   sentiment = NetModel[
       "Sentiment Language Model Trained on Amazon Product Review Data"][text, NetPort[{"mLSTM", "OutputState"}], TargetDevice -> device][[All, 2389]]},
  Grid[
   {MapThread[
     Item[Style[#1, Bold], Background -> Opacity[0.75, #2]] &, {Characters[text],
      (* map scores in the range -0.1 to 0.1 onto a red-to-green scale *)
      RGBColor[1 - #, #, 0] & /@ Rescale[sentiment, {-0.1, 0.1}, {0, 1}]}]},
    ItemSize -> Small
    ]
   ]

As the net reads a positive review, the sentiment cell evolves toward values in the positive score range:

In[25]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", "TrainingData"][[6, 1]]]
Out[25]=
In[26]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", "TrainingData"][[3236, 1]]]
Out[26]=

As the net reads a negative review, the sentiment cell evolves toward values in the negative score range:

In[27]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", "TrainingData"][[3742, 1]]]
Out[27]=
In[28]:=
visualizeSentiment[
 ResourceData["Sample Data: Movie Review Sentence Polarity", "TrainingData"][[7455, 1]]]
Out[28]=

Net information

Inspect the number of parameters of all arrays in the net:

In[29]:=
NetInformation[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "ArraysElementCounts"]
Out[29]=

Obtain the total number of parameters:

In[30]:=
NetInformation[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "ArraysTotalElementCount"]
Out[30]=

Obtain the layer type counts:

In[31]:=
NetInformation[
 NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "LayerTypeCounts"]
Out[31]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[32]:=
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], NetModel["Sentiment Language Model Trained on Amazon Product Review Data"], "MXNet"]
Out[32]=
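
The JSON file describes the computation graph and can be inspected directly. Assuming the standard MXNet symbol format, the operators are listed under a "nodes" field:

Length[Import[jsonPath, "RawJSON"]["nodes"]]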

Export also creates a net.params file containing parameters:

In[33]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[33]=

Get the size of the parameter file:

In[34]:=
FileByteCount[paramPath]
Out[34]=

The size is similar to the byte count of the resource object:

In[35]:=
ResourceObject[
  "Sentiment Language Model Trained on Amazon Product Review Data"]["ByteCount"]
Out[35]=

Requirements

Wolfram Language 12.0 (April 2019) or above
