
GloVe 100-Dimensional Word Vectors Trained on Tweets

Represent words as vectors

Released in 2014 by the computer science department at Stanford University, this 100-dimensional word-vector representation was trained using the Global Vectors (GloVe) method. It encodes 1,193,515 tokens as unique vectors; all tokens outside the vocabulary are mapped to the zero vector. Token case is ignored.
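The encoding behavior described above (case folding, a fixed vocabulary, the zero vector for unknown tokens) can be sketched in Python with a toy vocabulary; the names and vector values here are illustrative stand-ins, not the model's actual data:

```python
DIM = 4  # the real model uses 100 dimensions

# Toy embedding table standing in for the 1,193,515-token vocabulary
# (illustrative values only).
toy_vocab = {
    "hello": [0.1, 0.2, 0.3, 0.4],
    "world": [0.5, 0.6, 0.7, 0.8],
}

def embed(tokens):
    """Lowercase each token (token case is ignored) and look it up;
    tokens outside the vocabulary map to the zero vector."""
    zero = [0.0] * DIM
    return [toy_vocab.get(t.lower(), zero) for t in tokens]

# "Hello" matches despite its capital; the gibberish token falls back to zeros.
print(embed(["Hello", "world", "qwertyuiop"]))
```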

Number of layers: 1 | Parameter count: 119,351,500 | Trained size: 490 MB
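The parameter count follows directly from the vocabulary size times the embedding dimension, since the net's single layer is the embedding table itself (a quick arithmetic check):

```python
vocab_size = 1_193_515  # tokens with unique vectors
dim = 100               # embedding dimension

# Each parameter of the single embedding layer is one vector component.
assert vocab_size * dim == 119_351_500
print(vocab_size * dim)  # 119351500
```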

Training Set Information

Trained on a corpus of 2 billion tweets (27 billion tokens).

Examples

Resource retrieval

Retrieve the resource object:

In[1]:=
ResourceObject["GloVe 100-Dimensional Word Vectors Trained on Tweets"]
Out[1]=

Get the pre-trained net:

In[2]:=
NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"]
Out[2]=

Basic usage

Use the net to obtain a list of word vectors:

In[3]:=
vectors = 
 NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"][
  "hello world"]
Out[3]=

Obtain the dimensions of the vectors:

In[4]:=
Dimensions[vectors]
Out[4]=

Use the embedding layer inside a NetChain:

In[5]:=
chain = NetChain[{NetModel[
    "GloVe 100-Dimensional Word Vectors Trained on Tweets"], 
   LongShortTermMemoryLayer[10]}]
Out[5]=

Feature visualization

Create two lists of related words:

In[6]:=
animals = {"Alligator", "Ant", "Bear", "Bee", "Bird", "Camel", "Cat", 
   "Cheetah", "Chicken", "Chimpanzee", "Cow", "Crocodile", "Deer", 
   "Dog", "Dolphin", "Duck", "Eagle", "Elephant", "Fish", "Fly"};
In[7]:=
fruits = {"Apple", "Apricot", "Avocado", "Banana", "Blackberry", 
   "Blueberry", "Cherry", "Coconut", "Cranberry", "Grape", "Lemon", 
   "Mango", "Melon", "Papaya", "Peach", "Pineapple", "Raspberry", 
   "Strawberry", "Currant", "Fig"};

Visualize relationships between the words using the net as a feature extractor:

In[8]:=
FeatureSpacePlot[Join[animals, fruits], 
 FeatureExtractor -> 
  NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"]]
Out[8]=
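FeatureSpacePlot lays the high-dimensional vectors out in two dimensions so that related words cluster together. The same idea can be sketched with a plain PCA projection; the vectors below are hand-made toy data, not the real 100-dimensional GloVe embeddings, and PCA is only one of several reductions FeatureSpacePlot may use:

```python
import numpy as np

# Toy 5-dimensional "word vectors" forming two loose clusters
# (animal-like vs. fruit-like); purely illustrative.
words = ["cat", "dog", "bird", "apple", "mango", "peach"]
X = np.array([
    [1.0, 0.9, 0.1, 0.0, 0.2],
    [0.9, 1.0, 0.2, 0.1, 0.1],
    [0.8, 0.7, 0.3, 0.0, 0.3],
    [0.1, 0.0, 0.9, 1.0, 0.8],
    [0.0, 0.2, 1.0, 0.9, 0.9],
    [0.2, 0.1, 0.8, 1.0, 0.7],
])

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    centered = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

coords = pca_2d(X)  # one 2D point per word, ready for plotting
print(dict(zip(words, coords.round(2).tolist())))
```

The two clusters separate along the first principal component, which is what makes the scatter plot informative.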

Word analogies

Get the pre-trained net:

In[9]:=
net = NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"]
Out[9]=

Get a list of words:

In[10]:=
words = NetExtract[net, "Input"][["Tokens"]]
Out[10]=

Obtain the vectors, dropping the last row of the weight matrix (the zero vector used for out-of-vocabulary tokens):

In[11]:=
vecs = NetExtract[net, "Weights"][[1 ;; -2]];

Create an association whose keys are words and whose values are vectors:

In[12]:=
word2vec = AssociationThread[words -> vecs];

Find the eight nearest words to "king":

In[13]:=
Nearest[word2vec, word2vec["king"], 8]
Out[13]=

Man is to king as woman is to:

In[14]:=
Nearest[word2vec, 
 word2vec["king"] - word2vec["man"] + word2vec["woman"], 5]
Out[14]=

France is to Paris as Germany is to:

In[15]:=
Nearest[word2vec, 
 word2vec["paris"] - word2vec["france"] + word2vec["germany"], 5]
Out[15]=
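The analogy queries above are plain vector arithmetic followed by a nearest-neighbor search. A minimal Python sketch of the same idea, using hand-built toy vectors (axis 0 for gender, axis 1 for royalty) rather than the learned GloVe embeddings:

```python
import math

# Hand-picked 2D toy embeddings so the analogy works exactly;
# the real model learns 100-dimensional vectors from data.
word2vec = {
    "man":   [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [1.0, 1.0],
    "queen": [-1.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """a is to b as c is to ... (the query words themselves are excluded)."""
    target = [bb - aa + cc
              for aa, bb, cc in zip(word2vec[a], word2vec[b], word2vec[c])]
    candidates = {w: v for w, v in word2vec.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("man", "king", "woman"))  # queen
```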

Net information

Inspect the number of parameters of all arrays in the net:

In[16]:=
NetInformation[
 NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"], 
 "ArraysElementCounts"]
Out[16]=

Obtain the total number of parameters:

In[17]:=
NetInformation[
 NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"], 
 "ArraysTotalElementCount"]
Out[17]=

Obtain the layer type counts:

In[18]:=
NetInformation[
 NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"], 
 "LayerTypeCounts"]
Out[18]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[19]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["GloVe 100-Dimensional Word Vectors Trained on Tweets"], 
  "MXNet"]
Out[19]=

Export also creates a net.params file containing parameters:

In[20]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[20]=

Get the size of the parameter file:

In[21]:=
FileByteCount[paramPath]
Out[21]=

The size is similar to the byte count of the resource object:

In[22]:=
ResourceObject[
  "GloVe 100-Dimensional Word Vectors Trained on Tweets"]["ByteCount"]
Out[22]=

Represent the MXNet net as a graph:

In[23]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[23]=

Requirements

Wolfram Language 11.2 (September 2017) or above

Resource History

Reference

J. Pennington, R. Socher, C. D. Manning, "GloVe: Global Vectors for Word Representation," EMNLP (2014)