# GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B

Represent words as vectors

Released in 2014 by the computer science department at Stanford University, this representation is trained using a method introduced by its authors, Global Vectors (GloVe). It maps each of 1,917,495 tokens to a unique 300-dimensional vector; any token outside the vocabulary is mapped to the zero vector. Token case is ignored.

Number of layers: 1 | Parameter count: 575,248,500 | Trained size: 2 GB

## Training Set Information

• Web data from Common Crawl: 42 billion tokens, with a vocabulary of around 1.9 million unique tokens. Case is ignored (the vectors are uncased).

## Examples

### Resource retrieval

Retrieve the resource object:

 In[1]:=
 Out[1]=

Get the pre-trained net:

 In[2]:=
 Out[2]=
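
A minimal sketch of these two steps, assuming the model is registered in the Wolfram Neural Net Repository under the name used in this page's title:

```wolfram
(* Assumption: the repository name matches the page title. *)
name = "GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B";

(* Look up the resource object that describes the model. *)
ro = ResourceObject[name]

(* Download (on first use) and load the pre-trained net. *)
net = NetModel[name]
```

The first call to NetModel has to fetch roughly 2 GB of weights, so it can take a while; later calls should reuse the locally cached copy.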

### Basic usage

Use the net to obtain a list of word vectors:

 In[3]:=
 Out[3]=

Obtain the dimensions of the vectors:

 In[4]:=
 Out[4]=

Use the embedding layer inside a NetChain:

 In[5]:=
 Out[5]=
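
The three steps above could look as follows. This is a sketch that assumes the net accepts a plain string and tokenizes it internally; the input string and the LSTM size of 10 are arbitrary illustrative choices.

```wolfram
(* Assumption: the repository name matches the page title. *)
net = NetModel["GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B"];

(* One vector per token of the input string. *)
vectors = net["hello world"]

(* Rows correspond to tokens, columns to the 300 embedding dimensions. *)
Dimensions[vectors]

(* Use the embedding as the first layer of a larger net.
   The recurrent layer and its size are illustrative only. *)
chain = NetChain[{net, LongShortTermMemoryLayer[10]}]
```

Because token case is ignored, "Hello World" and "hello world" should produce the same vectors.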

### Feature visualization

Create two lists of related words:

 In[6]:=
 In[7]:=

Visualize relationships between the words using the net as a feature extractor:

 In[8]:=
 Out[8]=
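
A possible version of this visualization is sketched below. The two word lists are hypothetical, and the pure function passed as the feature extractor is one way, not necessarily the only way, to turn a single word into its 300-dimensional vector.

```wolfram
(* Assumption: the repository name matches the page title. *)
net = NetModel["GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B"];

(* Hypothetical word lists, chosen only for illustration. *)
animals = {"alligator", "bear", "bird", "cat", "dog", "horse", "lion",
   "tiger", "wolf", "zebra"};
fruits = {"apple", "apricot", "banana", "cherry", "grape", "lemon",
   "mango", "orange", "peach", "pear"};

(* Each word is a single token, so the net returns a 1 x 300 matrix;
   First extracts the vector itself. *)
FeatureSpacePlot[Join[animals, fruits],
 FeatureExtractor -> (First[net[#]] &)]
```

If the embedding captures semantic similarity, the animals and the fruits should form two visibly separate groups in the plot.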

### Word analogies

Get the pre-trained net:

 In[9]:=
 Out[9]=

Get a list of words:

 In[10]:=
 Out[10]=

Obtain the vectors:

 In[11]:=

Create an association whose keys are words and whose values are vectors:

 In[12]:=

Find the eight nearest words to "king":

 In[13]:=
 Out[13]=

Man is to king as woman is to:

 In[14]:=
 Out[14]=

France is to Paris as Germany is to:

 In[15]:=
 Out[15]=
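
The whole analogy workflow can be sketched as below. Two details are assumptions: that the vocabulary can be read from the input encoder with `[["Tokens"]]`, and that any extra row in the weight matrix (the zero vector for unknown tokens) sits at the end, which is why `Take` keeps only as many rows as there are words. The neighbor counts (8 and 5) are illustrative.

```wolfram
(* Assumption: the repository name matches the page title. *)
net = NetModel["GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B"];

(* Vocabulary stored in the net's token encoder (assumed access pattern). *)
words = NetExtract[net, "Input"][["Tokens"]];

(* Embedding matrix, trimmed to one row per vocabulary word. *)
vecs = Take[Normal[NetExtract[net, "Weights"]], Length[words]];

(* Association from word to vector. Keys are lowercase, since case is ignored. *)
word2vec = AssociationThread[words -> vecs];

(* Build the nearest-neighbor index once; it maps a query vector back to words. *)
nearestWord = Nearest[vecs -> words];

(* Eight nearest words to "king" (the word itself is expected to come first). *)
nearestWord[word2vec["king"], 8]

(* man : king :: woman : ? *)
nearestWord[word2vec["king"] - word2vec["man"] + word2vec["woman"], 5]

(* france : paris :: germany : ? *)
nearestWord[word2vec["paris"] - word2vec["france"] + word2vec["germany"], 5]
```

Nearest uses Euclidean distance by default for numeric vectors. Holding about 1.9 million 300-dimensional vectors in memory is expensive, so this is best run on a machine with several gigabytes of free RAM.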

### Net information

Inspect the number of parameters of all arrays in the net:

 In[16]:=
 Out[16]=

Obtain the total number of parameters:

 In[17]:=
 Out[17]=

Obtain the layer type counts:

 In[18]:=
 Out[18]=
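
A sketch of these three queries, assuming the NetInformation property names shown here are available in the installed version:

```wolfram
(* Assumption: the repository name matches the page title. *)
net = NetModel["GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B"];

(* Element count of each array in the net. *)
NetInformation[net, "ArraysElementCounts"]

(* Total element count across all arrays. *)
NetInformation[net, "ArraysTotalElementCount"]

(* How many layers of each type the net contains. *)
NetInformation[net, "LayerTypeCounts"]

(* Cross-check of the figures quoted on this page:
   1,917,495 vectors of 300 dimensions give 575,248,500 parameters. *)
1917495*300 == 575248500  (* evaluates to True *)
```

Since the net consists of a single embedding layer, the per-array and total counts should coincide.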

### Export to MXNet

Export the net into a format that can be opened in MXNet:

 In[19]:=
 Out[19]=

Export also creates a net.params file containing parameters:

 In[20]:=
 Out[20]=

Get the size of the parameter file:

 In[21]:=
 Out[21]=

The size is similar to the byte count of the resource object:

 In[22]:=
 Out[22]=

Represent the MXNet net as a graph:

 In[23]:=
 Out[23]=
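
The export steps might be written as follows. This is a sketch under several assumptions: that the installed version still supports the "MXNet" export format, that the parameter file is written next to the JSON file as net.params (as stated above), and that the "ByteCount" resource property and the "NodeGraphPlot" import element are available. The output location in the temporary directory is an arbitrary choice.

```wolfram
(* Assumption: the repository name matches the page title. *)
name = "GloVe 300-Dimensional Word Vectors Trained on Common Crawl 42B";
net = NetModel[name];

(* Write the symbol file; the file name and directory are illustrative. *)
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], net, "MXNet"]

(* The parameters are expected alongside it as net.params. *)
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]

(* Size of the parameter file in bytes. *)
FileByteCount[paramPath]

(* Byte count recorded for the resource object, for comparison. *)
ResourceObject[name]["ByteCount"]

(* Render the exported MXNet symbol as a graph. *)
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
```

As a rough sanity check, 575,248,500 parameters at 4 bytes each come to about 2.3 billion bytes, which is in line with the trained size of about 2 GB quoted at the top of this page.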

## Requirements

Wolfram Language 11.2 (September 2017) or above

## Reference

• J. Pennington, R. Socher, C. D. Manning, "GloVe: Global Vectors for Word Representation," Empirical Methods in Natural Language Processing (EMNLP), 1532-1543 (2014), available from http://nlp.stanford.edu/projects/glove