
ConceptNet Numberbatch Word Vectors V17.06

Represent words as vectors

Released in 2017, these word representations were obtained by combining knowledge from the human-made ConceptNet graph with several pre-trained distributional embeddings: GloVe, word2vec and fastText, the last trained on the OpenSubtitles 2016 dataset. The net encodes more than 400,000 tokens as unique 300-dimensional vectors, with all tokens outside the vocabulary encoded as the zero vector. Underscores in the original model's tokens have been replaced with white spaces.
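The out-of-vocabulary behavior described above can be checked directly. This is an illustrative sketch, assuming the nonsense string below is not among the model's tokens:

```
(* an unknown token is encoded as the 300-dimensional zero vector *)
net = NetModel["ConceptNet Numberbatch Word Vectors V17.06"];
First[net[{"xqzzyy"}]] == ConstantArray[0., 300]
```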

Number of layers: 1 | Parameter count: 125,158,500 | Trained size: 503 MB

Training Set Information

Examples

Resource retrieval

Retrieve the resource object:

In[1]:=
ResourceObject["ConceptNet Numberbatch Word Vectors V17.06"]
Out[1]=

Get the pre-trained net:

In[2]:=
NetModel["ConceptNet Numberbatch Word Vectors V17.06"]
Out[2]=

Basic usage

Use the net to obtain a list of word vectors:

In[3]:=
vectors = 
 NetModel["ConceptNet Numberbatch Word Vectors V17.06"][{"hello", 
   "world"}]
Out[3]=

Obtain the dimensions of the vectors:

In[4]:=
Dimensions[vectors]
Out[4]=

Use the embedding layer inside a NetChain:

In[5]:=
chain = NetChain[{NetModel[
    "ConceptNet Numberbatch Word Vectors V17.06"], 
   LongShortTermMemoryLayer[10]}]
Out[5]=
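The LongShortTermMemoryLayer in the chain above is created without weights; as an illustrative sketch, NetInitialize can fill them with random values so the chain can be applied immediately:

```
(* one 10-dimensional output state per input token *)
NetInitialize[chain][{"hello", "world"}]
```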

Feature visualization

Create two lists of related words:

In[6]:=
animals = {"alligator", "ant", "bear", "bee", "bird", "camel", "cat", 
   "cheetah", "chicken", "chimpanzee", "cow", "crocodile", "deer", 
   "dog", "dolphin", "duck", "eagle", "elephant", "fish", "fly"};
In[7]:=
fruits = {"apple", "apricot", "avocado", "banana", "blackberry", 
   "blueberry", "cherry", "coconut", "cranberry", "fig", "grape", 
   "lemon", "mango", "melon", "papaya", "peach", "pineapple", 
   "raspberry", "strawberry", "watermelon"};

Visualize relationships between the words using the net as a feature extractor:

In[8]:=
FeatureSpacePlot[Join[animals, fruits], 
 FeatureExtractor -> (NetModel[
      "ConceptNet Numberbatch Word Vectors V17.06"][{#}] &)]
Out[8]=

Word analogies

Get the pre-trained net:

In[9]:=
net = NetModel["ConceptNet Numberbatch Word Vectors V17.06"]
Out[9]=

Get a list of tokens:

In[10]:=
words = NetExtract[net, "Input"][["Tokens"]]
Out[10]=

Obtain the vectors:

In[11]:=
vecs = NetExtract[net, "Weights"];

Create an association whose keys are tokens and whose values are vectors:

In[12]:=
word2vec = AssociationThread[words -> Most[vecs]];

Find the eight nearest tokens to "king":

In[13]:=
Nearest[word2vec, word2vec["king"], 8]
Out[13]=

France is to Paris as Germany is to:

In[14]:=
Nearest[word2vec, 
 word2vec["paris"] - word2vec["france"] + word2vec["germany"], 5]
Out[14]=
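Beyond nearest-neighbor lookup, the vectors support direct pairwise comparison. A minimal sketch using the word2vec association built above, where a smaller CosineDistance indicates more similar meanings:

```
(* related words lie closer together than unrelated ones *)
CosineDistance[word2vec["cat"], word2vec["dog"]] < 
 CosineDistance[word2vec["cat"], word2vec["banana"]]
```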

Net information

Inspect the number of parameters of all arrays in the net:

In[15]:=
NetInformation[
 NetModel["ConceptNet Numberbatch Word Vectors V17.06"], 
 "ArraysElementCounts"]
Out[15]=

Obtain the total number of parameters:

In[16]:=
NetInformation[
 NetModel["ConceptNet Numberbatch Word Vectors V17.06"], 
 "ArraysTotalElementCount"]
Out[16]=

Obtain the layer type counts:

In[17]:=
NetInformation[
 NetModel["ConceptNet Numberbatch Word Vectors V17.06"], 
 "LayerTypeCounts"]
Out[17]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[18]:=
jsonPath = 
 Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], 
  NetModel["ConceptNet Numberbatch Word Vectors V17.06"], "MXNet"]
Out[18]=

Export also creates a net.params file containing parameters:

In[19]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[19]=

Get the size of the parameter file:

In[20]:=
FileByteCount[paramPath]
Out[20]=

The size is similar to the byte count of the resource object:

In[21]:=
ResourceObject[
  "ConceptNet Numberbatch Word Vectors V17.06"]["ByteCount"]
Out[21]=

Represent the MXNet net as a graph:

In[22]:=
Import[jsonPath, {"MXNet", "NodeGraphPlot"}]
Out[22]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference