ConceptNet Numberbatch Word Vectors V17.06

Represent words as vectors

Released in 2017, these word representations were obtained by combining knowledge from the human-made ConceptNet graph with multiple pre-trained distributional embeddings: GloVe, word2vec and a fastText model trained on the OpenSubtitles 2016 dataset. The net encodes more than 400,000 tokens as unique vectors, with all tokens outside the vocabulary encoded as the zero vector. Underscores in the original model's tokens have been replaced with spaces.

Number of layers: 1 | Parameter count: 125,158,500 | Trained size: 503 MB

Training Set Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["ConceptNet Numberbatch Word Vectors V17.06"]
Out[1]=

Basic usage

Use the net to obtain a list of word vectors:

In[2]:=
vectors = NetModel["ConceptNet Numberbatch Word Vectors V17.06"][{"hello", "world"}]
Out[2]=

Obtain the dimensions of the vectors:

In[3]:=
Dimensions[vectors]
Out[3]=
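
Word vectors from embeddings like this one are commonly compared with cosine similarity. As a quick sketch (the word choices here are purely illustrative), semantically related words should score higher than unrelated ones:

In[4]:=
embedding = NetModel["ConceptNet Numberbatch Word Vectors V17.06"];
{cat, dog, car} = embedding[{"cat", "dog", "car"}];
{1 - CosineDistance[cat, dog], 1 - CosineDistance[cat, car]}
Out[4]=

As noted in the description, tokens outside the vocabulary are encoded as the zero vector. One way to check this is to compare vector norms ("xyzzyq" below is an arbitrary out-of-vocabulary string used for illustration; its norm should be 0):

In[5]:=
Norm /@ NetModel["ConceptNet Numberbatch Word Vectors V17.06"][{"hello", "xyzzyq"}]
Out[5]=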

Use the embedding layer inside a NetChain:

In[6]:=
chain = NetChain[{NetModel["ConceptNet Numberbatch Word Vectors V17.06"], LongShortTermMemoryLayer[10]}]
Out[6]=
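
The LongShortTermMemoryLayer in this chain has uninitialized weights, so the chain cannot be evaluated as is. As a sketch, NetInitialize assigns random weights so the chain can be applied; the resulting outputs are arbitrary until the chain is trained:

In[7]:=
NetInitialize[chain][{"hello", "world"}]
Out[7]=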

Requirements

Wolfram Language 11.3 (March 2018) or above

Resource History

Reference