Resource retrieval
Get the pre-trained net:
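A minimal sketch using NetModel; the model name is assumed to be the one used in the Wolfram Neural Net Repository:

```wolfram
net = NetModel["ELMo Contextual Word Representations Trained on 1B Word Benchmark"]
```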
Basic usage
For each token, the net produces three length-1024 feature vectors: one that is context-independent (port "Embedding") and two that are contextual (ports "ContextualEmbedding/1" and "ContextualEmbedding/2").
Input strings are tokenized: they are split into words and punctuation marks:
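For example (assuming `net` is the model retrieved above), each port returns one 1024-dimensional vector per token:

```wolfram
embeddings = net["Hello world! I am here"];

(* one row per token, including the punctuation mark, on each port *)
Map[Dimensions, embeddings]
```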
Pre-tokenized inputs can be given using TextElement:
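A pre-tokenized input bypasses the built-in tokenizer:

```wolfram
net[TextElement[{"Hello", "world", "!"}]]
```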
The contextual representation of the same word differs between sentences. Extract the embeddings for a second sentence:
The context-independent embedding of a word is the same regardless of the surrounding text. For instance, for the word "Hello":
The context-dependent embeddings are different for the same word in two different sentences:
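A sketch comparing the two kinds of embeddings for the word "Hello" in two sentences (assuming `net` is the model retrieved earlier):

```wolfram
embeddings1 = net["Hello world! I am here"];
embeddings2 = net["Hello everybody"];

(* the context-independent vectors for "Hello" coincide *)
embeddings1["Embedding"][[1]] == embeddings2["Embedding"][[1]]

(* the contextual vectors for "Hello" do not *)
embeddings1["ContextualEmbedding/1"][[1]] == embeddings2["ContextualEmbedding/1"][[1]]
```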
The recommended usage is to take a (possibly weighted) average of the embeddings:
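For instance, an unweighted mean of the three ports gives a single 1024-dimensional vector per token:

```wolfram
elmo = Mean[Values[net["Hello world!"]]];

(* {number of tokens, 1024} *)
Dimensions[elmo]
```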
Word analogies without context
Extract the non-contextual part of the net:
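One possible extraction, assuming the context-independent part is exposed as a subnet named "embedding"; the actual layer name should be checked by inspecting the net's structure:

```wolfram
(* "embedding" is an assumed layer name; display net to find the real one *)
nonContextual = NetExtract[net, "embedding"]
```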
Precompute the context-independent embeddings for a list of common words (if available, set TargetDevice -> "GPU" for faster evaluation):
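A sketch assuming `nonContextual` is the context-independent subnet from the previous step:

```wolfram
words = WordList[];  (* common English words *)
vecs = nonContextual[words];  (* add TargetDevice -> "GPU" if a GPU is available *)
word2vec = AssociationThread[words -> Normal[vecs]];
```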
Find the five nearest words to "king":
Man is to king as woman is to:
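Both queries can be answered with Nearest on the `word2vec` association built above:

```wolfram
(* five nearest words to "king" *)
Nearest[word2vec, word2vec["king"], 5]

(* analogy: king - man + woman *)
Nearest[word2vec, word2vec["king"] - word2vec["man"] + word2vec["woman"], 5]
```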
Visualize the similarity between the words using the net as a feature extractor:
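A sketch using FeatureSpacePlot with the non-contextual subnet as feature extractor; the word list is illustrative:

```wolfram
FeatureSpacePlot[
  {"king", "queen", "man", "woman", "boy", "girl", "apple", "orange"},
  FeatureExtractor -> nonContextual
]
```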
Word analogies in context
Define a function that shows the word in context along with the average of its embeddings:
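One possible definition (the function name is illustrative), using TextElement so that token positions are guaranteed to align with the tokenization:

```wolfram
(* average of the three embeddings for a given word in a given sentence *)
wordInContext[word_String, sentence_String] := Module[{tokens, pos},
  tokens = TextWords[sentence];
  pos = First @ FirstPosition[tokens, word];
  Mean[Values[net[TextElement[tokens]]]][[pos]]
]

wordInContext["play", "I play the piano"]
```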
Check the result on a sentence:
Define a function to find the nearest word in context in a set of sentences, for a given word in context:
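A sketch of such a function (names and the candidate sentences are illustrative); it embeds every word of every candidate sentence and returns the token nearest to the query word's averaged embedding:

```wolfram
nearestWordInContext[{word_String, sentence_String}, corpus_List] :=
 Module[{embed, query, tokens, vecs},
  (* averaged ELMo embedding for each token of a sentence *)
  embed[s_] := Mean[Values[net[TextElement[TextWords[s]]]]];
  query = embed[sentence][[First @ FirstPosition[TextWords[sentence], word]]];
  tokens = Flatten[TextWords /@ corpus];
  vecs = Flatten[Normal /@ (embed /@ corpus), 1];
  First @ Nearest[vecs -> tokens, query]
 ]

nearestWordInContext[{"play", "I play the piano"},
 {"She performs music on stage", "The set of values is empty"}]
```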
Find the semantically nearest word to the word "play" in "I play the piano":
Find the semantically nearest word to the word "set" in "The set of values higher than a threshold":
Train a model with the word embeddings
Take a text-processing dataset:
Pre-compute the ELMo vectors on the training and validation datasets (a GPU is recommended, if available):
Define a network that takes word vectors instead of strings for the text-processing task:
Train the network on the pre-computed ELMo vectors:
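A sketch of such a classifier; the architecture, the class labels, and the `trainEmbeddings`/`validationEmbeddings` datasets (sequences of 1024-dimensional vectors with class labels, from the pre-computation step) are illustrative assumptions:

```wolfram
classifier = NetChain[{
   LongShortTermMemoryLayer[64],
   SequenceLastLayer[],
   LinearLayer[2],
   SoftmaxLayer[]},
  "Input" -> {"Varying", 1024},
  "Output" -> NetDecoder[{"Class", {"negative", "positive"}}]  (* assumed classes *)
]

trained = NetTrain[classifier, trainEmbeddings, ValidationSet -> validationEmbeddings]

(* error rate on the validation data *)
NetMeasurements[trained, validationEmbeddings, "ErrorRate"]
```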
Check the classification error rate on the validation data:
Compare the results with the performance of the same model trained on context-independent embeddings:
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
Obtain the layer type counts:
Display the summary graphic:
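The four inspections above correspond to standard NetInformation properties:

```wolfram
NetInformation[net, "ArraysElementCounts"]
NetInformation[net, "ArraysTotalElementCount"]
NetInformation[net, "LayerTypeCounts"]
NetInformation[net, "SummaryGraphic"]
```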
Export to MXNet
Export the net into a format that can be opened in MXNet:
Export also creates a net.params file containing parameters:
Get the size of the parameter file:
The size is similar to the byte count of the resource object:
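The export steps above can be sketched as follows; the comparison against the resource object's byte count assumes the repository model name used earlier:

```wolfram
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], net, "MXNet"]

(* the parameters are written next to the JSON file *)
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}];
FileByteCount[paramPath]

(* compare with the byte count of the resource object *)
ResourceObject[
  "ELMo Contextual Word Representations Trained on 1B Word Benchmark"]["ByteCount"]
```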