Function Repository Resource:

SentenceBERTEmbedding

Compute a sentence embedding for a piece of text

Contributed by: Arnoud Buzing

ResourceFunction["SentenceBERTEmbedding"][string]

embeds a string into a NumericArray.

ResourceFunction["SentenceBERTEmbedding"][list]

embeds each string in list, giving a two-dimensional array with one embedding per row.

Details

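Uses Sentence-BERT (SBERT) to embed the given text, so that sentences with similar meanings map to nearby vectors.
For a single string, returns a NumericArray that represents the text as a vector in a 384-dimensional embedding space.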

Examples

Basic Examples (2) 

Generate sentence embeddings for a few strings:

In[1]:=
e1 = ResourceFunction["SentenceBERTEmbedding"][
  "Tap into the Future with Wolfram Language."]
Out[1]=
In[2]:=
e2 = ResourceFunction["SentenceBERTEmbedding"][
  "Transform your Ideas into Reality with Wolfram Language"]
Out[2]=
In[3]:=
e3 = ResourceFunction["SentenceBERTEmbedding"][
  "Get Ahead. Stay Empowered. Choose Wolfram Language."]
Out[3]=

These embeddings are positively correlated: each cosine distance is less than 1, which means the cosine similarity (1 - CosineDistance) is positive:

In[4]:=
{CosineDistance[e1, e2], CosineDistance[e2, e3], CosineDistance[e3, e1]}
Out[4]=
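For reference, CosineDistance[u, v] is 1 minus the cosine of the angle between u and v. It ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite directions), so a distance below 1 corresponds to a positive cosine similarity. The small vectors below are illustrative only and do not come from the embedding model; cosineSimilarity is a hypothetical helper name, not part of SentenceBERTEmbedding:

```wolfram
(* Illustrative 2D vectors, not SBERT embeddings *)
u = {1, 0}; v = {0, 1}; w = {-1, 0};

(* CosineDistance[a, b] == 1 - a.b/(Norm[a] Norm[b]) *)
{CosineDistance[u, u], CosineDistance[u, v], CosineDistance[u, w]}
(* {0, 1, 2} *)

(* Hypothetical helper: recover cosine similarity from the distance *)
cosineSimilarity[a_, b_] := 1 - CosineDistance[a, b]
cosineSimilarity[{1, 1}, {1, 0}]
(* 1/Sqrt[2] *)
```

Applied to the sentence embeddings, cosineSimilarity[Normal[e1], Normal[e2]] would give the similarity that corresponds to the first distance above.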

Scope (2) 

Compute the embeddings for a list of strings:

In[5]:=
sentences = TextSentences[ExampleData[{"Text", "AliceInWonderland"}]];
embeddings = ResourceFunction["SentenceBERTEmbedding"][sentences]
Out[6]=

Plot the matrix of pairwise cosine distances, comparing each sentence embedding with every other one:

In[7]:=
ArrayPlot[
 Outer[CosineDistance, Normal[embeddings], Normal[embeddings], 1]]
Out[7]=
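Outer calls CosineDistance once per pair of sentences, which can be slow for long texts. One possible alternative, sketched here under the assumption that Normal[embeddings] is an ordinary numeric matrix, is to normalize each row and take a single matrix product: for unit vectors the cosine similarity is the dot product, so the distance matrix is 1 minus the Gram matrix. A random matrix stands in for real embeddings so the comparison runs offline, and cosineDistanceMatrix is a hypothetical helper name:

```wolfram
(* Hypothetical helper: all pairwise cosine distances between the rows of m *)
cosineDistanceMatrix[m_?MatrixQ] := Module[{unit = Normalize /@ m},
  (* For unit vectors, cosine similarity is just the dot product *)
  1 - unit . Transpose[unit]
  ]

(* Random stand-in for Normal[embeddings]: 5 "sentences" x 384 dimensions *)
SeedRandom[42];
m = RandomReal[{-1, 1}, {5, 384}];

(* Agrees with the Outer-based computation up to floating-point error *)
Max[Abs[cosineDistanceMatrix[m] - Outer[CosineDistance, m, m, 1]]] < 10^-10
(* True *)
```

With real data, ArrayPlot[cosineDistanceMatrix[Normal[embeddings]]] should produce the same plot as the Outer version.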

Applications (2) 

Implement a simple semantic search:

In[8]:=
sentences = TextSentences[ExampleData[{"Text", "AliceInWonderland"}]];
embeddings = ResourceFunction["SentenceBERTEmbedding"][sentences];
query = "sentences where Alice speaks to the Rabbit";
embedding = Normal[ResourceFunction["SentenceBERTEmbedding"][query]];

Find the 10 sentences in Alice in Wonderland that most closely match the query (those with the smallest cosine distance to it):

In[9]:=
Grid[Take[SortBy[
    MapThread[{#1, CosineDistance[embedding, #2]} &, {sentences, Normal[embeddings]}],
    Last], 10],
 Alignment -> {Left, "."}, BaseStyle -> {FontSize -> 10}]
Out[9]=
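The same ranking can also be expressed with Nearest, which accepts a data -> labels rule and a DistanceFunction option. This is one possible alternative, not necessarily how the example above or SentenceBERTEmbedding works internally. The sketch uses tiny made-up 2D vectors in place of sentence embeddings so it runs without downloading the model:

```wolfram
(* Made-up 2D vectors standing in for sentence embeddings *)
vectors = {{1., 0.}, {0., 1.}, {1., 1.}};
labels = {"a", "b", "c"};
queryVector = {1., 0.1};

(* Labels of the 2 vectors closest to queryVector by cosine distance, nearest first *)
Nearest[vectors -> labels, queryVector, 2, DistanceFunction -> CosineDistance]
(* {"a", "c"} *)
```

With real data, the analogous call would be Nearest[Normal[embeddings] -> sentences, embedding, 10, DistanceFunction -> CosineDistance], which returns the matching sentences without their distances.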

Publisher

Arnoud Buzing

Version History

  • 1.0.0 – 20 December 2023
