Function Repository Resource:

JaroSimilarity


Compute the Jaro similarity between two strings

Contributed by: Arnoud Buzing

ResourceFunction["JaroSimilarity"][s1,s2]

computes the Jaro similarity between strings s1 and s2.

Details

Computes the Jaro similarity between strings s1 and s2, returning a value between 0 and 1, where 1 indicates identical strings and 0 indicates that the strings have no matching characters. The similarity is determined from the number of matching characters and the number of transpositions between the two strings.
A character of s1 and a character of s2 are considered matching if they are the same and their positions differ by at most half the length of the longer string, rounded down, minus one: Floor[Max[StringLength[s1],StringLength[s2]]/2]-1.
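The matching rule can be sketched in Wolfram Language as follows. jaroMatches is a hypothetical helper, not part of the resource function. It assumes the common greedy convention in which each character of s1 is paired with the first unused equal character of s2 inside the window, and it clamps the window at 0 (also an assumption) so that one-character strings do not produce a negative window.

```wolfram
(* Hypothetical helper, not part of ResourceFunction["JaroSimilarity"]:
   returns {matched characters of s1, matched characters of s2}, each in string order *)
jaroMatches[s1_String, s2_String] := Module[
  {c1 = Characters[s1], c2 = Characters[s2], window, used, m1 = {}, j},
  (* matching window; clamped at 0 (assumption) for very short strings *)
  window = Max[0, Floor[Max[Length[c1], Length[c2]]/2] - 1];
  (* marks which characters of s2 have already been matched *)
  used = ConstantArray[False, Length[c2]];
  Do[
   (* first unused character of s2 equal to c1[[i]] within the window around position i *)
   j = SelectFirst[
     Range[Max[1, i - window], Min[Length[c2], i + window]],
     ! used[[#]] && c2[[#]] === c1[[i]] &,
     None];
   If[j =!= None, used[[j]] = True; AppendTo[m1, c1[[i]]]],
   {i, Length[c1]}];
  {m1, Pick[c2, used]}]

jaroMatches["DWAYNE", "DUANE"]
(* {{"D", "A", "N", "E"}, {"D", "A", "N", "E"}} *)
```

For "MARTHA" and "MARHTA" the same helper returns {{"M","A","R","T","H","A"},{"M","A","R","H","T","A"}}: all six characters match, and the two lists differ only where T and H are swapped.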
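A transposition occurs when matching characters appear in a different order in the two strings. Transposed characters are found by listing the matching characters of s1 and of s2 in the order they occur in each string and comparing the two lists position by position; each position where they differ contributes one transposed character.
The Jaro similarity is given by $\mathrm{sim}_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)$, with $\mathrm{sim}_j = 0$ when $m = 0$, where $|s_1|$ and $|s_2|$ are the lengths of the strings, $m$ is the number of matching characters, and $t$ is half the number of transposed characters.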
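Putting the matching rule, the transposition count and the formula together gives a complete, if unoptimized, sketch. jaroSimilarity is a hypothetical name and this is not the resource function's actual source; it assumes greedy left-to-right matching and a window clamped at 0, and it returns exact rationals rather than machine numbers.

```wolfram
(* Hypothetical re-implementation for illustration; not the source of ResourceFunction["JaroSimilarity"] *)
jaroSimilarity[s1_String, s2_String] := Module[
  {c1 = Characters[s1], c2 = Characters[s2], window, used, m1 = {}, m2, m, t, j},
  (* matching window, clamped at 0 (assumption) *)
  window = Max[0, Floor[Max[Length[c1], Length[c2]]/2] - 1];
  used = ConstantArray[False, Length[c2]];
  (* greedily match each character of s1 to the first unused equal character of s2 in the window *)
  Do[
   j = SelectFirst[
     Range[Max[1, i - window], Min[Length[c2], i + window]],
     ! used[[#]] && c2[[#]] === c1[[i]] &,
     None];
   If[j =!= None, used[[j]] = True; AppendTo[m1, c1[[i]]]],
   {i, Length[c1]}];
  (* matched characters of s2, in s2 order *)
  m2 = Pick[c2, used];
  m = Length[m1];
  (* t: half the number of positions where the two matched sequences differ *)
  t = Count[MapThread[UnsameQ, {m1, m2}], True]/2;
  If[m == 0, 0, (1/3) (m/Length[c1] + m/Length[c2] + (m - t)/m)]]

jaroSimilarity["MARTHA", "MARHTA"]
(* 17/18 *)
```

Wrapping the result in N gives a decimal; for example, N[jaroSimilarity["DWAYNE", "DUANE"]] evaluates 37/45, about 0.822.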

Examples

Basic Examples (5) 

Identical strings have a similarity of 1.0:

In[1]:=
ResourceFunction["JaroSimilarity"]["CRATE", "CRATE"]
Out[1]=

These two strings have no similarity:

In[2]:=
ResourceFunction["JaroSimilarity"]["RAISE", "CLOUD"]
Out[2]=
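The reason is visible with built-in functions alone: the two words share no letters, so there are no matching characters (m = 0), and the Jaro similarity is 0 in that case.

```wolfram
(* letters common to both words; an empty list means m = 0 *)
Intersection[Characters["RAISE"], Characters["CLOUD"]]
(* {} *)
```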

These two strings have some similarity:

In[3]:=
ResourceFunction["JaroSimilarity"]["CRATE", "TRACE"]
Out[3]=
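Assuming the function follows the formula in Details, this value can be checked by hand. Both strings have length 5, so the matching window is Floor[5/2]-1 = 1. R, A and E then match in place, while C and T are three positions apart and do not match, so m = 3 and t = 0:

```wolfram
(* m = 3 matching characters (R, A, E), t = 0, both lengths 5 *)
N[(1/3)*(3/5 + 3/5 + (3 - 0)/3)]
(* 0.733333 *)
```

The exact value is 11/15.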

These two strings are similar but contain a transposition. All 6 characters match, and 2 of them (T and H) are transposed, so t = 1:

In[4]:=
ResourceFunction["JaroSimilarity"]["MARTHA", "MARHTA"]
Out[4]=

This agrees with the Jaro similarity formula for m = 6 and t = 1:

In[5]:=
N[(1/3)*(6/6 + 6/6 + (6 - 1)/6)]
Out[5]=

Compute the Jaro similarity of two strings of unequal length:

In[6]:=
ResourceFunction["JaroSimilarity"]["DWAYNE", "DUANE"]
Out[6]=

There are four matching characters (D, A, N, E) and no transposed characters, so m = 4 and t = 0 in the formula:

In[7]:=
(1/3)*(4/6 + 4/5 + (4 - 0)/4) // N
Out[7]=

Scope (1) 

Generate 10 random words and plot the Jaro similarity between every pair of them:

In[8]:=
words = RandomWord[10]
Out[8]=
In[9]:=
Outer[ResourceFunction["JaroSimilarity"], words, words] // ArrayPlot
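To pick out the closest pair rather than plotting the whole matrix, one option is to rank all unordered pairs by their similarity. This is a sketch using built-in functions plus the resource function; the variable names pairs and closest are illustrative.

```wolfram
words = RandomWord[10];
(* all unordered pairs of list entries, never pairing an entry with itself *)
pairs = Subsets[words, {2}];
(* the pair(s) with the highest Jaro similarity *)
closest = MaximalBy[pairs, Apply[ResourceFunction["JaroSimilarity"]]]
```

Because the words are random, the result changes from run to run. MaximalBy returns a list so that tied pairs are all kept.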
Out[9]=

Publisher

Arnoud Buzing

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 25 November 2024

Related Resources

Author Notes

This code was generated with the assistance of the Wolfram AI LLM kit.

License Information