Function Repository Resource:

Soundex

Compute the Soundex phonetic identifier of a word

Contributed by: Anton Antonov

ResourceFunction["Soundex"][word]

finds the Soundex phonetic identifier of the string word.

Details

The goal of the Soundex algorithm is to encode homophones with the same identifier, so that same-sounding words match despite spelling variants or misspellings.
ResourceFunction["Soundex"] has the attribute Listable.
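
To make the idea of homophones sharing an identifier concrete, the following is a minimal sketch of the classic American Soundex rules (keep the first letter, map the remaining consonants to digits, collapse repeated codes, drop vowels, pad to three digits). It is not the code behind ResourceFunction["Soundex"]. The name soundexSketch is hypothetical, and the sketch assumes plain Latin letters, ignores every non-letter character, and always returns one letter followed by three digits; the resource function may differ on any of these points (for example, it keeps digits in its result).

```wolfram
(* Hypothetical helper illustrating classic American Soundex;
   NOT the implementation of ResourceFunction["Soundex"]. *)
SetAttributes[soundexSketch, Listable];

soundexSketch[word_String] := Module[{chars, rules, codes, digits},
  (* Assumption: lowercase the input and ignore anything that is not a letter *)
  chars = Select[Characters[ToLowerCase[word]], LetterQ];
  (* Consonant groups share a digit; h and w are skipped;
     every other letter (the vowels and y) acts as a separator "0" *)
  rules = Flatten[{
     Thread[{"b", "f", "p", "v"} -> "1"],
     Thread[{"c", "g", "j", "k", "q", "s", "x", "z"} -> "2"],
     Thread[{"d", "t"} -> "3"],
     "l" -> "4",
     Thread[{"m", "n"} -> "5"],
     "r" -> "6",
     Thread[{"h", "w"} -> "skip"],
     _String -> "0"}];
  If[chars === {},
   "",
   codes = Replace[chars, rules, {1}];
   (* h and w do not separate equal codes, so remove them after the first letter *)
   codes = Prepend[DeleteCases[Rest[codes], "skip"], First[codes]];
   (* Collapse runs of equal codes, drop the first letter's own code, drop separators *)
   digits = DeleteCases[Rest[First /@ Split[codes]], "0"];
   (* First letter plus exactly three digits, zero-padded or truncated *)
   StringJoin[ToUpperCase[First[chars]], PadRight[digits, 3, "0"]]]]
```

Under these rules, soundexSketch[{"Robert", "Rupert"}] evaluates to {"R163", "R163"}, and "horror" and "horur" both map to "H660". These are the outputs of the sketch only; the resource function's output format is not confirmed here.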

Examples

Basic Examples (2) 

Find the Soundex identifier of a word:

In[1]:=
ResourceFunction["Soundex"]["horror"]
Out[1]=

Find the Soundex identifier of a misspelling of the same word; the result matches the previous one:

In[2]:=
ResourceFunction["Soundex"]["horur"]
Out[2]=

Scope (2) 

The character case does not matter:

In[3]:=
ResourceFunction["Soundex"]["dancer"]
Out[3]=
In[4]:=
ResourceFunction["Soundex"]["DANCER"]
Out[4]=

Digits are not processed and become part of the result:

In[5]:=
ResourceFunction["Soundex"]["d4ncer"]
Out[5]=

Soundex has the attribute Listable:

In[6]:=
ResourceFunction["Soundex"][{"dancer", "danncer", "Bouncer"}]
Out[6]=

Applications (4) 

Compute an association of dictionary words and corresponding Soundex codes:

In[7]:=
AbsoluteTiming[
 codes = Association@
    Map[# -> ResourceFunction["Soundex"][#] &, DictionaryLookup["*"]];
 ]
Out[7]=

Show the 20 most frequent Soundex codes:

In[8]:=
top = TakeLargestBy[Tally[Values[codes]], #[[2]] &, 20]
Out[8]=

Plot the cumulative share of words covered by the codes, most frequent first, to show that the Soundex codes adhere to the Pareto principle:

In[9]:=
freqs = SortBy[Tally[Values[codes]], -#[[2]] &][[All, 2]];
freqs = Accumulate[freqs]/Total[freqs];
ListLinePlot[freqs, PlotTheme -> "Detailed"]
Out[11]=
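
The plot can be complemented by a number. The following sketch computes the fraction of distinct codes needed to cover a given share of all words. The helper name coverageShare and the default target of 0.8 are assumptions made for illustration; they are not part of the resource function.

```wolfram
(* Hypothetical helper: fraction of distinct codes needed to
   cover at least `target` of all words, most frequent codes first *)
coverageShare[counts_List, target_ : 0.8] := Module[{cum},
  (* Cumulative share of words, starting with the largest count *)
  cum = Accumulate[Reverse[Sort[counts]]]/Total[counts];
  (* Count the codes below the target, plus the one that reaches it *)
  N[(LengthWhile[cum, # < target &] + 1)/Length[counts]]]
```

With the association codes computed above, coverageShare[Values[Counts[Values[codes]]]] gives the share of codes that accounts for 80% of the dictionary words; a value well below 0.8 is consistent with a Pareto-like distribution.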

Show the words that share the 20th most frequent Soundex code:

In[12]:=
GroupBy[Normal[codes], #[[2]] &, #[[All, 1]] &][top[[20, 1]]]
Out[12]=

Neat Examples (4) 

Pick some words:

In[13]:=
SeedRandom[332];
words = RandomWord["CommonWords", 12]
Out[13]=

Introduce random misspellings:

In[14]:=
misspelled = MapThread[
  RandomChoice[{
     StringDrop[#1, {#2}],
     StringReplacePart[#1, RandomChoice[CharacterRange["a", "x"]], {#2, #2}]
     }] &,
  {words, RandomInteger[{1, StringLength[#]}] & /@ words}]
Out[14]=

Compare the corresponding Soundex codes for both word sets:

In[15]:=
Tally[MapThread[
  ResourceFunction["Soundex"][#1] == ResourceFunction["Soundex"][#2] &, {words, misspelled}]]
Out[15]=

Check whether the Soundex code of each original word appears among the Soundex codes of the spelling-correction candidates for its misspelling:

In[16]:=
MapThread[
 MemberQ[ResourceFunction["Soundex"] /@ SpellingCorrectionList[#2], ResourceFunction["Soundex"][#1]] &, {words, misspelled}]
Out[16]=

Publisher

Anton Antonov

Version History

  • 1.1.0 – 09 November 2021
  • 1.0.0 – 03 October 2019

Author Notes

Soundex is roughly a century old, and several newer phonetic algorithms refine its core idea.
