Function Repository Resource:

LetterFrequencyData

Get the frequency of letters appearing in texts

Contributed by: Sander Huisman

ResourceFunction["LetterFrequencyData"][]

gives the frequency of letters appearing in English texts.

ResourceFunction["LetterFrequencyData"][lang]

gives the frequency of letters for the language lang.

ResourceFunction["LetterFrequencyData"][All]

gives the list of available languages.

Details

Letter frequencies can be used to break simple ciphers such as the Caesar and Vigenère ciphers.
Data is taken from Wikipedia and converted to lower case with diacritics removed, so that we only consider the letters a–z.
Each frequency is given as a probability, so the values for a given language add up to 1.
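
As an illustration of the cipher use case, the following is a minimal sketch of recovering a Caesar shift by comparing letter distributions. It assumes that the function returns an association from the letters a–z to probabilities. The helper names caesarShift and letterFrequencies, and the choice of Correlation as the similarity score, are hypothetical choices made for this sketch and are not part of the resource function.

```wolfram
(* Assumption: the result is an association "a" -> p, ..., "z" -> p *)
english = ResourceFunction["LetterFrequencyData"]["English"];
alphabet = CharacterRange["a", "z"];

(* Hypothetical helper: shift every letter a-z forward by k places, wrapping around *)
caesarShift[text_String, k_Integer] :=
  StringReplace[text, Thread[alphabet -> RotateLeft[alphabet, k]]];

(* Hypothetical helper: relative frequencies of a-z in a text, in alphabetical order *)
letterFrequencies[text_String] :=
  Normalize[N@Lookup[CharacterCounts[ToLowerCase[text]], alphabet, 0], Total];

(* Encrypt a sample text with a shift of 3 *)
cipher = caesarShift[ToLowerCase@ExampleData[{"Text", "Hamlet"}], 3];

(* Undo each of the 26 possible shifts and keep the one whose
   letter distribution correlates best with the English data *)
bestShift = First@MaximalBy[Range[0, 25],
    Correlation[letterFrequencies[caesarShift[cipher, -#]],
      Lookup[english, alphabet]] &]
```

For a long English text, the candidate with the highest correlation would be expected to match the shift used for encryption. Short texts give noisier letter counts, so the best-scoring shift is only a guess there.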

Examples

Basic Examples (1) 

Ask for the frequency of the letters in English texts:

In[1]:=
ResourceFunction["LetterFrequencyData"][]
Out[1]=

Scope (2) 

Ask for the frequency of the letters in Spanish texts:

In[2]:=
ResourceFunction["LetterFrequencyData"]["Spanish"]
Out[2]=

Query the supported languages:

In[3]:=
ResourceFunction["LetterFrequencyData"][All]
Out[3]=

Neat Examples (2) 

Compare the letter frequencies of a sample text (Hamlet) with the English data:

In[4]:=
hamlet = ExampleData[{"Text", "Hamlet"}];
hamlet //= RemoveDiacritics/*ToLowerCase/*CharacterCounts;
hamlet //= KeyTake[CharacterRange["a", "z"]];
hamlet /= Total[N@Values[hamlet]];
ListPlot[
 Transpose@
  Values[Merge[{hamlet, ResourceFunction["LetterFrequencyData"]["English"]}, List]][[All, 1]],
 PlotRange -> All, Frame -> True,
 FrameTicks -> {{Automatic, Automatic}, {MapThread[{#1, #2} &, {Range[26], CharacterRange["a", "z"]}], None}},
 PlotLegends -> SwatchLegend[{"Hamlet", "LetterFrequencyData"}]]
Out[8]=

Find the three most likely languages for Hamlet, based only on its letter frequencies:

In[9]:=
hamlet = ExampleData[{"Text", "Hamlet"}];
hamlet //= RemoveDiacritics/*ToLowerCase/*CharacterCounts;
hamlet //= Association[
    Thread[CharacterRange["a", "z"] -> Lookup[#, CharacterRange["a", "z"], 0]]] &;
hamlet /= Total[N@Values[hamlet]];
TakeLargestBy[
  {#, Correlation[Values[ResourceFunction["LetterFrequencyData"][#]], Values[hamlet]]} & /@
   ResourceFunction["LetterFrequencyData"][All], Last, 3] // Grid
Out[13]=

Publisher

SHuisman

Version History

  • 1.0.0 – 19 April 2022
