Function Repository Resource:

HuffmanTree

Source Notebook

Find a Huffman tree encoding from an input string

Contributed by: Shenghui Yang

ResourceFunction["HuffmanTree"]["string"]

gives an optimal binary Huffman tree encoding for a string.

Details and Options

Huffman encoding is featured in A New Kind of Science, page 1071.
HuffmanTree returns a Tree object by recursive construction.
HuffmanTree uses the resource function HuffmanEncode internally.
HuffmanTree use the convention descibed in the Wolfram MathWorld article for Huffman coding.
HuffmanTree accepts the same options as Tree.
Distinct strings can lead to the same tree. The encoding is not reversible.

Examples

Basic Examples (1) 

Create the Huffman tree from an input string:

In[1]:=
ResourceFunction["HuffmanTree"]["BCCABBDDAECCBBAEDDCC"]
Out[1]=

Scope (1) 

Create a case-insensitive Huffman tree:

In[2]:=
ResourceFunction["HuffmanTree"][ToUpperCase["Mississippi_River"]]
Out[2]=

Options (1) 

HuffmanTree accepts all options of Tree:

In[3]:=
ResourceFunction["HuffmanTree"]["BCCABBDDAECCBBAEDDCC", TreeElementLabelStyle -> (All -> Directive[Blue, Large])]
Out[3]=

Properties and Relations (2) 

The Huffman tree is the graph representation for the Huffman code words for the input string:

In[4]:=
ResourceFunction["HuffmanEncode"]["BCCABBDDAECCBBAEDDCC"]
Out[4]=

For instance the binary code word for "A" is 101 from above and it is retrieved in this way from the Huffman tree:

In[5]:=
With[{myTree = ResourceFunction["HuffmanTree"]["BCCABBDDAECCBBAEDDCC"]}, Sequence @@@ (TreePosition[myTree, "A"][[1]])]
Out[5]=

Possible Issues (4) 

The input cannot be an empty string:

In[6]:=
ResourceFunction["HuffmanTree"][""]
Out[6]=

Single character or a string with all same character are not supported:

In[7]:=
ResourceFunction["HuffmanTree"]["a"]
Out[7]=
In[8]:=
ResourceFunction["HuffmanTree"]["aaa"]
Out[8]=

Two or more types of characters are valid:

In[9]:=
{ResourceFunction["HuffmanTree"]["ab"], ResourceFunction["HuffmanTree"]["abcc"]}
Out[9]=

The Huffman tree might not be unique for the same input if the data has the same sum of weights for some internal nodes and different conventions for left and right branches. Here, the two trees above shares the same depth information for each character:

In[10]:=
{ResourceFunction["HuffmanTree"]["BCCABBDDAECCBBAEDDCC"],
 Tree[
  <|0 -> Tree[<|0 -> Tree[<|0 -> "E", 1 -> "A"|>], 1 -> "D"|>], 1 -> Tree[<|0 -> "B", 1 -> "C"|>]|>]}
Out[10]=

Different strings can lead to the same tree:

In[11]:=
Row@{ResourceFunction["HuffmanTree"]["ab"], ResourceFunction["HuffmanTree"]["ababababab"]}
Out[11]=

Neat Examples (4) 

Huffman tree is an efficient data structure to represent mood status of an emotional cat. Use emoji to represent emotions:

In[12]:=
cr = CharacterRange["😸", "🙀"];
Style[#, 20] & /@ cr
Out[13]=

Create a long string to track the status for 50000 data points:

In[14]:=
SeedRandom[1234];
weight = RandomSample[Range[9]];
str = StringJoin@RandomChoice[weight -> cr, 50000];

The cat's mood from first 15 observations:

In[15]:=
Style[#, 20] & /@ StringPart[str, 1 ;; 15]
Out[15]=

Generate a tree based on the optimal binary Huffman encoding for all observations:

In[16]:=
Tree[ResourceFunction["HuffmanTree"][str], TreeElementLabelStyle -> All -> 32]
Out[16]=

Publisher

Shenghui Yang

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.0 – 31 January 2024

Source Metadata

Related Resources

License Information