# Wolfram Function Repository

Instant-use add-on functions for the Wolfram Language

Function Repository Resource:

Generate a tree or an association with properties for non-equivalent compact binary Huffman codes

Contributed by:
Shenghui Yang

ResourceFunction["DistinctCompactBinaryHuffmanCode"][ returns an association with properties about | |

ResourceFunction["DistinctCompactBinaryHuffmanCode"][ returns the number of non-equivalent compact binary Huffman trees with | |

ResourceFunction["DistinctCompactBinaryHuffmanCode"][ generates the compact Huffman tree given the proper binary code | |

ResourceFunction["DistinctCompactBinaryHuffmanCode"][ generates the step-by-step construction for the compact Huffman tree. |

The *n*-bit code (0^{r0}10^{r1}10^{r2}10^{r3}… 10^{ru-1}10^{ru}), where 0^{r} represents *r* consecutive zeros, gives an enumeration of non-equivalent Huffman trees with *n*+2 leaves. It contains L 1’s and satisfies the conditions *r*_{0}=0, *r*_{i}>=0, *r*_{0}+*r*_{1}+*r*_{2}+…+*r*_{u-1}+*r*_{u}+L=*n* and *r*_{i+1}<=2·*r*_{i}+1 for all 0<=*i*<=*u*.

The procedure for constructing a binary tree from a given Huffman code is as follows: Start from a simple tree with one root and two leaves. Every time we read 1 in the list, attach a two-leaf simple tree to the right most leaf on the current tree; when we read 0, attach a two-leaf simple tree to the right most leaf on the upper level (one less in TreeDepth).

The binary code associated with a compact Huffman tree is also the encoded solution for the binary Egyption fraction problem or characteristic sum problem.

Code tree equivalence is not exactly the same as isomorphism in graph theory.

The returned association contains the following properties:

"BitCodeLength" | same as the input value n |

"TreeCount" | number of non-equivalent binary Huffman trees with n+2 leaf nodes |

"TreeLeafCount" | n+2 |

"TreeEnumerationBitVector" | all non-equivalent binary Huffman trees in bit vector form |

ResourceFunction["DistinctCompactBinaryHuffmanCode"] uses compiled code on a bit vector data structure for both memory and time efficiency because of the rapid growth of the data. This function may take some time for initial evaluation due to compilation, but subsequent evaluation is faster than the non-compiled implementation in the same session.

Enumerate all 16 proper six-bit binary codes corresponding to non-equivalent compact Huffman trees with 8 leaves in a "BitVector" data structure:

In[1]:= |

Out[1]= |

List all 16 six-bit binary codes that satisfy the proper conditions given enumeration data in bit vector:

In[2]:= |

Out[3]= |

Find the compact Huffman tree for the proper 8-bit code {1, 1, 0, 1, 0, 0, 0, 1}:

In[4]:= |

Out[4]= |

Demonstrate the process for constructing the tree corresponding to the code {1, 1, 0, 1, 0, 0, 0, 1} as described in Details & Options above:

In[5]:= |

Out[5]= |

DistinctCompactBinaryHuffmanCode provides a way to generate random trees with certain constraints. For instance, for *n*=22, instead of generating all 175586 24-leaf distinct compact Huffman trees at once, we randomly generate an index first and access the corresponding binary code from the bit vector:

In[6]:= |

Out[6]= |

In[7]:= |

In[8]:= |

Out[9]= |

In[10]:= |

Out[10]= |

There is one-to-one correspondence between proper *n*-bit binary codes and (*n*+2)-leaf non-equivalent compact Huffman trees. For example when n=6:

In[11]:= |

Out[14]= |

All trees above have 8=6+2 leaves:

In[15]:= |

Out[15]= |

All trees above have one depth greater than the number of ones in the binary code:

In[16]:= |

Out[16]= |

The number of *n*-bit proper codes and the corresponding trees grow rapidly with bit length *n*. For instance a 60-bit code has more than 7.7 trillion candidates. Therefore use "CountOnly" to bypass the enumeration function and avoid binary code generation:

In[17]:= |

Out[17]= |

DistinctCompactBinaryHuffmanCode returns unevaluated for large bit-code length:

In[18]:= |

Out[18]= |

Only binary code is supported:

In[19]:= |

Out[19]= |

The following example does not meet the proper conditions described above in Details & Options. Reading from left to right, we have a run of one zero and a next run of 6 zeros. Since 6 > 2·1+1, this binary code fails to correspond to a compact Huffman tree:

In[20]:= |

Out[20]= |

The concept of equivalence for compact Huffman trees is not the same as graph isomorphism. The following two trees are considered equivalent Huffman trees (Even and Lempel, 1972), but obviously they are not isomorphic in graph-theoretical sense:

In[21]:= |

Out[22]= |

In[23]:= |

Out[23]= |

Plot a large graph from the 12-dimensional proper codes. Points are connected if their Hamming distances meet a predefined value:

In[24]:= |

In[25]:= |

If the target Hamming distance is 1:

In[26]:= |

Out[26]= |

If the target Hamming distance is 2:

In[27]:= |

Out[27]= |

Visualize the compact codes with ArrayPlot:

In[28]:= |

In[29]:= |

Out[29]= |

- Huffman coding on Wikipedia
- OEIS A002572
- Sequences From Bona's Handbook of Enumerative Combinatorics

Wolfram Language 13.0 (December 2021) or above

- 1.0.0 – 21 February 2024

This work is licensed under a Creative Commons Attribution 4.0 International License