ABSTRACT

Huffman coding is a variable-length, prefix-free encoding scheme. It achieves data compression by assigning shorter bitstrings to more frequent symbols and longer bitstrings to less frequent ones. It is mathematically proven to be an optimal character-by-character encoding method.


The Core Concept

Unlike Fixed Length CBC, where every character is taxed the same amount of bits, Huffman coding is adaptive. It minimizes the total weighted path length of a binary tree, where weights represent character frequencies.

Key Properties

  • Prefix-Free: No codeword is a prefix of any other codeword (e.g., if ‘A’ is 01, no other character can start with 01). This makes the code comma-free or self-delimiting.
  • Greedy Approach: The algorithm builds the tree from the “bottom up” by repeatedly merging the two least frequent nodes.


Building a Huffman Tree: Step-by-Step

1. Initialize and Sort

List all characters and their frequencies. Sort them in decreasing order. If frequencies are tied, use alphabetical order as a tie-breaker.

2. The Merge Loop

Repeat the following until only one tree (the root) remains:

  1. Take the two nodes with the lowest frequencies.
  2. Combine them into a new internal node.
  3. The frequency of the new node is the sum of the two children.
  4. Re-insert this new node into the list and re-sort.

3. Decorate the Tree

Assign a binary value to each edge:

  • Left branch = 0
  • Right branch = 1


Practical Example Analysis

Given the dataset:

CharacterFrequency
A6
B5
C4
D4
E2
F2
G1

Resulting Codewords

LetterFrequencyHuffman CodeLength (bits)
A6012
B5102
C40003
D40013
E21113
F211004
G111014

Efficiency Comparison: Fixed vs. Variable

For a message containing the 24 characters above:

1. Fixed-Length Analysis

  • Alphabet Size: 7 unique characters.
  • Bits per Char: bits.
  • Total Size: .

2. Huffman Analysis

The size is calculated by

Total Savings: compression over fixed-length.


Decoding Logic

Decoding is efficient because of the Prefix-Free property.

  1. Start at the root.
  2. Read the bitstream:
    • If 0, move Left.
    • If 1, move Right.
  3. When you reach a leaf, output that character.
  4. Immediately jump back to the root for the very next bit.