Huffman Coding Algorithm

In the world of data compression, one of the most efficient and widely used techniques is the Huffman Coding Algorithm.

Invented by David A. Huffman in 1952, it uses a greedy approach to generate prefix codes that minimize the overall size of the encoded data.

What is Huffman Coding?

Huffman Coding is a lossless compression algorithm. It assigns variable-length codes to input characters based on their frequency of occurrence. The basic idea is:

  • Characters that occur more frequently are assigned shorter codes.
  • Characters that occur less frequently are assigned longer codes.

This reduces the average length of encoded data and hence achieves compression.

Why Use Trees?

The algorithm builds a special type of binary tree, called a Huffman Tree, to represent the codes.

Each leaf node in the tree represents a character, and the path from the root to the leaf forms its binary code.


Steps of the Huffman Algorithm

Step 1: Create a Priority Queue (Min-Heap)

For each character and its frequency, create a node with:

  • char: the character
  • freq: the frequency
  • left and right: initially null

Insert the node into a min-heap based on frequency.
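Step 1 can be sketched in Python using the standard-library heapq module. The Node class and the example frequencies are illustrative assumptions; the counter is a tie-breaker so the heap never has to compare Node objects directly:

```python
import heapq
import itertools

# Hypothetical Node class matching the fields described above.
class Node:
    def __init__(self, char, freq, left=None, right=None):
        self.char = char      # the character (null/None for internal nodes)
        self.freq = freq      # the frequency
        self.left = left
        self.right = right

freqs = {"a": 5, "b": 2, "c": 1}          # assumed example frequencies

# Build the min-heap keyed on frequency; the counter breaks ties.
counter = itertools.count()
heap = [(f, next(counter), Node(c, f)) for c, f in freqs.items()]
heapq.heapify(heap)

# The lowest-frequency node is now at the front of the heap.
print(heap[0][2].char, heap[0][2].freq)   # → c 1
```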

Step 2: Build the Huffman Tree

  • While the heap contains more than one node:
    • Remove the two nodes with the smallest frequencies.
    • Create a new internal node:
      • freq = freq1 + freq2
      • left = node1, right = node2
      • char = null (since it's not a leaf)
    • Insert the new node back into the min-heap.

After the loop, the remaining node in the heap is the root of the Huffman Tree.
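The loop above can be sketched as a small Python function (a minimal sketch; the Node class and the tie-breaking counter are implementation assumptions, not part of the pseudocode):

```python
import heapq
import itertools

class Node:
    def __init__(self, char, freq, left=None, right=None):
        self.char, self.freq = char, freq
        self.left, self.right = left, right

def build_tree(freqs):
    """Combine the two lowest-frequency nodes until one root remains."""
    counter = itertools.count()             # tie-breaker for equal freqs
    heap = [(f, next(counter), Node(c, f)) for c, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)     # two smallest frequencies
        f2, _, n2 = heapq.heappop(heap)
        merged = Node(None, f1 + f2, n1, n2)  # char = None: internal node
        heapq.heappush(heap, (merged.freq, next(counter), merged))
    return heap[0][2]                       # root of the Huffman Tree

root = build_tree({"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5})
print(root.freq)  # → 100 (the sum of all frequencies)
```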

Step 3: Generate Huffman Codes

  • Define a recursive function generateCodes(node, code):
    • If node is a leaf (i.e., node.char is not null), assign the current code as the Huffman code for the character.
    • Otherwise, recursively call:
      • generateCodes(node.left, code + "0")
      • generateCodes(node.right, code + "1")
  • Call generateCodes(root, "") to start the traversal from the root with an empty code.
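The traversal translates directly into Python. The small hand-built tree here is an assumed example, just to show the recursion in isolation:

```python
class Node:
    def __init__(self, char, freq, left=None, right=None):
        self.char, self.freq = char, freq
        self.left, self.right = left, right

codes = {}

def generate_codes(node, code=""):
    if node is None:
        return
    if node.char is not None:               # leaf: record its code
        codes[node.char] = code
        return
    generate_codes(node.left, code + "0")   # left edge appends "0"
    generate_codes(node.right, code + "1")  # right edge appends "1"

# A small hand-built tree: A on the left, (B, C) under the right child.
root = Node(None, 10,
            Node("A", 5),
            Node(None, 5, Node("B", 3), Node("C", 2)))
generate_codes(root)
print(codes)  # → {'A': '0', 'B': '10', 'C': '11'}
```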

Step 4: Calculate Total Encoded Data Length

  • Initialize totalBits = 0
  • For each character:
    • Get its frequency and code length
    • Multiply: encodedBits = frequency × code length
    • Add encodedBits to totalBits

Step 5: Calculate Average Code Length

  • Compute total frequency: sumFreq = sum of all character frequencies
  • Compute: averageLength = totalBits ÷ sumFreq
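Steps 4 and 5 reduce to a few lines. This sketch assumes the frequencies and codes from the worked example that follows:

```python
# Assumed inputs: frequencies and the Huffman codes derived for them.
freqs = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
codes = {"A": "0", "B": "101", "C": "100", "D": "111",
         "E": "1101", "F": "1100"}

# Step 4: total encoded length = sum of frequency × code length.
total_bits = sum(freqs[c] * len(codes[c]) for c in freqs)

# Step 5: average code length = total bits / total frequency.
average_length = total_bits / sum(freqs.values())

print(total_bits, average_length)  # → 224 2.24
```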

Huffman Coding Calculation

Here’s a step-by-step walkthrough that calculates the Huffman codes, the length of the encoded data, and the average code length for the following frequency table: { A = 45, B = 13, C = 12, D = 16, E = 9, F = 5 }

Character    Frequency
A            45
B            13
C            12
D            16
E            9
F            5

Step 1: Build a Min-Heap (Priority Queue)

Start by placing all characters as leaf nodes in a priority queue, sorted by frequency: [F:5, E:9, C:12, B:13, D:16, A:45]

Step 2: Build the Huffman Tree

We repeatedly remove the two nodes with the lowest frequency, combine them into a new node, and reinsert it into the queue.

Iteration 1:

Remove F (5) and E (9) → New node FE = 5 + 9 = 14

Heap: [C:12, B:13, FE:14, D:16, A:45]

Iteration 2:

Remove C (12) and B (13) → New node CB = 12 + 13 = 25

Heap: [FE:14, D:16, CB:25, A:45]

Iteration 3:

Remove FE (14) and D (16) → New node FED = 14 + 16 = 30

Heap: [CB:25, FED:30, A:45]

Iteration 4:

Remove CB (25) and FED (30) → New node CBFED = 25 + 30 = 55

Heap: [A:45, CBFED:55]

Final Iteration:

Remove A (45) and CBFED (55) → New node Root = 45 + 55 = 100

Now, the tree is complete. Let’s move on to generating codes.

Step 3: Assign Huffman Codes (by Tree Traversal)

Assign:

  • 0 when going left
  • 1 when going right

Start from the root and trace to each character.

             (100)
            /     \
         A(45)   (55)
                /    \
            (25)      (30)
           /   \      /   \
        C(12) B(13)  (14)  D(16)
                     /  \
                  F(5)   E(9)

Now, trace the path from the root to each character:

Character    Code Path                         Huffman Code
A            Left                              0
C            Right → Left → Left               100
B            Right → Left → Right              101
F            Right → Right → Left → Left       1100
E            Right → Right → Left → Right      1101
D            Right → Right → Right             111

Step 4: Calculate Encoded Data Length

Multiply each character's frequency by its code length.

Character    Frequency    Code     Code Length    Frequency × Code Length
A            45           0        1              45 × 1 = 45
B            13           101      3              13 × 3 = 39
C            12           100      3              12 × 3 = 36
D            16           111      3              16 × 3 = 48
E            9            1101     4              9 × 4 = 36
F            5            1100     4              5 × 4 = 20

Total Encoded Data Length = 45 + 39 + 36 + 48 + 36 + 20 = 224 bits

Step 5: Calculate Average Code Length

Use the formula:

  • Average Code Length (bits per symbol) = (Total Encoded Data Length) / (Total Frequency)
  • Total Frequency = 45 + 13 + 12 + 16 + 9 + 5 = 100
  • Average Code Length (bits per symbol) = 224 / 100 = 2.24 bits/character

Final Results:

Huffman Codes

  • A: 0
  • B: 101
  • C: 100
  • D: 111
  • E: 1101
  • F: 1100

Total Encoded Length: 224 bits

Average Code Length: 2.24 bits per character
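The whole worked example can be reproduced with a short Python script (a sketch combining the steps above; in general the exact bit patterns depend on how ties are broken, but this table has no tied frequencies, so the code lengths and totals are stable):

```python
import heapq
import itertools

class Node:
    def __init__(self, char, freq, left=None, right=None):
        self.char, self.freq = char, freq
        self.left, self.right = left, right

def huffman_codes(freqs):
    """Build the Huffman tree and return {character: code}."""
    counter = itertools.count()             # tie-breaker for the heap
    heap = [(f, next(counter), Node(c, f)) for c, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(counter),
                              Node(None, f1 + f2, n1, n2)))
    codes = {}
    def walk(node, code):
        if node.char is not None:           # leaf: record its code
            codes[node.char] = code
            return
        walk(node.left, code + "0")
        walk(node.right, code + "1")
    walk(heap[0][2], "")
    return codes

freqs = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
codes = huffman_codes(freqs)
total_bits = sum(freqs[c] * len(codes[c]) for c in freqs)
print(codes)
print(total_bits, total_bits / sum(freqs.values()))  # → 224 2.24
```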


Numerical Question

1. Given the following characters and their frequencies:

Character    Frequency
A            10
B            15
C            30
D            16
E            29

Tasks:

  • Build the Huffman tree.
  • Generate the Huffman code for each character.
  • Calculate the total encoded data length.
  • Find the average code length.

2. Given these frequencies: {M: 35, N: 30, O: 20, P: 10, Q: 5}

Tasks:

  • Create the Huffman tree step-by-step.
  • Show how the codes are generated.
  • Calculate the total bits used to encode the data.
  • Compute the average bits per symbol.

3. You are given the following symbols with their frequencies:

Symbol    Frequency
W         7
X         3
Y         8
Z         6
V         2
T         10
S         5

Tasks:

  • Construct the full Huffman tree.
  • Assign codes to each character.
  • Find the total encoded data length.
  • Compare it with fixed-length encoding (3 bits per symbol). How many bits are saved?

Advantages of Huffman Coding

Efficient Compression: Reduces file size by minimizing average code length.

Lossless: Original data can be perfectly reconstructed.

Widely Used: Employed in formats like ZIP, JPEG, MP3, etc.


Time and Space Complexity

Time Complexity: O(n log n) where n is the number of unique characters (due to priority queue operations).

Space Complexity: O(n) for storing the tree and the codes.

The Huffman Algorithm is a classic example of how greedy strategies and tree structures can be combined to solve real-world problems efficiently. It's not just theoretical: it powers everyday technologies we use to store and transmit data.

By understanding how Huffman trees are built and how codes are assigned, you gain insights into both compression and algorithmic thinking.