Huffman Coding Algorithm

Last updated on July 13th, 2024 at 11:09 am

We will discuss the Huffman Coding Algorithm, pseudo code, Code implementation of Huffman Coding in C, Java, JavaScript, Python, advantages & disadvantages, and time complexity of Huffman Coding.

1. What is Huffman Coding Algorithm?

Huffman Coding is a common technique of encoding, including lossless data compression.

Huffman Coding is based on the frequency of occurrence of a data item (pixel in images). The main aim of Huffman coding is to encode the data that occurs frequently using less number of bits.

Code books store codes that are constructed for images. Both the code book and encoded data are transmitted to decode the information.

Huffman codes can be constructed using the following two ideas:

  1. For an optimum code, symbols that have high probability should be assigned shorter code words.
  2. To make optimum prefix code, two symbols that occur least frequently should be assigned the same length.

2. Pseudocode for Huffman Coding

HUFFMAN(f[1...n])
1. T = empty binary tree 
2. Q = priority queue of pairs (i, f[i]),
   i = 1...n, with f as comparison key 
3. for each k = 1...n − 1 
4. i = extractMin(Q) 
5. j = extractMin(Q) 
6. f[n + k] = f[i] + f[j] 
7. insertNode(T, n + k) with children i, j 
8. insertRear(Q,(n + k, f[n + k])) 
9. return T

3. Code Implementation of Huffman Coding

3.1 Huffman Coding in C

#include <stdio.h>
#include <stdlib.h>

// A Huffman tree node
struct MinHeapNode {
    char data;
    unsigned freq;
    struct MinHeapNode *left, *right;
};

struct MinHeap {
    unsigned size;
    unsigned capacity;
    struct MinHeapNode** array;
};

// Create a new node
struct MinHeapNode* newNode(char data, unsigned freq) {
    struct MinHeapNode* temp = (struct MinHeapNode*)malloc(sizeof(struct MinHeapNode));
    temp->left = temp->right = NULL;
    temp->data = data;
    temp->freq = freq;
    return temp;
}

// Create a MinHeap
struct MinHeap* createMinHeap(unsigned capacity) {
    struct MinHeap* minHeap = (struct MinHeap*)malloc(sizeof(struct MinHeap));
    minHeap->size = 0;
    minHeap->capacity = capacity;
    minHeap->array = (struct MinHeapNode**)malloc(minHeap->capacity * sizeof(struct MinHeapNode*));
    return minHeap;
}

// Swap two nodes
void swapMinHeapNode(struct MinHeapNode** a, struct MinHeapNode** b) {
    struct MinHeapNode* t = *a;
    *a = *b;
    *b = t;
}

// Heapify
void minHeapify(struct MinHeap* minHeap, int idx) {
    int smallest = idx;
    int left = 2 * idx + 1;
    int right = 2 * idx + 2;

    if (left < minHeap->size && minHeap->array[left]->freq < minHeap->array[smallest]->freq)
        smallest = left;

    if (right < minHeap->size && minHeap->array[right]->freq < minHeap->array[smallest]->freq)
        smallest = right;

    if (smallest != idx) {
        swapMinHeapNode(&minHeap->array[smallest], &minHeap->array[idx]);
        minHeapify(minHeap, smallest);
    }
}

// Check if size is one
int isSizeOne(struct MinHeap* minHeap) {
    return (minHeap->size == 1);
}

// Extract minimum value node from heap
struct MinHeapNode* extractMin(struct MinHeap* minHeap) {
    struct MinHeapNode* temp = minHeap->array[0];
    minHeap->array[0] = minHeap->array[minHeap->size - 1];
    --minHeap->size;
    minHeapify(minHeap, 0);
    return temp;
}

// Insert a new node to MinHeap
void insertMinHeap(struct MinHeap* minHeap, struct MinHeapNode* minHeapNode) {
    ++minHeap->size;
    int i = minHeap->size - 1;

    while (i && minHeapNode->freq < minHeap->array[(i - 1) / 2]->freq) {
        minHeap->array[i] = minHeap->array[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    minHeap->array[i] = minHeapNode;
}

// Build MinHeap
void buildMinHeap(struct MinHeap* minHeap) {
    int n = minHeap->size - 1;
    int i;

    for (i = (n - 1) / 2; i >= 0; --i)
        minHeapify(minHeap, i);
}

// Print an array of size n
void printArr(int arr[], int n) {
    int i;
    for (i = 0; i < n; ++i)
        printf("%d", arr[i]);
    printf("\n");
}

// Check if node is leaf
int isLeaf(struct MinHeapNode* root) {
    return !(root->left) && !(root->right);
}

// Create and build min heap
struct MinHeap* createAndBuildMinHeap(char data[], int freq[], int size) {
    struct MinHeap* minHeap = createMinHeap(size);

    for (int i = 0; i < size; ++i)
        minHeap->array[i] = newNode(data[i], freq[i]);

    minHeap->size = size;
    buildMinHeap(minHeap);
    return minHeap;
}

// Build Huffman Tree
struct MinHeapNode* buildHuffmanTree(char data[], int freq[], int size) {
    struct MinHeapNode *left, *right, *top;
    struct MinHeap* minHeap = createAndBuildMinHeap(data, freq, size);

    while (!isSizeOne(minHeap)) {
        left = extractMin(minHeap);
        right = extractMin(minHeap);

        top = newNode('$', left->freq + right->freq);
        top->left = left;
        top->right = right;

        insertMinHeap(minHeap, top);
    }
    return extractMin(minHeap);
}

// Print Huffman Codes
void printCodes(struct MinHeapNode* root, int arr[], int top) {
    if (root->left) {
        arr[top] = 0;
        printCodes(root->left, arr, top + 1);
    }

    if (root->right) {
        arr[top] = 1;
        printCodes(root->right, arr, top + 1);
    }

    if (isLeaf(root)) {
        printf("%c: ", root->data);
        printArr(arr, top);
    }
}

// Build Huffman Tree and print codes
void HuffmanCodes(char data[], int freq[], int size) {
    struct MinHeapNode* root = buildHuffmanTree(data, freq, size);
    int arr[100], top = 0;
    printCodes(root, arr, top);
}

// Driver code
int main() {
    char arr[] = {'a', 'b', 'c', 'd', 'e', 'f'};
    int freq[] = {5, 9, 12, 13, 16, 45};
    int size = sizeof(arr) / sizeof(arr[0]);

    HuffmanCodes(arr, freq, size);
    return 0;
}

3.1.1 Explanation of Huffman Coding

  1. Include Headers: #include <stdio.h>, #include <stdlib.h> for standard input-output and memory allocation.
  2. Struct Definitions: Define MinHeapNode and MinHeap structures for tree nodes and heap management.
  3. New Node Creation: Function to create a new tree node.
  4. MinHeap Creation: Function to create a MinHeap.
  5. Node Swap: Utility function to swap two nodes in the heap.
  6. Heapify: Maintain the heap property.
  7. Check Size One: Utility to check if heap size is one.
  8. Extract Min: Function to extract the minimum value node from the heap.
  9. Insert Node: Insert a new node to the heap.
  10. Build MinHeap: Build a min heap from an unordered array.
  11. Print Array: Utility to print an array.
  12. Check Leaf: Check if a node is a leaf.
  13. Create and Build MinHeap: Create and build a min heap from given data and frequency arrays.
  14. Build Huffman Tree: Construct the Huffman Tree.
  15. Print Huffman Codes: Print codes by traversing the tree.
  16. Generate Huffman Codes: Function to build the tree and print the codes.
  17. Driver Code: Main function to execute the program.

3.2 Huffman Coding in Java

import java.util.PriorityQueue;
import java.util.Comparator;

class Node {
    char data;
    int freq;
    Node left, right;

    Node(char data, int freq) {
        left = right = null;
        this.data = data;
        this.freq = freq;
    }
}

class Huffman {
    public static void printCodes(Node root, String str) {
        if (root == null)
            return;

        if (root.data != '$')
            System.out.println(root.data + ": " + str);

        printCodes(root.left, str + "0");
        printCodes(root.right, str + "1");
    }

    public static void HuffmanCodes(char data[], int freq[], int size) {
        Node left, right, top;

        PriorityQueue<Node> minHeap = new PriorityQueue<>(size, new Comparator<Node>() {
            public int compare(Node l, Node r) {
                return l.freq - r.freq;
            }
        });

        for (int i = 0; i < size; ++i)
            minHeap.add(new Node(data[i], freq[i]));

        while (minHeap.size() != 1) {
            left = minHeap.poll();
            right = minHeap.poll();

            top = new Node('$', left.freq + right.freq);
            top.left = left;
            top.right = right;

            minHeap.add(top);
        }
        printCodes(minHeap.poll(), "");
    }

    public static void main(String[] args) {
        char arr[] = {'a', 'b', 'c', 'd', 'e', 'f'};
        int freq[] = {5, 9, 12, 13, 16, 45};
        int size = arr.length;

        HuffmanCodes(arr, freq, size);
    }
}

3.3 Huffman Coding in JavaScript

class Node {
    constructor(data, freq) {
        this.data = data;
        this.freq = freq;
        this.left = this.right = null;
    }
}

function printCodes(root, str) {
    if (!root) return;

    if (root.data !== '$') console.log(root.data + ": " + str);

    printCodes(root.left, str + "0");
    printCodes(root.right, str + "1");
}

function HuffmanCodes(data, freq, size) {
    let minHeap = [];

    for (let i = 0; i < size; ++i) {
        minHeap.push(new Node(data[i], freq[i]));
    }

    minHeap.sort((a, b) => a.freq - b.freq);

    while (minHeap.length !== 1) {
        let left = minHeap.shift();
        let right = minHeap.shift();

        let top = new Node('$', left.freq + right.freq);
        top.left = left;
        top.right = right;

        minHeap.push(top);
        minHeap.sort((a, b) => a.freq - b.freq);
    }

    printCodes(minHeap[0], "");
}

let arr = ['a', 'b', 'c', 'd', 'e', 'f'];
let freq = [5, 9, 12, 13, 16, 45];
let size = arr.length;

HuffmanCodes(arr, freq, size);

3.4 Huffman Coding in Python

import heapq

class Node:
    def __init__(self, data, freq):
        self.data = data
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

def printCodes(root, code):
    if root is None:
        return

    if root.data != '$':
        print(f'{root.data}: {code}')

    printCodes(root.left, code + '0')
    printCodes(root.right, code + '1')

def HuffmanCodes(data, freq, size):
    minHeap = []

    for i in range(size):
        heapq.heappush(minHeap, Node(data[i], freq[i]))

    while len(minHeap) != 1:
        left = heapq.heappop(minHeap)
        right = heapq.heappop(minHeap)

        top = Node('$', left.freq + right.freq)
        top.left = left
        top.right = right

        heapq.heappush(minHeap, top)

    printCodes(heapq.heappop(minHeap), '')

arr = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [5, 9, 12, 13, 16, 45]
size = len(arr)

HuffmanCodes(arr, freq, size)

4. Advantages of Huffman Coding

  1. Optimality: Huffman coding is optimal for a set of symbols with known probabilities.
  2. Lossless Compression: Ensures no loss of data.
  3. Efficiency: Suitable for compressing data with variable-length encoding.

5. Disadvantages of Huffman Coding

  1. Complexity: More complex to implement compared to simpler methods like run-length encoding.
  2. Dependency on Data: Performance depends on the frequency distribution of the data.
  3. Decompression Speed: Decompression can be slower as each bit needs to be checked against the tree.

6. Time Complexity of Huffman Coding

The Overall Time Complexity of Huffman’s algorithm is O(n log n).

FAQs

What is Huffman Coding?

Huffman Coding is a common technique of encoding, including lossless data compression. Huffman Coding is based on the frequency of occurrence of a data item (pixel in images). The main aim of Huffman coding is to encode the data that occurs frequently using less number of bits.

What are the advantages of Huffman Coding?

Optimality: Huffman coding is optimal for a set of symbols with known probabilities.
Lossless Compression: Ensures no loss of data.
Efficiency: Suitable for compressing data with variable-length encoding.

What are the disadvantages of Bubble Sort?

Complexity: More complex to implement compared to simpler methods like run-length encoding.
Dependency on Data: Performance depends on the frequency distribution of the data.
Decompression Speed: Decompression can be slower as each bit needs to be checked against the tree.

Scroll to Top