This is the sixth day of my participation in the August More text Challenge. For details, see:August is more challenging

Huffman tree

Basic terminology

  • Path and path length
    • Path: A path in a tree between children or descendants accessible from one node down.
    • Node path length: The number of branches on the path from one node to another.
  • Node weight and weighted path length
    • Node weight: Assigning a value to a node in a tree that has some meaning.
    • Node weighted path length: the product of the length of the path from the root to the node and the weight of the node.
  • The weighted path length of the tree
    • The sum of the weighted path lengths of all leaves in the tree.
  • Huffman Tree
    • The binary tree with the least weighted path length is a Huffman tree.

Among all binary trees with n leaves and the same weight, there must be a tree whose weighted path length is the minimum value, which is called the “optimal binary tree”.

Construct the Huffman tree

Basic idea: make the node with big power close to the root

  • According to the given n weights {w1, w2… , wn}, construct the set of n binary trees F = {T1, T2… Tn}, where each binary tree contains only one root node with weight value wi, and its left and right subtrees are empty trees;
  • In F, two binary trees with the smallest weight of the root node are selected as the left,

The right subtree constructs a new binary tree, and the weight of the new binary root node is set as the sum of the weight of its left and right subtree roots.

  • Delete these two trees from F, and add

A newly generated tree;

  • Repeat these steps until F contains only one tree.

Realization of Huffman construction algorithm

A Huffman tree with n leaves has 2n-1 nodes

  • Adopt sequential storage structure – one dimensional structure array
  • Node type definition
    typedef struct {
    	ElemType elem;  / / node values
    	int weight;  / / has a weight
    	int parent, lch, rch;
    }HTNode, *HuffmanTree;
    Copy the code
  • Tectonic HuffmanTree
    • Enter the initial n leaves: set the weight value of HT[1..n]
    • Make the following n-1 merges to produce HT[I], I =n+1.. 2 n – 1:
      • Select HT[s1] and HT[s2] (parent = 0) from HT[1..i-1] (parent = 0)
      • Change the parent value of HT[s1] and HT[s2] to parent= I
      • HT[I] : weight=HT[s1]. Weight + HT[s2]. Weight, LCH =s1, RCH =s2
    Status CreatHuffmanTree(HuffmanTree HT, int n){
    	if (n <= 1)  // The number of nodes is invalid
    		return ERROR;
    	int m = 2 * n - 1;
    	int i;
    	int s1, s2;
    	HT = new HTNode[m + 1];  // Element 0 is not used, HT[m] represents the root node
    	for (i = 1; i <= m; ++i) { HT[i].lch =0;
    		HT[i].rch = 0;
    		HT[i].parent = 0;
    	}
    	for (i = 1; i <= n; ++i)// Enter weights
    		cin >> HT[i].weight;
    	for (i = n + 1; i <= m; ++i) {// Construct the Huffman tree
    		Select(HT, i - 1, &s1, &s2);  // In HT[k](1≤ K ≤i-1), select two nodes whose parent domain is 0 and whose weight is the smallest, and return their serial numbers S1 and S2 in HT
    		if(s1 ! =0&& s2 ! =0) {
    			HT[s1].parent = i;
    			HT[s2].parent = i;  // delete s1,s2 from F
    			HT[i].lch = s1;
    			HT[i].rch = s2;  // S1 and S2 are the children of I respectively
    			HT[i].weight = HT[s1].weight + HT[s2].weight;  // The weight of I is the sum of the weights of the left and right children}}return OK;
    }
    Copy the code

The application of Huffman trees

Huffman code

  • Algorithm implementation
    Status CreatHuffmanCode(HuffmanTree HT, HuffmanCode& HC, int n) {
    	if (n <= 1)
    		return ERROR;
    	int start, i;
    	int f = 0, c;
    	HC = new char* [n + 1];
    	char* cd = new char[n];
    	cd[n - 1] = '0';
    	for (i = 1; i <= n; ++i) {
    		while(f ! =0) {
    			// Start at the leaf and work backwards to the root
    			start = n - 1;
    			c = i;
    			f = HT[i].parent;
    			if (HT[f].lch == c) cd[start] = '0';
    			else cd[start] = '1';
    			c = f;
    			f = HT[f].parent;
    		}
    		HC[i] = new char[n - start];  // Encode the array
    		strcpy(HC[i], &cd[start]);
    	}
    	delete cd;
    	cd = NULL;
    	return OK;
    }
    Copy the code
  • Important conclusions
    • Huffman codes are unequal length codes
    • Huffman encoding is prefix encoding, that is, the encoding of any character is not the prefix of another character encoding
    • There is no node of degree 1 in Huffman code tree. If the number of leaf nodes is n, the total number of nodes in Huffman coding tree is 2n-1
    • Sending process: Send character data according to the encoding table obtained by Huffman tree
    • Receiving process: according to the provisions of left 0, right 1, from the root node to a leaf node, complete the decoding of a character. Repeat this process until the data is received