Huffman tree

Huffman tree is a special binary tree used in Huffman coding invented by American computer scientist David Albert Huffman in 1952. In honor of his achievement, it was called the Huffman tree, and his method of coding was called Huffman coding.

Branches from one node to another in a binary tree form a path between two nodes. The number of branches in a path is called the path length. The path length of a tree is the sum of the paths from the root to each node.

In a weighted node, the weighted path length of a node is the product of the length of the path from the node to the root and the weights on the node. The weighted path length of a tree is the sum of the weighted path lengths of all leaf nodes in the tree. The binary tree with the minimum weight path length is called Huffman tree, also known as optimal binary tree.

Building a Hoffman tree

1. First, arrange the leaf nodes with power values into an ordered sequence in ascending order. 2. Take the first two nodes with minimum weights as the two child nodes of a new node (T1). Note that the smaller node is the left node and the larger node is the right node. 3. Add the sum of T1 2 child nodes to the remaining ordered sequence, and build new nodes according to the rules of the second step. 4. Repeat step 3 until all nodes are connected. This tree is the Huffman tree.

Huffman code

Construct a Huffman tree according to the weights of the character set to be encoded. If the left branch of Huffman tree represents 0 and the right branch represents 1, then the sequence of 0 and 1 formed by the path branches from the root node to the leaf node is the encoding of the characters corresponding to this node, which is the Huffman encoding.

For example, now there is a piece of content “ACBDEDABDA” that needs to be transmitted. Suppose A letter has three bits, A=000,B=001,C=010,D=011,E=100. So this content encoding is 000010001011100011000001011000 (30).

Assume that the weights of ABCDE are 32,20,10,30,8. Then A=11,B=01,C=001,D=10,E=000 after constructing Huffman tree. Huffman tree coding content is 1100101100001011011011 (22 bits).

Built Huffman tree

According to the hofmann tree transformation code

If the encoding is not 0 or 1, it is very easy to confuse the encoding with different lengths. Therefore, if the encoding of any character is not the prefix of the encoding of another character, this kind of encoding is called prefix encoding.

In decoding, the Huffman tree is used again, meaning that the sender and receiver must agree on the same Huffman coding rules.

conclusion

Huffman tree is a binary tree constructed according to weights, which is also called optimal binary tree.

Its main function is to encode, but also can be used to compress data. It can also be used to encode and transmit to a third party to improve security and save network resources. Then, it can decode according to the Hoffman tree agreed by both parties, which can achieve lossless coding and error-free decoding.

PS: Clear mountains and clear waters begin with dust, knowledge is more valuable than diligence. I got wine. You got a story? Wechat official account: “Clean the dust chat”. Welcome to chat and talk about code.