Given N weights as N leaf nodes, a binary Tree is constructed. If the weight path length of the Tree reaches the minimum, such a binary Tree is called an optimal binary Tree, also known as Huffman Tree. Huffman tree is a tree with the shortest weight path length, and the node with higher weight is closer to the root.

It’s a little foggy… Here’s an example: Let’s say there’s a string of letters ACEFBBABFEAAAABCDEFGA… First count the frequency of each character! (can also be probability) // weights for example: frequency table

character frequency
A 60
B 45
C 13
D 69
E 14
F 5
G 3

Step 1: Find the two smallest characters, the small one on the left and the large one on the right, and form a binary tree. Delete the two numbers found this time in the frequency table, and add the frequency sum of the two smallest numbers.

F and G are minimal, so as shown, remove F and G from the string frequency count and return the sum of G and F to the frequency table

Repeat step 1


character frequency
A 60
B 45
C 13
D 69
E 14
FG 8

The smallest ones are FG: 8 and C: 13, as shown, and return the sum of FGC: 21 to the frequency table.

Repeat step 1: figure 1

character coding
A 10
B 01
C 0011
D 11
E 000
F 00101
G 00100

The weighted path length of the Huffman tree is: 60*2+45*2+13*4+69*2+14*3+5*5+3*5 = 482 so when I want to transmit ABC, the code is 10 01 0011

contrast

  • No Huffman encoding: Theoretical string size (assuming HTTP is transmitted in ASCII and each letter takes up one byte) : Transfer size = string length *1 byte = 209*1 byte = 209 bytes
  • Huffman encoding: Transmission size = Information of Huffman tree (frequency table) + weighted path length /8 ≈7*3 + 482/8 ≈21 + 61 = 82 bytes
character frequency
A 60
B 45
C 13
D 69
E 14
F 5
G 3

So 7*3 is just an assumption, we haven’t studied it, but we’re assuming one byte per character, two bytes for the frequency of storage; 482 needs to be divided by 8 because it needs to be divided by 8 to convert 48 bits into bytes and you can see that using Huffman encoding saves a lot of space, even though it requires additional transmission of the Huffman tree

In fact, the front end does not need to manually encode the data into Huffman and then transfer it to HTTP. Gzip compression (LZ77 algorithm and Huffman coding compression principle) has helped us to do this. The data compressed by GZIP saves the transmission size greatly. That’s why it’s so important to turn on Gzip.

Application of Huffman coding in front end

No! Huffman coding is generally used in the field of communication, as well as the field of image compression. The front-end usually doesn’t need to encode data intentionally with Huffman, because someone else has done it for us, such as JPEG images,gzip