“This is the tenth day of my participation in the First Challenge 2022. For more details: First Challenge 2022”

Huffman Coding, also known as Huffman Coding, is a Coding mode. Huffman Coding is a type of variable word length Coding (VLC). In 1952, Huffman proposed a coding method, which constructed the code word with the shortest average length of different prefix completely according to the probability of character occurrence, sometimes called the optimal coding, and generally called Huffman coding (sometimes also called Huffman coding).

This is an efficient coding method used to compress information during information storage and transmission.

The computer recognizes only 0 (low level) and 1 (high level). Simply put, converting information that humans can understand into binary forms that computers can understand is called coding. Like ASCII.

Isometric encoding: Binary encoding of equal length for each character, easy to design and easy to read and write. However, the storage space of computer and the broadband of network transmission are limited. The biggest disadvantage of iso-length coding is that the coding result is too long, which will occupy too many resources.

If there are only six characters A, B, C, D, E and F in A piece of information, we can design the binary code with length of 3 for each character if we use equal length encoding:

 

Given a piece of information “ABEFCDAED”, it can be encoded in binary “000 001 100 101 010 011 000 100 011” with a total length of 27.

 

But is this optimal design? What if we made different characters correspond to different encoding lengths? Such as:

Given the message “ABEFCDAED”, it can be encoded in binary “0 00 10 11 01 10 10 1”, with a total length of only 14.

Such A code design can greatly shorten the total length, but such A design will bring ambiguity, for example, A code is 0, B code is 00, then 000 May stand for AB or AAA, all such variable length code can not be arbitrarily designed.

Huffman Coding accomplishes two important goals:

1. Any character encoding is not a prefix of any other character encoding.

2. Minimum total length of information encoding.

Build Huffman trees

Given character set {a, B, C, D, e, F}, if the frequency of occurrence of each character is {6, 3, 8, 2, 10, 4}, then the Huffman encoding of each character in the corresponding character set may be:

1. Find the smallest two numbers (2 and 3), put them in the tree (small left and large right), each combination of a parent node (2+3=5)

2. Pick two smallest numbers — 4 and 6, because 4 is less than 5 and 6 is more than 5.

3. Continue to select the smallest two numbers 6 and 8, because 6 is less than 9, 8 is less than 9, so 6 and 8 combine themselves (small left and large right).

4. Extract the last 10,10 and the combination of the two child root nodes (9<14, so the combination of 9 and 9) (small left and large right)

5. Add 14 subtrees (small left and large right) so put left

6. Fill in the corresponding characters. From the root node, the characters are 0 to the left and 1 to the right. The path to the corresponding character is the Huffman encoding of that character (0 left, 1 right)

character Huffman coding
A 00
B 1011
C 01
D 1010
E 11
F 100

The resulting code has no ambiguity from the prefix problem. Since each character corresponds to the leaves of the Huffman tree, there is no relationship between the paths from the root to these leaves, and the resulting binary codes are not prefixes to each other.

The resulting code ensures a minimum overall length. The important property of Huffman trees is that the sum of all the leaves is the smallest. In the scenario of information encoding, the weight of leaf node corresponds to the frequency of character occurrence, and the path length of node corresponds to the encoding length of character. The minimum sum of all characters (frequency X encoding length) naturally means that the total encoding length is minimum.

application

  • Data compression
  • The compressed file
  • Image coding processing
  • Shorten the length of telegrams
  • Improve query efficiency
  • .

Huffman code is constructed using code

class Node {
  constructor(left, right, data) {
    this.left = left;
    this.right = right;
    this.data = data; }}class Huffman {
  constructor(str) {
    // A string to encode
    this.str = str;
    // Key and frequency mapping table
    this.keyCountMap = null;
    // Encode and key mapping table
    this.codeKeyMap = {};
    // Key and encoding mapping table
    this.keyCodeMap = {};
    // Huffman tree node list
    this.nodeList = null;
    // Huffman root node
    this.root = null;
    // The 01 sequence is encoded by Huffman
    this.code = null;
  }
  cal() {
    str = this.str;
    let map = {};
    let i = 0;
    while (str[i]) {
      map[str[i]] ? map[str[i]]++ : (map[str[i]] = 1);
      i++;
    }
    this.keyCountMap = map;
  }
  sort() {
    let map = this.keyCountMap;
    let result = [];
    for (let key in map) {
      if (map.hasOwnProperty(key)) {
        let obj = {
          key: key,
          val: map[key]
        };
        result.push(new Node(null.null, obj)); }}this.nodeList = result.sort(function (x, y) { return x.data.val > y.data.val });
  }
  makeTree() {
    let i = 0;
    let parentNode;
    let table = this.nodeList;
    while (table.length > 1) {
      parentNode = new Node(table[i], table[i + 1] and {key: null.val: table[i]['data'].val + table[i + 1] ['data'].val });
      table.splice(i, 2);
      table.unshift(parentNode);
      table.sort(function (x, y) { return x.data.val > y.data.val });
    }
    this.root = table[0] | |new Node();
    return this.root;
  }
  traversal(tree, code) {
    if(tree.left ! = =null) {
      this.traversal.call(this, tree.left, code + '0');
    } else {
      this.keyCodeMap[tree.data.key] = code;
    }
    if(tree.right ! = =null) {
      this.traversal.call(this, tree.right, code + '1');
    } else {
      this.keyCodeMap[tree.data.key] = code; }}encode() {
    this.cal();
    this.sort();
    let root = this.makeTree();
    this.traversal(root, ' ');
    let ret = this.keyCodeMap;
    let i = 0;
    let result = ' ';
    let str = this.str;
    while (str[i]) {
      result += ret[str[i++]];
    }
    this.code = result;
    console.log('encode:' + result);
    return result
  }
  reverseMap() {
    let ret = this.keyCodeMap;
    let result = {};
    for (let key in ret) {
      if(ret.hasOwnProperty(key)) { result[ret[key]] = key; }}this.codeKeyMap = result;
    return result;
  }
  decode() {
    let i = 0;
    let result = ' ';
    let data = ' ';
    let map = this.reverseMap();
    let str = this.code;
    while (str) {
      result += str[i++];
      if (result in map) {
        data += map[result];
        str = str.replace(new RegExp("^" + result), ' ');
        result = ' ';
        i = 0; }}console.log("decode:" + data)
  }
  encodeBase64() {
    try {
      let base64 = btoa(this.code);
      return base64;
    } catch (e) {
      return ' '; }}}let str = 'ew qew qd ef 24 gf ewr getElementsByTagName';
let huffman = new Huffman(str)
huffman.encode(str)
huffman.decode();
huffman.encodeBase64()
Copy the code

Reference article:

Baidu encyclopedia

Huffman coding

Huffman coding details – the diagram can really see the second to understand