background

The coding principle of Base64 is more explained online, but the decoding principle is less explained, and the internal implementation principle is not analyzed. Want to thoroughly understand the base64 encoding and decoding principle, please patiently read this article, you will gain.

Coding principle

Bit concept: In a binary number system, each 0 or 1 is a bit, which is the smallest unit of data storage. Eight bits are called a Byte.

Base64 encoding splits a string into four 6-bit subsequences of every three 8-bit byte subsequences (6-bit valid bytes, the leftmost two are always zeros, which is also 8-bit bytes, which is why the encoded size is 4/3 times larger). If the number of bytes is less than 4, use ‘=’ to complete. Why base64? Because the binary number of six bits is 2 to the sixth power (000000-111111), which is 64. The index table of Base64 selects 64 printable characters “A-z, A-z, 0-9, +, /” to represent the 64 binary numbers (00000000-00111111). namely

    let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
Copy the code

Let’s look at base64 encoding using the nice string as an example

The text n i c e
AscII code 110 105 99 101
binary 01101110 01101001 01100011 01100101
Binary (6 -) 011011 100110 100101 100011 011001 010000
Decimal (Base64 encoded index) 27 38 37 35 25 16
Base64 character b m l j Z Q

By short division, a decimal number can be continuously divided by 2 to get multiple remainders. Finally, the remainder is permutations from bottom to top to get the binary number. Let’s take the ascII code 110 corresponding to the character N as an example

110/2 = 55... 0 55/2 = 27... 1 27/2 = 13... 1 13/2 = 6... 1 6/2 = 3... 0 3/2 = 1... 1 1/2 = 0... 1Copy the code

That is, the 110-to-binary ascII code corresponding to character N is 1101110. Since a byte corresponds to 8 bits, 0 needs to be added to complement 8 bits to get 01101110. The other characters are equally valid.

The sum is expanded by weight, 8-bit binary numbers from right to left, increasing in order from 0 to 7, and summing from left to right, resulting in the corresponding decimal number

(01101110) = 0, 2 * 2 * 2 ^ ^ 0 + 1 1 + 1 + 1 * 2 * 2 ^ 2 ^ 3 + 0 * 2 ^ 4 + 1 + 1 * 2 ^ 5 * 2 * 2 ^ ^ 6 + 0. 7Copy the code

The corresponding ascII encoding of the string nice is base64 string bit bmljZQ obtained after converting every 3 8-bit binary numbers to 4 6-bit binary numbers. If less than 4 bits need to be completed with =, that is, bmljZQ==. Let’s implement the above conversion through code logic analysis.

Train of thought

Analyze the mapping: ABC -> xyzi

  • X: the first six digits of a => A (1-6)
  • Y: the last two digits of a + the first four digits of B => a(7-8)b(1-4)
  • Z: last four digits of b + first two digits of C => b(5-8) C (1-2)
  • I: last six digits of C => C (3-8)
  1. Converts the ascII encoding of a character to an 8-bit binary number
  2. Do the following for every three 8-bit binary numbers
    1. Shift the first number right by 2 bits to get the first 6-bit significant binary number
    2. Shift the first number & 0x3 by 4 bits to the left to get the first and second significant bits of the second 6-bit binary number, shift the second number & 0xf0 by 4 bits to the right to get the last four significant bits of the second 6-bit significant bit binary number, and get the second 6-bit significant bit binary number
    3. Shift the second number & 0xf by 2 bits to the left to get the first four significant bits of the third 6-bit significant binary number, shift the third number & 0xC0 by 6 bits to the right to get the last two significant bits of the third 6-bit significant binary number, and take and get the third 6-bit significant binary number
    4. Add the third number & 0x3f to get the fourth 6-bit significant binary number
  3. Convert the obtained 6-bit valid binary number to decimal and look for the base64 characters

Left shift: <<, signed shift operation when left shift operation, the binary code of the operand is moved to the left by a specified number of bits, the space after the left shift is filled with 0.

Right shift: >>, signed shift operation

The right shift operation is to move the binary code of the operand to the right by a specified number of digits. The space after the right shift is replaced by a sign bit. If it is positive, it is replaced by a 0, and if it is negative, it is replaced by a 1.

Code implementation

// Input string let STR = 'hao' // base64 string let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 + / / / define input and output bytes of the binary number let, char1, char2, char3, out1, out2, out3, out4, Char1 = str.charcodeat (0) &0xff // 104 01101000 char2 = str.charcodeat (1) &0xff // 97 01100001 char3 = str.charcodeat (2) &0xFF // 111 01101111 // Output the valid binary number of 6 bytes 6out1 = char1 >> 2 // 26 011010 out2 = (char1) & 0x3) << 4 | (char2 & 0xf0) >> 4 // 6 000110 out3 = (char2 & 0xf) << 2 | (char3 & 0xc0) >> 6 // 5 000101 out4 = char3 &  0x3f // 47 101111 out = base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + base64EncodeChars[out4] // aGFvCopy the code

Algorithm analysis

  1. out1: char1 >> 2
    01101000 - > 00011010Copy the code
  2. out2 = (char1 & 0x3) << 4 | (char2 & 0xf0) >> 4
    // And operation 01101000 01100001 00000011 11110000 -------- -------- 00000000 01100000 // After the shift operation 00000000 00000110 // or operation 00000000-00000110-00000110Copy the code

Same with the third character and the fourth character

Tidy up the above code to extend the multi-character string

// Enter string let STR = 'haohaohao' // base64 string let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 + / / / get the length of the string let len = STR. Length / / let the current character index index = Let out = "while(index < len) {let char1, char2, char3, out1, out2, out3, Char1 = str.charcodeat (index++) & 0xff // 104 01101000 char2 = str.charcodeat (index++) & 0xff // 97 01100001 char3 = str.charcodeat (index++) & 0xff // 111 01101111 // outputs the valid binary number of 6 bytes out1 = char1 >> 2 // 26 011010 out2 = (char1 & 0x3) << 4 | (char2 & 0xf0) >> 4 // 6 000110 out3 = (char2 & 0xf) << 2 | (char3 & 0xc0) >> 6 // 5 000101 out4 = char3 & 0x3f // 47 101111 out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + base64EncodeChars[out4] // aGFv }Copy the code

Special handling is required when the original string length is not an integer multiple of 3

. char1 = str.charCodeAt(index++) & 0xff // 104 01101000 if (index == len) { out2 = (char1 & 0x3) << 4 out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + '==' return out } char2 = str.charCodeAt(index++) & 0xff // 97 01100001 if (index == len) { out1 = char1 >> 2 // 26 011010 out2 = (char1 & 0x3) << 4 | (char2 & 0xf0) >> 4 // 6 000110 out3 = (char2 & 0xf) << 2 out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + '=' return out } ...Copy the code

All the code

Function base64Encode(STR) {// base64 let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 + / / / get the length of the string let len = STR. Length / / let the current character index index = Let out = "while(index < len) {let char1, char2, char3, out1, out2, out3, Char1 = STR. CharCodeAt (Index ++) & 0xFF out1 = char1 >> 2 if (index == len) {out2 = (char1)  & 0x3) << 4 out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + '==' return out } char2 = str.charCodeAt(index++) & 0xff out2 = (char1 & 0x3) << 4 | (char2 & 0xf0) >> 4 if (index == len) { out3 = (char2 & 0xf) << 2 out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + '=' return out } char3 = STR. CharCodeAt (index++) & 0 XFF / / output binary number six effective bytes out3 = (char2 & 0 xf) < < 2 | (char3 & 0 xc0) > > 6 out4 = char3 & x3f out = 0  out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + base64EncodeChars[out4] } return out } base64Encode('haohao') // aGFvaGFv base64Encode('haoha') // aGFvaGE= base64Encode('haoh') // aGFvaA==Copy the code

The decoding principle

By backward derivation, the binary number of every 4 6-bit significant bits is merged into 3 8-bit binary numbers, which are mapped to corresponding characters according to THE ascII encoding and then concatenated strings

Train of thought

Analyze the mapping

  • A: The first two digits of x + y
  • B: last four digits of y + first four digits of Z
  • C: z + I
  1. Converts the index of a character’s Base64 character set to a 6-bit valid binary number
  2. Do the following for every four 6-bit significant binary numbers
    1. The first binary is shifted 2 bits to the left to get the first 6 bits of the new binary, and the second binary is shifted 4 bits to the right after &0x30 to get the first new binary
    2. The second binary number & 0xf is left shifted by 4 bits, and the third binary number & 0x3c is right shifted by 2 bits, and the or set yields the second new binary number
    3. The second binary number &0x3 is shifted 6 bits to the left, and the fourth binary number is taken or set to get the second new binary number
  3. Concatenate string after mapping to corresponding character according to ascII encoding

Code implementation

// base64 character set let STR = 'aGFv 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 + /'. The split (') / / retrieve index value, char1 = base64CharsArr.findIndex(char => char==str[0]) & 0xff // 26 011010 let char2 = base64CharsArr.findIndex(char => char==str[1]) & 0xff // 6 000110 let char3 = base64CharsArr.findIndex(char => char==str[2]) & 0xff // 5 000101 let char4  = base64CharsArr.findIndex(char => char==str[3]) & 0xff // 47 101111 let out1, out2, out3, The out / / bit operations out1 =, char1 < < 2 | (char2 & 0 x30) > > 4 out2 = (char2 & 0 xf) < < 4 | (char3 & 0 x3c) > > 2 out3 = (char3 & 0 x3) << 6 | char4 console.log(out1, out2, out3) out = String.fromCharCode(out1) + String.fromCharCode(out2) + String.fromCharCode(out3)Copy the code

When a ‘=’ is used to fill the position

Function base64decode(STR) {// base64CharsArr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split('') let char1 = base64CharsArr.findIndex(char => char==str[0]) let char2 = base64CharsArr.findIndex(char => char==str[1]) let out1, out2, out3, out if (char1 == -1 || char2 == -1) return out char1 = char1 & 0xff char2 = char2 & 0xff let char3 = Base64charsarr.findindex (char => char== STR [2]) Joining together the first String if only (char3 = = 1) {out1 =, char1 < < 2 | (char2 & 0 x30) > > 4 out = String. FromCharCode (out1) return out} let Char4 = base64Charsarr.findIndex (char => char== STR [3]) // If the third bit is not in the table, Joining together the first and the second string if only (char4 = = 1) {out1 =, char1 < < 2 | (char2 & 0 x30) > > 4 out2 = (char2 & 0 xf) < < 4 | (char3 & 0 x3c) > > 2 Out = String. FromCharCode (out1) + String. FromCharCode (out2) return out} / / bit operations out1 =, char1 < < 2 | (char2 & 0 x30) > > 4  out2 = (char2 & 0xf) << 4 | (char3 & 0x3c) >> 2 out3 = (char3 & 0x3) << 6 | char4 console.log(out1, out2, out3) out = String.fromCharCode(out1) + String.fromCharCode(out2) + String.fromCharCode(out3) return out }Copy the code

Decoded the entire string, cleaned up the code

Function base64decode(STR) {// base64CharsArr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split('') let i = 0 let len = str.length let out = ''  while(i < len) { let char1 = base64CharsArr.findIndex(char => char==str[i]) i++ let char2 = base64CharsArr.findIndex(char => char==str[i]) i++ let out1, out2, out3 if (char1 == -1 || char2 == -1) return out char1 = char1 & 0xff char2 = char2 & 0xff let char3 = Base64charsarr.findindex (char => char== STR [I]) I ++ Only joining together the first String out1 =, char1 < < 2 | (char2 & 0 x30) > > 4 if (char3 = = 1) {out = out + String. FromCharCode (out1) return out} Let char4 = base64 charsarr.findIndex (char => char== STR [I]) I ++ Only joining together the first and the second String out2 = (char2 & 0 xf) < < 4 | (char3 & 0 x3c) > > 2 if (char4 = = 1) {out = out + String. FromCharCode (out1) + String. FromCharCode (out2) return out} / / bit operations out3 = (char3 & 0 x3) < < 6 | char4 console. The log (out1 and out2, out3) out = out + String.fromCharCode(out1) + String.fromCharCode(out2) + String.fromCharCode(out3) } return out } base64decode('aGFvaGFv') // haohao base64decode('aGFvaGE=') // haoha base64decode('aGFvaA==') // haohCopy the code

The core of the decoding above is the mapping between characters and base64 character set index. The method of mapping Base64 character index using AccII encoding index has been seen on the Internet

let base64DecodeChars = [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 62, 1, 1, 1, 63, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 1, 1, 1, 1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 1, 1, 1, 1, 1, 1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, // let char1 = 'hao'. CharCodeAt (0) // h -> 104 base64 decodechars [char1] // 33 -> base64 decodechars [char1] // 33 -> base64 decodecharsCopy the code

Thus, base64 decodechars control accII encoding table index storage is base64 encoding table of the corresponding character index.

conclusion

Base64 encoding may seem strange, because most encoding involves converting characters to binary, and the conversion from binary to character is called decoding. The Base64 concept is exactly the opposite, from binary to character called encoding, from character to binary called decoding. Base64 is a data encoding method, which can be used for simple encryption. We can change the Base64 encoding mapping order to form our own unique encryption algorithm for encryption and decryption.

Code table