【Practical guide】Front-end Base64 encoding knowledge: one article that traces the origin and pursues the truth.

Preface

This article is part of the front-end fundamentals and advanced topics column. Follows and bookmarks are welcome. Earlier articles:

【Practical guide】How many of these advanced utility functions do you know? 400+ stars
Function.prototype.call 200+ stars
Functions you think you know but do not 200+ stars
Do you really know the difference between these 16 native functions and properties? Carefully collected, essential knowledge for advancing in front-end work, bookmark it now 100+ stars

Bookmarking without starring is just cheating! Ha.

Outline

Base64 in the front end
The origin of Base64 data encoding
What the 64 in Base64 means
Advantages and disadvantages of Base64 encoding
Some computer and front-end basics
ASCII, Unicode, UTF-8
Base64 encoding and decoding
Other mature solutions
Final words

Base64 in the front end

You surely know Base64 encoding already, but let's first look at some common front-end uses. Most of these scenarios are based on Data URLs.

Canvas image generation

The canvas toDataURL method converts the contents of the canvas into a data URI that holds the image in Base64-encoded form.

const ctx = canvasEl.getContext("2d");
// ... other code
const dataUrl = canvasEl.toDataURL();

// data:image/png;base64,iVBORw0KGgoAAAANSUhE.........

In a draw-and-guess game, a user who joins late and needs the current state of the drawing can also receive it as a Base64 message.

File reading

FileReader's readAsDataURL converts an uploaded file into a Base64 data URI. A common scenario is cropping and uploading a user's avatar.

function readAsDataURL() {
    const fileEl = document.getElementById("inputFile");
    return new Promise((resolve, reject) => {
        const fd = new FileReader();
        // Register the handlers before starting the read
        fd.onload = function () {
            resolve(fd.result);
            // data:image/png;base64,iVBORw0KGgoAAAA.......
        };
        fd.onerror = reject;
        fd.readAsDataURL(fileEl.files[0]);
    });
}

JWT

A JWT consists of a header, a payload, and a signature. The first two are Base64URL-encoded JSON, so they can be read in plain text after decoding. You can test this with the token from the article "The strongest explanation of JWT token generation and login verification, guaranteed to teach you!".
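To make the "plain text after decoding" point concrete, here is a minimal sketch that reads the first two segments of a token. The helper names base64UrlDecode and decodeJwt and the sample token are made up for this illustration, and the sketch assumes an environment with atob and TextDecoder (modern browsers, recent Node.js). It does not verify the signature.

```javascript
// Decode one Base64URL segment into a UTF-8 string.
function base64UrlDecode(segment) {
  // Base64URL uses "-" and "_" instead of "+" and "/" and usually drops the "=" padding.
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  const binary = atob(padded); // one character per byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Read the header and payload of a JWT. The signature is NOT verified here.
function decodeJwt(token) {
  const [header, payload] = token.split(".");
  return {
    header: JSON.parse(base64UrlDecode(header)),
    payload: JSON.parse(base64UrlDecode(payload)),
  };
}

// Hypothetical token for this example; the third segment is only a placeholder.
const sampleToken =
  "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI0MiJ9.placeholder";

console.log(decodeJwt(sampleToken));
// { header: { alg: 'HS256', typ: 'JWT' }, payload: { sub: '42' } }
```

Anyone holding the token can do this, which is why secrets should never be placed in the payload.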

Site icons and small images

Optimizing the site icon on mobile sites

<link rel="icon" href="data:," />
<link rel="icon" href="data:;base64,=" />

How the value data:, is obtained:

<canvas height="0" width="0" id="canvas"></canvas>
<script>
    const canvasEl = document.getElementById("canvas");
    const ctx = canvasEl.getContext("2d");
    const dataUrl = canvasEl.toDataURL();
    console.log(dataUrl);  // data:,
</script>

Small images

There are many scenarios for this, such as img tags and background images.

img tag:

<img src="data:image/png;base64,iVBORw0KGgoAAAA......." />

CSS background:

.bg {
    background: url(data:image/png;base64,iVBORw0KGgoAAAA.......);
}

Simple data encryption

This is not a good idea, because Base64 is an encoding rather than real encryption, but at least the result is hard to read with the naked eye.

  const username = document.getElementById("username").value;
  const password = document.getElementById("password").value;
  const secureKey = "%%S%$%DS)_sdsdj_66";
  const sPass = utf8_to_base64(password + secureKey);

  doLogin({
      username,
      password: sPass
  })

SourceMap

The "mappings" field of a source map is encoded with Base64 VLQ.

{
  version: 3,
  file: "out.js",
  sourceRoot: "",
  sources: ["foo.js", "bar.js"],
  names: ["src", "maps", "are", "fun"],
  mappings: "AAgBC,SAAQ,CAAEA"
}

See the official base64-vlq.js file for the concrete implementation.

Code obfuscation and encryption

The well-known code obfuscation library javascript-obfuscator also uses Base64 in part of its output. Take a look at its options below. webpack-obfuscator is a wrapper built on top of it.

    --string-array-indexes-type '<list>' (comma separated) [hexadecimal-number, hexadecimal-numeric-string]
    --string-array-encoding '<list>' (comma separated) [none, base64, rc4]
    --string-array-index-shift <boolean>
    --string-array-wrappers-count <number>
    --string-array-wrappers-chained-calls <boolean>

Other

X.509 public key certificates, GitHub SSH keys, MHT files, mail attachments, and so on all carry traces of Base64.

The origin of Base64 data encoding

Early mail transfer protocols were based on ASCII text and did not handle binary files such as images and videos well. ASCII is primarily used to display modern English, and so far it defines only 128 characters, including control characters and printable characters. Base64 encoding was created to solve this problem.

Base64 is an encoding and decoding scheme. Its main purpose is not security but the error-free transmission of content across gateways; that is its core function.

Besides Base64, there are also the Base32 and Base16 data encodings. See RFC 4648.

What the 64 in Base64 means

The 64 refers to the 64 characters used for encoding.

The Base64 mapping table (figure borrowed from the article "Base64 principle"):

A-Z 26
a-z 26
0-9 10
+ / 2

26 + 26 + 10 + 2 = 64

There is also the character =, which is the padding character. It is discussed later and is not one of the 64.

Pay attention to the index values in the table; they are used later for Base64 encoding and decoding.
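As a quick illustration of the table and its index values, the following sketch builds the 64 characters from the four ranges listed above. The names range and BASE64_TABLE are made up for this example.

```javascript
// Build a string containing every character from `first` to `last`.
function range(first, last) {
  let chars = "";
  for (let c = first.charCodeAt(0); c <= last.charCodeAt(0); c++) {
    chars += String.fromCharCode(c);
  }
  return chars;
}

// A-Z (0 ~ 25), a-z (26 ~ 51), 0-9 (52 ~ 61), then "+" (62) and "/" (63).
const BASE64_TABLE = range("A", "Z") + range("a", "z") + range("0", "9") + "+/";

console.log(BASE64_TABLE.length);       // 64
console.log(BASE64_TABLE[0]);           // A
console.log(BASE64_TABLE[57]);          // 5
console.log(BASE64_TABLE.indexOf("Q")); // 16
```

Encoding looks a character up by index, and decoding goes the other way with indexOf.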

Advantages and disadvantages of Base64 encoding

Advantages

Binary data (such as images) can be converted into printable characters, which makes it easy to transmit
Data gets a simple layer of obfuscation that is safe only against the naked eye
Images embedded in HTML or CSS this way reduce the number of HTTP requests

Disadvantages

Encoded content grows by at least a third: every 3 bytes become 4 characters, and even a single byte becomes 4 characters once padding is included
Encoding and decoding require additional work
Let's get back to the point:

Today's focus is converting UTF-8 encoded text to Base64:

The basic flow

Char => Code point => UTF-8 encoding => Base64 encoding

Before getting into character encodings, we need to cover some basic computer knowledge.

Some computer and front-end basics

Bits and bytes

A bit is a binary digit. In the computer world, information can only be represented by 0 and 1, so one bit can represent two states, and N bits can represent 2^N states.

A byte consists of eight bits.

So one byte can represent 2^8 = 256 states.

Getting the Unicode code point of a character

String.prototype.charCodeAt returns the code of a character, with a range of 0 ~ 65535. Keep this range in mind: it determines how many UTF-8 bytes are handled later.

"a".charCodeAt(0)  // 97
"中".charCodeAt(0) // 20013
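The 0 ~ 65535 limit matters for characters outside that range. As a small illustration (not part of the original flow), the sketch below compares charCodeAt with String.prototype.codePointAt on U+1F600, an emoji chosen only as an example of such a character.

```javascript
const smile = "\u{1F600}"; // U+1F600, outside the 0 ~ 65535 range

console.log(smile.length);         // 2 (two UTF-16 code units)
console.log(smile.charCodeAt(0));  // 55357 (0xD83D, the high surrogate)
console.log(smile.charCodeAt(1));  // 56832 (0xDE00, the low surrogate)
console.log(smile.codePointAt(0)); // 128512 (0x1F600, the full code point)
```

For characters up to U+FFFF both methods return the same value, which is why charCodeAt is enough for the 1 to 3 byte cases covered in this article.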

Number base notation

The 0b prefix denotes binary.

Note that 0b10000000 = 128 and 0b11000000 = 192; both are used later.

0b11111111 // 255
0b10000000 // 128, used later
0b11000000 // 192, used later

The 0x prefix denotes hexadecimal.

0x11111111 // 286331153

The 0o prefix denotes octal, which is not covered further here.

Base conversion

Number.prototype.toString(radix) converts a decimal number to another base.

100..toString(2)  // 1100100
100..toString(16) // 64, which is 0x64

parseInt(string, radix) converts a number in another base to base 10.

parseInt("10000000", 2) // 128
parseInt("10", 16) // 16

The unary operator + turns a string into a number, and it is used later. The 0b, 0o, and 0x prefixes mentioned above all work here.

+"1000" // 1000
+"0b10000000" // 128
+"0o10" // 8
+"0x10" // 16

Shift operations

Only the right shift is covered here. Shifting right by one bit is the same as dividing by 2 and discarding the remainder, just as dropping the last digit of a decimal integer divides it by 10.

64 >> 2 = 16. Let's look at the process:

0 1 0 0 0 0 0 0       64
-------------------
    0 1 0 0 0 0 | 0 0  16

The bitwise & and | operators

Bitwise &

A result bit is 1 only when both input bits are 1. In this article it is used to strip the high bits; see the code later for details. 3553 & 63 = 0b110111100001 & 0b111111 = 0b100001. The mask has no high bits, so those positions cannot both be 1 and all become 0, while the low bits are copied through unchanged.

110111 100001
       111111
------------
000000 100001

Bitwise |

A result bit is 1 when either input bit is 1. In this article it is used to pad with leading zeros. For example, to pad 3 to eight binary digits: 3 | 256 = 0b11 | 0b100000000 = 0b100000011

Calling substring(1) on that binary string then gives the 8-bit form 00000011.
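The shift, mask, and padding tricks can be tried out directly. The helper names lowBits and toByteString below are made up for this sketch; they only wrap the operators described above.

```javascript
// Keep only the lowest `count` bits of `value` (bitwise & with a mask of ones).
function lowBits(value, count) {
  return value & ((1 << count) - 1);
}

// Render a value from 0 to 255 as exactly eight binary digits:
// set a ninth bit with |, then drop that leading "1" again.
function toByteString(value) {
  return (value | 0b100000000).toString(2).substring(1);
}

console.log(64 >> 2);                         // 16
console.log(lowBits(3553, 6).toString(2));    // 100001
console.log(toByteString(3));                 // 00000011
console.log((3).toString(2).padStart(8, "0")); // 00000011, same result with padStart
```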

With these basics in mind, let's move on to the basics of character encoding.

ASCII, Unicode, UTF-8

ASCII

The first bit of an ASCII code is always 0, so the number of states it can actually represent is 2^7 = 128.

ASCII is primarily used to display modern English, and so far it defines only 128 characters, including control characters and printable characters.

The ASCII codes between 0 and 31 are control characters, used to control peripherals such as printers
The ASCII codes between 32 and 127 are mostly symbols that can be found on our keyboards

For the complete ASCII tables, see basic ASCII and extended ASCII.

Next come Unicode and UTF-8, so keep this important distinction in mind:

Unicode: the character set
UTF-8: the encoding rule

Unicode

Unicode assigns a unique number (code point) to every character in the world. These numbers range from 0x000000 to 0x10FFFF (hexadecimal), more than a million in total. Each character has a unique Unicode number, usually written in hexadecimal with a U+ prefix. For example, the code point of 掘 is U+6398.

U+0000 to U+FFFF

The first 65536 positions hold code points from 0 to 2^16 - 1. All of the most common characters are placed here.

U+010000 to U+10FFFF

The rest of the characters are here, from U+010000 all the way to U+10FFFF.

Unicode also has the concept of planes, which is not expanded on here.

Unicode only specifies the code point of each character. Which byte sequence represents a code point is the concern of the encoding scheme.

UTF-8

UTF-8 is one of the most widely used implementations of Unicode on the Internet. Other implementations include UTF-16 (characters take two or four bytes) and UTF-32 (characters take four bytes).

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character. (The original design allowed up to 6 bytes, but RFC 3629 limits it to 4.) This 1 to 4 range is the key to encoding and decoding.

UTF-8 encoding rules:

For a single-byte symbol, the first bit is set to 0 and the remaining 7 bits hold the symbol's Unicode code point. So for English letters, the UTF-8 encoding is identical to the ASCII code.
For an n-byte symbol (n > 1), the first n bits of the first byte are set to 1, bit n + 1 is set to 0, and the first two bits of every following byte are set to 10. All remaining bits hold the symbol's Unicode code point, as shown in the following table:

Unicode code point range (hexadecimal)	Decimal range	UTF-8 encoding (binary)	Number of bytes
`0000 0000 ~ 0000 007F`	`0 ~ 127`	`0xxxxxxx`	1
`0000 0080 ~ 0000 07FF`	`128 ~ 2047`	`110xxxxx 10xxxxxx`	2
`0000 0800 ~ 0000 FFFF`	`2048 ~ 65535`	`1110xxxx 10xxxxxx 10xxxxxx`	3
`0001 0000 ~ 0010 FFFF`	`65536 ~ 1114111`	`11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`	4

You may not have come across characters that take 2 or 4 bytes. For 2-byte characters, see the Unicode mapping table; for 4-byte characters, see the Unicode® 13.0 Versioned Charts Index.

Code points in the range 0000 0080 ~ 0000 07FF need 2 bytes in UTF-8.

Code points in the range 0001 0000 ~ 0010 FFFF need 4 bytes in UTF-8.
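The ranges in the table translate directly into a lookup. The sketch below is only an illustration; the function name utf8ByteLength is made up, and it does not treat the surrogate range specially.

```javascript
// Number of UTF-8 bytes needed for one Unicode code point, following the table above.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;     // 0000 0000 ~ 0000 007F
  if (codePoint <= 0x7ff) return 2;    // 0000 0080 ~ 0000 07FF
  if (codePoint <= 0xffff) return 3;   // 0000 0800 ~ 0000 FFFF
  if (codePoint <= 0x10ffff) return 4; // 0001 0000 ~ 0010 FFFF
  throw new RangeError("Not a Unicode code point: " + codePoint);
}

console.log(utf8ByteLength("a".codePointAt(0)));  // 1
console.log(utf8ByteLength("س".codePointAt(0)));  // 2
console.log(utf8ByteLength("掘".codePointAt(0)));  // 3
console.log(utf8ByteLength(0x1f600));             // 4
```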

This may not be easy to grasp, so let's walk through it with the English character a and the Chinese character 掘.

To verify the results, you can use the online tool "Convert UTF8 to Binary Bits - Online UTF8 Tools".

The English character a

Get its code point first: "a".charCodeAt(0) equals 97
According to the table, 0 ~ 127 needs 1 byte
97..toString(2) gives the binary digits 1100001
Fill them into the format 0xxxxxxx to get the final result

01100001

The Chinese character 掘

Get its code point first: "掘".charCodeAt(0) equals 25496
According to the table, 2048 ~ 65535 needs 3 bytes
25496..toString(2) gives the binary digits 110 001110 011000
Fill them into the format 1110xxxx 10xxxxxx 10xxxxxx from the low end, padding the first byte with 0, to get the final result

11100110 10001110 10011000

The result from "Convert UTF8 to Binary Bits - Online UTF8 Tools" matches exactly.
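The same check can be done locally without an online tool. This is a minimal sketch that assumes the standard TextEncoder API is available (modern browsers and Node.js); the function name utf8Bits is made up for the example.

```javascript
// Show the UTF-8 bytes of a string as space-separated 8-bit binary groups.
function utf8Bits(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join(" ");
}

console.log(utf8Bits("a"));  // 01100001
console.log(utf8Bits("掘")); // 11100110 10001110 10011000
```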

Abstracting a method that converts characters to UTF-8 binary

Based on the table and the conversion process above, let's abstract a method. It is essential for the Base64 encoding and decoding that follow.

First look at what it does, covering the UTF-8 range of 1 to 3 bytes:

console.log(to_binary("a"))  // 01100001
console.log(to_binary("س"))  // 1101100010110011
console.log(to_binary("掘")) // 111001101000111010011000

The method is as follows:

function to_binary(str) {
  const string = str.replace(/\r\n/g, "\n");
  let result = "";
  let code;
  for (let n = 0; n < string.length; n++) {
    // Get the code point (a UTF-16 code unit, 0 ~ 65535)
    code = string.charCodeAt(n);
    if (code <= 0x007F) { // 1 byte
      // 0000 0000 ~ 0000 007F  0 ~ 127  1 byte

      // Equivalent: (code | 0b100000000).toString(2).slice(1)
      result += code.toString(2).padStart(8, "0");
    } else if (code >= 0x0080 && code <= 0x07FF) {
      // 0000 0080 ~ 0000 07FF  128 ~ 2047  2 bytes
      // 0x0080 is 10000000 in binary (8 bits), so values from 0x0080 up have at least 8 bits
      // Format: 110xxxxx 10xxxxxx

      // High byte: 110xxxxx
      result += ((code >> 6) | 0b11000000).toString(2);
      // Low byte: 10xxxxxx
      result += ((code & 0b111111) | 0b10000000).toString(2);
    } else if (code >= 0x0800 && code <= 0xFFFF && (code < 0xD800 || code > 0xDFFF)) {
      // 0000 0800 ~ 0000 FFFF  2048 ~ 65535  3 bytes
      // 0x0800 is 100000000000 in binary (12 bits), so values from 0x0800 up have at least 12 bits
      // Format: 1110xxxx 10xxxxxx 10xxxxxx

      // First byte: 1110xxxx
      result += ((code >> 12) | 0b11100000).toString(2);
      // Second byte: 10xxxxxx
      result += (((code >> 6) & 0b111111) | 0b10000000).toString(2);
      // Third byte: 10xxxxxx
      result += ((code & 0b111111) | 0b10000000).toString(2);
    } else {
      // 0001 0000 ~ 0010 FFFF  65536 ~ 1114111  4 bytes
      // charCodeAt reports such characters as surrogate halves (0xD800 ~ 0xDFFF)
      // https://www.unicode.org/charts/PDF/Unicode-13.0/U130-2F800.pdf
      throw new TypeError("Characters with code points greater than 65535 are not currently supported");
    }
  }
  return result;
}
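For readers who want the 4-byte range as well, one possible extension is sketched below. It is not the article's method: the name toBinaryFull is made up, it iterates with for...of and codePointAt so that surrogate pairs are read as one code point, and it assumes the input contains no lone surrogates.

```javascript
// Hypothetical extension of to_binary that also covers 4-byte characters.
function toBinaryFull(str) {
  const byte = (n) => n.toString(2).padStart(8, "0");
  let result = "";
  for (const ch of str) { // for...of walks code points, not UTF-16 code units
    const code = ch.codePointAt(0);
    if (code <= 0x7f) {
      result += byte(code);
    } else if (code <= 0x7ff) {
      result += byte((code >> 6) | 0b11000000);
      result += byte((code & 0b111111) | 0b10000000);
    } else if (code <= 0xffff) {
      result += byte((code >> 12) | 0b11100000);
      result += byte(((code >> 6) & 0b111111) | 0b10000000);
      result += byte((code & 0b111111) | 0b10000000);
    } else {
      // Format: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      result += byte((code >> 18) | 0b11110000);
      result += byte(((code >> 12) & 0b111111) | 0b10000000);
      result += byte(((code >> 6) & 0b111111) | 0b10000000);
      result += byte((code & 0b111111) | 0b10000000);
    }
  }
  return result;
}

console.log(toBinaryFull("\u{1F600}")); // 11110000100111111001100010000000
```

The 4-byte branch follows the same pattern as the others: shift to select six bits at a time, mask with 0b111111, and add the prefix with |.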

A few parts of the method are a little harder to understand, so let's read through them together:

The first byte: (code >> 6) | 0b11000000

This generates the high byte. Take س as an example. Its code point is 0x633, which lies between 0000 0080 and 0000 07FF and so occupies two bytes. Its binary form is 11000110011, and the format to fill is:

110xxxxx 10xxxxxx

To make it easier to follow, let's split 11000110011 into 11000 110011.

(code >> 6) equals 0b11000110011 >> 6: it shifts right by 6 bits and discards the low 6 bits. Why 6? Because the low byte needs 6 bits, and after shifting right by 6 only the bits for the high byte remain.

11000000
   11000 | 110011
--------------
11011000

The second byte: (code & 0b111111) | 0b10000000

This generates the low byte. Again take س as an example, with the bits 11000 110011 and the format

  110xxxxx 10xxxxxx

(code & 0b111111) discards everything above the low 6 bits and keeps only the low 6 bits. With &, a result bit is 1 only when both input bits are 1.

11000 110011
      111111
------------------
      110011

Then | 0b10000000 fills in the 10 prefix required by the format 10xxxxxx, bringing the byte to a full 8 bits.

 11000 110011
       111111         (code & 0b111111)
-------------------
       110011
    10 000000         (code & 0b111111) | 0b10000000
-------------------
    10 110011

Base64 encoding and decoding

Rules for converting UTF-8 to Base64

1. Get the Unicode code point of each character and convert it to its UTF-8 encoding
2. Group the bytes in threes, 24 binary bits per group. If the number of bytes is not divisible by 3, pad the end with zero-valued bytes
3. Split each group into units of six bits and prefix each unit with two 0 bits to make eight bits
4. Calculate the value of each unit
5. Use the value from step 4 as an index into the Base64 mapping table to find the corresponding character
6. Replace the characters that come from the bytes added in step 2 with =

For example, if two bytes were added in step 2, the result ends with two = characters.

Take 掘A as an example. to_binary gives the UTF-8 bits 11100110 10001110 10011000 01000001. Four bytes are not divisible by 3, so two zero bytes are appended:

11100110
10001110
10011000
01000001
--------
00000000
00000000

Split the bits into groups of six and fill the two high bits with 0. The fill is separated with |.

00 | 111001  => 57 => 5
00 | 101000  => 40 => o
00 | 111010  => 58 => 6
00 | 011000  => 24 => Y

00 | 010000  => 16 => Q
00 | 010000  => 16 => Q
00 | 000000  =>    => =
00 | 000000  =>    => =

The result: 5o6YQQ==, perfect.
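The grouping step can also be done with numbers instead of bit strings. The sketch below is an alternative illustration, not the approach this article implements: it packs three bytes into one 24-bit number and picks out four 6-bit indexes with shifts and masks. The names TABLE and encodeGroup are made up for the example.

```javascript
const TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Turn one full group of three bytes into four Base64 characters.
function encodeGroup(b1, b2, b3) {
  const bits24 = (b1 << 16) | (b2 << 8) | b3; // 24 bits held in one number
  return (
    TABLE[(bits24 >> 18) & 0b111111] +
    TABLE[(bits24 >> 12) & 0b111111] +
    TABLE[(bits24 >> 6) & 0b111111] +
    TABLE[bits24 & 0b111111]
  );
}

// The three UTF-8 bytes from the first group of the example above.
console.log(encodeGroup(0b11100110, 0b10001110, 0b10011000)); // 5o6Y

// A lone byte padded with two zero bytes: the last two characters become "=".
console.log(encodeGroup(0b01000001, 0, 0).slice(0, 2) + "=="); // QQ==
```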

UTF-8 to Base64 implementation

With the to_binary method above and the Base64 conversion rules, the implementation is straightforward. First look at the results, which match base64.us exactly.

console.log(utf8_to_base64("a")); // YQ==

console.log(utf8_to_base64("Ȃ"));  // yII=

console.log(utf8_to_base64("中国人")); // 5Lit5Zu95Lq6

console.log(utf8_to_base64("Coding Writing 好文召集令｜后端、大前端双赛道投稿，2万元奖池等你挑战！"));
// Q29kaW5nIFdyaXRpbmcg5aW95paH5Y+s6ZuG5Luk772c5ZCO56uv44CB5aSn5YmN56uv5Y+M6LWb6YGT5oqV56i/77yMMuS4h+WFg+WlluaxoOetieS9oOaMkeaImO+8gQ==

The complete code is as follows:

const BASE64_CHARTS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
function utf8_to_base64(str) {
  let binaryStr = to_binary(str);
  const len = binaryStr.length;

  // Number of = characters to append
  let paddingCharLen = len % 24 !== 0 ? (24 - len % 24) / 8 : 0;

  // Split into groups of 6 bits
  const groups = [];
  for (let i = 0; i < binaryStr.length; i += 6) {
    let g = binaryStr.slice(i, i + 6);
    if (g.length < 6) {
      g = g.padEnd(6, "0");
    }
    groups.push(g);
  }

  // Look up the character for each group
  let base64Str = groups.reduce((b64str, cur) => {
    b64str += BASE64_CHARTS[+`0b${cur}`]
    return b64str
  }, "");

  // Append the = padding
  if (paddingCharLen > 0) {
    base64Str += paddingCharLen > 1 ? "==" : "=";
  }

  return base64Str;
}

Decoding is the reverse process; I leave it to you to implement.
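If you want something to compare your own attempt against, here is one possible sketch of the reverse direction. The names BASE64_ALPHABET and base64_to_utf8 are made up for this example, and the final bytes-to-text step is delegated to the standard TextDecoder API rather than written by hand, which is an assumption about the environment (modern browsers, Node.js).

```javascript
const BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64_to_utf8(base64Str) {
  // 1. Strip the padding and turn each character back into its 6-bit index.
  const chars = base64Str.replace(/=+$/, "");
  let bits = "";
  for (const ch of chars) {
    const index = BASE64_ALPHABET.indexOf(ch);
    if (index === -1) throw new TypeError("Invalid Base64 character: " + ch);
    bits += index.toString(2).padStart(6, "0");
  }

  // 2. Regroup into whole bytes; leftover bits are the zero fill added by the encoder.
  const bytes = [];
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }

  // 3. Interpret the bytes as UTF-8 text.
  return new TextDecoder().decode(new Uint8Array(bytes));
}

console.log(base64_to_utf8("5o6YQQ==")); // 掘A
console.log(base64_to_utf8("YQ=="));     // a
```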

Other mature solutions

The built-in options are, of course, btoa and atob.

On their own they only accept Latin-1 characters, so the wrapper below makes them work with UTF-8 text. However, unescape (like escape) is deprecated and not recommended.

function utf8_to_b64( str ) {
  return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
  return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
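One way to avoid unescape and escape is to go through UTF-8 bytes explicitly. The following is a minimal sketch, assuming the standard TextEncoder, TextDecoder, btoa, and atob globals are available; the function names utf8ToBase64 and base64ToUtf8 are made up for the example.

```javascript
// Encode a string as UTF-8 bytes, then Base64 those bytes with btoa.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return btoa(binary);
}

// Reverse: atob gives one character per byte, which is then decoded as UTF-8.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64("✓ à la mode"));          // 4pyTIMOgIGxhIG1vZGU=
console.log(base64ToUtf8("4pyTIMOgIGxhIG1vZGU=")); // ✓ à la mode
```

Building the intermediate string one byte at a time keeps the sketch simple; for large inputs a chunked approach would likely be preferable.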

MDN's "Rewriting atob() and btoa() using TypedArrays and UTF-8"

It supports sequences of up to 6 bytes, but it is not very readable.

The third-party libraries base64-js and js-base64 each see millions of downloads per week.

Even with so many mature options available, understanding and implementing the algorithm ourselves is the best way to grasp how Base64 encoding works.

Extras

Encoding diagram

A picture from the article "Do you really know Unicode and UTF-8?":

DOMString is UTF-16 encoded

Final words

Writing is not easy. Your triple support (a like, a comment, a bookmark) is my biggest motivation.

If this article gets over 100 likes, I will write another article about browsers, the DOM, JS, and so on.

References

Version-Specific Charts
Unicode 13.0.0
Unicode® 13.0 Versioned Charts Index
RFC 4648 | The Base16, Base32, and Base64 Data Encodings
Base64 encoding and decoding
ASCII
The principles, implementation, and application of Base64
Do you really know Unicode and UTF-8?
UTF-8 / UTF-16