Preface

Looking back all of a sudden, there was Base64, waiting where the lights are dim.

While going through an old project today, I found many image-related plug-ins that use Base64 to display images. The mention of Base64 set my mind wandering through everything I know about it, warts and all. This article records and summarizes the problems I ran into at work. If anything here is wrong, please correct me ~

Base64 origin

Base64 is one of the most common encodings for transmitting 8-bit byte data over a network. It represents binary data using 64 printable characters. In payment systems, for example, every message exchanged requires the plaintext to be Base64-encoded first, then signed or encrypted, and then transmitted (sometimes after being Base64-encoded again). So what problem does Base64 solve?

A common situation when passing parameters: plain English strings work fine, but Chinese characters come out garbled. Likewise, not everything sent over a network consists of printable characters; binary files and images are examples. Base64 solves this problem by representing binary data with 64 printable characters.

When e-mail first appeared, it could only carry English text. As the number of users grew, they also needed to send Chinese, Japanese, Korean, and Russian text, but servers and gateways could not handle these characters reliably, and so Base64 came into use. Later, Base64 was also adopted in URLs, cookies, and web pages to carry small amounts of binary data.

Encoding principle of Base64

Binary data is represented using 64 characters: A-Z, a-z, 0-9, +, and /. The = symbol is used as padding when the input does not fill a complete group of three bytes.

Base64 encoding mapping table

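| Index | Character | Index | Character | Index | Character | Index | Character |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | A | 17 | R | 34 | i | 51 | z |
| 1 | B | 18 | S | 35 | j | 52 | 0 |
| 2 | C | 19 | T | 36 | k | 53 | 1 |
| 3 | D | 20 | U | 37 | l | 54 | 2 |
| 4 | E | 21 | V | 38 | m | 55 | 3 |
| 5 | F | 22 | W | 39 | n | 56 | 4 |
| 6 | G | 23 | X | 40 | o | 57 | 5 |
| 7 | H | 24 | Y | 41 | p | 58 | 6 |
| 8 | I | 25 | Z | 42 | q | 59 | 7 |
| 9 | J | 26 | a | 43 | r | 60 | 8 |
| 10 | K | 27 | b | 44 | s | 61 | 9 |
| 11 | L | 28 | c | 45 | t | 62 | + |
| 12 | M | 29 | d | 46 | u | 63 | / |
| 13 | N | 30 | e | 47 | v | | |
| 14 | O | 31 | f | 48 | w | | |
| 15 | P | 32 | g | 49 | x | | |
| 16 | Q | 33 | h | 50 | y | | |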
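
The mapping table does not have to be typed out by hand. As a minimal sketch, assuming the hypothetical names `BASE64_ALPHABET`, `indexToChar`, and `charToIndex` (none of them come from the original article), the whole table can be derived from one 64-character string:

```javascript
// The 64 printable characters in index order: A-Z, a-z, 0-9, +, /
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ' +
  'abcdefghijklmnopqrstuvwxyz' +
  '0123456789' +
  '+/';

// index (0-63) -> character
function indexToChar(index) {
  return BASE64_ALPHABET[index];
}

// character -> index (0-63), or -1 if the character is not in the alphabet
function charToIndex(char) {
  return BASE64_ALPHABET.indexOf(char);
}

console.log(indexToChar(19)); // T
console.log(charToIndex('p')); // 41
```

Storing the alphabet as a string keeps the index-to-character lookup a plain array access, which is all the encoding steps need.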

Base64 encoding conversion rules

Base64 converts every three 8-bit bytes into four 6-bit groups (3*8 = 4*6 = 24). Each 6-bit group is then prefixed with two high-order zero bits, giving four 8-bit bytes (4*8 = 32).

Why groups of three bytes? The least common multiple of 6 and 8 is 24, so three bytes hold exactly 24 bits, which divide evenly into four groups of six bits.

Because two high-order zeros are added to each group, the encoded output is in theory 1/3 longer than the input: 24 bits become 32 bits, and (32 - 24) / 24 = 1/3.
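
The size relationship can be checked with a little arithmetic. The sketch below assumes a hypothetical helper named `encodedLength`; it counts one 4-character output group for every started group of 3 input bytes, so a padded final group makes the overhead slightly more than 1/3 when the input length is not a multiple of three.

```javascript
// Number of Base64 characters (padding included) produced for byteCount input bytes.
// Every started group of 3 bytes becomes exactly 4 output characters.
function encodedLength(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

console.log(encodedLength(3)); // 4
console.log(encodedLength(4)); // 8
console.log(encodedLength(3000)); // 4000
```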

  • Step breakdown:

    1. Split the string to be converted into groups of three characters. Each character is one 8-bit byte, so a group holds 24 bits.
    2. Split the 24 bits into four groups of 6 bits.
    3. Prepend two zeros to each 6-bit group, so that each group grows from 6 bits to 8 bits, for a total of 32 bits, or four bytes.
    4. Look up each resulting value in the Base64 mapping table to get the corresponding character.
  • An example

    • Take the three characters LJY as an example.
    1. The ASCII codes of L, J, and Y are 76, 74, and 89, and their binary values are 01001100, 01001010, and 01011001. Together these form a 24-bit binary string.
    2. Split the 24-bit binary string into four groups of six bits: 010011, 000100, 101001, and 011001.
    3. Prepend two zeros to each six-bit group, expanding each group to eight bits and the four groups to 32 bits. The four groups become 00010011, 00000100, 00101001, and 00011001, and the corresponding Base64 indexes are 19, 4, 41, and 25.
    4. Look up each index in the Base64 mapping table: they correspond to T, E, p, and Z.
    • So after Base64 encoding, LJY becomes TEpZ.

| Text | L | J | Y | |
| --- | --- | --- | --- | --- |
| ASCII | 76 | 74 | 89 | |
| 8-bit binary | 01001100 | 01001010 | 01011001 | |
| 6-bit groups | 010011 | 000100 | 101001 | 011001 |
| Groups with two leading zeros | 00010011 | 00000100 | 00101001 | 00011001 |
| Index | 19 | 4 | 41 | 25 |
| Base64 encoding | T | E | p | Z |

In short, the binary before conversion is 01001100 01001010 01011001, and the binary after conversion is 00010011 00000100 00101001 00011001.

  • Insufficient number of characters

The example above has exactly three characters in the group. Of course, the input does not always divide evenly into groups of three. So what happens when the last group comes up short?

Base64 specifies that when the last group has fewer than three bytes, the missing positions in the output are filled with the = symbol.

Handling a short final group:

  • One byte in the last group (two bytes short): one byte is 8 bits, grouped by the same rule into 6-bit groups. The second group has only 2 bits, so it is completed with four 0 bits, giving two Base64 characters. The last two groups have no data at all, so both are filled with =.
  • Two bytes in the last group (one byte short): two bytes are 16 bits, grouped by the same rule into 6-bit groups. The third group has only 4 bits, so it is completed with two 0 bits, giving three Base64 characters. The fourth group has no data at all and is filled with =.

The padding cases are illustrated as follows:

Two characters short: the string A becomes QQ== after Base64 encoding.

| Text (1 byte) | A | | | |
| --- | --- | --- | --- | --- |
| Binary | 01000001 | | | |
| 6-bit groups (zero-filled) | 010000 | 010000 | | |
| Groups with two leading zeros | 00010000 | 00010000 | | |
| Index | 16 | 16 | | |
| Base64 encoding | Q | Q | = | = |

One character short: the string AB becomes QUI= after Base64 encoding.

| Text (2 bytes) | A | B | | |
| --- | --- | --- | --- | --- |
| Binary | 01000001 | 01000010 | | |
| 6-bit groups (zero-filled) | 010000 | 010100 | 001000 | |
| Groups with two leading zeros | 00010000 | 00010100 | 00001000 | |
| Index | 16 | 20 | 8 | |
| Base64 encoding | Q | U | I | = |
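
The grouping and padding rules can be sketched as a small hand-written encoder. This is only an illustration of the steps, not a replacement for the built-in btoa(); the names `ALPHABET`, `encodeBase64`, and `asciiBytes` are hypothetical, and the input is assumed to be an array of byte values (0-255).

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode an array of byte values (0-255) by following the steps by hand.
function encodeBase64(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // real bytes in this group: 1, 2, or 3+
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes count as zero bits
    const b2 = remaining > 2 ? bytes[i + 2] : 0;

    // Join the three bytes into one 24-bit number, then cut it into four 6-bit indexes.
    const group = (b0 << 16) | (b1 << 8) | b2;
    out += ALPHABET[(group >> 18) & 63];
    out += ALPHABET[(group >> 12) & 63];
    // Groups that contain no input data at all are written as '='.
    out += remaining > 1 ? ALPHABET[(group >> 6) & 63] : '=';
    out += remaining > 2 ? ALPHABET[group & 63] : '=';
  }
  return out;
}

// Helper for ASCII-only demo strings: one byte per character.
const asciiBytes = (text) => Array.from(text, (ch) => ch.charCodeAt(0));

console.log(encodeBase64(asciiBytes('LJY'))); // TEpZ
console.log(encodeBase64(asciiBytes('A'))); // QQ==
console.log(encodeBase64(asciiBytes('AB'))); // QUI=
```

The "two leading zeros" step is implicit here: masking with 63 leaves a 6-bit value in an ordinary number, which is the same as an 8-bit value whose top two bits are zero.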

Working through the conversion of one, two, and three characters shows that Base64 simply converts binary data into a string by way of the Base64 mapping table. The data is no longer directly readable as plaintext, but this is not encryption. Base64 is useful for transmitting, storing, and representing binary data.

  • It is also worth noting that text such as Chinese can be stored in several character encodings (UTF-8, GB2312, GBK, and so on), and the same text yields a different Base64 result under each encoding.
  • The derivation also shows that each Base64 character carries 6 bits (2 to the 6th power is 64). Similarly, each Base32 character carries 5 bits and each Base16 character carries 4 bits. You can follow the steps above to work through those encodings as well.
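
To see how the character encoding changes the result, the sketch below encodes the single character 中 from two different byte representations. It uses UTF-8 and UTF-16 (little-endian) only because both are easy to produce in JavaScript; GB2312 and GBK would need a separate encoder and are not shown. The helper name `bytesToBase64` is hypothetical.

```javascript
// Turn byte values into a binary string (one character per byte), then call btoa().
function bytesToBase64(bytes) {
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

const text = '中';

// UTF-8 bytes of the text: E4 B8 AD
const utf8Bytes = new TextEncoder().encode(text);

// UTF-16 little-endian bytes of the text: 2D 4E
const utf16leBytes = [];
for (let i = 0; i < text.length; i++) {
  const unit = text.charCodeAt(i);
  utf16leBytes.push(unit & 0xff, unit >> 8);
}

console.log(bytesToBase64(utf8Bytes)); // 5Lit
console.log(bytesToBase64(utf16leBytes)); // LU4=
```

Both outputs decode back to the same character, but only if the receiver knows which character encoding produced the bytes.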

Base64 pros and cons

Now that we know what Base64 is, it is time to ask why to use it. Its advantages and disadvantages determine which scenarios it suits.

  • Advantages:
    • Base64 works across different platforms and languages.
    • Small Base64 images embedded in the page reduce the number of requests to the server;
    • The binary-to-Base64 algorithm is simple and has little impact on performance.
  • Disadvantages:
    • Converting a binary file to Base64 increases its size by about a third;
      • In real tests with the default browser on Android 6.0 and below, uploading Base64 images crashed on some models (ZTE, for example) because the string was too large.
      • An overly long Base64 string is not suitable for URLs, because iOS browsers limit the URL length. A URL that exceeds the limit is truncated automatically, resulting in data loss.
      • Large Base64 strings slow down page loading, so it is advisable to inline only images smaller than 10 KB.
    • An inline Base64 image cannot be cached on its own. It is cached only as part of the file that contains it, such as a JS or CSS file.
    • Encoding and decoding large files consumes a noticeable amount of CPU.
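
The 10 KB advice can be turned into a simple guard. This is a sketch under assumptions: the names `INLINE_LIMIT_BYTES` and `toDataUrlIfSmall` are hypothetical, the limit is the rule of thumb from the list above rather than a hard standard, and the image is assumed to be available as an array of bytes.

```javascript
// Rule-of-thumb limit taken from the advice above: inline only images under 10 KB.
const INLINE_LIMIT_BYTES = 10 * 1024;

// Return a data URL for small images, or null when the image should stay a separate file.
function toDataUrlIfSmall(bytes, mime, limit = INLINE_LIMIT_BYTES) {
  if (bytes.length >= limit) {
    return null;
  }
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return `data:${mime};base64,${btoa(binary)}`;
}

console.log(toDataUrlIfSmall(new Uint8Array([76, 74, 89]), 'image/png'));
// data:image/png;base64,TEpZ
console.log(toDataUrlIfSmall(new Uint8Array(20 * 1024), 'image/png')); // null
```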

JavaScript base64 transcoding method

Web API binary and Base64 conversion

  • atob(encodedData): decodes a Base64-encoded string.

encodedData is a binary string containing Base64-encoded data, such as a string produced by the btoa() method. The method returns an ASCII string containing the decoded data from encodedData.

  • btoa(stringToEncode): creates a Base64-encoded string.

stringToEncode is the binary string to encode. The method returns an ASCII string containing the Base64 representation of stringToEncode.

Note that JavaScript strings use the UTF-16 character encoding: a string is represented as a sequence of 16-bit (2-byte) units. Every ASCII character fits in the first byte of one of these units, but many other characters do not.

Base64 is designed to take binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. If you pass btoa() a string that contains characters occupying more than one byte, it throws an error, because such a string is not treated as binary data. Strings with such characters must therefore be converted to a binary string before calling btoa().

// Simple data
const encodedData = btoa('Hello, world'); // encode a string
const decodedData = atob(encodedData); // decode the string

/* Complex data */
// Convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  // View the same memory as single bytes (two bytes per code unit)
  const charCodes = new Uint8Array(codeUnits.buffer);
  let result = '';
  for (let i = 0; i < charCodes.byteLength; i++) {
    result += String.fromCharCode(charCodes[i]);
  }
  return result;
}

// Reverse of toBinary: rebuild the 16-bit units from pairs of bytes
function fromBinary(binary) {
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const charCodes = new Uint16Array(bytes.buffer);
  let result = '';
  for (let i = 0; i < charCodes.length; i++) {
    result += String.fromCharCode(charCodes[i]);
  }
  return result;
}

// A string that contains characters occupying > 1 byte
const myString = '☸☹☺☻☼☾☿';

const converted = toBinary(myString);
const encoded = btoa(converted);
console.log(encoded); // OCY5JjomOyY8Jj4mPyY=

const decoded = atob(encoded);
const original = fromBinary(decoded);
console.log(original); // ☸☹☺☻☼☾☿
  • Compatibility: the atob() method is not supported in IE9 or earlier versions of IE.
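
An alternative to the UTF-16 conversion above is to encode the text as UTF-8 bytes first. This is a sketch of a different technique, not the article's method; it assumes an environment that provides TextEncoder and TextDecoder, and the function names `utf8ToBase64` and `base64ToUtf8` are hypothetical.

```javascript
// Encode any JavaScript string as Base64 by going through its UTF-8 bytes.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b); // every byte fits in one character
  }
  return btoa(binary);
}

// Decode Base64 back into a string, interpreting the bytes as UTF-8.
function base64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Calling btoa() directly fails for characters that need more than one byte.
try {
  btoa('中');
} catch (err) {
  console.log('btoa failed:', err.name);
}

console.log(utf8ToBase64('中')); // 5Lit
console.log(base64ToUtf8('5Lit')); // 中
```

The UTF-8 route produces output that other platforms can decode with their standard UTF-8 tools, whereas the UTF-16 route depends on both sides agreeing on the two-bytes-per-unit layout.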

Base64 to binary

// Base64 decoding table: character -> 6-bit index
const map = {
  0: 52, 1: 53, 2: 54, 3: 55, 4: 56, 5: 57, 6: 58, 7: 59, 8: 60, 9: 61,
  A: 0, B: 1, C: 2, D: 3, E: 4, F: 5, G: 6, H: 7, I: 8, J: 9, K: 10, L: 11, M: 12,
  N: 13, O: 14, P: 15, Q: 16, R: 17, S: 18, T: 19, U: 20, V: 21, W: 22, X: 23, Y: 24, Z: 25,
  a: 26, b: 27, c: 28, d: 29, e: 30, f: 31, g: 32, h: 33, i: 34, j: 35, k: 36, l: 37, m: 38,
  n: 39, o: 40, p: 41, q: 42, r: 43, s: 44, t: 45, u: 46, v: 47, w: 48, x: 49, y: 50, z: 51,
  '+': 62, '/': 63,
};

function base64to2(base64) {
  base64 = base64.replace(/=*$/, ''); // remove the trailing = signs (padding)
  // Every 4 characters decode to 3 bytes. The length is computed after the
  // padding is removed so that no extra zero bytes are appended.
  const len = Math.floor(base64.length * 0.75);

  // Int8Array stores signed bytes (-128 to 127); use Uint8Array for 0 to 255
  const int8 = new Int8Array(len);
  let arr1,
    arr2,
    arr3,
    arr4,
    p = 0;

  for (let i = 0; i < base64.length; i += 4) {
    // Each loop converts 4 Base64 characters directly into 3 bytes
    arr1 = map[base64[i]];
    arr2 = map[base64[i + 1]];
    arr3 = map[base64[i + 2]];
    arr4 = map[base64[i + 3]];
    // Suppose the four values are 00101011 00101111 00110011 00110001
    int8[p++] = (arr1 << 2) | (arr2 >> 4);
    // arr1 shifted left by 2 bits becomes 10101100
    // arr2 shifted right by 4 bits becomes 00000010
    // | is the bitwise OR operation: 10101110
    int8[p++] = (arr2 << 4) | (arr3 >> 2);
    int8[p++] = (arr3 << 6) | arr4;
  }
  return int8;
}
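
Where atob() is available, the same conversion can be sketched in a few lines by letting the built-in decoder do the table lookup. The function name `base64ToBytes` is hypothetical, and the result is an unsigned Uint8Array rather than the Int8Array used above.

```javascript
// Decode a Base64 string into its raw bytes using the built-in atob().
function base64ToBytes(base64) {
  const binary = atob(base64); // one character per decoded byte
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

console.log(base64ToBytes('TEpZ')); // bytes 76, 74, 89 ("LJY")
console.log(base64ToBytes('QQ==')); // byte 65 ("A")
```

The hand-written version is still useful for understanding the bit operations, or in environments without atob().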

Base64 to Blob

// Convert a Base64 image (data URL) to a Blob
function base64toBlob(base64) {
  var arr = base64.split(','),
    // Read the MIME type from the "data:<mime>;base64" header, defaulting to PNG
    mime = (arr[0].match(/:(.*?);/) || [])[1] || 'image/png',
    bstr = atob(arr[1]), // decode Base64 into a binary string, one character per byte
    n = bstr.length,
    u8arr = new Uint8Array(n);
  while (n--) {
    u8arr[n] = bstr.charCodeAt(n); // charCodeAt returns the byte value of each decoded character
  }
  return new Blob([u8arr], { type: mime });
}

/* Optimized version */
function base64ToBlob(base64) {
  var arr = base64.split(',');
  var mime = (arr[0].match(/:(.*?);/) || [])[1] || 'image/png';
  // Remove the URL header and decode the rest into a binary string
  var bytes = window.atob(arr[1]);
  // Allocate a buffer with one byte per decoded character
  var ab = new ArrayBuffer(bytes.length);
  // Create a view directly over that memory: 8-bit unsigned integers, 1 byte each
  var u8arr = new Uint8Array(ab);

  for (var i = 0; i < bytes.length; i++) {
    u8arr[i] = bytes.charCodeAt(i);
  }

  return new Blob([u8arr], { type: mime });
}
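
The reverse direction, from a Blob back to a Base64 string, can be sketched in a similar way. This assumes an environment where Blob.prototype.arrayBuffer() and btoa() are available (modern browsers and recent Node.js); the function name `blobToBase64` is hypothetical, and for very large Blobs the string concatenation below would need to be done in chunks.

```javascript
// Read a Blob's bytes and return them as a Base64 string (without a data URL header).
async function blobToBase64(blob) {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

const demoBlob = new Blob([new Uint8Array([76, 74, 89])], { type: 'image/png' });
blobToBase64(demoBlob).then((base64) => {
  console.log(`data:${demoBlob.type};base64,${base64}`);
  // data:image/png;base64,TEpZ
});
```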
