Hi everyone, I’m developer FTD. Many of you probably use Base64 encoding at work all the time, but do you know why Base64 exists, why we use it, and how it works? Let’s take a closer look at Base64 encoding.

The Base family

Before we begin, let’s meet the Base family. Base64 is the member we use most at work, but it is not the only one: the family also includes Base32 and Base16.

A single byte can take 256 (2 ^ 8) different values. The Base encodings represent those bytes using a much smaller set of printable characters:

  • Base64 encoding is a method of encoding binary data with 64 (2 ^ 6) characters

  • Base32 encoding is a method of encoding binary data with 32 (2 ^ 5) characters

  • Base16 encoding is a way to encode binary data with 16 (2 ^ 4) characters

Why Base64, rather than another member of the Base family?

  • Base64 uses 64 (2 ^ 6) specific ASCII characters to represent arbitrary 8-bit bytes (2 ^ 8 = 256 possible values). Each character carries 6 bits, so every three bytes (24 bits, the least common multiple of 6 and 8) are encoded as four characters, and the encoded data is 1/3 longer than the original. If the input length is not a multiple of 3, the output is padded with “=” up to a multiple of 4 characters.

  • Base32 uses 32 (2 ^ 5) specific ASCII characters to represent the same bytes. Each character carries 5 bits, so every five bytes (40 bits, the least common multiple of 5 and 8) are encoded as eight characters, and the encoded data is 3/5 longer than the original. If the input length is not a multiple of 5, the output is padded with “=” up to a multiple of 8 characters.

  • Base16 uses 16 (2 ^ 4) specific ASCII characters to represent the same bytes. Each character carries 4 bits, so every byte is encoded as two characters, and the encoded data is twice as long as the original. Because each byte maps to exactly two characters, no “=” padding is needed.

Of the three, Base64 adds the least length, which is an important reason for choosing it.
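To make the size comparison concrete, here is a minimal Java sketch that computes the encoded length, padding included, for each member of the family. The class and method names (`BaseLengths`, `base64Length`, and so on) are made up for this illustration and do not come from any library.

```java
/** Encoded output lengths (including "=" padding) for the Base family. */
final class BaseLengths {
    // Base64: every group of up to 3 input bytes becomes 4 characters.
    static int base64Length(int inputBytes) {
        return 4 * ((inputBytes + 2) / 3);
    }

    // Base32: every group of up to 5 input bytes becomes 8 characters.
    static int base32Length(int inputBytes) {
        return 8 * ((inputBytes + 4) / 5);
    }

    // Base16: every input byte becomes 2 characters, so padding never occurs.
    static int base16Length(int inputBytes) {
        return 2 * inputBytes;
    }

    public static void main(String[] args) {
        int n = 15; // a multiple of both 3 and 5, so no padding is involved
        System.out.println("Base64: " + base64Length(n)); // 20
        System.out.println("Base32: " + base32Length(n)); // 24
        System.out.println("Base16: " + base16Length(n)); // 30
    }
}
```

For 15 input bytes, the three encodings need 20, 24, and 30 characters respectively, which matches the 1/3, 3/5, and 2x growth described above.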

Base64 overview

Base64, as its name suggests, is a way to represent binary data using 64 printable characters. Note that it is not an encryption algorithm. Since 2 ^ 6 = 64, six bits are enough to select one of the 64 characters. But data is stored in 8-bit bytes, so how do we map it onto characters that each carry only six bits? We split the bit stream into 6-bit units, each corresponding to one printable character. Three bytes contain 24 bits, which is exactly four 6-bit units, so every three bytes are represented by four printable characters.

Base64 converts binary data into characters, so any content on your computer, including text, pictures, audio, and video, can be represented in Base64.
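As a small illustration that the input really can be arbitrary bytes and not just text, the sketch below encodes three raw bytes (one of them not a printable character) with the JDK's `java.util.Base64` and decodes them back. The class name `BinaryToText` and the sample bytes are chosen only for this example.

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryToText {
    // Turns arbitrary bytes into printable Base64 text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Restores the original bytes from Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, (byte) 0xFF, 0x10}; // not valid printable text
        String text = encode(raw);
        System.out.println(text);                              // AP8Q
        System.out.println(Arrays.equals(raw, decode(text)));  // true
    }
}
```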

Base64 encoding principle

Base64 encoding uses 64 characters as a base character set:

Uppercase letters A-Z, lowercase letters a-z, digits 0-9, and the symbols “+” and “/” (plus “=” for padding, which makes 65 characters in practice)

All input data is then converted to characters from this set according to fixed rules.

Specifically, the Base64 conversion can be divided into the following four steps:

  • Step 1: take the input three bytes at a time, for a total of 24 binary bits
  • Step 2: divide the 24 bits into four groups of six bits each
  • Step 3: prepend two 0 bits to each group, expanding it to a full byte (32 bits, or four bytes, in total)
  • Step 4: look up the symbol for each expanded byte’s value in the following table; these symbols are the Base64 encoding
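The first three steps can be sketched with a few shifts and masks. This is an illustrative Java snippet, not code from the article's repository; `SixBitGroups` and `split` are hypothetical names, and “Man” is just an arbitrary three-byte sample.

```java
import java.util.Arrays;

final class SixBitGroups {
    /** Splits three bytes (24 bits) into four 6-bit values in the range 0..63. */
    static int[] split(byte b0, byte b1, byte b2) {
        // Step 1: join the three bytes into one 24-bit number.
        int bits = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        // Steps 2 and 3: take 6 bits at a time. Masking with 0x3F keeps
        // the low 6 bits and leaves the leading bits of each value as zeros.
        return new int[] {
            (bits >> 18) & 0x3F,
            (bits >> 12) & 0x3F,
            (bits >> 6) & 0x3F,
            bits & 0x3F
        };
    }

    public static void main(String[] args) {
        int[] groups = split((byte) 'M', (byte) 'a', (byte) 'n');
        System.out.println(Arrays.toString(groups)); // [19, 22, 5, 46]
    }
}
```

Each resulting value is an index into the character table, which is what step 4 looks up.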

The Base64 encoded character index table is as follows:

Value  Char    Value  Char    Value  Char    Value  Char
0      A       16     Q       32     g       48     w
1      B       17     R       33     h       49     x
2      C       18     S       34     i       50     y
3      D       19     T       35     j       51     z
4      E       20     U       36     k       52     0
5      F       21     V       37     l       53     1
6      G       22     W       38     m       54     2
7      H       23     X       39     n       55     3
8      I       24     Y       40     o       56     4
9      J       25     Z       41     p       57     5
10     K       26     a       42     q       58     6
11     L       27     b       43     r       59     7
12     M       28     c       44     s       60     8
13     N       29     d       45     t       61     9
14     O       30     e       46     u       62     +
15     P       31     f       47     v       63     /
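In code, the index table is usually just a 64-character string, where the position of each character is its value. The following Java sketch builds that string from the ranges in the table; `Base64Alphabet` and `build` are hypothetical names used only here.

```java
final class Base64Alphabet {
    /** Builds the 64-character index table: A-Z, a-z, 0-9, '+', '/'. */
    static String build() {
        StringBuilder sb = new StringBuilder(64);
        for (char c = 'A'; c <= 'Z'; c++) sb.append(c); // values 0 to 25
        for (char c = 'a'; c <= 'z'; c++) sb.append(c); // values 26 to 51
        for (char c = '0'; c <= '9'; c++) sb.append(c); // values 52 to 61
        sb.append('+').append('/');                      // values 62 and 63
        return sb.toString();
    }

    public static void main(String[] args) {
        String table = build();
        System.out.println(table.charAt(17));    // R
        System.out.println(table.charAt(37));    // l
        System.out.println(table.indexOf('/'));  // 63
    }
}
```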

With this character index table, we can convert any binary data to Base64. Here are a few examples that show the conversion process.

1. Suppose the string “FTD” needs to be converted to Base64.

  • Step 1: The ASCII codes of F, T, and D are 70, 84, and 68, and their binary values are 01000110, 01010100, and 01000100. Joined together, they form the 24-bit string 010001100101010001000100.
  • Step 2: Divide the 24 bits into four groups of six bits: 010001, 100101, 010001, 000100.
  • Step 3: Prepend two zeros to each group to expand it to a full byte, which gives four bytes: 00010001, 00100101, 00010001, 00000100. The corresponding values (Base64 indexes) are 17, 37, 17, and 4.
  • Step 4: Look up these values in the Base64 character index table: R, l, R, E.

So the Base64 encoding of the string “FTD” is RlRE.

2. The example above contains exactly three bytes. What if the number of bytes is not a multiple of three? Consider “F” and “FT” in turn.

The character F is 01000110 in binary, which yields only one complete 6-bit group (010001, value 17, “R”). The second group has only 2 bits (10) and is missing 4, so it is completed with zeros to give 100000 (value 32, “g”). The third and fourth groups have no data at all and are filled with “=”. Thus the Base64 encoding of “F” is “Rg==”.

3. Now let’s look at the two-character case, “FT”:

This input also needs filling. The 16 bits 01000110 01010100 give two complete groups (010001 and 100101, “R” and “l”). The third group has only 4 bits (0100) and is completed with two zeros to give 010000 (value 16, “Q”). The fourth group has no data at all and is filled with “=”. Therefore the Base64 encoding of “FT” is “RlQ=”.
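The grouping and padding rules from these examples can be put together in a small hand-written encoder. This is a teaching sketch under the rules described above, not the JDK's implementation; `ManualBase64` is a hypothetical class name, and real code should use a library encoder instead.

```java
import java.nio.charset.StandardCharsets;

final class ManualBase64 {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            int remaining = Math.min(3, data.length - i);
            // Pack up to three bytes into 24 bits; missing bytes stay zero,
            // which supplies the zero bits that complete a partial group.
            int bits = 0;
            for (int j = 0; j < 3; j++) {
                bits <<= 8;
                if (j < remaining) bits |= data[i + j] & 0xFF;
            }
            // The first two characters always carry data.
            out.append(ALPHABET.charAt((bits >> 18) & 0x3F));
            out.append(ALPHABET.charAt((bits >> 12) & 0x3F));
            // The third and fourth carry data only if enough bytes were present;
            // otherwise the group is empty and is written as '='.
            out.append(remaining > 1 ? ALPHABET.charAt((bits >> 6) & 0x3F) : '=');
            out.append(remaining > 2 ? ALPHABET.charAt(bits & 0x3F) : '=');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"FTD", "FT", "F"}) {
            System.out.println(s + " -> " + encode(s.getBytes(StandardCharsets.US_ASCII)));
        }
        // FTD -> RlRE
        // FT -> RlQ=
        // F -> Rg==
    }
}
```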

Base64 encoding of Chinese text

As we all know, Chinese text can be encoded in many ways, such as GB2312, GBK, GB18030, and UTF-8. The same Chinese characters produce different bytes under different encodings, so their Base64 values differ as well. When decoding, we therefore need to know which character set was used originally; it must match for the text to be decoded correctly.

For example:

The Base64 value of “【我是开发者FTD】公众号” (“[I am developer FTD] official account”) in UTF-8 is 44CQ5oiR5piv5byA5Y+R6ICFRlRE44CR5YWs5LyX5Y+3

The Base64 value of the same string in GB2312 is ob7O0srHv6q3otXfRlREob+5q9bausU=
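The same effect can be reproduced in a few lines of Java. Since GB2312 and GBK are not guaranteed to be available on every JVM, this sketch swaps in UTF-16BE as the second character set, which is an assumption made purely to keep the example portable; `CharsetMatters` and `encodeWith` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CharsetMatters {
    // Encodes the text's bytes in the given charset, then Base64-encodes them.
    static String encodeWith(String text, Charset charset) {
        return Base64.getEncoder().encodeToString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "\u4e2d"; // the Chinese character 中, written as an escape
        // UTF-8 stores it as 3 bytes (E4 B8 AD).
        System.out.println(encodeWith(text, StandardCharsets.UTF_8));    // 5Lit
        // UTF-16BE stores it as 2 bytes (4E 2D), so the Base64 text differs.
        System.out.println(encodeWith(text, StandardCharsets.UTF_16BE)); // Ti0=
    }
}
```

The Base64 step is identical in both calls; only the bytes fed into it change, which is why the decoder has to use the same character set as the encoder.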

Is Base64 an encryption algorithm?

Base64 is not meant for encryption. Its main purpose is to convert binary data into ordinary characters for network transmission: some byte values act as control characters in transmission protocols and cannot be sent directly, and some systems only handle ASCII characters. Base64 is a way to turn arbitrary data, including non-ASCII data, into ASCII characters.

Although you will often see talk of “Base64 encryption and decryption”, Base64 is not an encryption algorithm in the security sense. It is just an encoding that prepares data for transmission. The encoded text cannot be read at a glance, but the scheme is very simple and uses no key, so anyone can restore the original. If important information needs to be protected, be sure to use a real encryption algorithm, such as the ones covered in our earlier articles.
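A short Java sketch shows why Base64 offers no secrecy: decoding needs nothing but the encoded text itself. The class name `NotEncryption` and the sample secret are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class NotEncryption {
    // Anyone can call this: there is no key or password involved.
    static String decode(String base64Text) {
        byte[] bytes = Base64.getDecoder().decode(base64Text);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String secret = "password123"; // sample value for illustration only
        String encoded = Base64.getEncoder()
            .encodeToString(secret.getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);         // looks scrambled to the eye
        System.out.println(decode(encoded)); // password123
    }
}
```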

Base64 encoding implementation

There are multiple libraries in the Java language that implement Base64 encoding, and the end result is the same regardless of which library.

The Base64 encoding implementation provided by the JDK:

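import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Encode the string's UTF-8 bytes as Base64 text
public static String encode(String data) {
    return Base64.getEncoder().encodeToString(data.getBytes(StandardCharsets.UTF_8));
}

// Decode Base64 text and read the bytes back as UTF-8
public static String decode(String base64Data) {
    return new String(Base64.getDecoder().decode(base64Data), StandardCharsets.UTF_8);
}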

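The Base64 encoding implementation provided by Bouncy Castle:

import java.nio.charset.StandardCharsets;
import org.bouncycastle.util.encoders.Base64;

// Base64.encode returns the encoded text as bytes, so wrap it in a String
public static String encode(String data) {
    return new String(Base64.encode(data.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8);
}

// Decode Base64 text and read the bytes back as UTF-8
public static String decode(String base64Data) {
    return new String(Base64.decode(base64Data), StandardCharsets.UTF_8);
}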

Commons Codec provides a Base64 encoding implementation:

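import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base64;

// Encode the string's UTF-8 bytes as Base64 text
public static String encode(String data) {
    return Base64.encodeBase64String(data.getBytes(StandardCharsets.UTF_8));
}

// Decode Base64 text and read the bytes back as UTF-8
public static String decode(String base64Data) {
    return new String(Base64.decodeBase64(base64Data), StandardCharsets.UTF_8);
}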

Let’s use Java to verify that the worked examples above are correct:

public static void main(String[] args) {
    String ftd = "FTD";
    String ft = "FT";
    String f = "F";

    // Encode each sample with one of the encode(...) helpers above
    System.out.println("FTD Base64 encoding :" + encode(ftd));
    System.out.println("FT Base64 encoding :" + encode(ft));
    System.out.println("F Base64 encoding :" + encode(f));
}

The output is:

FTD Base64 encoding :RlRE
FT Base64 encoding :RlQ=
F Base64 encoding :Rg==

As you can see, it’s exactly the same as our analysis.

For the full code visit:

Github.com/ForTheDevel…

Conclusion

We use Base64 all the time at work, but few people study how it works in depth. Misunderstanding it can be costly: some people even use it as if it were encryption, which can have serious consequences in a business system. I hope that after reading this article you have a solid understanding of Base64 encoding.

Tech people, tech soul: grinding out one technical article every day ヾ(◍°∇°◍)

About the author
  • GitHub:github.com/ForTheDevel…
  • Juejin: juejin.cn/user/120472…
  • CSDN:blog.csdn.net/ForTheDevel…
  • Zhihu: www.zhihu.com/people/fort…
  • Segmentfault:segmentfault.com/u/for_the_d…
Contact the author
  • WeChat ID: ForTheDeveloper
  • WeChat official account: ForTheDevelopers