What is Base64?
The specification is RFC 2045.
Let’s start with a quiz question found online:
When transferring data over a network, it is often necessary to convert binary data into a printable string. A common scheme uses a set of 64 printable characters and is therefore called Base64. Given a char array of length 12, its Base64 representation requires at least ____ chars; if the char array has length 20, ____ chars are required.
- What is Base64?
- Why Base64?
- What are printable characters?
- What are ASCII characters?
Let’s work through these questions in reverse order, starting with what ASCII is.
ASCII
The following is excerpted from [Baidu Encyclopedia] (baike.baidu.com/item/ASCII/…
ASCII (American Standard Code for Information Interchange) is a computer coding system based on the Latin alphabet used to display modern English and other Western European languages. It is the most common information exchange standard and is equivalent to the international standard ISO/IEC 646.
It is a standard that specifies the binary values of characters commonly used in computer systems. The ASCII code table defines 128 characters in total, each with a 7-bit code. The **ASCII code table** is described below.
- **Codes 0 to 31 and 127 (33 in total) are control or communication characters (the rest are displayable characters)**, such as the control characters LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell), and the communication characters SOH (start of heading), EOT (end of transmission), and ACK (acknowledge). ASCII values 8, 9, 10, and 13 correspond to backspace, tab, line feed, and carriage return, respectively. They have no graphical representation of their own, but they affect how text is displayed, depending on the application [1].
- Codes 32 to 126 (95 in total) are printable characters (32 is the space). Codes 48 to 57 are the ten Arabic numerals 0 to 9.
- Codes 65 to 90 are the 26 uppercase letters, codes 97 to 122 are the 26 lowercase letters, and the rest are punctuation marks and operator symbols.
Therefore, all 95 characters from 32 to 126 in the ASCII code table are printable characters, which text-based protocols can transmit safely. This answers the question of what a printable character is.
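The ranges described above can be checked with a few lines of Python. This is only a small sketch; the variable names `printable` and `control` are made up for illustration.

```python
# Printable ASCII: code points 32 (space) through 126 (~), inclusive.
printable = [chr(code) for code in range(32, 127)]

# Control characters: code points 0 through 31, plus 127 (DEL).
control = [code for code in range(128) if code < 32 or code == 127]

print(len(printable))      # 95
print(len(control))        # 33
print("".join(printable))  # the 95 printable characters, starting with a space
```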
Why Base64
Anyone familiar with web development knows that HTTP/1.x messages are built from simple lines of text: the start line and headers are plain text, not binary. You have probably seen complaints about why the HTTP protocol uses text instead of binary.
In addition, although HTTP/1.1 can carry binary data in a message body, how do we transmit a binary stream over HTTP/0.9 or over an ASCII-only text protocol such as SMTP or POP3? This requires an encoding that converts the binary data into text for transmission. Base64 is one such encoding.
What is Base64
Base64 is one of the most common encodings for transmitting 8-bit bytes over a network. It represents binary data using 64 printable characters. See RFC 2045 to RFC 2049 for the detailed MIME specifications.
Base64 encoding is a binary-to-text process; for example, it can be used to pass long identifiers in an HTTP environment.
See the Base64 alphabet mapping table at the end of this article. We use 64 basic characters to represent binary data. Each character has an index, and the largest index is 63. In binary, 63 is 00111111, so every index fits in 6 bits. If we regroup the ordinary 8-bit bytes of the input into 6-bit values and use each value as an index into the table, the resulting characters represent the data we want to encode.
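To make the index idea concrete, here is a small Python sketch that builds the 64-character alphabet in index order and prints a few indices as 6-bit binary. The name `ALPHABET` is an illustrative choice, not something defined by the RFC.

```python
import string

# The 64-character alphabet in index order: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(ALPHABET))  # 64

# Every index from 0 to 63 fits in exactly 6 bits.
for index in (0, 27, 63):
    print(index, format(index, "06b"), ALPHABET[index])
```

The loop prints `0 000000 A`, `27 011011 b`, and `63 111111 /`.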
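Conversion rules:
- Turn every three 8-bit bytes into four 6-bit groups. (Why three? 24 is the least common multiple of 8 and 6, so three bytes split evenly into four groups.)
- Add a line break every 76 characters (a MIME requirement from RFC 2045).
- The end of the input also needs special handling.
- If fewer than three bytes remain, pad the missing bits with 0 and write “=” in place of each 6-bit group that contains no input bits.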
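The rules above can be sketched as a short Python function. This is a minimal illustration rather than a reference implementation: `b64_encode` is a hypothetical helper name, and the sketch leaves out the 76-character line wrapping that MIME requires.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(data: bytes) -> str:
    """Hypothetical helper: encode bytes as Base64 without line wrapping."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full block
        # Pad the chunk with zero bytes so it always holds 24 bits.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indices, most significant first.
        indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[idx] for idx in indices]
        # Groups that hold only padding bits are written as "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

print(b64_encode(b"lsy"))  # bHN5
print(b64_encode(b"1"))    # MQ==
```

Working on one 24-bit integer per block keeps the bit shifting simple; a production encoder would normally come from the standard library instead.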
This may sound a bit abstract, so take my initials, lsy, as an example:
- The binary ASCII codes of lsy are 01101100 01110011 01111001
- Change the three 8-bit bytes into four 6-bit groups: 011011 000111 001101 111001
- Pad each 6-bit group with two leading zeros to make 8 bits: 00011011 00000111 00001101 00111001
- Get the Base64 indices: 27, 7, 13, 57
- Get the Base64 code: bHN5
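The same walk-through can be reproduced step by step in Python. This is a sketch for this one three-byte input only; the variable names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
text = "lsy"

# Step 1: the 8-bit ASCII code of each character, joined into one bit string.
bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))

# Step 2: cut the 24 bits into four 6-bit groups.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Step 3: read each group as a number to get the Base64 index.
indices = [int(group, 2) for group in groups]

# Step 4: look up each index in the alphabet.
encoded = "".join(ALPHABET[i] for i in indices)

print(bits)     # 011011000111001101111001
print(groups)   # ['011011', '000111', '001101', '111001']
print(indices)  # [27, 7, 13, 57]
print(encoded)  # bHN5
```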
What if the value to be encoded is the character 1?
- The binary ASCII code of 1 is 00110001
- Change three 8-bit bytes into four 6-bit groups, padding the two missing bytes with 0 bits: 001100 010000 000000 000000
- Pad each 6-bit group with two leading zeros: 00001100 00010000 00000000 00000000
- Get the Base64 indices: 12, 16
- Get the Base64 code: MQ== (the last two groups contain only padding, so each is written as =)
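The padding case can be sketched the same way. The snippet below assumes a single block of at most three bytes, which is all this example needs; the variable names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

bits = format(ord("1"), "08b")            # '00110001'

# Pad with 0 bits up to the next multiple of 6 (8 bits -> 12 bits).
padded = bits + "0" * (-len(bits) % 6)    # '001100010000'

groups = [padded[i:i + 6] for i in range(0, len(padded), 6)]
indices = [int(group, 2) for group in groups]   # [12, 16]

# A full block has four characters; each missing one becomes "=".
pad_count = 4 - len(groups)               # 2
encoded = "".join(ALPHABET[i] for i in indices) + "=" * pad_count

print(encoded)  # MQ==
```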
We can check the result by encoding the same input with any Base64 conversion tool; the output should be identical.
So let’s go back to the original question. Have you worked it out? The char type occupies 1 byte, that is, 8 bits. The largest positive integer a signed char can store is 0111 1111, that is, 127.
The answer is at the end of the article.
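One readily available tool is the `base64` module in Python's standard library. As a quick check of the two hand-worked examples:

```python
import base64

# Encode the two inputs worked out by hand above.
print(base64.b64encode(b"lsy"))  # b'bHN5'
print(base64.b64encode(b"1"))    # b'MQ=='

# Decoding reverses the process and restores the original bytes.
print(base64.b64decode("bHN5"))  # b'lsy'
print(base64.b64decode("MQ=="))  # b'1'
```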
ASCII code reference table
Bin (binary) | Oct (octal) | Dec (decimal) | Hex (hexadecimal) | Abbreviation/character | Description |
---|---|---|---|---|---|
0000 0000 | 00 | 0 | 0x00 | NUL (null) | Null character |
0000 0001 | 01 | 1 | 0x01 | SOH (start of heading) | Start of heading |
0000 0010 | 02 | 2 | 0x02 | STX (start of text) | Start of text |
0000 0011 | 03 | 3 | 0x03 | ETX (end of text) | End of text |
0000 0100 | 04 | 4 | 0x04 | EOT (end of transmission) | End of transmission |
0000 0101 | 05 | 5 | 0x05 | ENQ (enquiry) | Enquiry |
0000 0110 | 06 | 6 | 0x06 | ACK (acknowledge) | Acknowledgement |
0000 0111 | 07 | 7 | 0x07 | BEL (bell) | Bell |
0000 1000 | 010 | 8 | 0x08 | BS (backspace) | Backspace |
0000 1001 | 011 | 9 | 0x09 | HT (horizontal tab) | Horizontal tab |
0000 1010 | 012 | 10 | 0x0A | LF (NL line feed, new line) | Line feed |
0000 1011 | 013 | 11 | 0x0B | VT (vertical tab) | Vertical tab |
0000 1100 | 014 | 12 | 0x0C | FF (NP form feed, new page) | Form feed |
0000 1101 | 015 | 13 | 0x0D | CR (carriage return) | Carriage return |
0000 1110 | 016 | 14 | 0x0E | SO (shift out) | Shift out |
0000 1111 | 017 | 15 | 0x0F | SI (shift in) | Shift in |
0001 0000 | 020 | 16 | 0x10 | DLE (data link escape) | Data link escape |
0001 0001 | 021 | 17 | 0x11 | DC1 (device control 1) | Device control 1 |
0001 0010 | 022 | 18 | 0x12 | DC2 (device control 2) | Device control 2 |
0001 0011 | 023 | 19 | 0x13 | DC3 (device control 3) | Device control 3 |
0001 0100 | 024 | 20 | 0x14 | DC4 (device control 4) | Device control 4 |
0001 0101 | 025 | 21 | 0x15 | NAK (negative acknowledge) | Negative acknowledgement |
0001 0110 | 026 | 22 | 0x16 | SYN (synchronous idle) | Synchronous idle |
0001 0111 | 027 | 23 | 0x17 | ETB (end of trans. block) | End of transmission block |
0001 1000 | 030 | 24 | 0x18 | CAN (cancel) | Cancel |
0001 1001 | 031 | 25 | 0x19 | EM (end of medium) | End of medium |
0001 1010 | 032 | 26 | 0x1A | SUB (substitute) | Substitute |
0001 1011 | 033 | 27 | 0x1B | ESC (escape) | Escape |
0001 1100 | 034 | 28 | 0x1C | FS (file separator) | File separator |
0001 1101 | 035 | 29 | 0x1D | GS (group separator) | Group separator |
0001 1110 | 036 | 30 | 0x1E | RS (record separator) | Record separator |
0001 1111 | 037 | 31 | 0x1F | US (unit separator) | Unit separator |
0010 0000 | 040 | 32 | 0x20 | (space) | Space |
0010 0001 | 041 | 33 | 0x21 | ! | Exclamation mark |
0010 0010 | 042 | 34 | 0x22 | " | Double quotation mark |
0010 0011 | 043 | 35 | 0x23 | # | Number sign |
0010 0100 | 044 | 36 | 0x24 | $ | Dollar sign |
0010 0101 | 045 | 37 | 0x25 | % | Percent sign |
0010 0110 | 046 | 38 | 0x26 | & | Ampersand |
0010 0111 | 047 | 39 | 0x27 | ' | Apostrophe (single quotation mark) |
0010 1000 | 050 | 40 | 0x28 | ( | Opening parenthesis |
0010 1001 | 051 | 41 | 0x29 | ) | Closing parenthesis |
0010 1010 | 052 | 42 | 0x2A | * | Asterisk |
0010 1011 | 053 | 43 | 0x2B | + | Plus sign |
0010 1100 | 054 | 44 | 0x2C | , | Comma |
0010 1101 | 055 | 45 | 0x2D | - | Minus sign/hyphen |
0010 1110 | 056 | 46 | 0x2E | . | Period |
0010 1111 | 057 | 47 | 0x2F | / | Slash |
0011 0000 | 060 | 48 | 0x30 | 0 | Character 0 |
0011 0001 | 061 | 49 | 0x31 | 1 | Character 1 |
0011 0010 | 062 | 50 | 0x32 | 2 | Character 2 |
0011 0011 | 063 | 51 | 0x33 | 3 | Character 3 |
0011 0100 | 064 | 52 | 0x34 | 4 | Character 4 |
0011 0101 | 065 | 53 | 0x35 | 5 | Character 5 |
0011 0110 | 066 | 54 | 0x36 | 6 | Character 6 |
0011 0111 | 067 | 55 | 0x37 | 7 | Character 7 |
0011 1000 | 070 | 56 | 0x38 | 8 | Character 8 |
0011 1001 | 071 | 57 | 0x39 | 9 | Character 9 |
0011 1010 | 072 | 58 | 0x3A | : | Colon |
0011 1011 | 073 | 59 | 0x3B | ; | Semicolon |
0011 1100 | 074 | 60 | 0x3C | < | Less-than sign |
0011 1101 | 075 | 61 | 0x3D | = | Equals sign |
0011 1110 | 076 | 62 | 0x3E | > | Greater-than sign |
0011 1111 | 077 | 63 | 0x3F | ? | Question mark |
0100 0000 | 0100 | 64 | 0x40 | @ | At sign |
0100 0001 | 0101 | 65 | 0x41 | A | Uppercase letter A |
0100 0010 | 0102 | 66 | 0x42 | B | Uppercase letter B |
0100 0011 | 0103 | 67 | 0x43 | C | Uppercase letter C |
0100 0100 | 0104 | 68 | 0x44 | D | Uppercase letter D |
0100 0101 | 0105 | 69 | 0x45 | E | Uppercase letter E |
0100 0110 | 0106 | 70 | 0x46 | F | Uppercase letter F |
0100 0111 | 0107 | 71 | 0x47 | G | Uppercase letter G |
0100 1000 | 0110 | 72 | 0x48 | H | Uppercase letter H |
0100 1001 | 0111 | 73 | 0x49 | I | Uppercase letter I |
0100 1010 | 0112 | 74 | 0x4A | J | Uppercase letter J |
0100 1011 | 0113 | 75 | 0x4B | K | Uppercase letter K |
0100 1100 | 0114 | 76 | 0x4C | L | Uppercase letter L |
0100 1101 | 0115 | 77 | 0x4D | M | Uppercase letter M |
0100 1110 | 0116 | 78 | 0x4E | N | Uppercase letter N |
0100 1111 | 0117 | 79 | 0x4F | O | Uppercase letter O |
0101 0000 | 0120 | 80 | 0x50 | P | Uppercase letter P |
0101 0001 | 0121 | 81 | 0x51 | Q | Uppercase letter Q |
0101 0010 | 0122 | 82 | 0x52 | R | Uppercase letter R |
0101 0011 | 0123 | 83 | 0x53 | S | Uppercase letter S |
0101 0100 | 0124 | 84 | 0x54 | T | Uppercase letter T |
0101 0101 | 0125 | 85 | 0x55 | U | Uppercase letter U |
0101 0110 | 0126 | 86 | 0x56 | V | Uppercase letter V |
0101 0111 | 0127 | 87 | 0x57 | W | Uppercase letter W |
0101 1000 | 0130 | 88 | 0x58 | X | Uppercase letter X |
0101 1001 | 0131 | 89 | 0x59 | Y | Uppercase letter Y |
0101 1010 | 0132 | 90 | 0x5A | Z | Uppercase letter Z |
0101 1011 | 0133 | 91 | 0x5B | [ | Opening bracket |
0101 1100 | 0134 | 92 | 0x5C | \ | Backslash |
0101 1101 | 0135 | 93 | 0x5D | ] | Closing bracket |
0101 1110 | 0136 | 94 | 0x5E | ^ | Caret |
0101 1111 | 0137 | 95 | 0x5F | _ | Underscore |
0110 0000 | 0140 | 96 | 0x60 | ` | Backtick (grave accent) |
0110 0001 | 0141 | 97 | 0x61 | a | Lowercase letter a |
0110 0010 | 0142 | 98 | 0x62 | b | Lowercase letter b |
0110 0011 | 0143 | 99 | 0x63 | c | Lowercase letter c |
0110 0100 | 0144 | 100 | 0x64 | d | Lowercase letter d |
0110 0101 | 0145 | 101 | 0x65 | e | Lowercase letter e |
0110 0110 | 0146 | 102 | 0x66 | f | Lowercase letter f |
0110 0111 | 0147 | 103 | 0x67 | g | Lowercase letter g |
0110 1000 | 0150 | 104 | 0x68 | h | Lowercase letter h |
0110 1001 | 0151 | 105 | 0x69 | i | Lowercase letter i |
0110 1010 | 0152 | 106 | 0x6A | j | Lowercase letter j |
0110 1011 | 0153 | 107 | 0x6B | k | Lowercase letter k |
0110 1100 | 0154 | 108 | 0x6C | l | Lowercase letter l |
0110 1101 | 0155 | 109 | 0x6D | m | Lowercase letter m |
0110 1110 | 0156 | 110 | 0x6E | n | Lowercase letter n |
0110 1111 | 0157 | 111 | 0x6F | o | Lowercase letter o |
0111 0000 | 0160 | 112 | 0x70 | p | Lowercase letter p |
0111 0001 | 0161 | 113 | 0x71 | q | Lowercase letter q |
0111 0010 | 0162 | 114 | 0x72 | r | Lowercase letter r |
0111 0011 | 0163 | 115 | 0x73 | s | Lowercase letter s |
0111 0100 | 0164 | 116 | 0x74 | t | Lowercase letter t |
0111 0101 | 0165 | 117 | 0x75 | u | Lowercase letter u |
0111 0110 | 0166 | 118 | 0x76 | v | Lowercase letter v |
0111 0111 | 0167 | 119 | 0x77 | w | Lowercase letter w |
0111 1000 | 0170 | 120 | 0x78 | x | Lowercase letter x |
0111 1001 | 0171 | 121 | 0x79 | y | Lowercase letter y |
0111 1010 | 0172 | 122 | 0x7A | z | Lowercase letter z |
0111 1011 | 0173 | 123 | 0x7B | { | Opening curly brace |
0111 1100 | 0174 | 124 | 0x7C | \| | Vertical bar |
0111 1101 | 0175 | 125 | 0x7D | } | Closing curly brace |
0111 1110 | 0176 | 126 | 0x7E | ~ | Tilde |
0111 1111 | 0177 | 127 | 0x7F | DEL (delete) | Delete |
The Base64 Alphabet Mapping Table
Index | Character | Index | Character | Index | Character | Index | Character |
---|---|---|---|---|---|---|---|
0 | A | 17 | R | 34 | i | 51 | z |
1 | B | 18 | S | 35 | j | 52 | 0 |
2 | C | 19 | T | 36 | k | 53 | 1 |
3 | D | 20 | U | 37 | l | 54 | 2 |
4 | E | 21 | V | 38 | m | 55 | 3 |
5 | F | 22 | W | 39 | n | 56 | 4 |
6 | G | 23 | X | 40 | o | 57 | 5 |
7 | H | 24 | Y | 41 | p | 58 | 6 |
8 | I | 25 | Z | 42 | q | 59 | 7 |
9 | J | 26 | a | 43 | r | 60 | 8 |
10 | K | 27 | b | 44 | s | 61 | 9 |
11 | L | 28 | c | 45 | t | 62 | + |
12 | M | 29 | d | 46 | u | 63 | / |
13 | N | 30 | e | 47 | v | | |
14 | O | 31 | f | 48 | w | | |
15 | P | 32 | g | 49 | x | | |
16 | Q | 33 | h | 50 | y | | |
Answer to the question
A char occupies one byte, that is, 8 bits, so the original data is 12 * 8 = 96 bits long. Because 12 is a multiple of 3, no padding is needed, and splitting into 6-bit groups gives 96 / 6 = 16 characters. If the length is 20, then 20 is not a multiple of 3, so the input is padded up to the next multiple of 3, which is 21, and the result is 21 * 8 / 6 = 28 characters (the last of which is the “=” padding character).
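The arithmetic generalizes: every block of up to 3 input bytes becomes 4 output characters, so n bytes need 4 * ceil(n / 3) characters when padding is counted and line breaks are not. A small sketch, where `base64_length` is a hypothetical helper name, compares this against Python's standard `base64` module:

```python
import base64

def base64_length(n_bytes: int) -> int:
    """Hypothetical helper: padded Base64 length for n_bytes of input."""
    # (n + 2) // 3 is ceil(n / 3) in integer arithmetic.
    return 4 * ((n_bytes + 2) // 3)

for n in (12, 20):
    # bytes(n) is n zero bytes; only the length matters here.
    actual = len(base64.b64encode(bytes(n)))
    print(n, base64_length(n), actual)
```

This prints `12 16 16` and `20 28 28`, matching the answers above.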