What is Base64?

Base64, as the name suggests, is an encoding built on a set of 64 characters: the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the symbols “+” and “/”. Any sequence of bytes can be represented using only characters from this set, and that conversion is called Base64 encoding.

How Base64 conversion works

First, the input (a string, a picture, etc.) is converted into a binary sequence, which is then split into groups of 6 bits. If the last group has fewer than 6 bits, zeros are appended at the low end. Each 6-bit group is prefixed with two high-order 0 bits to form a new byte, and the value of that byte is looked up in the Base64 index table to find the corresponding character.

For example, suppose we have the string “abc” and want to Base64 encode it. What would the result be?

The string abc corresponds to 3 bytes, 24 bits in total, which split evenly into 4 groups of 6 bits. 00 is added at the high end of each group, and each resulting value is looked up in the index table. The Base64 encoding of abc is YWJj: 3 letters have become 4, so the Base64 form is longer than the original string.
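The grouping described above can be traced step by step. The following is a minimal sketch, assuming the standard index table (0-25 for A-Z, 26-51 for a-z, 52-61 for 0-9, then “+” and “/”); the variable names are only illustrative.

```python
import string

# Standard Base64 index table: A-Z, a-z, 0-9, then "+" and "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

data = b"abc"

# 3 bytes -> one 24-bit binary string
bits = "".join(f"{byte:08b}" for byte in data)

# Split the 24 bits into 4 groups of 6 bits
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Each 6-bit group is a value from 0 to 63, i.e. an index into the table
indices = [int(group, 2) for group in groups]
encoded = "".join(ALPHABET[i] for i in indices)

print(groups)   # ['011000', '010110', '001001', '100011']
print(indices)  # [24, 22, 9, 35]
print(encoded)  # YWJj
```

Prefixing each group with 00 does not change its numeric value, which is why the sketch can convert the 6-bit groups to indices directly.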

So the question is, what if the original string is not three bytes long, but only one or two bytes long?

Take two bytes, “ab”, as an example. Following the conversion logic above, the 16 bits split into three groups, and the third group has only 4 bits. Two zeros are appended after it to complete the 6 bits (and, as for every group, two zeros are added in front), which gives the three characters YWI. To bring the output up to four characters, an “=” sign is added at the end, resulting in the Base64 encoding “YWI=”.

If the original string has only one byte, “a”, the principle is the same. The second group has only 2 bits, so it is preceded by the usual two zeros and followed by four zeros, resulting in the string YQ. The remaining two positions are padded with the equal sign “=”, so the Base64 encoding of “a” is “YQ==”.

The bottom line is that whenever the length of the original data is not divisible by 3, the missing low-order bits are filled with zeros and the output is padded with “=” up to a multiple of four characters.
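The padding rules can be put together in a small hand-written encoder. This is only a sketch for illustration: `b64encode_sketch` is a hypothetical helper, not a library function, and it is checked against Python's built-in `base64.b64encode`.

```python
import base64
import string

# Standard Base64 index table: A-Z, a-z, 0-9, then "+" and "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_sketch(data: bytes) -> str:
    """Illustrative Base64 encoder that follows the steps in the text."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last group with zeros at the low end, up to a multiple of 6 bits
    bits += "0" * (-len(bits) % 6)
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with "=" up to a multiple of 4 characters
    return "".join(chars) + "=" * (-len(chars) % 4)


for sample in (b"abc", b"ab", b"a"):
    result = b64encode_sketch(sample)
    # The sketch should agree with the standard library
    assert result == base64.b64encode(sample).decode("ascii")
    print(sample, "->", result)
```

In real code the built-in module is the better choice; the sketch builds a string of "0" and "1" characters for readability, which is slow for large inputs.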

Where is Base64 used?

1. Embedding images in HTML as Base64

Open Google’s home page and you may see that the image in some style is not a resource address but a Base64-encoded string. What good is that? It saves one HTTP request. However, not every image is suitable for Base64 embedding: the encoded string is about one third longer than the original data, so the larger the image, the longer the string and the more bandwidth the page needs.
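An embedded image of this kind is usually written as a data URI of the form data:<MIME type>;base64,<encoded bytes>. The sketch below builds one and measures the size overhead. The helper name `to_data_uri` and the placeholder bytes are assumptions made for illustration; a real page would read the bytes from an actual image file.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Hypothetical helper: wrap raw bytes in a Base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder payload standing in for real image data (768 bytes)
payload = bytes(range(256)) * 3

uri = to_data_uri(payload)
encoded_length = len(uri) - len("data:image/png;base64,")

print(uri[:40])                        # start of the data URI
print(len(payload), encoded_length)    # 768 1024
print(encoded_length / len(payload))   # about 1.33
```

Every 3 input bytes become 4 output characters, which is where the roughly one-third growth comes from.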

2. Mail transmission

In the early days of email, only ASCII characters were allowed, which made it impossible to send non-ASCII characters or binary files such as pictures. MIME was therefore introduced as an extension to email. The extension specifies how content is encoded for transfer, and Base64 is one of the available encodings. Base64 encoding makes it possible to send pictures by email.
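As a rough illustration of the MIME case, the sketch below uses Python's standard `email` package to attach a few binary bytes to a message and then inspects how the attachment is encoded. The addresses, file name, and payload are made up for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"      # hypothetical address
msg["To"] = "receiver@example.com"      # hypothetical address
msg["Subject"] = "Picture attached"
msg.set_content("See the attached picture.")

# Placeholder bytes standing in for a real picture; several are non-ASCII
fake_image = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00, 0x80])
msg.add_attachment(fake_image, maintype="image", subtype="png",
                   filename="picture.png")

attachment = next(msg.iter_attachments())
transfer_encoding = str(attachment["Content-Transfer-Encoding"])
wire_text = attachment.get_payload()               # Base64 text as sent
restored = attachment.get_payload(decode=True)     # original bytes again

print(transfer_encoding)   # base64
print(wire_text.strip())
print(restored == fake_image)
```

The binary payload travels as plain ASCII text, and the receiving side decodes it back to the original bytes.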

Of course, Base64-encoded content can also be carried in a URL.

In major programming languages, a Base64 module is built in and can be called directly, so there is no need to reinvent the wheel.

A Python example

>>> import base64

# encoding
>>> base64.b64encode(b'abc')
b'YWJj'

# decoding
>>> base64.b64decode(b'YWJj')
b'abc'

In addition to standard Base64, there is a URL-safe variant that replaces “+” and “/” with “-” and “_”. Standard Base64 is not suitable for direct use in URLs, because a URL encoder converts the “+” and “/” characters into “%XX” escapes, and these “%” sequences then need to be converted again when they are stored in a database.

>>> base64.b64encode(b'i\xcf\xbf')
b'ac+/'

# The "URL safe" Base64 encoding replaces +/ with -_
>>> base64.urlsafe_b64encode(b'i\xcf\xbf')
b'ac-_'
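To see what a URL encoder does to each variant, the following sketch passes both through `urllib.parse.quote` from the standard library. Passing safe="" is a choice made here so that “/” is escaped as well; by default `quote` leaves “/” untouched.

```python
import base64
from urllib.parse import quote

raw = b'i\xcf\xbf'

standard = base64.b64encode(raw).decode("ascii")           # 'ac+/'
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")   # 'ac-_'

# Percent-encode both strings as a URL encoder would
standard_in_url = quote(standard, safe="")
url_safe_in_url = quote(url_safe, safe="")

print(standard_in_url)   # ac%2B%2F
print(url_safe_in_url)   # ac-_

# The URL-safe form decodes back to the original bytes
print(base64.urlsafe_b64decode(url_safe) == raw)   # True
```

The URL-safe string passes through unchanged, so no “%XX” escapes are introduced.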

Is Base64 an encryption algorithm?

Base64 is not an encryption algorithm. It is simply an encoding method that converts data from one form to another for transmission or storage.
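A short check makes the point: decoding needs no key or secret, so anyone who sees the encoded text can recover the original. The sample value below is made up for the example.

```python
import base64

secret = b"password=hunter2"           # made-up sample value

encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)    # no key involved at any point

print(encoded)
print(decoded == secret)   # True
```

Data that must stay confidential needs real encryption; Base64 can then be applied on top to make the encrypted bytes safe to transmit as text.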