This article is also published on my WeChat official account. To follow it, search for HelloWorld Jie Shao on WeChat.

Preface

With the rapid development of the Internet, more and more websites are replacing HTTP with HTTPS. Well-known Internet companies abroad, including Apple, Google, and Facebook, have already moved to HTTPS. As websites rely on it less and less, plain HTTP is gradually fading away.

If someone asks why HTTP should be replaced with HTTPS, the simplest answer is, of course, to secure data in transit. For client developers, though, knowing only that much is not good enough. When we write code, the difference between HTTPS and HTTP rarely demands our attention, so many of us never look into it. We know that HTTPS encrypts network data, but not how it does so.

However, if you want to be a competent client developer, you need a good understanding of how HTTPS works. Knowing the principles helps you solve problems in daily work more effectively. It is also a popular interview topic: if you cannot explain the basics, the offer is likely to go to someone else.

Encryption algorithms

To better understand how HTTPS works, let’s first get familiar with two concepts: symmetric encryption and asymmetric encryption.

Symmetric encryption: The same key is used for encryption and decryption. The client and server agree on one key and both use it to encrypt and decrypt data. The advantage of symmetric encryption is speed. Its biggest weakness is key management and distribution: the key itself may be intercepted by a middleman while it travels over the network. Typical symmetric algorithms include DES (now considered obsolete) and AES.
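To make the "same key on both sides" idea concrete, here is a minimal sketch. It is a toy stream cipher built from SHA-256, an assumption made purely so the example runs with the standard library; it is not AES or DES and must not be used for real data. The names keystream and xor_cipher are made up for this illustration.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (SHA-256 over key + counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Both sides must hold this same key, which is exactly the distribution problem.
shared_key = secrets.token_bytes(16)

ciphertext = xor_cipher(shared_key, b"user=alice&pin=1234")
plaintext = xor_cipher(shared_key, ciphertext)  # same key, same function
```

Anyone who holds shared_key can run the same function and read the message, which is why getting the key to the other side safely matters so much.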

Asymmetric encryption: This method uses a pair of keys, a public key and a private key. In the client-server setting discussed here, the public key is given to the client and the private key stays on the server. Data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key. Asymmetric encryption has the advantage of high security, but encryption and decryption are relatively slow. Its best-known algorithm is RSA.
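The key-pair relationship can be tried out with textbook RSA on tiny numbers. This is only a sketch: the primes 61 and 53 are a classroom example, real keys use primes that are hundreds of digits long, and real RSA adds padding that is left out here. The helper name apply_key is made up for this illustration.

```python
from typing import Tuple

# Toy key pair built from two tiny primes.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # handed to clients
private_key = (d, n)         # never leaves the server


def apply_key(key: Tuple[int, int], value: int) -> int:
    """Raw RSA operation: value ** exponent mod modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65

# Encrypted with the public key, readable only with the private key.
encrypted = apply_key(public_key, message)
decrypted = apply_key(private_key, encrypted)

# Transformed with the private key, recoverable with the public key.
# This direction is the basis of digital signatures.
signed = apply_key(private_key, message)
recovered = apply_key(public_key, signed)
```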

When you’re in a job interview, chances are the interviewer will ask you, “Does HTTPS use symmetric or asymmetric encryption?” What’s the answer? Don’t worry, you’ll find out by the end of this article.

Problems with HTTP

It is often said that transferring data over HTTP is not secure, but what exactly is the problem?

HTTP has no encryption of its own, so everything is sent in plaintext. This exposes it to data leakage, data tampering, traffic hijacking, phishing attacks, and other security problems. The following figure is an example:

As you can see, a middleman can read and modify the content of an HTTP exchange while it is in transit, so HTTP is very insecure. At this point you might think: since the content is in plaintext, why not encrypt the packets with symmetric encryption so that the middleman cannot read them? The modified flow looks like this:

Now it seems the middleman can no longer see the plaintext. In fact, however, the key itself has to be sent in plaintext during the first exchange. If the middleman intercepts that first exchange, the key is leaked and he can still decrypt everything that follows, as shown in the diagram below:

Some people may say: let the server and the client agree on a key privately. The key is exchanged offline and known only to the developers at both ends, so it is almost impossible for a middleman to obtain it. But what if the key does leak? If the client cannot be hot-updated, replacing the key means spending a lot of time on redeployment, and the damage done in that window could be incalculable.

If symmetric encryption alone cannot remove the risk from our workflow, have we run into an unsolvable problem? If you’re smart, you have probably already thought of asymmetric encryption. Let’s take a look:

Here the client encrypts the data with the public key provided by the server, and the server decrypts it with its private key. Since the middleman has no private key, he cannot decrypt what he captures on the network, so the transmitted data is safe from eavesdropping. But is this really how HTTPS works? As mentioned above, the weakness of asymmetric encryption is that encryption and decryption are slow. Network communication still needs to respond quickly, and if every piece of data went through this process, performance would certainly suffer. So how can we optimize it?

The answer is to combine it with symmetric encryption and take advantage of the strengths of both. Symmetric encryption is fast but cannot guarantee that the key is delivered safely. So the optimization is this: asymmetric encryption is used only for the first exchange between client and server. The client randomly generates a symmetric key and encrypts it with the server’s public key. When the server receives the message, it decrypts it with its private key and obtains the symmetric key. Both sides now hold the symmetric key, obtained in a safe way, and all further communication uses symmetric encryption, so communication stays efficient.
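The combined flow above can be sketched in a few lines. Everything here is a simplified assumption for illustration: the server key pair is textbook RSA built from two Mersenne primes (far too small and too structured for real use), the symmetric part is a toy XOR cipher rather than AES, and the real TLS handshake involves more steps than this.

```python
import hashlib
import secrets

# Server's toy RSA key pair. (E, N) is public; D stays on the server.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# 1. Client: generate a random symmetric key and encrypt it with the
#    server's public key. Only this small value uses slow asymmetric math.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# 2. Server: recover the symmetric key with the private key.
server_key = pow(wrapped_key, D, N).to_bytes(16, "big")

# 3. Both sides now share a key and switch to fast symmetric encryption.
request = xor_cipher(session_key, b"GET /account")
received = xor_cipher(server_key, request)
```

A middleman who captures wrapped_key and request learns neither the session key nor the request, because he lacks D.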

But is that the end of the story? Is it really safe now? Not quite. You might ask: how does the client get the server’s public key in the first place? The public key is public data, but if a middleman intercepts it and replaces it with his own, the client will encrypt its data with the forged public key. The middleman can then decrypt that data with his own private key, and the security of the transmission is lost again.

This looks like a dead end: there seems to be no way to guarantee that the public key we receive is genuine. So what should we do?

This is where the CA comes in. A CA (certificate authority) is the organization responsible for issuing and managing digital certificates and for vouching for the validity of public keys in a public key system. So how does it help us solve this long-standing problem? Let’s go through it step by step.

First, the server administrator applies to a CA, submitting the server’s public key, organization information, and domain name. After checking the application, the CA uses its own private key to sign this information and generates a digital certificate. The certificate contains the applicant’s public key, organization information, domain name, and the CA’s digital signature. The CA then returns the certificate to the applicant, who only needs to install it on the server.

When the client sends a request, the server first returns its digital certificate. The client then verifies the certificate’s signature with the CA’s public key. If verification succeeds, the certificate is valid and the public key it contains really belongs to the server. You can usually view the certificate information from the browser’s address bar, as shown in the following figure:

Once the genuine public key has been obtained, we can continue with the workflow described above, combining asymmetric and symmetric encryption.
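Issuing and checking a certificate, as described above, can be sketched with a toy signature scheme. The assumptions are loud ones: the CA key is textbook RSA over two Mersenne primes, the "certificate" is a plain string with made-up fields, and the function names ca_sign and client_verify are hypothetical. Real certificates use the X.509 format and padded signature schemes.

```python
import hashlib

# Toy CA key pair. (CA_E, CA_N) is the CA public key that clients already
# trust; CA_D is the CA private key and never leaves the CA.
CA_P, CA_Q = 2**61 - 1, 2**89 - 1
CA_N = CA_P * CA_Q
CA_E = 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))


def digest(cert_info: str) -> int:
    """Hash the certificate contents down to a number below the modulus."""
    raw = hashlib.sha256(cert_info.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % CA_N


def ca_sign(cert_info: str) -> int:
    """CA side: sign the hash of the applicant's information with the CA private key."""
    return pow(digest(cert_info), CA_D, CA_N)


def client_verify(cert_info: str, signature: int) -> bool:
    """Client side: undo the signature with the CA public key and compare hashes."""
    return pow(signature, CA_E, CA_N) == digest(cert_info)


# The server administrator submits this information (made-up values).
cert_info = "domain=example.com;org=Example Inc;server_public_key=12345"
signature = ca_sign(cert_info)

genuine_ok = client_verify(cert_info, signature)

# A middleman swaps in his own public key but cannot produce a matching signature.
forged_info = cert_info.replace("12345", "99999")
forged_ok = client_verify(forged_info, signature)
```

Because only the CA holds CA_D, changing any field of the certificate makes the hash stop matching the signature, which is how the client detects a replaced public key.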

If verification fails, the digital certificate is not trustworthy: it may be a forgery or may have been tampered with by a middleman. The browser usually displays something like this:

At this point you may ask: how do we get the CA’s public key?

This one is easier to solve: operating systems ship with the public keys of all mainstream CAs built in. To verify a certificate, the client goes through the built-in CA public keys, and if any one of them verifies the certificate’s signature, the server public key inside the certificate is legitimate. (In practice the client looks up the issuer named in the certificate rather than trying every key.)
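You can peek at this built-in trust store from code. The sketch below uses Python’s standard ssl module; the function name default_trust_summary is made up for this example. One caveat: on some systems the CA certificates are loaded lazily from a directory, so the reported count and the sample list may be empty until a connection is actually made.

```python
import ssl


def default_trust_summary() -> dict:
    """Summarise the CA certificates Python picks up from the system trust store."""
    # create_default_context() loads the platform's default CA certificates
    # and turns on certificate and hostname verification.
    context = ssl.create_default_context()
    return {
        "verifies_server_cert": context.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": context.check_hostname,
        "loaded_ca_count": context.cert_store_stats()["x509_ca"],
        "sample_subjects": [ca.get("subject") for ca in context.get_ca_certs()[:3]],
    }


summary = default_trust_summary()
print(summary)
```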

The built-in certificates in Windows are as follows:

The built-in certificates in macOS are as follows:

At this point our network transport flow is secure and reliable. We have not mentioned HTTPS by name along the way, but this is in fact the HTTPS workflow. Now let’s go back to the interview question from earlier in the article. You already know the answer: HTTPS uses a combination of asymmetric and symmetric encryption.

So is HTTPS completely risk-free? Not quite. For example, if a CA does not verify its applicants strictly enough, a hacker who knows your server’s domain name and other relevant information could obtain a legitimate digital certificate for it and use that certificate to intercept the communication between browser and server. In the end, all of this rests on the trustworthiness of the CA.

Well, that’s the end of this article. It reflects my own understanding of HTTPS, and I have tried to present it in as easy-to-understand a way as possible. Criticism and corrections are welcome.