Preface

There are many articles about HTTPS, and a lot of them are impressively detailed; I really admire the skill behind them. Still, the existence of such excellent articles does not dampen my enthusiasm for writing my own, because sharing is an essential part of learning. This article is not copied; it is my own understanding, worked through by hand.

A soul-searching question

Have you ever assumed that certificates are used for encryption? 😄

Symmetric encryption

Both parties use the same key to encrypt and decrypt information. Only those who know the key can read the content.

However, the two parties must first agree on the key, and the key must not leak. Anything sent over a network can be captured, and HTTP transmits in plaintext, so the key agreement itself happens in full public view. There is no guarantee that a third party will not intercept the key. Walls have ears, right? That is why we have asymmetric encryption.
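The idea of one shared key can be sketched with the Python standard library alone. This is only a toy: the keystream construction (SHA-256 over the key and a counter, XORed with the data) and the names `keystream` and `xor_cipher` are assumptions made for illustration, not what HTTPS really uses.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(32)        # both parties must know this key
ciphertext = xor_cipher(shared_key, b"hello, B")
plaintext = xor_cipher(shared_key, ciphertext)

wrong_key = secrets.token_bytes(32)         # an eavesdropper without the key
garbage = xor_cipher(wrong_key, ciphertext)
```

Whoever holds `shared_key` gets `plaintext` back; anyone else only gets `garbage`. The whole scheme therefore stands or falls with keeping that one key secret.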

Asymmetric encryption

Each party has two keys: a public key that anyone may know, and a private key known only to its owner.

Content encrypted with a public key can only be decrypted with the corresponding private key, and content encrypted with a private key can only be decrypted with the corresponding public key. This is the "asymmetry".

Using this property, when user A sends a message to user B, A encrypts the message with B’s public key, and B decrypts it with its own private key to obtain the content.
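As a minimal sketch of this property, here is textbook RSA with tiny primes. The numbers and the helper name `apply_key` are chosen purely for illustration; keys this small are trivially breakable, and real RSA also adds padding. `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Toy RSA with tiny textbook primes; insecure, for illustration only.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse of e)

public_key_b = (e, n)          # B publishes this
private_key_b = (d, n)         # B keeps this secret


def apply_key(key, value: int) -> int:
    """Raise `value` to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                                      # must be smaller than n
ciphertext = apply_key(public_key_b, message)     # A encrypts with B's public key
recovered = apply_key(private_key_b, ciphertext)  # B decrypts with its private key

# The reverse direction also works: private key first, then public key.
signed = apply_key(private_key_b, message)
checked = apply_key(public_key_b, signed)
```

Both `recovered` and `checked` equal the original `message`, which is exactly the two-way relationship between the key pair described above.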

Therefore, the two parties first exchange public keys when they start communicating. But once A and B begin, a middleman can intercept A’s public key, replace it with his own, and send that to B. As soon as B encrypts with that public key, the middleman can decrypt the message with his own private key and read the content. The communication is no longer in full public view, but it is still exposed to the middleman, so it is still not safe.
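The key swap can be simulated with two toy RSA key pairs. Everything here (the small primes, `make_keypair`, `apply_key`, and the re-encryption step that keeps A from noticing) is a hypothetical sketch, not a real attack tool.

```python
def make_keypair(p: int, q: int, e: int = 17):
    """Build a toy RSA key pair from two small primes (insecure)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)              # (public key, private key)


def apply_key(key, value: int) -> int:
    exponent, modulus = key
    return pow(value, exponent, modulus)


a_public, a_private = make_keypair(61, 53)
m_public, m_private = make_keypair(67, 71)    # the middleman's own pair

# A sends its public key to B, but the middleman swaps it in transit.
key_sent_by_a = a_public
key_received_by_b = m_public

# B has no way to tell whose key this is, so B encrypts with it.
secret = 1234
ciphertext = apply_key(key_received_by_b, secret)

# The middleman decrypts, reads the secret, and re-encrypts it for A.
stolen = apply_key(m_private, ciphertext)
forwarded = apply_key(a_public, stolen)

# A decrypts normally and notices nothing unusual.
seen_by_a = apply_key(a_private, forwarded)
```

Nothing in the bare key `key_received_by_b` tells B who it belongs to, which is the gap the rest of the article tries to close.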

Well, you might ask: couldn’t B simply refuse to encrypt with the middleman’s public key?

In fact, B cannot tell whether the public key it received is really A’s, and A is equally unsure whether the public key it received is really B’s.

So that raises the question: how does A determine that the public key it uses for encryption really belongs to B and not to a middleman? In other words, how can the identity behind a public key be verified?

Digital certificates

You cannot prove who you are all by yourself. Identity has to be vouched for by a third party, and both sides must trust that third party unconditionally. This third-party organization is usually a Certificate Authority (CA). The process for issuing a certificate is as follows:

B sends its identity information and public key to the CA. The CA runs a hash algorithm over that information and public key to produce a digest, then encrypts the digest with the CA’s own private key to produce a digital signature. Finally, the digital signature is combined with the original information (B’s information and public key) to form a digital certificate.

A hash algorithm is used here because of one characteristic: if the original information changes even slightly, the computed value changes dramatically. This makes it possible to detect whether the content has been tampered with.
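This property is easy to observe with `hashlib`. The certificate-like byte strings below are made up for the example; SHA-256 is used as a representative hash algorithm.

```python
import hashlib

original = b"subject=B; public_key=(17, 3233)"
tampered = b"subject=B; public_key=(17, 3232)"    # one character changed

digest_original = hashlib.sha256(original).hexdigest()
digest_tampered = hashlib.sha256(tampered).hexdigest()

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(digest_original, 16) ^ int(digest_tampered, 16)).count("1")

print(digest_original)
print(digest_tampered)
print(differing_bits, "of 256 bits differ")
```

Changing a single character flips roughly half of the output bits, so the two digests look completely unrelated.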

Therefore, during communication, B first sends its certificate to A. A runs the same hash algorithm over B’s information in the certificate to produce a digest. At the same time, A decrypts the digital signature in the certificate with the CA’s public key to recover the digest that the CA originally signed, and compares the two digests to determine whether the public key has been tampered with.
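Issuing and checking a certificate can be sketched as follows. This is a toy under stated assumptions: the CA key is textbook RSA built from two small Mersenne primes, the digest is reduced modulo the toy modulus so that it fits, and `issue_certificate` and `verify_certificate` are hypothetical names. Real certificates use X.509 and padded signature schemes.

```python
import hashlib

# Toy CA key pair (textbook RSA, far too small for real use).
CA_P, CA_Q, CA_E = 2**61 - 1, 2**31 - 1, 65537
CA_N = CA_P * CA_Q
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))


def digest(info: bytes) -> int:
    """Hash the certificate body and shrink it to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(info).digest(), "big") % CA_N


def issue_certificate(info: bytes) -> dict:
    """CA side: sign the digest with the CA private key."""
    signature = pow(digest(info), CA_D, CA_N)
    return {"info": info, "signature": signature}


def verify_certificate(cert: dict) -> bool:
    """Client side: recompute the digest and compare it with the signed one."""
    signed_digest = pow(cert["signature"], CA_E, CA_N)
    return signed_digest == digest(cert["info"])


cert = issue_certificate(b"subject=B; public_key=(17, 4757)")

# A middleman swaps in his own public key but cannot produce a new signature.
forged = {"info": b"subject=B; public_key=(17, 9999)",
          "signature": cert["signature"]}

print(verify_certificate(cert))
print(verify_certificate(forged))
```

The genuine certificate verifies, while the forged one does not, because the middleman cannot sign a new digest without the CA’s private key.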

The fundamental questions

  1. Can a public key be sent to a CA without being intercepted by a middleman?
  2. The CA’s private key is used to encrypt the digital signature. The CA’s public key is required for decryption. How do we obtain the CA’s public key? If the process of obtaining the public key is intercepted again, won’t it become an endless loop?

Root certificate

The following is excerpted from Baidu Baike

A root certificate is a certificate that a CA issues to itself, and it is the starting point of the chain of trust. Installing a root certificate means trusting that CA. Technically, a certificate contains three parts: the user’s information, the user’s public key, and the CA’s signature over the information in the certificate. To verify that a certificate is authentic (that is, that the CA’s signature over its contents is valid), you need the CA’s public key, and that public key lives in the CA’s own certificate, so you have to download that certificate too. But then the authenticity of that certificate must be verified in turn, using the certificate of whoever issued it. This forms a chain of certificates. Where does the chain end? At the root certificate, a special certificate that its owner issues to itself. Downloading a root certificate indicates that you trust the certificates issued under it. Technically, a chain for verifying certificate information is established, and verification traces back to the root certificate. So users must download the root certificate before using their own digital certificates.

Generally speaking, root certificates are built into the operating system or browser and ship with it. If you trust the operating system, you trust its root certificates. This is where the certificate chain ends.
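Walking a chain back to a built-in root can be sketched like this. It is a simplified model: the `Certificate` class, `verify_chain`, the toy RSA keys built from Mersenne primes, and the `trust_store` dictionary standing in for the operating system’s store are all assumptions for illustration. Real validation also checks validity periods, revocation, and more.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by all toy keys


def make_keypair(p: int, q: int):
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)


def digest(body: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(body).digest(), "big") % n


@dataclass
class Certificate:
    subject: str
    issuer: str
    public_key: tuple
    signature: int = 0

    def body(self) -> bytes:
        return f"{self.subject}|{self.issuer}|{self.public_key}".encode()


def sign(cert: Certificate, issuer_private) -> Certificate:
    d, n = issuer_private
    cert.signature = pow(digest(cert.body(), n), d, n)
    return cert


def signed_by(cert: Certificate, issuer_public) -> bool:
    e, n = issuer_public
    return pow(cert.signature, e, n) == digest(cert.body(), n)


def verify_chain(chain, trust_store) -> bool:
    """chain[0] is the server certificate; each next entry is its issuer."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        if cert.issuer != issuer_cert.subject:
            return False
        if not signed_by(cert, issuer_cert.public_key):
            return False
    root = chain[-1]
    # The chain must end at a self-signed root that is already trusted.
    trusted = trust_store.get(root.subject) == root.public_key
    return trusted and signed_by(root, root.public_key)


root_pub, root_priv = make_keypair(2**89 - 1, 2**107 - 1)
mid_pub, mid_priv = make_keypair(2**61 - 1, 2**127 - 1)
leaf_pub, _ = make_keypair(2**31 - 1, 2**19 - 1)

root = sign(Certificate("Toy Root CA", "Toy Root CA", root_pub), root_priv)
mid = sign(Certificate("Toy Intermediate CA", "Toy Root CA", mid_pub), root_priv)
leaf = sign(Certificate("example.test", "Toy Intermediate CA", leaf_pub), mid_priv)

trust_store = {"Toy Root CA": root_pub}   # stands in for the OS/browser store
print(verify_chain([leaf, mid, root], trust_store))
print(verify_chain([leaf, mid, root], {}))
```

The same chain is accepted when the root is in the trust store and rejected when it is not, which shows that the final decision rests on what the operating system or browser shipped with.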

HTTPS

Finally, HTTPS. We covered encryption algorithms and certificates above. Which of them does HTTPS choose?

For safety, we would like to use an asymmetric encryption algorithm. In practice, however, it is not efficient enough: it is computationally expensive, and as you can imagine, two-way communication involves two sets of encryption and decryption. Symmetric encryption algorithms, on the other hand, are fairly efficient. So this is the idea:

The information is transmitted using symmetric encryption algorithms, but the key is transmitted using asymmetric encryption algorithms. So keep in mind that in HTTPS, asymmetric encryption algorithms are only used to pass the keys of symmetric encryption algorithms.
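The combination can be sketched as follows, again with toy stand-ins: textbook RSA over two Mersenne primes for the asymmetric part and a SHA-256 based XOR stream for the symmetric part. The names `xor_cipher`, `session_key`, and `wrapped_key` are assumptions for this example; real HTTPS uses vetted algorithms.

```python
import hashlib
import secrets

# Server's toy RSA key pair (far too small for real use).
P, Q, E = 2**89 - 1, 2**107 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: pick a fresh session key and send it under the server's public key.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# Server: recover the session key with its private key.
server_key = pow(wrapped_key, D, N).to_bytes(16, "big")

# From here on, both sides use only the fast symmetric cipher.
request = xor_cipher(session_key, b"GET /index.html")
received = xor_cipher(server_key, request)
```

The expensive asymmetric operation happens once per session, for 16 bytes of key material; every later message only pays for the cheap symmetric cipher.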

Handshake

The HTTPS handshake is divided into four stages:

  1. The browser initiates an HTTPS request, sending a random number along with a list of supported encryption algorithms and its protocol version number.
  2. The server checks whether the protocol versions are consistent. If they are, the server generates its own random number and sends it to the client together with its certificate. Otherwise, the server rejects the request.
  3. The browser uses the root certificate to verify whether the server certificate is valid. If it is, the server public key in the certificate can be trusted. The browser then generates a third random number, encrypts it with the server’s public key, and sends it along with a hash of all the preceding handshake content. If the certificate is invalid, the browser displays a risk warning.
  4. After receiving the message, the server decrypts the third random number with its private key, performs the same hash calculation over the handshake content, and compares the result with the received hash. If they match, the three random numbers are used to generate the session key (that is, the symmetric encryption key) according to the agreed encryption algorithm, and the handshake is complete.

Notice that the first two steps of the handshake are transmitted in clear text. After the handshake, the client and server switch to encrypted communication: they speak the ordinary HTTP protocol, with the session key used to encrypt the content.

Why do we need three random numbers? The third random number is also called the pre-master key.

To quote d520:

Both the client and the server need random numbers so that the generated key is different every time. Because the SSL certificate is static, a random factor must be introduced to keep the negotiated key random. For the RSA key exchange algorithm, the pre-master key is itself a random number; combined with the random numbers in the Hello messages, the three random numbers are passed through a key derivation function to produce a symmetric key. The pre-master secret exists because the SSL protocol does not trust every host to generate truly random numbers. If a random number is not really random, the pre-master secret could be guessed, so using the pre-master secret alone as the key is not appropriate. A new random factor has to be introduced, and a key generated from the client random, the server random, and the pre-master secret together is much harder to guess. One pseudo-random number may not be random at all, but three pseudo-random numbers together come very close to random. Each added degree of freedom increases the randomness by far more than one.
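How three random numbers become one key can be sketched with the pseudo-random function that TLS 1.2 defines in RFC 5246. That the article’s handshake uses this exact function with SHA-256 is an assumption; the function names `p_sha256` and `master_secret` are chosen for this example.

```python
import hashlib
import hmac
import secrets


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 data expansion as described in RFC 5246, section 5."""
    output = b""
    a = seed                                               # A(0)
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()   # A(i)
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def master_secret(pre_master: bytes, client_random: bytes,
                  server_random: bytes) -> bytes:
    """Combine the three random values into a 48-byte master secret."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


client_random = secrets.token_bytes(32)   # step 1, sent in clear text
server_random = secrets.token_bytes(32)   # step 2, sent in clear text
pre_master = secrets.token_bytes(48)      # step 3, sent encrypted

# Each side runs the same computation on the same three inputs.
client_side = master_secret(pre_master, client_random, server_random)
server_side = master_secret(pre_master, client_random, server_random)
```

Both sides arrive at the same secret without ever sending it, and changing any one of the three inputs produces a completely different result.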

How to recover after disconnection:

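  1. Session ID: Each session has an ID, which the client presents when reconnecting after a session is interrupted. If the server still has a record of that ID, the two sides can reuse the existing session key instead of performing a full handshake again. This approach is not very friendly to distributed clusters, because the record usually exists only on the server that created it.
  2. Session ticket: The server sends the client an encrypted ticket during the previous session. The ticket contains the session information, and only the server can decrypt it. The client presents the ticket when reconnecting.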
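The difference between the two approaches can be sketched with a hypothetical `Server` class. The ticket layout (nonce, masked key, HMAC tag), the 32-byte session key, and the idea of sharing `ticket_key` across cluster nodes are all assumptions made for illustration, not the real TLS ticket format.

```python
import hashlib
import hmac
import secrets


class Server:
    def __init__(self):
        self.sessions = {}                          # session ID -> session key
        self.ticket_key = secrets.token_bytes(32)   # known only to the server

    def _mask(self, nonce: bytes) -> bytes:
        # 32-byte pad derived from the ticket key (toy encryption).
        return hmac.new(self.ticket_key, b"enc" + nonce, hashlib.sha256).digest()

    def _tag(self, data: bytes) -> bytes:
        return hmac.new(self.ticket_key, b"mac" + data, hashlib.sha256).digest()

    def new_session(self, session_key: bytes):
        """After a full handshake: remember the ID and hand out a ticket."""
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = session_key
        nonce = secrets.token_bytes(16)
        sealed = bytes(a ^ b for a, b in zip(session_key, self._mask(nonce)))
        return session_id, nonce + sealed + self._tag(nonce + sealed)

    def resume_by_id(self, session_id: str):
        # Works only on the server that stored the record.
        return self.sessions.get(session_id)

    def resume_by_ticket(self, ticket: bytes):
        # Works on any server that holds the same ticket key.
        nonce, sealed, tag = ticket[:16], ticket[16:-32], ticket[-32:]
        if not hmac.compare_digest(tag, self._tag(nonce + sealed)):
            return None
        return bytes(a ^ b for a, b in zip(sealed, self._mask(nonce)))


server_a = Server()
session_key = secrets.token_bytes(32)
session_id, ticket = server_a.new_session(session_key)

server_b = Server()                          # another node in the cluster
server_b.ticket_key = server_a.ticket_key    # shares the key, not the cache
```

Here `server_b.resume_by_id(session_id)` finds nothing, because the record lives only on `server_a`, while `server_b.resume_by_ticket(ticket)` recovers the session key, because all the state travels inside the ticket.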

References

  1. [Ruan Yifeng] Overview of SSL/TLS protocol operation mechanism
  2. A story over HTTPS