How HTTPS works

Preface

As you all know, HTTPS is more secure than HTTP. Why is it safer?

HTTP itself is transmitted in clear text without any security processing. Under HTTP, intermediaries can sniff what users are searching for, steal private information, and even tamper with web pages.
HTTPS = HTTP + TLS/SSL; that is, a TLS/SSL layer is added beneath HTTP.

Normally, HTTP communicates directly with TCP. With SSL in place, HTTP communicates with SSL first, and SSL in turn communicates with TCP. HTTPS security is built on TLS/SSL: both the server and the client encrypt their traffic through TLS/SSL, so the data in transit is encrypted.

When it comes to encryption, we might wonder: how does HTTPS encrypt, and what is its workflow? The rest of this article answers both questions, so bear with me and read on!

Encryption – Addresses the problem of content being eavesdropped on

Symmetric encryption

Also known as shared-key encryption: an encryption algorithm that uses the same key for both encryption and decryption. Common symmetric encryption algorithms include DES, AES, RC4, and IDEA.

Suppose the browser and the server share a single symmetric key A. Whatever one side sends is encrypted with key A, and the other side decrypts it with the same key A. Disadvantage: because the same key both encrypts and decrypts, how can that key be delivered safely? If the key is forwarded over the Internet, a middleman may get hold of it.
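The shared-key idea can be made concrete with a minimal Python sketch. It assumes a toy stream cipher built from SHA-256 in counter mode. This cipher is not DES, AES, RC4, or IDEA, provides no integrity protection, and is not safe for real use. The names `keystream`, `xor_cipher`, and `key_a` are illustrative only.

```python
import hashlib
import secrets

# Toy stream cipher for illustration only (NOT DES/AES/RC4/IDEA, no integrity).
# It only shows one shared key being used for both encryption and decryption.
def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand key + nonce into `length` pseudo-random bytes (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))

# Browser and server both already hold the same secret key A.
key_a = secrets.token_bytes(32)
nonce = secrets.token_bytes(12)  # must never be reused with the same key

ciphertext = xor_cipher(key_a, nonce, b"GET /search?q=privacy")  # browser encrypts
plaintext = xor_cipher(key_a, nonce, ciphertext)                 # server decrypts
print(plaintext)  # b'GET /search?q=privacy'
```

Anyone who intercepts `key_a` while it is being shared can make the same `xor_cipher` call. That is exactly the key-distribution problem described above.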

Asymmetric encryption

What is asymmetric encryption

There are two keys, a public key and a private key. The private key must be known only to its owner, while the public key can be distributed freely to anyone. Whatever is encrypted with the public key can only be decrypted with the private key, and whatever is encrypted with the private key can only be decrypted with the public key. This way the private key never has to be sent, so there is no need to worry about it being eavesdropped on and stolen by an attacker.
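The public/private key relationship can be shown with a minimal sketch of textbook RSA. It rests on loudly stated assumptions. The primes are hard-coded Mersenne primes, which are publicly known, so these keys are not actually secret. There is no padding. The helper names `make_keypair` and `rsa` are hypothetical. Real RSA uses large random secret primes plus padding such as OAEP.

```python
# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    """Return (public_key, private_key), each as an (n, exponent) pair."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    """Apply one key: whatever one half of the pair does, the other half undoes."""
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

public_key, private_key = make_keypair(2**127 - 1, 2**521 - 1)
message = int.from_bytes(b"hello server", "big")

# Encrypt with the public key; only the private key recovers the message.
c1 = rsa(public_key, message)
assert rsa(private_key, c1) == message

# The reverse direction also works: "encrypt" with the private key,
# and anyone who holds the public key can recover the message.
c2 = rsa(private_key, message)
assert rsa(public_key, c2) == message
print("both directions round-trip")
```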

The disadvantages of asymmetric encryption

The public key is public, so a hacker can use it to decrypt anything encrypted with the private key.
The public key carries no information about the server, so asymmetric encryption alone cannot prove the server’s identity. This leaves room for a man-in-the-middle attack: the public key the server sends to the client may be intercepted and tampered with by a middleman in transit.
Asymmetric encryption and decryption are relatively slow, which reduces data transmission efficiency.

Two schemes using asymmetric encryption

Let’s think about it: if we were designing HTTPS, what encryption scheme would we use? Two scenarios might come to mind:

Asymmetric encryption schemes

Given the mechanism of asymmetric encryption, we might have the following idea:

The server first transmits the public key directly to the browser in plaintext.
The browser uses this public key to encrypt data before sending it to the server.

This data seems to be secure, because only the server has the matching private key to unlock it. However, a question remains: how do you secure the path from the server to the browser?
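The one-way scheme can be sketched with toy textbook RSA. The assumptions are hard-coded Mersenne primes, no padding, and hypothetical helper names. The sketch shows why the server-to-browser direction is the weak spot. The browser owns no private key, so the server can only "encrypt" a reply with its own private key. Anyone who saw the plaintext public key can undo that.

```python
# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

def to_int(data: bytes) -> int:
    return int.from_bytes(data, "big")

def to_bytes(value: int) -> bytes:
    return value.to_bytes((value.bit_length() + 7) // 8, "big")

server_public, server_private = make_keypair(2**127 - 1, 2**521 - 1)

# 1. Server -> browser: the public key travels in plaintext,
#    so an eavesdropper on the path sees it too.
eavesdropper_key = server_public

# 2. Browser -> server: encrypted with the public key; only the private key opens it.
request = rsa(server_public, to_int(b"password=hunter2"))
received = to_bytes(rsa(server_private, request))
print(received)  # b'password=hunter2'

# 3. Server -> browser: the server can only "encrypt" with its private key,
#    and the eavesdropper reverses that with the public key it saw in step 1.
response = rsa(server_private, to_int(b"balance=42"))
leaked = to_bytes(rsa(eavesdropper_key, response))
print(leaked)  # b'balance=42'
```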

Improved asymmetric encryption scheme

We now know that one public/private key pair can secure transmission in one direction. Can two key pairs secure transmission in both directions? Look at the process below:

The web server has an asymmetric key pair: public key A and private key A. The browser has its own asymmetric key pair: public key B and private key B.
The browser sends a request to the web server, which sends public key A to the browser in plaintext.
The browser sends public key B to the server in plaintext.
Everything the browser then sends to the server is encrypted with public key A, and the server decrypts it with private key A on receipt. Only the server owns private key A, so only the server can decrypt this data, which keeps it safe.
Everything the server sends to the browser is encrypted with public key B, and the browser decrypts it with private key B on receipt. By the same reasoning, this data is safe too.
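The two-key-pair scheme can be sketched the same way. The assumptions are toy textbook RSA with hard-coded Mersenne primes, no padding, and hypothetical helper names. These primes are public, so the sketch only illustrates the message flow. The security argument relies on secret primes in a real deployment.

```python
# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

def to_int(data: bytes) -> int:
    return int.from_bytes(data, "big")

def to_bytes(value: int) -> bytes:
    return value.to_bytes((value.bit_length() + 7) // 8, "big")

server_pub_a, server_priv_a = make_keypair(2**127 - 1, 2**521 - 1)    # server's pair A
browser_pub_b, browser_priv_b = make_keypair(2**89 - 1, 2**607 - 1)   # browser's pair B

# Steps 2-3: public keys A and B are exchanged in plaintext.
# Step 4: browser -> server, encrypted with public key A, decrypted with private key A.
upstream = rsa(server_pub_a, to_int(b"POST /login"))
at_server = to_bytes(rsa(server_priv_a, upstream))

# Step 5: server -> browser, encrypted with public key B, decrypted with private key B.
downstream = rsa(browser_pub_b, to_int(b"200 OK"))
at_browser = to_bytes(rsa(browser_priv_b, downstream))
print(at_server, at_browser)  # b'POST /login' b'200 OK'
```

The cost is two slow asymmetric operations for every message, in each direction. This is the performance concern raised next.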

Why doesn’t HTTPS work this way? First of all, the scheme does work! The main reason is that asymmetric encryption is time-consuming, especially when encrypting and decrypting large amounts of data, while symmetric encryption is much faster.

Symmetric encryption + asymmetric encryption (the approach HTTPS uses)

Asymmetric encryption is time-consuming and symmetric encryption is much faster, so HTTPS takes this approach. Look at the process below:

The web server has an asymmetric key pair: public key A and private key A.
The browser sends a request to the web server.
The server sends public key A to the browser in plaintext.
The browser randomly generates a key X for symmetric encryption, encrypts it with public key A, and sends it to the server.
The server decrypts it with private key A and obtains key X.
Now both sides hold key X and nobody else knows it. All subsequent data on both sides is encrypted and decrypted with key X.
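The hybrid flow can be sketched by combining two toys. The first is textbook RSA (hard-coded Mersenne primes, no padding), used only to deliver key X. The second is a SHA-256 counter-mode stream cipher that stands in for a real symmetric cipher such as AES. All helper names are hypothetical.

```python
import hashlib
import secrets

# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

# Toy stream cipher standing in for AES; not secure, no integrity protection.
def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))

server_public_a, server_private_a = make_keypair(2**127 - 1, 2**521 - 1)

# Browser: generate a random 32-byte symmetric key X and encrypt it with public key A.
key_x = secrets.token_bytes(32)
wrapped_x = rsa(server_public_a, int.from_bytes(key_x, "big"))

# Server: decrypt with private key A; a fixed 32-byte length keeps leading zero bytes.
server_key_x = rsa(server_private_a, wrapped_x).to_bytes(32, "big")

# From now on both sides use the fast symmetric cipher with key X.
nonce = secrets.token_bytes(12)
record = xor_cipher(key_x, nonce, b"GET /account")    # browser encrypts
reply = xor_cipher(server_key_x, nonce, record)       # server decrypts
print(reply)  # b'GET /account'
```

Only one slow asymmetric operation per connection is needed. Everything after that uses the symmetric cipher.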

Disadvantage: symmetric + asymmetric encryption alone still leaves one vulnerability, the man-in-the-middle attack. So besides symmetric and asymmetric encryption, HTTPS also relies on other techniques, which are covered below.

Man-in-the-middle attack

The middleman really cannot get the browser-generated key X, because it is encrypted with public key A and only the server holds private key A to decrypt it. But the middleman can do damage without ever obtaining private key A! Look at this:

The web server has an asymmetric key pair: public key A and private key A.
The browser sends a request to the server.
The server sends public key A to the browser.
The middleman generates its own key pair: public key B and private key B. It hijacks the packet carrying public key A, saves public key A, replaces it with its own forged public key B, and passes the packet on to the browser.
What the browser actually receives is the middleman’s forged public key B, and it has no way of knowing the key was replaced. The browser randomly generates a key X for symmetric encryption, encrypts it with public key B, and sends it toward the server.
The middleman hijacks this message and decrypts it with private key B to obtain key X. It then encrypts X with public key A and sends it to the server.
The server decrypts it with private key A and obtains key X.

This way the middleman obtains key X without the browser or the server noticing anything unusual. The root cause is that the browser cannot confirm that the public key it receives really belongs to the website. Solving this problem requires digital certificates.
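The attack can be replayed step by step with toy textbook RSA. The assumptions are hard-coded Mersenne primes, no padding, and hypothetical names. Nothing in the code lets the browser tell key B apart from key A, and that is exactly the weakness described above.

```python
import secrets

# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

server_public_a, server_private_a = make_keypair(2**127 - 1, 2**521 - 1)  # real server
mitm_public_b, mitm_private_b = make_keypair(2**89 - 1, 2**607 - 1)       # middleman

# The server sends public key A; the middleman swaps in its own public key B.
key_seen_by_browser = mitm_public_b

# The browser cannot tell the difference and encrypts its key X with key B.
key_x = secrets.token_bytes(32)
to_server = rsa(key_seen_by_browser, int.from_bytes(key_x, "big"))

# The middleman decrypts X with private key B, then re-encrypts it with public key A.
stolen_x = rsa(mitm_private_b, to_server).to_bytes(32, "big")
forwarded = rsa(server_public_a, int.from_bytes(stolen_x, "big"))

# The server decrypts the same X with private key A and notices nothing unusual.
server_x = rsa(server_private_a, forwarded).to_bytes(32, "big")
print(stolen_x == key_x == server_x)  # True
```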

Digital certificates – Addresses the issue that the communicator’s identity may be disguised

Solving the problem that the communicator’s identity may be disguised boils down to confirming whether the public key can be trusted, that is, whether the server side is actually a middleman in disguise. Digital certificates provide this proof. A digital certificate is obtained by applying to a CA, so let’s look at what a CA is. Certification Authority (CA): the authority that issues digital certificates, much like a government department in real life.

The CA issues a digital certificate

The server operator submits the public key, organization information, and personal information (domain name) to the third-party CA and applies for authentication.
The CA verifies the authenticity of the applicant’s information through various online and offline means. For example, it checks whether the organization exists, whether the enterprise is legitimate, and whether the applicant owns the domain name.
If the information is approved, the CA issues a certificate to the applicant. The certificate contains the applicant’s public key, the applicant’s organization and personal information, information about the issuing CA, the validity period, the certificate’s serial number, and a signature. The signature is generated in two steps. First, a hash function computes a digest of the certificate’s plaintext information. Then the CA’s private key encrypts that digest, and the resulting ciphertext is the signature.
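The CA’s signing step can be sketched as follows. The sketch makes several assumptions. The certificate plaintext is a JSON dict, whereas real certificates are X.509 structures encoded in DER. The CA signs with textbook RSA built from hard-coded Mersenne primes and no padding, whereas real CAs use schemes such as RSA-PSS, RSA with PKCS#1 padding, or ECDSA. The field values and names such as `issue_certificate`, `Toy Root CA`, and `example.com` are hypothetical.

```python
import hashlib
import json

# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

ca_public, ca_private = make_keypair(2**61 - 1, 2**1279 - 1)          # the CA's key pair
server_public, server_private = make_keypair(2**127 - 1, 2**521 - 1)  # the applicant's pair

def encode(fields: dict) -> bytes:
    """Deterministic byte encoding of the plaintext (stand-in for X.509 DER)."""
    return json.dumps(fields, sort_keys=True).encode()

def issue_certificate(subject, public_key, serial, not_before, not_after) -> dict:
    """Hypothetical CA step: hash the plaintext, then encrypt the digest with the CA private key."""
    plaintext = {
        "issuer": "Toy Root CA",          # information about the CA
        "subject": subject,               # applicant information (domain name)
        "public_key": list(public_key),   # certificate holder's public key
        "hash_algorithm": "sha256",       # hash algorithm used for the signature
        "not_before": not_before,         # validity period
        "not_after": not_after,
        "serial": serial,                 # certificate serial number
    }
    digest = hashlib.sha256(encode(plaintext)).digest()
    signature = rsa(ca_private, int.from_bytes(digest, "big"))
    return {"plaintext": plaintext, "signature": signature}

cert = issue_certificate("example.com", server_public, 1001, "2024-01-01", "2030-12-31")

# Sanity check: the CA public key recovers exactly the digest of the plaintext.
recovered = rsa(ca_public, cert["signature"])
matches = recovered == int.from_bytes(hashlib.sha256(encode(cert["plaintext"])).digest(), "big")
print(matches)  # True
```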

Contents of a digital certificate

Information about the issuing CA
Applicant information
Digital signature of the certificate itself
Hash algorithm used to sign the certificate
Public key of the certificate holder
Validity period of the certificate
Certificate serial number
…

With a digital certificate

After obtaining a digital certificate, the server sends it to the browser, and the browser takes the public key from the certificate. Like an ID card, the certificate proves that “this public key belongs to this website”. A problem still remains: a middleman may tamper with the certificate or swap it out entirely, so the certificate itself must be verified. Consider: what role does the signature in a digital certificate play, and how can a digital certificate be protected against tampering?

Digital signature: Addresses the problem that packets may be tampered with

Solving the problem that messages may be tampered with ultimately comes down to confirming whether the digital certificate can be trusted. The signature in a digital certificate is in fact a digital signature, so I’ll discuss it separately here. The following figure shows how the signature in the certificate is generated and how the browser validates the certificate.

Generation of digital signatures

The CA has an asymmetric key pair: a private key and a public key.
The CA hashes the plaintext of the certificate to obtain a message digest.
The CA encrypts the message digest with its private key to obtain the digital signature.

The plaintext and the digital signature together make up the digital certificate, which the CA then issues to the website.

How the browser verifies the certificate

When the browser receives the certificate, it obtains the plaintext T and the digital signature S.
It decrypts S with the CA’s public key and obtains the digest S’. The browser already holds this public key because it trusts the CA.
It hashes the plaintext T with the hash algorithm specified in the certificate and obtains the digest T’.
It compares S’ with T’; if they are equal, the certificate is trusted.
Beyond these steps, the browser also checks the certificate’s domain name and validity period. The client has the certificate information (including the public keys) of trusted CAs built in. If the issuing CA is not among them, the browser cannot find the matching CA certificate and judges the certificate invalid.
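The browser’s check can be sketched in code: compare S’ with T’, then check the domain name and validity period. The sketch also replays the two attacks discussed in the Q&A below. One attack alters the certificate text; the other swaps in a genuine certificate issued for a different domain. The assumptions are the same as before. JSON stands in for X.509, textbook RSA uses hard-coded Mersenne primes and no padding, and all names, domains, and dates are hypothetical. In textbook RSA, "decrypting the signature with the public key" is literally what happens. Modern signature schemes describe this as a verify operation rather than decryption, but the comparison against a freshly computed hash is the same idea.

```python
import hashlib
import json
from datetime import date

# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

def encode(fields: dict) -> bytes:
    return json.dumps(fields, sort_keys=True).encode()

ca_public, ca_private = make_keypair(2**61 - 1, 2**1279 - 1)
server_public, _ = make_keypair(2**127 - 1, 2**521 - 1)
mitm_public, _ = make_keypair(2**89 - 1, 2**607 - 1)

def issue_certificate(subject, public_key, serial):
    plaintext = {"issuer": "Toy Root CA", "subject": subject, "public_key": list(public_key),
                 "hash_algorithm": "sha256", "not_before": "2024-01-01",
                 "not_after": "2030-12-31", "serial": serial}
    digest = hashlib.new(plaintext["hash_algorithm"], encode(plaintext)).digest()
    return {"plaintext": plaintext, "signature": rsa(ca_private, int.from_bytes(digest, "big"))}

def verify_certificate(cert, trusted_ca_public, hostname, today) -> bool:
    t = cert["plaintext"]                                  # plaintext T
    s_prime = rsa(trusted_ca_public, cert["signature"])    # S' = S decrypted with CA public key
    t_prime = int.from_bytes(hashlib.new(t["hash_algorithm"], encode(t)).digest(), "big")  # T'
    if s_prime != t_prime:
        return False                                       # altered, or not signed by this CA
    if t["subject"] != hostname:
        return False                                       # genuine certificate, wrong site
    return date.fromisoformat(t["not_before"]) <= today <= date.fromisoformat(t["not_after"])

today = date(2025, 6, 1)
cert = issue_certificate("example.com", server_public, 1001)
ok = verify_certificate(cert, ca_public, "example.com", today)

# Tampering: put the middleman's public key into the text without re-signing.
tampered = {"plaintext": dict(cert["plaintext"], public_key=list(mitm_public)),
            "signature": cert["signature"]}
tampered_ok = verify_certificate(tampered, ca_public, "example.com", today)

# Swapping: a genuine certificate the middleman obtained for its own domain.
mitm_cert = issue_certificate("evil.example", mitm_public, 1002)
swapped_ok = verify_certificate(mitm_cert, ca_public, "example.com", today)
print(ok, tampered_ok, swapped_ok)  # True False False
```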

HTTPS workflow:

The client initiates an HTTPS request:

The user enters an HTTPS URL into the browser, which connects to port 443 on the server.

The server returns the pre-configured public key certificate.
The client verifies the certificate:

The client checks, for example, whether the certificate is within its validity period, whether its purpose matches the site the client requested, and whether it appears on a CRL (certificate revocation list). It also checks whether the issuer’s certificate is valid. This check is recursive and continues up to a root certificate, one built into either the operating system or the client. If verification succeeds, the process continues; otherwise, a warning is displayed.

The client generates a symmetric key and sends it encrypted: it uses a pseudo-random number generator to create the symmetric key, encrypts that key with the public key from the certificate, and sends the result to the server.
The server obtains the symmetric key by decrypting the message with its own private key. At this point the server and the client share the same symmetric key, and all subsequent data can be encrypted with it.
The server encrypts plaintext A with the symmetric key and sends it to the client.
The client decrypts it with the symmetric key to obtain plaintext A.
The client sends another HTTPS request, encrypting plaintext B with the symmetric key. The server decrypts the ciphertext with the symmetric key to obtain plaintext B.
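The recursive certificate check in the client-verification step can be sketched as a walk up the chain. Each certificate must be signed by the next one, it must not be revoked or expired, and the top issuer must be a root in the built-in trust store. The sketch uses the same assumptions as before: JSON instead of X.509, and textbook RSA with hard-coded Mersenne primes and no padding. `verify_chain`, the CA names, the serial numbers, and the `revoked` set standing in for a CRL are all hypothetical. Real path validation (RFC 5280) checks much more, such as whether each issuer is allowed to act as a CA.

```python
import hashlib
import json
from datetime import date

# Toy textbook RSA: hard-coded Mersenne primes, no padding. Illustration only.
def make_keypair(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa(key, m: int) -> int:
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("value must be a non-negative integer smaller than n")
    return pow(m, exponent, n)

def encode(fields: dict) -> bytes:
    return json.dumps(fields, sort_keys=True).encode()

root_public, root_private = make_keypair(2**61 - 1, 2**1279 - 1)
inter_public, inter_private = make_keypair(2**107 - 1, 2**2203 - 1)
server_public, _ = make_keypair(2**127 - 1, 2**521 - 1)

def issue(subject, public_key, issuer, issuer_private, serial):
    body = {"subject": subject, "issuer": issuer, "public_key": list(public_key),
            "not_before": "2024-01-01", "not_after": "2030-12-31", "serial": serial}
    digest = int.from_bytes(hashlib.sha256(encode(body)).digest(), "big")
    return {"plaintext": body, "signature": rsa(issuer_private, digest)}

def signature_ok(cert, issuer_public) -> bool:
    digest = int.from_bytes(hashlib.sha256(encode(cert["plaintext"])).digest(), "big")
    return rsa(issuer_public, cert["signature"]) == digest

def verify_chain(chain, trust_store, revoked, hostname, today) -> bool:
    """chain[0] is the server certificate; chain[i + 1] must be the issuer of chain[i]."""
    if chain[0]["plaintext"]["subject"] != hostname:
        return False
    for i, cert in enumerate(chain):
        body = cert["plaintext"]
        if body["serial"] in revoked:
            return False                                  # listed on the CRL
        if not (date.fromisoformat(body["not_before"]) <= today
                <= date.fromisoformat(body["not_after"])):
            return False                                  # outside its validity period
        if i + 1 < len(chain):                            # issuer certificate supplied
            parent = chain[i + 1]["plaintext"]
            if parent["subject"] != body["issuer"]:
                return False
            issuer_public = tuple(parent["public_key"])
        elif body["issuer"] in trust_store:               # reached a built-in root
            issuer_public = trust_store[body["issuer"]]
        else:
            return False                                  # no trusted root at the top
        if not signature_ok(cert, issuer_public):
            return False
    return True

trust_store = {"Toy Root CA": root_public}   # built into the OS or browser
intermediate = issue("Toy Intermediate CA", inter_public, "Toy Root CA", root_private, 2001)
leaf = issue("example.com", server_public, "Toy Intermediate CA", inter_private, 3001)
today = date(2025, 6, 1)

full_ok = verify_chain([leaf, intermediate], trust_store, set(), "example.com", today)
revoked_ok = verify_chain([leaf, intermediate], trust_store, {2001}, "example.com", today)
short_ok = verify_chain([leaf], trust_store, set(), "example.com", today)
print(full_ok, revoked_ok, short_ok)  # True False False
```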

Q&A

Is it possible for the middleman to tamper with the certificate?

Suppose the middleman alters the certificate’s text. He does not have the CA’s private key, so he cannot produce a matching signature for the altered text. When the browser receives the certificate, it finds that the hash of the certificate text differs from the digest obtained by decrypting the signature. Verification fails, and the certificate is not trusted. The browser then stops sending information to the server, so nothing leaks to the middleman.

Something to ponder: if the certificate can’t be altered, what about swapping out the entire certificate?

Is it possible for the middleman to switch the certificate?

Suppose the middleman intercepts the certificate the server sends to the browser, replaces it with a forged certificate, and passes that on. The browser would then mistakenly accept the forged certificate’s public key. In practice this does not work, because a middleman cannot easily forge a certificate. First, to obtain a certificate, the middleman must also submit his own information to a CA, and that information must be genuine. Second, even if the middleman obtains a legitimate certificate and swaps it in, the browser only needs to compare the certificate’s information, such as its domain name, with the site it requested to see that the certificate is not valid for that site.

Why does a digital signature hash the data first?

The most obvious reason is performance. As mentioned above, asymmetric encryption is slow, and certificate information is usually long, so encrypting it directly would take a long time. A hash produces output of fixed length (for example, MD5 always produces a 128-bit value), which makes encryption and decryption much faster. There are also security reasons. This part goes deeper; interested readers can look at this answer: crypto.stackexchange.com/a/12780
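The fixed-length property is easy to see with Python’s standard `hashlib`. The inputs below are arbitrary stand-ins for certificate text. MD5 is no longer considered safe for signatures, and current certificates typically use SHA-256.

```python
import hashlib

short_text = b"subject=example.com"   # 19 bytes
long_text = short_text * 100_000      # 1,900,000 bytes of stand-in "certificate" text

digest_bits = {}
for name in ("md5", "sha256"):
    for data in (short_text, long_text):
        digest = hashlib.new(name, data).digest()
        digest_bits[(name, len(data))] = len(digest) * 8
        print(f"{name}: {len(data):>9} bytes in -> {len(digest) * 8} bits out")
```

However long the input, only the short digest has to go through the slow asymmetric operation.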

How can we prove that a CA’s public key is trustworthy?

You may have noticed that when I mentioned the CA’s public key above, I simply said “the browser keeps its public key.” How does it keep it, and how can we prove that this public key is trustworthy? A digital certificate proves that a website server’s public key is reliable, that is, that it “is the public key of this website/organization”. Can a CA’s public key also be proved with a digital certificate? Yes. The operating system and the browser come with some trusted root certificates pre-installed. If the CA’s root certificate is among them, they can obtain the CA’s trusted public key from it.

In fact, certificate authentication can span more than one level: A can trust B, B can trust C, and so on. This is called a chain of trust, or digital certificate chain. It is a series of digital certificates that starts from a root certificate and passes trust down layer by layer, so that the holder of the end-entity certificate inherits that trust and can prove its identity.

Also, have you ever visited a website that failed to load and prompted you to install a certificate? That is a request to install a root certificate. The browser does not recognize the authority that issued the site’s certificate, so without that authority’s root certificate you have to download and install it manually (at your own risk). Once the authority’s root certificate is installed, you have its public key and can use it to verify that certificates sent by the server are trusted.

Does HTTPS have to perform the SSL/TLS handshake and transfer a key on every request?

Only the first time. The server assigns a session ID to each browser (or other client) and sends it to the browser during the TLS handshake. After the browser generates the key and sends it to the server, the server stores the key under that session ID. On later connections, the browser presents the session ID, and the server looks up the corresponding key and uses it for encryption and decryption. So there is no need to generate and transfer a new key every time!
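The session-ID cache can be sketched as a dictionary on the server. `ToyServer` and `handshake` are hypothetical names, and the random key merely stands in for the key the client would send encrypted. Real servers also expire cached sessions, and TLS 1.3 replaces this mechanism with resumption based on pre-shared keys.

```python
import secrets

class ToyServer:
    """Hypothetical server that caches symmetric keys by session ID."""

    def __init__(self):
        self.sessions = {}         # session ID -> symmetric key
        self.full_handshakes = 0   # how many times a new key had to be exchanged

    def handshake(self, session_id=None):
        if session_id in self.sessions:
            return session_id, self.sessions[session_id]  # resumed: reuse the stored key
        self.full_handshakes += 1                         # full handshake: new key exchange
        new_id = secrets.token_hex(16)
        key = secrets.token_bytes(32)  # stands in for the key the client sends encrypted
        self.sessions[new_id] = key
        return new_id, key

server = ToyServer()
sid, key = server.handshake()        # first connection: full handshake
sid2, key2 = server.handshake(sid)   # later connection presents the session ID
print(server.full_handshakes)        # 1
```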

Conclusion

HTTPS combines symmetric and asymmetric encryption.
HTTPS uses digital certificates to solve the problem of a communicating party being impersonated (in other words, to prevent a middleman from masquerading as the server).
HTTPS uses digital signatures to prevent tampering (that is, to prevent the digital certificate from being tampered with or swapped).