Introduction: HTTPS touches on many related topics, which makes it hard to summarize. This article walks through the details, and after reading it you will understand the following:

  • How HTTPS works
  • Why HTTPS is designed this way
  • The cryptographic principles behind it

This article is a long one, so it is recommended to bookmark it and read it at your own pace. Now let’s start our learning journey of HTTPS!

Contents of this chapter

  • 1. What is HTTPS?
  • 2. What is encryption?
  • 3. What is symmetric encryption?
  • 4. What is asymmetric encryption?
  • 5. Digital certificates
  • 6. What is SSL/TLS?

Structure diagram of this chapter


1. What is HTTPS?

1.1 Introduction to HTTPS

HTTPS stands for HyperText Transfer Protocol Secure.

HTTPS is a secure transport protocol built on top of HTTP. HTTP has long been criticized for its lack of security. In fields such as finance, plain HTTP falls far short of what is needed to transmit sensitive information, so a secure transport protocol was urgently needed.

HTTPS was invented to protect the privacy and integrity of data sent over the Internet. It was first proposed by Netscape in 1994 and has since spread across the Internet.

1.2 HTTP pain points

HTTP is the protocol we use most often and most widely; we rely on it almost all the time. But it has a fatal weakness: it does not transmit data securely.

The network is a complicated place. If someone wants to steal or tamper with the data you transmit, doing so over plain HTTP is easy. For extremely sensitive operations such as logging in or transferring money, the lack of a secure transport protocol is easily exploited by criminals and can cause heavy losses.

Secure transport protocols were created to address this. Security requires encryption, and HTTPS relies on encryption algorithms to keep transmission secure.

1.3 How does HTTPS work?

HTTPS works on top of HTTP. It encrypts traffic using a protocol called Transport Layer Security (TLS), the successor to SSL.

The protocol secures data transmission by negotiating keys and authenticating identities.

This involves quite a few building blocks, such as hash algorithms, symmetric encryption algorithms, asymmetric encryption algorithms, and digital certificates.

Let’s break them down step by step.

2. What is encryption?

2.1 Encryption

Encryption is the process of converting content into unrecognizable ciphertext.

For example, take the plaintext: I am the flower of the motherland.

After encryption it might look like: HSUUI&&*768SASKD&7980%8SHOS% $hushHHD &6788

The ciphertext above cannot be read directly. The content can be seen only after decryption.

There are many encryption methods. The main ones used by HTTPS are introduced below.

2.2 Hashing

A hash algorithm is a one-way transformation, often loosely described as irreversible encryption.

Why is it called irreversible?

Because the output cannot be converted back into the original content. That raises a question:

What is the use of a transformation whose output cannot be converted back to the original?

Irreversible algorithms are mostly used for data verification. Their key property is that the output is strongly tied to the original content. What does that mean?

The value generated by the algorithm depends on every part of the original content. If even one or a few characters of the original are changed, the generated value will be very different from the original value.

This property makes hashing well suited to verifying data in transit, that is, to judging whether the data has been tampered with along the way.

Note that hashing cannot prevent tampering; it only makes tampering detectable.
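As a small illustration of this sensitivity, the sketch below hashes two strings that differ in a single character, using SHA-256 from Python's standard `hashlib` module. The sample strings and the helper name `sha256_hex` are made up for this example, and SHA-256 is only one common hash algorithm, not necessarily the one a given HTTPS connection uses.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "transfer 100 to Alice"
tampered = "transfer 900 to Alice"  # a single character has been changed

digest_original = sha256_hex(original)
digest_tampered = sha256_hex(tampered)

# Count how many hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(digest_original, digest_tampered))

print(digest_original)
print(digest_tampered)
print(f"{differing} of {len(digest_original)} hex characters differ")
```

The receiver can recompute the digest of what arrived and compare it with the digest it was given; a mismatch reveals that the data changed in transit.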

3. Symmetric encryption

3.1 What is symmetric encryption?

Symmetric encryption, in short, uses the same key for encryption and decryption.

That is to say, both parties hold the same key and use it to both encrypt and decrypt.

The figure below shows the encryption and decryption process of a symmetric encryption algorithm:


Symmetric encryption relies on several common mathematical operations, whose purpose is to make the encrypted data as complex as possible and therefore hard to crack. So what are these operations?

3.2 What mathematical operations are involved in symmetric encryption?

(1) Shift and circular shift

A shift moves the digits of a number in a fixed direction. For example, circularly shifting 12345678 left by one position gives 23456781, and circularly shifting it right by one position gives 81234567.

(2) Permutation

A permutation rearranges the digits according to a permutation table, so the data looks scrambled afterwards. For example, take the permutation table 2,4,1,5,3,6, where the digit at position i moves to the position given by the i-th table entry. Applying it to the number 123456 gives 315246.

This is only a small example. The actual permutation table has 64 entries, which is much more complicated.

(3) Expansion

Expansion makes the data longer than the original. A permutation table can be used to perform the expansion.

(4) Compression

Compression makes the data shorter than the original. Similarly, a permutation table can be used to perform the compression.

(5) XOR

XOR is a bitwise Boolean operation on binary data: the result bit is 1 when the two input bits differ and 0 when they are the same.

(6) Iteration

Iteration repeats the same operations many times. It is common in encryption algorithms and makes the data more complex and harder to crack.
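The shift, permutation, and XOR operations described above can be sketched in a few lines of Python. This is only a toy illustration on short strings and byte strings; the function names are invented for this example, and real ciphers such as DES apply these operations to bits with much larger tables.

```python
from typing import Sequence


def rotate_left(digits: str, k: int) -> str:
    """Circularly shift the string left by k positions."""
    k %= len(digits)
    return digits[k:] + digits[:k]


def rotate_right(digits: str, k: int) -> str:
    """Circularly shift the string right by k positions."""
    k %= len(digits)
    return digits[-k:] + digits[:-k] if k else digits


def permute(digits: str, table: Sequence[int]) -> str:
    """Move the digit at position i (1-based) to position table[i - 1]."""
    out = [""] * len(digits)
    for i, target in enumerate(table):
        out[target - 1] = digits[i]
    return "".join(out)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


print(rotate_left("12345678", 1))              # 23456781
print(rotate_right("12345678", 1))             # 81234567
print(permute("123456", [2, 4, 1, 5, 3, 6]))   # 315246

# XOR with the same key twice restores the original bytes,
# which is why the same key can both encrypt and decrypt.
scrambled = xor_bytes(b"hello", b"\x5a\xa5")
print(xor_bytes(scrambled, b"\x5a\xa5"))       # b'hello'
```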

3.3 Working principle of the DES algorithm

Here we analyze DES, one of the best-known symmetric encryption algorithms, and how it works.

(1) DES description

DES, the Data Encryption Standard, is a symmetric encryption algorithm developed by IBM. It is a typical block cipher and was for many years the most widely used symmetric encryption algorithm. Its 56-bit key is now considered too short, and it has largely been replaced by AES.

(2) Working principle of DES

Let’s start with a flow chart:


Too many complex operations are involved to cover them all here, so only an overview is given. Interested readers can see the article: Algorithm popular science: the mysterious DES encryption algorithm

DES is a classic symmetric block cipher whose core consists of the permutation and substitution operations described above. Because the algorithm itself is public, the security rests entirely on the key: once the key leaks, the data is easy to decrypt.

4. Asymmetric encryption

4.1 What is asymmetric encryption?

Asymmetric encryption, put simply, uses different keys for encryption and decryption; hence the name.

Let’s take the RSA asymmetric encryption algorithm as an example.

Asymmetric encryption uses a pair of keys: one is the public key and the other is the private key. Why there are two follows from how asymmetric encryption works, so please read on.

A key has the form (a, b), where a and b are integers, for example a public key (1234, 12) and a private key (1234, 34). These numbers are made up purely for illustration and do not form a real key pair.

Why stress that the example numbers are unrelated? Because a real public key and private key correspond one to one: content encrypted with a public key can only be decrypted with the matching private key.

You may be wondering how encryption and decryption can use different keys, when in symmetric encryption a single key does both.

4.2 RSA Asymmetric Encryption Algorithm

RSA encryption and decryption algorithms

Suppose the public key used for encryption is (n, e).

Encryption algorithm:

$$m^e \equiv c \pmod{n}$$

This formula is the RSA encryption algorithm. Encrypting means solving the formula for c, where m stands for the plaintext.

In words: c is the remainder of m to the power e divided by n.

Suppose the plaintext m is 12 and the public key is (3233, 17).

Plugging into the formula: $$12^{17} \equiv c \pmod{3233}$$

The result c is 1730, so 1730 is the ciphertext produced by encrypting with the public key.
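The encryption step can be checked with Python's built-in three-argument `pow`, which computes a power modulo n directly. The variable names below simply mirror the symbols in the formula.

```python
# Public key (n, e) and plaintext m from the worked example.
n, e = 3233, 17
m = 12

# c is the remainder of m**e divided by n.
c = pow(m, e, n)
print(c)  # 1730
```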

Now let’s look at the decryption algorithm.

Suppose the private key used for decryption is (n, d).

Decryption algorithm:

$$c^d \equiv m \pmod{n}$$

This formula is the RSA decryption algorithm. Decrypting means solving the formula for m, where c stands for the ciphertext.

In words: m is the remainder of c to the power d divided by n.

Here the private key is (3233, 2753), and the ciphertext c produced with the public key is 1730.

Plugging into the formula: $$1730^{2753} \equiv m \pmod{3233}$$

So the result is 12.
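The decryption step can be checked the same way, again with Python's built-in modular `pow`. The values are the ones from the worked example.

```python
# Private key (n, d) and ciphertext c from the worked example.
n, d = 3233, 2753
c = 1730

# m is the remainder of c**d divided by n.
m = pow(c, d, n)
print(m)  # 12
```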

That completes the RSA encryption and decryption algorithms. Unlike the symmetric algorithm DES, RSA involves no elaborate shifts, permutations, iterations, or inverse permutations. It is simply a matter of raising a number to a power and taking the remainder.

The two keys used here are tiny; real keys cannot be this short. A 768-bit RSA key was publicly factored in 2009, 1024-bit keys are no longer recommended, and 2048 bits is the commonly recommended minimum today.

So, can you see why RSA asymmetric encryption is time-consuming compared with symmetric encryption?

RSA encryption and decryption raise very large numbers to very large powers, which takes many multiplications. The longer the key, the more multiplications are needed and the larger the numbers involved, which slows the operation down.
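To make the cost more concrete, here is a minimal sketch of square-and-multiply, one standard way to compute a modular power. The function name `mod_pow` and the multiplication counter are additions for this illustration; production libraries use more optimized variants.

```python
def mod_pow(base: int, exponent: int, modulus: int):
    """Right-to-left square-and-multiply.

    Returns (result, number of modular multiplications performed).
    """
    result = 1
    base %= modulus
    multiplications = 0
    while exponent > 0:
        if exponent & 1:
            # Current exponent bit is 1: fold the current power into the result.
            result = (result * base) % modulus
            multiplications += 1
        # Square the base for the next exponent bit.
        base = (base * base) % modulus
        multiplications += 1
        exponent >>= 1
    return result, multiplications


cipher, enc_cost = mod_pow(12, 17, 3233)        # encrypt with e = 17
plain, dec_cost = mod_pow(cipher, 2753, 3233)   # decrypt with d = 2753
print(cipher, enc_cost)
print(plain, dec_cost)
```

The number of multiplications grows with the bit length of the exponent, and each multiplication works on numbers as long as the modulus, so a 2048-bit key costs far more than this toy example.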

4.3 Principle of the RSA asymmetric encryption algorithm

Having seen the RSA encryption and decryption algorithms, you may have a question:

How can two different keys, used with two different formulas, encrypt the plaintext and then decrypt it back to the original? Let’s continue with that question in mind.

Before we begin, let’s go over the mathematics involved in asymmetric encryption.

1. Coprime relationship

What is a coprime relationship?

If two positive integers have no common factor other than 1, they are said to be coprime.

For example, 15 and 11 are coprime, and so are 24 and 19.
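A coprimality check follows directly from the definition: two numbers are coprime exactly when their greatest common divisor is 1. The helper name `are_coprime` is chosen for this sketch; `math.gcd` is part of the Python standard library.

```python
from math import gcd


def are_coprime(a: int, b: int) -> bool:
    """Two positive integers are coprime when their only common factor is 1."""
    return gcd(a, b) == 1


print(are_coprime(15, 11))  # True
print(are_coprime(24, 19))  # True
print(are_coprime(15, 10))  # False, both are divisible by 5
```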

2. Euler function

What is the Euler function?

Consider this question: given a positive integer n, how many of the positive integers less than or equal to n are coprime with n?

The function that answers this question is called the Euler function, written $\varphi(n)$.

For example, the numbers up to 10 that are coprime with 10 are 1, 3, 7, and 9, so $\varphi(10) = 4$.
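The Euler function can be computed by brute force straight from its definition. The sketch below is only practical for small n, and the function name `phi` is chosen here for illustration.

```python
from math import gcd


def phi(n: int) -> int:
    """Count the integers in 1..n that are coprime with n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


print([k for k in range(1, 11) if gcd(k, 10) == 1])  # [1, 3, 7, 9]
print(phi(10))    # 4
print(phi(3233))  # 3120, which equals (61 - 1) * (53 - 1)
```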

3. Euler’s theorem

Euler’s theorem states that if two positive integers a and b are coprime, the following formula holds:

$$a^{\varphi(b)} \equiv 1 \pmod{b}$$

In words: the remainder of a to the power $\varphi(b)$ divided by b is 1, where $\varphi$ is the Euler function.

The proof of Euler’s theorem is somewhat involved, so we will not go into it here.
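Although the proof is skipped, the statement is easy to check numerically for small values. The sketch below uses a brute-force Euler function; the names `phi` and `euler_holds` are invented for this example.

```python
from math import gcd


def phi(n: int) -> int:
    """Brute-force Euler function: count integers in 1..n coprime with n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def euler_holds(a: int, b: int) -> bool:
    """Check whether a raised to phi(b) leaves remainder 1 when divided by b."""
    return pow(a, phi(b), b) == 1


print(euler_holds(3, 10))     # True: 3**4 = 81, and 81 % 10 == 1
print(euler_holds(12, 3233))  # True: 12 and 3233 are coprime
print(euler_holds(2, 10))     # False: 2 and 10 are not coprime
```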

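4. Modular inverse

If two positive integers a and b are coprime, then an integer k can always be found such that ak - 1 is divisible by b, or equivalently the remainder of ak divided by b is 1. This k is called the modular inverse of a.

The formula is: $$ak \equiv 1 \pmod{b}$$

Euler’s theorem guarantees that such a k exists: since $a^{\varphi(b)} = a \cdot a^{\varphi(b)-1} \equiv 1 \pmod{b}$, the value $k = a^{\varphi(b)-1}$ is a modular inverse of a.

For example, 3 and 11 are coprime, and $3 \times 4 \equiv 1 \pmod{11}$, so 4 is a modular inverse of 3.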
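In practice a modular inverse is usually computed with the extended Euclidean algorithm rather than through Euler's theorem. The following is a minimal sketch of that approach; the function names are chosen for this example.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def mod_inverse(a: int, b: int) -> int:
    """Return k in 0..b-1 with (a * k) % b == 1; requires gcd(a, b) == 1."""
    g, x, _ = extended_gcd(a, b)
    if g != 1:
        raise ValueError("a and b are not coprime, so no inverse exists")
    return x % b


print(mod_inverse(3, 11))     # 4, because 3 * 4 = 12 leaves remainder 1 mod 11
print(mod_inverse(17, 3120))  # 2753, the private exponent of the RSA example
```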

Now that the math is done, let’s look at how the public and private keys are generated.

Key generation steps:

Step 1: Randomly select two different large prime numbers p and q.

Step 2: Multiply the two primes to get n.

Step 3: Compute the Euler function of n, i.e. $\varphi(n) = (p-1)(q-1)$.

Step 4: Randomly select an integer e that is greater than 1, less than $\varphi(n)$, and coprime with $\varphi(n)$.

Step 5: Compute the modular inverse d of e with respect to $\varphi(n)$, i.e. $ed \equiv 1 \pmod{\varphi(n)}$.

In the end we have six values: p, q, n, $\varphi(n)$, e, and d.

n and e are packaged as the public key (n, e), and n and d are packaged as the private key (n, d). This notation is a simplification; in practice keys are encoded in ASN.1 format.
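The five steps can be sketched as follows. For reproducibility the primes and e are passed in rather than chosen at random; p = 61 and q = 53 are an assumption made here because they reproduce the toy key (3233, 17) / (3233, 2753) used earlier. The function name is hypothetical, and `pow(e, -1, phi_n)` needs Python 3.8 or later.

```python
from math import gcd


def generate_toy_keypair(p: int, q: int, e: int):
    """Follow steps 2 to 5 for given primes p, q and public exponent e."""
    n = p * q                      # step 2
    phi_n = (p - 1) * (q - 1)      # step 3
    # Step 4: e must lie strictly between 1 and phi(n) and be coprime with it.
    if not (1 < e < phi_n) or gcd(e, phi_n) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) == 1")
    d = pow(e, -1, phi_n)          # step 5: modular inverse of e
    return (n, e), (n, d)


public_key, private_key = generate_toy_keypair(61, 53, 17)
print(public_key)   # (3233, 17)
print(private_key)  # (3233, 2753)
```

Real implementations pick p and q at random from primes hundreds of digits long, which this sketch deliberately does not attempt.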

The public key is (n, e) and the private key is (n, d). Can the private key be deduced from the public key?

That is, given n and e, we want to solve for d.

According to the formula in step 5, $ed \equiv 1 \pmod{\varphi(n)}$, computing d requires knowing e and $\varphi(n)$.

According to the formula in step 3, $\varphi(n) = (p-1)(q-1)$, computing $\varphi(n)$ requires knowing p and q, which means factoring n.

However, factoring large integers is extremely difficult. A 768-bit RSA key was publicly factored in 2009. Keys of 1024 bits are no longer recommended, 2048 bits is the commonly recommended minimum, and 4096 bits offers an even larger safety margin.
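For a key as small as the toy example, this attack is trivial, which shows why key length matters. The sketch below factors n by trial division and then recomputes d; the function names are invented for this illustration, and trial division is hopeless against real key sizes.

```python
def factor_semiprime(n: int):
    """Find the two prime factors of n by trial division (tiny n only)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no non-trivial factor")


def recover_private_exponent(n: int, e: int) -> int:
    """Derive d from the public key alone by factoring n first."""
    p, q = factor_semiprime(n)
    phi_n = (p - 1) * (q - 1)
    return pow(e, -1, phi_n)  # modular inverse, Python 3.8+


print(factor_semiprime(3233))              # (53, 61)
print(recover_private_exponent(3233, 17))  # 2753
```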

The proof of the private-key decryption formula is not explored further here. Interested readers can refer to these two very detailed articles:

Principle of RSA Algorithm (1)
Principle of RSA Algorithm (2)

5. Digital certificates

5.1 What is a digital signature?

Having looked at asymmetric encryption, let’s look at digital signatures.

Before we go on, consider a problem. Suppose a third party, a hacker, intercepts the data that host A sends to host B, tampers with it, and then forwards it to host B. What host B receives is no longer the original data sent by host A. How do we guard against that?

The answer: digital signatures

A question might come up here: why not just use asymmetric encryption? Even if a third party intercepts the data, it still cannot decrypt it.

Yes, asymmetric encryption prevents the data from being read, but it still cannot prevent tampering. If an attacker changes a few characters of the intercepted ciphertext, what the recipient decrypts with the private key will no longer be the original content, so this scheme alone is not rigorous enough.

So what is a digital signature?

A digital signature is the equivalent of a handwritten signature. It is attached to the data during communication so that both parties can identify where the data came from and detect forgery.

So how does a digital signature work?

Suppose hosts A and B are communicating. Host A holds a private key, and host B holds the matching public key of host A.

First, host A uses a hash algorithm to generate a digest of the data and encrypts the digest with its private key; the encrypted digest is the signature. Host A attaches the signature to the data and sends both to host B.

After receiving the data, host B uses the public key to decrypt the signature and recover the digest, then runs the same hash algorithm over the received data to compute a digest of its own. If the two digests match, the data is considered to come from the genuine owner and to be unmodified, rather than from a third party impersonating the owner.
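The sign-then-verify flow can be sketched with the toy RSA key from section 4.2. This is a teaching sketch only: the key is far too small to be secure, reducing the SHA-256 digest modulo n is an assumption made so that the digest fits the toy key, and real signature schemes add padding that is omitted here. The function names are invented for this example.

```python
import hashlib

# Toy key pair from the RSA example: public (N, E), private (N, D).
N, E, D = 3233, 17, 2753


def digest_as_int(data: bytes) -> int:
    """Hash the data, then reduce the digest so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Host A: encrypt the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Host B: decrypt the signature with the public key and compare digests."""
    return pow(signature, E, N) == digest_as_int(data)


message = b"pay 100 to Bob"
signature = sign(message)

print(verify(message, signature))            # True
print(verify(message, (signature + 1) % N))  # False: the signature was altered
```

With a modulus this small, two different messages can occasionally share the same reduced digest, so the sketch demonstrates the flow rather than real tamper resistance.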

In the first case, the sender encrypts with the public key and the receiver decrypts with the private key:


In the second case, the sender encrypts with the private key and the receiver decrypts with the public key:


Looking at these, you may wonder whether there are still risks in the two cases.

In the first case, the public key is not secret, so anyone who obtains it can send encrypted messages under a false identity, and the receiver has no way of telling who really sent them.

In the second case, if the private key is compromised, someone else can use it to sign messages and communicate under a false identity.

So how can these two problems be solved?

This is where the protagonist comes in: the digital certificate.

5.2 Digital certificates

What is a digital certificate?

A digital certificate is like an ID card: it proves that you are you and not someone impersonating you. In HTTPS, the server presents its certificate so that the client can verify the server’s identity.

Its purpose is to stop third parties from pretending to be you when communicating with others; in other words, it provides identity authentication during communication.

How does a digital certificate provide identity authentication?

Digital certificates are issued by an authority called a CA, short for Certificate Authority.

A CA is similar to the public security bureau, which issues authoritative ID cards that other organizations can trust.

Certificates issued by a CA are trusted by browsers and other clients.

So what does a certificate contain?

There is a very simple way to find out: open a browser, enter the address of Baidu, and click the lock icon in the address bar to see the certificate used by the Baidu website, as shown in the picture:


As can be seen from the figure above, the main contents of the certificate are as follows:


Let’s take a look at how digital certificates are used to verify identity.

Assume that the browser and server are communicating over HTTPS:

First, the server sends its digital certificate (pictured above) to the browser. The certificate carries a signature that the CA created with its private key.

The browser ships with built-in CA certificates containing the CA public keys; these are called root certificates. After receiving the server’s certificate, the browser uses the CA public key to verify the signature on it. If the signature checks out, the certificate was issued by a trusted CA and has not been modified.

The browser then checks the rest of the certificate: that it is within its validity period and that the server address matches the one being visited (to guard against phishing sites). It also extracts the server’s public key from the certificate for use in the rest of the communication. If every check passes, communication proceeds normally. If a check fails, the browser either refuses to establish the connection or warns the user that the site is not trusted and leaves the choice of whether to continue to the user.
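Client libraries perform the same checks as the browser. As a rough sketch, Python's standard `ssl` module builds a context that loads the system's trusted root certificates and enables both chain verification and host name checking by default. The helper `fetch_verified_certificate` is a hypothetical name for this example; calling it needs network access, so it is only defined here, not executed.

```python
import socket
import ssl


def fetch_verified_certificate(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return the server certificate.

    The handshake raises ssl.SSLCertVerificationError if the certificate
    chain, validity period, or host name check fails.
    """
    context = ssl.create_default_context()  # trusted roots + verification on
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


context = ssl.create_default_context()
# The certificate chain must validate against a trusted root certificate.
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
# The certificate must also match the host name being visited.
print(context.check_hostname)                    # True
```

Calling `fetch_verified_certificate` with a host name returns a dictionary with fields such as `subject`, `issuer`, `notBefore`, and `notAfter`.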

Do you have any questions at this point? Can’t a hacker simply copy this certificate and use it to fool the browser?

Copying the certificate does not help, because the hacker does not hold the server’s private key that matches the public key in the certificate. And without the CA’s private key the hacker cannot forge a certificate containing a different public key, so this attack does not work.

Then you may ask: couldn’t the hacker just apply to the authority for a certificate?

The answer is no. Why? Because the CA verifies the applicant’s identity before issuing a certificate, so a hacker cannot obtain a certificate under someone else’s identity.

Assuming that the browser and server use asymmetric algorithms to encrypt data, the flow chart below shows how they verify identity through digital certificates.


6. SSL/TLS

6.1 What is SSL?

SSL is what makes HTTPS secure: HTTPS is equal to HTTP + SSL.

The SSL protocol sits between the application layer and the transport layer and ensures secure communication between the two sides.

As shown in the figure:


TLS is the upgraded version of SSL. Although the security protocol is still called SSL most of the time, TLS is what is mostly used in practice.

The scheme described above uses asymmetric encryption for secure transmission. In practice, asymmetric algorithms are time-consuming, so it is impractical to encrypt and decrypt all the data with them in real commercial use.

So how can this defect be fixed?

Asymmetric encryption is slow, while symmetric encryption is fast but has the problem of getting the key to the other side securely. The two can therefore be combined for secure transmission.


As shown in the figure:

1. Use symmetric encryption to encrypt the data;
2. Use asymmetric encryption to encrypt the symmetric encryption key.
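The two-step combination can be sketched with toy components. Everything here is an assumption made for illustration: the asymmetric part reuses the tiny RSA key from section 4.2, and the symmetric part is a home-made XOR keystream built from SHA-256, standing in for a real cipher such as AES. None of it is secure; it only shows how the pieces fit together.

```python
import hashlib
import secrets

# Toy RSA key pair from section 4.2: public (N, E), private (N, D).
N, E, D = 3233, 17, 2753


def keystream_xor(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256 based keystream.

    Applying it twice with the same key restores the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(f"{key}:{counter}".encode()).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def sender_encrypt(plaintext: bytes):
    """Step 1: encrypt data symmetrically. Step 2: wrap the key with RSA."""
    session_key = secrets.randbelow(N - 2) + 2   # random key in 2..N-1
    wrapped_key = pow(session_key, E, N)         # encrypted with the public key
    return wrapped_key, keystream_xor(plaintext, session_key)


def receiver_decrypt(wrapped_key: int, ciphertext: bytes) -> bytes:
    """Unwrap the symmetric key with the private key, then decrypt the data."""
    session_key = pow(wrapped_key, D, N)
    return keystream_xor(ciphertext, session_key)


wrapped, ciphertext = sender_encrypt(b"hello https")
print(receiver_decrypt(wrapped, ciphertext))  # b'hello https'
```

Only the short symmetric key goes through the slow asymmetric operation; the bulk of the data is handled by the fast symmetric cipher.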

So how is the SSL protocol implemented?

The actual SSL protocol is not as simple as the scheme above. The key is negotiated through several handshakes, at the end of which both parties have agreed on the symmetric encryption key to use.

6.2 SSL protocol

The SSL protocol is divided as follows:


SSL is divided into the SSL handshake protocol layer and the SSL record protocol layer.

The SSL handshake layer contains the SSL Handshake protocol, the SSL Change Cipher Spec protocol, and the SSL Alert protocol, which are used for exchanging SSL information, negotiating encryption algorithms, and generating keys.

The SSL record protocol layer carries application-layer protocols such as HTTP so that they can run normally over SSL. It is mainly responsible for encryption and decryption, MAC verification, and other security operations.
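The MAC verification mentioned above can be illustrated with HMAC from Python's standard library. This sketch only shows the idea of attaching and checking a message authentication code; the record layout (data followed by a 32-byte tag) and the function names are simplifications invented here, not the actual SSL record format.

```python
import hashlib
import hmac
from typing import Optional

TAG_LENGTH = 32  # size of an HMAC-SHA256 tag in bytes


def seal_record(data: bytes, mac_key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the data."""
    tag = hmac.new(mac_key, data, hashlib.sha256).digest()
    return data + tag


def open_record(record: bytes, mac_key: bytes) -> Optional[bytes]:
    """Return the data if the tag is valid, otherwise None."""
    data, tag = record[:-TAG_LENGTH], record[-TAG_LENGTH:]
    expected = hmac.new(mac_key, data, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    return data if hmac.compare_digest(tag, expected) else None


key = b"k" * 32  # placeholder key; a real key comes out of the handshake
record = seal_record(b"GET / HTTP/1.1", key)
tampered = b"PUT" + record[3:]

print(open_record(record, key))    # b'GET / HTTP/1.1'
print(open_record(tampered, key))  # None
```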

Having said that, how does SSL guarantee secure transmission?

The details follow.

6.3 How is the connection established at the SSL protocol layer?

A previous article explained that TCP connections are established by a handshake:

Android network programming TCP, UDP details

SSL likewise lets both parties establish a secure communication channel through an SSL handshake. So how does SSL allow HTTP to be transmitted securely?

Before we start, consider a few questions:

(1) How many SSL handshakes are there?
(2) How does the SSL handshake negotiate the key?
(3) Is the SSL secure channel reusable?

Let’s study the SSL handshake with these questions in mind.

Let’s take a look at the flow of the SSL handshake:

Suppose we visit the Baidu website through a client and the client sends a request to the server. At this point the two sides have not yet established secure communication.

First handshake:

The client sends a ClientHello message to the server. The message contains a random number Random1 (used later for key generation), the encryption methods the client supports (its cipher suites), the SSL version, and so on.

When the server receives the message, it returns a ServerHello message containing a random number Random2 and the encryption method the server has chosen (the cipher suite). At this point the client and the server both hold two random numbers: Random1 + Random2.

As can be seen, the main role of the first handshake is to negotiate the encryption methods supported by both parties and to generate random numbers.
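The negotiation in this first exchange can be modelled with plain dictionaries. This is a conceptual sketch, not the wire format: the message fields and function names are invented for illustration, and picking the first client-listed suite that the server supports is an assumed policy (real servers may apply their own preference order).

```python
import secrets


def client_hello(supported_suites):
    """Build a simplified ClientHello: Random1 plus the offered cipher suites."""
    return {"random": secrets.token_bytes(32),
            "cipher_suites": list(supported_suites)}


def server_hello(hello, server_suites):
    """Build a simplified ServerHello: Random2 plus the chosen cipher suite."""
    for suite in hello["cipher_suites"]:
        if suite in server_suites:
            return {"random": secrets.token_bytes(32), "cipher_suite": suite}
    raise ValueError("no cipher suite in common, handshake fails")


hello = client_hello(["TLS_RSA_WITH_AES_256_CBC_SHA",
                      "TLS_RSA_WITH_AES_128_CBC_SHA"])
reply = server_hello(hello, {"TLS_RSA_WITH_AES_128_CBC_SHA",
                             "TLS_RSA_WITH_3DES_EDE_CBC_SHA"})

print(reply["cipher_suite"])  # TLS_RSA_WITH_AES_128_CBC_SHA
```

After this exchange both sides know Random1, Random2, and the agreed cipher suite.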


Second handshake:

The second handshake is initiated by the server, which sends a Certificate message containing its digital certificate to the client. The client verifies the server’s certificate and, once verification succeeds, extracts the public key from it. How the client verifies a digital certificate was covered above and is not repeated here.

In this step the client verifies the identity of the server. If the server requires the client to prove its identity as well, the client also needs to send its own digital certificate to the server.


Third handshake:

The third handshake is initiated by the client, which sends a Client Key Exchange message to the server. The client generates a random number Random3, called the pre-master secret, encrypts it with the server’s public key, and sends the encrypted pre-master secret to the server.

The server receives the encrypted pre-master secret and decrypts it with its private key to obtain Random3. At this point both the server and the client hold Random1 + Random2 + Random3.

So what are these for?

The answer is: the symmetric key

The server and the client feed these random numbers into the same algorithm to generate the symmetric encryption key used to encrypt the data exchanged by both sides.

Because the key is derived from several random numbers and is never itself sent over the network, the risk of the symmetric key being disclosed is avoided.
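The idea that both sides derive the same key from the three random numbers can be sketched as below. The derivation shown, HMAC-SHA256 keyed with the pre-master secret, is a simplified stand-in chosen for this example and is not the actual pseudo-random function defined by SSL/TLS; the function name and label are likewise invented.

```python
import hashlib
import hmac
import secrets


def derive_session_key(random1: bytes, random2: bytes, pre_master: bytes) -> bytes:
    """Derive a 32-byte symmetric key from the three handshake values.

    Simplified stand-in: the real protocol uses its own PRF and labels.
    """
    return hmac.new(pre_master, b"session key" + random1 + random2,
                    hashlib.sha256).digest()


random1 = secrets.token_bytes(32)     # from ClientHello
random2 = secrets.token_bytes(32)     # from ServerHello
pre_master = secrets.token_bytes(48)  # Random3, sent encrypted by the client

client_key = derive_session_key(random1, random2, pre_master)
server_key = derive_session_key(random1, random2, pre_master)

print(client_key == server_key)  # True: both sides compute the same key
```

An eavesdropper sees Random1 and Random2 in the clear but not the pre-master secret, so the derived key stays private.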


Fourth handshake:

In the fourth handshake, the client notifies the server that it is switching to the negotiated cipher (Change Cipher Spec) and then sends a Finished message encrypted with the newly negotiated symmetric key. The Finished message contains a digest of the handshake messages exchanged so far.

The server does the same in return and sends its own encrypted Finished message to the client.

The main function of the fourth handshake is to check that the key negotiated by both parties works and that the handshake was not tampered with.


Please see the specific flow chart below:


Of course, the actual handshake is more complicated than this and involves a lot of technical jargon. It is not covered in more depth here because it would only cause confusion; understanding the overall process is enough.

If you want to know more, this article is very detailed:

SSL/TLS protocol details

6.4 Optimization of the SSL protocol

At this point you may be wondering: if the client and server need four handshakes to establish a secure channel every time they communicate over HTTPS, the overhead would be considerable. HTTPS is meant for commercial-grade communication, so this problem must have been considered. How is it solved?

The answer is: SSL session reuse

Session reuse means that two parties that have already established an HTTPS connection can reuse the established session when they communicate again.

So how is it reused?

There are two schemes:

First scheme: session ID

1. Once the client and server have successfully established a connection, the server returns a session ID to the client and saves the communication information associated with that session ID.

2. When the client initiates an HTTPS request again, it sends the session ID to the server.

3. After receiving the session ID, the server looks it up in its local cache and checks whether the entry has expired. If it has not, the server reuses the communication information associated with the session ID, skips the first three handshakes, and goes straight to the fourth.

4. If both parties successfully verify the data in the fourth handshake, the handshake is complete and the two parties can communicate.
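The server-side bookkeeping behind the session ID scheme can be sketched as a small cache with expiry. The class name, the five-minute lifetime, and the idea of storing just a key per session are assumptions for this illustration; real servers store more state and use their own eviction policies.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class SessionCache:
    """Server-side store mapping session IDs to negotiated key material."""

    def __init__(self, lifetime_seconds: float = 300.0) -> None:
        self._lifetime = lifetime_seconds
        self._sessions: Dict[str, Tuple[bytes, float]] = {}

    def store(self, session_key: bytes, now: Optional[float] = None) -> str:
        """Save the session after a full handshake and return its new ID."""
        now = time.time() if now is None else now
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = (session_key, now + self._lifetime)
        return session_id

    def resume(self, session_id: str, now: Optional[float] = None) -> Optional[bytes]:
        """Return the cached key if the ID is known and has not expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None                    # unknown ID: do a full handshake
        session_key, expires_at = entry
        if now >= expires_at:
            del self._sessions[session_id]  # expired: do a full handshake
            return None
        return session_key


cache = SessionCache(lifetime_seconds=300.0)
sid = cache.store(b"negotiated-key", now=1000.0)
print(cache.resume(sid, now=1100.0))  # b'negotiated-key'
print(cache.resume(sid, now=1400.0))  # None, the entry has expired
```

Every active client adds one entry to `_sessions`, which is exactly the memory cost on the server side.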

You may have spotted a problem here. If the server has to store session information for every client, it can end up consuming a great deal of resources. With few clients this is not an issue, but in reality the number of clients is huge, so the server has to take performance into account.

So how can this problem be solved? Look at the second scheme.

Second scheme: session ticket

1. Once a connection has been established between the client and the server, the server encrypts the session information into a session ticket, using a key that only the server knows, and sends the ticket to the client for safekeeping.

2. When the client needs to re-establish a connection with the server, it sends the session ticket back to the server.

3. If the server can decrypt the session ticket correctly, the ticket is valid and the server goes straight to the fourth handshake. If decryption fails, the normal handshake process is followed.

4. If the client successfully decrypts the data sent by the server, the fourth handshake succeeds.

The difference between scheme 2 and scheme 1 is that the communication information is stored on the client, protected by a key held only by the server, which greatly reduces the load on the server.

Nice to see you again!

We have covered a lot: hashing, symmetric encryption, asymmetric encryption, digital certificates, SSL, and so on. So how does HTTPS work as a whole?

See the final summary below.

Conclusion

HTTPS implementation principle

HTTPS is equal to HTTP + SSL. The main job of the SSL protocol is to help the two communicating parties establish a secure channel: four handshakes carry out identity authentication and key negotiation, after which the server and the client encrypt data with a symmetric encryption algorithm and verify its integrity with a MAC, thereby achieving secure transmission.

Please look at the flow chart of the final summary:


As you can see, the most important part is the SSL protocol, which is the cornerstone of HTTPS.

If you have a better idea, or if there is something wrong with this article, feel free to leave a comment in the comments section.
