Preface

In recent years the Internet has changed dramatically. In particular, the HTTP protocol we have long been used to is gradually being replaced by HTTPS. Driven jointly by browsers, search engines, CAs, and large Internet companies, the Internet has entered the “HTTPS encryption era”, and HTTPS is expected to replace HTTP as the mainstream transport protocol within the next few years.

After reading this article, I hope you will understand:

  • What problems HTTP communication has
  • How HTTPS addresses those problems
  • How HTTPS works


What is HTTPS

HTTPS is the secure version of HTTP. It places an SSL/TLS encryption layer beneath HTTP and encrypts the transmitted data. It is now widely used for security-sensitive communication on the World Wide Web, such as payment transactions.

HTTPS provides the following functions:

(1) It encrypts data and establishes a secure channel, protecting the data while it is in transit;

(2) It authenticates the real identity of the website server.

HTTPS is commonly used on Web login pages and shopping checkout screens. For HTTPS communication, the URL begins with https:// instead of http://. In addition, when a browser visits a Web site over a valid HTTPS connection, a lock symbol appears in the browser’s address bar. How HTTPS is indicated varies from browser to browser.

Why HTTPS

The HTTP protocol is exposed to security problems such as information theft and identity spoofing, and HTTPS can effectively prevent them. First, let’s look at the problems with HTTP:

  • Communication uses plaintext (no encryption), so the content can be eavesdropped

HTTP itself has no encryption capability, so it cannot encrypt the communication as a whole (the content of the requests and responses exchanged over HTTP). In other words, HTTP packets are sent in plaintext.

The plaintext nature of HTTP is a major cause of data leakage, data tampering, traffic hijacking, phishing, and other security problems. Because HTTP cannot encrypt data, all communication data travels across the network in plaintext, and anyone with a network sniffer and some technical skill can reconstruct the content of HTTP messages.
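
As a minimal sketch of what “plaintext” means in practice, the Python snippet below assembles the bytes of an HTTP login request by hand and then searches them the way a sniffer could. The host example.com, the path /login, and the form fields are made-up placeholders, not taken from any real site.

```python
# Build the bytes an HTTP client would put on the wire for a login form.
# Host, path, and credentials are illustrative placeholders.
body = "user=alice&password=s3cret"
request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
).encode("ascii")

# Anyone who captures these bytes on the network can read them directly:
# no key or decoding step is needed to find the password.
captured = request
leaked = b"password=s3cret" in captured
print(leaked)
```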

  • The integrity of a message cannot be proven, so it may have been tampered with

Integrity refers to the accuracy of information. If integrity cannot be proven, there is usually no way to determine whether the information is accurate. HTTP cannot prove the integrity of the packets it carries, so if the content of a request or response is tampered with between the time it is sent and the time it is received, there is no way to find out. In other words, there is no way to confirm that the request or response that was sent and the one that was received are the same.

  • The identity of the communicating party is not verified, so you may be dealing with an impostor

HTTP requests and responses do not confirm who the other party is. Because there is no step that verifies the communicating party, anyone can initiate a request. In addition, a server returns a response to any request it receives, regardless of who sent it (provided the Web server does not restrict the sender’s IP address or port number).

Because HTTP cannot verify the identity of the communicating party, anyone can set up a fake server to deceive users and carry out phishing fraud without the users noticing.

In contrast, HTTPS has the following advantages over HTTP (explained in detail later):

  • Data privacy: the content is symmetrically encrypted, and each connection generates its own unique encryption key
  • Data integrity: the transmitted content is integrity-checked
  • Identity authentication: a third party cannot forge the identity of the server (or client)

How does HTTPS solve the above problems of HTTP?

HTTPS is not a new application-layer protocol. Only the communication interface of HTTP is replaced, by the SSL and TLS protocols.

Normally, HTTP communicates directly with TCP. When SSL is used, HTTP communicates with SSL first, and SSL in turn communicates with TCP. In short, HTTPS is HTTP wrapped in the shell of the SSL protocol.

Once SSL is in place, HTTP gains the encryption, certificate, and integrity-protection features of HTTPS. In other words, HTTP plus encryption, authentication, and integrity protection is HTTPS.
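
The layering can be sketched with Python’s standard socket and ssl modules. This is only an illustration under assumptions: the function names build_request and get_over_tls are hypothetical, and the host passed in is whatever site you choose to contact. The point is that the HTTP request bytes are ordinary HTTP; the only change is that they are written to a TLS-wrapped socket instead of a bare TCP socket.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Plain HTTP/1.1 request bytes; nothing here is HTTPS-specific."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def get_over_tls(host: str, path: str = "/", port: int = 443) -> bytes:
    """TCP first, then TLS on top of TCP, then HTTP on top of TLS."""
    context = ssl.create_default_context()   # certificate + hostname checks on
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_request(host, path))
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling get_over_tls("example.com") would require network access; build_request can be inspected offline.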

HTTPS relies mainly on TLS/SSL, and TLS/SSL in turn relies on three basic kinds of algorithm: hash functions, symmetric encryption, and asymmetric encryption. Asymmetric encryption is used for identity authentication and key negotiation, symmetric encryption uses the negotiated key to encrypt the data, and hash functions are used to verify the integrity of the information.

1. Solving the problem of content being eavesdropped: encryption

Method 1. Symmetric encryption

In this method, encryption and decryption use the same key. Ciphertext cannot be decrypted without the key; conversely, anyone who has the key can decrypt it.

With symmetric encryption, the key must also be sent to the other party. But how can it be delivered safely? If the key is sent over the Internet and the communication is being monitored, the key may fall into an attacker’s hands, and the encryption loses its purpose. The receiver must also keep the received key secure.
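
To make “the same key encrypts and decrypts” concrete, here is a toy symmetric cipher in Python. It is an assumption-laden sketch, not what TLS uses: the keystream construction (SHA-256 over the key plus a counter) and the names keystream and xor_cipher are invented for illustration and are not secure for real traffic.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(32)             # both parties must hold this
ciphertext = xor_cipher(shared_key, b"card number 1234")
recovered = xor_cipher(shared_key, ciphertext)   # same key, same function
wrong = xor_cipher(secrets.token_bytes(32), ciphertext)  # a different key
```

Whoever learns shared_key can run the same call and read the message, which is exactly the key-delivery problem described above.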

Method 2. Asymmetric encryption

Public-key (asymmetric) encryption uses a pair of different keys: one is called the private key and the other the public key. As the names imply, the private key must not be known to anyone else, whereas the public key can be distributed freely and obtained by anyone.

With public-key encryption, the sender encrypts the message with the other party’s public key. On receiving the encrypted message, the other party decrypts it with its own private key. This way, the private key used for decryption never has to be sent, so there is no risk of it being eavesdropped on and stolen by an attacker.

Asymmetric encryption supports one-to-many communication: the server only needs to maintain a single private key to communicate in encrypted form with many clients.

This approach has the following disadvantages:

  • The public key is public, so anyone who intercepts data that the server has encrypted with its private key can decrypt it with the public key. Only the client-to-server direction is confidential.
  • The public key carries no information about the server, and the asymmetric encryption algorithm by itself cannot guarantee that the server’s identity is legitimate, so there is a risk of a man-in-the-middle attack: the public key the server sends to the client may be intercepted and replaced by an intermediary in transit.
  • Asymmetric encryption and decryption are computationally expensive and take noticeably longer than symmetric encryption, which lowers data transmission efficiency.
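
The public/private key relationship can be illustrated with textbook RSA on deliberately tiny numbers. This is a sketch for intuition only: the primes 61 and 53 and the exponent 17 are arbitrary small values chosen for readability, there is no padding, and nothing this small is secure. It assumes Python 3.8 or later for the modular-inverse form of pow.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent: (n, e) is the public key
d = pow(e, -1, phi)            # private exponent: (n, d) is the private key


def encrypt(m: int) -> int:
    """Anyone holding the public key can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private exponent can decrypt."""
    return pow(c, d, n)


message = 65                   # a message encoded as a number below n
cipher = encrypt(message)
plain = decrypt(cipher)
```

The private exponent d never leaves its owner; only n and e are published.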

Method 3. Symmetric encryption + asymmetric encryption (HTTPS adopts this method)

The advantage of a symmetric key is that encryption and decryption are relatively fast. The advantage of an asymmetric key is that the transmitted content cannot be cracked: even if the data is intercepted, it cannot be read without the corresponding private key, just as a stolen safe cannot be opened without its key. So we combine the two and make full use of their respective strengths: asymmetric encryption is used for the key exchange, and symmetric encryption is used afterward, once communication is established and messages are being exchanged.

Specifically, the sender encrypts the symmetric key with the other party’s public key, and the other party decrypts it with its private key to recover the symmetric key. This ensures that the key is exchanged securely, after which communication proceeds with symmetric encryption. HTTPS therefore uses a hybrid encryption mechanism that combines symmetric and asymmetric encryption.
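
The hybrid scheme can be sketched end to end in a few lines of Python. Everything here is a simplifying assumption: the server key pair is unpadded textbook RSA built from two known Mersenne primes (far too small for real use), the symmetric cipher is a toy XOR keystream, and names such as xor_cipher and session_key are invented for the example. Real TLS libraries negotiate keys with vetted algorithms.

```python
import hashlib
import secrets

# Toy RSA key pair for the server, built from two known Mersenne primes.
# Far too small and unpadded for real use; illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                       # public key (N, E)
D = pow(E, -1, (P - 1) * (Q - 1))         # private exponent, server only


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: pick a random session key and wrap it with the server's public key.
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), E, N)

# Server: unwrap the session key with its private key.
server_key = pow(wrapped, D, N).to_bytes(16, "big")

# From here on both sides use the fast symmetric cipher.
ciphertext = xor_cipher(session_key, b"GET /cart HTTP/1.1")
plaintext = xor_cipher(server_key, ciphertext)
```

The slow asymmetric operation runs once per session on 16 bytes; all application data goes through the cheap symmetric step.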

2. Solving the problem that packets may be tampered with: digital signatures

During transmission, data passes through many intermediate nodes. Even though these nodes cannot decrypt the data, they may still tamper with it. So how is the integrity of the data verified? By checking a digital signature.

Digital signatures serve two purposes:

  • They confirm that the message was really signed and sent by the sender, because no one else can forge the sender’s signature.
  • They establish the integrity of the message, proving that the data has not been tampered with.

How a digital signature is generated:

A hash function is applied to the message to produce a message digest. The digest is then encrypted with the sender’s private key to produce the digital signature, which is sent to the recipient together with the original text. The recipient’s task is then to verify the signature.

How a digital signature is verified:

The receiver decrypts the signature with the sender’s public key, which is the only key that can do so, and recovers the digest. The receiver then applies the same hash function to the received text to compute a second digest and compares the two. If they are the same, the received information is complete and was not modified in transit; otherwise, it has been modified. This is how a digital signature verifies the integrity of information.
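
The generate-then-verify procedure can be sketched as follows. The key pair is again a toy: unpadded textbook RSA over two known Mersenne primes, chosen only so the example runs with the standard library. The function names digest, sign, and verify and the sample messages are illustrative assumptions; production signatures use padded schemes and much larger keys.

```python
import hashlib

# Toy RSA key pair for the sender (two Mersenne primes; illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                      # public key, known to the receiver
D = pow(E, -1, (P - 1) * (Q - 1))        # private key, kept by the sender


def digest(message: bytes) -> int:
    """Hash the message and reduce it below N so RSA can process it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the digest with the private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare."""
    return pow(signature, E, N) == digest(message)


original = b"transfer 100 to James"
signature = sign(original)
ok = verify(original, signature)
tampered_ok = verify(b"transfer 900 to James", signature)
```

Changing even one character of the message changes its digest, so the comparison in verify fails for the tampered text.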

Suppose the message is exchanged between Kobe and James. James sends the message together with its digital signature to Kobe. Kobe receives it and checks the digital signature to confirm that the message really came from James. This of course assumes that Kobe knows James’s public key. The catch is that, like the message itself, the public key cannot simply be sent to Kobe over an insecure network: how would Kobe know that the key really belongs to James?

This is where a Certificate Authority (CA) comes in. There are not many CAs, and Kobe’s client has the certificates of all trusted CAs built in. The CA digitally signs James’s public key (together with other information) to produce a certificate.

3. Solving the problem that the identity of the communicating party may be forged: digital certificates

A digital certificate authority (CA) acts as a third-party organization trusted by both the client and the server.

  • The server operator submits its public key, organization information, and personal information (domain name) to a third-party CA and applies for certification.
  • The CA verifies the authenticity of the submitted information through various online and offline means, for example whether the organization exists, whether the enterprise is legally registered, and whether it owns the domain name.
  • If the information passes review, the CA issues a certificate to the applicant. The certificate contains the applicant’s public key, the applicant’s organization and personal information, information about the issuing CA, the validity period, the certificate serial number, and a signature. The signature is generated as follows: a hash function computes a digest of the public plaintext information, and the CA’s private key then encrypts that digest; the resulting ciphertext is the signature.
  • When the client sends a request to the server, the server returns the certificate file.
  • The client reads the plaintext information in the certificate and computes its digest with the same hash function. It then decrypts the signature with the public key of the corresponding CA and compares the two digests. If they match, the certificate is valid, which means the server’s public key can be trusted.
  • The client also checks the domain name and the validity period recorded in the certificate. The client has the certificate information (including the public keys) of trusted CAs built in. If the CA is not trusted, no matching CA certificate is found and the certificate is judged invalid.
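
The issue-and-check steps above can be modelled in miniature. This is a sketch under heavy assumptions: a certificate is reduced to a small dictionary rather than a real X.509 structure, the CA named “Example Root CA” and the domain shop.example are invented, and the CA key is unpadded toy RSA. It only mirrors the logic of the list: trusted issuer, signature over the plaintext fields, domain match, validity period.

```python
import hashlib
import json
from datetime import datetime, timezone

# Toy RSA key pair for a hypothetical CA (illustration only).
P, Q = 2**61 - 1, 2**89 - 1
CA_N, CA_E = P * Q, 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))

# The client's built-in trust store: CA name -> CA public key.
TRUSTED_CAS = {"Example Root CA": (CA_N, CA_E)}


def digest(fields: dict, modulus: int) -> int:
    """Hash the certificate's plaintext fields in a canonical encoding."""
    encoded = json.dumps(fields, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % modulus


def issue(fields: dict) -> dict:
    """CA side: sign the digest of the fields with the CA private key."""
    return {"fields": fields, "signature": pow(digest(fields, CA_N), CA_D, CA_N)}


def validate(cert: dict, hostname: str, now: datetime) -> bool:
    """Client side: check issuer trust, signature, domain name, validity."""
    fields = cert["fields"]
    ca_key = TRUSTED_CAS.get(fields["issuer"])
    if ca_key is None:
        return False                       # unknown CA: certificate rejected
    n, e = ca_key
    if pow(cert["signature"], e, n) != digest(fields, n):
        return False                       # fields were altered after signing
    if fields["domain"] != hostname:
        return False
    not_before = datetime.fromisoformat(fields["not_before"])
    not_after = datetime.fromisoformat(fields["not_after"])
    return not_before <= now <= not_after


cert = issue({
    "domain": "shop.example",
    "public_key": "server-public-key-placeholder",
    "issuer": "Example Root CA",
    "not_before": "2024-01-01T00:00:00+00:00",
    "not_after": "2025-01-01T00:00:00+00:00",
})
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
valid = validate(cert, "shop.example", now)
```

If an attacker swaps the public key inside the certificate, the digest no longer matches the CA’s signature and validate returns False.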

4. HTTPS workflow

1. The Client initiates an HTTPS request (for example to https://juejin.cn/user/4283353031252967). According to RFC 2818, the Client knows that it must connect to port 443 of the Server (the default).

2. The Server returns the pre-configured public key certificate to the client.

3. The Client verifies the public key certificate: for example, whether it is within its validity period, whether its intended use matches the site the Client requested, whether it appears in a certificate revocation list (CRL), and whether the certificate that issued it is valid. The last check is recursive and continues up the chain until a Root certificate is reached (one built into the operating system or into the Client). If verification passes, the process continues; if it fails, a warning is displayed.

4. The Client uses a pseudo-random number generator to generate the symmetric key used for encryption, encrypts that key with the public key from the certificate, and sends the result to the Server.

5. The Server decrypts the message using its private key to obtain the symmetric key. At this point, both the Client and Server have the same symmetric key.

6. The Server encrypts plaintext A with the symmetric key and sends it to the Client.

7. The Client uses the symmetric key to decrypt the ciphertext of the response and obtains plaintext A.

8. The Client initiates an HTTPS request again and encrypts plaintext B using the symmetric key. The Server decrypts the ciphertext using the symmetric key to obtain plaintext B.
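
From a client program’s point of view, most of this workflow is carried out by the TLS library. A minimal sketch with Python’s standard http.client and ssl modules is shown below; the helper names prepare and fetch are hypothetical, and the exact key negotiation performed inside the library may differ from the simplified steps listed above.

```python
import http.client
import ssl
from urllib.parse import urlsplit


def prepare(url: str) -> http.client.HTTPSConnection:
    """Build (but do not yet open) an HTTPS connection for the URL."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("expected an https:// URL")
    # The default context loads the system's trusted root certificates and
    # turns on certificate and hostname verification (step 3).
    context = ssl.create_default_context()
    # No port in the URL means the HTTPS default, 443 (step 1).
    return http.client.HTTPSConnection(
        parts.hostname, parts.port, context=context, timeout=10
    )


def fetch(url: str) -> bytes:
    """Perform the handshake and one encrypted request/response exchange."""
    conn = prepare(url)
    try:
        path = urlsplit(url).path or "/"
        conn.request("GET", path)          # connects and handshakes on first use
        return conn.getresponse().read()   # response is decrypted by the library
    finally:
        conn.close()
```

prepare does not touch the network, so the default port can be checked offline; fetch needs a reachable server and raises an ssl.SSLCertVerificationError if certificate verification fails.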

The difference between HTTP and HTTPS

  • HTTP is a plaintext transmission protocol. HTTPS is a network protocol built from SSL and HTTP that supports encrypted transmission and identity authentication, which makes it more secure than HTTP.

  • HTTPS is friendlier to search engines and better for SEO: Google and Baidu give priority to HTTPS pages when indexing.
  • HTTPS requires an SSL certificate; HTTP does not.
  • The standard port for HTTPS is 443; for HTTP it is 80.
  • HTTP runs directly on top of TCP, whereas HTTPS runs HTTP on top of SSL/TLS, which in turn runs on top of TCP.
  • With HTTPS the browser displays a security lock in the address bar; with HTTP it does not.

Why don’t all websites use HTTPS

If HTTPS is so secure, why don’t all Web sites use HTTPS?

First, many people still feel there is a barrier to adopting HTTPS: it requires an SSL certificate issued by an authoritative CA. Traditionally, selecting, purchasing, and deploying a certificate has been time-consuming and labor-intensive.

Second, HTTPS is generally believed to cost more in performance than HTTP, because encrypted communication consumes more CPU and memory than plaintext communication. If every exchange is encrypted, the resources consumed add up, and the number of requests a single machine can handle is bound to drop. In practice this need not be a problem: it can be addressed through performance tuning and by deploying certificates on an SLB or a CDN. For example, during “Double Eleven”, Taobao and Tmall, which run on HTTPS, still delivered smooth access, browsing, and transactions on the Web and on mobile. Many optimized pages performed as well as, or even slightly better than, their HTTP versions, so HTTPS is not slow once optimized.

The wish to save the cost of buying certificates is another reason. Certificates are essential for HTTPS communication, and they must be obtained from a certification authority (CA), which for commercial certificates means paying a fee.

Finally, there is security awareness. Compared with the domestic Internet industry, security awareness and the application of security technology abroad are relatively mature, and the move to HTTPS there is driven jointly by society, businesses, and government.

