10 minutes to understand HTTP and HTTPS protocols

I recently realized I was unsure about the differences and connections between HTTP and HTTPS, so I looked them up.

1. What is a protocol?

A network protocol is a kind of “agreement” or “rule” that computers follow when communicating over a network. With this “agreement”, devices produced by different manufacturers and running different operating systems can communicate with one another.

2. What is HTTP protocol?

HTTP is the abbreviation of HyperText Transfer Protocol. It is the protocol used to transfer hypertext (HTML) from the web server to the local browser.

HTTP was originally designed as a way to publish and receive HTML pages.

There are several versions of HTTP, and HTTP/1.1 has long been the most widely used.

3. The principle of HTTP

HTTP transfers data on top of the TCP/IP communication protocols. The data transferred can be HTML files, pictures, other files, query results and so on.

HTTP is generally used in the B/S (browser/server) architecture. Acting as the HTTP client, the browser sends a request for a URL to the HTTP server, that is, the web server. The following uses Baidu as an example:

[Process of accessing Baidu]
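
The idea that HTTP is just text sent over a TCP connection can be sketched in a few lines of Python. This is a minimal sketch, not how a real browser works; the function names build_get_request and send_over_tcp are made up for illustration.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",     # ask the server to close the connection when done
        "",                      # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_over_tcp(host: str, port: int, request: bytes) -> bytes:
    """Open a TCP connection, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an empty read means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

With network access, send_over_tcp("www.baidu.com", 80, build_get_request("www.baidu.com")) should return the raw response bytes, starting with the status line. The host name www.baidu.com is an assumption here.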

4. HTTP features

HTTP supports client/server mode and is a request/response mode protocol.
Simple and fast: when a client requests services from the server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD and POST.
Flexible: HTTP allows any type of data object to be transferred. The type of the data being transferred is indicated by the Content-Type header.
Connectionless: each connection handles only one request. After the server has processed the request and the client has received the response, the connection is closed, which makes it hard for the client to keep a session with the server. (HTTP/1.1 can keep a connection open for several requests, but each request is still handled independently.)
Stateless: the protocol has no memory of earlier transactions. If a later request needs information from an earlier one, that information must be sent again. To make up for this deficiency, two techniques for recording HTTP state were introduced: one is called Cookie and the other is called Session.
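
As a rough illustration of how a Cookie carries state across otherwise unrelated requests, here is a minimal sketch using Python's standard http.cookies module. The cookie name session_id and both function names are assumptions chosen for the example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id: str) -> str:
    """Server side: build the Set-Cookie header that hands the client an identifier."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie.output(header="Set-Cookie:")


def read_session_id(cookie_header: str):
    """Server side: recover the identifier from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel else None
```

The server keeps the real session data on its side, keyed by this identifier; the client only repeats the identifier with every request.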

5. Differences between URIs and URLs

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections.

URI: Uniform Resource Identifier
URL: Uniform Resource Locator

A URI is used to identify a specific resource, and we can know what a resource is by using a URI.

URLs are used to locate specific resources: a URL marks the location of a resource, so a URL is one kind of URI. Every file published on the Internet has a unique URL.
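
To make the distinction concrete, the following sketch splits a URL into its parts with the standard urllib.parse module and contrasts it with a URI that names a resource without locating it. The example URL, the ISBN URN and the function name describe are illustrative assumptions.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a URI into its components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # how to access the resource (or the URI family)
        "host": parts.hostname,      # where the resource lives; None for non-URL URIs
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# A URL: it says where the resource is and how to reach it.
url_parts = describe("https://www.example.com:443/docs/index.html?lang=en#intro")

# A URI that is not a URL: it identifies a book but gives no network location.
urn_parts = describe("urn:isbn:0451450523")
```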

6. Composition of HTTP packets

Request message Composition

Request line: includes the request method, URL, and protocol/version
Request headers
Request body

Response message Composition

Status line
Response headers
Response body
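
The three-part layout of both message types can be sketched with a small parser. This is a simplified sketch (it ignores repeated headers and chunked bodies), and the sample messages and the function name split_http_message are made up for illustration.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start line, headers, body)."""
    # An empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


SAMPLE_REQUEST = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 26\r\n"
    b"\r\n"
    b"user=alice&password=secret"
)

SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 14\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)
```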

7. Common request methods

GET: Requests the specified resource; the response returns it in the entity body.
POST: Submits data to the specified resource for processing (such as submitting a form or uploading a file). The data is contained in the request body. POST requests may result in the creation of new resources and/or the modification of existing resources.
HEAD: Similar to a GET request, except that the response has no body; it is used to retrieve only the headers.
PUT: Data sent from the client to the server replaces the contents of the specified resource.
DELETE: Requests that the server delete the specified resource.

[A GET request]

[A POST request]

The difference between POST and GET:

Both contain a request line and request headers; POST adds a request body.
GET is mostly used for queries: the request parameters are placed in the URL, and the request is not meant to change content on the server. POST is used for submissions, such as sending an account name and password in the body.
GET parameters are appended to the URL, so they can be seen directly in the URL, while POST data is placed inside the request body and cannot be seen in the URL.
The length of data submitted by GET is limited because the URL length is limited, and the exact limit is browser-dependent. POST has no such limit.
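
The difference in where the parameters travel can be shown with the standard urllib module without sending anything over the network. This is a sketch; the URLs, field names and the helpers build_get and build_post are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url: str, params: dict) -> Request:
    """GET: the parameters become part of the URL as a query string."""
    return Request(base_url + "?" + urlencode(params))


def build_post(url: str, form: dict) -> Request:
    """POST: the parameters are encoded into the request body."""
    return Request(url, data=urlencode(form).encode("ascii"))


get_req = build_get("http://www.example.com/search", {"q": "http", "page": "1"})
post_req = build_post("http://www.example.com/login", {"user": "alice", "password": "secret"})
```

Note that a POST body is only hidden from the URL. Over plain HTTP it is still sent unencrypted.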

8. Response status code

When you visit a web page, the browser makes a request to the Web server. The server on which the page is located returns a header containing an HTTP status code in response to a browser request.

Status code classification:

1XX – Informational. The server has received the request and the requester should continue.
2XX – Success. The request was received, understood, and processed successfully.
3XX – Redirection. Further action is required to complete the request.
4XX – Client error. The request contains a syntax error or cannot be fulfilled.
5XX – Server error. The server encountered an error while processing the request.

Common status codes:

200 OK – The client request succeeded
301 Moved Permanently – The resource (web page, etc.) has been permanently moved to another URL
302 Found – Temporary redirect
400 Bad Request – The client request has syntax errors and cannot be understood by the server
401 Unauthorized – The request is not authorized. This status code must be used with the WWW-Authenticate header field
404 Not Found – The requested resource does not exist, possibly because the wrong URL was entered
500 Internal Server Error – An unexpected error occurred inside the server
503 Service Unavailable – The server cannot process requests from clients at this time and may return to normal after a period of time.
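
Since the first digit of a status code decides its class, the classification can be sketched in a few lines using the standard http.HTTPStatus enum. The function name describe_status and the CLASSES table are made up for this example.

```python
from http import HTTPStatus

# The first digit of the status code selects the class.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know
        phrase = "Unrecognized"
    return f"{code} {phrase} ({category})"
```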

9. Why use HTTPS?

In practice, most websites now use the HTTPS protocol, which is the trend for the future of the Internet. The following is a Wireshark capture of a login request to a blog site made over HTTP.

Everything sent over HTTP can be read directly from such a capture. Therefore, HTTP is not suitable for transmitting sensitive or private information, such as account names and passwords.

General HTTP has the following problems:

Request information is transmitted in plain text and is easily intercepted by eavesdroppers.
Data integrity is not verified, so data is easily tampered with.
The identity of the peer is not verified, so there is a risk of impersonation.
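
To see why plain-text transmission matters, consider what anyone who captures the bytes of an HTTP login request can do with them. The captured request below is invented for illustration, and sniff_credentials is a hypothetical helper name.

```python
from urllib.parse import parse_qs

# Bytes as they would appear on the wire for a form login over plain HTTP.
CAPTURED = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: blog.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"\r\n"
    b"user=alice&password=secret"
)


def sniff_credentials(captured: bytes) -> dict:
    """Read the form fields straight out of a captured request; no key is needed."""
    _, _, body = captured.partition(b"\r\n\r\n")
    fields = parse_qs(body.decode("ascii"))
    return {name: values[0] for name, values in fields.items()}
```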

10. What is HTTPS?

To solve the problems with HTTP, HTTPS is used.

HTTPS (HyperText Transfer Protocol over Secure Socket Layer): commonly described as HTTP + SSL/TLS. An SSL certificate is used to authenticate the server, and the communication between the browser and the server is encrypted.

So what is SSL?

SSL (Secure Sockets Layer): developed by Netscape in 1994, SSL sits between TCP/IP and the various application-layer protocols and provides security for data communication.

TLS (Transport Layer Security): the successor to SSL. The early versions (SSL 1.0, SSL 2.0, SSL 3.0) were developed by Netscape. In 1999 the IETF standardized the protocol and renamed it, so what would have been SSL 3.1 became TLS 1.0. TLS 1.1 and TLS 1.2 followed, and TLS 1.3, which changed the protocol considerably, was published as RFC 8446 in 2018. SSL 3.0, TLS 1.0 and TLS 1.1 are no longer recommended because of security weaknesses; TLS 1.2 and TLS 1.3 are the versions in wide use today.

History of SSL (Encrypted Communication over the Internet)

SSL (Secure Sockets Layer) version 1.0 was designed by Netscape in 1994, but was never released.
Netscape released SSL 2.0 in 1995, and serious flaws were soon discovered.
In 1996, SSL 3.0 was released and deployed on a large scale.
In 1999, TLS 1.0, the upgraded successor to SSL, was released.
In 2006 and 2008, TLS 1.1 and TLS 1.2 were released.
In 2018, TLS 1.3 was released.

11. What is the process for the browser to transfer data using HTTPS?

First, the client accesses the server through a URL and asks to establish an SSL connection.
Upon receiving the request from the client, the server sends the website's certificate information (including the public key) to the client.
The client and the server negotiate the security level of the SSL connection, that is, the level of encryption.
The client's browser creates a session key according to the agreed security level, encrypts it with the website's public key, and sends it to the website.
The server decrypts the session key using its own private key.
The server and the client use the session key to encrypt the communication between them.
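
In application code these steps are normally handled by a TLS library rather than written by hand. The following is a minimal sketch with Python's standard ssl module; the function names are made up for illustration, and describe_tls_connection needs network access to run.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client-side context that verifies the server certificate and host name."""
    return ssl.create_default_context()


def describe_tls_connection(host: str, port: int = 443) -> dict:
    """Connect, let the library perform the handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as tcp:
        # wrap_socket performs the handshake: certificate check, key agreement, and so on.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return {
                "protocol": tls.version(),    # for example "TLSv1.3"
                "cipher": tls.cipher()[0],    # name of the negotiated cipher suite
                "subject": tls.getpeercert().get("subject"),
            }
```

If the certificate cannot be verified, or does not match the host name, the handshake raises an error instead of returning a connection.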

12. Shortcomings of HTTPS

The HTTPS handshake takes several extra round trips, which lengthens page loading time (by some estimates, by nearly 50%).
HTTPS connection caching is not as efficient as HTTP's, which increases data overhead and power consumption.
Applying for an SSL certificate can cost money, and certificates with more features cost more.
The cryptographic algorithms used by SSL consume CPU and other server resources.

13. Summarize the differences between HTTPS and HTTP

HTTPS is the secure version of HTTP. HTTP transmits data in plain text, which is insecure; HTTPS encrypts it using SSL or TLS.
HTTP and HTTPS connect in different ways and use different default ports: 80 for HTTP and 443 for HTTPS.
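
The default ports are also available as constants in Python's standard http.client module, so the rule "use the port in the URL, otherwise the default for the scheme" can be sketched as follows. The function name effective_port is an assumption for this example.

```python
import http.client
from urllib.parse import urlsplit

# Default ports as defined by the standard library: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {
    "http": http.client.HTTP_PORT,
    "https": http.client.HTTPS_PORT,
}


def effective_port(url: str) -> int:
    """Return the explicit port from the URL, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]
```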

14. HTTP/2

1. Multiplexing

Multiplexing allows multiple request and response messages to be sent at the same time over a single HTTP/2 connection. In HTTP/1.1, a browser limits the number of simultaneous requests to the same domain name, and requests beyond the limit are blocked. This is one reason why some sites serve static resources from several CDN domains (for example Twitter with twimg.com): it works around the browser's per-domain limit. With multiplexing, HTTP/2 can easily run many streams in parallel without establishing multiple TCP connections. HTTP/2 reduces the basic unit of HTTP communication to the frame. Each frame belongs to a message in a logical stream, and messages are exchanged in both directions, in parallel, over the same TCP connection.
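
The core idea, tagging every frame with a stream identifier so that frames from different messages can share one connection, can be simulated without any networking. This is a toy model rather than the real HTTP/2 wire format; the function names, payloads and frame size are illustrative assumptions.

```python
from itertools import zip_longest


def to_frames(stream_id: int, payload: bytes, size: int) -> list:
    """Cut one message into frames, each tagged with its stream ID."""
    return [(stream_id, payload[i:i + size]) for i in range(0, len(payload), size)]


def interleave(*frame_lists) -> list:
    """Send frames from several streams in turn over one shared 'connection'."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire: list) -> dict:
    """Receiver side: rebuild each message by collecting frames per stream ID."""
    streams = {}
    for stream_id, chunk in wire:
        streams[stream_id] = streams.get(stream_id, b"") + chunk
    return streams
```

For example, interleaving the frames of two responses on streams 1 and 3 and passing the result to reassemble gives back both payloads intact, even though their frames were mixed on the wire.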

2. Binary framing

HTTP/2 adds a binary framing layer between the application layer (HTTP) and the transport layer (TCP). Without changing the semantics, methods, status codes, URIs or header fields of HTTP/1.x, it removes performance limitations of HTTP/1.1, improving transmission performance and achieving lower latency and higher throughput. At the binary framing layer, HTTP/2 splits all transmitted information into smaller messages and frames and encodes them in binary. The headers of HTTP/1.x are carried in HEADERS frames, and the corresponding request body is carried in DATA frames.
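
Every HTTP/2 frame starts with a fixed 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and one reserved bit followed by a 31-bit stream identifier. The sketch below packs and parses that header; the function names are made up, and only the DATA and HEADERS types are listed.

```python
import struct

# Two of the frame types defined by HTTP/2.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the 9-byte HTTP/2 frame header."""
    length = struct.pack(">I", len(payload))[1:]  # keep the low 3 bytes: 24-bit length
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def parse_frame(data: bytes) -> dict:
    """Decode the frame header and slice out the payload."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, raw_id = struct.unpack(">BBI", data[3:9])
    return {
        "length": length,
        "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": raw_id & 0x7FFFFFFF,  # drop the reserved top bit
        "payload": data[9:9 + length],
    }
```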

All HTTP/2 communication takes place over a single connection that can carry any number of bidirectional streams. In the past, the key to HTTP performance optimization was not high bandwidth but low latency. A TCP connection tunes itself over time: it limits its maximum speed at first and increases the transfer speed as data is transferred successfully. This tuning is called TCP slow start. Because of it, HTTP connections, which are bursty and short-lived by nature, become very inefficient. By having all streams share the same connection, HTTP/2 uses TCP more efficiently, so that high bandwidth can truly benefit HTTP performance.
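
The effect of slow start on short connections can be shown with a little arithmetic. The model below is deliberately simplified: it assumes the window starts at 10 segments, doubles every round trip, and that nothing is lost. Both the numbers and the function name are illustrative assumptions.

```python
def round_trips_needed(total_segments: int, window: int = 10) -> int:
    """Count round trips needed to send the data while the window doubles each trip."""
    sent = 0
    trips = 0
    while sent < total_segments:
        sent += window  # one window's worth of segments per round trip
        window *= 2     # slow start: the window doubles after a successful trip
        trips += 1
    return trips


# A fresh connection needs 2 round trips for a 20-segment response (10, then 20).
fresh = round_trips_needed(20, window=10)

# A connection that has already grown its window to 80 sends it in 1 round trip.
warmed_up = round_trips_needed(20, window=80)
```

This is why reusing one long-lived connection, as HTTP/2 does, helps: later responses start with a window that has already grown.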

This mode of one connection for many resources reduces connection pressure on the server, uses less memory, and gives higher connection throughput. In addition, having fewer TCP connections reduces network congestion, and spending less time in slow start makes recovery from congestion and packet loss faster.

3. Header Compression

HTTP/1.1 does not support header compression, which is why both SPDY and HTTP/2 added it: SPDY uses the general-purpose DEFLATE algorithm, while HTTP/2 uses HPACK, an algorithm designed specifically for header compression.
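
One idea behind HPACK is that both sides keep a table of headers they have already seen, so a repeated header can be sent as a small index instead of the full text. The sketch below shows only that indexing idea. It is a toy, not the real HPACK encoding (no static table, no Huffman coding, no binary format), and the class name ToyHeaderTable is made up.

```python
class ToyHeaderTable:
    """A toy header table; encoder and decoder each keep their own copy in sync."""

    def __init__(self):
        self.entries = []  # (name, value) pairs seen so far, in order

    def encode(self, headers: list) -> list:
        """Replace headers already in the table with their index."""
        out = []
        for pair in headers:
            if pair in self.entries:
                out.append(self.entries.index(pair))  # send only a small number
            else:
                self.entries.append(pair)             # remember for next time
                out.append(pair)                      # send the full header once
        return out

    def decode(self, encoded: list) -> list:
        """Rebuild the headers, updating the table exactly as the encoder did."""
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])
            else:
                self.entries.append(item)
                headers.append(item)
        return headers
```

On the first request every header is sent in full; on a second request with the same headers, only indexes are sent.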

4. Server Push

Server push is a mechanism for sending data before the client requests it. In HTTP/2, the server can send multiple responses to a single request from the client. Server push makes the HTTP/1.x-era optimization of embedding (inlining) resources unnecessary. For example, when the home page is requested, the server can respond with the home page content together with the logo and style sheet, because it knows the client will need them. This is like inlining all the resources in one HTML document, but server push has another big advantage: the pushed resources can be cached. Cached resources can also be shared between different pages, subject to the same-origin policy.

15. Why is HTTPS safe?

Data transmitted over HTTP is unencrypted, that is, sent in plaintext, so using HTTP to transmit private information is very insecure. To allow such private data to be encrypted, Netscape designed the Secure Sockets Layer (SSL) protocol to encrypt data sent over HTTP, giving birth to HTTPS. Today’s HTTPS uses the TLS protocol, but the name SSL is still often used for it because SSL came first.

HTTPS requires a handshake between the client (browser) and the server (website) before data is transmitted. During the handshake, the two parties establish the password (key) used to encrypt the data they exchange. TLS/SSL is not just a single encryption protocol: it combines asymmetric encryption, symmetric encryption and HASH algorithms.

The handshake process is briefly described as follows:

The browser sends the set of encryption rules it supports to the website.
The website selects a set of encryption and HASH algorithms from them and sends its identity information back to the browser in the form of a certificate. The certificate contains information such as the website address, the public key used for encryption, and the authority that issued the certificate.
After obtaining the website certificate, the browser does the following:
It verifies the validity of the certificate (whether the issuing authority is legitimate, whether the website address in the certificate matches the address being visited, and so on). If the certificate is trusted, the browser displays a small lock icon; otherwise it warns that the certificate is not trusted.
If the certificate is trusted, or the user accepts an untrusted certificate, it generates a random number to serve as the password and encrypts it with the public key provided in the certificate.
It computes the HASH of the handshake messages using the agreed HASH algorithm, encrypts the result with the generated random number, and finally sends all of this to the website.
After receiving the data from the browser, the website does the following:
It uses its own private key to decrypt the message and extract the password, uses the password to decrypt the handshake message sent by the browser, and verifies that the HASH matches the one sent by the browser.
It encrypts a handshake message with the password and sends it to the browser.
The browser decrypts the handshake message and computes its HASH. If it matches the HASH sent by the server, the handshake is complete. All subsequent communication is encrypted with the symmetric encryption algorithm, using the random password generated earlier by the browser.

Here, the browser and the website send each other an encrypted handshake message to verify that they both hold the same password and can encrypt and decrypt data properly. The asymmetric encryption algorithm is used to encrypt the generated password during the handshake, the symmetric encryption algorithm is used to encrypt the transmitted data, and the HASH algorithm is used to verify data integrity. Because the password generated by the browser is the key to encrypting all the data, it is protected with asymmetric encryption while it is transmitted. Asymmetric encryption algorithms generate a public key and a private key. The public key can only be used to encrypt data, so it can be handed out freely, while the website's private key is used to decrypt data, so websites guard their private keys very carefully to prevent leaks.
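
The roles of the three kinds of algorithm can be illustrated with a toy simulation. Everything here is a simplified assumption for illustration: the RSA key uses the tiny textbook primes 61 and 53, the "symmetric cipher" is a SHA-256 keystream XOR, and the handshake messages are placeholders. None of it is secure, and it is not the real TLS message format.

```python
import hashlib
import hmac

# Textbook RSA key pair built from p=61, q=53. Insecure; for illustration only.
N, E, D = 3233, 17, 2753  # public key is (N, E); private key is (N, D)


def rsa_encrypt(secret: int) -> int:
    return pow(secret, E, N)  # anyone can do this with the public key


def rsa_decrypt(cipher: int) -> int:
    return pow(cipher, D, N)  # only the private key holder can do this


def derive_key(secret: int) -> bytes:
    """Turn the shared random number into a symmetric key."""
    return hashlib.sha256(str(secret).encode()).digest()


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream (same call decrypts)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def simulate_handshake(random_number: int = 1234):
    transcript = b"client hello: encryption rules" + b"server hello: certificate"
    digest = hashlib.sha256(transcript).digest()  # HASH of the handshake messages

    # Browser: wrap the random number with the site's public key, encrypt the HASH.
    wrapped = rsa_encrypt(random_number)
    browser_key = derive_key(random_number)
    finished = xor_stream(browser_key, digest)

    # Website: unwrap with the private key, decrypt, and compare the HASH.
    site_key = derive_key(rsa_decrypt(wrapped))
    hash_ok = hmac.compare_digest(xor_stream(site_key, finished), digest)
    return hash_ok, browser_key == site_key
```

After a successful run both sides hold the same symmetric key, which would then protect the actual HTTP traffic, for example with xor_stream(key, b"GET / HTTP/1.1").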

Any error during the TLS handshake terminates the encrypted connection, which prevents private information from being transferred. Because HTTPS itself is hard to attack, attackers more often try to deceive clients with fake certificates in order to obtain plaintext information. The default port number is 80 for HTTP and 443 for HTTPS.
