This is day 20 of my participation in the More Text Challenge. For details, see the More Text Challenge.

If my article is helpful ❤️, you are welcome to like and follow. This is the greatest encouragement for me to keep writing technical articles. More on this series is on my blog: coderdao.github.io/

Summary: HTTP protocol

If you want to work in Internet technology, HTTP is something you deal with almost every day. However, I find that most people only scratch the surface of it and know little about the details and principles, which makes interviews very difficult.

The purpose of this article is to help you build a complete understanding of HTTP, deep enough to handle a range of development and interview questions.

What is the difference between GET and POST?

  • Caching: by default, GET requests are cached by the browser and leave a record in the browser history, while POST requests are not.
  • Encoding: GET parameters can only be URL-encoded and can only contain ASCII characters, while POST has no such restriction.
  • Parameters: GET parameters are generally placed in the URL, where they are visible and therefore less secure. POST parameters are placed in the request body, which is more suitable for transmitting sensitive information.
  • Idempotence: GET is idempotent and POST is not. (Idempotent means that executing the same request once or repeatedly has the same effect on the server's resources.)
  • TCP: a GET request sends the whole request message at once, while some clients send a POST in two steps, the headers first and then the body once the server responds with 100 (Continue). (Firefox is an exception: it sends a POST request in a single TCP packet.) This is client behavior rather than a requirement of HTTP.
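
To make the parameter-placement difference concrete, here is a minimal sketch using Python's standard library. The endpoint `http://example.com/search` and the parameter names are hypothetical, and nothing is actually sent over the network. The requests are only constructed so that the URL and body can be inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, used only for illustration.
BASE_URL = "http://example.com/search"
params = {"q": "http protocol", "page": "1"}

# GET: parameters are URL-encoded and appended to the URL as a query string.
get_request = Request(BASE_URL + "?" + urlencode(params), method="GET")

# POST: the same parameters travel in the request body as bytes.
post_request = Request(
    BASE_URL,
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# The GET URL exposes the parameters; the POST URL does not.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Note that a POST body is only hidden from the URL, not encrypted. Protecting it in transit still requires HTTPS.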

Including GET and POST, HTTP/1.1 defines the following request methods:

  • GET: Usually used to obtain resources
  • HEAD: Obtain the meta information of the resource
  • POST: Submit data, that is, upload data
  • PUT: Modifies data
  • DELETE: Deletes resources (rarely used)
  • CONNECT: Establishes a connection tunnel through a proxy server
  • OPTIONS: Lists the request methods that can be applied to a resource; used for cross-origin requests
  • TRACE: Traces the transmission path of the request and response

Differences between HTTP 1.0, 1.1, and 2.0

HTTP 1.0:

Key points:

  • Connectionless
  • Stateless

HTTP 1.0 was first used for web pages in 1996, when content was simple. Each browser request requires its own TCP connection to the server, which closes the connection immediately after processing the request (connectionless). The server does not track clients or record past requests (stateless).

HTTP 1.1:

Key points:

  • Persistent connections by default
  • Request pipelining
  • Added cache handling (new fields such as Cache-Control)
  • Added the Host field, support for resumable (range) transfers, etc.

HTTP/1.1 came into wide use in 1999. HTTP/1.0 closed the connection after each response by default (Connection: close), whereas HTTP/1.1 uses Connection: keep-alive by default, avoiding the overhead of repeatedly establishing and releasing connections. However, the server must respond in the order in which the client sent its requests, so that the client can match each response to its request. The Content-Length field is used to determine whether all the data of the current message has been received. Multiple responses cannot be sent in parallel on the same connection.
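
The role of Content-Length on a persistent connection can be sketched as follows. This is a simplified illustration, not a real HTTP parser: the function name `split_responses` is made up for this example, and it assumes every response carries a Content-Length header (no chunked encoding) and that the buffer already holds complete responses.

```python
def split_responses(buffer: bytes) -> list[tuple[str, bytes]]:
    """Split back-to-back HTTP/1.1 responses using Content-Length.

    Simplified: assumes every response has a Content-Length header
    and that the buffer contains only complete responses.
    """
    responses = []
    while buffer:
        # Headers end at the first blank line.
        head, _, rest = buffer.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        status_line = lines[0]
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length tells us where this body ends and the next response begins.
        length = int(headers["content-length"])
        responses.append((status_line, rest[:length]))
        buffer = rest[length:]
    return responses


# Two responses arriving in order on the same connection.
raw = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 404 Not Found\r\nContent-Length: 4\r\n\r\nnope"
)
for status, body in split_responses(raw):
    print(status, body)
```

Because the responses carry no request identifier, the only way to match them to requests is their order, which is why HTTP/1.1 responses must come back in request order.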

HTTP 2.0:

Key points:

  • Binary framing with parallel transmission
  • Multiplexing
  • Header compression
  • Server push

HTTP/2 introduces the concepts of binary frames and streams. Each frame is labeled so that its data can be put back together in sequence. The terms are as follows:

  • Stream: a bidirectional flow of bytes on an established connection
  • Message: a complete series of frames corresponding to a logical message
  • Frame: the smallest unit of HTTP/2 communication. Each frame contains a frame header, which at least identifies the stream the frame belongs to.

Each request is a stream, which is sent in the form of a message, and the message is divided into multiple frames. The stream ID is recorded in the frame header to identify the stream the frame belongs to. Frames belonging to different streams can be freely interleaved on the connection, and the receiver uses the stream ID to assign each frame back to the right request.
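
The framing idea above can be sketched in a few lines. The 9-byte frame header layout (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit plus 31-bit stream ID) follows the HTTP/2 specification; the helper names `build_frame` and `demultiplex` are made up for this example, and the sketch ignores everything else a real implementation needs (HEADERS frames, flow control, settings, and so on).

```python
import struct
from collections import defaultdict

FRAME_HEADER_LEN = 9   # 24-bit length, 8-bit type, 8-bit flags, 32-bit stream field
DATA = 0x0             # frame type code for DATA frames


def build_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Serialize one frame: 9-byte header followed by the payload."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,          # top bit is reserved
    )
    return header + payload


def demultiplex(data: bytes) -> dict[int, bytes]:
    """Walk the byte stream frame by frame and regroup payloads by stream ID."""
    streams = defaultdict(bytes)
    offset = 0
    while offset < len(data):
        hi, lo, _type, _flags, stream_field = struct.unpack_from(">BHBBI", data, offset)
        length = (hi << 16) | lo
        stream_id = stream_field & 0x7FFFFFFF
        start = offset + FRAME_HEADER_LEN
        streams[stream_id] += data[start:start + length]
        offset = start + length
    return dict(streams)


# Frames of stream 1 and stream 3 interleaved on one connection.
wire = b"".join([
    build_frame(DATA, 0, 1, b"Hel"),
    build_frame(DATA, 0, 3, b"foo"),
    build_frame(DATA, 0, 1, b"lo"),
    build_frame(DATA, 0, 3, b"bar"),
])
print(demultiplex(wire))
```

Frames of different streams are interleaved here, but the frames of any one stream still arrive in their original order, which is what lets the receiver rebuild each message by simple concatenation.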

Multiplexing:

1. All HTTP/2 communication takes place over a single TCP connection, which can carry any number of bidirectional streams.

2. Each stream is sent as a message, which consists of one or more frames. Frames from different streams can be interleaved on the connection and are then reassembled based on the stream ID in each frame header.

For example, each request is a stream. The stream is sent as a message, the message is divided into multiple frames, and the stream ID in each frame header lets the receiver assign the frame back to the right request.

3. Multiplexing (connection sharing) may cause critical requests to be blocked. To address this, HTTP/2 lets each stream be assigned a priority and dependencies. High-priority streams are processed by the server and returned to the client first. A stream can also depend on other streams.

As you can see, HTTP/2 implements truly parallel transport: any number of HTTP requests can be made over one TCP connection. This powerful capability is built on binary framing.

Header compression:

In HTTP/1.x, header metadata is sent as plain text, typically adding 500 to 800 bytes per request.

HTTP/2 uses an encoder (HPACK) to reduce the size of the headers that need to be transferred. Both sides of the connection maintain a table of header fields, which avoids retransmitting repeated headers and reduces the transfer size. An efficient compression algorithm can shrink headers substantially, reducing the number of packets sent and thereby reducing latency.
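
The shared-table idea can be illustrated with a deliberately simplified toy. This is NOT the real HPACK wire format (no static table, no Huffman coding, no eviction); the class `HeaderTable` is hypothetical and only shows how both sides can replace a repeated header with a small index.

```python
class HeaderTable:
    """Toy shared header table: NOT real HPACK, only the indexing idea."""

    def __init__(self):
        self.entries = []  # list of (name, value) pairs seen so far

    def encode(self, headers):
        """Replace already-seen headers with their table index."""
        out = []
        for pair in headers:
            if pair in self.entries:
                out.append(self.entries.index(pair))  # send a small integer
            else:
                self.entries.append(pair)
                out.append(pair)                      # send the literal once
        return out

    def decode(self, encoded):
        """Rebuild the full header list, learning new literals on the way."""
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])
            else:
                self.entries.append(item)
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()
request1 = [(":method", "GET"), (":path", "/index.html"), ("user-agent", "demo")]
request2 = [(":method", "GET"), (":path", "/style.css"), ("user-agent", "demo")]

for request in (request1, request2):
    wire = sender.encode(request)
    print(wire)                        # second request is mostly indexes
    assert receiver.decode(wire) == request
```

In the second request only the changed `:path` header is sent as a literal, which is the effect the paragraph above describes.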

Server push:

In addition to the server’s response to the initial request, the server can also push additional resources to the client without the client explicitly requesting it.

Differences between HTTP and HTTPS

Introduction to HTTPS

HTTP sends content in plaintext, which is unsuitable for sensitive data. To solve this problem, HTTPS adds the SSL/TLS protocol to HTTP. SSL/TLS authenticates the identity of the server based on certificates and encrypts the communication between the browser and the server.

The HTTPS protocol serves two functions:

  • Establishing a secure information channel to ensure the safe transmission of data;
  • Confirming the authenticity of the website.

The differences

  • HTTPS adds an SSL/TLS layer on top of HTTP and requires applying to a CA for a certificate (often for a fee).
  • Compared with HTTP, HTTPS encrypts data and performs identity authentication (of the user and the server), preventing data from being stolen or tampered with.
  • HTTPS is the most secure option under the current architecture, but it takes more time and caching is less effective than with HTTP.
  • HTTPS uses port 443 and HTTP uses port 80. Note that HTTP and HTTPS are compatible.
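
As a small illustration of the default ports, the sketch below derives the port a client would connect to from a URL. The helper name `target_port` and the `DEFAULT_PORTS` mapping are made up for this example; `urlsplit` is from Python's standard library.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def target_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(target_port("http://juejin.cn/"))
print(target_port("https://juejin.cn/"))
print(target_port("http://localhost:8080/index.html"))
```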

HTTPS Transmission Process

Prerequisites: the server generates a TLS/SSL public/private key pair and submits the public key to a CA, which issues a certificate containing it.

  1. The Client initiates an HTTPS request. According to RFC 2818, the Client knows that it needs to connect to port 443 (the default) of the Server.
  2. The Server holds a pre-configured key pair (public key and private key) and returns its public key certificate to the Client.
  3. The Client verifies the public key certificate: for example, whether the certificate is within its validity period, whether the certificate's purpose matches the site requested by the Client, whether it appears in the certificate revocation list (CRL), and whether its issuing certificate is valid. This is a recursive process that continues until a root certificate (built into the operating system or into the Client) is reached. If verification succeeds, the process continues. If it fails, a warning message is displayed.
  4. The Client uses a pseudorandom number generator to generate the session key for encryption, encrypts the session key with the public key from the certificate, and sends it to the Server.
  5. The Server uses its private key to decrypt the message and obtain the session key. At this point, both the Client and the Server hold the same session key.
  6. The Server encrypts plaintext A using the session key and sends it to the Client.
  7. The Client uses the session key to decrypt the ciphertext of the response and obtain plaintext A.
  8. The Client sends another HTTPS request, using the session key to encrypt plaintext B of the request. The Server then uses the session key to decrypt the ciphertext and obtain plaintext B.
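
Steps 4 to 7 above can be walked through with a toy example. Everything here is an assumption made for illustration: the RSA key uses tiny textbook primes, and the symmetric cipher is a simple XOR with a hash-derived keystream. It is NOT secure and is not how a real TLS library works; it only shows how a session key is wrapped with the public key and then shared by both sides.

```python
import hashlib
import secrets

# Toy RSA key pair built from tiny textbook primes. Illustration only, NOT secure.
P, Q = 61, 53
N = P * Q                               # public modulus
E = 17                                  # public exponent (in the "certificate")
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent, kept by the server


def keystream_xor(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Step 4: the client generates a session key and encrypts it with the public key.
session_key = secrets.randbelow(N - 2) + 2
encrypted_key = pow(session_key, E, N)

# Step 5: the server recovers the session key with its private key.
server_session_key = pow(encrypted_key, D, N)

# Step 6: the server encrypts plaintext A with the session key.
ciphertext = keystream_xor(server_session_key, b"plaintext A")

# Step 7: the client decrypts the response with the same session key.
recovered = keystream_xor(session_key, ciphertext)
print(server_session_key == session_key, recovered)
```

The asymmetric operation is used only once, to transport the session key. All later traffic uses the cheaper symmetric cipher, which matches the flow in the steps above.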

From entering the URL to page rendering

There are 6 steps overall:

  • Enter url
  • DNS domain name Resolution
  • Establishing a TCP Connection
  • An HTTP request is sent, and the server processes the request and returns a response
  • Closing a TCP Connection
  • Browser rendering

What exactly happens at each step? The details are as follows:

DNS domain name Resolution

Essence: Convert the domain name juejin.cn into a specific IP address

Application scenario: when a crawler request returns empty data, you can ping the domain name to check whether it responds. For example, www.baidu.com succeeds, while www.juejin.cn may fail because ping is blocked, a hosts entry is bound, the website is down, and so on. Combine this with curl, browser access, and other methods to reach a conclusion.

DNS resolution is a recursive query between the browser and the local DNS server. If the local DNS server has no answer, it queries the root DNS server, the top-level domain DNS server, and the authoritative DNS server iteratively.
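
From the application's point of view, the whole resolution process is hidden behind one call to the system resolver. A minimal sketch using Python's standard `socket` module follows; the helper name `resolve` is made up, and `localhost` is used so the example runs without network access (a real domain such as juejin.cn could be passed instead).

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses reported by the system resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


print(resolve("localhost"))
```

If this call raises `socket.gaierror` for a domain, name resolution itself failed, which is a different problem from ping being blocked or the site being down.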

An HTTP request is sent, and the server processes the request and returns a response

After the TCP connection is established, the browser can send requests to the server using the HTTP/HTTPS protocol. When the server receives a request, it parses the headers. If the headers contain cache validators such as If-None-Match and If-Modified-Since, the server checks whether the cached copy is still valid and, if so, returns status code 304.
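
The server-side cache check can be sketched as a small function. This is a simplified assumption-laden example: the name `conditional_status` is hypothetical, only If-None-Match is handled (If-Modified-Since is left out), and the ETag is compared as a single exact string.

```python
def conditional_status(request_headers: dict[str, str], current_etag: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    Simplified: only compares If-None-Match against one current ETag.
    """
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in request_headers.items()}
    if normalized.get("if-none-match") == current_etag:
        return 304   # Not Modified: the client can reuse its cache
    return 200       # send the full resource


print(conditional_status({"If-None-Match": '"v1"'}, '"v1"'))
print(conditional_status({"If-None-Match": '"v1"'}, '"v2"'))
print(conditional_status({}, '"v2"'))
```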

If the status code is 301/302, the resource has moved (for example, the server has changed its domain name) and a redirect is needed. In this case, the network process reads the redirect address from the Location field in the response header, initiates a new HTTP or HTTPS request, and goes back to step 4. If the status code is 200, check the Content-Type field. If the value is text/html, the response is an HTML document. If the value is application/octet-stream, it is a file download.
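
The branching described above can be written as a small decision function. This is only a sketch: the function name `next_action` and the returned strings are made up for illustration, and a real browser handles many more status codes and content types.

```python
def next_action(status: int, headers: dict[str, str]) -> str:
    """Decide what the client does next based on status code and headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    if status in (301, 302):
        # Follow the redirect target given in the Location header.
        return "redirect to " + normalized["location"]
    if status == 304:
        return "use cached copy"
    if status == 200:
        # Drop parameters such as "; charset=utf-8" before comparing.
        content_type = normalized.get("content-type", "").split(";")[0].strip().lower()
        if content_type == "text/html":
            return "render HTML"
        if content_type == "application/octet-stream":
            return "download file"
    return "other"


print(next_action(302, {"Location": "https://juejin.cn/"}))
print(next_action(200, {"Content-Type": "text/html; charset=utf-8"}))
print(next_action(200, {"Content-Type": "application/octet-stream"}))
```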