Preface

HTTP is almost guaranteed to come up in interviews, and a question about one part of the protocol tends to pull in the rest of it. So this article is more like a set of small notes from my recent reading about HTTP.

Ok, let’s start with a map:

1. HTTP protocol

1.1. What is HTTP?

1.1.1 Request-response

HTTP is an application-layer protocol for communication between a client and a server. It follows a request-response model: the client sends a request and the server responds to it.

(One drawback is that the server cannot send anything to the client on its own initiative. If a state change on the server should be reflected on the client, the client has to ask first, typically by polling, which wastes a lot of time and requests. This is why technologies such as the WebSocket protocol exist: they enable real-time communication in which the server can also push messages to the client.)

1.1.2 Stateless

HTTP is a stateless protocol: it does not keep any record of the requests and responses that have already been exchanged. HTTP was deliberately designed this way so that it can process a large number of transactions quickly. The downside is that statelessness alone gives us no way to remember a user’s state between requests.

(HTTP/1.1 is still stateless; cookies were introduced on top of it for state management.)
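As a rough illustration of how cookies add state on top of a stateless protocol, here is a minimal sketch using Python's standard http.cookies module. The cookie name session_id and the value abc123 are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header line for a hypothetical session id.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
set_cookie_header = response_cookie.output()  # one "Set-Cookie: ..." header line

# Client side: on later requests the browser sends the value back in a
# Cookie header, and the server parses it to recognise the same user.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
session_id = request_cookie["session_id"].value
```

The protocol itself still remembers nothing between requests; the state lives in the cookie that the client replays each time.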

2. HTTP message information

2.1. HTTP messages

An HTTP message can be roughly divided into a header and a body (the body is not always present), separated by a blank line.

  • Header: the content and attributes of the request or response that the server or client needs to process
  • Empty line: carriage return + line feed (CR+LF)
  • Message body: the data to be sent

The headers of request and response messages share some fields and differ in others, and the differences are worth understanding:

  • Request header:

  • Response header:

The differences are as follows:

  1. Request line (requests only): request method + request URI + HTTP version
  2. Status line (responses only): HTTP version + status code + reason phrase
  3. Header fields: the various headers (general headers, request headers, response headers, entity headers, and possibly others such as cookies) that carry the conditions and attributes of the request or response
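To make the structure concrete, here is a small sketch that splits a raw message into its start line, header fields, and body. The function name parse_http_message and the sample messages (example.com, /login, user=jing) are invented for illustration, and real parsers handle many more edge cases.

```python
def parse_http_message(raw: str):
    """Split a raw HTTP message into (start line, header dict, body)."""
    # The first blank line (CR+LF CR+LF) separates the header from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 9\r\n"
    "\r\n"
    "user=jing"
)
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 2\r\n"
    "\r\n"
    "ok"
)

request_line, request_headers, request_body = parse_http_message(request)
status_line, response_headers, response_body = parse_http_message(response)

# Request line: method + URI + version
method, uri, request_version = request_line.split(" ")
# Status line: version + status code + reason phrase
response_version, status_code, reason = status_line.split(" ", 2)
```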

3. HTTP methods

Using HTTP/1.1 as the example

3.1. GET

Fetch a resource

  1. The GET method requests the resource identified by the URI
  2. If the requested resource is text, it is returned as is

3.2. POST

Transfer an entity body

GET vs. POST

  1. Transmission mode: GET carries its parameters in the URL (the address bar), while POST carries them in the message body.

  2. Transfer length: GET parameters are limited in length because browsers and servers cap the length of a URL (the HTTP specification itself sets no limit), while POST parameters have no such limit.

  3. Security: GET is less secure because the data being sent is part of the URL, which ends up in browser history and web server logs. POST parameters are usually not saved there, but over plain HTTP the body is still sent unencrypted, so POST alone does not protect the data.

  4. Caching: GET responses can be cached; POST responses are generally not cached.

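  5. A commonly cited further difference is the number of TCP packets, but it depends on the client and is not part of the HTTP specification:

  • GET is usually sent in one go:

For GET requests, the browser sends the HTTP headers and data together, and the server responds with 200 (returning data).

  • POST may be sent in two steps:

For POST, some clients first send the headers with Expect: 100-continue, the server responds with 100 Continue, the client then sends the body, and the server responds with 200 OK (returning data). Without that Expect header, the headers and body are sent together just as with GET.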
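The first difference (where the parameters travel) can be seen without touching the network. This sketch only builds request objects with Python's urllib; the URL and the parameters are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http", "page": "1"}  # hypothetical form fields

# GET: the parameters travel in the URL's query string; there is no body.
get_request = Request("http://example.com/search?" + urlencode(params))

# POST: the parameters travel in the message body; the URL stays clean.
post_request = Request(
    "http://example.com/search",
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

get_method = get_request.get_method()    # "GET" because no body was given
post_method = post_request.get_method()  # "POST" because a body was given
```

Nothing is sent here; passing either object to urllib.request.urlopen would perform the actual request.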

3.3. PUT

Transfer a file

  1. The method itself has no authentication mechanism, so it poses a security problem
  2. The response status code can be 204 No Content: the resource has been stored on the server and no data is returned

3.4. HEAD

Get the message header

  1. Does not return the message body
  2. Used to check the validity of a URI and the date and time a resource was last updated

3.5. DELETE

Delete a file

  1. The opposite of the PUT method
  2. It is as insecure as PUT on its own, but can be combined with other authentication mechanisms
  3. The response status code can be 204 No Content: the resource has been deleted from the server and no data is returned

3.6. OPTIONS

  1. Queries the methods supported for the resource specified by the request URI
  2. The response lists them, e.g. Allow: GET, POST, HEAD, OPTIONS (also used for cross-origin preflight requests)

4. HTTP status codes

1xx

Informational status codes: the request has been received and is being processed

  • 101 Switching Protocols: the protocol is being upgraded

Application scenario: for example, a WebSocket connection starts as HTTP and is upgraded to WebSocket

2xx

Success status codes: the request was processed normally

  • 200 OK: the request succeeded and data is returned
  • 204 No Content: the request succeeded but no data is returned
  • 206 Partial Content: only the requested range is returned, as described by Content-Range

3xx

Redirection status codes: additional action is required to complete the request

  • 301 Moved Permanently: the resource has been assigned a new URI, and that URI should be used from now on
  • 302 Found: temporary redirection; the resource has been assigned a new URI, and that URI should be used for this access
  • 304 Not Modified: the negotiation cache was hit

4xx

Client error status codes: the server cannot process the request

  • 400 Bad Request: the request has a syntax error
  • 401 Unauthorized: authentication is required
  • 403 Forbidden: access is refused
  • 404 Not Found: the requested resource does not exist on the server

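5xx

Server error status codes: the server failed while processing the request

  • 500 Internal Server Error: a bug or temporary fault on the server
  • 501 Not Implemented: the server does not support the request method sent by the client
  • 503 Service Unavailable: the server is overloaded or down for maintenance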
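The five classes are determined by the first digit of the code. As a quick sketch, Python's standard http.HTTPStatus enum can supply the official reason phrase, while the helper names status_class and describe are made up here.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES[code // 100]


def describe(code: int) -> str:
    """Combine the code, its standard reason phrase, and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"


summary = [describe(code) for code in (101, 206, 304, 404, 503)]
```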

5. HTTP headers

There are a lot of HTTP headers, so I’m only going to write down the ones I see most often. You can look up the rest in an HTTP header reference.

5.1. Request headers

Header fields that tell the server what the client supports:

  • Supported media types: Accept
  • Supported character sets: Accept-Charset
  • Supported content encodings: Accept-Encoding
  • Supported (natural) languages: Accept-Language

Header fields that tell the server something about the client or the request:

  • Authentication information: Authorization
  • Host name and port number of the requested resource: Host
  • Byte range of the entity being requested: Range

5.2. Entity headers

  • Encoding applied to the entity body: Content-Encoding
  • Size of the entity body: Content-Length
  • Which part of the entity is returned for a Range request: Content-Range (when present, the status code is typically 206, because only a certain range of the data is returned)
  • Media type of the entity body: Content-Type
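To show how Range, Content-Range, Content-Length, and 206 fit together, here is a simplified sketch of a server answering a range request. The function serve_range is hypothetical and only understands the single form bytes=START-END; the 416 status (Range Not Satisfiable) is not covered in these notes but is what a server returns when the range lies outside the resource.

```python
def serve_range(resource: bytes, range_header: str):
    """Return (status, headers, body) for a single 'bytes=START-END' range."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdigit() or not end_text.isdigit():
        # Form not supported by this sketch: send the whole resource.
        return 200, {"Content-Length": str(len(resource))}, resource

    start = int(start_text)
    end = min(int(end_text), len(resource) - 1)  # END is inclusive
    if start > end:
        # The requested range starts beyond the end of the resource.
        return 416, {"Content-Range": f"bytes */{len(resource)}"}, b""

    body = resource[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(resource)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


resource = bytes(1000)  # stand-in for a 1000-byte file
status, headers, body = serve_range(resource, "bytes=0-99")
```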

6. HTTP version overview

6.1. HTTP/0.9

This version of the HTTP protocol does not support request headers and supports only the GET method

6.2. HTTP/1.0

Several things were added on top of HTTP/0.9:

  • The HTTP version number was added to the request
  • HTTP began to have request headers and response headers
  • Status codes were added
  • Content-Type was added, so other types of files can be transferred

(But HTTP/1.0 has a big performance problem: every resource request requires a new TCP connection, and requests are serial.)

TCP connection:

HTTP requests and responses travel over a TCP connection. In HTTP/1.0 the TCP connection is closed (the four-way teardown) every time a request completes, so multiple HTTP transfers mean repeatedly opening and closing TCP connections, which is very costly for performance.

6.3. HTTP/1.1

HTTP/1.1 addresses the network performance issues of HTTP/1.0 and adds some new things:

  • keep-alive lets HTTP reuse TCP connections (persistent, long-lived connections)
  • Added a cache control mechanism (cache related)
  • Officially added the OPTIONS method, which is mainly used in CORS (cross-origin related)
  • Added the Host header. Because multiple domain names can resolve to the same IP address, the requested domain name has to be carried in the HTTP message itself; the IP address resolved by DNS is not enough to tell them apart.
  • Added headers for language, encoding, type, and so on, which allow better negotiation between client and server
  • Pipelining: once a request is sent, a second request can be sent without waiting for the first response.

6.4. HTTP/2

  • HTTP/2 is a binary protocol, which makes data transfer more efficient
  • HTTP/2 removes the serial request limitation of HTTP/1.1 and can make concurrent requests over a single TCP connection (multiplexing)
  • Headers are compressed

(This means that if you make multiple requests at the same time and their headers are the same or similar, the protocol eliminates the duplicates for you.)

6.5. HTTP/3

HTTP/2 improved performance over previous versions, but there is still room for improvement. For example, because HTTP/2 multiplexes requests over one TCP connection, the underlying TCP protocol has no idea how many HTTP requests are riding on top of it.

Once a packet is lost, every HTTP request on that connection must wait for the lost packet to be retransmitted, even if the lost packet does not belong to my HTTP request.

In short, HTTP/2 requests share one TCP connection, so a single packet loss blocks all of them.

Finally, HTTP/3 replaced the TCP underneath HTTP with UDP, using the QUIC protocol!

  • UDP itself does not care about ordering or packet loss
  • QUIC implements its own packet loss retransmission and congestion control on top of UDP

7. HTTP -> HTTPS

7.1. Features of HTTP

To get from HTTP to HTTPS, we first need to understand some features of the HTTP protocol. A big one is that it is not secure:

  • Communication between the server and client is not encrypted and can be eavesdropped on
  • The client and server cannot verify each other’s identity
  • Data transferred between the client and server may be modified in transit

Based on these points, there are three areas where HTTP needs to be improved:

  • Data transmission encryption
  • Authentication
  • Data integrity protection

7.2. How HTTPS is built

Let’s look at how each of these problems is solved before we talk about HTTPS itself.

7.2.1 Data transmission encryption

Both symmetric and asymmetric encryption are used

  • Symmetric encryption

For example, Xiao Ming has a key. He can use this key to lock a box (encryption), and he can use the same key to open the box (decryption).

In other words, one key both encrypts a message and decrypts the encrypted message.

Suppose we encrypt this way. The client and the server hold the same key, and only the two of them know it. As long as no one can crack the key, communication between the two sides is secure.

However, to make sure both keys are the same, one party has to generate the key and send it to the other. How can the key be delivered so that only the two parties learn it and everyone else is kept in the dark?

Symmetric encryption alone does not seem to meet our needs, so let’s take a look at asymmetric encryption:

  • Asymmetric encryption

Walking through this process is a bit lengthy, but it gives a clearer picture of how asymmetric encryption is used.

There are two keys: one is usually called the public key and the other the private key. Content encrypted with the public key can be decrypted only with the private key, and content encrypted with the private key can be decrypted only with the public key.

Version 1: The server generates a public and private key pair

  1. The server first sends its public key S to the browser in plaintext. (Note the plaintext here.)
  2. The browser encrypts data with public key S before sending it to the server.
  3. The server decrypts the data with its own private key S.

(This keeps data sent from the browser to the server confidential.)

What if the server needs to send data to the browser?

  4. The server encrypts the data with its private key S and sends it to the browser.
  5. The browser decrypts the data with public key S.

Problem: public key S was sent to the browser in plaintext at the very start, so anyone who intercepted it can also decrypt what the server sends. The server-to-browser direction is not secure.

Version 2: Two pairs of keys

Version 1 uses one key pair generated on the server, which only secures one direction. So can we use two pairs of keys? (That is, both the server and the browser have their own public and private keys.)

  1. The browser generates public key L and private key L, and the server generates public key S and private key S
  2. The browser sends public key L to the server in plaintext
  3. The server sends public key S to the browser in plaintext
  4. The browser encrypts content with public key S before sending it to the server, and the server decrypts it with private key S
  5. The server encrypts content with public key L before sending it to the browser, and the browser decrypts it with private key L

The problem: asymmetric encryption is very time consuming, while symmetric encryption is much faster

Version 3: Asymmetric encryption + symmetric encryption

  1. The server generates public key S and private key S for asymmetric encryption
  2. The browser makes a request to the server, and the server sends public key S to the browser in plaintext
  3. The browser randomly generates a key M for symmetric encryption, encrypts it with public key S, and sends it to the server
  4. The server decrypts key M with private key S
  5. Now both sides hold key M.

From here on, both parties encrypt and decrypt their data with key M
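The flow of Version 3 can be sketched with toy numbers. This is for intuition only and is not secure: the RSA key pair uses tiny textbook primes, and the symmetric cipher is a simple XOR with a SHA-256-derived keystream that I made up for the example. Real TLS uses vetted algorithms and much larger keys.

```python
import hashlib
import secrets

# Server: toy RSA key pair S (tiny primes, illustration only).
P, Q = 61, 53
N = P * Q                          # modulus shared by both keys
E = 17                             # public key S  = (N, E)
D = pow(E, -1, (P - 1) * (Q - 1))  # private key S = (N, D), Python 3.8+


def xor_cipher(key_m: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a keystream derived from M.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key_m}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Browser: generate symmetric key M and encrypt it with public key S.
browser_m = secrets.randbelow(N - 2) + 2
encrypted_m = pow(browser_m, E, N)

# Server: recover M with private key S.
server_m = pow(encrypted_m, D, N)

# Both sides now hold M and switch to the faster symmetric cipher.
message = b"GET /account HTTP/1.1"
ciphertext = xor_cipher(browser_m, message)
plaintext = xor_cipher(server_m, ciphertext)
```

Only the short key M goes through the slow asymmetric step; everything after that uses the symmetric cipher.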

The problem: hybrid encryption does encrypt the communication, but the client has no way of knowing that the public key really came from the server

7.2.2 Authentication

Authentication uses public key certificates issued by a digital certificate authority (CA) and its affiliated organizations

The certificate authority is a third party that both the client and the server trust.

  1. The server operator submits its public key to the certificate authority
  2. The certificate authority digitally signs the submitted public key and places the signed public key in a public key certificate
  3. The server sends this certificate to the client so that they can communicate using public key encryption
  4. The client verifies the digital signature on the certificate. If verification succeeds, the server’s public key can be trusted.
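In practice the client side of step 4 is handled by the TLS library. As one example, the default client context in Python's standard ssl module is already set up to require a valid certificate and to check that it matches the host name; the variable names below are my own.

```python
import ssl

# Default client-side settings: trust the system's CA certificates.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA ...
verifies_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ... and the certificate must be issued for the host being contacted.
checks_hostname = context.check_hostname
```

Wrapping a socket with this context makes the handshake fail if the certificate's signature cannot be verified.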

7.3. HTTPS communication process

HTTPS is HTTP wearing an SSL shell

(Ha ha, I borrowed the wording straight from the book.)

Let’s see how HTTP compares to HTTPS:

With HTTPS, a layer is added between the transport layer and the application layer: the communication interface part uses SSL/TLS, which implements encryption, certificates, and integrity protection.

Now let’s look at the final HTTPS communication process, built on the solutions described above.

(Looking at the figure and its 12 steps, you may want to give up on the spot… Ha ha, kidding. There are many steps, but it is worth breaking them down to understand the process. Let’s continue.)

First, I want to split the steps in the figure into two chunks. As we said, the SSL layer handles encryption, certificates, and integrity protection, and it does so before the actual HTTP request begins. So steps 1 to 9 are SSL processing, and steps 10 to 12 are the HTTP communication we already know.

  1. The client sends a message to the server to start SSL communication

(The message contains the SSL version the client supports and a list of cipher suites: the encryption algorithms and key lengths it can use)

  2. If the server can communicate over SSL, it sends a response message to the client

(The message contains the SSL version and the cipher suite, chosen from the list the client sent)

  3. The server sends a message containing its public key certificate to the client.

  4. The server sends a message to notify the client that the initial SSL handshake negotiation is complete

  5. The client generates a random password string, encrypts it with the public key from the certificate, and sends it to the server [The random password string is equivalent to key M in the earlier demonstration (asymmetric encryption + symmetric encryption)]

  6. The client sends a Change Cipher Spec message to tell the server that everything the client sends from now on is encrypted using the random password string

  7. The client sends a Finished message containing a checksum of all the messages exchanged so far. Whether the handshake negotiation succeeds depends on whether the server can decrypt this message correctly

  8. The server also sends a Change Cipher Spec message to tell the client that everything the server sends from now on is encrypted using the random password string

  9. The server sends a Finished message containing a checksum of all the messages exchanged so far. Whether the handshake negotiation succeeds depends on whether the client can decrypt this message correctly

  10. Once the Finished messages have been exchanged, the SSL connection is established. From here the two sides communicate using the application layer protocol, that is, they send HTTP requests and responses

To summarize some key points of the process:

  1. HTTPS uses hybrid encryption, combining asymmetric and symmetric encryption
  2. The public key that the server sends to the client is vouched for by a certificate from a trusted third party.
  3. Once SSL negotiation is complete, the familiar HTTP request-response phase begins

The process above takes care of:

  • Data transmission encryption
  • Authentication

Data integrity protection has not been mentioned yet. When the application layer sends data, a message digest called a MAC (Message Authentication Code) is attached to it. The MAC makes it possible to detect whether a message has been tampered with, which protects its integrity.
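Here is a minimal sketch of how a MAC detects tampering, using HMAC-SHA256 from Python's standard library. The key and messages are invented, and the MAC algorithm that TLS actually uses depends on the negotiated cipher suite.

```python
import hashlib
import hmac

shared_key = b"key-M-agreed-during-handshake"  # hypothetical shared secret


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute a message authentication code over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, mac: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), mac)


# Sender attaches the MAC to the data.
message = b"transfer 100 to alice"
mac = make_mac(shared_key, message)

# Receiver checks it; a modified message no longer matches the MAC.
untouched_ok = verify(shared_key, message, mac)
tampered_ok = verify(shared_key, b"transfer 900 to mallory", mac)
```

An attacker who changes the message cannot produce a matching MAC without knowing the shared key.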

8. Browser caching: overview and practice

8.1. Strong cache: no HTTP request is required

The header fields involved are Expires and Cache-Control. One big difference to know:

  1. Expires sets a specific point in time, in server time, at which the resource expires. (If the local clock is changed, the cache can be judged invalid.)

  2. Cache-Control sets a time period and takes precedence over Expires. Here’s an example:
Cache-Control: max-age=300

Cache-Control supports a number of other directives:

  • no-store: nothing is cached
  • no-cache: the cached copy must be revalidated with the origin server before it is used
  • public: the response may also be cached for other users
  • private: the response is cached only for one specific user
  • max-age: how long the response stays fresh, in seconds
  • s-maxage: like max-age, but it applies to shared cache servers that serve multiple users.
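As a rough sketch of how a cache might read these directives, the helpers below parse a Cache-Control value and decide whether a stored response may still be used without contacting the server. Both function names are hypothetical, and a real cache considers more inputs (Expires, the Age header, s-maxage for shared caches, and so on).

```python
def parse_cache_control(value: str) -> dict:
    """Parse 'public, max-age=300' into {'public': True, 'max-age': '300'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without a value (no-store, public, ...) are stored as True.
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """Decide whether a stored response can be used without a request."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False  # never stored, or must revalidate first
    max_age = directives.get("max-age")
    if not isinstance(max_age, str) or not max_age.isdigit():
        return False  # no usable lifetime given
    return age_seconds < int(max_age)
```

For example, with Cache-Control: max-age=300 a response stored 120 seconds ago is still fresh, while one stored 301 seconds ago is not.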

Problem: the strong cache only checks whether a point in time or a period has passed. It does not care whether the file on the server has been updated, so the browser may not get the latest version. That is where the negotiation cache below comes in:

8.2. Negotiation cache: an HTTP request is required

The header fields involved: Last-Modified and If-Modified-Since / ETag and If-None-Match

  • Last-Modified and If-Modified-Since
  1. The browser sends a request to the server, and the server adds a Last-Modified header to the response, whose value is the last time the resource was modified on the server
  2. The next time the browser requests the resource, it sees the Last-Modified header it received earlier and adds an If-Modified-Since header carrying that value
  3. The server compares the If-Modified-Since value it receives with the resource’s current last-modified time
  4. If there is no change, it returns 304 and the browser reads directly from the cache
  5. If the If-Modified-Since time is earlier than the resource’s last modification time on the server, the file has been updated, so the new resource is returned with 200.
  • ETag and If-None-Match
  1. The browser sends a request to the server, and the server returns an ETag, a unique identifier it generates for the resource
  2. The next time the browser requests the resource, it puts the ETag value it last received into the If-None-Match header
  3. The server compares If-None-Match with the current ETag of the resource on the server
  4. If they match, 304 is returned: the resource has not been modified
  5. If they differ, 200 is returned (along with the new ETag)
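The server-side comparison in both flows can be sketched as one small function. The name conditional_status and the sample values are hypothetical; the sketch ignores weak ETags, and it lets If-None-Match take priority when both headers are present, which is how HTTP's conditional request rules order them.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Compare the client's ETag(s) with the resource's current ETag.
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in client_tags or "*" in client_tags else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        cached_time = parsedate_to_datetime(if_modified_since)
        current_time = parsedate_to_datetime(last_modified)
        # Unchanged if the resource was not modified after the cached time.
        return 304 if current_time <= cached_time else 200

    return 200  # unconditional request: send the full resource


current_etag = '"v2"'
current_last_modified = "Wed, 21 Oct 2015 07:28:00 GMT"

hit = conditional_status({"If-None-Match": '"v2"'}, current_etag, current_last_modified)
miss = conditional_status({"If-None-Match": '"v1"'}, current_etag, current_last_modified)
```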

8.3. The whole caching process

1. The browser checks whether the resource is in the cache.

If not, it sends a request to the server directly, receives the result and the cache identifiers, stores them in the cache, and loads the page

If yes, it determines whether the cached copy has expired

2. Check expiry using Expires and Cache-Control (strong cache)

If not expired, it reads the cache directly, returns the cached copy, and the page loads

If expired, it sends a request to the server carrying If-Modified-Since and If-None-Match to see whether the resource has been updated

3. Check whether the resource has been updated (negotiation cache)

If updated, the server returns the resource and its cache identifiers with 200, and they are stored in the cache

If not updated, the server returns 304, and the browser keeps using the cache: it reads the cache, returns the cached copy, and loads the page
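The three steps can be condensed into one function. This is a simplified model, not how any particular browser is implemented: CacheEntry, load, and the fetch callable are all invented here, only ETag and max-age are tracked, and fetch(url, if_none_match) is assumed to return (status, body, etag, max_age).

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: bytes
    etag: str
    stored_at: float  # time the entry was stored or last revalidated, in seconds
    max_age: int      # freshness lifetime from Cache-Control: max-age


def load(url: str, cache: dict, now: float, fetch):
    """Return (body, how) following the cache flow described above."""
    entry = cache.get(url)

    # 1. Nothing cached: request the resource and store it.
    if entry is None:
        _status, body, etag, max_age = fetch(url, None)
        cache[url] = CacheEntry(body, etag, now, max_age)
        return body, "fetched"

    # 2. Strong cache: still fresh, so no request is sent at all.
    if now - entry.stored_at < entry.max_age:
        return entry.body, "strong cache"

    # 3. Negotiation cache: ask the server whether the copy is still valid.
    status, body, etag, max_age = fetch(url, entry.etag)
    if status == 304:
        entry.stored_at = now  # the cached copy is fresh again
        entry.max_age = max_age
        return entry.body, "revalidated (304)"
    cache[url] = CacheEntry(body, etag, now, max_age)
    return body, "updated (200)"
```

Passing now explicitly instead of reading the clock keeps the sketch easy to step through by hand.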

Conclusion

There really is a lot to know about HTTP. This article is a set of notes and a summary from a few days of reading, so there must be plenty that it does not cover or does not explain clearly; it is best to read the book as well. If there are any mistakes, please point them out in the comments section. Thanks a million 💗

My understanding and practice may not be very deep yet. I’ll keep learning. 😹 😹 😹

I am Jingda, a junior. I hope to learn and make progress together with you. 🙆 🙆 🙆

Come on!

There are bound to be badly written parts in this article; corrections in the comments section are welcome.

For the parts that are incomplete, I also suggest reading the references below 👇👇👇

【 References 】

The illustration of HTTP

HTTP

Have an in-depth understanding of the browser caching mechanism

In-depth understanding of HTTPS principles, processes, and practices