Original writing takes effort. If you wish to repost this article, please [contact the author] or [credit the author and cite the source of the article].

When HTTP comes up, the first things that come to mind are usually TCP/IP and the SSL/TLS handshake. Today I want to organize some HTTP knowledge from a different angle.

This article covers seven topics:

1. The evolution of HTTP: a brief look at what each version changed, the problems it ran into, and the improvements adopted at the time
2. HTTP caching strategy
3. Cross-domain policy
4. HTTP concurrency issues
5. GET/POST differences and common status codes
6. Some open questions
7. The references consulted while writing this article

I. The evolution of HTTP

1. HTTP/0.9

  • The client can send only GET requests; headers are not supported
  • Only one content type is supported: plain text. HTML is supported, but images cannot be embedded

2. HTTP/1.0

  • Requests and responses support header fields
  • The response begins with a status line
  • Responses are no longer limited to hypertext
  • The GET, HEAD, and POST methods are supported, so the client can submit data to the web server with POST
  • Supports persistent connections (though connections are still short-lived by default), caching, and authentication

3. HTTP/1.1

New features:
  • Persistent connections by default

HTTP/1.1 supports persistent connections and pipelining, so multiple HTTP requests and responses can travel over a single TCP connection, reducing the cost and latency of opening and closing connections. Connection: keep-alive is on by default in HTTP/1.1, which makes up for HTTP/1.0 opening a new connection for every request.

  • Range requests (bandwidth optimization)

HTTP/1.0 wastes bandwidth in some cases: when the client needs only part of an object, the server still sends the whole object, and resuming an interrupted download is not supported. HTTP/1.1 introduces the Range request header, which lets the client request only part of a resource; the server answers with status 206 (Partial Content). This lets developers make fuller use of bandwidth and connections, and it is the basis for resumable downloads.
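
As a rough illustration of how a range request can be resolved, here is a minimal sketch. The function names `parse_range` and `serve` are made up for this example, only the single-range `bytes=start-end` form is handled, and a real server would also have to deal with multiple ranges and conditional headers.

```python
import re
from typing import Optional, Tuple


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single 'bytes=start-end' Range header against `size` bytes.

    Returns an inclusive (start, end) pair, or None if it cannot be satisfied.
    """
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if size <= 0 or not m or (m.group(1) == "" and m.group(2) == ""):
        return None
    first, last = m.group(1), m.group(2)
    if first == "":                         # suffix form: "bytes=-3" means the last 3 bytes
        length = int(last)
        if length == 0:
            return None
        return max(size - length, 0), size - 1
    start = int(first)
    end = int(last) if last else size - 1   # open-ended form: "bytes=5-"
    if start >= size or start > end:
        return None
    return start, min(end, size - 1)


def serve(body: bytes, range_header: Optional[str]):
    """Return (status, headers, payload) the way a range-aware server might."""
    if range_header is None:
        return 200, {"Content-Length": str(len(body))}, body
    span = parse_range(range_header, len(body))
    if span is None:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = span
    chunk = body[start:end + 1]
    headers = {"Content-Range": f"bytes {start}-{end}/{len(body)}",
               "Content-Length": str(len(chunk))}
    return 206, headers, chunk


body = b"0123456789"
status, headers, first_half = serve(body, "bytes=0-4")
# Resume the download from the byte after the last one already received.
_, _, rest = serve(body, f"bytes={len(first_half)}-")
print(status, headers["Content-Range"], first_half + rest == body)
```

The second call is the resumable-download case: the client asks only for what it is still missing and appends it to what it already has.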

  • Virtual hosting (the Host header)

HTTP/1.0 assumes that each server is bound to a unique IP address, so the request message does not carry a hostname. With the rise of virtual hosting, however, one physical server can host multiple virtual hosts (multi-homed web servers) that share the same IP address. HTTP/1.1 requires request messages to include the Host header field; a request without it is rejected with an error (400 Bad Request).
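
To make the role of the Host header concrete, the following sketch routes raw requests to different sites that share one address. The host names in `SITES` and the helpers `parse_request` and `route` are invented for illustration; this is not how any particular server implements it.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into method, target, version and a header dict."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Hypothetical virtual hosts served from the same IP address.
SITES = {"a.example": "site A", "b.example": "site B"}


def route(raw: bytes):
    """Pick the virtual host from the Host header; reject HTTP/1.1 requests without one."""
    _method, target, version, headers = parse_request(raw)
    host = headers.get("host")
    if version == "HTTP/1.1" and host is None:
        return 400, "Bad Request"
    site = SITES.get((host or "").split(":")[0].lower())
    if site is None:
        return 404, "Not Found"
    return 200, f"{site}: {target}"


print(route(b"GET /index.html HTTP/1.1\r\nHost: a.example\r\n\r\n"))
print(route(b"GET /index.html HTTP/1.1\r\nHost: b.example\r\n\r\n"))
print(route(b"GET /index.html HTTP/1.1\r\n\r\n"))
```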

  • More cache-control fields

HTTP/1.1 adds new caching features over 1.0, including entity tags (commonly known as ETags) and a more powerful Cache-Control header.

  • Error notification management

HTTP/1.1 adds 24 error status codes. For example, 409 (Conflict) indicates that the request conflicts with the current state of the resource, and 410 (Gone) indicates that a resource on the server has been permanently deleted.

Problems:
  • High latency: head-of-line blocking
  • Statelessness: the protocol keeps no memory of connection state, so the server cannot tell how a request relates to the previous one; in other words, login state is not retained
  • Plaintext transmission: content is sent unencrypted and may be tampered with or hijacked
Improvements:
  • For head-of-line blocking:

1. Spread the resources of one page across different domain names to raise the connection limit. Requests can share a TCP connection, but a connection can process only one request at a time, and the others are blocked until the current one completes.
2. Reduce the number of requests
3. Inline some resources, such as CSS and Base64 images
4. Merge small files to reduce the number of resources

  • For insecurity:

1. HTTPS
2. Token verification
3. A custom or agreed-upon data encryption scheme

4. HTTP/2

New features:
  • Binary framing:
  1. All frames in HTTP/2 are binary encoded
  2. A frame is the smallest unit of data, and each frame identifies the stream it belongs to. A stream is a sequence of frames, and multiplexing means that multiple streams can coexist on one TCP connection
  3. Frame: the smallest unit of communication in the new protocol; clients and servers communicate by exchanging frames
  4. Message: a logical HTTP message, such as a request or a response, made up of one or more frames
  5. Stream: a virtual channel within a connection that carries messages in both directions; each stream has a unique integer identifier (1, 2, … N)
  • Multiplexing: eliminates head-of-line blocking at the HTTP layer

Multiplexing allows multiple request and response messages to be in flight at the same time over a single HTTP/2 connection. With the new framing mechanism, HTTP/2 no longer relies on multiple TCP connections to handle concurrent requests. Each stream is broken into discrete frames that can be interleaved with frames from other streams and prioritized, and the receiver reassembles them according to the stream identifier in each frame's header. HTTP/2 connections are persistent, and only one connection per domain name is needed between client and server.
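
The interleave-and-reassemble idea can be sketched in a few lines. This is a toy model only: frames here are plain `(stream_id, chunk)` tuples and the scheduling is simple round-robin, whereas real HTTP/2 frames carry a binary header with length, type, and flags as well. The helper names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id: int, payload: bytes, frame_size: int):
    """Cut one message into (stream_id, chunk) frames of at most frame_size bytes."""
    return [(stream_id, payload[i:i + frame_size])
            for i in range(0, len(payload), frame_size)]


def interleave(*frame_lists):
    """Put frames from several streams on one connection, round-robin."""
    return [frame
            for group in zip_longest(*frame_lists)
            for frame in group
            if frame is not None]


def reassemble(frames):
    """Rebuild each stream's payload by grouping frames on their stream id."""
    streams = defaultdict(bytes)
    for stream_id, chunk in frames:
        streams[stream_id] += chunk
    return dict(streams)


html = b"<html>index</html>"
css = b"body{margin:0}"
wire = interleave(to_frames(1, html, 8), to_frames(3, css, 8))
print([stream_id for stream_id, _ in wire])   # order of frames on the connection
print(reassemble(wire) == {1: html, 3: css})
```

Frames of different streams are mixed on the wire, but within one stream they keep their order, which is why grouping by stream id is enough to recover each message.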

  • Header compression: addresses large HTTP headers

HTTP/1.1 requests carry a lot of header information, much of it sent again on every request. HTTP/2 has both sides of the connection maintain a table of header fields, so fields that were already sent do not need to be transmitted again in full.
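
The effect of a shared header table can be shown with a deliberately simplified model. This is not the actual HTTP/2 header compression format: there is no static table, no size limit, and no Huffman coding. The `HeaderTable` class is hypothetical and only shows why a repeated header can shrink to a small index.

```python
class HeaderTable:
    """Toy header table: a field seen before is sent as its index, not in full."""

    def __init__(self):
        self.entries = []   # index -> (name, value)
        self.index = {}     # (name, value) -> index

    def _remember(self, pair):
        self.index[pair] = len(self.entries)
        self.entries.append(pair)

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.index:
                out.append(self.index[pair])   # already known: send the index only
            else:
                self._remember(pair)
                out.append(pair)               # first time: send the literal pair
        return out

    def decode(self, encoded):
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])
            else:
                self._remember(item)           # keep the receiver's table in step
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()
first = [(":method", "GET"), ("user-agent", "demo"), (":path", "/a")]
second = [(":method", "GET"), ("user-agent", "demo"), (":path", "/b")]

print(sender.encode(first))      # everything is sent in full the first time
wire = sender.encode(second)
print(wire)                      # repeated fields are now just indexes
```

Both ends must apply the same updates in the same order, which is why the sender and receiver each keep their own table.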

  • Request priority – Get important data first

Browsers can dispatch requests as soon as they discover resources, assign priority to each stream, and let the server determine the optimal order of response. This way requests don’t have to queue, saving time and maximizing the use of each connection.

  • Server push: fills the gaps and improves request efficiency

When index.html is requested, the server can return the JS and CSS it depends on right away, without waiting for the client to ask for them. This can be enabled in the nginx configuration.

  • Improved security

HTTP/2 is built on HTTPS.

Problems:
  • TCP and TCP+TLS connection setup latency: a TCP connection requires a three-way handshake with the server, that is, 1.5 RTT (round-trip time), and TLS adds further round trips on top of that.
  • TCP head-of-line blocking is not fully resolved:
    • TCP uses timeout retransmission to guarantee reliable delivery, so a lost packet must wait to be retransmitted and acknowledged.
    • When a packet is lost on an HTTP/2 connection, the whole TCP connection waits for the retransmission, which blocks every request on that connection.
  • Multiplexing increases the load on the server
  • Multiplexing makes timeouts more likely

5. HTTP/3 (Google's QUIC protocol, based on UDP)

New features:
  • Improved congestion control and reliable transmission
  • Fast handshake
  • Integrated TLS 1.3 encryption
  • Multiplexing
  • Connection migration
Problems:

On some NAT networks (such as some campus networks), UDP is blocked by intermediate network devices such as routers. In that case the client falls back to an alternative channel, such as HTTPS, so that requests are still served.

II. Caching

1. Cache types

  • 200 (from memory cache): the server is not contacted. Usually the resource has already been loaded and cached in the browser's memory. When the browser is closed the data is gone (the memory has been freed), so reopening the same page will not show "from memory cache".
  • 200 (from disk cache): the server is not contacted. The resource was loaded at some earlier time and is read directly from the cache on disk.
  • The memory cache is checked first, then the disk cache, and only then is the resource requested over the network

2. Strong/negotiated cache

1) Strong cache: Expires / Cache-Control: max-age=600

  • Expires: an expiration time. Until that time, the browser reads the resource straight from the cache and does not send a request
  • Cache-Control: with max-age=300, for example, the strong cache is hit if the resource is loaded again within 5 minutes of the time the response was received, a time the browser records. The main directives are:
    • (1) max-age=[seconds]: required parameter; how long the resource may be cached, in seconds.
    • (2) s-maxage: the same as max-age, but applies only to proxy server (shared) caches.
    • (3) public: parameter may be omitted; the response can be cached by any cache, including the client that sent the request, proxy servers, and so on.
    • (4) private: no parameter; the response can be cached only for a single user and must not be stored by a shared cache such as a proxy server.
    • (5) no-cache: parameter may be omitted. The response is still stored, but the cache must validate it with the server every time before serving it to the client (browser). The server checks whether the resource has changed and returns the new content if it has, or 304 (Not Modified) if it has not. The name is misleading and is often taken to mean that the response is not cached.
    • (6) no-store: no parameter; nothing from the request or the response is cached.
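
The directives above can be turned into a small decision function. The sketch below is a simplification under stated assumptions: the names `parse_cache_control` and `cache_decision` and the three result strings are invented for the example, and real caches consider more inputs (Expires, heuristics, Vary, and so on).

```python
def parse_cache_control(value: str) -> dict:
    """Parse 'public, max-age=600' into {'public': True, 'max-age': '600'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.lower()] = arg.strip('"') if arg else True
    return directives


def cache_decision(cache_control: str, age_seconds: int, shared_cache: bool = False) -> str:
    """Decide what a cache may do with a stored response of the given age."""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return "do-not-store"
    if shared_cache and "private" in d:
        return "do-not-store"            # proxies must not keep private responses
    if "no-cache" in d:
        return "revalidate"              # stored, but checked with the server each time
    # s-maxage overrides max-age, but only for shared (proxy) caches.
    key = "s-maxage" if shared_cache and "s-maxage" in d else "max-age"
    raw = d.get(key)
    max_age = int(raw) if isinstance(raw, str) and raw.isdigit() else 0
    return "use-cache" if age_seconds < max_age else "revalidate"


print(cache_decision("max-age=600", age_seconds=300))
print(cache_decision("max-age=600", age_seconds=601))
print(cache_decision("no-cache", age_seconds=0))
print(cache_decision("private, max-age=600", age_seconds=10, shared_cache=True))
```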

2) Negotiated cache

  • Last-Modified + If-Modified-Since (HTTP/1.0)
    • Last-Modified: the time the resource was last modified, sent by the server to the browser in the response
    • If-Modified-Since: the Last-Modified value the browser received earlier, sent back to the server in the request
    • When a cached resource expires (the browser decides this from the Cache-Control max-age) and the cached response carried a Last-Modified header, the browser sends a new request with If-Modified-Since set to that value. The server compares it with the resource's current last-modified time. If the resource's time is later, the resource has been modified, and the server returns the latest version with HTTP 200 OK; otherwise the resource has not changed, and the server responds with HTTP 304.
  • ETag + If-None-Match (HTTP/1.1)
    • ETag is an HTTP/1.1 header generated by the server (Apache or other software) and returned to the client so that the server can validate the client's cache. In Apache, the ETag value is derived from the file's inode, size, and last-modified time.
    • If-None-Match: when a cached resource expires and the cached response carried an ETag, the browser sends a new request with the If-None-Match header set to the ETag value. The server compares it with the current ETag and returns either 200 or 304
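
The server side of this negotiation might look roughly like the sketch below. Two things are assumptions of the example rather than anything the article prescribes: the ETag is a truncated SHA-256 of the body (Apache, as noted above, derives it from file metadata), and the function names `make_etag` and `respond` are made up.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Assumed scheme: a quoted, truncated content hash."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, mtime: float, request_headers: dict):
    """Return (status, validators, payload) for a possibly conditional GET."""
    etag = make_etag(body)
    validators = {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # When both headers are present, If-None-Match decides.
        unchanged = etag in [tag.strip() for tag in if_none_match.split(",")]
    elif if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since).timestamp()
        unchanged = int(mtime) <= since   # HTTP dates have one-second resolution
    else:
        unchanged = False
    if unchanged:
        return 304, validators, b""       # the client reuses its cached copy
    return 200, validators, body


mtime = 1_700_000_000.0
status, validators, _ = respond(b"hello", mtime, {})
print(status, validators)
print(respond(b"hello", mtime, {"If-None-Match": validators["ETag"]})[0])
print(respond(b"hello, world", mtime + 60, {"If-None-Match": validators["ETag"]})[0])
```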

3) No caching: Pragma: no-cache / Cache-Control: no-store

4) Cache priority

  • In priority order: Pragma: no-cache (HTTP/1.0), Cache-Control, ETag, Last-Modified
  • Strong cache
    • Expires is a response header from the HTTP/1.0 era, kept mainly for old browsers. It is an absolute point in time, so it can be thrown off by clock differences between client and server or by network latency
    • Cache-Control: max-age specifies a number of seconds. If both are present, max-age takes precedence
  • Negotiated cache
    • ETag is generally used sparingly in distributed environments (such as a CDN), because it depends on the web server's hashing algorithm: different web servers, versions, and configurations may produce different ETags for the same file. If all of those can be kept uniform, ETag can still be used, but that is not always possible.
    • Last-Modified has a resolution of one second: if a resource is modified again within the same second, Last-Modified does not change. ETag is based on a digest of the content, so it picks up the change right away
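
The one-second resolution issue is easy to reproduce. In this small sketch the timestamps and contents are made-up sample data, and the content-hash ETag is again an assumed scheme used only for illustration.

```python
import hashlib
from email.utils import formatdate


def last_modified(mtime: float) -> str:
    """Format a modification time the way the Last-Modified header does."""
    return formatdate(mtime, usegmt=True)


def etag(body: bytes) -> str:
    """Assumed scheme: a digest of the content."""
    return hashlib.sha256(body).hexdigest()[:16]


# Two versions of a file, saved 0.7 seconds apart within the same second.
version_1 = (b"color: red", 1_700_000_000.2)
version_2 = (b"color: blue", 1_700_000_000.9)

same_timestamp = last_modified(version_1[1]) == last_modified(version_2[1])
same_etag = etag(version_1[0]) == etag(version_2[0])
print("Last-Modified identical:", same_timestamp)
print("ETag identical:", same_etag)
```

A cache that validates with Last-Modified alone would keep serving the first version here, while an ETag comparison notices the change.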

3. Cache behavior for different user actions

  • Entering a URL in the address bar and pressing Enter / opening a bookmark
    • The browser finds that the resource is cached and has not expired (according to the Expires or Cache-Control header) and uses the cached content without checking with the server. The response is identical to the previous one; for example, the Date header is the time of the earlier response. The Size of the resource is accordingly shown as "from cache"
  • F5 / clicking the refresh button in the toolbar / choosing Reload from the right-click menu
    • F5 makes the browser send an HTTP request to the server regardless, even if the previous response had an Expires header
  • Ctrl+F5
    • Ctrl+F5 asks for a completely fresh copy from the server. The browser not only sends the HTTP request but also leaves out If-Modified-Since/If-None-Match, so the server cannot answer 304 and has to return the whole resource. That is why Ctrl+F5 transfers more data and the refresh is slower. The request returns 200 and resets the associated cache timers.
    • Besides dropping If-Modified-Since/If-None-Match, Ctrl+F5 adds some request headers to make sure the latest copy comes from the server. Under HTTP/1.1, caches exist not only in the browser: intermediate nodes between browser and server (such as proxies) may also cache. To keep those nodes from answering out of their own cache, the request has to tell them: do not serve me your cached copy, fetch a fresh one from upstream.
    • In Chrome 51, two request headers make intermediate caches bypass their copy and return a fresh resource: Cache-Control: no-cache and Pragma: no-cache

III. Cross-domain issues

1. Principle: a request is cross-domain when the protocol, domain name, or port number differs
2. Symptoms: resources from cross-domain requests cannot be shared; pages from different origins cannot share localStorage
3. Solutions

  1. JSONP, which supports only GET requests; its advantages are support for older browsers and the ability to request data from sites that do not support CORS.
  2. CORS, which supports all types of HTTP requests
  3. Nested iframes
  4. postMessage: the postMessage(data, origin) method lets scripts from different origins communicate asynchronously, enabling cross-document, multi-window, and cross-domain messaging
    • In Safari, the parent page cannot pass information to an iframe on a cross-domain page. Cross-domain storage can instead be implemented by passing values in the URL, using the page's URL parameters (Safari supports URLs longer than 64K characters)
  5. New cross-origin policies: cross-origin isolation with COOP and COEP
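
The same-origin rule (protocol, domain, port) and the simplest server-side CORS decision can be sketched as follows. The URLs, the allow-list, and the helper names are all illustrative assumptions; a real CORS setup also involves preflight requests, allowed methods and headers, and credentials.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Reduce a URL to its (scheme, host, port) origin triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def is_same_origin(a: str, b: str) -> bool:
    """Two URLs are same-origin only if scheme, host and port all match."""
    return origin(a) == origin(b)


def cors_headers(request_origin: str, allowed_origins: set) -> dict:
    """Headers a server might add so the browser lets the page read the response."""
    if request_origin in allowed_origins:
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    return {}   # no CORS headers: the browser blocks the cross-origin read


page = "https://app.example/dashboard"
print(is_same_origin(page, "https://app.example:443/api"))   # default port made explicit
print(is_same_origin(page, "http://app.example/api"))        # different protocol
print(is_same_origin(page, "https://api.example/users"))     # different domain
print(cors_headers("https://app.example", {"https://app.example"}))
```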

IV. Concurrency issues

1. After a modern browser establishes a TCP connection with the server, does it close the connection once an HTTP request completes? When is the connection closed?

  • In HTTP/1.0, the server closes the TCP connection after sending an HTTP response
    • Although it is not part of the standard, some servers support the Connection: keep-alive header.
  • HTTP/1.1 writes the Connection header into the standard and enables persistent connections by default. Unless Connection: close is specified in the request, the TCP connection between browser and server is kept open for some time

2. One TCP connection can carry multiple HTTP requests.

3. Can the HTTP requests on one TCP connection be sent together (for example, three requests sent together and three responses received together)?

  • One problem with HTTP/1.1 is that a single TCP connection can handle only one request at a time: the lifetimes of any two HTTP requests on the same TCP connection cannot overlap.
  • Pipelining is defined in the HTTP/1.1 specification
    • A client that supports persistent connections can send multiple requests on a connection without waiting for the response to each one. The server must send the responses in the order in which it received the requests.
    • Modern browsers leave this feature off by default. Because HTTP/1.1 is a text protocol and a response does not say which request it answers, the order has to stay consistent
    • Problems
      • Some proxy servers cannot handle HTTP pipelining correctly.
      • A correct pipelining implementation is complex.
      • Head-of-line blocking: a slow response holds up every response queued behind it on the connection.
    • How HTTP/1.1 is optimized instead
      • Keep the established TCP connection open and process multiple requests on it one after another.
      • Open multiple TCP connections to the server.
  • HTTP/2 provides multiplexing
    • Multiple HTTP requests can be in progress at the same time on a single TCP connection
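
A back-of-the-envelope model makes the difference between the three approaches visible. It assumes, purely for illustration, that network time is zero, that all requests are sent at time 0, and that a pipelining or multiplexing server works on every request concurrently; the numbers are invented service times in seconds.

```python
from itertools import accumulate


def sequential(service_times):
    """Persistent connection, no pipelining: each request starts after the previous response."""
    return list(accumulate(service_times))


def pipelined(service_times):
    """Responses must leave in request order, so each waits for the slowest one before it."""
    return list(accumulate(service_times, max))


def multiplexed(service_times):
    """Each response is delivered as soon as it is ready."""
    return list(service_times)


times = [5, 1, 1]   # one slow request followed by two fast ones
print("sequential: ", sequential(times))    # [5, 6, 7]
print("pipelined:  ", pipelined(times))     # [5, 5, 5]
print("multiplexed:", multiplexed(times))   # [5, 1, 1]
```

In the pipelined case the two fast responses are ready after one second but cannot be delivered until the slow first response has gone out, which is head-of-line blocking in miniature.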

4. Why does refreshing a page sometimes not require a new SSL connection?

  • The TCP connection between browser and server is sometimes kept open for a while. Since TCP does not need to be re-established, the existing SSL session on it is simply reused.

5. Does the browser limit the number of TCP connections to the same Host?

  • Chrome allows up to six TCP connections to the same Host
  • If the resources (images, for example) are all served over HTTPS from the same domain name, the browser negotiates HTTP/2 with the server during the SSL/TLS handshake
    • If HTTP/2 is available, the requests are multiplexed over this one connection
    • If HTTP/2 or HTTPS cannot be used:
      • The browser opens multiple TCP connections to the same host; the maximum number depends on the browser's settings
      • The browser sends new requests on whichever of these connections is idle
      • What if every connection is busy with a request? Then the remaining requests have to wait.
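
The effect of the per-host connection limit can be estimated with a small scheduling sketch. It assumes each connection handles one request at a time and that a waiting request takes whichever connection frees up first; `total_load_time` and the request durations are made up for the example.

```python
import heapq


def total_load_time(durations, max_connections=6):
    """Time until the last request finishes with a fixed pool of connections."""
    free_at = [0.0] * max_connections     # when each connection becomes idle
    heapq.heapify(free_at)
    finished = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)    # earliest idle connection
        end = start + duration
        heapq.heappush(free_at, end)
        finished = max(finished, end)
    return finished


images = [1.0] * 13   # thirteen requests of one second each
print(total_load_time(images, max_connections=6))   # 3.0: two full rounds plus one leftover
print(total_load_time(images, max_connections=1))   # 13.0: strictly one after another
```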

V. Request-related topics

1. Differences between GET and POST

  • GET requests a resource from the server. The resource can be static text, a page, an image, a video, and so on.
  • POST submits data to the resource identified by the URI. The data is carried in the body of the message.
  • GET is safe and idempotent, while POST is neither:
    • In HTTP, "safe" means that the request method does not change ("destroy") resources on the server.
    • "Idempotent" means that performing the same operation many times produces the same result.
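
"Safe" and "idempotent" are easier to see against a tiny in-memory resource. The `NoteStore` class is a hypothetical stand-in for a server; its `get` and `post` methods only mimic the semantics described above.

```python
class NoteStore:
    """Minimal stand-in for a server-side collection of notes."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def get(self, note_id):
        """GET: read only, so it is safe, and repeating it gives the same result."""
        return self.notes.get(note_id)

    def post(self, text):
        """POST: creates a new note on every call, so it is neither safe nor idempotent."""
        note_id = self.next_id
        self.notes[note_id] = text
        self.next_id += 1
        return note_id


store = NoteStore()
first = store.post("buy milk")
second = store.post("buy milk")           # same request again: a second note appears
print(first, second, len(store.notes))    # 1 2 2

before = dict(store.notes)
store.get(first)
store.get(first)                          # repeating GET leaves the server state unchanged
print(store.notes == before)              # True
```

This is why a browser warns before resubmitting a form (POST) on refresh but reloads an ordinary page (GET) without asking.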

2. Common status codes

  • 2xx
    • 204 No Content is another common success status code. It is essentially the same as 200 OK, except that the response has no body.
    • 206 Partial Content is used for chunked or resumable HTTP downloads. It means the body of the response is only part of the resource, not the whole thing; the request was still processed successfully.
  • 3xx
    • 301 Moved Permanently is a permanent redirect: the requested resource no longer exists at this URL and must be accessed at a new one.
    • 302 Found is a temporary redirect: the requested resource still exists but for now must be accessed at a different URL.
    • Both 301 and 302 use the Location field in the response header to give the URL to go to next, and the browser redirects to it automatically.
    • 304 Not Modified does not mean a jump to another URL. It means the resource has not been modified, and it points the client to its cached copy. It is also called a cache redirect and is used for cache control.
  • 4xx
    • 403 Forbidden means the server refuses access to the resource; it does not mean the client's request was malformed.
  • 5xx
    • 500 Internal Server Error, like 400, is a generic error code: it does not tell us what went wrong on the server.
    • 501 Not Implemented means the functionality requested by the client is not supported yet, similar to "opening soon, stay tuned."
    • 502 Bad Gateway is usually returned by a server acting as a gateway or proxy. It means that server itself is working, but an error occurred when it contacted the back-end server.
    • 503 Service Unavailable means the server is busy and temporarily cannot respond, similar to "the network service is busy, please try again later."
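
For quick reference, Python's standard library already knows these codes. The sketch below uses `http.HTTPStatus` to print the reason phrase and class of each code mentioned above; the `describe` helper and the class labels are this example's own wording.

```python
from http import HTTPStatus

CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return e.g. '206 Partial Content (success)' for a known status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (204, 206, 301, 302, 304, 403, 500, 501, 502, 503):
    print(describe(code))
```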

VI. Open questions

  • After a connection between A and B has been established normally, B suddenly restarts. What TCP state is A in?
  • How many HTTP requests can one TCP connection carry?

VII. Reference links

  • [Manual finishing] hardcore! 30 illustrations of common HTTP interview questions
  • Development and evolution of HTTP
  • How many HTTP requests can a TCP connection send
  • Browser HTTP caching mechanism
  • Mandatory cache and negotiated cache
  • Cache priority
  • HTTP differences
  • HTTP 0.9 HTTP 1.0 HTTP 1.1 HTTP 2.0 Differences
  • Summary of HTTP cache control
