
1. Introduction

When it comes to how HTTP can speed up page loading, we have to talk about HTTP/2.

The primary goals of HTTP/2 are to reduce latency by enabling full request and response multiplexing, to minimize protocol overhead by efficiently compressing HTTP header fields, and to add support for request prioritization and server push.

HTTP/2 does not change the application semantics of HTTP: the core concepts of HTTP methods, status codes, URIs, and header fields remain the same. What HTTP/2 changes is how the data is formatted (framed) and transported between client and server. Together, these two changes hide all of this complexity from our applications behind the new framing layer, so all existing applications can run over the new protocol without modification.

2. Binary framing layer

At the heart of all HTTP/2 performance enhancements lies the new binary framing layer, which defines how HTTP messages are encapsulated and transmitted between clients and servers.

By “layer,” I mean a new, optimized encoding mechanism between the socket interface and the higher-level HTTP API exposed to applications: the HTTP semantics (verbs, methods, headers) are unaffected, but the way they are encoded in transit changes. HTTP/1.x is a newline-delimited plaintext protocol, whereas HTTP/2 splits all transmitted information into smaller messages and frames, each encoded in binary format.
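
To make the binary encoding concrete, here is a minimal sketch in Go (illustrative, not part of the original article) that decodes the fixed 9-byte frame header defined by RFC 7540: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a 31-bit stream identifier.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// FrameHeader mirrors the fixed 9-byte header that precedes every
// HTTP/2 frame (RFC 7540, Section 4.1).
type FrameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8  // frame type: DATA=0x0, HEADERS=0x1, ...
	Flags    uint8  // type-specific boolean flags
	StreamID uint32 // 31-bit stream identifier (0 = connection-level)
}

// parseFrameHeader decodes the 9 bytes that start every frame.
func parseFrameHeader(b [9]byte) FrameHeader {
	return FrameHeader{
		Length: uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:   b[3],
		Flags:  b[4],
		// Mask off the reserved high bit to get the 31-bit stream ID.
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}
}

func main() {
	// A HEADERS frame (type 0x1) with the END_HEADERS flag (0x4),
	// carrying a 13-byte payload on stream 5.
	raw := [9]byte{0x00, 0x00, 0x0d, 0x01, 0x04, 0x00, 0x00, 0x00, 0x05}
	fmt.Printf("%+v\n", parseFrameHeader(raw)) // {Length:13 Type:1 Flags:4 StreamID:5}
}
```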

3. Data streams, messages, and frames

The new binary framing mechanism changes the way data is exchanged between clients and servers. To illustrate this process, we need to understand three concepts of HTTP/2:

  • Data stream: a bidirectional flow of bytes within an established connection, which may carry one or more messages.
  • Message: a complete sequence of frames that corresponds to a logical request or response message.
  • Frame: the smallest unit of communication in HTTP/2, each containing a frame header, which at a minimum identifies the stream that the frame belongs to.

The relationship between these concepts is summarized as follows:

  • All communication is performed over a single TCP connection that can carry any number of bidirectional data streams.
  • Each data stream has a unique identifier and optional priority information, and carries bidirectional messages.
  • Each message is a logical HTTP message (such as a request or a response) that consists of one or more frames.
  • A frame is the smallest unit of communication and carries a specific type of data, such as HTTP headers or a message payload. Frames from different streams can be sent interleaved and then reassembled via the stream identifier in each frame's header.

In short, HTTP/2 breaks down HTTP protocol communication into an exchange of binary-encoded frames, which map to messages that belong to particular streams, all of which are multiplexed within a single TCP connection. This is the foundation that enables all the other features and performance optimizations of the HTTP/2 protocol.
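
As a rough sketch of the reassembly step described above (Go, with a deliberately simplified Frame type of my own), frames arriving interleaved on one connection can be demultiplexed into per-stream byte sequences using the stream identifier from each frame header:

```go
package main

import "fmt"

// Frame is a drastically simplified HTTP/2 frame: just the stream
// identifier from the frame header plus a chunk of payload.
type Frame struct {
	StreamID uint32
	Payload  []byte
}

// demux reassembles interleaved frames into per-stream byte sequences,
// keyed by the stream identifier carried in each frame header.
func demux(frames []Frame) map[uint32][]byte {
	streams := make(map[uint32][]byte)
	for _, f := range frames {
		streams[f.StreamID] = append(streams[f.StreamID], f.Payload...)
	}
	return streams
}

func main() {
	// Frames from streams 1 and 3 arrive interleaved on one connection.
	interleaved := []Frame{
		{StreamID: 1, Payload: []byte("HTTP")},
		{StreamID: 3, Payload: []byte("hello")},
		{StreamID: 1, Payload: []byte("/2")},
		{StreamID: 3, Payload: []byte(" world")},
	}
	for id, data := range demux(interleaved) {
		fmt.Printf("stream %d: %s\n", id, data)
	}
}
```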

4. Request and response multiplexing

In HTTP/1.x, if the client wants to make multiple parallel requests to improve performance, it must use multiple TCP connections (see Using Multiple TCP Connections). This behavior is a direct consequence of the HTTP/1.x delivery model, which ensures that only one response can be delivered per connection at a time (response queuing). Worse, this model also results in head-of-line blocking, which makes the underlying TCP connection inefficient.

The new binary framing layer in HTTP/2 removes these limitations and enables full request and response multiplexing: the client and server can break an HTTP message into independent frames, interleave them, and then reassemble them on the other end.

Consider a snapshot of multiple data streams in flight within the same connection: the client is transmitting a DATA frame (stream 5) to the server, while the server is transmitting an interleaved sequence of frames for streams 1 and 3 to the client. As a result, three streams are in flight in parallel over a single connection.

The ability to break an HTTP message into independent frames, interleave them, and then reassemble them on the other end is the single most important enhancement in HTTP/2. In fact, it sets off a chain reaction throughout the entire web technology stack that leads to enormous performance gains, allowing us to:

  • Interleave multiple requests in parallel without blocking any one of them.
  • Interleave multiple responses in parallel without blocking any one of them.
  • Use a single connection to deliver multiple requests and responses in parallel.
  • Remove unnecessary HTTP/1.x workarounds (see Optimizing for HTTP/1.x), such as concatenated files, image sprites, and domain sharding.
  • Deliver lower page load times by eliminating unnecessary latency and improving the utilization of available network capacity.
  • And so on…

The new binary framing layer in HTTP/2 resolves the head-of-line blocking problem of HTTP/1.x and eliminates the need for multiple connections to process and deliver requests and responses in parallel. As a result, applications are faster, simpler to develop, and cheaper to deploy.
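
To illustrate (this example is mine, not the article's; the URLs are placeholders), a Go client that issues several concurrent requests to the same HTTPS origin will, by default, negotiate HTTP/2 via ALPN and multiplex all of them over a single connection:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	// Over TLS, Go's default transport negotiates HTTP/2 via ALPN, so
	// these concurrent requests share one multiplexed connection.
	urls := []string{ // placeholder resources on a single origin
		"https://example.com/styles.css",
		"https://example.com/app.js",
		"https://example.com/hero.png",
	}
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := http.Get(u)
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			defer resp.Body.Close()
			n, _ := io.Copy(io.Discard, resp.Body)
			// resp.Proto reports "HTTP/2.0" when multiplexing is in effect.
			fmt.Printf("%s: %s via %s (%d bytes)\n", u, resp.Status, resp.Proto, n)
		}(u)
	}
	wg.Wait()
}
```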

5. Data stream prioritization

Once an HTTP message can be split into many independent frames, and frames from multiple streams can be multiplexed, the order in which the client and server interleave and deliver those frames becomes a critical performance consideration. To facilitate this, the HTTP/2 standard allows each data stream to have an associated weight and dependency:

  • Each data stream may be assigned an integer weight between 1 and 256.
  • Each data stream may be given an explicit dependency on another data stream.

The combination of stream dependencies and weights lets the client construct and communicate a “prioritization tree” that expresses how it would prefer to receive responses. In turn, the server can use this information to prioritize stream processing by controlling the allocation of CPU, memory, and other resources, and, once the response data is available, the allocation of bandwidth to ensure optimal delivery of high-priority responses to the client.

A stream dependency within HTTP/2 is declared by referencing the unique identifier of another stream as its parent; if the identifier is omitted, the stream is said to depend on the “root stream”. Declaring a stream dependency indicates that, whenever possible, the parent stream should be allocated resources before its dependencies. In other words, “please process and deliver response D before response C.”

Streams that share the same parent (in other words, sibling streams) should be allocated resources in proportion to their weights. For example, if stream A has a weight of 12 and its sibling B has a weight of 4, then to determine the proportion of resources each stream should receive:

  1. Sum all the weights: 4 + 12 = 16
  2. Divide each stream's weight by the total weight: A = 12/16, B = 4/16

Therefore, stream A should receive three-quarters of the available resources and stream B should receive one-quarter; equivalently, stream B should receive one-third of the resources allocated to stream A.
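
That arithmetic is simple enough to capture in a few lines; here is an illustrative Go helper (mine, not the article's) that computes each sibling stream's proportional share from its weight:

```go
package main

import "fmt"

// shares divides available resources among sibling streams in
// proportion to their HTTP/2 weights (integers from 1 to 256).
func shares(weights map[string]int) map[string]float64 {
	total := 0
	for _, w := range weights {
		total += w
	}
	out := make(map[string]float64, len(weights))
	for id, w := range weights {
		out[id] = float64(w) / float64(total)
	}
	return out
}

func main() {
	// The example from the text: A has weight 12, its sibling B has weight 4.
	fmt.Println(shares(map[string]int{"A": 12, "B": 4})) // map[A:0.75 B:0.25]
}
```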

Let's work through a few more hands-on examples (illustrated as prioritization trees in the original article, read left to right):

  1. Neither stream A nor stream B specifies a parent dependency, so both depend on the implicit “root stream”; A has a weight of 12 and B has a weight of 4. Based on these proportional weights, stream B should receive one-third of the resources allocated to stream A.
  2. Stream D depends on the root stream; C depends on D. Therefore, D should receive its full allocation of resources ahead of C. The weights are inconsequential, because C's dependency communicates a stronger preference.
  3. Stream D should receive its full allocation of resources ahead of C; C should receive its full allocation ahead of A and B; stream B should receive one-third of the resources allocated to stream A.
  4. Stream D should receive its full allocation of resources ahead of E and C; E and C should receive equal allocations ahead of A and B; A and B should receive proportional allocations based on their weights.

As these examples show, the combination of stream dependencies and weights provides an expressive language for resource prioritization, a critical feature for improving browsing performance, where we have many resource types with different dependencies and weights. Better still, the HTTP/2 protocol allows the client to update these preferences at any point, enabling further browser optimizations: we can change dependencies and reallocate weights in response to user interaction and other signals.

Note: Stream dependencies and weights express a transport preference, not a requirement, so they do not guarantee any particular processing or transmission order. That is, the client cannot force the server to process a stream in a specific order via stream prioritization. While this may seem counterintuitive, it is in fact the desired behavior: we do not want to block the server from making progress on a lower-priority resource when a higher-priority resource is blocked.

6. One connection per origin

With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into frames that can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required, which offers numerous performance benefits.

The killer feature of SPDY and HTTP/2 is arbitrary multiplexing on a single, well-congestion-controlled channel. It amazes me how important this is and how well it works. One great metric I enjoy is the fraction of connections created that carry just a single HTTP transaction (and thus make that transaction bear all the overhead). For HTTP/1, 74% of our active connections carry just a single transaction; persistent connections just aren't as helpful as we all want. But in HTTP/2 that number plummets to 25%. That's a huge win for overhead reduction. (HTTP/2 is Live in Firefox, Patrick McManus)

Most HTTP transfers are short and bursty, whereas TCP is optimized for long-lived, bulk data transfers. By reusing the same connection, HTTP/2 can both make more efficient use of each TCP connection and significantly reduce overall protocol overhead. Beyond that, using fewer connections reduces the memory and processing footprint along the full connection path (that is, across the client, trusted intermediaries, and origin server), which lowers overall operating costs and improves network utilization and capacity. As a result, moving to HTTP/2 should not only reduce network latency, but also help improve throughput and reduce operating costs.

Note: Reducing the number of connections is an especially important feature for improving the performance of HTTPS deployments: it means fewer expensive TLS handshakes, better session reuse, and an overall reduction in required client and server resources.

7. Flow control

Flow control is a mechanism that prevents the sender from overwhelming the receiver with data it may not want or be able to process: the receiver may be busy, under heavy load, or may only be willing to allocate a fixed amount of resources to a particular stream. For example, the client may have requested a large video stream with high priority, but the user has paused the video, and the client now wants to pause or throttle its delivery from the server to avoid fetching and buffering unnecessary data. Alternatively, a proxy server may have a fast downstream connection and a slow upstream connection, and similarly wants to regulate how quickly the downstream delivers data to match the speed of the upstream, to control its resource usage; and so on.

Do these requirements remind you of TCP flow control? They should, since the problem is essentially identical (see Flow Control). However, because HTTP/2 streams are multiplexed within a single TCP connection, TCP flow control is not granular enough, and does not provide the necessary application-level APIs to regulate the delivery of individual streams. To address this, HTTP/2 provides a simple set of building blocks that allow the client and server to implement their own stream- and connection-level flow control:

  • Flow control is directional. Each receiver may choose to set any window size it desires for each stream and for the connection as a whole.
  • Flow control is credit-based. Each receiver advertises its initial connection and stream flow control window (in bytes), which is reduced whenever the sender emits a DATA frame and increased via WINDOW_UPDATE frames sent by the receiver.
  • Flow control cannot be disabled. When the HTTP/2 connection is established, the client and server exchange SETTINGS frames, which set the flow control window sizes in both directions. The default flow control window is set to 65,535 bytes, but the receiver can set a large maximum window size (2^31 − 1 bytes) and maintain it by sending WINDOW_UPDATE frames whenever any data is received.
  • Flow control is hop-by-hop, not end-to-end. That is, a trusted intermediary can use it to control resource usage and implement resource allocation mechanisms based on its own criteria and heuristics.

HTTP/2 does not specify any particular algorithm for implementing flow control. Instead, it provides these simple building blocks and defers the implementation to the client and server, which can use them to implement custom strategies to regulate resource usage and allocation, as well as to implement new delivery capabilities that may help improve both the real and perceived performance of web applications (see Speed, Performance, and Human Perception).
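
As one possible shape for such a building block, here is a minimal, illustrative Go sketch (mine, and nowhere near a complete RFC 7540 implementation) of the credit-based window bookkeeping described above: the window shrinks as DATA frames are sent and grows when WINDOW_UPDATE frames arrive.

```go
package main

import (
	"errors"
	"fmt"
)

// window tracks one direction of a stream- or connection-level flow
// control window; RFC 7540 sets the initial size to 65,535 bytes.
type window struct {
	available int32
}

// consume is called by the sender before emitting a DATA frame of n bytes.
func (w *window) consume(n int32) error {
	if n > w.available {
		return errors.New("window exhausted; must wait for WINDOW_UPDATE")
	}
	w.available -= n
	return nil
}

// update applies a WINDOW_UPDATE frame from the receiver, which grants
// the sender additional credit.
func (w *window) update(increment int32) {
	w.available += increment
}

func main() {
	w := window{available: 65535}
	fmt.Println(w.consume(60000), w.available) // <nil> 5535
	fmt.Println(w.consume(10000))              // error: window exhausted
	w.update(32768)                            // receiver grants more credit
	fmt.Println(w.consume(10000), w.available) // <nil> 28303
}
```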

For example, application-level flow control allows the browser to fetch only a portion of a particular resource, put the fetch on hold by reducing the stream's flow control window to zero, and resume it later. In other words, it allows the browser to fetch a preview or first scan of an image, display it, and let other high-priority fetches proceed, then resume the fetch once the more critical resources have finished loading.

8. Server push

Another powerful new feature of HTTP/2 is the ability of the server to send multiple responses to a single client request. In other words, in addition to the response to the original request, the server can push additional resources to the client without the client having to explicitly request each one.

Note: HTTP/2 breaks away from strict request-response semantics and enables one-to-many and server-initiated push workflows, opening up a world of new interaction possibilities both within and outside the browser. This is an enabling feature that will have important long-term consequences for how we think about the protocol, what it does, and how it is used.

Why would we need such a mechanism in a browser? A typical web application consists of many resources, all of which the client discovers by examining the document provided by the server. So why not eliminate the extra latency and have the server push those resources ahead of time? The server already knows which resources the client will require; that's where server push comes in.
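
As a concrete example (a sketch of mine; the resource paths and certificate files are placeholders), Go's standard library exposes server push through the http.Pusher interface, which an HTTP/2 handler can use to push a stylesheet ahead of the HTML that references it:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Over HTTP/2, the ResponseWriter also implements http.Pusher.
	if pusher, ok := w.(http.Pusher); ok {
		// Push the stylesheet ahead of the HTML that references it.
		// The client may still decline the pushed stream (e.g., cache hit).
		if err := pusher.Push("/static/app.css", nil); err != nil {
			log.Printf("push failed: %v", err)
		}
	}
	fmt.Fprint(w, `<html><head><link rel="stylesheet" href="/static/app.css"></head><body>hello</body></html>`)
}

func main() {
	http.HandleFunc("/", handler)
	// Server push requires HTTP/2, which in turn requires TLS here;
	// cert.pem and key.pem are placeholder certificate files.
	log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
}
```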

In fact, if you've ever inlined CSS, JavaScript, or other assets via data URIs (see Resource Inlining), you already have hands-on experience with server push. By manually inlining a resource into the document, we are, in effect, pushing that resource to the client without waiting for the client to request it. With HTTP/2 we can achieve the same results, but with additional performance benefits. Pushed resources can be:

  • Cached by the client
  • Reused across different pages
  • Multiplexed alongside other resources
  • Prioritized by the server
  • Declined by the client

PUSH_PROMISE 101

All server push streams are initiated via PUSH_PROMISE frames, which signal the server's intent to push the described resources to the client and which must be delivered ahead of the response data that references those resources. This delivery order is critical: the client needs to know which resources the server intends to push, to avoid issuing duplicate requests for them. The simplest strategy that satisfies this requirement is to send all PUSH_PROMISE frames, which contain just the HTTP headers of the promised resources, ahead of the parent's response (in other words, its DATA frames).

Once the client receives a PUSH_PROMISE frame, it can decline the stream (via an RST_STREAM frame) if it wants to. (This might happen, for example, because the resource is already in its cache.) This is an important improvement over HTTP/1.x. By contrast, resource inlining, a popular HTTP/1.x “optimization,” amounts to a “forced push”: the client cannot opt out, cancel it, or process the inlined resource individually.

With HTTP/2, the client remains in full control of how server push is used: it can limit the number of concurrently pushed streams, adjust the initial flow control window to control how much data is pushed when a stream is first opened, or disable server push entirely. These preferences are communicated via SETTINGS frames at the beginning of the HTTP/2 connection and may be updated at any time.

Each pushed resource is a stream that, unlike an inlined resource, can be individually multiplexed, prioritized, and processed by the client. The only security restriction, enforced by the browser, is that pushed resources must obey the same-origin policy: the server must be authoritative for the content it provides.

9. Header compression

Each HTTP transfer carries a set of headers that describe the transferred resource and its properties. In HTTP/1.x, this metadata is always sent as plain text and typically adds 500-800 bytes of overhead per transfer, and sometimes thousands of bytes more if HTTP cookies are in use (see Measuring and Controlling Protocol Overhead). To reduce this overhead and improve performance, HTTP/2 compresses request and response header metadata using the HPACK compression format, which uses two simple but powerful techniques:

  1. It allows the transmitted header fields to be encoded with a static Huffman code, which reduces their individual transfer size.
  2. It requires both the client and server to maintain and update an indexed list of previously seen header fields (in other words, it establishes a shared compression context), which is then used as a reference to efficiently encode previously transmitted values.

With Huffman coding, individual values can be compressed as they are transferred, and with the indexed list of previously transferred values, duplicate values can be encoded simply by transferring an index that can be used to efficiently look up and reconstruct the full header key-value pair.

As a further refinement, the HPACK compression context consists of a static table and a dynamic table: the static table is defined in the specification and provides a list of common HTTP header fields that all connections are likely to use (for example, valid header names); the dynamic table starts out empty and is updated based on the values exchanged within a particular connection. As a result, the size of each request is reduced by using static Huffman coding for values that haven't been seen before, and by substituting indexes for values that are already present in the static or dynamic table on each side.

Note: In HTTP/2, the definitions of the request and response header fields remain unchanged, with one minor difference: all header field names are lowercase, and the request line is now split into the :method, :scheme, :authority, and :path pseudo-header fields.
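
To see the shared compression context at work, here is a short sketch using the golang.org/x/net/http2/hpack package (assuming that module is available); the second encoding of the same header set comes out much smaller because every field can now be represented as an index into the dynamic table:

```go
package main

import (
	"bytes"
	"fmt"

	"golang.org/x/net/http2/hpack"
)

// encode writes a header set through the encoder and reports its size.
func encode(enc *hpack.Encoder, buf *bytes.Buffer, headers []hpack.HeaderField) int {
	buf.Reset()
	for _, h := range headers {
		enc.WriteField(h) // Huffman-codes new values, indexes repeated ones
	}
	return buf.Len()
}

func main() {
	var buf bytes.Buffer
	enc := hpack.NewEncoder(&buf)

	headers := []hpack.HeaderField{
		{Name: ":method", Value: "GET"},
		{Name: ":scheme", Value: "https"},
		{Name: ":authority", Value: "example.com"},
		{Name: ":path", Value: "/index.html"},
		{Name: "user-agent", Value: "example-client/1.0"},
	}

	// First request: values are Huffman-coded and added to the dynamic table.
	fmt.Println("first:", encode(enc, &buf, headers), "bytes")
	// Second request: the same fields are encoded as small index references.
	fmt.Println("second:", encode(enc, &buf, headers), "bytes")
}
```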

HPACK security and performance

Early versions of HTTP/2 and SPDY used zlib, with a custom dictionary, to compress all HTTP headers. This reduced the size of the transferred header data by 85%-88% and significantly reduced page load latency:

On low-bandwidth DSL links with uplink speeds as low as 375 Kbps, compression of request headers alone significantly reduced page load times for certain sites (i.e., those that issue a large number of resource requests). We found that page load times were reduced by 45-1142 milliseconds due to header compression alone. (SPDY whitepaper, chromium.org)

However, this use of zlib was later shown to be vulnerable to the CRIME attack, which made it possible to recover secrets (such as authentication cookies) from the compressed headers; as a result, zlib compression was replaced by the HPACK format described above.

10. Further reading

  • “HTTP/2” – the full article by Ilya Grigorik
  • “Setting up HTTP/2” – how to set up HTTP/2 in different backends, by Surma
  • “HTTP/2 is here, let's optimize!” – Ilya Grigorik's presentation at Velocity 2015
  • “Rules of Thumb for HTTP/2 Push” – an analysis of when and how to use push, by Tom Bergan, Simon Pelchat, and Michael Buettner