Preface

When we talk about head-of-line blocking, we usually mean TCP head-of-line blocking, but HTTP/1.1 has a similar problem of its own. Both are described below.

TCP head-of-line blocking

Head-of-line blocking occurs when a TCP segment is lost and the segments sent after it arrive at the receiver out of order. The receiver holds those later segments until the sender retransmits the lost segment and it arrives. Delaying delivery this way guarantees that the receiving application reads the data in exactly the order the sender wrote it. This mechanism is essential for TCP's in-order delivery, but it comes with a downside.

Suppose you send semantically independent messages over a single TCP connection. For example, a server might send three different images for a Web browser to display. To make the images appear to load in parallel on the user's screen, the server sends a fragment of the first image, then a fragment of the second, then a fragment of the third, and repeats this process until all three images have been delivered to the browser.

If a TCP segment carrying a fragment of the first image is lost, the client holds all of the out-of-order data that has already arrived until the lost segment is successfully retransmitted. This delays not only the delivery of the first image, but also the delivery of the second and third images.
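
To make the scenario concrete, here is a minimal Go sketch of the sending side. The function name and the 4 KB chunk size are illustrative, not taken from any real server:

```go
package demo

import "net"

// sendInterleaved writes several image payloads over a single TCP
// connection in round-robin chunks, so the browser can render all of
// the images in parallel. The 4 KB chunk size is arbitrary.
func sendInterleaved(conn net.Conn, images [][]byte) error {
	const chunkSize = 4096
	offsets := make([]int, len(images))
	for {
		done := true
		for i, img := range images {
			if offsets[i] >= len(img) {
				continue // this image has been fully sent
			}
			done = false
			end := offsets[i] + chunkSize
			if end > len(img) {
				end = len(img)
			}
			// All fragments share one TCP byte stream: if a segment of
			// image 0 is lost, later fragments of images 1 and 2 wait in
			// the receiver's buffer until the retransmission arrives.
			if _, err := conn.Write(img[offsets[i]:end]); err != nil {
				return err
			}
			offsets[i] = end
		}
		if done {
			return nil
		}
	}
}
```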

HTTP head-of-line blocking

The example above involves a browser requesting image resources, but HTTP itself suffers from head-of-line blocking similar to TCP's. To explain HTTP head-of-line blocking, we first need to talk about HTTP pipelining.

What is HTTP pipelining

HTTP/1.1 allows the optional use of request pipelining on persistent connections. This is a further performance optimization on top of keep-alive connections: multiple requests can be sent before any response arrives. While the first request is still traveling to the server, the second and third requests can already be on the wire. This reduces round trips and improves performance on high-latency networks.

(Figure: the difference between non-pipelined and pipelined requests)
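
As a rough illustration of what pipelining looks like on the wire, the Go sketch below sends three requests over one raw TCP connection before reading any response. The host and paths are placeholders, and a real server may not have pipelining enabled:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	// Open one TCP connection and write three requests back-to-back,
	// without waiting for any response in between.
	conn, err := net.Dial("tcp", "example.com:80")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	paths := []string{"/a.png", "/b.png", "/c.png"}
	for _, p := range paths {
		fmt.Fprintf(conn, "GET %s HTTP/1.1\r\nHost: example.com\r\n\r\n", p)
	}

	// The server must answer in FIFO order, so the responses are read
	// back in exactly the order the requests were sent.
	r := bufio.NewReader(conn)
	for _, p := range paths {
		resp, err := http.ReadResponse(r, nil)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // drain before reading the next response
		resp.Body.Close()
		fmt.Println(p, resp.Status)
	}
}
```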

Background to HTTP pipelining

In general, HTTP follows a “request-response” model: the client sends one request at a time and the server returns a response. This model is easy to understand, but not very efficient. To improve speed and efficiency, several approaches have been tried:

  • In the simplest case, the server closes the connection after returning each response, so the client’s multiple requests are effectively serial.
  • A client can also open multiple connections and send different requests in parallel over them. But creating more connections has its own costs, and most browsers today limit the number of connections to the same domain.
  • Persistent connections (HTTP/1.0 keep-alive and HTTP/1.1 persistent connections) let HTTP reuse an established connection: after receiving a response, the client can send the next request on the same connection instead of establishing a new one (a minimal sketch of this reuse appears after this list).
  • Most modern browsers combine the two approaches, opening a small number of concurrent persistent connections per domain.
  • On top of persistent connections, HTTP/1.1 further supports pipelining. Pipelining allows the client to send the next request before the response to the previous one has arrived, reducing waiting time and improving throughput. If multiple requests fit in the same TCP segment, network utilization improves as well. But because HTTP pipelining itself causes head-of-line blocking, among other problems, modern browsers disable it by default.
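
Here is a minimal sketch of persistent-connection reuse using Go's http.Client, which keeps HTTP/1.1 connections alive by default. The per-host limit below mimics a browser's small per-domain pool; the URL and the limit of 6 are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{MaxConnsPerHost: 6},
	}
	// GotConn reports whether each request reused an existing connection;
	// after the first request, Reused should be true.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Println("reused connection:", info.Reused)
		},
	}
	for i := 0; i < 3; i++ {
		req, _ := http.NewRequest("GET", "http://example.com/", nil)
		req = req.WithContext(httptrace.WithClientTrace(context.Background(), trace))
		resp, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
	}
}
```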

Limitations of HTTP pipelining

  1. Pipelining requires the server to return responses in the order the requests were sent (FIFO), for the simple reason that HTTP/1.1 requests and responses carry no numbering, so an out-of-order response could not be matched to its request.
  2. The client must keep track of outstanding requests, and resend them if the connection is dropped unexpectedly.
  3. Only idempotent requests can be pipelined, that is, only GET and HEAD requests; otherwise unexpected results may occur (for example, a resent POST could be applied twice).

Head-of-line blocking caused by HTTP pipelining

As mentioned earlier, HTTP pipelining requires the server to return responses in the order the requests were sent, so if one response is slow, every response queued behind it is delayed until the one at the head of the line arrives.

How to solve head-of-line blocking

How to resolve HTTP head-of-line blocking

The request/response-level head-of-line blocking caused by pipelining in HTTP/1.1 is solved by HTTP/2. HTTP/2 does not use pipelining; instead it introduces the concepts of frames, messages, and streams. Each request/response is called a message, and each message is split into frames for transmission, with each frame carrying the identifier of the stream it belongs to. A single connection can carry multiple streams, and frames from different streams can be interleaved on the connection and reassembled into messages on arrival, so one slow response no longer blocks the others.
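
A minimal sketch of this multiplexing, assuming a server that speaks HTTP/2: Go's default client negotiates HTTP/2 over HTTPS automatically, so the three concurrent requests below can share one TCP connection as independent streams. The URL and paths are placeholders:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for _, path := range []string{"/a", "/b", "/c"} {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			// Each request becomes its own stream; a slow response on
			// one stream does not block the other two.
			resp, err := http.Get("https://example.com" + p)
			if err != nil {
				fmt.Println(p, err)
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body)
			// resp.Proto reports "HTTP/2.0" when HTTP/2 was negotiated.
			fmt.Println(p, resp.Proto)
		}(path)
	}
	wg.Wait()
}
```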

Of course, even with HTTP/2, TCP head-of-line blocking can still occur, because HTTP/2 normally runs on top of TCP.

How to resolve TCP head-of-line blocking

Head-of-line blocking in TCP is a consequence of TCP's own implementation and cannot be avoided within TCP itself. The only way to avoid its effects in an application is to abandon TCP.

For example, Google’s QUIC protocol can be said, to some extent, to avoid TCP head-of-line blocking, because it does not use TCP at all; instead it implements reliable transport on top of UDP. UDP is a datagram-oriented protocol, and there are no ordering constraints between datagrams.
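
A tiny sketch of the difference at the UDP level (plain sockets, not the QUIC API itself): each Write below becomes an independent datagram, so the loss of one does not hold back delivery of the others; this independence is what lets QUIC retransmit per stream. The address and payloads are arbitrary:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("udp", "127.0.0.1:4242")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	// Unlike TCP's single ordered byte stream, these writes are sent as
	// separate datagrams with no ordering constraint between them.
	for _, msg := range []string{"stream-1 frame", "stream-2 frame", "stream-3 frame"} {
		if _, err := conn.Write([]byte(msg)); err != nil {
			panic(err)
		}
	}
	fmt.Println("sent 3 independent datagrams")
}
```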

There is also SCTP (Stream Control Transmission Protocol), a transport-layer protocol at the same level as TCP and UDP. SCTP’s multi-streaming feature likewise avoids head-of-line blocking as much as possible.

Conclusion

From the causes of TCP and HTTP head-of-line blocking above, we can see that head-of-line blocking arises from two conditions:

  1. Independent pieces of data are transmitted over a single link, i.e., there is a “queue”. For example, TCP has only one stream, and multiple HTTP requests share one TCP connection.
  2. The data in the queue must be handled in strict order. For example, TCP requires data to be delivered in strict byte order, and HTTP pipelining requires responses to be returned in strict request order.

So to avoid head-of-line blocking, you need to attack one of these two conditions; for example, QUIC uses UDP instead of TCP, and SCTP supports multiple streams on a single connection.