
This article is published in the cloud + community column by Tencent IVWEB team

Author: yangchunwen

The HTTP protocol is an important topic in front-end performance and even security. I recently read "High Performance Browser Networking" and want to share its content on HTTP, along with some thoughts of my own. It's certainly not as detailed as HTTP: The Definitive Guide, but it's illuminating for understanding what we do day to day. There will be two or three articles, covering HTTP 1.1, HTTPS, and HTTP 2.0 respectively. This one focuses on HTTP 1.1 and its applications.

The history of HTTP

HTTP 0.9

The first version of HTTP, dubbed HTTP 0.9, was a one-line protocol, for example:

    GET /about/

    (hypertext response...)
    (connection closed...)

HTTP 0.9 has several key points:

  • Client/server, request/response protocol
  • ASCII protocol, which runs over TCP/IP links
  • Designed to transfer hypertext documents (HTML)
  • The connection between the server and the client is closed after each request

This version of HTTP was primarily used to transfer text, and did not reuse TCP connections.

HTTP 1.0

A typical HTTP 1.0 request process would look like this:

    GET /rfc/rfc1945.txt HTTP/1.0
    User-Agent: CERN-LineMode/2.15 libwww/2.17b3
    Accept: */*

    HTTP/1.0 200 OK
    Content-Type: text/plain
    Content-Length: 137582
    Expires: Thu, 01 Dec 1997 16:00:00 GMT
    Last-Modified: Wed, 1 May 1996 12:45:26 GMT
    Server: Apache 0.84

    (hypertext response...)
    (connection closed...)

HTTP 1.0 has the following major changes from its predecessor:

  • Requests and responses can be composed of multiple line header fields
  • The response object is preceded by a response status line
  • Response objects are not limited to hypertext
  • The connection between the server and the client is closed after each request
  • Cache control for transferred content, via headers such as Expires
  • Content negotiation, such as content encoding (Accept-Encoding) and character sets (Accept-Charset)

This is where the concept of request and response headers comes in, and HTTP starts transferring more than just text (e.g., other binary content).

HTTP 1.1

HTTP 1.1 is the protocol version used by most applications today. Semantically, HTTP 1.1 remains largely unchanged from version 1.0, but it adds many important performance improvements: persistent connections, chunked transfer encoding, byte-range requests, enhanced caching, transfer encodings, and request pipelining.

In fact, persistent connections were later back-ported to HTTP 1.0.

HTTP 2.0

The main goal of HTTP 2.0 is to improve transport performance, achieving lower latency and higher throughput, and it contains a number of performance optimizations. At the same time, HTTP's high-level protocol semantics are not affected by the update: all HTTP headers, values, and their usage scenarios remain the same, and any existing web site or application can run on HTTP 2.0 without modification. In other words, once our servers and clients (such as browsers) all support HTTP 2.0, we won't have to change our markup or do lots of extra coding to take advantage of it; we'll simply enjoy the lower latency and higher connection utilization it delivers!

HTTP 2.0 will be covered in the next article or the one after, so this one won't go into it much further.

HTTP 1.1 and front-end performance

As mentioned earlier, HTTP 1.1 introduced a number of important performance-enhancing features, including:

  • Persistent connections to support connection reuse
  • Chunked transfer encoding to support streaming responses
  • Request pipelining to support parallel request processing
  • Byte-range services to support range-based resource requests
  • Improved caching mechanisms

Here I will focus on some applications of persistent connections and pipelining in front-end performance optimization.

Persistent connections

Persistent connections reuse TCP connections: multiple HTTP requests share a single TCP connection.

HTTP 1.1 changed the semantics of the HTTP protocol to use persistent connections by default. In other words, unless explicitly told (via the Connection: Close header), the server leaves the TCP Connection open by default. If you’re using HTTP 1.1, you technically don’t need the Connection: keep-alive header, but many clients choose to include it anyway. For example, our browsers will carry the Connection: keep-alive header by default when making requests.
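
Outside the browser, you opt in explicitly. Below is a minimal Node.js sketch (the host and paths are placeholders, and only the built-in http module is assumed) that reuses one TCP connection across two requests via a keep-alive agent:

    import * as http from "http";

    // A keep-alive agent holds the TCP connection open between requests
    // instead of closing it after each response; maxSockets: 1 forces
    // both requests onto the same socket.
    const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

    function get(path: string): Promise<void> {
      return new Promise((resolve, reject) => {
        const req = http.get({ host: "example.com", path, agent }, (res) => {
          res.resume(); // drain the body so the socket returns to the pool
          res.on("end", () => {
            console.log(`${path}: ${res.statusCode} reused=${req.reusedSocket}`);
            resolve();
          });
        });
        req.on("error", reject);
      });
    }

    // The second request should log reused=true: no new TCP handshake.
    get("/index.html").then(() => get("/style.css"));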

Let’s take a look at why persistent connections are so important to us.

Assume that a web page contains only one HTML document and one CSS style file. The server takes 40ms and 20ms to respond to these two files. The server and the visitor are in Harbin and Shenzhen respectively.

  1. First comes the request for the HTML document:

After the HTML is downloaded, the TCP connection is closed.

  2. Next, the request for the CSS resource is initiated, again requiring a TCP handshake

As you can see, both HTTP requests undergo a TCP three-way handshake. In addition (not reflected in the figure), each TCP connection may also go through TCP slow start, an important factor affecting transmission performance.

If our underlying TCP connection were reused, the situation would look like this:

Obviously, one handshake round trip is eliminated from the request for the CSS.

Initially, the two requests used two separate TCP connections, with a total latency of 284ms. With a persistent connection, one handshake round trip is avoided and the total latency drops to 228ms; the two requests save 56ms (one RTT, round-trip time).

While the above example is a simple hypothetical case with only one HTML file and one CSS file, the real-world web involves many more HTTP requests than this; with persistent connections enabled, the total latency saved for n requests is (n − 1) × RTT.
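
To make the arithmetic explicit, here is the model behind those numbers (a back-of-the-envelope sketch that ignores transfer time and TCP slow start; the 56ms RTT between Harbin and Shenzhen is implied by the book's figures, and 40ms/20ms are the server processing times from the example). Each new connection costs one handshake RTT, and each request-response exchange costs one more RTT plus server time:

    two separate connections:  (56 + 56 + 40) + (56 + 56 + 20) = 152 + 132 = 284ms
    one persistent connection: (56 + 56 + 40) + (56 + 20)      = 152 + 76  = 228ms
    saved: 284 − 228 = 56ms = 1 RTT; in general (n − 1) × RTT for n requests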

In the real world, with higher latency and more requests, the performance gains are much greater: the higher the network latency and the more requests, the more time saved. In practice, the total saving can be measured in seconds. Imagine how much time would be wasted if every HTTP request restarted a TCP connection!

HTTP pipelining

Persistent HTTP allows us to reuse an existing connection to complete multiple application requests, but multiple requests must follow a strict FIFO (first in, first out) queue order: send a request, wait for the response to complete, then send the next request in the client queue.

Take the persistent-connection example from the previous section: after the server processes the first request, a full round trip occurs, with the response sent back and the second request then sent out, and the server sits idle between sending the first response and receiving the second request.

What if the server could start processing the second request as soon as it had processed the first? Or even, what if the server could process two requests in parallel?

This is where the HTTP pipeline comes in, a small but important optimization for the workflow described above.

With HTTP pipelining, we can dispatch multiple HTTP requests without waiting for each response, rather than sending them strictly one after another, which seems ideal:

As shown above, the HTML and CSS requests arrive at the server at the same time, the server processes them simultaneously, and then returns the responses.

This time, HTTP pipelining eliminated another round trip between the two requests, reducing the total latency to 172ms. Going from 284ms with neither persistent connections nor pipelining down to 172ms, this 40% performance improvement comes entirely from simple protocol optimization.

Wait a minute, something isn't quite right with that example: if the requests arrive and are processed at the same time, why is the HTML returned first and then the CSS? Can't both be returned at the same time?

This is a big limitation of HTTP 1.1 pipelining: HTTP requests cannot take advantage of multiplexing, that is, multiple responses cannot be interleaved on a single connection. A response must therefore be returned in full before transmission of the next response can begin.

Pipelining simply moves the FIFO queue from the client to the server. That is, the requests can reach the server at the same time, and the server can process both files at the same time, but the two files must still be returned to the user in order, as shown below:

  • HTML and CSS requests arrive at the same time, but the HTML request is processed first
  • The server processes both requests in parallel, 40ms for HTML and 20ms for CSS
  • The CSS response is ready first, but is buffered while waiting for the HTML response to be sent
  • Once the HTML response is sent, the buffered CSS response follows

As you can see, even if the client sends two requests at the same time and the CSS resource is ready first, the server will send the HTML response before delivering the CSS.
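
At the socket level, pipelining is nothing more than writing requests back-to-back on one connection and then reading the responses strictly in request order. A rough sketch (assuming a Node environment and a server that actually tolerates pipelined requests; example.com is a placeholder):

    import * as net from "net";

    // One TCP connection; two HTTP requests written back-to-back without
    // waiting for the first response: this is pipelining.
    const socket = net.connect(80, "example.com", () => {
      socket.write(
        "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" +
        "GET /style.css HTTP/1.1\r\nHost: example.com\r\n\r\n"
      );
    });

    // Both responses share one byte stream, so the CSS response cannot
    // begin until the HTML response has been sent in full.
    socket.on("data", (chunk) => process.stdout.write(chunk));
    socket.on("end", () => socket.destroy());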

As an aside, the example in the previous two sections, where the HTML and CSS requests arrive at the same time, comes from the book, and personally I don't think it's a very good one. On the real-world web, an HTML document and the CSS it references don't usually arrive at the server at the same time, and normal waterfall charts don't look like this either, since the HTML must be retrieved before the browser can request the CSS and other resources. I think the author just wanted to explain the principle; a pair of CSS and JS files referenced by the same HTML document would have been a more appropriate example.

The root of this problem is "head-of-line blocking" (the same phenomenon exists at the TCP level; interested readers can revisit their computer networking course). The trade-offs are underutilized network connections, server buffering overhead, and potentially greater client latency. Worse, if an earlier request hangs indefinitely or simply takes a long time to process, all subsequent requests are blocked waiting for it to complete.

As a result, pipelining sees very limited use in HTTP 1.1, although its benefits are undeniable. In fact, the browsers that do support pipelining usually hide it behind an advanced configuration option, and most disable it. In other words, as a front-end engineer, if you're developing an application for ordinary browsers, don't expect too much from HTTP pipelining; look forward instead to the equivalent optimization in HTTP 2.0.

However, some applications do make good use of HTTP pipelining. For example, at WWDC 2012, an Apple engineer shared a great example of HTTP optimization: by reusing existing TCP connections in iTunes through HTTP persistent connections and pipelining, they tripled performance for users on slow networks!

In fact, to take full advantage of pipelining, you must make sure that:

  • The HTTP client supports pipelining
  • The HTTP server supports pipelining
  • The application can handle aborted connections and recover from them
  • The application can ensure interrupted requests are idempotent and safe to retry
  • The application can protect itself from misbehaving intermediaries (proxies)

Because iTunes' servers and clients are controlled by the same developers, they can meet these criteria. This may offer some insight to front-end engineers building hybrid apps or other non-browser web applications; but if you're building a site for users on a variety of browsers, you probably can't count on it.

Use multiple TCP connections

Because of the drawbacks above, HTTP 1.1 pipelining is underutilized. So the question is: if no pipelining is used and all of our HTTP requests come back sequentially over a persistent connection, one after another, how slow would that be?

In fact, browser vendors have taken a different approach to working around the HTTP 1.1 pipelining flaw: they allow us to open multiple TCP sessions in parallel. As for how many, you've probably seen the number before: anywhere from four to eight. This is the real origin of the limit, familiar to front-end engineers, that a browser only loads 4 to 8 resources in parallel from the same host.

HTTP persistent connections solved TCP connection reuse for us, but since current HTTP pipelining cannot interleave the results of multiple requests, the browser's only option is to open multiple TCP connections to load resources in parallel.
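
Outside the browser you can reproduce the same behavior; a minimal sketch (Node environment assumed, URLs hypothetical) that caps parallel connections per host at 6, much like a browser does:

    import * as http from "http";

    // At most 6 concurrent TCP connections to the host, mirroring the
    // browser's per-host limit; further requests queue on the agent.
    const agent = new http.Agent({ keepAlive: true, maxSockets: 6 });

    for (let i = 0; i < 20; i++) {
      const path = `/asset-${i}.js`;
      http.get({ host: "example.com", path, agent }, (res) => {
        res.resume(); // drain so the socket can be reused
        res.on("end", () => console.log(`done: ${path}`));
      });
    }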

Suffice it to say, this is a stopgap to work around the application protocol's (HTTP's) limitations. To use an analogy: a water pipe cannot carry multiple liquids at once, so each liquid gets its own pipe. When will the pipe be smart enough to carry different liquids at the same time, keeping each intact and undisturbed until it arrives and is sorted at the destination? Again, look to HTTP 2.0.

Why four to eight connections? The number is the result of a multi-way trade-off: the larger it is, the more resources both client and server consume (under highly concurrent access, the overhead TCP connections impose on the server cannot be ignored). Four to eight connections per host is simply a number everyone feels comfortable with.

Domain sharding

As mentioned earlier, the browser and the server can only maintain 4 to 8 concurrent TCP connections, meaning only 4 to 8 resources can be downloaded at the same time. Is that enough?

Look at most websites today: they routinely reference dozens of JS and CSS files. At six downloads at a time, large numbers of resources end up waiting in the queue; moreover, downloading only 6 resources at once leaves bandwidth badly underutilized.

As an analogy: a factory installs 100 water pipes but can only run water through 6 at a time, which is both slow and a waste of pipes!

So, we arrive at a front-end performance best practice: use domain sharding!

Yes, why limit yourself to a single host? We can manually spread resources across multiple subdomains; since the host names differ, we can break through the browser's connection limit and achieve higher parallelism.

By “tricking” the browser in this way, the number of parallel transfers between the browser and the server increases.
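
A minimal sketch of how this is usually wired up (the shard hostnames and the hash are illustrative, not a standard API): the important detail is that each resource must map to the same shard deterministically, otherwise the same asset gets downloaded and cached once per shard:

    const SHARDS = ["s1.static.example.com", "s2.static.example.com"];

    // Deterministically map a resource path to a shard so the same asset
    // is always served from the same hostname (keeps caches effective).
    function shardUrl(path: string): string {
      let hash = 0;
      for (const ch of path) {
        hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
      }
      return `https://${SHARDS[hash % SHARDS.length]}${path}`;
    }

    console.log(shardUrl("/img/logo.png")); // always the same shard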

The more shards you use, the more parallelism you get!

However, domain sharding comes at a cost!

In practice, domain sharding is often abused.

For example, if your application targets mobile users on 2G networks, and you spread resources over several domain names and load dozens of CSS and JS files at once, the problems are:

  • Each domain name adds DNS lookup overhead, costing extra machine resources and extra network latency. A DNS lookup on a 2G network is nothing like one on your office computer; it can take several seconds
  • Loading many resources at once over the meagre bandwidth of a 2G network often saturates the link and slows the download of every resource
  • The phone drains its battery faster

Therefore, in low-bandwidth, high-latency scenarios such as 2G mobile networks, domain sharding brings no front-end performance improvement and instead becomes a performance killer.

Domain sharding is a reasonable but imperfect optimization. The most sensible approach is to start with the minimum number of shards (none), then increase the number one at a time, measuring the impact on the application each time, to arrive at the optimal number of domains.

Concatenation and spriting

One of our front-end performance best practices is to concatenate and bundle JS and CSS files, and to combine images into CSS sprites.

Now we know the real reason we do this: because HTTP 1.1 pipelining is currently so weak, the effect of these two techniques is to implement pipelining implicitly at the application layer: data from multiple responses is concatenated one after another, eliminating extra network latency.

In effect, it's a layer of pipelining built into the application, so perhaps in the HTTP 2.0 era front-end engineers won't have to do this kind of work anymore? (See the upcoming HTTP 2.0 article.)
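
The build step behind this can be as simple as the sketch below (file names are illustrative; real projects typically delegate this to a bundler):

    import * as fs from "fs";

    // Concatenate several JS files into one bundle so the browser fetches
    // a single resource instead of many: pipelining done by hand.
    const sources = ["a.js", "b.js", "c.js"];

    const bundle = sources
      .map((file) => `;/* ${file} */\n${fs.readFileSync(file, "utf8")}`)
      .join("\n");

    fs.writeFileSync("bundle.js", bundle);

The leading semicolon guards against source files that don't end their last statement with one.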

Of course, concatenation comes at a cost.

  • CSS sprites, for example, require the browser to decode the entire image and keep all of it in memory, even if only a small piece is actually displayed; the browser has no way to evict the undisplayed parts from memory.
  • Moreover, concatenated JS and CSS files are generally larger, and in bandwidth-limited environments (such as 2G) take longer to download, typically delaying page rendering. Since neither JavaScript nor CSS can be processed incrementally, their parsing and execution must wait until the entire file has been downloaded.

How big should a bundle be? Unfortunately, there is no single ideal size. However, tests by Google's PageSpeed team show that 30 to 50 KB (compressed) is a good range for each JavaScript file: large enough to reduce the network overhead of many small files, while still allowing incremental and staged execution. The exact results vary with the type of application and the number of scripts.

Resource embedding

JavaScript and CSS code can be placed directly in the page inside appropriate script and style blocks, while images and even audio or PDF files can be embedded in the page via data URIs (data:[mediatype][;base64],<data>).

This approach is called resource embedding.

Embedding resources is another popular optimization: inlining resources into the document reduces the number of requests. On 2G networks especially, embedded resources can effectively reduce the latency caused by multiple requests; you can refer to the 2G-network practices in this article.
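
A minimal build-time sketch of producing a data URI (the file name and MIME type are illustrative):

    import * as fs from "fs";

    // Encode a small image as a data URI so it ships inside the page
    // instead of costing a separate HTTP request.
    function toDataUri(file: string, mediaType: string): string {
      const base64 = fs.readFileSync(file).toString("base64");
      return `data:${mediaType};base64,${base64}`;
    }

    // Usable anywhere a URL is accepted, e.g. in CSS:
    // background-image: url("data:image/png;base64,...");
    console.log(toDataUri("icon.png", "image/png"));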

Of course, there are drawbacks:

  • Inlined resources cannot be cached as separate resources by browsers, CDNs, or other caching proxies. If you embed the same resource in multiple pages, it is re-downloaded with every page, increasing each page's overall size.
  • If an embedded resource is updated, every page it appears in is invalidated and must be refetched from the server.
  • Base64-encoding non-textual resources, such as images, adds significant overhead: since every 3 bytes become 4 characters, encoded resources are 33% larger than the originals!

Google’s experts offer some lessons:

  • Only consider embedding resources of 1 to 2 KB or less; below that threshold, the HTTP overhead of a separate request often exceeds the resource itself
  • If the file is small and is used by only a few pages, consider embedding it; ideally, an embedded resource is used just once
  • If the file is small but needs to be reused across many pages, consider bundling such files together instead
  • If a small file is updated frequently, don't embed it
  • Minimize protocol overhead by keeping HTTP cookies small

Summary

This article introduced some applications of HTTP 1.1 in front-end performance optimization, several of which are things you have to do to work around HTTP 1.1's limitations, such as resource concatenation, compression, and embedding. These were some of the "black magic" workarounds for problems before HTTP 2.0 came along.

HTTP 1.1 and its applications are certainly not as simple as this article suggests; I've only condensed a few points here, and interested readers should see HTTP: The Definitive Guide.
