This article focuses on HTTP optimization; HTTPS optimization will be covered in a subsequent article.

Now that we’re talking about performance tuning, we need to ask: What is performance? What metrics describe it, how should they be measured, and what can we do to optimize them?

Performance is a complex concept. It can mean different things to different people and in different application scenarios. HTTP in particular involves so many roles that its performance is hard to capture in a word or two.

Let’s start with the most basic HTTP request-response model. There are two roles in this model, the client and the server, plus the transport link between them.

The HTTP server

Let’s look at the server first. It usually runs on Linux and serves requests through web server software such as Apache or Nginx, so for a server, “performance” means service capacity: handling as many user requests as possible, as fast as possible.

There are three primary metrics for measuring server performance: throughput (requests per second), concurrency, and response time.

Throughput is usually called RPS, requests per second (you will also see TPS and QPS). It is the most basic server performance metric: the higher the RPS, the better the server performs.

Concurrency reflects the server’s load capacity, that is, how many clients it can support at the same time. The more, the better, of course, since it can then serve more users.

Response time reflects the server’s processing capacity. The shorter the response time, the more users the server can serve per unit of time, which in turn raises throughput and concurrency.
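The relationship among the three metrics can be seen with a back-of-the-envelope calculation (a form of Little’s law); the numbers below are made up purely for illustration:

```python
# Back-of-the-envelope relation among the three server metrics:
#   throughput (RPS) ≈ concurrency / average response time
concurrency = 100        # clients served simultaneously
avg_response_s = 0.05    # 50 ms average response time

rps = concurrency / avg_response_s
print(round(rps))        # ~2000 requests per second

# Halving the response time doubles throughput at the same concurrency.
print(round(concurrency / (avg_response_s / 2)))   # ~4000
```

This is why shaving milliseconds off response time pays off twice: it helps each user directly and raises the whole server’s throughput.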

In addition to these three basic performance metrics, the server must also watch its system resources, such as CPU, memory, disks, and network adapters. Usage that is too high or too low can both indicate problems.

Over the years of HTTP’s evolution, many mature tools have appeared for measuring server performance: open source and commercial, command-line and graphical.

On Linux, the most commonly used performance-testing tool is probably ab (Apache Bench). For example, the following command sends a total of 10,000 requests with a concurrency of 100:

```shell
ab -c 100 -n 10000 'http://www.xxx.com'
```

For monitoring system resources, Linux comes with plenty of tools of its own: uptime, top, vmstat, netstat, sar, and so on. You may know them better than I do; here are a few simple examples:

```shell
top              # check CPU and memory usage
vmstat 2         # check system status every two seconds
sar -n DEV 2     # check traffic on all NICs every two seconds
```

With these performance metrics in mind, we know where to go to optimize server performance: use system resources wisely, increase server throughput and concurrency, and reduce response time.

The HTTP client

Having looked at the server performance metrics, let’s look at how to measure client performance.

Clients are consumers of information, and all data needs to be fetched from the server over the network. Therefore, its most basic performance metric is latency.

Latency came up briefly in the earlier HTTP/2 article. The so-called “latency” is really “waiting”: the time spent waiting for data to arrive at the client. And because HTTP transport links are complex, latency has many causes.

  • First, keep in mind the “insurmountable” barrier: the speed of light. The latency imposed by geographical distance cannot be eliminated; visiting a website thousands of kilometers away will obviously incur a noticeably larger delay.
  • Second is bandwidth, which covers the access link (cable, WiFi, 4G) as well as the internal networks and inter-carrier networks along the way. Any of them can become the bottleneck of data transmission, reducing speed and increasing latency.
  • The third factor is DNS resolution: if the domain name is not cached locally, a query must be sent to the DNS system, triggering a chain of network round trips, and until the IP address comes back the client can only wait, unable to access the website.
  • The fourth factor is the TCP handshake: establishing a connection requires three packets, SYN, SYN/ACK, and ACK, and its delay is likewise governed by the speed of light and the bandwidth.
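To get a feel for the speed-of-light floor on latency, here is a quick back-of-the-envelope calculation (the ~200,000 km/s figure for light in fiber and the distances are illustrative round numbers):

```python
# Light travels at ~300,000 km/s in vacuum but only ~200,000 km/s in
# optical fiber, so geographical distance sets a hard floor on latency.
LIGHT_IN_FIBER_KM_S = 200_000

def min_rtt_ms(distance_km):
    """Theoretical minimum round-trip time over fiber, in milliseconds."""
    one_way_s = distance_km / LIGHT_IN_FIBER_KM_S
    return 2 * one_way_s * 1000

print(round(min_rtt_ms(1000)))   # ~10 ms for a server 1,000 km away
print(round(min_rtt_ms(7000)))   # ~70 ms for a server 7,000 km away
# A TCP handshake costs one full RTT before any data can flow,
# and real routes are longer and slower than this ideal.
```

Real-world latency is always worse than this floor because of routing detours, queueing, and processing delays, but no optimization can get below it.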

Once the TCP connection is established, data can be sent and received normally; what follows is HTML parsing, JavaScript execution, layout and rendering, and so on, which also take time. But those are no longer HTTP’s business, so they are beyond today’s scope.

Earlier, in the HTTPS articles, I introduced the SSLLabs test site; for HTTP performance there is a similar one called WebPageTest. Its distinguishing feature is test points all over the world: you can pick any geographical location, device model, operating system, and browser and launch a test, which is very convenient and easy to use.

The end result of the site testing is a visual “Waterfall Chart” that clearly shows the sequence and time consumption of all resources on a page, such as the one shown below on the GitHub front page.

Developer tools in browsers such as Chrome also provide a good look at client latency metrics, with the specific time consumed by each URI on the left side of the panel and a similar waterfall graph on the right.

Clicking a URI shows a small “waterfall” on the Timing tab that breaks down the resource’s elapsed time, listing the causes of the delay clearly, as shown in the following figure:

What do these indicators mean? Let me explain to you:

  • The request was queued (Queueing) for 1.62 seconds before the browser could process it, because the browser limits the number of concurrent connections it keeps to a single domain name;
  • Pre-allocating resources and dispatching the connection cost the browser 11.56 milliseconds (Stalled);
  • The domain name had to be resolved before connecting, which took only 0.41 milliseconds (DNS Lookup) thanks to the local cache;
  • Establishing the connection with the web server was expensive, 270.87 ms in total; the TLS handshake took 134.89 ms, so the TCP handshake took the other 135.98 ms (Initial Connection, SSL);
  • Actually sending the request data was very fast, taking 0.11 milliseconds (Request sent);
  • Then came the wait for the server’s response, known as Time To First Byte (TTFB), which includes the server’s processing time and the network transmission time; it took 124.2 milliseconds;
  • Receiving the response data was also very fast, taking 3.58 milliseconds (Content Download).

As you can see from this chart, latency makes up a staggering share of a single HTTP “request-response”: almost 99% of the total 415.04 ms.

So the key to client HTTP performance optimization is to reduce latency.

HTTP transport link

Starting with the basic HTTP request-response model, we’ve just gotten some metrics for HTTP performance optimization. Now, let’s zoom out to the “real world” and look at the transport link between the client and server, which is also critical to HTTP performance.

These are the well-known “first kilometer”, “middle kilometer”, and “last kilometer”.

“The first kilometer” is the website’s exit: the transmission line through which the server connects to the Internet. Its bandwidth directly determines the site’s capacity to serve the outside world, i.e., throughput and related metrics. Obviously, optimizing performance here means investing in this “first kilometer”: buy as much bandwidth as possible and connect to more carrier networks.

The “middle kilometer” is the actual Internet made up of many small networks. In fact, it is far more than “one kilometer”, but a very, very large and complex network. Geographical distance and network connectivity have a serious impact on transmission speed. The good news is that there is an HTTP helper, the CDN, that can help websites cross the “mountains and rivers” and make the distance really look like only “one kilometer”.

The “last kilometer” is the entrance for users to access the Internet. For fixed network users, it is optical fiber and network cable, and for mobile users, it is WiFi and base station. It used to be a major bottleneck for client performance, with high latency and low bandwidth, but with the spread of 4G and high-speed broadband in recent years, the “last mile” situation has gotten much better and is no longer a major performance constraint.

In addition to the “three kilometers”, I personally think there is another “zero kilometer”, which is the Web service system inside the website. It is also a small network (of course, it can be very large), and the processing and transmission of data in the middle can cause delays, increase the response time of the server, and is also a non-negligible optimization point.

In the whole Internet transmission link above, we can’t control the “last kilometer” at the end, so we can only focus on the “zero kilometer”, “first kilometer” and “middle kilometer” to increase bandwidth and reduce latency and optimize transmission speed.

But because we don’t have complete control over the client side, the actual optimization is usually done on the server side. Here can be subdivided into the back end and the front end, the back end refers to the website background services, and the front end is HTML, CSS, pictures and other code and data displayed in the client.

Given the general direction, what exactly should you do to optimize HTTP performance?

In general, optimization of any computer system can be divided along a few axes: hardware and software, internal and external, paid and free.

The simplest optimization is to spend money on off-the-shelf hardware: a stronger CPU, a faster network card, more bandwidth, more servers. The effect is immediate, directly raising the site’s service capacity, and HTTP optimization comes with it.

Spending money on external software or services is also an effective way to optimize, and the best value for money is the CDN. A CDN specializes in web content delivery: it solves the “middle mile” problem for websites and offers many other highly specialized optimizations. Handing a website to a CDN is like putting it on a jet plane: it reaches users directly and achieves excellent optimization with little effort.

However, these “pay-to-win” methods involve little real technique and are the “lazy” approach (no offense intended), so I won’t dwell on them. Next, I’ll focus on the “free” software optimizations within the website itself.

I’ve summed up HTTP performance optimization in three key words: open source, throttle, and cache.

Open source

This “open source” does not mean open source software; it refers to “opening up the source”: developing the latent potential of the website’s own servers, squeezing out as much service capacity as possible under existing conditions.

First, choose a high-performance web server. The best choice is of course Nginx/OpenResty; try not to use Java, Python, or Ruby based servers in this role, as they are better suited to the business-logic tier behind it. Use Nginx’s powerful reverse-proxy capability to achieve “dynamic/static separation”: route dynamic pages to Tomcat, Django, or Rails, and let Nginx serve images, style sheets, and other static resources itself.

Nginx and OpenResty themselves have many configuration parameters for further tuning, such as disabling load-balancing locks, enlarging connection pools, binding to CPUs, and so on; I won’t list them all here.

In particular, long connections (keep-alive) must be enabled for HTTP. The cost of establishing a new TCP or TLS connection is very high and can account for more than half of total client latency. Long connections cannot make the handshake itself faster, but they “amortize” its cost over multiple requests: only the first request pays the connection delay, and subsequent requests pay none, reducing overall average latency.
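A rough model shows why amortization helps; the handshake and request costs below are made-up illustrative numbers, not measurements:

```python
# Toy model of connection-setup amortization (illustrative numbers only).
handshake_ms = 270      # TCP + TLS handshake cost per new connection
request_ms = 125        # server processing + transfer per request
n_requests = 10

# Short connections: every request pays the handshake.
short_total = n_requests * (handshake_ms + request_ms)

# Long connection (keep-alive): only the first request pays it.
long_total = handshake_ms + n_requests * request_ms

print(short_total)  # 3950 ms
print(long_total)   # 1520 ms
```

The more requests a connection carries, the closer the per-request cost gets to the bare request time, which is exactly the effect keep-alive delivers.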

In addition, modern operating systems (Windows 10, iOS 9, Linux 4.1+) support the newer “TCP Fast Open” feature. Like TLS “False Start”, it allows data to be carried in the very first handshake packet, achieving “0-RTT”. Enable it in both the operating system and Nginx wherever possible to reduce handshake latency on the Internet and the intranet.

Here is a short example of an Nginx configuration that enables long connections and other optimization parameters and performs dynamic/static separation:

```nginx
server {
    listen 80 deferred reuseport backlog=4096 fastopen=1024;

    keepalive_timeout  60;
    keepalive_requests 10000;

    location ~* \.(png)$ {
        root /var/images/png/;
    }

    location ~* \.(php)$ {
        proxy_pass http://php_back_end;
    }
}
```

Throttling

Throttling refers to reducing the amount of data sent and received between the client and the server, so that more content can be transferred within the limited bandwidth.

The most basic throttling method is the “data compression” encoding built into HTTP. Besides the standard gzip, you can also actively try the newer compression algorithm br (Brotli), which compresses better.

When compressing, however, choose an appropriate compression level rather than chasing the highest ratio; otherwise the extra CPU work raises response time and lowers service capacity, and the gain is not worth the loss.
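A quick sketch with Python’s standard zlib library shows the trade-off: higher levels cost more CPU time for diminishing size gains (the exact ratios depend on the input):

```python
import time
import zlib

# Some repetitive "web page" text to compress.
data = b"<div class='row'><span>hello world</span></div>\n" * 2000

for level in (1, 6, 9):          # fastest, default, best compression
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(data)} -> {len(compressed)} bytes "
          f"in {elapsed * 1000:.2f} ms")
# Level 9 is usually barely smaller than level 6 but takes longer:
# chasing the maximum ratio wastes CPU for little bandwidth saved.
```

The same trade-off applies to gzip and br settings in Nginx: a middle level is almost always the right choice for dynamic responses compressed on the fly.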

gzip and br are general-purpose compression algorithms; for the various kinds of data carried over HTTP, we can also apply compression methods tailored to each type.

HTML/CSS/JS are plain text, so a special kind of “compression”, often called minification, can strip the whitespace, line breaks, and comments that only humans need. The minified text may look cluttered and unfriendly to people, but computers read it without trouble, and browser performance is unaffected.
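A minimal sketch of this kind of “compression” for CSS (real minifiers such as cssnano or terser handle many more edge cases, e.g. strings and calc() expressions):

```python
import re

def minify_css(css):
    """Naive CSS minifier: strips comments and needless whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # drop /* comments */
    css = re.sub(r"\s+", " ", css)                    # collapse whitespace
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)      # no spaces near punctuation
    return css.strip()

src = """
body {
    color: red;   /* brand color */
    margin: 0 auto;
}
"""
print(minify_css(src))  # body{color:red;margin:0 auto;}
```

In practice this step runs once at build time, so unlike gzip/br it costs the server nothing per request.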

Images account for a very large share of HTTP traffic. They are already compressed, so gzip or br gains little, but there is still room for optimization. For example, strip metadata such as when, where, and with what the photo was taken, and reduce resolution and dimensions appropriately. Format matters too: prefer high-compression formats, JPEG for lossy and WebP for lossless.

For small text or small pictures, there is another optimization method called “Concatenation”, which is to combine many small resources into one large resource, download it to the client with a request, and then the client uses JS and CSS to slice and use it. The advantage is to save the number of requests, but the disadvantage is that it is more troublesome to deal with.

HTTP/1 has no way to compress headers, but we can still shrink them by not sending unnecessary fields (such as Server and X-Powered-By).

Websites often use cookies to record user data, and the browser attaches them to every request to the site, which is highly redundant. So use cookies sparingly: reduce the amount of data they record, always use the Domain and Path attributes to limit their scope, and minimize their transmission. If the client is a modern browser, you can also use HTML5’s Web Local Storage and avoid cookies altogether.

In addition to compression, “throttling” has two optimization points, namely domain name and redirection.

DNS resolution takes time, and if your site spans many domain names, resolving them all is a non-trivial cost. So “shrink” the set of domain names, limiting it to about two or three: this reduces the total resolution time and lets clients get answers from the system cache as soon as possible.

The client latency caused by redirection is also very high, which not only adds a round-trip request, but also may cause DNS resolution of the new domain name, which is a big no-no for HTTP front-end performance optimization. You should avoid redirects unless necessary, or use “internal redirects” from the Web server.

Caching

“Caching” is the magic weapon of performance optimization in any computer system, not just HTTP. When combined with the “open source” and “throttling” techniques above and applied along the transport link, it can lift HTTP performance to a new level.

At the “zero kilometer”, i.e., inside the website’s own system, you can use dedicated cache services such as Memcached, Redis, or Varnish to keep intermediate results and resources in memory or on disk. The web server checks the cache first and, on a hit, returns the data immediately, saving the time of calling the backend services.

In the “middle mile”, caching is an important means of performance optimization, and the network acceleration function of CDN is based on caching. It can be said that if there is no caching, there will be no CDN.

The key to exploiting caching is understanding how it works: attach ETag and Last-Modified fields to every resource, and set the Cache-Control and Expires headers.

The most basic of these is the max-age lifetime, which marks how long a resource may be cached. For static resources such as images and CSS, you can set a long time, say a day or a month; for dynamic resources, unless real-time freshness is essential, you can still set a short time, say one second or five seconds.

In this way, once the resource reaches the client, it will be cached and no more requests will be sent to the server during the validity period, i.e., “A request without a request is the fastest request.”
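The validation half of this machinery can be sketched as server-side logic (a simplified model; real servers also honor If-Modified-Since, Vary, and other details):

```python
import hashlib

def make_etag(body):
    """Derive a strong ETag from the resource content."""
    return '"' + hashlib.sha1(body).hexdigest()[:16] + '"'

def respond(body, if_none_match=None):
    """Return (status, headers, body) for a possibly conditional GET."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=30"}
    if if_none_match == etag:
        return 304, headers, b""     # client's cached copy is still valid
    return 200, headers, body

page = b"<h1>hello</h1>"
status1, headers, _ = respond(page)                  # first visit: full 200
status2, _, body2 = respond(page, headers["ETag"])   # revalidation: empty 304
print(status1, status2, len(body2))
```

A 304 still costs a round trip, which is why max-age matters too: within the lifetime the client skips even the conditional request.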

HTTP/2

Beyond the three strategies of “open source”, “throttling”, and “caching”, there is another “shortcut” for HTTP performance optimization: upgrading the protocol from HTTP/1 to HTTP/2.

From the earlier HTTP/2 article, you already know its many advantages: it eliminates application-layer head-of-line blocking and adds header compression, binary framing, multiplexing, flow control, server push, and other new features that greatly improve HTTP transport efficiency.

These features play on both “open source” and “throttling”, but because they are already built into the protocol, websites can get an immediate performance boost by simply switching to HTTP/2.

Be aware, however, that some optimizations made in HTTP/1 can backfire in HTTP/2.

For HTTP/2, one domain name over one TCP connection gives the best performance. Splitting a site across multiple domain names wastes bandwidth and server resources and lowers HTTP/2’s efficiency, so “domain name shrinkage” is a must in HTTP/2.

“Resource merge” reduces the cost of multiple requests in HTTP/1, but in HTTP/2 the cost of transferring small files is low because of header compression and multiplexing, so merge makes no sense. Another disadvantage of “resource merge” is that it reduces the availability of the cache, and as soon as a small file is updated, the entire cache is completely invalidated and must be re-downloaded.

Therefore, in the current scenario of large bandwidth and CDN application, resource merging (JS, CSS image merging, data embedding) should be minimized to make the granularity of resources as small as possible, so as to better play the role of cache.

Summary

  1. Performance optimization is a complex concept, which can be divided into server performance optimization, client performance optimization and transmission link optimization in HTTP.
  2. A server has three main performance metrics: throughput, concurrency, and response time, plus resource utilization.
  3. The basic performance indicator of the client is latency, which is affected by geographical distance, bandwidth, DNS query, and TCP handshake.
  4. The transmission link from the server to the client can be divided into three parts, and we can optimize the first two parts, namely the “first kilometer” and “middle kilometer”;
  5. There are many tools for measuring these metrics, such as ab, top, and sar on the server side; on the client side you can use test websites and browser developer tools.
  6. Spending money to buy hardware, software or services can directly improve the service capability of the website, among which the most valuable is CDN;
  7. You can optimize HTTP without spending money. The three key words are “open source”, “throttle” and “cache”.
  8. On the back end, use a high-performance web server and enable long connections to improve TCP transmission efficiency;
  9. On the front end, enable gzip and br compression to shrink text and images, and send as few unnecessary header fields as possible;
  10. Caching is a performance-optimization weapon that should never be forgotten; resources should always be marked with ETag or Last-Modified fields;
  11. Upgrading to HTTP/2 provides many immediate performance gains, but be aware of some HTTP/1 “anti-patterns.”

Series

  • HTTP overview

  • TCP three-way handshake and four-way Wave (Finite State machine)

  • From the time you type in the url to the time you see the page — explain what happens in between

  • HTTPS (Detailed Version)

  • Rambling on about HTTP connections

  • Rambling on HTTP performance tuning

  • Introduction to HTTP packet format

  • Easy to understand: HTTP/2

 

Reference

Perspective HTTP Protocol