Preface

To reduce load times, most of us have tried the following optimizations:

  • Keep-alive: enables persistent TCP connections, improving connection reuse. However, the client can send the next request only after the previous request-response exchange has completed
  • Pipelining: sends multiple requests at once, but the server must respond in exactly the order the requests were sent. If the first response is delayed, all subsequent responses are blocked
  • Request merging: sprites, CSS/JS inlining, CSS/JS concatenation, etc. However, merging requests brings cache invalidation, slow parsing, blocked rendering, the barrel (weakest-link) effect, and many other problems
  • Domain sharding: bypasses the browser's limit of six TCP connections per domain, but increases DNS and TCP overhead and significantly reduces cache utilization
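
The head-of-line blocking behind pipelining (second bullet above) can be sketched with a toy model. The timings below are illustrative assumptions, not real network measurements:

```javascript
// Toy model of HTTP/1.1 pipelining: responses must be delivered in request
// order, so a response that is ready early still waits for earlier ones.
// readyTimes[i] = time (ms) at which the server finishes producing response i.
function pipelinedDeliveryTimes(readyTimes) {
  const delivered = []
  let prev = 0
  for (const ready of readyTimes) {
    prev = Math.max(prev, ready) // cannot go out before the previous response
    delivered.push(prev)
  }
  return delivered
}

// One slow first response (500 ms) blocks two fast ones (10 ms each):
console.log(pipelinedDeliveryTimes([500, 10, 10])) // [ 500, 500, 500 ]
// With HTTP/2 multiplexing there is no ordering constraint, so delivery
// times would simply be the ready times: 500, 10, 10.
```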

Admittedly, these optimizations reduce site load times to some extent, but they are just the tip of the iceberg when it comes to the sheer volume of requests for a Web application.

The problems above ultimately come down to the HTTP/1.1 protocol itself; to fundamentally fix its inefficiency, the protocol itself has to change. To this end, Google developed the SPDY protocol, mainly to reduce transmission time. Based on SPDY, the IETF (together with the SPDY group) developed HTTP/2, which was officially published as RFC 7540 in May 2015. Neither SPDY nor HTTP/2 is an entirely new protocol: they change how HTTP requests and responses travel over the network by adding a framing layer that processes, flags, simplifies, and compresses HTTP messages, so existing programs keep working. Supporting clients gain speed from the new features, while unsupported scenarios degrade gracefully.

HTTP/2 inherits many of SPDY's best features, such as multiplexing and prioritization, with further improvements. One notable change is that HTTP/2 replaces SPDY's dynamic stream compression with a purpose-built algorithm, HPACK, to avoid compression-oracle attacks (such as CRIME) on the protocol.

Most major browsers supported the standard by the end of 2015. Specific support is as follows:

(Figure: browser support for HTTP/2. Data source)

As the figure shows, 58.55% of browsers in China fully support HTTP/2, while global support is as high as 85.66%. With support that high, aren't you tempted?

Why HTTP/2

Binary format transfer

We know that HTTP/1.1 headers are always text (ASCII encoded), while the body can be either text or binary (any conversion is up to the application; the protocol itself does not convert it). HTTP/2 adds a binary framing layer that converts the data to binary, which means everything in HTTP/2 is transmitted in binary form.

Are there any benefits to using binary? Of course! It is more efficient to parse and, most importantly, it makes it practical to define additional frame types that would be cumbersome to parse as text. HTTP/2 defines ten frame types, including DATA, HEADERS, PING, SETTINGS, PRIORITY, and PUSH_PROMISE frames, which lay a foundation for advanced applications in the future.

(Figure: the new binary framing layer.)
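
The framing can be made concrete with a small parser sketch. Per RFC 7540, every HTTP/2 frame begins with a fixed 9-byte header: a 24-bit payload length, an 8-bit type, 8-bit flags, and a 31-bit stream identifier:

```javascript
// Parse the fixed 9-byte header that precedes every HTTP/2 frame payload.
const FRAME_TYPES = ['DATA', 'HEADERS', 'PRIORITY', 'RST_STREAM', 'SETTINGS',
                     'PUSH_PROMISE', 'PING', 'GOAWAY', 'WINDOW_UPDATE', 'CONTINUATION']

function parseFrameHeader(buf) {
  return {
    length: buf.readUIntBE(0, 3),              // 24-bit payload length
    type: FRAME_TYPES[buf.readUInt8(3)],       // 8-bit frame type
    flags: buf.readUInt8(4),                   // 8-bit flags
    streamId: buf.readUInt32BE(5) & 0x7fffffff // 31-bit stream ID (reserved bit masked)
  }
}

// A HEADERS frame (type 0x1) with an 8-byte payload on stream 1:
const header = Buffer.from([0x00, 0x00, 0x08, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])
console.log(parseFrameHeader(header))
// { length: 8, type: 'HEADERS', flags: 4, streamId: 1 }
```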

Multiplexing

The binary framing layer converts the data into binary and splits it into frames. A frame is the smallest unit of data transfer in HTTP/2. Each frame carries a stream_ID field indicating which stream it belongs to, and the receiver reassembles all frames with the same stream_ID to recover the transmitted content. A stream is a logical concept in HTTP/2: it corresponds to one request or one response in HTTP/1.1 terms. The protocol specifies that streams initiated by the client have odd stream_IDs, and streams initiated by the server have even stream_IDs. Note that a stream is purely a logical grouping, convenient to understand and remember; it does not actually exist on the wire.
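
What the receiver does can be sketched as follows (the frame objects here are a simplified model, not the wire format): group frames by stream_ID, and determine who initiated each stream from the ID's parity:

```javascript
// Reassemble interleaved frames: concatenate payloads per stream_ID.
function reassembleStreams(frames) {
  const streams = new Map()
  for (const { streamId, payload } of frames) {
    streams.set(streamId, (streams.get(streamId) || '') + payload)
  }
  return streams
}

// Per the spec: client-initiated streams get odd IDs, server-initiated even.
const initiatedBy = (streamId) => (streamId % 2 === 1 ? 'client' : 'server')

// Frames from streams 1 and 3 arrive interleaved over one connection:
const frames = [
  { streamId: 1, payload: 'GET /user' },
  { streamId: 3, payload: 'GET /sty' },
  { streamId: 1, payload: '.html' },
  { streamId: 3, payload: 'le.css' }
]
const streams = reassembleStreams(frames)
console.log(streams.get(1)) // GET /user.html
console.log(streams.get(3)) // GET /style.css
console.log(initiatedBy(2)) // server (e.g. a pushed stream)
```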

With an understanding of frames and streams, complete HTTP/2 communication can be represented graphically as follows:

You can see that within one TCP connection, frames can be sent in both directions at the same time, and frames from different streams can be interleaved, without waiting for one stream to finish before sending the next. In other words, a single TCP connection can carry multiple streams, that is, multiple HTTP requests and responses, simultaneously. This transmission is not bound by first-in-first-out ordering, so one slow response no longer blocks the others, which is highly efficient.

In this mode of transport, HTTP requests become cheap and we no longer have to worry about whether the site has too many HTTP requests, too many TCP connections, or whether it will block.

HPACK header compression

Why compress?

In HTTP/1.x, requests and responses consist of three parts: a start line (request line or status line), headers, and the message body. In general, the body is gzip-compressed or is itself a compressed binary format (such as images or audio), but the start line and headers are not compressed at all; they are sent as plain text.

As Web applications grow more complex, the number of requests per page keeps rising. According to HTTP Archive, the average page now generates hundreds of requests. More requests means more traffic spent on headers, and fields that rarely change, such as User-Agent and Cookie, are retransmitted in full every time, which is pure waste.
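
A back-of-the-envelope calculation shows how fast this adds up. The 500-byte header size and 100-request page are illustrative assumptions, not measurements:

```javascript
// Illustrative numbers: a cookie-laden request header of ~500 bytes and a
// page that makes 100 requests (both assumptions, not measured data).
const headerBytes = 500
const requestsPerPage = 100

// Total header traffic for one page load:
const totalHeaderBytes = headerBytes * requestsPerPage
console.log(totalHeaderBytes) // 50000, i.e. ~48.8 KB of headers

// If User-Agent, Cookie, etc. are identical on every request, almost all of
// the header traffic after the first request is redundant:
const redundantBytes = headerBytes * (requestsPerPage - 1)
console.log(redundantBytes) // 49500
```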

To reduce the cost of redundant header information, HTTP/2 uses the HPACK algorithm to compress request and response headers. The following diagram illustrates the principle of HPACK header compression very intuitively:

(Diagram source: Velocity 2015 • SC conference)

The specific rules can be described as follows:

  • The two endpoints jointly maintain a static table of common header name-value combinations
  • Each endpoint also maintains a dynamic table, to which newly seen entries are added on a first-in-first-out basis
  • Literal data is Huffman-encoded using a static Huffman code table

When a request is sent, its headers are first compared against the static table. A fully matched name-value pair can be represented by a single index, such as 2 for :method: GET in the figure above. For a pair whose name matches but whose value does not, the name is sent as an index, such as 19 for :path in the figure, followed by the literal value /resource; the sender also tells the receiver to add the pair to the dynamic table, so that from then on the same name-value pair is represented by one index. In this way, values like cookies, which don't change very often, are sent in full only once.
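
These lookup rules can be sketched with a toy encoder. This is a deliberately simplified model: real HPACK (RFC 7541) has a 61-entry static table, a size-bounded dynamic table, and Huffman coding, and the indices below are illustrative only:

```javascript
// Toy HPACK-style encoder (illustrative indices, not the real RFC 7541 tables).
const staticTable = [
  [':method', 'GET'],   // real HPACK assigns this index 2; here it is position 0
  [':method', 'POST'],
  [':path', '/'],
  ['cookie', '']
]

function encode(name, value, dynamicTable) {
  const tables = staticTable.concat(dynamicTable)
  const full = tables.findIndex(([n, v]) => n === name && v === value)
  if (full !== -1) return { index: full }        // fully indexed: one number on the wire
  const nameIndex = tables.findIndex(([n]) => n === name)
  dynamicTable.push([name, value])               // both endpoints add the new pair
  return { nameIndex, literal: value }           // literal value is sent only this once
}

const dyn = []
console.log(encode(':method', 'GET', dyn))  // { index: 0 }
console.log(encode('cookie', 'id=42', dyn)) // { nameIndex: 3, literal: 'id=42' }
console.log(encode('cookie', 'id=42', dyn)) // { index: 4 } (now in the dynamic table)
```

The third call shows why repeated cookies are so cheap: after the first transmission, the whole name-value pair collapses to one index.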

Server push

Before we get to HTTP/2 server push, let's look at how an HTTP/1.1 page loads.


      
<html>
<head>
  <link rel="stylesheet" href="style.css">
  <script src="user.js"></script>
</head>
<body>
  <h1>hello http2</h1>
</body>
</html>
  1. The browser requests /user.html from the server
  2. The server processes the request and sends /user.html to the browser
  3. The browser parses the received /user.html and finds it also needs the static resources /user.js and style.css
  4. The browser sends two more requests for /user.js and style.css
  5. The server responds to both requests with the resources
  6. The browser receives the resources and renders the page

At this point the page has loaded and the user can see it. Notice that in steps 3 and 4 the server sat idle and waiting, and the browser could not render the page until it received the resources in step 6, which makes the first load slow.

HTTP/2's server push allows the server to push resources to the browser before receiving a request for them. After sending /user.html, the server can push /user.js and style.css to the browser, so the resources reach it in advance. This is the only HTTP/2 feature that requires developer configuration; everything else is handled automatically by the server and browser, without developer intervention.

The HTTP/1.1 era already had ways to fetch resources ahead of time, such as preload and prefetch. Preload tells the browser, early in page parsing, that a resource will be needed right away, so the request can be sent immediately and the resource is ready for use without waiting on a round trip. Prefetch marks a resource that is not used on the current page but may be used on the next one; it has lower priority, and the browser only requests it when idle. At the application level, preload and server push achieve similar ends, but server push does slightly better at cutting out the browser's request time, and in some scenarios the two can be used together.
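
For reference, the two hints look like this (a sketch; /user.js follows the earlier example, and /next-page.js is a hypothetical URL):

```html
<!-- preload: needed by THIS page, request it immediately at high priority -->
<link rel="preload" href="/user.js" as="script">
<!-- prefetch: likely needed by the NEXT page, fetch it when the browser is idle -->
<link rel="prefetch" href="/next-page.js">
```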

Hands-on practice

Knowledge from paper alone is shallow, so let's put it into practice: build our own HTTP/2 demo and verify it with packet capture.

The spdy library implements HTTP/2 and also provides Express support, so here I use spdy + Express to build the demo; see the demo source code.

Path description:

ca/          certificate and key files
src/
  img/
  js/
  page1.html
server.js

HTTPS key and certificate

Although HTTP/2 exists in both encrypted (h2) and unencrypted (h2c) forms, most major browsers only support h2, that is, connections encrypted with TLS 1.2 or above. So before setting up the demo, we first issue a certificate so the browser can access it over HTTPS. You can look up certificate issuance yourself, or follow these steps to generate one.

First install OpenSSL, then run the following commands:

$ openssl genrsa -des3 -passout pass:x -out server.pass.key 2048
$ openssl rsa -passin pass:x -in server.pass.key -out server.key
writing RSA key
$ rm server.pass.key
$ openssl req -new -key server.key -out server.csr
$ openssl x509 -req -sha256 -days 365 -in server.csr -signkey server.key -out server.crt

You will then have three files: server.crt, server.csr, and server.key. Copy them into the ca folder; we will use them later.

Setting up the HTTP/2 service

Express is a Node.js web framework. Here we use it to declare the route / and return page1.html, an HTML file that references static resources such as JS files and images.

// server.js
const fs = require('fs')
const http2 = require('spdy')
const express = require('express')
const app = express()
const publicPath = 'src'

app.use(express.static(publicPath))

app.get('/', function (req, res) {
    res.setHeader('Content-Type', 'text/html')
    res.sendFile(__dirname + '/src/page1.html')
})

var options = {
    key: fs.readFileSync('./ca/server.key'),
    cert: fs.readFileSync('./ca/server.crt')
}

http2.createServer(options, app).listen(8080, () => {
    console.log('Server is listening on https://127.0.0.1:8080')
})

Visit https://127.0.0.1:8080/ with a browser and open the console to see all the requests and their waterfall diagrams:

As you can clearly see, the browser does not start requesting static resources such as JS files and images until the first (document) request has fully returned and been parsed. As mentioned earlier, server push lets the server proactively push resources to the browser. Could we push the JS and image files to the browser before the first request even completes? That would make full use of HTTP/2 multiplexing and reduce the server's idle waiting time.

Now modify the route handler:

app.get('/', function (req, res) {
+   push('/img/yunxin1.png', res, 'image/png')
+   push('/img/yunxin2.png', res, 'image/png')
+   push('/js/log3.js', res, 'application/javascript')
    res.setHeader('Content-Type', 'text/html')
    res.sendFile(__dirname + '/src/page1.html')
})

// note: server.js also needs const path = require('path')
function push (reqPath, target, type) {
    let content = fs.readFileSync(path.join(__dirname, publicPath, reqPath))
    let stream = target.push(reqPath, {
        status: 200,
        method: 'GET',
        request: { accept: '*/*' },
        response: {
            'content-type': type
        }
    })
    stream.on('error', function () {})
    stream.end(content)
}

Take a look at the waterfall diagram with Server Push:

Clearly, the pushed static resources become usable quickly, while resources that are not pushed, such as log1.js and log2.js, take longer before they can be used.

The browser console only shows so much, so let's try something more interesting.

Verifying with Wireshark packet capture

Wireshark is a packet capture tool that can recognize HTTP/2 packets. It reads and analyzes network adapter data, which lets us verify HTTP/2's behavior and its underlying communication principles.

To install Wireshark, download the installer from the Wireshark website.

We know that HTTP/2 requests and responses are broken up into frames. If we were to grab HTTP/2 packets, we would only be able to capture the data frame by frame, like this:

As you can see, all the captured packets are of type TCP (red box). The first three packets (green boxes) are SYN, [SYN, ACK], and ACK: the TCP three-way handshake. The yellow box in the lower right shows the total number of TCP packets captured after requesting the page. The page makes only seven or eight requests, yet 334 packets were captured, which confirms that HTTP/2 requests and responses really are broken up into frames.

So how do we see actual HTTP/2 frames rather than raw TCP segments? Wireshark can automatically reassemble frames that share the same stream_ID so we can see the actual requests and responses, but because we are using HTTPS, all the data is encrypted and Wireshark has no way to reassemble it.

There are two ways to decrypt HTTPS traffic in Wireshark: first, use the server's private key to decrypt it; second, some browsers can store the symmetric keys used in TLS sessions in an external file for Wireshark to use.

However, for forward secrecy HTTP/2 does not allow RSA key exchange, so the first method cannot decrypt HTTP/2 traffic. As for the second: if the environment variable SSLKEYLOGFILE is set, Chrome and Firefox will save the TLS session keys to the file it points to, and importing that file into Wireshark lets it decrypt HTTP/2 traffic.

  1. Create an ssl.log file
  2. Set the system environment variable SSLKEYLOGFILE to point to the file created in step 1
  3. In Wireshark, open Preferences -> Protocols, find SSL, and under "(Pre)-Master-Secret log filename" select the file created in step 1

Now visit any HTTPS page with Chrome or Firefox, and you should see key data written to ssl.log.
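
Steps 1 and 2 can be done from a terminal like this (a macOS/Linux sketch; the exact browser launch command varies by platform):

```shell
# Steps 1 and 2: create the key log file and point SSLKEYLOGFILE at it
export SSLKEYLOGFILE="$PWD/ssl.log"
touch "$SSLKEYLOGFILE"

# The browser must be launched from this same shell so it inherits the
# variable, for example:
#   google-chrome https://127.0.0.1:8080             # Linux
#   open -a "Google Chrome" https://127.0.0.1:8080   # macOS
```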

Once decrypted, we can finally see the HTTP/2 packets.

The following image shows the packets captured while loading the demo home page; the HTTP/2 requests are clearly visible.

Frames from different HTTP/2 streams can be interleaved over one TCP connection, so all communication with the server can be completed over a single TCP connection.

For each packet, Wireshark displays information such as size, source and destination IP address, port, data, and protocol. Under Transmission Control Protocol it also shows a Stream index, as in the figure below, which identifies the TCP connection the packet was transferred on. Looking at the packets generated by the demo page request, they all share the same Stream index, indicating that these HTTP/2 requests and responses were transmitted over one TCP connection: all those streams really are multiplexed onto a single TCP connection.

Besides multiplexing, we can also observe HTTP/2 header compression in the capture. The figure below shows the first request on the current route: the header as actually transmitted is 253 bytes, while the decompressed header is 482 bytes, so compression cuts the size almost in half.

And that is only the first request. Look at a later one, such as the third: the transmitted header is only 30 bytes, while the decompressed header is 441 bytes, so the compressed size is about 1/14 of the original! Given that a single page can easily generate hundreds of requests, HPACK can save a great deal of traffic.

Conclusion

At the beginning of the article we enumerated the dilemmas of the HTTP/1.x era and briefly introduced the origins of HTTP/2. The middle part explained HTTP/2's core features: binary framing, multiplexing, HPACK header compression, and server push. The last part showed how to build an HTTP/2 example step by step, capture its packets, and verify multiplexing, header compression, and other characteristics. Finally, do these efficiency features appeal to you? Give them a try!

Reference:

  • HTTP/2 Wikipedia
  • w3c-preload
  • HTTP/2 Server Push with Node.js
  • Use Wireshark to debug HTTP/2 traffic
  • Optimize Your App with HTTP/2 Server Push Using Node and Express
  • Web Performance optimization with HTTP/2

For more technical articles and industry insights, follow the NetEase Yunxin blog.

NetEase Yunxin is a PaaS product built on 18 years of NetEase's IM and audio/video technology. Based on NetEase's core architecture, it provides stable, easy-to-use, and full-featured communication and video cloud services, committed to offering leading technical capabilities and scenario-based solutions. By integrating the client SDK and open APIs, developers can quickly implement features including IM, audio and video calls, live streaming, video on demand, interactive whiteboards, and SMS.