From HTTP/1.* to HTTP/2.0

What is HTTP

HTTP (HyperText Transfer Protocol) is an application-layer protocol used to transfer hypermedia documents.

The role of HTTP

HTTP transports hypermedia (an extension of the term hypertext: non-linear media that mixes pictures, audio, video, text, and hyperlinks) and is the basis for data communication on the World Wide Web. HTTP was originally designed as a way to publish and receive HTML pages, using URIs to identify each unique resource

Hypertext is text, displayed on a computer monitor or other electronic device, that contains hyperlinks to other fields or documents, allowing the reader to jump directly from the current position to the text a hyperlink points to. Today hypertext generally exists in the form of electronic documents

HTTP features

  • Client-server model: the client initiates a Request and the server returns a Response; data is exchanged through this cycle
  • Text protocol (in HTTP/1.*)
  • Stateless protocol: HTTP itself does not preserve communication state between requests
  • Connectionless protocol: each HTTP exchange requires a TCP connection to be established and then torn down

When HTTP was first created, it was used primarily to transmit text, possibly with hyperlinks. Back then content was far less rich, typography far less elegant, and interactions far less complex, and HTTP was fine for such simple scenarios

However, with the rapid development of the Internet and the rise of Web 2.0 came richer content, more elaborate layouts, and more complex interactions. Sites grew far larger than the early ones, to say nothing of social and e-commerce sites, and the sheer number of requests and requested files created a problem: load speed. Two main factors affect network requests, bandwidth and latency; holding bandwidth constant, the rest of this article analyzes HTTP latency

HTTP latency analysis

Text protocol

HTTP1.* is a text protocol. Text protocols are readable and extensible, but compared with binary protocols they waste bandwidth and transmit less efficiently. So how does HTTP2 improve performance without changing the semantics of HTTP1.*?

Binary framing: HTTP2 adds a binary framing layer between HTTP and TCP


HTTP1.* semantics are preserved, but the on-the-wire encoding changes. HTTP1.* delimits plain text with newlines, while HTTP2 splits all transmitted information into Messages and Frames and encodes them in binary

```
HTTP1.* Headers     => HTTP2 HEADERS frame
HTTP1.* Body Entity => HTTP2 DATA frame
```

  • Stream: a virtual channel within an HTTP2 connection that carries bidirectional traffic.

    • A connection can carry multiple streams, each with a unique identifier. To prevent client and server identifiers from colliding, clients use odd IDs and servers use even IDs. Stream identifier zero (0x0) is reserved for connection control messages and cannot be used to create a new stream
    • Each Stream can be assigned an integer weight between 1 and 256 to control its priority
    • A Stream can declare an explicit dependency on another Stream

    The combination of stream dependencies and weights lets a client build and pass a “priority tree” indicating how it prefers to receive responses (Google – stream prioritization)

  • Message: represents a complete request or response, consisting of one or more frames

  • Frame: the basic unit of HTTP2 transmission; HTTP headers travel in HEADERS frames and bodies in DATA frames
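
As a concrete sketch (not an official implementation), the fixed 9-octet frame header defined in RFC 7540 §4.1 can be parsed in Node as below; the frame-type subset and the sample bytes are illustrative only:

```javascript
// Sketch of parsing an HTTP/2 frame header (RFC 7540 §4.1).
// Every frame starts with a fixed 9-octet header:
//   24-bit payload length, 8-bit type, 8-bit flags,
//   1 reserved bit + 31-bit stream identifier.
const FRAME_TYPES = { 0x0: 'DATA', 0x1: 'HEADERS' }; // subset, for illustration

function parseFrameHeader(buf) {
  const length = buf.readUIntBE(0, 3);                // 24-bit payload length
  const type = buf.readUInt8(3);                      // frame type
  const flags = buf.readUInt8(4);                     // type-specific flags
  const streamId = buf.readUInt32BE(5) & 0x7fffffff;  // drop the reserved bit
  return { length, type: FRAME_TYPES[type] || type, flags, streamId };
}

// Example: a HEADERS frame header for stream 1, 13-byte payload,
// END_HEADERS flag (0x4) set
const header = Buffer.from([0x00, 0x00, 0x0d, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01]);
console.log(parseFrameHeader(header));
// { length: 13, type: 'HEADERS', flags: 4, streamId: 1 }
```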

Mapping an HTTP1.* request to frames


Connection cannot be reused

Early HTTP could not reuse connections: each HTTP request opened its own TCP connection, which was torn down when the response completed. This hurts performance for the following reasons

  1. The TCP three-way handshake costs an extra 1 RTT, which adds latency
  2. Because of the slow-start phase of TCP’s congestion control, a new connection takes a while to reach full transfer efficiency

Round-Trip Time (RTT) is the interval between the sender sending data and receiving the receiver’s acknowledgement. You can measure it with ping or tcpping

Keep-Alive

So HTTP1.* introduced the Keep-Alive header

```
Connection: keep-alive   # default in HTTP1.1; HTTP1.0 must set the header explicitly
Keep-Alive: timeout=10   # optional: how long an idle connection stays open

Connection: close        # the persistent connection is closed, or the wait timed out
```

Note that Connection is a hop-by-hop header: it is valid for a single transport link only. When a cache server or proxy server encounters a Connection header, it does not forward it

Persistent connections also have disadvantages

  1. Even when idle, they consume server resources
  2. Under heavy server load they create an opening for DoS attacks. In that scenario, non-persistent connections can be used to close idle connections promptly and improve performance

Other ways to reuse connections

Beyond the Keep-Alive header, there are other ways to obtain long-lived (or pseudo-long-lived) connections

  1. HTTP polling, HTTP long polling
  2. HTTP Stream
  3. WebSocket

Head-of-line blocking

By default, HTTP requests are issued sequentially: the next request is sent only after the response to the current one arrives. That adds enormous delay

Pipelining

HTTP1.1 introduced Pipelining, which lets a client send multiple requests back to back over one connection without waiting for each response.


However, it also has the following disadvantages

  1. The server may not support it
  2. Proxy servers may not support it
  3. Although requests can be sent back to back, responses are returned in strict FIFO order (a response is not returned or processed first just because it is ready first), which can still cause head-of-line blocking at the front of the queue

See the Chromium HTTP Pipelining documentation; Chromium does not enable HTTP Pipelining by default

Domain sharding

Requests in HTTP1.* are serialized, even when they have no inherent order. To gain concurrency, browsers open multiple connections per domain name. The default used to be 2 or 3 connections; 6 concurrent connections per domain is now common.

I mostly use Firefox and Chrome. Firefox lets you change the number of concurrent connections through about:config; Chrome does not

To raise concurrency further, resources can be served from different domain names. The trade-offs:

  1. It spends extra resource consumption to reduce user waiting time
  2. As a side benefit, the data per request can shrink: static resource servers don’t need cookies, so splitting those resources onto their own domain reduces transfer size

Domain sharding increases concurrency, but not without limit, for the following reasons

  1. Resource consumption
  2. Each additional domain name incurs a “DNS resolution + three-way handshake + slow start” delay, so you need to find a balance. Tip: `dns-prefetch` and `preconnect` can help optimize this

Multiplexing

While HTTP1.1 provides Pipelining, it does not completely solve head-of-line blocking, and because of these shortcomings browsers ship with it off by default or disabled entirely. HTTP2.0 therefore implements Multiplexing as a new kind of pipeline


Multiplexing lets a single HTTP2 connection to a domain carry multiple request-response exchanges (Messages). Each Message is split into multiple frames, frames can be transmitted out of order, and the client finally splices the frames back into Messages. Note that out-of-order transmission applies across requests: frames of different Messages can be interleaved with each other, but frames of the same Message still follow FIFO order

Multiplexing has the following characteristics:

  1. A single TCP connection can host multiple Streams
  2. Multiple requests and responses are interleaved in parallel, meaning one connection sends multiple requests and responses at the same time
  3. You no longer need workarounds for HTTP/1.x limits, such as domain sharding and keep-alive
  4. Page load times drop by eliminating unnecessary latency and improving utilization of available network capacity
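
The frame interleaving and per-stream reassembly described above can be modeled with a toy sketch (this is a simplified model, not real HTTP2 framing; the stream IDs and payloads are made up):

```javascript
// Toy demonstration of HTTP/2 frame interleaving: frames from different
// streams may be interleaved on the wire, but frames of the same stream
// stay in FIFO order, so the receiver can reassemble messages per stream ID.
function reassemble(wireFrames) {
  const messages = {};
  for (const { streamId, payload } of wireFrames) {
    messages[streamId] = (messages[streamId] || '') + payload;
  }
  return messages;
}

// Frames of streams 1 and 3 interleaved on a single connection
const wire = [
  { streamId: 1, payload: 'GET /a ' },
  { streamId: 3, payload: 'GET /b ' },
  { streamId: 3, payload: '(part 2)' },
  { streamId: 1, payload: '(part 2)' },
];
console.log(reassemble(wire));
// { '1': 'GET /a (part 2)', '3': 'GET /b (part 2)' }
```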

HTTP2.0 connections are persistent, and only one connection per domain is needed. Over HTTPS this reduces TLS/SSL overhead, raises the session reuse rate, and cuts resource consumption on both the server and the client

Multiplexing solves HTTP1.*’s head-of-line blocking at the HTTP layer, but it surfaces a new problem that is not HTTP’s own

HTTP2.0 removes HTTP-level blocking and so reduces HTTP latency, but TCP head-of-line blocking remains. Because HTTP2 runs over a single TCP connection, its performance degrades as the packet loss rate rises: at around 2% loss (very poor network quality), tests show HTTP/1.* users fare better, because HTTP/1.* usually has multiple TCP connections, and even if one of them is blocked, the other connections that are not losing packets keep transmitting

Header overhead

Each HTTP transfer carries a set of headers describing the transferred resource and its properties. In HTTP/1.*, this metadata is always sent as plain text and typically adds 500-800 bytes of overhead per transfer, sometimes thousands of bytes if HTTP cookies are used (see Measuring and Controlling Protocol Overhead). To reduce this overhead and improve performance, HTTP/2 compresses request and response header metadata with the HPACK compression format

HTTP2 header compression HPACK

HPACK compresses headers as follows

  1. Header fields can be encoded with a static Huffman code to shrink their transfer size. If the Huffman encoding would be larger than the raw data, the raw data is sent instead
  2. Both the client and the server maintain and update an index table of previously seen header fields
    • For data already in the index table, only the index is sent
    • Data not yet in the index table is sent literally and cached, so only its index is sent the next time
    • Duplicate or unchanged data is not re-sent
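
The indexing idea in point 2 can be sketched as a toy encoder. This is illustrative only, not the RFC 7541 wire format; the header name, value, and starting index are assumptions (HPACK’s static table occupies indexes 1-61, so dynamic entries start at 62):

```javascript
// Toy sketch of the indexing idea behind HPACK: the first time a header
// is seen it is sent literally and added to a shared table; afterwards
// only its index is sent.
class ToyHeaderTable {
  constructor() {
    this.table = new Map();
    this.nextIndex = 62; // indexes 1-61 are the HPACK static table
  }
  encode(name, value) {
    const key = `${name}: ${value}`;
    if (this.table.has(key)) {
      return { index: this.table.get(key) };  // already indexed: send index only
    }
    this.table.set(key, this.nextIndex);
    return { literal: key, addIndex: this.nextIndex++ }; // first sighting: send literal
  }
}

const enc = new ToyHeaderTable();
console.log(enc.encode('origin', 'https://example.com')); // sent literally, indexed as 62
console.log(enc.encode('origin', 'https://example.com')); // now only { index: 62 } is sent
```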

Note: in HTTP/2 the definition of request and response header fields is essentially unchanged, with minor differences: all header field names are lowercase, and the request line is now split into the :method, :scheme, :authority, and :path pseudo-header fields

The index table

Index tables are classified into static index tables and dynamic index tables

Static index table

A static index table is a predefined, ordered, read-only set of header fields

| Index | Header Name | Header Value |
| --- | --- | --- |
| 1 | :authority | |
| 2 | :method | GET |
| 3 | :method | POST |
| 4 | :path | / |
| 5 | :path | /index.html |
| … | … | … |

The full static table can be found in RFC 7541’s static table definition

Dynamic index table

The dynamic table is maintained first-in, first-out (as a queue). It starts empty, and new entries are added as each header block is decompressed. Entries are always inserted at the lowest dynamic index: the newest entry has the lowest index and the oldest the highest. The dynamic table may contain duplicate entries
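
The insertion order described above can be modeled with a toy dynamic table (a simplified model, not real HPACK: size-based eviction is omitted, and the header names and values are made up):

```javascript
// Toy model of the HPACK dynamic table: new entries enter at the lowest
// dynamic index and older entries shift toward higher indexes;
// duplicate entries are allowed.
class ToyDynamicTable {
  constructor() { this.entries = []; }
  add(name, value) { this.entries.unshift(`${name}: ${value}`); } // newest first
  at(i) { return this.entries[i - 62]; } // dynamic indexes start at 62 in HPACK
}

const t = new ToyDynamicTable();
t.add('origin', 'https://a.example');
t.add('x-id', '1');
console.log(t.at(62)); // 'x-id: 1' (newest entry, lowest index)
console.log(t.at(63)); // 'origin: https://a.example' (older entry, higher index)
```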

An example

Suppose we need to transfer a set of headers; after static index table compression, every field matched by the table is replaced by its index, and only the remaining values are sent literally


Assuming origin: https://… has index 65 in the dynamic table, then to send origin: https://… again, only index = 65 is transmitted

HTTP3.0 overview

Because of the TCP-level blocking that surfaces with HTTP2.0 multiplexing, Google created the UDP-based QUIC protocol, and HTTP3.0 is built on QUIC

0 RTT

HTTP2 latency:

  • Creating a connection: 1 RTT (TCP) + 2 RTT (TLS/SSL)
  • Session reuse: 1 RTT (TCP) + 1 RTT (TLS/SSL)

The QUIC protocol can achieve 0-RTT data transmission

Multiplexing

QUIC implements HTTP2-style multiplexing, and because streams are independent at the transport layer, loss on one stream does not affect the other data streams

Encrypted and authenticated messages

TCP itself is neither encrypted nor authenticated, so segments may be monitored or tampered with in transit, whether by middleboxes “optimizing” performance or by active attackers. In QUIC, by contrast, apart from a few packets such as PUBLIC_RESET and CHLO, all packet headers are authenticated and all packet bodies are encrypted

Forward error correction mechanism

Besides its own data, a QUIC datagram carries some data from other packets, so when losses are small, lost packets can be reassembled from this redundancy without retransmission. Forward error correction increases datagram size but reduces retransmissions
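
The redundancy idea can be illustrated with XOR parity, the principle behind QUIC’s early (experimental) FEC scheme. This is a toy illustration, not the actual QUIC wire format, and the packet contents are made up:

```javascript
// Toy forward-error-correction demo: send an extra XOR parity packet so
// that any single lost packet in the group can be rebuilt from the
// survivors without retransmission.
function xorBytes(a, b) {
  return a.map((byte, i) => byte ^ b[i]);
}

const packets = [[0x48, 0x69], [0x21, 0x00], [0x07, 0x5a]];
const parity = packets.reduce(xorBytes); // the redundant packet

// Suppose packets[1] is lost in transit: XOR the parity packet with the
// surviving packets to recover it
const recovered = xorBytes(xorBytes(parity, packets[0]), packets[2]);
console.log(recovered); // [ 33, 0 ]  (0x21, 0x00)
```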

Some common HTTP problems

  1. Does POST send two TCP packets?

    • The HTTP protocol does not specify that POST produces two TCP packets
    • Capture the traffic with WireShark to verify:
```javascript
// Start node server
const http = require('http');

http
  .createServer(function (req, res) {
    res.end('Hello world');
  })
  .listen(9001);
```
```shell
# curl request
curl -X POST http://localhost:9001
```

As the WireShark screenshot showed, only one TCP packet is sent for the POST

  2. Can a GET request carry a Body Entity?

    HTTP does not state that the GET method cannot carry a Body Entity. Here’s a test

```javascript
// Start node server
const http = require('http');

http
  .createServer(function (req, res) {
    req.on('data', (v) => {
      console.log(v.toString()); // print the body
    });
    res.end('Hello world');
  })
  .listen(9001);
```
```shell
# curl request
curl -X GET -d Test http://localhost:9001
```

The Node server’s output and the WireShark screenshots both show that a GET request can carry a body

  3. Do GET requests have a URL length limit?

    HTTP itself does not limit the length of a GET request’s URL, but browsers and servers impose practical limits

  4. Is POST more secure than GET?

    In terms of transport, plain HTTP travels unencrypted, so POST is no more secure than GET. But GET requests can be cached and stored in the browser history while POST requests cannot, so in that respect POST is somewhat more secure than GET

  5. Text protocols vs. binary protocols

    The biggest difference between text and binary protocols is the encoding. Whatever the format, storage and transmission are ultimately binary, but raw bits carry no inherent meaning: the bit string 1000001 (decimal 65) means nothing by itself, yet interpreted as ASCII it is the character A

    HTTP1.* uses ASCII encoding for the request line, status line, and headers; entities can use any encoding specified by Transfer-Encoding and Content-Encoding
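
The point about interpretation can be seen in a couple of lines of Node:

```javascript
// The same bits, two interpretations: a raw number vs. ASCII text
const byte = 0b1000001;                  // the bit string 1000001
console.log(byte);                       // 65
console.log(String.fromCharCode(byte));  // 'A' when interpreted as ASCII
console.log(Buffer.from('A')[0]);        // 65: text protocols are still bytes
```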

References:

  • HPACK: Header Compression for HTTP/2 (RFC 7541)
  • Hypertext Transfer Protocol Version 2 (HTTP/2) (RFC 7540)
  • A full analysis of the HTTP/2 header compression algorithm (HPACK)
  • http3-explained
  • http2-explained