Last time, I promised a series of HTTP articles, and today it is finally finished. As a web developer you deal with HTTP almost daily, yet most people only scratch its surface and never dig into the details and principles, which makes interviews difficult. The goal of this article is to help you build complete knowledge of HTTP, reach the depth needed to answer the tough interview questions, and improve your professional skills as a web developer.

001: What is the HTTP message structure?

A TCP segment is divided into two parts: the TCP header and the data.

An HTTP message has a similar header + body structure, specifically:

Start line + headers + blank line + body

There are some differences between HTTP request messages and response messages, so let's introduce them separately.

The start line

For a request message, the start line looks like this:

GET /home HTTP/1.1

Method + path + HTTP version.

For a response message, the starting line usually looks like this:

HTTP/1.1 200 OK

The start line of a response message is also called the status line. It consists of the HTTP version, the status code, and the reason phrase.

Note that in the start line, each part is separated from the next by a single space, and the line ends with a newline (CRLF), strictly following the ABNF grammar.

Headers

Request headers and response headers occupy the same position in the message: between the start line and the blank line.

There are quite a few fields in both request and response headers, and they are involved in many HTTP features. I will not list them all here, but instead focus on the format rules for header fields:

    1. Field names are case-insensitive.
    2. Field names must not contain spaces or underscores (_).
    3. A field name must be immediately followed by a colon (:).

The blank line

Its purpose is to separate the headers from the body.

Q: What if a blank line is deliberately placed in the middle of the headers?

A: Everything after that blank line is treated as the body.

The entity

This is the actual data, i.e. the body: the request body in a request message, and the response body in a response message.
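The structure above can be made concrete with a minimal sketch (an assumed example, not a full HTTP parser) that splits a raw request message into its start line, headers, and body at the blank line:

```javascript
// The blank line (\r\n\r\n) separates the headers from the body.
const raw =
  'POST /home HTTP/1.1\r\n' +
  'Host: example.com\r\n' +
  'Content-Type: text/plain\r\n' +
  '\r\n' +
  'hello';

const [head, body] = raw.split('\r\n\r\n');
const [startLine, ...headerLines] = head.split('\r\n');
console.log(startLine);   // "POST /home HTTP/1.1"
console.log(headerLines); // [ 'Host: example.com', 'Content-Type: text/plain' ]
console.log(body);        // "hello"
```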

002: How to understand HTTP request methods?

What are the request methods?

HTTP/1.1 specifies the following request methods (note that they are all uppercase):

  • GET: usually used to obtain resources
  • HEAD: obtain the meta information of a resource
  • POST: submit data, i.e. upload data
  • PUT: modify data
  • DELETE: delete resources (rarely used)
  • CONNECT: establish a connection tunnel, used by proxy servers
  • OPTIONS: list the request methods applicable to a resource, used for cross-origin requests
  • TRACE: trace the request-response transmission path

What’s the difference between GET and POST?

The first and most intuitive difference is semantic.

Then there are these specific differences:

  • From a caching perspective, GET requests are actively cached by the browser, leaving a history entry, while POST is not cached by default.
  • From an encoding perspective, GET parameters can only be URL-encoded and can only accept ASCII characters, while POST has no such restrictions.
  • From a parameter perspective, GET parameters are generally placed in the URL and are therefore not private, while POST parameters are placed in the request body, which is more suitable for transmitting sensitive information.
  • From an idempotence perspective, GET is idempotent while POST is not. (Idempotent means that performing the same operation always yields the same result.)
  • From a TCP perspective, a GET request is sent in a single packet, while a POST request may be split into two: the headers are sent first, and the body follows once the server responds with 100 (Continue). (Firefox is an exception: its POST requests send only one TCP packet.)

003: How do I understand URIs?

A Uniform Resource Identifier (URI) has a simple job: to distinguish different resources on the Internet.

However, a URI is not the same thing as a URL; in fact, URI is the broader concept, encompassing both URN and URL.

The structure of the URI

The complete structure of a URI looks like this:

scheme://user:passwd@host:port/path?query#fragment

You may wonder why this doesn't look like the URLs you usually see. Hold on; let's break down each part.

scheme is the protocol name, such as http, https, file, and so on. It must be followed by ://.

user:passwd@ carries the user information used to log in to the host. It is insecure, so it is not recommended and rarely used.

host:port is the host name and port.

Path indicates the request path, marking the location of the resource.

query holds the query parameters in key=val form; multiple key-value pairs are separated by & (ampersand).

Fragment indicates an anchor point in the resource located by the URI. The browser can jump to the corresponding location based on this anchor point.

Here’s an example:

https://www.baidu.com/s?wd=HTTP&rsv_spt=1

In this URI, https is the scheme, www.baidu.com is the host:port part (note that the default ports for HTTP and HTTPS are 80 and 443 respectively, so they can be omitted), /s is the path, and wd=HTTP&rsv_spt=1 is the query.

URI encoding

URIs may only contain ASCII characters; characters outside ASCII are not supported, and some symbols are delimiters that would cause parsing errors if left unescaped.

Thus, URIs introduce an encoding mechanism that converts all non-ASCII characters and delimiters into their byte values in hexadecimal, each preceded by %.

For example, a space is escaped to %20, and the Chinese word 三元 is escaped to %E4%B8%89%E5%85%83.
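You can reproduce this escaping with JavaScript's built-in encodeURIComponent, which applies exactly this percent-encoding:

```javascript
// Percent-encoding: non-ASCII characters and delimiters become %XX byte values.
const space = encodeURIComponent(' ');       // "%20"
const sanyuan = encodeURIComponent('三元');  // "%E4%B8%89%E5%85%83"
console.log(space, sanyuan);
```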

004: How do I understand the HTTP status code?

RFC specifies that HTTP status codes are three digits, which are divided into five categories:

  • 1xx: an intermediate state of protocol processing; further action is required.
  • 2xx: success.
  • 3xx: redirection; the resource's location has changed and a new request is needed.
  • 4xx: the request message is at fault.
  • 5xx: an error occurred on the server.

Let’s analyze the specific status codes one by one.

1xx

101 Switching Protocols. When HTTP is upgraded to WebSocket, the server sends status code 101 if it agrees to the switch.

2xx

200 OK is the most commonly seen success status code. Data is usually placed in the response body.

204 No Content has the same meaning as 200, but there is no body data after the response headers.

206 Partial Content, as the name implies, returns part of the content. It is used for HTTP chunked downloads and resumable transfers, together with the response header Content-Range.

3xx

301 Moved Permanently indicates a permanent redirect; its counterpart is 302 Found, a temporary redirect.

For example, if your website has been upgraded from HTTP to HTTPS and the old address will never be used again, you should return 301. The browser then caches the redirect by default and automatically visits the new address on subsequent visits.

If the move is only temporary, simply return 302. Unlike with 301, the browser does not cache the redirect.

304 Not Modified: returned when the negotiated cache is hit. See the section on browser caching.

4xx

400 Bad Request: a catch-all for malformed requests. It often frustrates developers because it is a generic error that gives no hint of what actually went wrong.

403 Forbidden: the request message itself is not at fault; the server simply refuses access, for reasons such as legal restrictions or sensitive information.

404 Not Found: the corresponding resource was not found on the server.

405 Method Not Allowed: the request method is not allowed by the server.

406 Not Acceptable: the resource cannot satisfy the client's requirements.

408 Request Timeout: the server waited too long for the request.

409 Conflict: multiple requests conflicted with each other.

413 Request Entity Too Large: the data in the request body is too large.

414 Request-URI Too Long: the URI in the request line is too long.

429 Too Many Requests: the client sent too many requests.

431 Request Header Fields Too Large: the request header fields are too large.

5xx

500 Internal Server Error: something went wrong on the server, without saying what.

501 Not Implemented: the feature requested by the client is not yet supported.

502 Bad Gateway: the gateway or proxy server itself is fine, but an error occurred when it accessed the upstream server.

503 Service Unavailable: the server is busy and temporarily unable to respond.

005: What are the characteristics of HTTP? What are its disadvantages?

HTTP features

HTTP features are summarized as follows:

  1. Flexible and extensible, in two respects. One is semantic freedom: only the basic format is fixed (spaces separate words, newlines separate fields), with no strict syntax restrictions on the rest. The other is the diversity of transmitted content: not just text, but also images, videos, and any other data, which is very convenient.

  2. Reliable transmission. HTTP is based on TCP/IP, so it inherits TCP's reliability. This is a TCP feature and will not be described in detail here.

  3. Request-response. One request, one response, back and forth. Of course, the requester and responder are not necessarily only a client and a server: if a server acts as a proxy to connect to a backend server, that proxy also plays the role of requester.

  4. Stateless. Here, state refers to the context information of the communication process. Each HTTP request is independent and unrelated to the others, and by default no state information needs to be retained.

HTTP shortcomings

Stateless

Whether a trait is a pro or a con depends on the scenario, and statelessness is the most controversial aspect of HTTP.

In scenarios that need a long-lived session, where a large amount of context must be kept to avoid resending the same information over and over, statelessness is a disadvantage.

At the same time, however, other applications simply want to get some data and do not need to store connection context information. Statelessness reduces network overhead and is an advantage of HTTP.

Plaintext transmission

That is, the protocol's messages (mainly the headers) use plain text rather than binary data.

This certainly makes debugging easier, but it also exposes HTTP message contents to the outside world, which helps attackers. Wi-Fi traps exploit exactly this drawback of HTTP plaintext: lure you onto a rogue hotspot, then capture all your traffic to harvest your sensitive information.

Head-of-line blocking

When a TCP connection is used for HTTP, only one request can be processed at a time. If the current request takes too long, all other requests on that connection are blocked. This is the famous head-of-line blocking problem, discussed further in a later section.

006: How much do you know about the Accept series of fields?

The introduction to the Accept series of fields is divided into four parts: data format, compression mode, supported language, and character set.

The data format

In the previous section, we talked about HTTP's flexible nature: it supports a large number of data formats. So when data in so many possible formats arrives, how does the client know what format it is in?

The least efficient way would be to guess. Is there a better way: can the format simply be specified?

The answer is yes, but first we need a standard called MIME (Multipurpose Internet Mail Extensions). It was first used in email systems to allow messages to carry arbitrary types of data, and HTTP borrowed it.

HTTP takes part of the MIME type system to mark the data type of the message body, represented in the Content-Type field. That is for the sender; the receiver uses the Accept field to state which types it wants to receive.

Specifically, the values of the two fields can be classified into the following categories:

  • text: text/html, text/plain, text/css, etc.
  • image: image/gif, image/jpeg, image/png, etc.
  • audio/video: audio/mpeg, video/mp4, etc.
  • application: application/json, application/javascript, application/pdf, application/octet-stream

Compression

The data is usually encoded and compressed as well. The compression used by the sender is declared in the Content-Encoding field, and the formats the receiver accepts in the Accept-Encoding field. Possible values include:

  • Gzip: The most popular compression format today
  • Deflate: Another well-known compression format
  • Br: A compression algorithm invented specifically for HTTP
// sender
Content-Encoding: gzip
// receiver
Accept-Encoding: gzip

Support language

For the sender, there is also a Content-Language field, which specifies the message's language in internationalization scenarios; the receiver's counterpart is Accept-Language. For example:

// sender
Content-Language: zh-CN, zh, en
// receiver
Accept-Language: zh-CN, zh, en

Character set

Finally, there is a special field on the receiving end, Accept-Charset, which specifies the accepted character sets. On the sending end there is no Content-Charset; instead, the character set is placed in Content-Type as a charset attribute. For example:

// sender
Content-Type: text/html; charset=utf-8
// receiver
Accept-Charset: utf-8

To summarize: the sender declares with Content-Type, Content-Encoding, and Content-Language, while the receiver negotiates with Accept, Accept-Encoding, Accept-Language, and Accept-Charset.

007: How does HTTP transmit data of fixed and variable length?

Fixed-length body

For a fixed-length body, the sender usually uses the Content-Length header to indicate the length of the body during transmission.

Let's simulate this with a Node.js server:

const http = require('http');

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/') {
    res.setHeader('Content-Type', 'text/plain');
    res.setHeader('Content-Length', 10);
    res.write('helloworld');
    res.end();
  }
});

server.listen(8081, () => {
  console.log('Server started');
});

Access after startup: localhost:8081.

The browser displays the following:

helloworld

That’s the right length case, but what about the wrong length case?

Let’s try to make the length smaller:

res.setHeader('Content-Length', 8);

Restart the service and access it again. Now the content in the browser is as follows:

hellowor

Where did the trailing "ld" go? It was simply truncated from the HTTP response body.

Then let’s try to make the length bigger:

res.setHeader('Content-Length', 12);

The browser displays the following:

This time the browser fails to display the content at all: it hangs waiting for bytes that never arrive. Clearly Content-Length plays a key role in HTTP transmission, and setting it incorrectly can break the transfer outright.

Variable-length body

The above covers fixed-length bodies; how is a variable-length body transmitted?

There is another HTTP header field that must be introduced:

Transfer-Encoding: chunked

It indicates that the data is transferred in chunks. Setting this field has two effects:

  • The Content-Length field is ignored.
  • Content can be pushed continuously over a persistent connection.

Let's again simulate chunked transfer with a real example. The Node.js program is as follows:

const http = require('http');

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/') {
    res.setHeader('Content-Type', 'text/html; charset=utf8');
    res.setHeader('Transfer-Encoding', 'chunked');
    res.write('<p>chunked transfer</p>');
    setTimeout(() => {
      res.write('First transmission');
    }, 1000);
    setTimeout(() => {
      res.write('Second transmission');
      res.end();
    }, 2000);
  }
});

server.listen(8009, () => {
  console.log('Server started');
});

Visit the page and you will see the content arrive in stages.

Capturing the response with telnet shows the status line and response headers (including Connection: keep-alive) first, then a blank line, and then the response body.

The structure of the response body is interesting:

chunk length (a hexadecimal number)
data of the first chunk
chunk length (a hexadecimal number)
data of the second chunk
......
0

After the terminating 0 there is a final blank line, which I want you to notice.
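The layout above can be decoded with a simplified sketch (it assumes well-formed input with no chunk extensions; real parsers work incrementally):

```javascript
// Each chunk is: <length in hex>\r\n<data>\r\n; a length of 0 marks the end.
function decodeChunked(raw) {
  let out = '';
  let rest = raw;
  while (true) {
    const idx = rest.indexOf('\r\n');
    const size = parseInt(rest.slice(0, idx), 16); // chunk length in hex
    if (size === 0) break;                          // "0" terminates the body
    out += rest.slice(idx + 2, idx + 2 + size);
    rest = rest.slice(idx + 2 + size + 2);          // skip data + trailing CRLF
  }
  return out;
}

const body = '5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n';
console.log(decodeChunked(body)); // "helloworld!"
```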

That’s how HTTP transmits fixed-length and variable-length data.

008: How does HTTP handle large file transfers?

For large files of hundreds of MB or even GB, transferring everything in one go is clearly unrealistic: there would be long waits, seriously hurting the user experience. HTTP therefore addresses this scenario with range requests, which allow a client to request just a portion of a resource.

How to support

If the server is to support range requests, it must add a response header like this:

Accept-Ranges: bytes

to inform the client that range requests are supported. (A value of none would mean they are not.)

The Range field

The client, in turn, specifies which part it wants via the Range request header, in the format bytes=x-y. Let's look at the possible forms:

  • 0-499: from the start of the file through byte 499.
  • 500-: from byte 500 through the end of the file.
  • -100: the last 100 bytes of the file.

When the server receives the request, it first verifies that the range is valid. If it is out of bounds, it returns a 416 error code. Otherwise, it reads the appropriate fragment and returns a 206 status code.
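The server-side validation just described can be sketched as a small helper (a hypothetical function, not tied to any framework; it handles the three Range forms above):

```javascript
// Returns { start, end } for a resource of `total` bytes, or null when the
// range is unsatisfiable (the server would then respond with 416).
function parseRange(rangeHeader, total) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader);
  if (!match) return null;
  const [, startStr, endStr] = match;
  let start, end;
  if (startStr === '' && endStr !== '') {
    start = total - Number(endStr); // "-100": the last N bytes
    end = total - 1;
  } else if (startStr !== '' && endStr === '') {
    start = Number(startStr);       // "500-": from byte 500 to the end
    end = total - 1;
  } else if (startStr !== '' && endStr !== '') {
    start = Number(startStr);       // "0-499": explicit range
    end = Number(endStr);
  } else {
    return null;
  }
  if (start < 0 || end >= total || start > end) return null; // -> 416
  return { start, end };                                     // -> 206 + Content-Range
}

console.log(parseRange('bytes=0-499', 1000));  // { start: 0, end: 499 }
console.log(parseRange('bytes=-100', 1000));   // { start: 900, end: 999 }
console.log(parseRange('bytes=500-', 1000));   // { start: 500, end: 999 }
console.log(parseRange('bytes=0-2000', 1000)); // null -> 416
```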

At the same time, the server needs to add the Content-Range field to its response; its format differs depending on the Range field in the request.

Specifically, the response header is different when requesting a single segment of data than when requesting multiple segments of data.

Here’s an example:

// single range
Range: bytes=0-9
// multiple ranges
Range: bytes=0-9, 30-39

We will discuss the two cases separately.

A single range

For a single data request, the response is as follows:

HTTP/1.1 206 Partial Content
Content-Length: 10
Accept-Ranges: bytes
Content-Range: bytes 0-9/100

i am xxxxx

Note the Content-Range field: 0-9 is the range actually returned, and 100 is the total size of the resource.

Multiple ranges

Let’s take a look at the case of multiple requests. The response will look like this:

HTTP/1.1 206 Partial Content
Content-Type: multipart/byteranges; boundary=00000010101
Content-Length: 189
Connection: keep-alive
Accept-Ranges: bytes


--00000010101
Content-Type: text/plain
Content-Range: bytes 0-9/96

i am xxxxx
--00000010101
Content-Type: text/plain
Content-Range: bytes 20-29/96

eex jspy e
--00000010101--

This response header, Content-Type: multipart/byteranges; boundary=00000010101, conveys two pieces of information:

  • The request was a multi-range request
  • The delimiter in the response body is 00000010101

The segments of data in the response body are therefore separated by this delimiter, and the final delimiter gets an extra -- appended to mark the end.

That’s what HTTP does for large file transfers.

009: How is the submission of form data handled in HTTP?

There are two main ways to submit a form in HTTP, with two different content-type values:

  • application/x-www-form-urlencoded
  • multipart/form-data

Since form submissions are typically POST requests (GET is rarely used for them), the submitted data is placed in the request body by default.

application/x-www-form-urlencoded

Content submitted as application/x-www-form-urlencoded has the following characteristics:

  • The data is encoded as key-value pairs separated by &.
  • Characters are escaped using URL encoding.

Such as:

{a: 1, b: 2} -> "a=1&b=2"
// and URL-encoding the string "a=1&b=2" as a whole yields:
"a%3D1%26b%3D2"
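This encoding is what the built-in URLSearchParams produces, as a quick sketch shows:

```javascript
// Keys and values are joined with = and &, and special characters are escaped.
const params = new URLSearchParams({ a: 1, b: 2 });
console.log(params.toString()); // "a=1&b=2"

// If a value itself contains = or &, it gets percent-encoded:
const withSpecials = new URLSearchParams({ q: 'a=1&b=2' });
console.log(withSpecials.toString()); // "q=a%3D1%26b%3D2"
```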

multipart/form-data

For multipart/form-data:

  • The Content-Type request header contains a boundary, whose value is chosen by the browser. E.g.: Content-Type: multipart/form-data; boundary=----WebkitFormBoundaryRRJKeWfHPGrS4LKe.
  • The data is split into multiple parts, each separated by the delimiter, and each part has its own header fields (such as Content-Type) describing its sub-body. The final delimiter has -- appended to mark the end.

The corresponding request body would look like this:

----WebkitFormBoundaryRRJKeWfHPGrS4LKe
Content-Disposition: form-data; name="data1"
Content-Type: text/plain

data1
----WebkitFormBoundaryRRJKeWfHPGrS4LKe
Content-Disposition: form-data; name="data2"
Content-Type: text/plain

data2
----WebkitFormBoundaryRRJKeWfHPGrS4LKe--

Summary

The defining feature of the multipart/form-data format is that each form element is an independent resource representation. You may never notice the boundary while writing business code, but open a packet-capture tool and you will see the form elements separated by it; you don't feel it day to day because the browser and the HTTP stack encapsulate these details for you.

In practice, uploading images and other files almost always uses multipart/form-data rather than application/x-www-form-urlencoded, since the latter would have to URL-encode the file contents, which costs a great deal of time and space.

010: How does HTTP/1.1 address the head-of-line blocking problem?

What is HTTP head-of-line blocking?

As shown earlier, HTTP transmission is based on the request-response model: messages must be exchanged one after another. Note that the tasks are placed in a single queue and executed sequentially; if the first request is processed too slowly, every subsequent request is blocked. This is the famous HTTP head-of-line blocking problem.

Concurrent connections

Allow multiple persistent connections for a single domain name. That effectively adds more task queues, so one queue's tasks no longer block all the others. RFC 2616 suggested a maximum of 2 concurrent connections per client, but current browsers allow far more (6 in Chrome).

But in fact, even with improved concurrent connections, people still do not meet the performance requirements.

Domain name subdivision

A single domain name only allows 6 concurrent persistent connections? Then just split the site across more domain names.

For example content1.sanyuan.com, content2.sanyuan.com.

In this way the sanyuan.com domain can be split into many secondary domains, all pointing at the same server. The number of available concurrent persistent connections grows, which in practice better alleviates head-of-line blocking.

011: How much do you know about cookies?

Introduction of cookies

As mentioned earlier, HTTP is a stateless protocol. Each HTTP request is independent and irrelevant. By default, no state information needs to be retained. But sometimes you need to save some state. What can you do?

HTTP introduces cookies for exactly this purpose. A Cookie is essentially a small text file stored in the browser as key-value pairs (you can see it in the Application panel of Chrome DevTools). Every request to the same domain carries the same Cookie; the server parses it and recovers the client's state. The server writes cookies to the client via the Set-Cookie response header. For example:

// request header
Cookie: a=xxx; b=xxx

// response header
Set-Cookie: a=xxx
Set-Cookie: b=xxx

Cookie attribute

Life cycle

A Cookie's lifetime can be set with the Expires and Max-Age attributes.

  • Expires: the expiration time, as an absolute timestamp.
  • Max-Age: a duration in seconds, counted from the moment the browser receives the response.

If the Cookie expires, it is deleted and not sent to the server.

Scope

Two attributes control scope: Domain and Path, which bind the Cookie to a domain and a path. If the request's domain or path does not match them, the Cookie is not sent. Note that for Path, / means every path under the domain may use the Cookie.

Security

If Secure is specified, cookies can be transmitted only through HTTPS.

A cookie marked HttpOnly can only travel over the HTTP protocol and cannot be read from JavaScript, which is an important defense against XSS attacks.

Accordingly, for the prevention of CSRF attack, there is also SameSite attribute.

SameSite can be set to three values: Strict, Lax, and None.

a. In Strict mode, the browser completely forbids third-party requests from carrying the cookie. For example, cookies for sanyuan.com are only sent with requests made from sanyuan.com itself, never with requests initiated from other sites.

b. Lax mode is more relaxed: the cookie is also carried on top-level GET navigations, such as a GET form submission or a link (an a tag), but not in other third-party contexts.

c. In None mode, cookies are attached to all requests. This was historically the default, though modern browsers such as Chrome now default to Lax.
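The attributes above can be pulled together in a small sketch. The helper below is hypothetical (frameworks provide their own), but it serializes a Set-Cookie header value with the lifetime, scope, and security attributes just discussed:

```javascript
// Serialize a Set-Cookie header value from an options object.
function serializeCookie(name, value, opts = {}) {
  let out = `${name}=${encodeURIComponent(value)}`;
  if (opts.maxAge != null) out += `; Max-Age=${opts.maxAge}`; // lifetime in seconds
  if (opts.domain) out += `; Domain=${opts.domain}`;          // scope: domain
  if (opts.path) out += `; Path=${opts.path}`;                // scope: path
  if (opts.secure) out += '; Secure';                         // HTTPS only
  if (opts.httpOnly) out += '; HttpOnly';                     // invisible to JS
  if (opts.sameSite) out += `; SameSite=${opts.sameSite}`;    // CSRF defense
  return out;
}

console.log(serializeCookie('id', 'abc', {
  maxAge: 3600, path: '/', secure: true, httpOnly: true, sameSite: 'Lax',
}));
// id=abc; Max-Age=3600; Path=/; Secure; HttpOnly; SameSite=Lax
```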

Disadvantages of cookies

  1. Capacity defects. Cookies have a maximum size of 4KB and can only be used to store a small amount of information.

  2. Performance defects. A Cookie follows the domain name: whether or not a given address under the domain needs it, the request carries the complete Cookie. As requests multiply, this wastes performance, since requests carry a lot of unnecessary content. It can be mitigated by narrowing the scope with Domain and Path.

  3. Security defects. Because the Cookie is passed between the browser and the server in the form of plain text, it is easy to be intercepted by illegal users, and then a series of tampering is carried out to re-send the Cookie to the server within the validity period, which is quite dangerous. In addition, when HttpOnly is false, the Cookie information can be read directly from the JS script.

012: How to understand HTTP proxy?

We know that HTTP is based on the request-response model protocol, generally by the client to send a request, the server to respond.

A special case is the proxy server. Once a proxy is introduced, it acts as a middleman with a dual identity: to the client it is the responding server, while to the origin server it is the requesting client.

So what exactly is a proxy server for?

Functions

  1. Load balancing. The client's request reaches only the proxy server; the client has no idea how many origin servers there are or what their IP addresses are. The proxy can therefore distribute requests to different origin servers using a specific algorithm, keeping each origin's load as even as possible. There are many such algorithms: random, round-robin, consistent hashing, LRU (least recently used), and so on; they are not the focus of this article, so explore them on your own if interested.

  2. Security. The proxy can monitor backend servers with heartbeats and kick a faulty machine out of the cluster as soon as it is found. Filtering upstream and downstream data and rate-limiting illegal IP addresses are also jobs for the proxy server.

  3. Cache proxy. Cache content to the proxy server so that clients can get it directly from the proxy server rather than from the source server. The next section breaks it down in detail.

Relevant header field

Via

What if the proxy server needs to identify itself and leave its own imprint on HTTP traffic?

This is recorded in the Via field. Suppose there are two proxy servers in the middle; after the client sends a request, the flow is:

Client -> Proxy 1 -> Proxy 2 -> Origin server

When the source server receives the request, it gets this field in the request header:

Via: proxy_server1, proxy_server2

When the source server responds, the client eventually gets a response header like this:

Via: proxy_server2, proxy_server1

As you can see, the order of agents in Via is the order in which packets are delivered in the HTTP transport.

X-Forwarded-For

X-Forwarded-For literally means "forwarded for whom"; it records the IP address of the requesting side (note that, unlike Via, it records IP addresses rather than proxy host names).

X-Real-IP

X-Real-IP is a field for obtaining the user's real IP address: no matter how many proxies are traversed, it always records the original client's IP.

Similarly, X-Forwarded-Host and X-Forwarded-Proto record the client's original host name and protocol, respectively.

Problems with X-Forwarded-For

X-Forwarded-For changes at every hop: each proxy appends the IP address of the node the request came from before passing the request on.
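The hop-by-hop appending can be sketched as follows (a simplified illustration with made-up IP addresses, not tied to any particular proxy software):

```javascript
// Each hop appends the IP it received the request from.
function forwardHop(headers, fromIp) {
  const prev = headers['x-forwarded-for'];
  headers['x-forwarded-for'] = prev ? `${prev}, ${fromIp}` : fromIp;
  return headers;
}

let headers = {};
headers = forwardHop(headers, '203.0.113.7'); // proxy 1 saw the client's IP
headers = forwardHop(headers, '10.0.0.1');    // proxy 2 saw proxy 1's IP
console.log(headers['x-forwarded-for']); // "203.0.113.7, 10.0.0.1"
```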

But this creates two problems:

  1. Every proxy must parse the HTTP headers and then modify them, which is slower than forwarding the data untouched.

  2. Under HTTPS, the traffic is encrypted, so the original message cannot be modified in transit.

This led to the proxy protocol, which typically uses its plaintext version and simply prepends a line in this format above the HTTP request line:

// PROXY + TCP4/TCP6 + source address + destination address + source port + destination port
PROXY TCP4 0.0.0.1 0.0.0.2 1111 2222
GET / HTTP/1.1
...

This solves the problem with X-Forwarded-for.

013: How do I understand HTTP caching and caching proxies?

I gave a detailed analysis of strong and negotiated caching in a previous article on browser caching; to summarize:

First check whether the strong cache is usable via Cache-Control:

  • If the strong cache is usable, use it directly.
  • Otherwise fall back to the negotiated cache: send an HTTP request, and the server checks conditional request headers such as If-Modified-Since or If-None-Match to see whether the resource has been updated.
    • If it has, return the resource with a 200 status code.
    • Otherwise return 304, telling the browser to take the resource straight from its cache.
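The strong-cache freshness check above can be sketched as a small helper (a simplified illustration that only looks at max-age, ignoring other directives and the Age header):

```javascript
// `responseTimeMs` is when the cached response was received.
function isFresh(cacheControl, responseTimeMs, nowMs) {
  const match = /max-age=(\d+)/.exec(cacheControl);
  if (!match) return false;                 // no max-age: not a strong-cache hit
  const maxAgeMs = Number(match[1]) * 1000;
  return nowMs - responseTimeMs < maxAgeMs; // still within the freshness lifetime?
}

console.log(isFresh('public, max-age=1000', 0, 500 * 1000));  // true: use cache
console.log(isFresh('public, max-age=1000', 0, 2000 * 1000)); // false: revalidate
```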

In this section we’ll focus on another type of caching: proxy caching.

Why is the proxy cache generated?

The origin server of course has its own caches, such as Redis or Memcached. But for the HTTP cache, if clients had to go all the way to the origin every time their local cache expired, the origin server would be under enormous pressure.

Hence the caching-proxy mechanism: the proxy server takes over part of the server-side HTTP cache. Clients fetch the cache from the proxy, and only when the proxy's copy expires does a request go to the origin server. Under heavy traffic, this significantly reduces the origin server's load.

So how does a caching proxy work?

In general, the control of the cache proxy is divided into two parts, one on the source server side and the other on the client side.

Cache control of the source server

Private and public

In its response headers, the origin server adds the Cache-Control field. Its value can include private or public to say whether proxy servers may cache the response: private forbids it, public allows it.

For example, if some very private data were cached on the proxy server, anyone could fetch it from the proxy directly, which would be dangerous. Such data is therefore generally not allowed on the proxy: set the Cache-Control response header to private rather than public.

proxy-revalidate

must-revalidate means that once the client's cache expires, it must revalidate with the source server; proxy-revalidate means the same thing for the proxy server's cache.

s-maxage

The s stands for share. s-maxage limits how long the response may be cached on the proxy server. It does not conflict with max-age, which limits the client's cache time.

For example, the source server adds a field to the response header:

Cache-Control: public, max-age=1000, s-maxage=2000

This is the source server saying: this response may be cached by the proxy server; the client may cache it for 1000 seconds, and the proxy server for 2000 seconds.
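A minimal sketch of how the two directives interact (the function and field names are illustrative, not a real API): max-age bounds the client's copy, s-maxage bounds the shared proxy's copy.

```javascript
// Hypothetical freshness check: a proxy obeys s-maxage when present,
// while the client (or a proxy without s-maxage) obeys max-age.
function isFresh(ageSeconds, directives, isSharedProxy) {
  const limit = isSharedProxy && directives.sMaxAge !== undefined
    ? directives.sMaxAge
    : directives.maxAge;
  return ageSeconds < limit;
}

// Cache-Control: public, max-age=1000, s-maxage=2000
const directives = { maxAge: 1000, sMaxAge: 2000 };
isFresh(1500, directives, false); // false: the client's copy is stale
isFresh(1500, directives, true);  // true: the proxy may still serve it
```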

Client cache control

max-stale and min-fresh

These two fields can be added to the client’s request header to tolerate and restrict caching on the proxy server. Such as:

Cache-Control: max-stale=5

means that when the client fetches from the proxy server's cache, it does not mind if the proxy cache has expired: as long as it has been expired for less than 5 seconds, the client will still accept it from the proxy.

Such as:

Cache-Control: min-fresh=5

means: do not wait until the cache is about to expire; the proxy's copy must still be fresh for at least another 5 seconds, otherwise the proxy cannot serve it.

only-if-cached

Adding this field means the client will only accept the proxy's cache and not a response from the source server. If the proxy cache is invalid, 504 (Gateway Timeout) is returned.

That covers caching proxies. Quite a few fields are involved, so it's worth reviewing them carefully to deepen your understanding.

014: What is cross-domain? How does the browser intercept the response? How to solve it?

In a development model that separates the front and back ends, you often encounter cross-domain problems where an Ajax request is sent, the server responds successfully, but the front end just doesn’t get the response. So let’s talk about that a little bit.

What is cross-domain

To review the composition of a URI:

Browsers follow the same-origin policy: two URLs are same-origin only if their scheme, host, and port are all identical. Non-same-origin sites are subject to some restrictions:

  • They cannot read or modify each other's DOM
  • They cannot read each other's Cookie, IndexedDB, or LocalStorage
  • XMLHttpRequest requests are restricted (this is what the rest of this section focuses on)

When a browser sends an Ajax request to a target URI, a cross-origin request occurs whenever the current URL and the target URL have different origins.

The response to a cross-origin request is usually intercepted by the browser. Note: intercepted by the browser; the response actually does reach the client successfully. So how does this interception happen?

The first thing to know is that the browser is multi-process. In Chrome’s case, the process is composed as follows:

Both the WebKit rendering engine and the V8 engine are in the rendering process.

When xhr.send is called and the Ajax request is ready to go, it is at first handled only by the rendering process. To prevent hackers from reaching system resources through scripts, the browser sandboxes each rendering process; and to mitigate CPU-level vulnerabilities such as Spectre and Meltdown, it also isolates different sites (judged by their registrable top-level domain) into separate sandboxes. See the Chromium security team's presentation on YouTube.

A sandboxed rendering process has no way to send network requests itself. What then? It can only send them through the network process, which involves Inter-Process Communication (IPC). Next let's look at how inter-process communication is done in Chromium; the call order in the Chromium source is as follows:

This may look bewildering at first. If you want a deeper understanding, go read the latest Chromium source code (the IPC source) and articles that walk through Chromium's IPC implementation.

The idea is to use Unix Domain Socket sockets in conjunction with the event-driven, high-performance network concurrency library Libevent to complete the IPC process.

Ok, so now the data is passed to the browser main process, and when the main process receives it, it actually makes the appropriate network request.

After the server processes the data and returns the response, the main process checks that the response is cross-origin and carries no CORS response headers (more on these later). The response body is then discarded and never forwarded to the rendering process. That is how the data gets intercepted.

Let’s look at some of the solutions to cross-domain problems.

CORS

CORS is a W3C standard whose full name is Cross-Origin Resource Sharing. It requires support from both the browser and the server: all non-IE browsers and IE10+ support CORS, and the server needs to attach specific response headers, which we'll break down shortly. But before digging into how CORS works, we need to understand two concepts: simple requests and non-simple requests.

The browser classifies requests according to the request method and specific request header fields. Specifically, a simple request is one that satisfies both of the following conditions:

  • The request method is GET, POST, or HEAD
  • The request headers are limited to Accept, Accept-Language, Content-Language, and Content-Type (where Content-Type may only be application/x-www-form-urlencoded, multipart/form-data, or text/plain)

The browser draws a circle with simple requests on the inside and non-simple requests on the outside, and handles the two different types of requests differently.

A simple request

What does the browser do before the request goes out?

It automatically adds an Origin field to the request header to indicate which origin the request came from. After receiving the request, the server responds with the Access-Control-Allow-Origin field. If the request's Origin is not within the range this field allows, the browser intercepts the response.

In other words, Access-Control-Allow-Origin is the required field by which the server determines whether the browser intercepts the response. There are also other optional, functional fields, described next.

Access-Control-Allow-Credentials. For cross-origin requests, the browser does not send cookies by default. For cookies to be included, the server must add this response header and set it to true, and the front end must also set the withCredentials property:

let xhr = new XMLHttpRequest();
xhr.withCredentials = true;

Access-Control-Expose-Headers. This field extends what the XMLHttpRequest object can read: besides the six basic response header fields (Cache-Control, Content-Language, Content-Type, Expires, Last-Modified, and Pragma), it can also read any response header declared by this field. For example:

Access-Control-Expose-Headers: aaa

The front end can then read the value of this field via xhr.getResponseHeader("aaa").
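The server side of a simple request can be sketched as a small framework-agnostic helper. The function name, the allowed-origin list, and the exposed header name are assumptions for illustration:

```javascript
// Hypothetical helper assembling the CORS response headers discussed above.
function simpleCorsHeaders(requestOrigin, allowedOrigins) {
  if (!allowedOrigins.includes(requestOrigin)) {
    // No Access-Control-Allow-Origin: the browser will intercept the response
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    'Access-Control-Allow-Credentials': 'true', // pair with xhr.withCredentials
    'Access-Control-Expose-Headers': 'aaa',     // lets the front end read 'aaa'
  };
}
```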

Non-simple request

Non-simple requests differ in two respects: the preflight request and the response fields.

Let’s take the PUT method as an example.

var url = 'http://xxx.com';
var xhr = new XMLHttpRequest();
xhr.open('PUT', url, true);
xhr.setRequestHeader('X-Custom-Header', 'xxx');
xhr.send();

When this code executes, a preflight request is sent first. The request line and headers of the preflight request look like this:

OPTIONS / HTTP/1.1
Origin: current origin
Host: xxx.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: X-Custom-Header

The preflight uses the OPTIONS method and carries the request's Origin and the destination Host. That part is simple. Two key fields are also added:

  • Access-Control-Request-Method, listing which HTTP method the CORS request will use
  • Access-Control-Request-Headers, specifying which extra request headers the CORS request will add

That's the preflight request. Next come the response fields, which also split into two parts: the response to the preflight request, and the response to the CORS request itself.

The response to the preflight request has the following format:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: X-Custom-Header
Access-Control-Allow-Credentials: true
Access-Control-Max-Age: 1728000
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Content-Length: 0

There are several key response header fields:

  • Access-Control-Allow-Origin: the origins allowed to make the request; the value can be a specific origin, or * to allow requests from any origin.
  • Access-Control-Allow-Methods: the list of allowed request methods.
  • Access-Control-Allow-Credentials: already covered in the simple-request section.
  • Access-Control-Allow-Headers: the request header fields that may be sent.
  • Access-Control-Max-Age: how long the preflight result remains valid; no additional preflight request is issued during this period.

After the preflight response comes back, if the request does not satisfy the conditions in the response headers, the XMLHttpRequest's onerror method is triggered and, of course, the real CORS request is never sent.

The response to the CORS request. Having come this far, the real CORS request is the easy part: it now behaves exactly like a simple request. The browser automatically adds the Origin field, and the server returns Access-Control-Allow-Origin in the response header. Refer to the simple-request section above.
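The server's preflight handling described above can be sketched as a plain function over request headers (a hedged illustration: the allowed method and header lists are assumptions, and a real server would wire this into its framework's OPTIONS route):

```javascript
// Sketch of a server answering the preflight (OPTIONS) request.
function preflightResponse(requestHeaders) {
  const allowedMethods = ['GET', 'POST', 'PUT'];
  const allowedHeaders = ['X-Custom-Header'];
  const method = requestHeaders['access-control-request-method'];
  const asked = (requestHeaders['access-control-request-headers'] || '')
    .split(',')
    .map(h => h.trim())
    .filter(Boolean);
  const ok = allowedMethods.includes(method) &&
    asked.every(h => allowedHeaders.includes(h));
  if (!ok) {
    // The browser will fire the XMLHttpRequest's onerror handler
    return { status: 403, headers: {} };
  }
  return {
    status: 200,
    headers: {
      'Access-Control-Allow-Origin': requestHeaders.origin,
      'Access-Control-Allow-Methods': allowedMethods.join(', '),
      'Access-Control-Allow-Headers': allowedHeaders.join(', '),
      'Access-Control-Max-Age': '1728000', // skip further preflights for this period
    },
  };
}
```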

JSONP

Although the XMLHttpRequest object follows the same-origin policy, the script tag does not. It can issue a GET request by pointing its src attribute at the target address, thereby making a cross-origin request and receiving a response. This is how JSONP works. Let's write a small JSONP wrapper:

const jsonp = ({ url, params, callbackName }) => {
  const generateURL = () => {
    let dataStr = '';
    for (let key in params) {
      dataStr += `${key}=${params[key]}&`;
    }
    dataStr += `callback=${callbackName}`;
    return `${url}?${dataStr}`;
  };
  return new Promise((resolve, reject) => {
    // Initialize the callback function name
    callbackName = callbackName || Math.random().toString().replace('.', '');
    // Create the script element and add it to the current document
    let scriptEle = document.createElement('script');
    scriptEle.src = generateURL();
    scriptEle.onerror = reject;
    document.body.appendChild(scriptEle);
    // Bind the callback to window so the returned script can call it
    window[callbackName] = (data) => {
      resolve(data);
      // The script has served its purpose and can be removed
      document.body.removeChild(scriptEle);
    };
  });
};

Of course, there is also a response operation on the server. For example, express:

let express = require('express')
let app = express()
app.get('/', function(req, res) {
  let { a, b, callback } = req.query
  console.log(a) // 1
  console.log(b) // 2
  // Return a script: the browser executes the returned string directly
  res.end(`${callback}('packet')`)
})
app.listen(3000)

The front end simply calls this:

jsonp({
  url: 'http://localhost:3000',
  params: {
    a: 1,
    b: 2
  }
}).then(data => {
  // Process the received data
  console.log(data) // packet
})

Compared with CORS, JSONP's biggest advantage is compatibility: old versions of IE that cannot use CORS can still use JSONP. Its drawback is just as obvious: the request method is limited, with only GET supported.

Nginx

Nginx is a high-performance reverse proxy server that can be used to easily solve cross-domain problems.

What? Reverse proxy? I’ll show you a picture and you’ll understand.

The forward proxy helps the client access a server that the client cannot access on its own, and then returns the result to the client.

The reverse proxy takes requests from clients and forwards them to other servers. The main scenario is to maintain the load balancing of the server cluster. In other words, the reverse proxy takes requests from other servers and then selects an appropriate server to forward the request to.

Therefore, the difference between the two is clear: the forward proxy server does things for the client, while the reverse proxy server does things for the other server.

Ok, so how does Nginx solve cross-domain?

For example, if the client’s domain name is client.com and the server’s domain name is server.com, the client sends Ajax requests to the server, which of course will cross domains.

server {
  listen 80;
  server_name client.com;
  location /api {
    proxy_pass http://server.com;
  }
}

Nginx acts as a reverse proxy here. Its server_name is set to client.com, the same origin as the client, so the client first sends the request to client.com/api; Nginx then forwards it to server.com, and when the response comes back, passes it back to the client. This completes the cross-origin request.

There are also some less common approaches worth knowing about, such as postMessage; WebSocket is another option, though it no longer belongs to HTTP. I don't recommend rote-memorizing the more obscure techniques: you will almost never use them, the names are hard to remember, and reciting them from short-term memory won't score you points with an interviewer, who can tell, leaving a bad impression instead. Not reciting them costs you nothing. If you understand the principle of cross-origin requests and the three main solutions above well enough to withstand follow-up questions, you will come across as someone reliable.

015: What is the process of shaking hands with TLS1.2?

As mentioned earlier, HTTP is a cleartext transmission protocol, which is completely transparent and very insecure. How can we further ensure security?

The result is HTTPS, which is not a new protocol, but a layer of SSL/TLS protocol added to HTTP. Simply put, HTTPS = HTTP + SSL/TLS.

What is SSL/TLS?

SSL stands for Secure Sockets Layer and sits at the session layer (layer 5) of the OSI seven-layer model. SSL went through three major versions, and only after the third was it standardized as TLS (Transport Layer Security), regarded as TLS1.0; specifically, TLS1.0 = SSL3.1.

TLS1.0 and TLS1.1 are now considered insecure and will be fully phased out in the near future, so TLS1.2 is what we'll discuss here. Of course, TLS1.3 arrived in 2018 with a much-improved handshake, which we'll cover in the next section.

The TLS handshake is quite involved. Before writing this article I consulted a large amount of material and found much of it unfriendly to TLS beginners, with many points left vague; sorting it all out was fairly painful. I hope the breakdown below helps you follow it more smoothly:

Traditional RSA handshake

Let's start with the traditional TLS handshake you see most often online. I've written before about why HTTPS (the traditional RSA version) makes data transfer more secure, covering the concepts of symmetric and asymmetric encryption; I recommend reading it. It is called the RSA version because it uses the RSA algorithm to encrypt and decrypt pre_random.

TLS 1.2 Handshake procedure

Now let’s talk about the mainstream TLS version 1.2 approach.

You may be confused at first, but don’t worry. Go through the following process and you’ll see.

step 1: Client Hello

First, the browser sends client_random, the TLS version, and a list of cipher suites.

What is client_random? A parameter used to produce the final secret.

What is the cipher suite list? An entry in it looks like this:

TLS_ECDHE_WITH_AES_128_GCM_SHA256

This means: during the TLS handshake, use the ECDHE algorithm to generate pre_random (this value is introduced later); use 128-bit AES for symmetric encryption, with the mainstream GCM mode for grouping (how to group blocks is a very important question in symmetric encryption); and use SHA256 as the hash digest algorithm.

The hash digest algorithm deserves a word of explanation. Imagine the server sends a message to the client, and the client cannot tell whether it really came from the server or from a middleman forging it. Enter the hash digest: the server runs its certificate information through this algorithm to produce a digest (think of it as a short string) identifying the server, encrypts the digest with its private key, and sends the encrypted digest along with its public key to the client. The client decrypts with the public key, computes its own digest from the message, and compares the two; if they match, the server's identity is confirmed. This is the principle behind digital signatures. Besides the hashing, the crucial part of the process is encrypting with the private key and decrypting with the public key.

step 2: Server Hello

As you can see, the server finishes its responses to the client in one go.

server_random is also a parameter for generating the final secret; meanwhile the server confirms the TLS version and the cipher suite to use, and sends its own certificate. All of that is easy to see. But what is server_params for?

We'll put that question aside for now. All you need to know at this point is that server_random has reached the client.

Step 3: The Client verifies the certificate and generates a secret

The client verifies that the certificate and signature passed from the server are valid. If so, the parameter client_params is passed to the server.

The client then computes pre_random with the ECDHE algorithm, passing in two parameters: server_params and client_params. Now you can see what those two parameters are for: since ECDHE is based on the elliptic-curve discrete logarithm problem, they are also called the elliptic-curve public keys.

The client now has client_random, server_random, and pre_random, and runs the three through a pseudo-random function to compute the final secret.

Step 4: Server generates secret

Remember the client_params the client just sent over?

Now the server starts generating pre_random using the ECDHE algorithm, and then generates the final secret using the same pseudo-random number function as the client.

Matters needing attention

The TLS process is basically finished, but there are two more points to note.

First, the TLS handshake is actually a mutual-authentication process. As shown above, the client can verify the server's identity; can the server verify the client's?

Of course you can. Specifically, in Step3, when the client sends client_params, it actually sends an authentication message to the server, asking the server to go through the same authentication process (hash digest + private key encryption + public key decryption) to confirm the identity of the client.

Second, after the client generates the secret, it sends a final message to the server, telling it that symmetric encryption will be used from now on, with the algorithm agreed at the beginning. Likewise, once the server has generated the secret, it sends a final message to the client, telling it to communicate with symmetric encryption from then on.

This final exchange consists of two parts: a Change Cipher Spec message, signalling that subsequent transmission will be encrypted, and a Finished message, which carries a digest of all the data sent so far, encrypted, for the other side to verify.

When both parties are verified, the handshake is officially over. Later HTTP packets begin to transmit encrypted packets.

The difference between RSA and ECDHE handshake process

  1. The ECDHE handshake, the mainstream TLS1.2 handshake, uses ECDHE to generate and exchange pre_random, with no RSA encryption or decryption involved.

  2. Another feature of ECDHE is that the client can "jump the queue": after sending its final message it can start sending HTTP data right away instead of waiting for the server's final message to come back, saving one RTT. This is known as TLS False Start.

016: What improvements have been made to TLS 1.3?

Although TLS1.2 has existed for more than ten years and survived countless tests, the wheel of history keeps moving forward. To obtain stronger security and better performance, TLS1.3 was released in 2018. It makes a series of improvements over TLS1.2, falling into two main parts: strengthening security and improving performance.

Strengthen the security

In TLS1.3, a large number of cipher suites were abolished, leaving only five:

  • TLS_AES_128_GCM_SHA256
  • TLS_AES_256_GCM_SHA384
  • TLS_CHACHA20_POLY1305_SHA256
  • TLS_AES_128_CCM_SHA256
  • TLS_AES_128_CCM_8_SHA256

As you can see, the only symmetric encryption algorithms left are AES and CHACHA20, which are also the mainstream choices; the authenticated modes are limited to GCM, CCM, and POLY1305; and the hash digest algorithms are only SHA256 and SHA384.

So you might ask, why is an important asymmetric encryption algorithm like RSA gone?

I think there are two reasons:

First, the FREAK attack was discovered in 2015: researchers found a vulnerability that allowed weak RSA keys to be cracked.

Second, once the private key is leaked, the middleman can use the private key to calculate the secret of all previous messages and decrypt all previous ciphertext.

Why? In the RSA handshake, the client takes the server's certificate, extracts the server's public key, generates pre_random, encrypts it with the public key, and sends it to the server, which decrypts it with its private key to recover pre_random. If a middleman obtains the server's private key and has intercepted all previous messages, it can recover pre_random, server_random, and client_random, feed them into the pseudo-random function, and produce the secret, i.e. the final TLS session key. Every historical message can be cracked this way.

ECDHE, by contrast, generates a fresh, temporary key pair for each handshake, so even if the private key is cracked, past messages are unaffected. This property, where a one-time break does not compromise historical communications, is called forward secrecy.

RSA lacks forward secrecy while ECDHE has it, which is why ECDHE fully replaced RSA in TLS1.3.

Improve performance

Handshake improvements

The process is as follows:

The server no longer waits for the certificate to be verified before receiving client_params; instead it gets client_params in the very first round trip and computes the secret as soon as verification completes, cutting out the earlier waiting. This also means the client has to send more information up front, all in one go.

This TLS 1.3 handshake is also known as the 1-RTT handshake. However, there is still some room for optimization of the 1-RTT handshake, which we will discuss next.

Session reuse

There are two modes of Session reuse: Session ID and Session Ticket.

Session ID is the earliest approach: after the first connection, the client and server each store a session ID along with the session key. On reconnection the client sends the ID; the server looks it up, and if found, reuses the previous session state directly, without regenerating the session key.

However, this approach has a drawback: with a large number of clients, the storage pressure on the server is heavy.

Hence the second approach, the Session Ticket, whose idea is: if the server is under pressure, shift that pressure onto the clients. Concretely, after a successful connection the server encrypts the session information and sends it to the client as a Session Ticket for the client to keep. On the next connection the client sends the Ticket back; the server decrypts it, checks whether it has expired, and if not, restores the previous session state.

Although this reduces storage pressure on the server, it introduces a security problem: a fixed key is used to encrypt and decrypt Ticket data every time. Once a hacker obtains that key, all previous session records can be cracked. To avoid this, the key needs to be rotated periodically.

All in all, these session reuse techniques save the time used by algorithms such as session key generation while ensuring 1-RTT, which is a considerable performance improvement.

PSK

So all of these are 1-RTT optimizations, can we optimize to 0-RTT?

The answer is yes. The idea is to send the Session Ticket together with the application data, without waiting for the server's confirmation. This is called a Pre-Shared Key, or PSK.

Convenient as it is, this approach raises a security concern: a middleman can intercept the PSK data and replay it to the server, much like the replay risk in TCP's first handshake, increasing the server's exposure to attack.

conclusion

On the basis of TLS1.2, TLS1.3 abolishes a large number of algorithms and improves security. At the same time, session reuse is used to save the time of regenerating the key, and 0-RTT connection is achieved by using PSK.

017: What are the improvements to HTTP/2?

Since HTTPS is already so good at security, the focus of HTTP improvements is on performance. For HTTP/2, there are two main performance improvements:

  • Header compression
  • Multiplexing

There are also some disruptive features:

  • Set the request priority
  • Server push

These major improvements are essentially designed to address the problems of HTTP itself. Let’s take a look at what problems HTTP/2 solves, and how.

Header compression

In the era of HTTP/1.1 and earlier, the body of a response could be compressed, with the encoding specified by the Content-Encoding header field. But have you ever thought about compressing the header fields themselves? When the request headers are complex, and especially for GET requests, the request packet is almost entirely headers, so there is still plenty of room for optimization. HTTP/2 therefore applies a dedicated compression algorithm, HPACK, to the headers.

The HPACK algorithm is designed specifically for HTTP/2 and has two main features:

  • First, a table is maintained identically on the server and client, storing the header fields that have been used. When transmitting a value that has appeared before, only its index (e.g. 0, 1, 2, ...) needs to be sent; the other side looks the index up in its table. Sending indexes in this way greatly simplifies and reuses request header fields.

HTTP/2 does away with the concept of the start line and converts the request method, URI, and status code in the start line into header fields, but these fields have a “:” prefix to distinguish them from other request headers.

  • Second, integers and strings are Huffman-encoded. Huffman coding builds a table of all characters that appear and assigns shorter codes to characters that appear more often; transmitting these code sequences achieves a very high compression rate.
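The indexing idea above can be illustrated with a toy table, deliberately much simpler than real HPACK (no static table, no eviction): encoder and decoder maintain identical tables, so a header field crosses the wire in full only once, and afterwards only its index travels.

```javascript
// Toy shared header table; the class and method names are illustrative.
class HeaderTable {
  constructor() {
    this.entries = [];
  }
  encode(name, value) {
    const i = this.entries.findIndex(e => e.name === name && e.value === value);
    if (i !== -1) return { index: i };  // seen before: send the index only
    this.entries.push({ name, value });
    return { name, value };             // first occurrence: send the literal
  }
  decode(field) {
    if ('index' in field) return this.entries[field.index];
    this.entries.push(field);
    return field;
  }
}

const enc = new HeaderTable();
const dec = new HeaderTable();
dec.decode(enc.encode('accept', 'text/html')); // the literal crosses the wire
dec.decode(enc.encode('accept', 'text/html')); // only { index: 0 } crosses
```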

multiplexing

HTTP head-of-line blocking

We discussed HTTP head-of-line blocking earlier. The root cause is HTTP's request-response model: on the same long-lived TCP connection, if the previous request has not been answered, subsequent requests are blocked.

We mentioned concurrent connections and domain sharding as workarounds, but they don't really solve the problem at the HTTP level; they just open more TCP connections. The downside is that multiple TCP connections compete for limited bandwidth, leaving genuinely high-priority requests unprocessed.

HTTP/2 solves head-of-line blocking at the level of the HTTP protocol itself. Note: this is HTTP head-of-line blocking, not TCP head-of-line blocking; they are different things. TCP head-of-line blocking happens at the packet level: if an earlier packet has not arrived, later packets are not handed up to HTTP. HTTP head-of-line blocking happens at the request-response level. They live at different layers.

So how does HTTP/2 solve HTTP head-of-line blocking?

Binary framing

First, HTTP/2 takes the view that plaintext transmission is too cumbersome for machines to parse: with text there are many cases to disambiguate (for example, whether a carriage return or line feed is content or a delimiter), which requires an internal state machine and is inefficient. HTTP/2 therefore switches all packets to a binary format, transmitting everything as strings of 0s and 1s, which is much easier for machines to parse.

The former Headers + Body message format is now split into binary frames: Headers frames store header fields, and Data frames store the request body. Once messages are split into frames, the server no longer sees complete HTTP request messages but a pile of interleaved binary frames. Since frames from different requests need not be delivered in sequence, requests no longer queue behind one another, and HTTP head-of-line blocking disappears.

Both parties can send binary frames to each other; such a bidirectional sequence of binary frames is called a Stream. HTTP/2 uses streams to carry multiple requests and responses over one TCP connection, which is the concept of multiplexing.

You may have a question, since it is out of order first, then finally how to deal with these out of order data frames?

First, by "out of order" we mean that streams with different IDs are unordered relative to each other, while frames with the same Stream ID are always transmitted in order. When binary frames arrive, those sharing a Stream ID are reassembled into a complete request or response message. Other fields in the binary frame implement priority and flow control, which we'll cover in the next section.
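The reassembly step can be sketched in a few lines: frames from different streams arrive interleaved, but frames with the same stream ID stay in order, so grouping by ID recovers each message. The frame objects here are a simplified illustration, not the real wire format:

```javascript
// Group interleaved frames by stream ID and concatenate each stream's
// payloads in arrival order.
function assembleStreams(frames) {
  const streams = {};
  for (const frame of frames) {
    (streams[frame.streamId] = streams[frame.streamId] || []).push(frame.payload);
  }
  const messages = {};
  for (const id in streams) messages[id] = streams[id].join('');
  return messages;
}

const interleaved = [
  { streamId: 1, payload: 'GET /a ' },
  { streamId: 3, payload: 'GET /b ' },
  { streamId: 1, payload: 'HTTP/2' },
  { streamId: 3, payload: 'HTTP/2' },
];
// assembleStreams(interleaved) -> { '1': 'GET /a HTTP/2', '3': 'GET /b HTTP/2' }
```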

Server push

Another feature worth mentioning is HTTP/2 Server Push. In HTTP/2 the server is no longer purely passive: it can create a stream of its own to send messages to the client. For example, when a browser requests an HTML file over an established TCP connection, the server can return not just the HTML but also the other resource files it references, reducing the client's waiting.

conclusion

With so many new features added in HTTP/2, does HTTP syntax need to be relearned? No: HTTP/2 is fully compatible with previous HTTP syntax and semantics, such as request headers, URIs, status codes, and header fields. On the security side, HTTP/2 also supports TLS, and major browsers in practice only support encrypted HTTP/2, so most HTTP/2 traffic you see runs on top of TLS. Finally, a layered diagram:

018: How are binary frames designed in HTTP/2?

The frame structure

The frame structure transmitted in HTTP/2 is shown below:

Each frame consists of a frame header and a frame body. The header starts with a three-byte frame length, which gives the length of the frame body.

Next comes a one-byte frame type, which falls roughly into two groups: data frames, which carry HTTP messages, and control frames, which manage the transmission of streams.

The next byte holds the frame flags, eight flag bits in total. Common ones include END_HEADERS, which marks the end of the header data, and END_STREAM, which marks the end of data transmission in one direction.

The last four bytes are the Stream ID, or stream identifier. It lets the receiver pick out, from the interleaved binary frames, the frames sharing the same ID and assemble them in order into a request or response message.
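The fields above add up to a 9-byte frame header. Here is a minimal parsing sketch; the layout and flag values follow RFC 7540, but the function name and sample bytes are illustrative, not from any real library.

```python
# HTTP/2 frame header (RFC 7540 §4.1): 9 bytes total —
# 3-byte payload length, 1-byte type, 1-byte flags,
# then 1 reserved bit + 31-bit stream identifier.
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4
TYPE_HEADERS = 0x1

def parse_frame_header(data: bytes):
    length = int.from_bytes(data[0:3], "big")       # payload length
    frame_type, flags = data[3], data[4]
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id

# A HEADERS frame header: 16-byte payload, END_HEADERS set, stream 1.
raw = b"\x00\x00\x10" + bytes([TYPE_HEADERS, FLAG_END_HEADERS]) + b"\x00\x00\x00\x01"
print(parse_frame_header(raw))  # (16, 1, 4, 1)
```

Masking with `0x7FFFFFFF` discards the reserved high bit, which is why the stream ID range is 31 bits rather than 32.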

Flow state changes

As shown above, a stream in HTTP/2 is really a bidirectional sequence of binary frames. So how does a stream's state change over the course of an HTTP/2 request and response?

In fact, HTTP/2 borrows the idea of TCP's state transitions, driving the state changes with the frame flag bits. Here is a common request-response flow as an example:

When the client sends a Headers frame, a Stream ID is allocated and the client's end of the stream enters the open state; once the server receives that Headers frame, its end of the stream opens too. With both ends open, the two sides can exchange data and control frames.

When the client is about to finish sending, it sets the END_STREAM flag on its last frame and enters a half-closed state, in which it can still receive data but can no longer send.

After receiving the END_STREAM flag, the server enters the mirror-image half-closed state: it can still send data on that stream but no longer receives any. The server then sets END_STREAM on its own final frame to signal that its data is finished, and both sides enter the closed state.
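The state walk above can be modeled as a tiny state machine. This is a simplified sketch of the flow just described (the class and method names are invented for illustration; a real implementation tracks more states and events from RFC 7540 §5.1).

```python
# Toy model of one stream as seen from the client:
# HEADERS opens the stream, each side's END_STREAM half-closes it,
# and the stream is closed once both directions are finished.
class Stream:
    def __init__(self):
        self.state = "idle"

    def send_headers(self):
        self.state = "open"

    def send_end_stream(self):
        # Our side finished sending.
        self.state = "closed" if self.state == "half-closed (remote)" else "half-closed (local)"

    def recv_end_stream(self):
        # The peer finished sending.
        self.state = "closed" if self.state == "half-closed (local)" else "half-closed (remote)"

s = Stream()
s.send_headers()      # client sends HEADERS        -> open
s.send_end_stream()   # client sets END_STREAM      -> half-closed (local)
s.recv_end_stream()   # server's END_STREAM arrives -> closed
print(s.state)        # closed
```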

To start a new stream afterwards, the stream ID simply keeps incrementing until it reaches its upper limit; at that point a new TCP connection is opened and counting starts over. Since the stream ID field is 4 bytes with the highest bit reserved, the range is 0 to 2^31 − 1, roughly 2.1 billion.
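The arithmetic behind that 2.1 billion figure is just the 31 usable bits:

```python
# 4-byte stream ID field with the top bit reserved leaves 31 bits,
# so the largest usable stream ID is 2^31 - 1.
MAX_STREAM_ID = (1 << 31) - 1
print(MAX_STREAM_ID)  # 2147483647
```

(As an aside, RFC 7540 splits this space: client-initiated streams use odd IDs and server-initiated streams use even IDs.)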

The characteristics of the flow

Having walked through the stream state changes, let's summarize the characteristics of stream transport:

  • Concurrency. Unlike HTTP/1, multiple frames can be in flight at the same time over one HTTP/2 connection. This is the basis of multiplexing.
  • Incrementing IDs. Stream IDs are never reused; they increase monotonically until the upper limit is reached, at which point a new TCP connection is opened and counting starts from scratch.
  • Bidirectionality. Both the client and the server can create streams, and each side can act as sender or receiver without interfering with the other.
  • Configurable priority. Data frames can be assigned priorities so that the server processes the important resources first, optimizing the user experience.

That’s the introduction to binary frames in HTTP/2, hopefully to inspire you.

Finally

This article was first published on my blog. If it helped you, I'd appreciate it if you gave it a star. Thank you very much!
