A common interview question asks: what happens between typing a URL into the browser and the page being displayed?
Most of you can probably answer that. But suppose the interviewer follows up: if the HTML you receive contains dozens of image tags, how are the images downloaded, in what order, over how many connections, and using which protocol?
To answer this, we need to work through the following five questions:
- After establishing a TCP connection with the server, do modern browsers close it once an HTTP request completes? If not, when is it closed?
- How many HTTP requests can one TCP connection carry?
- Can the HTTP requests on one TCP connection be sent together (e.g., three requests sent together and three responses received together)?
- Why does refreshing a page sometimes not re-establish the SSL connection?
- Does the browser limit the number of TCP connections to the same host?
First question
After establishing a TCP connection with the server, do modern browsers close it once an HTTP request completes? If not, when is it closed?
In HTTP/1.0, a server closes the TCP connection after sending an HTTP response. Re-establishing and tearing down a TCP connection for every request is expensive, however, so although it was not part of the standard, some servers supported the Connection: keep-alive header: after an HTTP request completes, the TCP connection used for it is not closed. The advantage is that the connection can be reused. Later HTTP requests do not need a new TCP connection, and as long as the connection is kept open the SSL overhead is avoided as well. The timing statistics from my two visits to https://www.github.com within a short period show the difference:
First visit: the timing includes the initial connection and SSL overhead.
Second visit: the initial connection and SSL overhead are gone, which indicates that the same TCP connection is being reused.
Persistent connections: because keeping the TCP connection open has so many benefits, HTTP/1.1 wrote the Connection header into the standard and enabled persistent connections by default. Unless the request declares Connection: close, the TCP connection between the browser and the server is kept open for a period of time and is not closed as soon as the request ends.
So the answer to the first question is: by default, the TCP connection is not closed when a request completes. It is closed right after the request when Connection: close is declared in the request header; otherwise it stays open for a while.
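The behaviour described above can be observed locally. The following is a minimal sketch using only the Python standard library; the names `CountingHandler` and `count_connections` are made up for this illustration, and the throwaway server on 127.0.0.1 stands in for a real web server. It counts how many TCP connections the server accepts while a client sends several requests in a row, with and without Connection: close.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections by default

    def setup(self):
        super().setup()
        self.server.accepted += 1  # setup() runs once per accepted TCP connection

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if self.close_connection:  # the request carried "Connection: close"
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def count_connections(num_requests, close_each=False):
    """Send requests one after another; return how many TCP connections were accepted."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    server.accepted = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for _ in range(num_requests):
            headers = {"Connection": "close"} if close_each else {}
            conn.request("GET", "/", headers=headers)
            conn.getresponse().read()  # drain the body before the next request
        return server.accepted
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

Under these assumptions, `count_connections(3)` reuses a single connection for all three requests, while `count_connections(3, close_each=True)` forces a new TCP connection for every request.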
Second question
How many HTTP requests can one TCP connection carry?
Once the first question is answered, so is this one: if the connection is kept open, a single TCP connection can carry multiple HTTP requests.
Third question
Can HTTP requests sent in a TCP connection be sent together (e.g., three requests sent together and three responses received together)?
A problem with HTTP/1.1 is that a single TCP connection can handle only one request at a time: the lifetimes of any two HTTP requests on the same TCP connection cannot overlap from start to finish.
The HTTP/1.1 specification defines pipelining to address this problem, but browsers turn it off by default.
Here is how RFC 2616 describes pipelining:
A client that supports persistent connections MAY “pipeline” its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.
One likely reason for this requirement is that HTTP/1.1 is a text protocol and a response carries nothing that identifies which request it belongs to, so the order must be preserved. Suppose you send two requests to the server, GET /query?q=A and GET /query?q=B. The server returns two results, and without a fixed order the browser would have no way to tell which response corresponds to which request.
Pipelining looks like a good idea, but there are many problems in practice:
- Some proxy servers cannot handle HTTP pipelining correctly.
- Implementing pipelining correctly is complex.
- Head-of-line blocking: suppose that, after establishing a TCP connection, the client sends several requests in a row on that connection. Per the standard, the server must return the results in the order it received the requests. If the server takes a long time to process the first request, all subsequent responses have to wait until the first one is finished.
So HTTP Pipelining is not enabled in modern browsers by default.
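To make the ordering rule and head-of-line blocking concrete, here is a small sketch using only the Python standard library. Everything in it is an assumption made for the demo: the local test server, the `/slow` and `/fast` paths, the 0.2 second delay, and the helper names `read_response` and `pipeline`. It writes several requests onto one raw socket before reading anything, then records when each response arrives.

```python
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

SLOW_DELAY = 0.2  # seconds; arbitrary value chosen for the demo


class SlowFirstHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(SLOW_DELAY)  # pretend this request is expensive
        body = self.path.encode()  # echo the path so responses can be told apart
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def read_response(stream):
    """Read one HTTP response that has a Content-Length body."""
    stream.readline()  # status line
    length = 0
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line ends the headers
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.decode().strip())
    return stream.read(length).decode()


def pipeline(paths):
    """Send all requests back to back; return [(body, seconds_until_received)]."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowFirstHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.create_connection(server.server_address) as sock:
            requests = "".join(
                f"GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n" for path in paths
            )
            started = time.monotonic()
            sock.sendall(requests.encode())  # nothing is read before all is sent
            with sock.makefile("rb") as stream:
                return [
                    (read_response(stream), time.monotonic() - started)
                    for _ in paths
                ]
    finally:
        server.shutdown()
        server.server_close()
```

Calling `pipeline(["/slow", "/fast"])` returns the responses in request order, and the response for `/fast` only arrives after the slow one has finished, even though the fast request itself needs almost no processing.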
However, HTTP/2 provides multiplexing, which allows multiple HTTP requests to be in flight simultaneously over one TCP connection. How exactly multiplexing is implemented is a separate question; here we only look at the effect of using HTTP/2.
In the timing chart, green is the waiting time between sending a request and the response coming back, and blue is the download time of the response. The requests are all completed in parallel over the same connection.
So the answer to this question is: HTTP/1.1 has pipelining, which could do this, but since browsers turn it off by default, it can be considered not feasible in practice. In HTTP/2, multiplexing allows multiple HTTP requests to be processed in parallel on the same TCP connection.
So how did browsers improve page loading efficiency in the age of HTTP/1.1? There are two main ways:
- Keep the established TCP connection to the server open and process multiple requests sequentially over it.
- Establish multiple TCP connections with the server.
Fourth question
Why does refreshing a page sometimes not re-establish the SSL connection?
The discussion of the first question already answers this. The browser and the server sometimes keep the TCP connection open for a period of time. Since TCP does not need to be re-established, the SSL session already set up on that connection is simply reused.
Fifth question
Does the browser limit the number of TCP connections to the same Host?
Suppose we are still in the HTTP/1.1 era, when there was no multiplexing. What would a browser do when it received a web page with dozens of images? It cannot open just one TCP connection and download them sequentially, because users would have to wait far too long. But if it opened one TCP connection per image, the computer or the server might not be able to cope. With 1000 images you cannot open 1000 TCP connections: even if your computer allowed it, the NAT might not.
So the answer is: yes. Chrome allows up to six TCP connections to the same host; the limit differs somewhat between browsers.
https://developers.google.com/web/tools/chrome-devtools/network/issues#queued-or-stalled-requests
So back to the original question: if the HTML you receive contains dozens of image tags, how are the images downloaded, in what order, over how many connections, and using which protocol?
If the images are all served over HTTPS under the same domain name, the browser negotiates with the server during the SSL handshake whether HTTP/2 can be used, and if so uses multiplexing over that connection. Not every resource hosted under that domain will necessarily be fetched over a single TCP connection, but multiplexing will very likely be used.
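In practice this negotiation is carried by the TLS ALPN extension: the client lists the protocols it can speak and the server picks one. Below is a minimal sketch with Python's standard `ssl` module; the helper names `make_alpn_context`, `http_version_for` and `negotiate` are invented for this illustration, and the fallback to HTTP/1.1 when no protocol is selected is an assumption about how a client might behave, not a description of any particular browser.

```python
import socket
import ssl


def make_alpn_context():
    """Client TLS context that offers HTTP/2 first, then HTTP/1.1, via ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def http_version_for(alpn_protocol):
    """Map the ALPN result to the HTTP version the client would then speak."""
    # None means the server did not select a protocol; assume HTTP/1.1 then.
    return "HTTP/2" if alpn_protocol == "h2" else "HTTP/1.1"


def negotiate(host, port=443):
    """Do a TLS handshake with host and report which HTTP version was agreed on."""
    with socket.create_connection((host, port), timeout=5) as raw:
        with make_alpn_context().wrap_socket(raw, server_hostname=host) as tls:
            return http_version_for(tls.selected_alpn_protocol())
```

With network access, `negotiate("www.github.com")` would report the version that server agrees to; the result depends on the server's configuration.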
What if HTTP/2 cannot be used, or HTTPS cannot be used (in practice browsers implement HTTP/2 only over HTTPS, so without it only HTTP/1.1 is available)? Then the browser establishes multiple TCP connections to the same host, up to a maximum that depends on the browser's settings. Whenever one of these connections becomes idle, the browser uses it to send the next request; the remaining requests have to wait.
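The effect of such a per-host limit can be sketched with a toy scheduling model in Python. This is only an illustration under simplifying assumptions: each connection handles one request at a time, requests are served in the order given, and connection setup time, bandwidth sharing and request priorities are ignored. The function name `schedule_requests` and the durations are hypothetical.

```python
import heapq


def schedule_requests(durations, max_connections=6):
    """Return (start, finish) times for each request under a connection limit.

    durations: download time of each request, in the order they are issued.
    """
    # Min-heap of the times at which each open connection becomes idle.
    idle_at = [0.0] * min(max_connections, len(durations))
    heapq.heapify(idle_at)
    timeline = []
    for duration in durations:
        start = heapq.heappop(idle_at)  # wait for the first idle connection
        finish = start + duration
        heapq.heappush(idle_at, finish)  # that connection is busy until then
        timeline.append((start, finish))
    return timeline
```

For twelve images that each take one second, `schedule_requests([1.0] * 12)` starts the first six at time 0.0 and the other six at 1.0, so everything is done at 2.0. With `max_connections=1` the same downloads finish at 12.0.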
Author: Akira Matsu
Source: https://zhuanlan.zhihu.com/p/61423830