This is the 23rd day of my participation in the First Challenge 2022

Preface

"From entering a URL to page display" is a classic interview question. It touches a wide range of topics, so the interviewer can use it to probe a candidate's knowledge in many directions, which is why it is asked so often. Let's look at how to answer it.

Phase 1: User input phase

After the user types something into the address bar, the browser first decides whether the input is a valid URL or a search query. If it is a search query, the browser builds a search URL from it; if it is a valid URL, the browser starts loading it.
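The decision above can be sketched roughly as follows. This is only an illustration, not how any real browser implements it: the function name `resolve_address_bar_input`, the "contains a dot and no spaces" heuristic, and the `search.example` search URL are all assumptions made for this example.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical search engine URL, used only for illustration.
SEARCH_TEMPLATE = "https://search.example/?q={}"


def resolve_address_bar_input(text: str) -> str:
    """Return the URL to load for whatever the user typed."""
    text = text.strip()
    parts = urlsplit(text)
    # Case 1: already a full http(s) URL, load it as-is.
    if parts.scheme in ("http", "https") and parts.netloc:
        return text
    # Case 2 (assumed heuristic): looks like a bare host such as "example.com/path".
    if " " not in text and "." in text.split("/", 1)[0]:
        return "https://" + text
    # Case 3: anything else is treated as a search query.
    return SEARCH_TEMPLATE.format(quote_plus(text))


print(resolve_address_bar_input("example.com"))
print(resolve_address_bar_input("how browsers work"))
```

Real browsers use far more elaborate rules (known schemes, local host names, user settings), so treat this only as a picture of the two branches.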

Phase 2: URL request phase

Initiating a URL request consists of the following steps:

  1. Build the request line: The browser process first builds the request line and then sends the URL request to the network process via inter-process communication (IPC).
  2. Look up the cache: After receiving the URL, the network process looks for the resource in the local cache. If a cached copy exists, the network process returns it directly to the browser process. Otherwise, it moves on to the network request.
  3. DNS resolution: The network process first checks whether the domain name is already in the DNS cache. If it is, the cached IP address is used directly. Otherwise, a DNS query resolves the domain name to an IP address. The port number comes from the URL or from the protocol default, not from DNS.
  4. Wait in the TCP queue: Chrome allows at most six simultaneous TCP connections per domain name. Requests beyond that limit must wait in a queue.
  5. Establish a TCP connection: The browser connects to the server with the TCP three-way handshake, after which data can be transferred.
  6. Send the HTTP request: The browser first sends the request line, which contains the request method, request URI, and HTTP version. It then sends the request headers, which tell the server about the browser and the request, such as the browser engine (User-Agent), the requested domain name (Host), and Cookie.
  7. The server processes the request: The server first returns the response line (status line), which contains the protocol version and status code. It then returns the response headers, which include the type of the returned data (Content-Type), cookies the server wants the client to store (Set-Cookie), and so on.
  8. Close the TCP connection: After the data transfer is complete, the connection is closed with the TCP four-way handshake (the "four waves").
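To make steps 6 and 7 more concrete, here is a minimal sketch of what the request line, request headers, response line, and response headers look like on the wire. The helper names `build_request` and `parse_response_head` are made up for this example, and the parsing is deliberately simplified (for instance, repeated headers overwrite each other), so it is not a full HTTP implementation.

```python
from typing import Dict, Optional, Tuple


def build_request(method: str, host: str, path: str = "/",
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a request line plus headers, ending with a blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response_head(raw: bytes) -> Tuple[str, int, str, Dict[str, str]]:
    """Split a raw response into version, status code, reason, and headers."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Simplification: a repeated header keeps only its last value.
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers


request = build_request("GET", "example.com", "/index.html",
                        {"User-Agent": "demo/0.1"})
print(request.decode("ascii"))

raw_response = (b"HTTP/1.1 200 OK\r\n"
                b"Content-Type: text/html; charset=utf-8\r\n"
                b"Set-Cookie: id=1\r\n\r\n<html></html>")
print(parse_response_head(raw_response))
```

The first line of each message is the request line or response line from the list above; everything up to the blank line is headers.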

Phase 3: Renderer process preparation phase

  1. The network process parses the response and determines its type from the Content-Type response header. If the type is a byte stream (application/octet-stream), the request is handed to the download manager. If the type is text/html, the network process notifies the browser process that HTML has arrived and that a renderer process should be prepared.
  2. In general, each tab in the browser has its own renderer process. If a new page is opened from the current page and belongs to the same site, it reuses the current page's renderer process. Otherwise, a new renderer process is created.
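Both decisions in this phase can be sketched in a few lines. The function names are hypothetical, and the same-site check is a loud simplification: it treats the last two labels of the host name as the site, whereas real browsers consult the Public Suffix List to find the registrable domain.

```python
from urllib.parse import urlsplit


def route_response(content_type_header: str) -> str:
    """Decide what to do with a response based on its Content-Type."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "render"      # hand over to a renderer process
    if media_type == "application/octet-stream":
        return "download"    # hand over to the download manager
    return "other"           # other types are out of scope for this sketch


def naive_site(url: str):
    """Assumed simplification: site = scheme + last two host labels."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return parts.scheme, ".".join(labels[-2:])


def can_reuse_renderer(opener_url: str, new_url: str) -> bool:
    """A page opened from another page of the same site may share its renderer."""
    return naive_site(opener_url) == naive_site(new_url)


print(route_response("text/html; charset=utf-8"))
print(can_reuse_renderer("https://www.example.com/a", "https://blog.example.com/b"))
```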

Phase 4: Document commit phase

  1. Once the renderer process is ready, the browser process sends it a "commit document" message. On receiving the message, the renderer process sets up a data pipe with the network process. When the document data has been transferred, the renderer process sends a "commit confirmed" message back to the browser process.
  2. When the browser process receives the confirmation, it updates the browser UI state, including the security state, the address bar URL, and the back/forward history, and it updates the web page, which is still blank at this point.

Phase 5: Page rendering phase

  1. After the document is committed, the renderer process starts parsing the page and loading subresources.
  2. Build the DOM tree: The HTML is parsed into a DOM, a tree structure with document as its root node.
  3. Style calculation: External styles from link tags, styles in style tags, and inline styles on elements are converted into style sheets the browser can work with. Property values are then standardized, for example color: red becomes rgb(255, 0, 0). Finally, the computed style of each DOM node is derived from the CSS inheritance and cascade rules.
  4. Layout: A layout tree containing only visible elements is generated, and the position and size of each node in it are calculated.
  5. Layering: A layer tree is created for nodes with complex effects such as 3D transforms, page scrolling, or z-index ordering.
  6. Paint: A paint list is generated for each layer and submitted to the compositor thread.
  7. Rasterization: Tiles are converted into bitmaps, with tiles near the viewport rasterized first.
  8. Compositing: Once the tiles have been rasterized, the page is composited and displayed.
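Steps 2 and 4 can be illustrated with a toy example: parse HTML into a tree rooted at a document node, then derive a layout tree that keeps only visible nodes. This is a sketch under strong assumptions, not a browser-grade parser: `Node`, `TreeBuilder`, and `layout_tree` are names invented here, only inline `display: none` and the `head` and `script` elements are treated as invisible, and no CSS cascade is computed.

```python
from html.parser import HTMLParser

# Elements that never have children (a partial list, enough for the demo).
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}


class Node:
    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree with a 'document' node at the top."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].name == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1].children.append(Node("#text", [("value", data.strip())]))


def layout_tree(node):
    """Copy the tree, dropping nodes that would not be visible."""
    style = (node.attrs.get("style") or "").replace(" ", "")
    if node.name in ("head", "script") or "display:none" in style:
        return None
    copy = Node(node.name, node.attrs.items())
    copy.children = [c for c in map(layout_tree, node.children) if c is not None]
    return copy


builder = TreeBuilder()
builder.feed("<html><head><title>t</title></head>"
             "<body><p>Hi</p><div style='display: none'>x</div></body></html>")
builder.close()
visible = layout_tree(builder.root)
print([child.name for child in visible.children[0].children])
```

The DOM tree still contains head and the hidden div; the layout tree does not, which is the difference between steps 2 and 4.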

Summary of questions

RQ1: How does the browser parse HTML?

To answer this question, start from Phase 5, the page rendering phase.

RQ2: At what stage do strong caching and negotiated caching occur?

Strong caching and negotiated caching both occur in Phase 2, the URL request phase: the cache is looked up right after the request line is built.
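The difference between the two kinds of caching can be sketched as a small decision function. The names `CachedResponse`, `plan_request`, and `after_revalidation` are assumptions for this example, and only `max-age` and `ETag` are modeled; real caches also handle `Expires`, `Last-Modified`, `no-cache`, and more.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedResponse:
    stored_at: float            # time the response was stored, in seconds
    max_age: int                # freshness lifetime from Cache-Control: max-age
    etag: Optional[str] = None  # validator from the ETag response header


def plan_request(cached: Optional[CachedResponse],
                 now: float) -> Tuple[str, Dict[str, str]]:
    """Return the action to take and any extra request headers."""
    if cached is None:
        return "fetch", {}
    # Strong caching: still fresh, so no request is sent at all.
    if now - cached.stored_at < cached.max_age:
        return "use-cache", {}
    # Negotiated caching: stale, so ask the server whether the copy is still valid.
    if cached.etag:
        return "revalidate", {"If-None-Match": cached.etag}
    return "fetch", {}


def after_revalidation(status_code: int) -> str:
    """304 Not Modified means the cached copy can be reused."""
    return "use-cache" if status_code == 304 else "use-new-response"


entry = CachedResponse(stored_at=1000.0, max_age=60, etag='"abc"')
print(plan_request(entry, now=1030.0))
print(plan_request(entry, now=1100.0))
```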

RQ3: Does DNS resolution also resolve the port?

No. DNS only maps the domain name to an IP address. HTTP uses port 80 by default and HTTPS uses port 443 by default. If you need a different port, specify it directly in the URL.
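A short sketch of that rule, using the standard library URL parser. The function name `effective_port` is made up for this example, and only the two schemes mentioned above are covered.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Use the port written in the URL, else the default for the scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://example.com/"))
print(effective_port("https://example.com:8443/path"))
```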

RQ4: Which stages can be optimized?

  1. Optimize DNS queries: use DNS pre-resolution (dns-prefetch).
  2. Optimize TCP connections: reuse connections with the Connection: keep-alive header.
  3. Optimize HTTP responses: serve content through a CDN and compress it with Gzip.
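As a rough illustration of the third point, the sketch below compresses a repetitive HTML body with Gzip and compares the sizes. The function name `compress_body` and the sample markup are assumptions for this example; the actual savings depend entirely on the content, and in practice the web server or CDN usually does the compression.

```python
import gzip


def compress_body(body: bytes) -> bytes:
    """Gzip-compress a response body, as a server would for Content-Encoding: gzip."""
    return gzip.compress(body)


# Made-up sample: markup is typically repetitive, which compresses well.
sample = ("<li class='item'>row</li>\n" * 200).encode("utf-8")
compressed = compress_body(sample)

print(len(sample), "bytes before,", len(compressed), "bytes after")
# The client restores the original bytes before parsing.
assert gzip.decompress(compressed) == sample
```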

Further optimization details will be covered in dedicated articles, so stay tuned.

Reference documentation

Thanks to the author of the following article; I learned a lot from it!

  • "Interview FAQ": What happens from entering the URL to page display (a 99-point answer)