What happens between entering a URL in the browser and seeing the page? This article explains it in two phases. The first is the navigation phase: what the browser does before the page actually starts rendering. The second is the rendering phase: how the received HTTP data is turned into a page through the browser’s internal processing.

Main responsibilities of the browser process, render process, and network process

  • The browser process is responsible for user interaction, child process management, and file storage.
  • The network process handles network downloads on behalf of the render processes and the browser process.
  • The render process is mainly responsible for turning the HTML, JavaScript, CSS, images, and other resources downloaded from the network into a page that can be displayed and interacted with. Because everything in the render process comes from the network, some of it may be malicious code that tries to exploit browser vulnerabilities to attack the system, so code running in the render process is not trusted. This is why Chrome runs the render process inside a security sandbox: to keep the system safe.

Overall flow

  1. First, the browser process receives the URL entered by the user and forwards it to the network process.
  2. The network process then makes the actual URL request.
  3. The network process receives the response header data, parses it, and forwards it to the browser process.
  4. After receiving the response header data from the network process, the browser process sends a “submit navigation” (CommitNavigation) message to the render process.
  5. After receiving the “submit navigation” message, the render process gets ready to receive the HTML data by establishing a data pipeline directly with the network process.
  6. Finally, the render process sends “confirm submission” to the browser process, which tells the browser process that it is ready to accept and parse the page data.
  7. When the browser process receives the “confirm submission” message from the render process, it removes the old document and updates the page state in the browser process.

The whole process, from the user issuing a URL request until the page starts being parsed, is called navigation.

Step-by-step breakdown

1. User input

When a user types into the address bar, the address bar decides whether the input is search content or a requested URL.

  • If it is search content, the address bar uses the browser’s default search engine to build a URL that contains the search keywords.
  • If the input is judged to conform to URL rules, for example anblog.top, the address bar adds the protocol to it according to the rules to build the complete URL, that is, the protocol followed by anblog.top/.
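
The decision above can be sketched in a few lines of Python. This is only an illustration: the URL rule used here (no spaces, and a dot in the host part), the https:// default, and the search.example endpoint are all assumptions made for the example, not Chrome's actual logic.

```python
from urllib.parse import quote_plus

# Hypothetical search endpoint; a real browser uses the user's default engine.
SEARCH_TEMPLATE = "https://search.example/search?q={query}"


def resolve_address_bar_input(text):
    """Return the URL to navigate to for the given address bar input."""
    text = text.strip()
    # Input that already carries a protocol is used as-is.
    if "://" in text:
        return text
    # Crude URL rule (an assumption): no spaces and a dot inside the host part.
    host = text.split("/", 1)[0]
    if " " not in text and "." in host.strip("."):
        # Assumed default protocol; add a trailing slash when there is no path.
        return "https://" + text + ("" if "/" in text else "/")
    # Everything else is treated as search keywords.
    return SEARCH_TEMPLATE.format(query=quote_plus(text))


print(resolve_address_bar_input("anblog.top"))
# -> https://anblog.top/
print(resolve_address_bar_input("how browsers work"))
# -> https://search.example/search?q=how+browsers+work
```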

2. URL request process

Next comes the page resource request. The browser process sends the URL request to the network process through inter-process communication (IPC). After receiving it, the network process starts the actual URL request flow.

  1. First, the network process checks whether the resource is in the local cache. If it is, the cached resource is returned directly to the browser process. If it is not, the network request flow begins. The first step of a network request is DNS resolution, which obtains the IP address of the server for the requested domain name: the hosts file on the host is checked, then a DNS server is queried, and if that DNS server does not have the answer, the lookup continues through recursive queries (asking someone else to find it for you) and iterative queries (asking someone else to tell you where to look next). If the request protocol is HTTPS, a TLS connection also has to be established.

  2. Next, a TCP connection is established with the server using that IP address. Once connected, the browser builds the request line, request headers, and so on, attaches data associated with the domain, such as cookies, to the request headers, and sends the constructed request to the server.

  3. After receiving the request, the server generates the response data (response line, response headers, and response body) and sends it to the network process. Once the network process has received the response line and headers, it parses the headers. (For convenience, I refer to the response line and response headers together as the “response header” below.)
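
To make steps 2 and 3 more concrete, here is a rough Python sketch of what "constructing the request" and "parsing the response header" amount to at the text level. The function names, the `sid=abc` cookie, and the sample response are made up for illustration; a real network stack handles far more cases.

```python
from __future__ import annotations


def build_request(host: str, path: str = "/", cookies: dict | None = None) -> str:
    """Assemble a GET request: request line, headers, then a blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookies:
        # Cookies stored for this domain travel in the Cookie request header.
        cookie_value = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {cookie_value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response_head(raw: str) -> tuple[int, dict[str, str]]:
    """Split off the response line and headers; return (status code, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status, headers


request = build_request("blog.annanblog.top", cookies={"sid": "abc"})

# Hypothetical response, shortened for the example.
sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<html>"
status, headers = parse_response_head(sample)
print(status, headers["content-type"])
```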

    Several related concepts
    • Redirects

      After receiving the response header from the server, the network process parses it. If the returned status code is 301 or 302 (configured on the server), the server is asking the browser to redirect to another URL. The network process reads the redirect target from the Location field of the response header, then issues a new HTTP or HTTPS request and starts over. For example, a request for http://blog.annanblog.top is redirected to https://blog.annanblog.top.

      How to view the response header returned by the server

      In a terminal (CMD), enter the address without HTTPS:

      curl -I blog.annanblog.top

      Then enter the HTTPS address:

      curl -I https://blog.annanblog.top

      Conclusion: during navigation, if the status code in the server’s response line is 301 or 302, the browser jumps to the new address and continues navigating from there. If the status code is 200, the browser can go on processing the request.
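
The redirect rule in this conclusion can be sketched as a small Python function. The function name and return values are invented for the example, and only 301 and 302 are handled because those are the codes discussed here; HTTP defines further redirect codes (such as 303, 307, and 308) that a real browser also follows.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302}  # the two status codes discussed above


def next_navigation_step(url, status, headers):
    """Return ("redirect", new_url) or ("continue", url) for a parsed response.

    `headers` is assumed to be a dict with lower-case header names.
    """
    location = headers.get("location")
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the requested URL.
        return "redirect", urljoin(url, location)
    # Any other status is passed on unchanged in this sketch.
    return "continue", url


print(next_navigation_step(
    "http://blog.annanblog.top/", 301,
    {"location": "https://blog.annanblog.top/"},
))
# -> ('redirect', 'https://blog.annanblog.top/')
```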

    • Response data type handling

      With redirects out of the way, we continue along the navigation flow. The data behind a URL request is sometimes a download and sometimes a normal HTML page, so how does the browser tell them apart?

      The answer is Content-Type. Content-Type is a very important field in the HTTP header: it tells the browser what type of data the server is returning in the response body. The browser then uses the Content-Type value to decide how to handle the response body.

      curl -I https://blog.annanblog.top

      You can see that the Content-Type field of the response header is text/html, which tells the browser that the server is returning data in HTML format.

      If the request is a download, the Content-Type value is different.

      In that case the Content-Type value is application/octet-stream, meaning the data is a byte stream. Normally, the browser handles such a request as a download.

      Conclusion: what happens next differs a lot depending on the Content-Type. If the browser judges the Content-Type value to be a download type, the request is handed to the browser’s download manager and navigation for that URL ends. If it is HTML, the browser continues with the navigation flow. Since Chrome renders pages in the render process, the next step is to prepare the render process.
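
As a rough illustration of this branching, the Python sketch below classifies a response by its Content-Type. It covers only the two values mentioned above; the function name and the "other" fallback are assumptions for the example, and real browsers also take other signals into account (for instance the Content-Disposition header).

```python
def classify_response(headers):
    """Return "download", "navigate", or "other" based on Content-Type.

    `headers` is assumed to be a dict with lower-case header names.
    """
    content_type = headers.get("content-type", "")
    # Drop parameters such as "; charset=UTF-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/octet-stream":
        return "download"   # handed to the download manager, navigation ends
    if media_type == "text/html":
        return "navigate"   # navigation continues, a render process is prepared
    return "other"          # types not covered in this article


print(classify_response({"content-type": "text/html; charset=UTF-8"}))
# -> navigate
print(classify_response({"content-type": "application/octet-stream"}))
# -> download
```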

3. Prepare the render process

    By default, Chrome assigns a render process to each page, meaning a new render process is created for each new page that is opened. There are exceptions, however: in some cases the browser lets multiple pages run in the same render process.

    You can check this in Chrome’s Task Manager, which can be opened from the navigation bar.

    A new page on the same site shares the render process of the page that opened it. Note that this applies only when the new page is opened by clicking a link directly; if you right-click and open it in a new tab, a new render process is still created.

    “Same site” means the same root domain (for example, annanblog.top) plus the same protocol (for example, https:// or http://). It covers all subdomains under that root domain as well as different ports, such as the following three:

    https://blog.annanblog.top
    https://www.annanblog.top
    https://www.annanblog.top:8080

    Chrome’s default strategy is one render process per tab. However, if a new page is opened from an existing page and belongs to the same site as that page, the new page reuses the parent page’s render process. Officially, this default policy is called process-per-site-instance.
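
The “same site” definition above can be approximated in Python as follows. The helper names are invented, and taking the last two labels of the host name as the root domain is a simplifying assumption that works for annanblog.top; browsers determine the root domain with the Public Suffix List instead.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return (protocol, root domain); the port is deliberately ignored."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Naive assumption: the root domain is the last two labels of the host.
    return parts.scheme, ".".join(labels[-2:])


def same_site(url_a, url_b):
    """True when both URLs share the protocol and the root domain."""
    return site_of(url_a) == site_of(url_b)


print(same_site("https://blog.annanblog.top", "https://www.annanblog.top:8080"))
# -> True
print(same_site("https://www.annanblog.top", "http://www.annanblog.top"))
# -> False
```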

    Example: suppose I am on the annanblog.top page and open the WordPress login screen from that page:

    https://www.annanblog.top/wp-login.php

    To check whether the two pages share a process, open Chrome’s Task Manager (right-click the Chrome title bar).

    As you can see, the same process is shared. Next, from the old page, open a new page on the subdomain blog.annanblog.top:

    https://blog.annanblog.top/

    As you can see, the same process is still used: the annanblog.top and blog.annanblog.top tabs have the same protocol and root domain, so they belong to the same site.

    But now open the site’s ICP filing page (beian.miit.gov.cn/#/…):

    You can see that a new process has been started because it is not the same site.

Conclusion:

Chrome’s default render process strategy when opening a page:

  • Typically, a new page gets its own render process;
  • If page B is opened from page A and A and B belong to the same site, page B reuses page A’s render process; otherwise, the browser process creates a new render process for page B.
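
A toy Python model of this strategy is shown below. The `Page` class, the counter standing in for process IDs, and the naive `site_key` rule (protocol plus the last two labels of the host name) are all assumptions made for illustration, not how Chrome is implemented.

```python
from itertools import count
from urllib.parse import urlsplit

_process_ids = count(1)  # stand-in for real OS process IDs


def site_key(url):
    """Naive same-site key: protocol plus the last two host labels."""
    parts = urlsplit(url)
    return parts.scheme, ".".join((parts.hostname or "").split(".")[-2:])


class Page:
    def __init__(self, url, opener=None):
        self.url = url
        if opener is not None and site_key(opener.url) == site_key(url):
            # Opened from a same-site page: reuse the opener's render process.
            self.process_id = opener.process_id
        else:
            # Otherwise the browser process creates a new render process.
            self.process_id = next(_process_ids)


a = Page("https://www.annanblog.top/")
b = Page("https://blog.annanblog.top/", opener=a)  # same site, opened from a
c = Page("https://beian.miit.gov.cn/", opener=a)   # different site
d = Page("https://www.annanblog.top/")             # opened on its own

print(a.process_id, b.process_id, c.process_id, d.process_id)
# -> 1 1 2 3
```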

Once the render process is ready, it still cannot start parsing the document, because the document data is still in the network process and has not been submitted to the render process. So the next step is the document submission phase.

4. Submit documents

Submitting the document means handing the HTML data from the network process over to the render process:

  1. First, when the browser process receives the response header data from the network process, it sends a “submit document” message to the render process. Browser process: OK, looks like the network process is all set, you two can do the handover!
  2. When the render process receives the “submit document” message, it establishes a data pipeline with the network process to transfer the data. Render process: Network process! Come on! Send it to me!
  3. When the document data transfer is complete, the render process returns a “confirm submission” message to the browser process. Render process: Good! I got the data! Just letting you know, browser process!
  4. After receiving the “confirm submission” message, the browser process updates the browser interface state, including the security state, the URL in the address bar, the forward and back history state, and the web page itself. Browser process: I’ve taken care of the user-facing part on my side!
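
The four steps can be mimicked with three small Python classes that call each other in the same order. This is only a toy model under stated assumptions: the class and method names, the chunked HTML, and the old.example address are invented, and real processes exchange these messages over IPC rather than through direct method calls.

```python
class NetworkProcess:
    def __init__(self, body_chunks):
        self.body_chunks = body_chunks

    def open_pipe(self):
        # Stand-in for the data pipeline: yields the HTML in chunks.
        yield from self.body_chunks


class RenderProcess:
    def __init__(self):
        self.document = ""

    def on_submit_document(self, url, network, browser):
        # Step 2: pull the document data from the network process.
        for chunk in network.open_pipe():
            self.document += chunk
        # Step 3: tell the browser process that the data has arrived.
        browser.on_confirm_submission(url)


class BrowserProcess:
    def __init__(self):
        self.address_bar = "https://old.example/"  # hypothetical previous page
        self.log = []

    def on_response_headers(self, url, renderer, network):
        # Step 1: response headers are in, ask the render process to take over.
        self.log.append("browser -> renderer: submit document")
        renderer.on_submit_document(url, network, self)

    def on_confirm_submission(self, url):
        # Step 4: only now is the browser UI switched over to the new page.
        self.log.append("renderer -> browser: confirm submission")
        self.address_bar = url


browser = BrowserProcess()
renderer = RenderProcess()
network = NetworkProcess(["<html>", "<body>hi</body>", "</html>"])
browser.on_response_headers("https://blog.annanblog.top/", renderer, network)
print(browser.address_bar)
```

Note that the address bar keeps showing the old URL until `on_confirm_submission` runs, which mirrors the behaviour described in this section.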

After the browser process receives the confirmation, it updates the following:

  • The forward and back state
  • The security state (the lock icon)
  • The URL in the address bar
  • The web page

This explains why, when you type an address into the browser’s address bar, the previous page does not disappear immediately; the new page loads for a while before the display is updated. After that comes the rendering phase.

More on the rendering stage tomorrow =-=