This is the first day of my participation in Gwen Challenge

Reading notes: how browsers work (1)

Browser architecture

  1. The user interface

    Everything except the main window that displays the requested page belongs to the user interface, including the address bar and so on.

  2. Browser engine

    Passes instructions between the user interface and the rendering engine.

  3. The rendering engine

    Displays the requested content. If HTML and CSS are requested, it parses and displays them on the screen.

    (How the rendering engine works is a long story and this is only the beginning. More on it later, so I won't spend time on it here.)

    ❓ In Chrome, each tab has its own rendering engine instance, which runs in a separate process.

    By default, the rendering engine can display HTML, XML, and images; plug-ins extend it to display PDFs and more.

  4. UI backend

    Draws basic controls, that is, widgets that trigger events in the page (browser), such as combo boxes, input boxes, and so on. These controls look different from browser to browser, but they all expose events that are handled by low-level calls to the operating system's user interface methods.

  5. Networking

    ❓ Used for network calls, such as HTTP requests.

  6. JS parser

    ❓ Used to parse and execute JavaScript code.

  7. Data storage

    ❓ Browsers need to keep all kinds of data, such as cookies, on the hard disk; this is sometimes called the "web database".

    Cookie, LocalStorage, SessionStorage, userData and IndexedDB

Questions

  1. Each tab corresponds to a separate rendering engine

    • How does Chrome start its processes?

      Restart the browser and open an incognito window. Then click the options menu in the upper right corner of Chrome, choose the More tools submenu, and click Task manager to open the Chrome Task Manager window and see which processes are running.

      As you can see, when the browser starts from a closed state and opens one page, it needs at least one network process, one browser process, one GPU process, and one rendering process: four processes in total.

      When another tab is opened later, the browser, network, and GPU processes are shared and are not started again. If two pages belong to the same site and page B was opened from page A, they also share a rendering process; otherwise a new rendering process is started.

      By default, Chrome creates a rendering process for each tab. For security reasons, renderers run in sandbox mode.
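The sharing rule above can be sketched as a small model. This is only an illustration of the rule as described in these notes, not Chrome's actual logic: `siteOf`, `countProcesses`, and the "last two host labels" definition of a site are all assumptions made for the example.

```javascript
// Hypothetical model of the process-sharing rule described above.
// Simplification: the "site" is the scheme plus the last two host labels.
function siteOf(url) {
  const { protocol, hostname } = new URL(url);
  return protocol + "//" + hostname.split(".").slice(-2).join(".");
}

// tabs: [{ url, openedFrom }], where openedFrom is the index of the
// opener tab (an earlier entry) or null when the tab was opened directly.
function countProcesses(tabs) {
  const rendererOf = [];
  let renderers = 0;
  tabs.forEach((tab, index) => {
    const opener = tab.openedFrom != null ? tabs[tab.openedFrom] : null;
    if (opener && siteOf(opener.url) === siteOf(tab.url)) {
      // Same site and opened from that page: reuse the opener's renderer.
      rendererOf[index] = rendererOf[tab.openedFrom];
    } else {
      rendererOf[index] = renderers++;
    }
  });
  // The browser, network, and GPU processes are shared by all tabs.
  return { shared: 3, renderers, total: 3 + renderers };
}
```

With a single tab the model reports four processes, matching the Task Manager observation above; a second same-site tab opened from the first adds none.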

    • Advantages of multiple processes?

      • Prevent a page crash from affecting the entire browser

      • The operating system provides methods to limit the permissions of certain processes

      • Processes have their own private memory space and can have more memory

    • Site isolation?

      Chrome generally gives each tab its own process, but pages that share content may share a process. Site isolation is a recently introduced Chrome feature that eliminates this sharing: it runs a separate renderer process for each cross-site iframe, so that different sites always live in different processes. It was introduced to defend against attacks such as Spectre and Meltdown.

    • Spectre and Meltdown attacks?

      See: [Computers] Understand Intel's Meltdown and Spectre vulnerabilities in 15 minutes - Emory

  2. Network module

    Domain name resolution?

    1. Chrome DNS cache: entries live for 1 minute, up to 1000 entries; view it at chrome://net-internals/#dns

    2. Operating system DNS cache: ipconfig /displaydns

    3. Hosts file: C:\Windows\System32\drivers\etc

    4. Send a recursive query to the locally configured preferred DNS server, which must return an address.

      DNS server?

      They are usually provided by telecom operators, but you can also use public DNS servers such as Google's

      Which protocol do domain name resolution requests use?

      DNS uses port 53 on both UDP and TCP. TCP is used for zone transfers and UDP for domain name resolution.

      There are two kinds of DNS server: primary and secondary. Within a zone, the primary server reads DNS data from its local data files, while a secondary server gets its data from the primary: when it starts, it contacts the primary server and loads the data. This is called a zone transfer, and it uses TCP.

      The secondary server periodically asks the primary server whether the data has changed. If it has, the secondary performs a zone transfer to synchronize. TCP is used because a large amount of data has to be synchronized and TCP is reliable.

      When a client queries a domain name, the answer usually does not exceed 512 bytes, so it can be sent over UDP. The DNS server then does not have to go through the TCP three-way handshake, so its load is lower and the response is faster.

      1. The DNS server searches its cache

      2. If there is no hit, it performs an iterative resolution on our behalf

        1. It starts at the root servers, which return the address of the name servers for the top-level domain (.com)

        2. It then asks for the address of the name server for baidu.com

          This DNS address is usually provided by the domain name registrar, such as Wanwang or Xinwang

        3. Once the address of www.baidu.com is found, it is returned to the system kernel, which returns it to the browser.

      3. If that fails, the operating system searches the NetBIOS name cache, which holds the IP addresses of computers it has successfully communicated with in the last few minutes (view it with nbtstat -c).

      4. Query the WINS server (which stores the mapping between NetBIOS names and IP addresses)

      5. The client performs a broadcast lookup

      6. The client reads the LMHOSTS file (located in the same directory as the HOSTS file).
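The lookup order above (browser cache, OS cache, hosts file, then the DNS server's iterative walk from root to authoritative server) can be sketched with in-memory tables. Everything here is made up for illustration: the maps, the server names, and the `resolve` and `iterativeResolve` helpers are assumptions, the address comes from a documentation range, and the NetBIOS/WINS/LMHOSTS fallbacks are left out.

```javascript
// Toy data standing in for the real caches and name servers.
const chromeCache = new Map();
const osCache = new Map();
const hostsFile = new Map([["localhost", "127.0.0.1"]]);

const rootServers = { com: "tld-com" };
const tldServers = { "tld-com": { "example.com": "ns-example" } };
const authoritativeServers = { "ns-example": { "www.example.com": "192.0.2.10" } };

// Iterative resolution as done by the DNS server: root -> TLD -> authoritative.
function iterativeResolve(name) {
  const labels = name.split(".");
  const tldServer = rootServers[labels[labels.length - 1]];
  if (!tldServer) return null;
  const nameServer = tldServers[tldServer][labels.slice(-2).join(".")];
  if (!nameServer) return null;
  return authoritativeServers[nameServer][name] || null;
}

// Try each source in order and remember the answer in the browser cache.
function resolve(name) {
  const sources = [
    ["chrome cache", (n) => chromeCache.get(n)],
    ["os cache", (n) => osCache.get(n)],
    ["hosts file", (n) => hostsFile.get(n)],
    ["dns server", iterativeResolve],
  ];
  for (const [source, lookup] of sources) {
    const address = lookup(name);
    if (address) {
      chromeCache.set(name, address);
      return { address, source };
    }
  }
  return { address: null, source: null };
}
```

Calling `resolve` twice for the same name shows the effect of the cache: the first answer comes from the simulated DNS server, the second from the browser cache.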

      TCP three-way handshake?

      The user agent (usually the browser) initiates a TCP connection to the web server (httpd, nginx, etc.) from a random port (1024 < port < 65535).

      The request is encapsulated layer by layer according to the four-layer TCP/IP model, travels to the server, enters the network card, and is then unwrapped layer by layer by the kernel's TCP/IP stack before reaching the web server, which establishes the connection.

      How does the server produce an HTML file after receiving an HTTP request?

      Nginx reads its configuration file, matches the corresponding file path, and makes an I/O system call to the kernel. The kernel finds the file, reads it from the hard disk into its own memory space, and then copies it into the memory space of the Nginx process.

      At the file system level, once the kernel knows the path to fetch, /a/b/c.html, it reads the inode of / from the metadata area, finds the data block number of /, reads the directory stored in that data block, finds the inode number of a in it, and looks that inode up in the metadata area.

      After reading the inode of a, it repeats the same steps for b and then for c.html, finally finds the data blocks of c.html, and reads the complete content.

      (obtain the inode number -> find the data block -> read directory, obtain the next target inode number, until find the file location)
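The inode walk in parentheses above can be illustrated with a toy file system. The inode numbers, block numbers, and the `readFile` helper are invented for the example; a real file system stores these structures on disk and in kernel caches.

```javascript
// Toy metadata area: inode number -> inode (type and data block number).
const inodes = {
  2: { type: "dir", block: 100 },   // "/"
  11: { type: "dir", block: 101 },  // "/a"
  12: { type: "dir", block: 102 },  // "/a/b"
  13: { type: "file", block: 103 }, // "/a/b/c.html"
};

// Toy data area: directory blocks map names to inode numbers,
// file blocks hold the content.
const blocks = {
  100: { a: 11 },
  101: { b: 12 },
  102: { "c.html": 13 },
  103: "<html>hello</html>",
};

const ROOT_INODE = 2;

// inode number -> data block -> directory entry -> next inode number,
// repeated until the file itself is reached.
function readFile(path) {
  let inodeNumber = ROOT_INODE;
  const visited = [inodeNumber];
  for (const name of path.split("/").filter(Boolean)) {
    const inode = inodes[inodeNumber];
    if (inode.type !== "dir") throw new Error(`${name}: parent is not a directory`);
    inodeNumber = blocks[inode.block][name];
    if (inodeNumber === undefined) throw new Error(`${name}: no such file or directory`);
    visited.push(inodeNumber);
  }
  return { content: blocks[inodes[inodeNumber].block], visited };
}
```

`readFile("/a/b/c.html")` visits one inode per path component before it reaches the data block that holds the HTML.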

      After the HTML file is read, how are the static resources fetched? (Using Chrome as an example)

      1. Start loading

        Chrome classifies web resources into the following types (incomplete list):

        | Key | Meaning |
        | --- | --- |
        | kMainResource | Pages entered in the address bar, pages nested with frame/iframe, pages opened by clicking hyperlinks, pages redirected to after form submission |
        | kImage | Image resources |
        | kCSSStyleSheet | CSS resources |
        | kScript | Script resources |
        | kFont | Font resources |
        | kRaw | Mixed-type resources and dynamic resources, such as Ajax |
        | kSVGDocument | SVG resources |
        | kXSLStyleSheet | XSLT, the extensible stylesheet language, a transformation language |
        | kLinkPrefetch | Prefetched resources for HTML5 pages |
        | kTextTrack | Video subtitles |
        | kImportResource | HTML Imports, which import an HTML file into other HTML documents, for example <link href="import/post.html" rel="import" /> |
        | kMedia | Media resources |
        | kManifest | HTML5 application cache resources |
        | kMock | Reserved test type |

      2. Preprocessing the request

        The request includes the URL, HTTP headers, HTTP body, priority, and so on.

        Next, the request is checked for validity and may be modified. If the check returns kAbort or kBlock, the resource is aborted or blocked and will not be loaded.

        A request can be blocked for several reasons:

        | Key | Meaning |
        | --- | --- |
        | kCSP | Content Security Policy check, which reduces XSS attacks |
        | kMixedContent | Mixed content block |
        | kOrigin | Mainly for SVG: resources referenced through use must not be cross-origin, otherwise they are blocked |
        | kInspector | Blocked by the DevTools inspector |
        | kSubresourceFilter | Subresource filter |
        | kOther | |
        | kNone | |

        kCSP

        If a resource fails the CSP check for its type, the reason for the block is returned; the check may also modify the request as required.

        Setting <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests"> modifies the request object, forcing the page's HTTP requests to be upgraded to HTTPS.

        To allow images to be loaded only from our own domain, we can either add the meta tag <meta http-equiv="Content-Security-Policy" content="img-src 'self'"> or set the Content-Security-Policy response header on the back end.

        kMixedContent

        Requesting HTTP content on an HTTPS site, such as loading a JS script over HTTP, is vulnerable to man-in-the-middle attacks: an attacker can modify the JS content and take control of the entire HTTPS page.

        Content that is not blocked as long as the page does not set block-all-mixed-content is called passive mixed content.

        Active mixed content can be loaded only if the user allows it. However, if the page sets block-all-mixed-content, the user setting that allows blockable resources to load has no effect.

        If it is a nested page, the child page allows active mixed content to be loaded, and the parent page does not.

        kOrigin

        SVG resources referenced through use must not be cross-origin. If the protocol, domain name, and port number are all the same, the request passes the check. This differs from the same-origin policy: an origin block prevents the request from being sent at all, whereas the same-origin policy only blocks access to the result of the request.
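As a rough sketch of two of the checks above, the following compares origins with the standard `URL` class and flags HTTP resources on HTTPS pages. The `checkRequest` helper and its `active` and `svgUse` options are hypothetical names for this example; Chrome's real checks cover many more cases (CSP, user settings, nested frames).

```javascript
// Same origin means same protocol, host, and port; URL.origin captures all three.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// An HTTPS page requesting an HTTP resource is mixed content.
function isMixedContent(pageUrl, resourceUrl) {
  return new URL(pageUrl).protocol === "https:" && new URL(resourceUrl).protocol === "http:";
}

// Hypothetical pre-flight check returning one of the block reasons listed above.
// active: the resource is active mixed content (e.g. a script).
// svgUse: the resource is referenced through an SVG use element.
function checkRequest(pageUrl, resourceUrl, { active = false, svgUse = false } = {}) {
  if (active && isMixedContent(pageUrl, resourceUrl)) return "kMixedContent";
  if (svgUse && !isSameOrigin(pageUrl, resourceUrl)) return "kOrigin";
  return "kNone";
}
```

For example, `checkRequest("https://example.com/", "http://example.com/app.js", { active: true })` reports `kMixedContent`, while the same URL requested as a passive image is let through in this simplified model.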

      3. Resource priority

        Next, the loading priority of the resource is calculated. Every resource type starts with a default priority.

        Priorities fall into five levels: very high, high, medium, low, and very low. By default, the main resource (the page), CSS, and fonts, which affect what is seen first, get the highest priority, followed by scripts and Ajax; images, audio, and video get a lower default priority; and the lowest goes to prefetched resources (<link rel="prefetch">).

        After the default priority is assigned, some adjustments are made, mainly for prefetch/preload resources.

        • Lower the priority of preloaded fonts

          A preloaded font's priority changes from very high to high

        • Lower the priority of defer/async scripts

          A script gets the lowest priority if it is defer

        • A preload script at the bottom of the page gets priority medium

          If it is a preload script and the page already contains an image, the script is considered to be at the bottom of the page and its priority is set to medium.

          A resource counts as early if it comes before the first non-preload image and late if it comes after it. Late scripts have a lower priority.

          Prefetch: in early browsers, once a script was encountered, it had to be downloaded and executed before the rest of the DOM could be parsed.

          Preload: when a script is encountered, DOM construction stops, but the browser keeps scanning ahead for resources the page needs (img, script, etc.) and preloads them instead of waiting for DOM construction to resume.

        • Set the priority to very high for synchronous or blocking loads

          Synchronous Ajax requests are set to very high when they are initialized

          A synchronous Ajax request starts out as high, but at the end the maximum is taken of the priority determined by the switch statement and the current priority of the synchronous request, so the request is adjusted to very high
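The default priorities and adjustments described in this step can be condensed into a small function. This is a sketch of the rules as summarized in these notes, not Chromium's code: the type names, the request flags (`preload`, `defer`, `late`, `sync`), and the numeric levels are all assumptions.

```javascript
const Priority = { VeryLow: 0, Low: 1, Medium: 2, High: 3, VeryHigh: 4 };

// Default priority by resource type, following the description above.
function defaultPriority(type) {
  switch (type) {
    case "mainResource":
    case "css":
    case "font":
      return Priority.VeryHigh;
    case "script":
    case "raw": // Ajax
      return Priority.High;
    case "image":
    case "media":
      return Priority.Low;
    case "prefetch":
      return Priority.VeryLow;
    default:
      return Priority.Medium;
  }
}

// late: the resource appears after the first non-preload image.
function computePriority({ type, preload = false, defer = false, late = false, sync = false }) {
  let priority = defaultPriority(type);
  if (type === "font" && preload) priority = Priority.High;
  if (type === "script" && defer) priority = Priority.VeryLow;
  else if (type === "script" && preload && late) priority = Priority.Medium;
  // Synchronous requests end up at very high through a final max().
  if (sync) priority = Math.max(priority, Priority.VeryHigh);
  return priority;
}
```

A synchronous Ajax request, for instance, starts at `High` as a `raw` resource and is raised to `VeryHigh` by the final `Math.max`.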

      4. After the priority is determined and before the request is sent, the rendering thread converts it to a net priority as follows:

        | Net priority | Resource type |
        | --- | --- |
        | HIGHEST (very high) | CSS / font / page / synchronous request |
        | MEDIUM (high) | JS / Ajax |
        | LOW (medium) | Manifest / preload script at the bottom of the page (cached and preloaded script resources) |
        | LOWEST (low) | img / video / audio |
        | IDLE (very low) | prefetch / defer script (JS resources that block threads) |

        The net priority is used when requesting resources, which happens on Chrome's IO thread. The benefit is that if two pages request the same resource, the cache avoids a repeated request.

        Resource requests are made on the IO thread. The rendering thread and the IO thread communicate through Chrome's Mojo framework: the rendering thread sends a message to the IO thread telling it to load the resource.
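The conversion in the table is a plain lookup. The sketch below uses hypothetical string keys for the five levels; only the net priority names on the right come from the table.

```javascript
// Rendering-side level -> net priority, as listed in the table above.
const NET_PRIORITY_BY_LEVEL = new Map([
  ["very-high", "HIGHEST"],
  ["high", "MEDIUM"],
  ["medium", "LOW"],
  ["low", "LOWEST"],
  ["very-low", "IDLE"],
]);

function toNetPriority(level) {
  const netPriority = NET_PRIORITY_BY_LEVEL.get(level);
  if (netPriority === undefined) throw new RangeError(`Unknown priority level: ${level}`);
  return netPriority;
}
```

Note the shift by one name: a rendering-side `high` becomes a net `MEDIUM`, and `medium` becomes `LOW`.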

      Resource loading?

      There is a ScheduleRequest function that decides whether the current resource can be loaded now. If so, it is loaded; if not, it stays in the pending request queue.

      It is called in two places:

      1. When an IPC (Mojo) message requesting a resource load arrives from the rendering thread

      2. From LoadAnyStartablePendingRequests, which walks the pending requests: each time it takes the request with the highest priority and calls ScheduleRequest to decide whether it can be loaded; if it can, the request is started.

        LoadAnyStartablePendingRequests is called in three places:

        1. When the body tag is inserted
        2. When a request completes, to start pending requests that have not been loaded yet
        3. When the IO thread's timer loops over unfinished tasks

      Non-delayable: requests whose priority is greater than or equal to Medium.

      Layout-blocking: requests made while the body tag has not yet been rendered and whose priority is higher than Medium (CSS/font/page).

      <!DOCTYPE html>
      <html>
      <head>
          <meta charset="utf-8">
          <link rel="icon" href="4.png">
          <img src="0.png">
          <img src="1.png">
          <link rel="stylesheet" href="1.css">
          <link rel="stylesheet" href="2.css">
          <link rel="stylesheet" href="3.css">
          <link rel="stylesheet" href="4.css">
          <link rel="stylesheet" href="5.css">
          <link rel="stylesheet" href="6.css">
          <link rel="stylesheet" href="7.css">
      </head>
      <body>
          <p>hello</p>
          <img src="2.png">
          <img src="3.png">
          <img src="4.png">
          <img src="5.png">
          <img src="6.png">
          <img src="7.png">
          <img src="8.png">
          <img src="9.png">
          <script src="1.js"></script>
          <script src="2.js"></script>
          <script src="3.js"></script>
          <img src="3.png">
          <script>
          !function () {
              let xhr = new XMLHttpRequest();
              xhr.open("GET", "https://baidu.com");
              xhr.send();
              document.write("hi");
          }();
          </script>
          <link rel="stylesheet" href="9.css">
      </body>
      </html>

      1. High-priority resources (>= Medium), synchronous requests, and non-HTTP(S) requests can be loaded immediately
      2. While any layout-blocking resource is still loading, at most one delayable resource can be loaded
      3. Otherwise, delayable resources can only be loaded once the layout-blocking and high-priority resources have finished loading. This explains why CSS has to finish loading before other JS/images are loaded.
      4. At most six delayable resources per domain, and at most ten per client (page), can be loading at the same time.
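The four rules can be sketched as a decision function plus a pass over the pending queue. This is a simplified model in the spirit of ScheduleRequest and LoadAnyStartablePendingRequests, not the Chromium implementation: `isDelayable`, `shouldStart`, `startPending`, and the shape of `state` are assumptions for the example.

```javascript
const MEDIUM = 2; // numeric priority levels: 0 (very low) .. 4 (very high)
const MAX_DELAYABLE_PER_HOST = 6;
const MAX_DELAYABLE_PER_CLIENT = 10;

// Rule 1: high priority, synchronous, and non-HTTP(S) requests are never delayed.
function isDelayable(request) {
  const isHttp = /^https?:/i.test(request.url);
  return isHttp && !request.sync && request.priority < MEDIUM;
}

function shouldStart(request, state) {
  if (!isDelayable(request)) return true;
  // Rules 2 and 3: while layout-blocking requests are in flight,
  // allow at most one delayable request.
  if (state.layoutBlockingInFlight > 0) return state.delayableInFlight < 1;
  // Rule 4: per-host and per-client limits for delayable requests.
  const onHost = state.delayablePerHost[new URL(request.url).host] || 0;
  return onHost < MAX_DELAYABLE_PER_HOST && state.delayableInFlight < MAX_DELAYABLE_PER_CLIENT;
}

// Walk the pending requests from highest to lowest priority and start what is allowed.
function startPending(pending, state) {
  const started = [];
  const byPriority = [...pending].sort((a, b) => b.priority - a.priority);
  for (const request of byPriority) {
    if (!shouldStart(request, state)) continue;
    started.push(request.url);
    if (isDelayable(request)) {
      const host = new URL(request.url).host;
      state.delayableInFlight += 1;
      state.delayablePerHost[host] = (state.delayablePerHost[host] || 0) + 1;
    }
  }
  return started;
}
```

With one layout-blocking stylesheet in flight, a queue of one stylesheet and two images starts the stylesheet and a single image; the second image waits.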

      It can be concluded that:

      1. Because 1.css through 9.css are high-priority (non-delayable) requests, they are sent right away, but at most six requests can be in flight for the same domain, so 6.css, 7.css, and 9.css have to wait

      2. 1.css to 5.css are layout-blocking, so only one delayable resource, 0.png, can be loaded

      3. Once the high-priority and layout-blocking resources 7.css/9.css/1.js have loaded, the delayable resources start loading, mainly preloaded JS and images

        Why is 1.js high priority while 2.js and 3.js are delayable?

        At first 1.js has a Low priority because it is speculatively preloaded (a resource at the bottom of the page). By the time DOM construction reaches it, however, it is no longer a preload but a normal script load, so its priority becomes Medium; this can be inferred from the has_html_body flag. 2.js, on the other hand, is only treated as a normal load after 1.js has been downloaded and parsed; until then it is still a speculative preload, so its priority is not raised.

  3. JS parser

    • Scanner (lexical analyzer)

      Scan all source code -> lexical analysis -> token stream (lexical units)

      Tokens online: esprima.org/demo/pars.. …
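To make the scanner step concrete, here is a minimal tokenizer sketch. It is not how V8's scanner works internally; the `tokenize` function and its tiny rule set are assumptions for illustration, with token type names borrowed from Esprima's output.

```javascript
// Each rule is tried in order at the start of the remaining input.
const TOKEN_RULES = [
  ["Keyword", /^(?:var|let|const|function|return)\b/],
  ["Identifier", /^[A-Za-z_$][\w$]*/],
  ["Numeric", /^\d+(?:\.\d+)?/],
  ["Punctuator", /^(?:=>|[=;+\-*\/(){},])/],
];

function tokenize(source) {
  const tokens = [];
  let rest = source.trimStart();
  while (rest.length > 0) {
    const rule = TOKEN_RULES.find(([, pattern]) => pattern.test(rest));
    if (!rule) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    const value = rest.match(rule[1])[0];
    tokens.push({ type: rule[0], value });
    rest = rest.slice(value.length).trimStart();
  }
  return tokens;
}
```

`tokenize("var a = 2")` yields four tokens: a keyword, an identifier, a punctuator, and a numeric literal.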

    • Parser

      Token stream -> syntax tree; the parser reports syntax errors and determines lexical scope (where each scope is declared)

      var a = 2=>

      As you can see, the type is a variable declaration, the id is a, and the initial value is the constant 2.

      AST online view website: astexplorer.net/
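For reference, the tree for `var a = 2` can be produced by a toy parser. The sketch below handles only this one statement form and returns a node in the ESTree shape that tools such as astexplorer display (without position information); `parseVarDeclaration` is a hypothetical helper, not a real parser API.

```javascript
// Parses statements of the form: var <identifier> = <integer> [;]
function parseVarDeclaration(source) {
  const tokens = source.match(/[A-Za-z_$][\w$]*|\d+|\S/g) || [];
  let position = 0;
  const expect = (matches, description) => {
    const token = tokens[position++];
    if (token === undefined || !matches(token)) {
      throw new SyntaxError(`Expected ${description}, got ${token}`);
    }
    return token;
  };

  const kind = expect((t) => t === "var", "'var'");
  const name = expect((t) => /^[A-Za-z_$][\w$]*$/.test(t), "an identifier");
  expect((t) => t === "=", "'='");
  const raw = expect((t) => /^\d+$/.test(t), "a number");
  if (tokens[position] === ";") position++;
  if (position !== tokens.length) throw new SyntaxError(`Unexpected token ${tokens[position]}`);

  return {
    type: "VariableDeclaration",
    kind,
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name },
        init: { type: "Literal", value: Number(raw), raw },
      },
    ],
  };
}
```

Printing `JSON.stringify(parseVarDeclaration("var a = 2"), null, 2)` shows the declaration type, the identifier `a`, and the literal `2` discussed above.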

      • The pre-parser handles only code that is not executed immediately, and only far enough to determine scope. When such code is actually executed later, the full parser parses it.

        function foo() {
            console.log('a');
            function inline() {
                console.log('b')
            }
        }
        
        (function bar() {
            console.log('c')
        })();
        
        foo();
        1. foo: not executed immediately, so it is pre-parsed, and the nested inline is pre-parsed along with it
        2. bar: executed immediately, so it is parsed directly by the full parser
        3. foo(): when foo is called, it is parsed by the full parser, and inline is parsed once more.
    • Ignition interpreter

      Syntax tree -> bytecode (not yet machine code) -> interpreted execution

      Ignition converts the AST to bytecode and then interprets it line by line.

      Earlier versions of V8 had no intermediate bytecode step; all source code was compiled straight to machine code. Machine code executes faster but takes up a lot of memory.
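The AST -> bytecode -> interpretation pipeline can be illustrated with a toy compiler and interpreter for arithmetic expressions. This is a deliberately simplified stack machine with made-up instruction names (`PUSH`, `ADD`, `SUB`, `MUL`); Ignition's real bytecode is register-based and far richer.

```javascript
// Compile an expression AST (ESTree-like Literal / BinaryExpression nodes) to bytecode.
function compile(node, code = []) {
  if (node.type === "Literal") {
    code.push({ op: "PUSH", value: node.value });
  } else if (node.type === "BinaryExpression") {
    compile(node.left, code);
    compile(node.right, code);
    const op = { "+": "ADD", "-": "SUB", "*": "MUL" }[node.operator];
    if (!op) throw new Error(`Unsupported operator: ${node.operator}`);
    code.push({ op });
  } else {
    throw new Error(`Unsupported node type: ${node.type}`);
  }
  return code;
}

// Interpret the bytecode one instruction at a time using an operand stack.
function run(code) {
  const stack = [];
  for (const instruction of code) {
    if (instruction.op === "PUSH") {
      stack.push(instruction.value);
      continue;
    }
    const right = stack.pop();
    const left = stack.pop();
    if (instruction.op === "ADD") stack.push(left + right);
    else if (instruction.op === "SUB") stack.push(left - right);
    else if (instruction.op === "MUL") stack.push(left * right);
    else throw new Error(`Unknown instruction: ${instruction.op}`);
  }
  return stack.pop();
}
```

The bytecode is a flat list that is much smaller than machine code would be, which is the memory trade-off mentioned above; the price is that every instruction is dispatched by the interpreter loop.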

    • TurboFan (compiler)

      Takes bytecode and profiling data as input and generates optimized machine code

      While Ignition executes the code, V8 monitors it and records execution information such as call counts and parameter types. If a function is called more often than a set threshold, it is marked as a hot function, and its bytecode and execution information are sent to TurboFan. TurboFan makes assumptions to optimize the code further (for example, assuming a parameter is always a number so that the type no longer needs to be checked) and compiles it directly into machine instructions. If a later call passes something that is not a number, the assumption was wrong, and V8 deoptimizes, falling back to the bytecode.
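The hot-function and deoptimization cycle can be mimicked in plain JavaScript. This is only an analogy: the threshold, the `createAdaptiveAdd` helper, and the two code paths are invented for the example, and both paths are ordinary JavaScript here rather than bytecode and machine code.

```javascript
const HOT_THRESHOLD = 3; // invented value; V8's real heuristics differ

function createAdaptiveAdd() {
  const stats = { calls: 0, optimized: false, deopts: 0 };

  // "Interpreter" path: handles every input type.
  const generic = (a, b) =>
    typeof a === "number" && typeof b === "number" ? a + b : String(a) + String(b);
  // "Optimized" path: valid only under the assumption that both inputs are numbers.
  const optimizedForNumbers = (a, b) => a + b;

  function add(a, b) {
    const bothNumbers = typeof a === "number" && typeof b === "number";
    if (stats.optimized) {
      if (bothNumbers) return optimizedForNumbers(a, b);
      // Assumption violated: deoptimize and start collecting feedback again.
      stats.optimized = false;
      stats.deopts += 1;
      stats.calls = 0;
    }
    stats.calls += 1;
    if (stats.calls >= HOT_THRESHOLD && bothNumbers) stats.optimized = true;
    return generic(a, b);
  }

  return { add, stats };
}
```

After three numeric calls the function is marked optimized; the first call with a string argument triggers the fallback and increments `deopts`.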

    • Executing JS

      Syntax analysis phase: the loaded code is checked for syntax errors; once the check passes, the precompilation phase begins.

      Precompilation phase: global and function precompilation. This phase creates the execution context, which includes hoisting function and variable declarations, building the scope chain, determining what this points to, and so on.

      Execution phase: the execution context created during the precompilation phase is pushed onto the call stack and becomes the running execution context. When the code finishes executing, it is popped off the call stack.

      (These are mostly basics; when something is unclear, look it up in a JS reference or the Little Red Book.)
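A short example of the hoisting done in the precompilation phase; the function and variable names are arbitrary.

```javascript
function hoistingDemo() {
  const log = [];
  // The var declaration is hoisted, but its assignment is not.
  log.push(typeof declaredLater);
  // Function declarations are hoisted together with their body.
  log.push(typeof helper);
  var declaredLater = 1;
  function helper() {}
  log.push(declaredLater);
  return log;
}

console.log(hoistingDemo()); // [ 'undefined', 'function', 1 ]
```

Both names already exist when the function body starts running, because the execution context was set up before the first statement executed.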

A ramble on modern browser architectures.md

Chrome is fast for a reason, popularizing the browser architecture

Inside look at modern web browser (part 1)

Explain Spectre and Meltdown vulnerabilities to programmers

[Computers] Understand Intel's Meltdown and Spectre vulnerabilities in 15 minutes - Emory

What does a complete HTTP transaction look like

How to load resources from Chrome source

Browser page resource loading process and optimization

Interpreters, tree-walking interpreters, stack-based vs. register-based: a hodgepodge

An introduction to the various JavaScript engines, with a collection of related materials and blog posts

Segmentfault.com/a/119000002…