Browser processes

First, browser processes

A browser mainly includes the following processes:

  • Browser process: responsible for overall control and coordination — creating and destroying the other processes, handling downloads, drawing the final bitmap, and so on.
  • GPU process: used for 3D drawing; at most one GPU process exists at a time
  • Render process: handles page rendering, script execution, event handling, etc. Generally one render process per tab, depending on the browser implementation
  • Plug-in process: Handles plug-in related work.
  • Network process: Handles network request related work.

Threads of the Render process

The Render process is the one most closely related to front-end work, and it contains multiple threads:

  • GUI rendering thread (main thread): responsible for page construction — parsing HTML and CSS, building the layout tree, layer tree, draw lists, etc.
  • Compositing thread: divides each layer into tiles and submits the tiles near the viewport to the raster threads first
  • Raster threads (a raster thread pool): generate bitmaps — the color value of each pixel the page needs
  • JS engine thread: responsible for parsing and executing JavaScript. A render process has only one JS engine thread, which is why JS is called single-threaded. This thread and the GUI rendering thread are mutually exclusive, which is why heavy, complex computation in JS can make page rendering choppy
  • EventLoop polling thread: controls the event loop. When the JS engine thread encounters an asynchronous task such as setTimeout, it notifies the polling thread, which hands the task to the corresponding asynchronous thread; once the trigger condition is met, the callback function is added to the appropriate task queue for processing.
  • Timer trigger thread: times timer events and notifies the polling thread to queue the callback when the time is up
  • Asynchronous HTTP request thread: a new thread started for an XMLHttpRequest connection; when it detects a change in the request’s status, it queues the callback function, if one is set.
  • Browser event thread: listens to DOM events and notifies the polling thread to queue the callback after an event occurs
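The cooperation between the timer thread, the polling thread, and the JS engine can be sketched as a toy model. This is only an illustration of the queueing semantics (one macrotask at a time, microtasks drained fully in between) — not the real browser implementation:

```javascript
// Toy model of the event loop: the polling thread drains one macrotask
// (e.g. a timer callback) at a time, then empties the microtask queue
// (e.g. promise callbacks) before moving on to the next macrotask.
function runEventLoop(macrotasks) {
  const log = [];
  const microtasks = [];
  const enqueueMicro = (fn) => microtasks.push(fn);

  for (const task of macrotasks) {
    task(log, enqueueMicro);              // run one macrotask
    while (microtasks.length) {           // drain all pending microtasks
      microtasks.shift()(log, enqueueMicro);
    }
  }
  return log;
}

// Two "timer callbacks"; the first schedules a "promise callback".
const order = runEventLoop([
  (log, micro) => { log.push("timer 1"); micro(l => l.push("promise after timer 1")); },
  (log)        => { log.push("timer 2"); },
]);
// The microtask runs before the next macrotask:
// ["timer 1", "promise after timer 1", "timer 2"]
```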

Looking at these processes through the page rendering flow

Prelude:

  1. The network process receives the response data after sending the network request; it parses the response header and, on success, continues processing
  2. Based on the Content-Type field in the response header: if it is application/octet-stream, the browser starts the download process; if it is text/html, a render process is started and the browser process sends a “Receive document” message to it.
  3. When the render process receives the “Receive document” message, it establishes a channel with the network process to receive the data.
  4. As the render process receives the data, it sends a “Receive confirmation” message to the browser process
  5. After receiving the “Receive confirmation” message, the browser main process updates the browser UI, including the URL in the address bar and the forward and back buttons. The render process then carries on

Main act:

The Render process does its job

  1. As the render process starts to receive data, it pre-scans the received bytes. If it finds tags that need to load resources (img, link, external scripts, etc.), it tells the browser main process to start downloading them first. This is called pre-parsing (the preload scanner)
  2. Then start parsing the HTML
    • When an HTML tag is encountered, a node is generated and added to the DOM tree
    • When CSS is encountered, the CSSOM is built from the CSS rules
    • A CSS link continues downloading without blocking DOM construction
    • When JS code is encountered, DOM construction is blocked; the browser waits for earlier CSS to finish downloading and parsing before executing the JS
    • An external JS link without attributes behaves the same as above. With async, the script downloads in parallel and executes as soon as the download completes; with defer, the script executes only after both the download and HTML parsing are complete
  3. The layout tree is then built based on the DOM tree and CSSOM
  4. After building the layout tree, the GUI thread also layers the nodes to build the layer tree
  5. Once the layer tree is built, the GUI thread creates a draw-instruction list for each layer and hands it over to the compositing thread
  6. The compositing thread divides each layer into tiles; tiles near the user’s viewport are given priority and submitted to the raster threads to generate bitmaps. Rasterization is usually carried out on the GPU: the raster thread sends tile-drawing instructions to the GPU, which generates the tile bitmaps (the color values of the pixels) and stores them in GPU memory.
  7. When rasterization is complete, the compositing thread emits a DrawQuad command to the Viz component in the browser process
  8. The Viz component draws the page into memory; at this point the page data has been drawn
  9. The result is then sent to the graphics card for processing. The graphics card has a front buffer and a back buffer: data is first written to the back buffer and composed into a frame, and after the front and back buffers swap, the frame is displayed on the screen.

Takeaways

  • Put CSS first and JS files after, and use async or defer where possible

  • A repaint re-calculates styles but skips rebuilding the layout tree and layer tree, going straight to regenerating the draw list and the subsequent steps; a reflow rebuilds the layout tree, which has a significant impact on performance. And since the GUI thread and the JS engine thread are mutually exclusive, heavy rendering work also blocks script execution.

  • If elements on a compositing layer change, they are handled directly by the compositing thread without occupying GUI thread resources; even if the GUI thread is stuck, such animations still appear smooth

    Therefore, consider the above:

    • Use createDocumentFragment for bulk DOM operations
    • For resize and scroll listeners, apply debounce/throttle
    • Avoid frequent style changes; batch them by switching a class in one go
    • Use compositing — the best way to animate is with CSS, using the will-change property
    • Avoid overly complex property selectors
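As a sketch of the debounce/throttle point above — a minimal leading-edge throttle (illustrative only; production code usually adds a trailing call or uses a library):

```javascript
// Minimal leading-edge throttle: fn runs at most once per `wait` ms.
// (No trailing invocation — a fuller version would schedule one with setTimeout.)
function throttle(fn, wait) {
  let last = 0;
  return function (...args) {
    const now = Date.now();
    if (now - last >= wait) {
      last = now;
      fn.apply(this, args);
    }
  };
}

// e.g. window.addEventListener("scroll", throttle(onScroll, 100));
```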

WebWorker

  • When a Web Worker is created, the JS engine asks the browser for a child thread to run the JS code in the given file. It cannot manipulate the DOM because it runs in a separate global context; postMessage lets it exchange data with the JS engine thread. Time-consuming calculations can therefore be done in the worker thread and the result posted back to the JS engine thread;

  • A SharedWorker is shared by all the browser’s pages, like a process; a Web Worker is a thread belonging to a single page

Second, browser cache

1. HTTP caching

Strong cache

Before sending an HTTP request, the browser checks the strong cache first and returns the resource directly on a hit.

How is the strong cache checked?
  • HTTP/1.0: the Expires field, found in the response header returned by the server
Expires: Wed, 22 Jun 2021 08:41:00 GMT

It indicates that the resource expires at 08:41 GMT on June 22, 2021

Problem: the server’s clock may differ from the client’s, making the expiration time inaccurate

  • HTTP/1.1: the Cache-Control field, which controls expiration via a relative lifetime
Cache-Control: max-age=3600

It indicates that for 3600s after receiving the response, the resource can be served directly from the cache

Cache-Control also has directives such as private (proxies must not cache), s-maxage (expiration time for proxy caches), no-cache (skip the strong cache and use the negotiated cache), and no-store (no caching at all)

When to use the strong cache
  1. For resources that change frequently, the strong cache cannot guarantee the latest data on every request, so it is advisable to set Cache-Control to no-cache: freshness is then determined on use (that is, via the negotiated cache)
  2. For infrequently changing resources, Cache-Control can be set to a large max-age. If the resource needs updating later, a dynamic component such as a hash, version number, or random number can be appended to the URL; changing the URL invalidates the previous strong cache
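The strong-cache decision described above can be sketched roughly as follows (the header names follow the HTTP spec; the surrounding cache plumbing is simplified and assumed):

```javascript
// Sketch: decide whether a cached response is still fresh.
// responseHeaders: lowercase header-name → value map (an assumption of this sketch).
function isFresh(responseHeaders, receivedAt, now) {
  const cc = responseHeaders["cache-control"] || "";
  if (/no-store|no-cache/.test(cc)) return false;          // never use the strong cache
  const m = /max-age=(\d+)/.exec(cc);
  if (m) return (now - receivedAt) / 1000 < Number(m[1]);  // HTTP/1.1: Cache-Control wins
  const expires = responseHeaders["expires"];               // HTTP/1.0 fallback
  if (expires) return now < Date.parse(expires);
  return false;
}
```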
Negotiated cache

If the strong cache misses, the client sends the HTTP request and carries the corresponding cache tags in the request header; the server then decides whether the negotiated cache can be used

Cache tags
  • If-Modified-Since and Last-Modified

    On the first request, the server adds a Last-Modified field in the response header to inform the client of the last modification time of the resource. When the client requests again, it sets If-Modified-Since to the Last-Modified value the server sent previously. On receipt, the server compares If-Modified-Since with the resource’s current Last-Modified value: if they differ, the resource has expired and the cache misses; otherwise the resource has not changed in the meantime and the server returns 304.

  • ETag and If-None-Match

    An ETag is an identifier the server generates from the contents of the file; it changes whenever the contents change. The browser stores the value and sets it as the If-None-Match field on the next request. On receiving If-None-Match, the server compares it with the resource’s current ETag: if they differ, the resource has changed; otherwise it returns 304.

  • The difference between them

    ETag is content-based, while Last-Modified only records whether a modification occurred. Anyone who has edited a Word document knows that adding a character and then deleting it counts as a change even though the content hasn’t substantially changed — that is how Last-Modified behaves, since it only records the time.

    So ETag is more accurate. In addition, Last-Modified has a minimum granularity of one second: if the resource changes again within the same second, the Last-Modified value does not change.

    In terms of performance, however, Last-Modified beats ETag, since generating an ETag requires reading and hashing the content.

    If both are supported, the server gives priority to ETag.

2. Local cache

The local cache is storage built into the browser. Its advantage is that it can be controlled from JS, making it more flexible than the HTTP cache

Cookie

Cookies are typically generated by the server and were designed to compensate for HTTP’s statelessness. A cookie is a small text record of up to 4KB, stored internally as key-value pairs. Cookies are bound to a domain name (they can be shared between the first- and second-level domains of the same site) and cannot cross domains. By default they live in memory and disappear when the browser closes; if an expiration time is set, they are saved to disk and removed when the time is up

Application Scenarios:

  • Save the account password;
  • Save the last login time information;
  • Save the page you viewed last time;
  • Tracking visit counts

Defects: 1. cookies are sent with every request to any address under the domain, whether needed or not, wasting performance; 2. they travel between client and server in plain text, so they are easy to intercept and tamper with; and if HttpOnly is not set, cookies can also be read directly by JS
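For illustration, a minimal sketch of building a Set-Cookie header with the defensive attributes mentioned above (the attribute names are standard; the helper itself is hypothetical):

```javascript
// Sketch: serialize a cookie with optional defensive attributes.
function serializeCookie(name, value, opts = {}) {
  let cookie = `${encodeURIComponent(name)}=${encodeURIComponent(value)}`;
  if (opts.maxAge != null) cookie += `; Max-Age=${opts.maxAge}`;
  if (opts.httpOnly) cookie += "; HttpOnly";       // not readable from JS
  if (opts.secure) cookie += "; Secure";           // sent over HTTPS only
  if (opts.sameSite) cookie += `; SameSite=${opts.sameSite}`;
  return cookie;
}

// serializeCookie("sid", "abc123", { httpOnly: true, secure: true, sameSite: "Lax" })
// → "sid=abc123; HttpOnly; Secure; SameSite=Lax"
```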

WebStorage

HTML5 introduced WebStorage, which comes in two kinds. Both store key-value pairs where the value is a string, so objects are passed through JSON.stringify first

LocalStorage

Also scoped per domain, with a maximum capacity of 5MB and permanent storage. It exists only on the client, avoiding the transmission cost and interception risk of cookies. It is well encapsulated: operations go through the methods exposed on the localStorage object, which is convenient.

Application scenarios
  • Large capacity and durable, so suitable for storing stable, sizable resources such as the site’s logo;

  • localStorage can also pass values between pages;

  • Page counting;

  • Vuex state resets to its initial value after a refresh, so when Vuex data changes it can be saved to localStorage and restored after the page reloads.

How to use it:

let obj = { height: 180, age: 18 };
localStorage.setItem("name", "orange");
localStorage.setItem("info", JSON.stringify(obj)); // stored as a string

// The values can be retrieved under the same domain name
let name = localStorage.getItem("name");
let info = JSON.parse(localStorage.getItem("info"));


sessionStorage

The maximum capacity is 5MB; data is stored only on the client and accessed through the methods exposed on the sessionStorage object. Its properties are almost the same as localStorage’s, but the essential difference is that sessionStorage disappears when the session ends — it exists only for the duration of the session. It introduces a “browser window” concept: refreshing or re-entering the same page in the same window keeps sessionStorage alive, but once the window is closed it is deleted, and two windows opening the same page get separate sessionStorage.

Application scenarios
  • Storing form data, so that a page refresh does not lose what was already entered; storing the current browsing history, if it is no longer needed once the page closes.
  • Passing values between same-origin pages in a single tab
IndexedDB

A non-relational database that runs in the browser, with the characteristics of a database proper. It also has some distinctive traits: 1. key-value storage; 2. support for asynchronous I/O operations; 3. no cross-domain access to the database

XSS attack

Cross-Site Scripting (XSS) is a security vulnerability in web applications that allows malicious users to inject code into pages served to other users. In essence, malicious code goes unfiltered and mixes with the site’s normal code; the browser cannot distinguish it and ends up executing the malicious script.

With it, an attacker can generally:

  • Stealing cookies
  • Monitor user behavior, such as stealing after inputting account password
  • Modify DOM forgery login form
  • Generate floating window ads in the page

XSS attacks come in three varieties:

1. Stored XSS

Attack steps:

  1. The attacker submits malicious code to the target website database
  2. The user opens the target site, and the site server takes the malicious code from the database, splices it into HTML, and returns it to the browser.
  3. The browser parses the HTML and executes malicious scripts
  4. Malicious scripts steal user data or impersonate user behavior

A common example is submitting a script in a comments section: if the front and back ends do not escape it properly, it is stored in the server’s database and executed directly when the page renders.
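A minimal escaping helper of the kind such filtering relies on (a sketch — real projects usually use a vetted library instead):

```javascript
// Sketch: escape the HTML-significant characters in user input so the
// browser renders it as text instead of markup. `&` must be replaced first
// to avoid double-escaping the entities produced by the later replacements.
function escapeHTML(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// escapeHTML('<script>alert("hi")</script>')
// → "&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;"
```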

2. Reflected XSS

Attack steps:

  1. The attacker constructs a special URL containing malicious code and induces the user to click it
  2. The user clicks on the malicious URL, and the server takes the malicious code out of the URL, splices it into HTML and returns it to the client
  3. The browser parses the HTML and executes malicious scripts
  4. Malicious scripts steal user data or impersonate user behavior

Reflected attacks are common in site search, redirects, and so on: the attacker induces the user to click a malicious link that embeds the malicious script as part of the request, for example:

http://xxxx.com?s=<script>alert("hello")</script>

The server takes the parameter s, splices it into the HTML, and returns it to the browser, which parses it, finds the script, and executes it.

The difference from stored XSS:

Stored malicious code lives in the server and is persistent; reflected malicious code lives in the URL and is not.

Of course, you might wonder why users click these malicious URLs, or how they end up on them.

There are many ways — for example, combining it with clickjacking: embedding a transparent iframe over the page so the user clicks it unknowingly.

3. DOM-based XSS

The cause is front-end JavaScript that is not rigorous enough and executes untrusted data as code — for example, inserting untrusted data with innerHTML, outerHTML, or document.write().

Preventive measures

An XSS attack has two essential elements: the attacker submits malicious code, and the browser executes it.

  1. Input filtering. Front-end filtering: input is filtered on the front end before being submitted to the back end; the drawback is that an attacker who bypasses the front end and constructs the request directly can still submit malicious code. Back-end filtering: the back end filters input before writing it to the database; the drawback is that back-end data may be served both to the web front end and to an app client — if the back end encodes input with escapeHTML(), the content will display garbled in the app. In other words, the back end cannot know where the content will end up, and different consumers call for different filtering.

  2. Pure front-end rendering, to prevent HTML injection. The flow: the browser loads static HTML containing no business data, then executes the JavaScript in it, loads the business data via Ajax, and calls DOM APIs to update the page. Explicitly telling the browser whether what comes next is text or a style makes it much harder to trick the browser into executing unexpected code.

  3. Avoid DOM-based XSS: use methods like innerHTML with caution and prefer textContent and setAttribute. Inline DOM event listeners, APIs such as eval, onclick, location, and setTimeout, and the href attribute of an a tag can all execute strings directly; avoid passing untrusted data to them.

  4. Other measures: use CSP, the browser’s Content Security Policy, which can (1) restrict loading resources from other domains, (2) forbid submitting data to other domains, and (3) provide a reporting mechanism to detect XSS attacks promptly. Also leverage the HttpOnly attribute of cookies.

CSRF attack

The attack steps are as follows:

  • Users visit the target website (a.com) and retain their login credentials
  • Attackers lure users to click on links and enter malicious websites (b.com)
  • After the click, b.com sends a request such as a.com/act=xx (this can be done via an image or script tag, or by auto-submitting a POST form); by default, the browser attaches the cookies for a.com
  • a.com executes act=xx, and the attack is complete

In cross-site request forgery, the attacker induces the target to visit the attacker’s own site and then issues cross-site requests using the user’s current login state. It can take one of the following forms:

1. Automatically send GET requests

When you click through to an attacker’s website, the page may contain code like:

<img src="https://xxx.com/info?user=xxx%count=100">

On entering the page, the image automatically sends a GET request to xxx.com, carrying the user’s cookies for xxx.com (assuming the user is logged in there). If the server has no verification mechanism, it may assume the request comes from a normal user, and the attacker can then do all sorts of things

2. Automatically send POST requests

The attacker writes a form and auto-submits it, again carrying the user’s cookies; the server mistakes it for the user’s own action

3. Inducing clicks

An attacker’s site may contain links that induce users to click, with effects similar to the images that automatically send GET requests.

Preventive measures

  1. CSRF works because requests carry the user’s cookies and masquerade as user actions. This can be controlled with the cookie’s SameSite attribute: Strict forbids third parties from carrying the cookie at all, while Lax allows it only for top-level GET navigations, such as following a link or submitting a GET form.

  2. Verify the source site by checking the Origin and Referer fields in the request header: Origin contains only the domain, while Referer contains the specific path. However, both can be forged (for example by requests not issued from a browser), so this check alone is not fully reliable.

  3. Use a CSRF token: the server generates a random string and embeds it in the returned page; the browser must carry this string with subsequent requests so the server can verify them.

  4. Use a verification code, so the user must enter it to confirm the operation.

Mark-and-sweep algorithm and reference counting algorithm

Mark-and-sweep: traversal starts from the roots, marking every reachable object; the collector then scans and reclaims all unmarked objects.

Downside: mark-and-sweep pauses the program to collect, and collects when memory runs low, which can obviously leave the page unresponsive for a stretch

Benefits: no problem with circular references; ordinary operations on references incur no extra overhead.

Reference counting: tracks how many times each value is referenced, reclaiming a value when its count drops to zero

Disadvantage: circular references are never reclaimed
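The difference can be illustrated with a toy mark-and-sweep over a fake heap (an illustration of the idea only, nothing like a real engine): a cycle of unreachable objects is still reclaimed, which pure reference counting cannot do.

```javascript
// Toy heap: each "object" lists the heap indices it references.
function markAndSweep(heap, roots) {
  const marked = new Set();
  const stack = [...roots];
  while (stack.length) {                  // mark phase: walk from the roots
    const i = stack.pop();
    if (marked.has(i)) continue;
    marked.add(i);
    stack.push(...heap[i].refs);
  }
  // sweep phase: reclaim everything unmarked
  return heap.map((obj, i) => (marked.has(i) ? obj : null));
}

// Objects 1 and 2 reference each other, but nothing reachable points at them.
const heap = [
  { name: "root-owned", refs: [] },   // 0: reachable from the root
  { name: "cycle-a", refs: [2] },     // 1: garbage, part of a cycle
  { name: "cycle-b", refs: [1] },     // 2: garbage, part of a cycle
];
const swept = markAndSweep(heap, [0]);
// swept[0] survives; swept[1] and swept[2] are reclaimed despite the cycle
```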

What is the function of the same-origin policy?

The same-origin policy restricts:

  • Reading cross-origin cookies, localStorage, and IndexedDB
  • Accessing the DOM of a cross-origin page
  • Making cross-origin Ajax requests
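The same-origin check itself is simple to sketch — protocol, host, and port must all match:

```javascript
// Sketch: two URLs are same-origin iff protocol + host + port all match.
// URL.origin already combines exactly those three parts.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// isSameOrigin("https://a.com/x", "https://a.com/y?q=1") → true (path/query ignored)
// isSameOrigin("http://a.com", "https://a.com")          → false (protocol differs)
```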

Function:

Each restriction corresponds one-to-one to a protection:

  • Prevents a malicious site from using JS to read the user’s cookies for other sites
  • Prevents a malicious site from loading another site’s page in an iframe to read its DOM and perform malicious operations
  • Prevents a malicious site from stealing the user’s information on other sites

Cross-domain implementation

I won’t go over this here — many blog posts have summarized it clearly. Here is a recommended one:

Scenario cross-domain solution

References

How browsers work

Browser process

Browser cache