This is the 14th day of my participation in the August Text Challenge.

Preface

Browser questions are a staple of front-end fundamentals interviews. Most of us, myself included, can only recite a few scattered points and cannot connect them into one coherent story, so it is very easy to be rejected on the basics.

Today, let's walk through the details together so that you can ace these interviews sooner.

(This article will continue to be updated 💕. If there are any omissions, please leave a comment.)

Threads and processes

Since 2007, browsers have moved away from the single-process architecture and adopted a multi-process architecture, which has greatly improved stability, smoothness, and security. What is a process? A process is the smallest unit of resource allocation; it is used to start and manage threads. And what is a thread? A thread is the smallest unit of CPU scheduling; it executes tasks and always belongs to a process. The analogy by Zhihu user biaodianfu is a neat one: process = train, thread = carriage.

In summary, threads and processes have the following characteristics:

  1. Processes and threads have a one-to-many relationship: a thread must run inside a process.
  2. If any thread fails, it affects the process it belongs to and causes the entire process to crash.
  3. Threads in the same process can share resources. Processes are isolated from one another and do not affect each other; they communicate through an IPC mechanism.
  4. When a process is closed, the operating system reclaims the memory it occupied.
  5. A process can lock a memory address so that only one thread uses it at a time, and it can also limit how many threads may use that memory at once.
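Points 3 and 5 can be illustrated with a small sketch. It is written in Python purely for illustration, and the names `counter`, `worker`, and `run_workers` are made up for this example. Several threads in one process update the same variable, and a lock makes sure only one of them touches it at a time.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # guards access to `counter`


def worker(increments: int) -> None:
    """Add to the shared counter, one locked step at a time."""
    global counter
    for _ in range(increments):
        with counter_lock:       # only one thread may hold the lock
            counter += 1


def run_workers(thread_count: int = 4, increments: int = 10_000) -> int:
    """Start the threads, wait for them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_workers())
```

Because the threads live in the same process, they all see the same `counter`; separate processes would each get their own copy and would need IPC to exchange values.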

Multi-process architecture

The latest Chrome multi-process architecture consists of one browser main process, one GPU process, one network process, multiple renderer processes, and multiple plug-in processes. (Figure from “Browser Working Principles and Practice” by Li Bing.)

Browser main process: responsible for interface display, user interaction, sub-process management, storage and other functions.

Renderer process: turns HTML, CSS, and JavaScript into a page the user can interact with. The rendering engine (Blink) and the JavaScript engine (V8) both run in this process. Each tab (with a different root domain) gets its own renderer process.

GPU process: early versions of Chrome had no GPU process. The GPU was originally used to render 3D CSS effects; later, web pages and Chrome's own UI also came to be drawn with the GPU, which led to a dedicated GPU process.

Network process: responsible for loading a page's network resources. It was originally a module inside the browser main process and is now a separate process.

Plug-in process: Responsible for running plug-ins. Because plug-ins are prone to crash, they are isolated through the plug-in process to ensure that the main process is not affected.

Note the sandbox in the bottom right corner of the figure. It restricts access to the operating system so that plug-ins or scripts cannot read or tamper with data on disk. This is one reason the multi-process architecture makes the browser more secure.

From entering URL to page display

That wraps up the background on browser processes. Now on to the main topic.

User input

scheme:[//authority]path[?query][#fragment]

When a user enters a URL in the address bar and presses Enter, the main process of the browser determines whether the entered content complies with the URL rules.

If it does, the browser adds the protocol (HTTP or HTTPS) to the input to form a complete URL and requests it. Otherwise, it treats the input as a search term and builds a search URL with the default search engine.
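The decision described above can be sketched roughly as follows. This is a simplified illustration, not how any real browser decides: the heuristics (a dot in the host part, no spaces), the choice of `https://` as the added protocol, and the `search.example` search engine are all assumptions made for the example.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical search engine URL, used only for this example.
SEARCH_TEMPLATE = "https://search.example/?q={}"


def resolve_address_bar_input(text: str) -> str:
    """Return the URL to request for what the user typed."""
    text = text.strip()
    parts = urlsplit(text)
    # Already a full URL with a scheme this sketch understands.
    if parts.scheme in ("http", "https") and parts.netloc:
        return text
    # Looks like a bare host name ("example.com/path"): add a scheme.
    host = text.split("/", 1)[0]
    if " " not in text and "." in host:
        return "https://" + text
    # Anything else is treated as a search query.
    return SEARCH_TEMPLATE.format(quote_plus(text))
```

For example, `resolve_address_bar_input("example.com/path")` gives `https://example.com/path`, while `resolve_address_bar_input("how browsers work")` gives a search URL with the text encoded in the `q` parameter.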

URL request

The browser main process sends the URL request to the network process via IPC. The network process then goes through DNS resolution -> TCP connection -> build request -> parse response data.

DNS resolution

The lookup goes through the following steps in order. As soon as any step yields the IP address for the domain name, the query ends.

  1. Check whether the browser cache holds a resolved IP address for the domain name.
  2. Check whether the local hosts file contains a mapping from the domain name to an IP address.
  3. Check whether the local DNS resolver cache contains a mapping from the domain name to an IP address.
  4. Check whether the local DNS server's cache contains a mapping from the domain name to an IP address.
  5. The local DNS server continues the query according to its forwarder configuration (forwarding mode disabled or forwarding mode enabled).
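Steps 1 to 4 amount to trying a series of caches in a fixed order and stopping at the first hit. A minimal sketch of that idea is shown below; the cache contents, the domain names, and the function name `resolve_with_caches` are invented for illustration and do not correspond to any real browser or operating system API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def resolve_with_caches(domain: str,
                        layers: List[Tuple[str, Lookup]]) -> Optional[Tuple[str, str]]:
    """Try each layer in order and stop at the first one that knows the IP."""
    for name, lookup in layers:
        ip = lookup(domain)
        if ip is not None:
            return name, ip      # first hit ends the query
    return None                  # nothing cached: step 5 would take over


# Made-up cache contents, for illustration only.
browser_cache: Dict[str, str] = {}
hosts_file: Dict[str, str] = {"dev.example": "127.0.0.1"}
resolver_cache: Dict[str, str] = {"www.example.com": "203.0.113.10"}
dns_server_cache: Dict[str, str] = {"www.example.com": "203.0.113.99"}

LAYERS: List[Tuple[str, Lookup]] = [
    ("browser cache", browser_cache.get),
    ("hosts file", hosts_file.get),
    ("local DNS resolver cache", resolver_cache.get),
    ("local DNS server cache", dns_server_cache.get),
]
```

With this data, `www.example.com` is answered by the resolver cache, so the DNS server cache (which holds a different address here) is never consulted.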

Without forwarding mode, the local DNS server sends the request to a root DNS server. The root DNS server works out which top-level domain server is responsible for the domain name and returns that server's IP address to the local DNS server. The local DNS server then contacts the top-level domain server. If the top-level domain server cannot resolve the name itself, it returns the IP address of the next-level DNS server to the local DNS server. The local DNS server repeats this step iteratively until the domain name is resolved.
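The iterative query can be pictured as following a chain of referrals. The sketch below simulates it with in-memory tables instead of real network queries; the server names (`root`, `tld-com`, `ns-example`), the records, and the addresses are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each hypothetical server maps a domain suffix to ("ip", address) when it
# knows the answer, or ("referral", server_name) when it only knows who to ask.
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"example.com": ("referral", "ns-example")},
    "ns-example": {"www.example.com": ("ip", "203.0.113.10")},
}


def query(server: str, domain: str) -> Optional[Tuple[str, str]]:
    """Return the server's record for the longest suffix of `domain` it knows."""
    labels = domain.split(".")
    for i in range(len(labels)):
        record = SERVERS[server].get(".".join(labels[i:]))
        if record is not None:
            return record
    return None


def resolve_iteratively(domain: str) -> Tuple[Optional[str], List[str]]:
    """Follow referrals from the root until some server returns an IP."""
    server = "root"
    asked: List[str] = []
    while True:
        asked.append(server)
        record = query(server, domain)
        if record is None:
            return None, asked           # nobody knows this name
        kind, value = record
        if kind == "ip":
            return value, asked
        server = value                   # referral: ask the next server
```

The second return value lists the servers that the local DNS server contacted, in order, which makes the "ask, get referred, ask again" pattern visible.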

In forwarding mode, the local DNS server forwards the request to its upstream DNS server for resolution. If the upstream server cannot resolve the name, it passes the request on to its own upstream server, and so on recursively.

TCP connection

After obtaining the IP address, the network process checks whether the URL specifies a port number. If it does, the port is parsed from the URL; if not, the default port for the protocol is used. The network process then creates a socket and initiates a TCP connection. Tip: the default port is 80 for HTTP and 443 for HTTPS.
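Choosing the port can be sketched in a few lines with Python's standard `urllib.parse` module. The helper name `target_port` is made up for this example, and only HTTP and HTTPS are handled.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def target_port(url: str) -> int:
    """Return the explicit port in `url`, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Raises KeyError for schemes this sketch does not know about.
    return DEFAULT_PORTS[parts.scheme]
```

The resulting port, together with the IP address from DNS resolution, is what the socket connects to.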

A TCP connection is established with a three-way handshake, which confirms that both sides can send and receive. Initial state: the client is in the CLOSED state and the server is in the LISTEN state.

  1. The client sends the server a TCP segment with the SYN=1 flag and its initial sequence number ISN(C). After sending it, the client enters the SYN_SENT state and waits for the server's acknowledgment.
  2. After receiving the SYN=1 segment from the client, the server replies with a segment carrying SYN=1, ACK=1, the acknowledgment number ISN(C)+1, and its own initial sequence number ISN(S). After sending it, the server enters the SYN_RCVD state.
  3. After receiving that segment, the client checks that the ACK flag is 1 and that the acknowledgment number is ISN(C)+1. If so, it sends the server a segment with ACK=1 and the acknowledgment number ISN(S)+1. On receiving it, the server checks that the ACK flag is 1 and that the acknowledgment number is ISN(S)+1. If so, the connection is established: both client and server enter the ESTABLISHED state and data transfer begins.
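The three steps can be traced with a small simulation. This is only a model of the flags and numbers exchanged, not real networking code: the `Segment` class and the function name are invented for the example, and details such as sequence number wrap-around, retransmission, and timeouts are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: Optional[int] = None      # sender's sequence number
    ack_num: Optional[int] = None  # acknowledgment number


def three_way_handshake(isn_c: int, isn_s: int) -> Tuple[str, str]:
    """Walk both sides through the handshake and return their final states."""
    client, server = "CLOSED", "LISTEN"

    # 1. Client -> server: SYN carrying ISN(C).
    syn = Segment(syn=True, seq=isn_c)
    client = "SYN_SENT"

    # 2. Server -> client: SYN + ACK, acknowledging ISN(C) + 1.
    syn_ack = Segment(syn=True, ack=True, seq=isn_s, ack_num=syn.seq + 1)
    server = "SYN_RCVD"

    # 3. Client checks the reply, then acknowledges ISN(S) + 1.
    if not (syn_ack.ack and syn_ack.ack_num == isn_c + 1):
        return client, server          # handshake fails, states unchanged
    final_ack = Segment(ack=True, seq=isn_c + 1, ack_num=syn_ack.seq + 1)
    client = "ESTABLISHED"

    # Server checks the final ACK before it considers the connection open.
    if final_ack.ack and final_ack.ack_num == isn_s + 1:
        server = "ESTABLISHED"
    return client, server
```

Calling `three_way_handshake(100, 300)` walks through the same state changes as the list above: SYN_SENT and SYN_RCVD first, then ESTABLISHED on both sides.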

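Q: Why not a two-way handshake? A: To prevent a stale connection request that arrives late at the server from being treated as a new connection. With only two steps, the server would open the connection and waste resources on a client that never uses it.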

Build request

After the TCP connection is established, the network process builds the request (the request line and request headers, carrying identity information such as cookies) and sends it to the server. Before sending, the browser checks its cache for the requested resource; if a usable copy exists, the request is intercepted, the cached copy is returned, and the request ends there.
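To make the request-building part of this step concrete, here is a minimal sketch of what a plain HTTP/1.1 GET request looks like as text. The function name and the cookie values are made up, and a real browser sends many more headers than the two shown.

```python
from typing import Dict


def build_get_request(host: str, path: str, cookies: Dict[str, str]) -> str:
    """Assemble the request line and headers of a simple GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookies:
        cookie_value = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {cookie_value}")
    # Header lines end with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

For instance, `build_get_request("example.com", "/index.html", {"sid": "abc"})` produces a request line, a `Host` header, and a `Cookie: sid=abc` header.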

Browser caching comes in two kinds: strong caching and negotiated caching. For a requested resource, the browser first checks whether the strong cache is hit and only then falls back to the negotiated cache.

Strong cache: the browser decides whether the strong cache is hit based on the Expires and Cache-Control fields in the headers of the locally cached resource. On a hit, the cached resource is used directly. Cache-Control takes precedence over Expires. The cases below decide whether to fall through to the negotiated cache, listed from highest to lowest priority.

  1. Cache-Control: no-cache. The strong cache is skipped and the negotiated cache is used.
  2. Cache-Control: max-age=xxx. The time elapsed since the last 200 response is compared with the max-age value. If it exceeds that value, the negotiated cache is used; otherwise the strong cache is hit.
  3. No Cache-Control. The current time is compared with the Expires value, following the same rule as above.
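The three cases above can be written down as a small decision function. This is a simplified sketch, not a full implementation of HTTP caching: it only looks at `no-cache`, `max-age`, and `Expires`, and it assumes the header names have already been lower-cased and that all datetimes are timezone-aware. The function name `strong_cache_hit` is made up.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def strong_cache_hit(headers: Dict[str, str],
                     stored_at: datetime,
                     now: datetime) -> bool:
    """Return True if the cached copy may be used without asking the server.

    `headers` are the response headers stored with the cached resource;
    `stored_at` is when the 200 response was received.
    """
    cache_control = headers.get("cache-control", "").lower()

    # 1. no-cache: always revalidate with the server.
    if "no-cache" in cache_control:
        return False

    # 2. max-age=N: fresh while the copy is at most N seconds old.
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control)
    if match:
        return now - stored_at <= timedelta(seconds=int(match.group(1)))

    # 3. No usable Cache-Control: fall back to the Expires date.
    expires = headers.get("expires")
    if expires:
        return now <= parsedate_to_datetime(expires)

    return False  # nothing says the copy is fresh, so revalidate


if __name__ == "__main__":
    stored = datetime(2021, 9, 1, tzinfo=timezone.utc)
    print(strong_cache_hit({"cache-control": "max-age=3600"}, stored,
                           stored + timedelta(minutes=30)))
```

A return value of `False` means the browser moves on to the negotiated cache and contacts the server.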

Negotiated cache: if the strong cache is not hit, the browser sends a request to the server. The server uses the Last-Modified/If-Modified-Since and ETag/If-None-Match header pairs to decide whether the cached copy is still valid. If it is, the server returns 304, telling the browser that the resource has not changed and the local cache can be used.

Last-Modified: the time the resource was last modified. The server adds this field to the response headers the first time the browser requests the resource. On the next request, the browser sends an If-Modified-Since request header whose value is the last modification time it received from the server. The server compares the If-Modified-Since value with the resource's current last modification time:

  • If the If-Modified-Since value is earlier than the last modification time, the resource has changed, so the server returns the new resource.
  • If the value is later than or equal to the last modification time, the server returns 304 and the browser uses its cache.

ETag: a unique identifier generated from the current contents of the file. It changes whenever the file changes. The server returns the value to the browser in the response headers. On the next request, the browser puts that value in the If-None-Match request header. The server compares the If-None-Match value with the resource's current ETag:

  • If they differ, the resource has changed, so the server returns the new resource.
  • Otherwise, the server returns 304 and the browser uses its cache.

Last-Modified is cheaper to compute than ETag, while ETag is more accurate than Last-Modified. If both are present, the server gives priority to ETag.
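The server-side check can be sketched as below. It is a simplified model under several assumptions: header names are lower-cased, `If-None-Match` carries a single ETag that is compared as an exact string, and the modification time is timezone-aware. The function name `conditional_status` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def conditional_status(request_headers: Dict[str, str],
                       current_etag: str,
                       last_modified: datetime) -> int:
    """Return 304 if the browser's copy is still valid, otherwise 200."""
    # ETag takes priority when the browser sent If-None-Match.
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        return 304 if if_none_match == current_etag else 200

    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        return 304 if since >= last_modified.replace(microsecond=0) else 200

    return 200  # no validators sent: return the full resource


if __name__ == "__main__":
    modified = datetime(2021, 9, 1, 12, 0, tzinfo=timezone.utc)
    print(conditional_status({"if-none-match": '"v2"'}, '"v2"', modified))
```

Note how a mismatching ETag leads to 200 even when the If-Modified-Since date alone would have allowed a 304, which reflects the priority of ETag described above.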

Parsing response data

The network process receives the response headers and body and parses them. If the status code is 301 or 302, it reads the redirect URL from the Location field of the response headers and issues a new HTTP or HTTPS request. (301 is a permanent redirect, 302 is a temporary redirect.) If the status code is 200, it continues processing the response, and the browser also caches the response data, which reduces load on the server and speeds up the user's next visit. The Content-Type field then indicates whether the response is, for example, text/html or application/octet-stream. A byte stream (application/octet-stream) is treated as a download and handed to the download manager in the browser main process; otherwise the data goes on to the renderer process.
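The branching in this step can be summarized in a short function. It is a sketch of the cases mentioned above only; the action labels (`redirect`, `download`, `render`, and so on) and the function name are invented, and header names are assumed to be lower-cased.

```python
from typing import Dict, Tuple


def handle_response(status: int, headers: Dict[str, str]) -> Tuple[str, str]:
    """Return (action, detail) describing what the browser does next."""
    if status in (301, 302):
        # Redirect: start over with the URL from the Location header.
        return "redirect", headers.get("location", "")

    if status == 200:
        # Keep only the media type, dropping parameters such as charset.
        media_type = headers.get("content-type", "").split(";")[0].strip().lower()
        if media_type == "application/octet-stream":
            return "download", media_type      # goes to the download manager
        if media_type == "text/html":
            return "render", media_type        # goes to a renderer process
        return "other", media_type             # not covered by this sketch

    return "unhandled", str(status)            # other status codes are out of scope
```

For example, a 200 response with `content-type: text/html; charset=utf-8` yields `("render", "text/html")`.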

Rendering process

If the new page has the same root domain as a page already open in another tab, that page's renderer process is reused; otherwise a separate renderer process is created. Once the renderer process is ready, it still cannot start parsing the document, because the document data is still in the network process and has not yet been handed over to the renderer process. The next step is therefore to submit the document.

Submit the document

When the browser main process receives the response data from the network process, it sends a “submit document” message to the renderer process. On receiving it, the renderer process sets up a “pipe” with the network process to transfer the document data. After the transfer is complete, the renderer process returns a “confirm submission” message to the browser process. When the browser process receives the “confirm submission” message, it updates the browser UI state, including the security status, the URL in the address bar, and the back/forward history, and updates the web page.

Rendering phase

Once the document has been submitted, the renderer process starts parsing the page and loading subresources. When that finishes, a complete page has been generated.

Conclusion

Networking knowledge should not be learned by rote; understanding matters more 💜. Feel free to like or bookmark this article and read it a few times until it clicks.

First update (2021/09/02 0:53) 😴: refined the URL request process. The page rendering process will be refined tomorrow.

References

  • What is the difference between a thread and a process?
  • From entering a URL to seeing a page
  • Analysis of the process from entering a URL to seeing a page