This is a classic question — if you have ever interviewed or been interviewed, there is a good chance you have encountered it, or at least part of it. It covers a wide range of knowledge points and makes for a fairly comprehensive assessment; a quick web search turns up hundreds of thousands of articles on it, with differing opinions but largely overlapping content. If you have not studied the subject thoroughly and systematically, rote-memorized answers fall apart the moment the question drills into a single point or is phrased differently. Think about it carefully: once your learning has accumulated to a certain stage, it is worth organizing and summarizing your technical knowledge into a coherent system.
This is an open question with no fixed answer, touching on computer graphics, operating systems, compilers, computer networks, communication principles, distributed systems, browser internals, and many other disciplines. Wherever you start — from the software side or the hardware side — it can be a long story. And if you have specialized in one of these areas for years, you surely have deeper experience and sharper insights there than I do.
For my part, I cannot answer the question directly when its meaning is not pinned down and no context or constraints are given. Computers today are so complex that any change or combination of conditions can produce millions of possibilities and break the usual assumptions. The question, as posed, leaves open at least the following variables:
- Requested resource type
- Browser type and version
- Server type and version
- Network protocol type and version
- Network link status
- Intermediate devices along the path
- LAN types and standards
- Physical media types
- Carrier routing
If the request is for a static resource, the traffic may be served by a CDN server. If it is for a dynamic resource, things get more complicated: the traffic may pass through a proxy/gateway, Web server, application server, and database in turn. Figure 1 shows the high-availability deployment of Aliyun's Server Load Balancer (SLB). Unlike the traditional active/standby switchover model, which relies too heavily on the capacity of a single machine, traffic from the public network is spread across the LVS cluster (Layer-4 SLB) via equal-cost multi-path routing (ECMP) on the upstream switches. For TCP/UDP requests, the LVS cluster forwards traffic directly to the back-end ECS cluster; HTTP requests are forwarded to the Tengine cluster (Layer-7 SLB), which in turn forwards them to the back-end ECS cluster. Session synchronization and health checks between the clusters ensure high availability.
As the service keeps growing — tens or even hundreds of millions of requests and massive storage — the system demands multi-datacenter disaster recovery and flexible capacity expansion, and may evolve into a geo-distributed active-active architecture (Figure 2). Unlike traditional disaster-recovery designs, multiple data centers serve traffic simultaneously while keeping data consistent and complete across remote units (cf. the CAP theorem). The architecture divides into a traffic layer, an application layer, and a data layer. DNS-based Global Server Load Balancing (GSLB) lets users reach the nearest site, and if the system in one region fails, all traffic is switched to another region, achieving cross-region disaster recovery. This is similar to Ele.me's overall architecture at its present stage.
Static resources, however, are handled differently from dynamic ones: for reasons of deployment and traffic cost they are generally cached on intermediate CDN servers, and only on a cache miss does the request fall back to the origin — OSS (Object Storage Service) or private servers.
The real-world situation, then, is extremely variable — depending on how humans implemented things, even a GET request could conceivably trigger a bank transfer. To return to the theme of this article, let us strip away the special cases and simplify the problem by assuming only:
- A Chrome browser
- A Linux server
- A request for an HTML document
- No caching or optimization mechanisms
- HTTP/1.1 + TLS 1.2 + TCP
The process is as follows:
DNS Resolution Process
First, the browser sends a query to the local DNS server. Since the local DNS server has no cache (by our assumption) and cannot map the domain name to an IP address directly, it queries the root DNS servers, top-level-domain DNS servers, and authoritative DNS servers in turn, recursively or iteratively (Figure 3), until an IP address or set of IP addresses is found and returned to the browser. The local DNS address is generally assigned dynamically by the ISP (Internet Service Provider) via DHCP, but you can change it manually to a public DNS service such as Google's 8.8.8.8, or 114.114.114.114 in China; these services are deployed in many geographic locations and use Anycast to route each request to the DNS server nearest the user. For resolution to be accurate, the request should carry the client's source IP address; otherwise a GSLB-style DNS server cannot determine which destination IP is actually nearest to the user.
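As a quick hands-on check of this step, the sketch below asks the operating system's resolver (which in turn queries the configured local DNS server) for a hostname's addresses via Python's `socket.getaddrinfo`; the hostname used is just a placeholder:

```python
import socket

def resolve(hostname):
    """Return the distinct IP addresses the system resolver finds for hostname."""
    # getaddrinfo consults the OS resolver, which queries the configured
    # local DNS server (often assigned by the ISP over DHCP, as noted above).
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))  # loopback addresses; try a real domain to see public IPs
```

Running it against a public domain instead of `localhost` shows the address set the DNS hierarchy ultimately returned to the browser.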
HTTP Request Process
Since HTTP is a readable, text-based protocol, besides letting the browser issue the request you can also use Telnet on the command line to open a TCP connection to a given server port and type the headers and entity in the format the protocol specifies. To inspect packet contents in detail, use Wireshark or the tcpdump command to capture traffic on a network interface. Having obtained the server's IP address via DNS in the previous step, the browser connects to port 443 on the server through the system's socket interface. The whole process divides into five parts: establishing the connection, sending the HTTP request, returning the HTTP response, maintaining the connection, and releasing the connection (Figure 4). An arrow in the figure may represent a single TCP segment or a complete application-layer message; in actual transmission, messages may be coalesced into one TCP segment or fragmented across several.
Establish a connection
- Before the connection can be established, the server must be ready to accept it: it calls the socket, bind, listen, and accept functions to bind its public IP address, listen on port 443, and wait for incoming requests.
- The client opens the connection with the socket and connect functions, sending the server a segment with the SYN flag set, a randomly generated initial sequence number x, and options such as MSS (Maximum Segment Size). To avoid IP-layer fragmentation, which raises the probability of loss and error, and to achieve the best transmission efficiency, the MSS is generally the Ethernet MTU minus the IP and TCP header sizes: 1500 − 20 − 20 = 1460 bytes.
- The server acknowledges the client's segment with one carrying the SYN+ACK flags, its own randomly generated initial sequence number y, acknowledgment number x+1, and options such as MSS. Once each end has received the other's MSS, the maximum segment size is set to the smaller of the two values.
- On receiving the server's segment, the client replies with the ACK flag set and acknowledgment number y+1, and the TCP connection is established.
- If the client has no prior session with the server, a full TLS handshake is required. The client sends a Client Hello message containing a random number, the TLS protocol version, and a list of cipher suites ordered by preference.
- The server sends a Server Hello message containing a new random number, the chosen TLS version, and the selected cipher suite.
- The server sends a Certificate message containing its X.509 certificate chain: the server certificate first, followed by the intermediate certificates in order. The root CA certificate is usually built into the operating system or browser and need not be sent.
- If the DH algorithm is used for key exchange, the server sends a Server Key Exchange message with the DH parameters needed for the exchange; with RSA key exchange this step is skipped.
- The server sends a Server Hello Done message, indicating that all of its handshake messages have been sent.
- The client sends a Client Key Exchange message. With RSA key exchange, the client generates a pre-master secret, encrypts it with the public key from the server's certificate, and includes it in the message; the server simply decrypts it with its private key. With DH key exchange, the client includes its own DH parameters, and both sides compute the same pre-master secret. Note that this is only the pre-master secret and needs further processing: combining it with the two random numbers as seeds, both sides run a PRF (pseudorandom function) to derive the same master secret.
- The client sends a Change Cipher Spec message to signal that it has generated the master secret and will symmetrically encrypt subsequent messages with it.
- The client sends a Finished message. This message is encrypted (Wireshark displays it as Encrypted Handshake Message); if the server can decrypt its contents, the master secrets generated by the two sides match.
- The server sends a New Session Ticket message. Only the server can decrypt the ticket; the client stores it so a later TLS handshake can resume the session quickly, cutting round-trip delay.
- The server sends a Change Cipher Spec message, signaling that it too has generated the master secret and will symmetrically encrypt subsequent messages with it.
- The server sends a Finished message. If the client can decrypt it, the master secrets generated by the two sides match, and the handshake negotiation is complete.
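The whole sequence above — the TCP three-way handshake followed by the TLS handshake — can be exercised with Python's standard ssl module. A minimal sketch (example.com stands in for any HTTPS server):

```python
import socket
import ssl

# Client-side context: verifies the server's certificate chain against the
# system trust store and refuses anything below TLS 1.2 (our assumption above).
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

def tls_handshake_info(host, port=443):
    """Run the TCP three-way handshake, then the TLS handshake, and
    return the negotiated protocol version and cipher suite name."""
    with socket.create_connection((host, port)) as tcp:              # SYN, SYN+ACK, ACK
        with context.wrap_socket(tcp, server_hostname=host) as tls:  # Client Hello ... Finished
            return tls.version(), tls.cipher()[0]

# Requires network access, e.g.: tls_handshake_info("example.com")
```

Comparing the returned version and cipher suite with a Wireshark capture shows exactly which Server Hello choices were negotiated.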
Sending an HTTP request
After the secure, encrypted channel is established, the browser sends the HTTP request. A request message consists of a request line, request headers, a blank line, and an entity (a GET request has none). The headers divide into general headers, request headers, entity headers, and extension headers. General headers, such as Date, may appear in both request and response messages. Request headers are meaningful only in requests and subdivide into accept headers, conditional request headers, security request headers, and proxy request headers. Entity headers describe the entity content and subdivide into content headers and caching headers. Extension headers are user-defined and conventionally carry an X- prefix. Note also that HTTP header names are case-insensitive and ASCII-encoded, whereas the entity may use other encodings, as declared by Content-Type.
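A minimal sketch of serializing such a request message (the header values are purely illustrative):

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, entity."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line (CRLF CRLF) separates the headers from the entity.
    head = "\r\n".join(lines) + "\r\n\r\n"
    # Headers are ASCII; the entity may use any encoding named by Content-Type.
    return head.encode("ascii") + body

raw = build_request("GET", "/", "example.com", {"Connection": "keep-alive"})
```

The resulting bytes are exactly what you would type over Telnet, or what the browser writes into the TLS channel.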
Return the HTTP response
After receiving and processing the request, the server returns an HTTP response. A response message mirrors the request: a status line, response headers, a blank line, and an entity. Besides sharing general, entity, and extension headers with requests, responses have their own set of response headers, such as Vary and Set-Cookie. In addition, browser and server must preserve HTTP's ordering: the requests and responses in the queues each side maintains must correspond one to one, or the exchange falls out of order and errors occur.
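Conversely, a response message can be pulled apart along the same seams. The sketch below parses the status line, the case-insensitive headers, and the entity from a hard-coded example response:

```python
def parse_response_head(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    lines = head.decode("ascii").split("\r\n")
    status_code = int(lines[0].split(" ")[1])    # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status_code, headers, body

status, headers, body = parse_response_head(
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nSet-Cookie: a=1\r\n\r\n<html></html>"
)
```

Normalizing header names to lowercase reflects the case-insensitivity rule noted earlier.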
Maintain connection
The server does not disconnect from the client immediately after completing one HTTP exchange. In HTTP/1.1, Connection: keep-alive is enabled by default. It denotes a persistent connection, so that requests arriving soon afterwards can reuse the connection instead of re-establishing one, avoiding slow-start overhead and improving network throughput. In the reverse proxy Nginx, the persistent-connection timeout defaults to 75 seconds: if no new request arrives within that window, the client is disconnected. Separately, the browser sends a TCP keep-alive probe to the server every 45 seconds to check the state of the TCP connection, and disconnects if no ACK comes back. Note that HTTP keep-alive and TCP keep-alive are both liveness mechanisms but entirely different things: one works at the application layer, the other at the transport layer.
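TCP keep-alive, being a transport-layer feature, is configured on the socket itself rather than in HTTP headers. A sketch of enabling it in Python — the 45-second timing mirrors the figure quoted above, and the TCP_KEEPIDLE/TCP_KEEPINTVL knobs are Linux-specific, hence the guards:

```python
import socket

def enable_tcp_keepalive(sock, idle=45, interval=45):
    """Turn on TCP-level keep-alive probes (distinct from HTTP keep-alive)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Probe timing is platform-specific; these constants exist on Linux.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enabled = enable_tcp_keepalive(sock)
sock.close()
```

HTTP keep-alive, by contrast, needs no socket option at all — it is just the `Connection: keep-alive` header plus the server's timeout policy.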
Release the connection
- The server sends the client an Alert message of type Close Notify, announcing that it will send no more data and is about to close the connection.
- The server actively closes the connection by calling the close function, sending the client a segment with the FIN flag set and sequence number m.
- The client acknowledges receipt, sending the server a segment with the ACK flag set and acknowledgment number m+1.
- After transmitting any remaining data, the client sends the server a segment with the FIN flag set and sequence number n.
- The server acknowledges receipt, sending the client a segment with the ACK flag set and acknowledgment number n+1. The client enters the CLOSED state as soon as it receives this acknowledgment, while the server, as the active closer, waits in TIME_WAIT for two Maximum Segment Lifetimes (2MSL) before entering CLOSED.
Browser parsing process
A modern browser is an enormous piece of software, in some respects no less complex than an operating system: it comprises hundreds of components, large and small, for multimedia support, graphics display, GPU rendering, process management, memory management, sandboxing, storage, network management, and more. Developers need not worry about these underlying details when building Web applications; they simply hand page code to the browser and rich content appears. But page performance depends not only on how the browser is implemented — it also depends on how familiar the developer is with the tools, and code optimization is endless. Clearly, understanding browser fundamentals, W3C technical standards, and network protocols is very helpful for designing and developing a high-performance Web application.
When we use Chrome, the engine behind it is Google's open-source Chromium project, whose core is the Blink rendering engine (derived from WebKit) and the V8 JavaScript engine. Before explaining how the browser parses an HTML file, let's take a quick look at Chromium's multi-process, multi-threaded architecture (Figure 5), which includes several kinds of processes:
- A Browser process
- Multiple Renderer processes
- A GPU process
- Multiple NPAPI plugin processes
- Multiple Pepper Plugin processes
And each process contains several threads:
- One main thread
- In the Browser process: renders and updates the browser interface
- In the Renderer process: parses pages and updates the interface via the Blink instance it holds
- One IO thread
- In the Browser process: handles IPC traffic and network requests
- In the Renderer process: Handles IPC communication with the Browser process
- A dedicated set of threads
- A common thread pool
Chromium supports several policies for assigning Renderer processes — not just one per opened tab; iframe pages may get their own as well — and each Renderer process runs in its own sandbox, isolated from the others.
- Process-per-site-instance: one process per site instance; new pages opened via links from a page share its process (unless the noopener attribute is used). This is the default mode
- Process-per-site: one process per site (domain)
- Process-per-tab: one process per tab
- Single process: all pages share a single process
When the Renderer process needs the network request modules (XHR, Fetch) or the storage system (synchronous Local Storage, synchronous Cookie, asynchronous Cookie Store), it goes through the global RenderProcess object, which establishes an IPC channel between its IO thread and the corresponding RenderProcessHost object in the Browser process; underneath, the channel is implemented with socketpair. This mechanism lets Chromium manage and schedule resources centrally, effectively reducing network and performance overhead.
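The socketpair primitive mentioned above is easy to demonstrate: it yields two connected endpoints. In this single-process sketch, one end plays the Renderer's IO thread and the other the Browser-side RenderProcessHost:

```python
import socket

# socketpair() returns two connected endpoints, analogous to the IPC channel
# between a Renderer's IO thread and its RenderProcessHost in the Browser process.
browser_end, renderer_end = socket.socketpair()

renderer_end.sendall(b"resource request")   # e.g. the Renderer asks for a network fetch
request = browser_end.recv(1024)

browser_end.sendall(b"resource payload")    # the Browser process replies over the same channel
reply = renderer_end.recv(1024)

browser_end.close()
renderer_end.close()
```

In Chromium the two endpoints live in different processes, which is what makes the sandboxed Renderer unable to touch the network or disk directly.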
Main flow
Page parsing takes place in the Renderer process: the Blink instance held by the main thread receives the HTML content, reading up to 8 KB at a time from the network buffer (Figure 6). The browser parses the HTML from top to bottom and builds the DOM tree through lexical and syntax analysis. When it encounters an external CSS link, the main thread asks the network module to fetch the resource asynchronously and keeps building the DOM tree without blocking. Once the CSS is downloaded, the main thread parses it at an appropriate time and builds the CSSOM tree, likewise via lexical and syntax analysis. The browser combines the DOM and CSSOM trees into the render tree, computes each node's layout attributes, geometry, and position in the coordinate system, and finally paints it to the screen. When it encounters an external JS link, the main thread again fetches the resource asynchronously; because JS may modify the DOM and CSSOM trees, causing reflow and repaint, DOM construction is blocked. The main thread does not hang, though: the browser uses a lightweight preload scanner to look ahead for external resources to download and issues their network requests early, while resources generated internally, such as via document.write, are not recognized. When the JS download completes, the browser parses and compiles it with the V8 engine on the Script Streamer thread and executes it on the main thread (Figure 7).
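As a toy illustration of the tokenize-and-build step (nothing like Blink's real incremental, speculative parser), Python's html.parser can be used to assemble a crude node tree:

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Tokenize HTML and build a toy node tree, loosely mimicking DOM construction."""
    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]          # open elements, like the parser's element stack

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)   # attach to the innermost open element
        self.stack.append(node)                   # descend into the new element

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()                      # close the element

builder = TreeBuilder()
builder.feed("<html><head></head><body><p>hi</p></body></html>")
html_node = builder.root["children"][0]
```

Lexical analysis (the tokenizer) and syntax analysis (the stack discipline) are both visible here in miniature; the real parser additionally handles error recovery, scripts, and incremental input.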
Rendering process
Once built, the DOM tree goes through several transformations with multiple intermediate representations (Figure 8). With layout computed and paint styles resolved, it is converted into a RenderObject tree (also known as the render tree). That is then converted into a RenderLayer tree: RenderObjects that share a coordinate system (e.g. canvas, absolutely positioned elements) are merged into one RenderLayer, composited on the CPU. The RenderLayer tree is in turn converted into a GraphicsLayer tree: a RenderLayer that meets the compositing criteria (such as transform — so-called hardware acceleration) gets its own GraphicsLayer, otherwise it merges into its parent, again composited on the CPU. Finally, each GraphicsLayer holds a GraphicsContext object responsible for painting the layer into a bitmap, which is uploaded to the GPU as a texture; the GPU composites the textures and displays the result on screen.
In addition, to improve rendering performance the browser runs a dedicated Compositor thread that handles layer compositing (Figure 9) and certain interaction events (such as scrolling and touch), responding with UI updates directly without blocking the main thread. The main thread synchronizes the RenderLayer tree to the Compositor thread, which starts multiple Rasterizer threads to rasterize — converting drawing data into tiles within the visible area — and hands them to the GPU for final compositing and rendering.
Page life cycle
The page goes through many state changes and event notifications between the initial request and a navigation, refresh, or close, so it is important to understand this lifecycle. The browser provides the Navigation Timing and Resource Timing APIs to record the event timestamps for each resource; you can use them to collect RUM (Real User Monitoring) data and send it to a back-end monitoring service to analyze page performance and continuously improve the user experience. Figure 10 shows the event timeline recorded while loading the HTML resource; the yellow middle portion shows the timelines for the other resources (CSS, JS, IMG, XHR). All of these metrics can be obtained through the window.performance API.
There are many ways to measure page performance, but what users perceive directly is when the page renders, becomes interactive, and finishes loading. The DOMContentLoaded event fires when the DOM tree is fully built and every node can be safely accessed, bound to events, and so on. The load event fires when all resources are loaded — images, backgrounds, and content rendered — and the page is in an interactive state. Until recently, however, the browser could not manage application state the way Android and iOS apps do, reallocating resources and managing memory as apps move between foreground and background. Modern browsers are catching up: since Chrome 68, the Page Lifecycle API defines a new browser lifecycle (Figure 11) that lets developers build better applications.
You can now observe the state transitions triggered by page navigation and user interaction by binding all the lifecycle listener events on window and document (Figure 12). Note, however, that developers can only sense when an event fires; there is no way to read the page's state directly at an arbitrary moment (STATE in Figure 11). Even so, with this API you can run tasks or give UI feedback at the right time.
Conclusion
This article is too short to be a comprehensive interview guide; it covers only some of the key paths and knowledge points. Many areas are not analyzed in depth — partly because they stray too far from the theme, partly because of the limits of my own ability: character encodings, encryption algorithms, digest algorithms, routing algorithms, compression algorithms, protocol standards, caching mechanisms, V8 internals, GPU internals, and so on. To thoroughly understand the technical details and principles of each part, consult the relevant books, materials, and source code.
⚠️ Finally, remember to protect your hairline.
References
- UNIX Network Programming, Volume 1: The Sockets Networking API
- UNIX Network Programming, Volume 2: Interprocess Communications
- HTTP: The Definitive Guide
- HTTPS: The Definitive Guide
- Inside WebKit Technology
- TCP/IP Illustrated, Volume 1: The Protocols
- Load Balancing Infrastructure
- Layer-4 SLB High Availability
- Database Geo-Distributed Active-Active Solutions
- Dissecting TLS Using Wireshark
- How Browsers Work
- Multi-process Architecture
- How Blink works
- Life of a Pixel 2018
- Multi-process Resource Loading
- Process Models
- How Chromium Displays Web Pages
- Threading and Tasks in Chrome
- V8 Background Compilation
- JavaScript engine fundamentals
- Compositor Thread Architecture
- GPU Accelerated Compositing in Chrome
- Page Lifecycle API
- Web Fundamentals