From the moment a URL is entered until the web page is displayed, the browser generally goes through the following steps:

  1. DNS resolution: Resolves a domain name into an IP address
  2. TCP connection: TCP three-way handshake
  3. Sending an HTTP request
  4. Server redirect
  5. The server processes the request and returns an HTTP response
  6. The browser parses and renders the page
  7. Disconnect: TCP four-way wave

What is a URL

URI: Uniform Resource Identifier. Identifies a resource

URL: Uniform Resource Locator. Provides the path for finding a resource

URN: Uniform Resource Name. Identifies a resource by a name within a specific namespace

URLs and URNs are both subsets of URIs

This section looks at URLs. A very common URL is http://www.baidu.com/, which has the form protocol://domain name

The standard URL format is: scheme://host.domain:port/path/filename

  • Scheme: defines the type of Internet service. The most common schemes are HTTP, HTTPS, FTP, and file
  • Host: defines the domain host (www is the conventional default host for HTTP)
  • Domain: defines the Internet domain name, such as baidu.com
  • Port: defines the port number on the host (the default HTTP port is 80)
  • Path: defines the path on the server (if omitted, the document must be at the root of the site)
  • Filename: defines the name of the document or resource
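The parts listed above can be pulled out of a URL with Python's standard `urllib.parse` module. This is only a sketch: the URL below is an illustrative assumption, and splitting the host name at the first dot is a simplification that would not hold for every domain.

```python
from urllib.parse import urlsplit

# Illustrative URL (an assumption made up for this example).
url = "http://www.baidu.com:80/docs/index.html"

parts = urlsplit(url)

scheme = parts.scheme        # type of Internet service
hostname = parts.hostname    # full host name without the port
port = parts.port            # port number as an int, or None if absent

# Naive split of the host name into the host label and the domain.
host, _, domain = hostname.partition(".")

# Split the path into its directory part and the final file name.
path, _, filename = parts.path.rpartition("/")

print(scheme, host, domain, port, path, filename)
```

When the port is left out of the URL, `parts.port` is `None` and the client falls back to the scheme's default (80 for HTTP).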

Domain name Resolution (DNS)

The browser cannot locate a server by its domain name directly; it needs an IP address. The domain name is first resolved to an IP address, and the IP address is then used to find the server.

DNS provides lookup of IP addresses by domain name, as well as reverse lookup of domain names by IP address.

How the browser looks up an IP address

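  1. Read the browser cache first. The browser caches DNS records for a limited time
  2. Check the local hosts file
  3. Check the operating system cache
  4. Check the router's DNS cache
  5. Query the local DNS server
  6. Query the DNS root server. If the local DNS server has no record, it asks a root server, which refers it to the top-level-domain server, which in turn refers it to the authoritative server for the domain. The browser's request to the local DNS server is recursive; the local DNS server's own queries to these servers are iterative.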
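The lookup order above amounts to "ask each layer in turn and stop at the first hit". A minimal sketch of that idea follows; the layer contents, the `resolve` helper, and the addresses are all hypothetical, and a real resolver also handles TTLs and network queries.

```python
from typing import Optional

def resolve(domain: str, layers: list[tuple[str, dict[str, str]]]) -> Optional[tuple[str, str]]:
    """Return (layer name, IP) from the first layer that knows the domain."""
    for name, records in layers:
        ip = records.get(domain)
        if ip is not None:
            return name, ip
    return None  # no layer could answer

# Hypothetical layers, in the order the browser consults them.
layers = [
    ("browser cache", {}),
    ("hosts file", {"localhost": "127.0.0.1"}),
    ("system cache", {}),
    ("router cache", {}),
    ("local DNS server", {"example.com": "192.0.2.10"}),
]

print(resolve("localhost", layers))
print(resolve("example.com", layers))
print(resolve("unknown.example", layers))
```

Because earlier layers win, an entry in the hosts file hides whatever the DNS servers would have answered for the same name.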

Optimizing performance with DNS prefetching

  1. Use a meta tag to tell the browser to enable DNS prefetching for the current page

    <meta http-equiv="x-dns-prefetch-control" content="on" />
  2. Use a link tag in the page head to force DNS prefetching for a specific domain

    <link rel="dns-prefetch" href="http://bdimg.share.baidu.com" />

TCP connection (three-way handshake)

After finding the server's IP address, the browser initiates a TCP three-way handshake. Before any data is sent, the handshake synchronizes the sequence numbers and acknowledgment numbers of the client and server and exchanges TCP window size information.

The three-way handshake process

  1. The client sends a segment with SYN=1, Seq=X to the server port
  2. The server replies with SYN=1, ACK=X+1, Seq=Y, confirming that it received the client's SYN and that the client is able to send data
  3. The client replies with ACK=Y+1, Seq=X+1, which completes the handshake

The purpose of the three-way handshake is to prevent a stale connection request segment, one that was delayed in the network and is no longer valid, from belatedly reaching the server and causing an error.
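The numbering in the handshake can be traced with a few lines of Python. This is a toy model, not a TCP implementation: the function name and the initial sequence numbers are made up, and the third segment carries sequence number X+1 because the SYN itself consumes one sequence number.

```python
def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the three segments exchanged for initial sequence numbers x and y."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    # The server acknowledges the client's SYN and sends its own SYN.
    syn_ack = {"from": "server", "SYN": 1, "ACK": 1, "seq": y, "ack": syn["seq"] + 1}
    # The client acknowledges the server's SYN; its own SYN used up sequence number x.
    ack = {"from": "client", "ACK": 1, "seq": x + 1, "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(x=100, y=300):
    print(segment)
```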

Sending an HTTP request

After the TCP three-way handshake is complete, the browser sends the HTTP request message

A request message consists of a request line, a request header, a blank line, and a request body

  • Request line

    GET /api/user/detail HTTP/1.1

The request line contains the request method, URL, and protocol version.

  • Request header

    The request header contains a lot of useful information about the client environment and the request body

    Accept: application/json, text/plain, */*
    Accept-Encoding: gzip, deflate
    Accept-Language: zh-CN,zh;q=0.9
    Connection: keep-alive
    Cookie: SESSION=ZmZkMDVkNzktMGMwZS00NTNkLTk3Y2MtZGUxNDA2MTY3MDBk
    Host: icsapi.aegs.ft.ztosys.com
    Origin: http://ics.aegs.ft.ztosys.com
    Referer: http://ics.aegs.ft.ztosys.com/
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
  • A blank line

    A blank line between the request header and the request body indicates that the request header has ended, followed by the request body

  • Request body

    If the method field is GET, this item is empty and has no data

    If the method field is POST, this is usually where the data is to be submitted

    productId=327&pageNum=1&pageSize=30
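To see how the four parts fit together, here is a small sketch that assembles a request message as text. The `build_request` helper is hypothetical, and pairing POST with this path and the `Content-Type` value shown are assumptions made for illustration.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: str = "") -> str:
    """Assemble request line, headers, blank line, and body into one message."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

body = "productId=327&pageNum=1&pageSize=30"
message = build_request(
    "POST",
    "/api/user/detail",  # illustrative path
    {
        "Host": "icsapi.aegs.ft.ztosys.com",
        "Content-Type": "application/x-www-form-urlencoded",  # assumed encoding
        "Content-Length": str(len(body)),
    },
    body,
)
print(message)
```

For a GET request the same function can be called without a body, which leaves the message ending at the blank line.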

Server redirect

Common uses of redirects include temporary redirects during site maintenance or downtime, and redirecting HTTP requests to HTTPS.

Redirect types:

  • Permanent redirect: 301, 308
  • Temporary redirect: 302, 303, 307
  • Special redirect: 300, 304

Extension 1: The difference between 301 and 302

A 302 is only a temporary redirect: the search engine fetches the new content but keeps the old address, because the 302 response tells it that the new URL is temporary. A 301 is a permanent redirect: the search engine fetches the new content and also replaces the old URL with the redirect target.
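The rule described above can be written down as a tiny decision function. This is a simplified model of the behaviour in the text, with hypothetical helper names and URLs; real search engines apply their own heuristics.

```python
PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}

def describe_redirect(status: int) -> str:
    """Classify a redirect status code as permanent or temporary."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "other"

def indexed_url(old_url: str, new_url: str, status: int) -> str:
    """URL a crawler following the rule above would keep for the page."""
    return new_url if describe_redirect(status) == "permanent" else old_url

print(indexed_url("http://example.com/old", "http://example.com/new", 301))
print(indexed_url("http://example.com/old", "http://example.com/new", 302))
```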

Extension 2: Front-end redirection approaches

  1. HTML redirect

    <meta http-equiv="Refresh" content="0; URL=http://example.com/" />
  2. JavaScript redirect: assign the new address to window.location

The server processes the request and returns an HTTP response

The server receives the request, processes it, and returns the result as an HTTP response.

A response message consists of three parts: the status line, the response headers, and the response body

  • Status line

    The status line consists of the protocol version, a numeric status code, and the corresponding status description, separated by spaces.

    HTTP/1.1 200 OK

    Status code:

    1xx: informational status codes, indicating that the server has received the request and the client can continue sending it.

    2xx: success status codes, indicating that the server has successfully received and processed the request.

    3xx: redirection status codes, indicating that the client must take further action, such as following a redirect.

    301: Moved Permanently. Permanent redirection

    302: Found. Temporary redirection

    304: Not Modified. The server content has not been updated and can be read directly from the browser cache

    4xx: client error status codes, indicating that the client request contains invalid content.

    400: Bad Request. The client request has syntax errors and cannot be understood by the server

    401: Unauthorized. The request is unauthorized. This status code must be used with the WWW-Authenticate header field

    403: Forbidden. The server received the request but refuses to serve it. The reason is usually given in the response body

    404: Not Found. The requested resource does not exist, for example because an incorrect URL was entered

    5xx: server error status codes, indicating that the server failed to process a client request properly and an unexpected error occurred.

    500: Internal Server Error. An unexpected error occurred on the server, so the client request cannot be completed

    503: Service Unavailable. The server is currently unable to process client requests. It may return to normal after a period of time

  • Response headers

    The response headers contain additional information about the response message and consist of name/value pairs

    Access-Control-Allow-Credentials: true
    Access-Control-Allow-Origin: http://ics.aegs.ft.ztosys.com
    Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    Connection: keep-alive
    Content-Length: 240
    Content-Type: application/json; charset=UTF-8
    Date: Sat, 20 Feb 2021 02:22:39 GMT
    Expires: 0
    Pragma: no-cache
    Vary: Origin
    Vary: Access-Control-Request-Method
    Vary: Access-Control-Request-Headers
    X-Content-Type-Options: nosniff
    X-XSS-Protection: 1; mode=block
  • Response body

    The specific data the server returns to the client
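The three parts of a response can be separated again on the receiving side. The sketch below parses a raw response string; the helper names and the sample message are made up for illustration, and a plain dict keeps only the last value of a repeated header such as `Vary`.

```python
def parse_response(raw: str) -> tuple[str, int, str, dict[str, str], str]:
    """Split a raw HTTP response into version, status code, status text, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Three space-separated elements; the status text itself may contain spaces.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()  # a repeated header overwrites the earlier one
    return version, int(code), reason, headers, body

def status_class(code: int) -> str:
    """Map a status code to its class, for example 404 to '4xx'."""
    return f"{code // 100}xx"

raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json; charset=UTF-8\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    '{"ok": true}'
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, status_class(code), headers, body)
```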

The browser parses and renders the page

When the browser receives the HTML, it starts parsing right away, even before the whole file has been transferred. Parsing proceeds from top to bottom, in order. HTML is parsed into the DOM, CSS is parsed into the CSSOM, and the two are combined into the render tree. The browser then computes the layout and finally paints the page using the GPU.

Note: if a script is encountered while the DOM tree is being built, DOM construction is blocked until the script has been downloaded and executed (unless the script is marked async or defer). Because a script may read styles, its execution also waits for any CSS that is still loading. Once the script has finished, construction of the DOM tree continues.

The rendering steps are as follows:

  1. Parse out the DOM tree from the HTML

    DOM tree parsing is a depth-first traversal. That is, all children of the current node are built before the next sibling node is built.

  2. Parse the CSS to generate the CSS rule tree

  3. Combine DOM tree and CSS rule tree to generate render tree

After the DOM tree and CSS rule tree are all ready, the browser starts building the render tree.

  4. Layout calculation

    The position and size of each render object are calculated from the render object information in the render tree

  5. Draw the page based on the calculated information

    After the layout calculation is complete, the browser paints the elements onto the page. Once the rendering engine has finished, the entire page is displayed

    Repaint: changing attributes such as an element's background color or text color, which do not affect the layout around or inside the element, only causes the browser to repaint.

    Reflow: if the size or position of an element changes, the layout must be recalculated and the affected part of the render tree rendered again
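The first three rendering steps (DOM tree, CSS rules, render tree) can be sketched as a depth-first walk that attaches styles to nodes and leaves out anything that is not displayed. This is a heavily simplified model: the `Node` class and `build_render_tree` function are hypothetical, and styles are matched by tag name only, whereas real browsers match full selectors and compute cascaded values.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)

def build_render_tree(node: Node, styles: dict[str, dict[str, str]]):
    """Depth-first walk of the DOM that attaches styles and drops hidden nodes."""
    style = styles.get(node.tag, {})
    if style.get("display") == "none":
        return None  # hidden nodes, and their subtrees, are not in the render tree
    rendered_children = []
    for child in node.children:
        # All descendants of this child are handled before its next sibling.
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            rendered_children.append(rendered)
    return {"tag": node.tag, "style": style, "children": rendered_children}

dom = Node("body", [Node("h1"), Node("script"), Node("p", [Node("span")])])
styles = {"h1": {"color": "red"}, "script": {"display": "none"}}

tree = build_render_tree(dom, styles)
print([child["tag"] for child in tree["children"]])
```

Changing only `color` in `styles` would leave the shape of this tree untouched, which matches the repaint case; changing `display` alters which nodes exist, which forces layout to be recomputed.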

Disconnect: TCP four-way wave

When data transfer is complete, the TCP connection is closed with a four-way wave

  1. The browser sends a FIN to close data transfer from the browser to the server, and the browser enters the FIN_WAIT_1 state.

  2. After receiving the FIN, the server sends an ACK to the browser whose acknowledgment number is the received sequence number plus 1 (like a SYN, a FIN occupies one sequence number). The server enters the CLOSE_WAIT state.

  3. The server sends a FIN to close data transfer from the server to the browser, and the server enters the LAST_ACK state.

  4. After receiving the FIN, the browser enters the TIME_WAIT state and sends an ACK to the server. When the server receives the ACK it enters the CLOSED state, which completes the four-way wave.

ACK: Used to acknowledge received data, which is identified by the acknowledgment number.

SYN: Used as the synchronization signal for establishing a connection

FIN: Indicates that there is no more data to send, which usually means the established connection is to be closed.

Why does it take three handshakes to establish a connection, but four waves to close it?

When establishing a connection, the server in the LISTEN state receives the SYN and can send its ACK and its own SYN to the client together in one segment. When closing, however, receiving the peer's FIN only means that the peer will send no more data; the peer can still receive data. The local side may not have finished sending all of its own data yet, so it does not necessarily close immediately: it can send its remaining data first and only then send its own FIN to agree to close the connection. For this reason the ACK and the FIN are usually sent separately.
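The state changes during the four waves can be followed with a small transition table. This sketch covers only the states on the normal close path; the event names are made up, sending the ACK is treated as part of receiving a FIN, and FIN_WAIT_2 is the standard TCP state the active closer occupies between receiving the ACK and receiving the peer's FIN.

```python
# (current state, event) -> next state, for the normal close path only.
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",   # the closer replies with the final ACK
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",  # the peer replies with an ACK
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

def run(events: list[str], state: str = "ESTABLISHED") -> list[str]:
    """Apply events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited

browser = run(["send FIN", "recv ACK", "recv FIN"])
server = run(["recv FIN", "send FIN", "recv ACK"])
print(browser)
print(server)
```

The server's CLOSE_WAIT state is exactly the window in which it may still send leftover data, which is why its ACK and its FIN are separate steps.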