Preface

  • A common interview question is: what happens between entering a URL in the browser and the page being displayed? It is one of the classic front-end interview questions, and this guide walks through what the interviewer expects you to cover.

An overview of the process

Enter the URL --> DNS resolution --> browser cache --> TCP three-way handshake --> send HTTP request --> HTTP response --> browser parses and renders the page --> TCP four-way wave

What is a URL

  • A URL (Uniform Resource Locator) locates a resource on the Internet and is commonly known as a web address. It follows the grammar scheme://host.domain:port/path/filename
    1. scheme – The type of Internet service. Common schemes include http, https, ftp, and file. The most common is http; https is used for encrypted transmission.

    2. host – The domain host (the default host for http is www).

    3. domain – The Internet domain name, such as juejin.cn.

    4. port – The port number on the host. The default port is 80 for http and 443 for https.

    5. path – The path on the server (if omitted, the document must be at the root of the website).

    6. filename – The name of the document or resource.
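
As a rough illustration of the grammar above, the sketch below splits a URL into those parts with the standard URL class available in browsers and Node.js. The UrlParts shape, the splitUrl name, and the example addresses are assumptions made for this example. It treats host.domain together as one hostname, because that is how the URL class exposes it.

```typescript
// Parts of a URL, following the grammar scheme://host.domain:port/path/filename.
interface UrlParts {
  scheme: string;
  hostname: string;
  port: string;
  path: string;
  filename: string;
}

// Default ports for the two schemes discussed in the text.
const DEFAULT_PORTS: Record<string, string> = { http: "80", https: "443" };

// Hypothetical helper: split a URL string into its named parts.
function splitUrl(input: string): UrlParts {
  const url = new URL(input);
  const scheme = url.protocol.replace(/:$/, "");
  const lastSlash = url.pathname.lastIndexOf("/");
  // URL.port is an empty string when the scheme's default port is in use.
  const port = url.port !== "" ? url.port : (DEFAULT_PORTS[scheme] ?? "");
  return {
    scheme,
    hostname: url.hostname,
    port,
    path: url.pathname.slice(0, lastSlash + 1),
    filename: url.pathname.slice(lastSlash + 1),
  };
}
```

For instance, splitUrl("https://juejin.cn/post/index.html") reports port 443 even though none was written, because the sketch fills in the scheme default when the URL class returns an empty port.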

DNS domain name resolution

  • The URL alone does not tell us which server to visit, so we need to obtain the corresponding IP address (Internet Protocol address) through DNS domain name resolution.

  • The general DNS resolution process:

    1. The browser first checks its own cache to see whether the domain name has already been resolved to an IP address (in Chrome, the DNS cache page is chrome://net-internals/#dns).

    2. If the browser cache has no match, the browser checks the operating system cache for a resolved result. The operating system also has its own domain name resolution step: on Windows it can be configured through the hosts file on drive C. If you map a domain name to an IP address there, the browser uses that IP address first, and the later steps are not carried out even if that resolution fails.

    3. If neither cache has a match, the request goes to the local DNS server (LDNS). This server is usually located somewhere in your city, not far from you, and performs well. It caches resolution results, and about 80% of resolutions are completed here.

    4. If the LDNS still has no hit, it goes directly to the root server to request resolution.

    5. The root server returns to the LDNS the address of the top-level domain server (gTLD server, such as the one for .com) that handles the queried domain.

    6. The LDNS then sends the request to the gTLD server returned in the previous step.

    7. The gTLD server that accepts the request finds and returns the address of the name server for the domain name, which is the name server registered for the site.

    8. The name server looks up the target IP address in its mapping table and returns it to the LDNS.

    9. The LDNS caches the domain name and the corresponding IP address.

    10. The LDNS returns the resolution result to the user. The user caches the result in the local system cache according to the TTL value, and the domain name resolution process is complete.
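
The layered lookup above can be sketched as a chain of TTL caches that is consulted in order before asking upstream. This is only a minimal model: TtlCache, resolveDomain, and the queryUpstream callback (standing in for the root, gTLD, and name server walk) are hypothetical names, the clock is passed in as a number of milliseconds, and a hit in a lower layer is not copied back into the layers above it.

```typescript
// One cached answer together with the time (in ms) at which it expires.
interface CacheEntry {
  ip: string;
  expiresAt: number;
}

// A tiny TTL cache standing in for the browser, OS, or LDNS cache.
class TtlCache {
  private entries = new Map<string, CacheEntry>();

  get(domain: string, now: number): string | undefined {
    const entry = this.entries.get(domain);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(domain); // expired: drop it and report a miss
      return undefined;
    }
    return entry.ip;
  }

  set(domain: string, ip: string, ttlSeconds: number, now: number): void {
    this.entries.set(domain, { ip, expiresAt: now + ttlSeconds * 1000 });
  }
}

interface CacheLayer {
  name: string;
  cache: TtlCache;
}

interface DnsAnswer {
  ip: string;
  ttlSeconds: number;
}

interface Resolution {
  ip: string;
  source: string; // which layer answered, or "upstream"
}

// Check each cache layer in order; on a full miss, ask upstream and cache the answer.
function resolveDomain(
  domain: string,
  layers: CacheLayer[],
  queryUpstream: (domain: string) => DnsAnswer | undefined,
  now: number,
): Resolution | undefined {
  for (const layer of layers) {
    const ip = layer.cache.get(domain, now);
    if (ip !== undefined) return { ip, source: layer.name };
  }
  const answer = queryUpstream(domain);
  if (answer === undefined) return undefined;
  for (const layer of layers) {
    layer.cache.set(domain, answer.ip, answer.ttlSeconds, now);
  }
  return { ip: answer.ip, source: "upstream" };
}
```

Passing the clock in explicitly keeps the TTL behaviour easy to check: a second lookup inside the TTL is answered by the first layer, and a lookup after the TTL goes upstream again.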

Browser cache

  • The browser cache is divided into:
    • Strong cache (if the resource exists locally and is still valid, it is read directly from the local cache and no connection is required).

    • Negotiation cache (the server decides whether the data has been updated according to Last-Modified and ETag. If it has, 200 and the new data are returned; otherwise 304 is returned and the local resource is read).

Strong cache

  1. Expires (sets an expiration time)

    • Expires is an HTTP/1.0 field that gives the expiration time of the cached resource. If the system time is earlier than that time, no request is sent. Because the system time can be changed, changing it may produce unexpected results.
  2. Cache-Control

    • Cache-Control is a field added in HTTP/1.1, mainly used to control web page caching. It accepts the following directives:
      • public: the response may be cached by both the client and proxy servers.

      • private: the response may be cached only by the client.

      • max-age: the validity period of the cached response, that is, the response expires after xx seconds.

      • s-maxage: the response cached on a proxy server expires after xx seconds. On proxy servers, s-maxage takes priority over max-age.

      • no-store: the response must not be cached at all.

      • no-cache (this directive is what triggers the negotiation cache): before using the cached copy, the client must send a request to the server to verify whether the content has been updated. If it has, the server returns a new response.

  3. Expires versus Cache-Control

    • Expires is a product of HTTP/1.0 and Cache-Control is a product of HTTP/1.1. Expires is in effect outdated and survives today only for compatibility.
    • If both are present, Cache-Control takes priority over Expires.
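
A minimal sketch of the strong-cache check described above, assuming the response headers are stored under lower-case names together with the time the response was cached. HeaderMap, parseCacheControl, and isFresh are hypothetical names, and only max-age, no-store, no-cache, and Expires are considered, so this is far from a complete cache implementation.

```typescript
// Response headers keyed by lower-case name (an assumption of this sketch).
type HeaderMap = { [name: string]: string | undefined };

// Parse "max-age=60, public" into a map; directives without a value map to "".
function parseCacheControl(value: string): Map<string, string> {
  const directives = new Map<string, string>();
  for (const part of value.split(",")) {
    const [name, arg = ""] = part.trim().split("=");
    if (name !== "") directives.set(name.toLowerCase(), arg.replace(/"/g, ""));
  }
  return directives;
}

// Decide whether a stored response can be used without contacting the server.
// storedAt and now are timestamps in milliseconds.
function isFresh(headers: HeaderMap, storedAt: number, now: number): boolean {
  const cacheControl = parseCacheControl(headers["cache-control"] ?? "");
  if (cacheControl.has("no-store") || cacheControl.has("no-cache")) return false;

  // Cache-Control: max-age takes priority over Expires when both are present.
  const maxAge = cacheControl.get("max-age");
  if (maxAge !== undefined) {
    const seconds = Number(maxAge);
    return Number.isFinite(seconds) && now - storedAt < seconds * 1000;
  }

  const expires = headers["expires"];
  if (expires !== undefined) {
    const expiresAt = Date.parse(expires);
    return !Number.isNaN(expiresAt) && now < expiresAt;
  }
  return false; // no freshness information: do not treat as a strong-cache hit
}
```

Note that the max-age branch compares elapsed time since the response was stored, so it does not depend on the absolute system clock the way Expires does.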

Negotiation cache:

  • When a client requests a resource again, it sends a request to the server to verify whether its cached copy is still valid.
  1. Last-Modified (decides whether to use the cache based on the file's modification time)

    • When the browser accesses the resource for the first time, the server returns the resource with a Last-Modified field in the response header. Its value is the time the resource was last modified on the server. The browser caches the file and the header after receiving the resource.

    • On the next request, because of the no-cache directive, the browser asks the server whether the cache is out of date. It adds an If-Modified-Since field to the request, whose value is the Last-Modified value stored with the cached copy.

    • When the server receives the request, it compares the If-Modified-Since value with the time the resource was last modified on the server.

    • If nothing has changed, an empty response with status code 304 is returned, and the client reads the data directly from the cache.

    • If the If-Modified-Since time is earlier than the resource's last modification time on the server, the file has been updated, and the new resource is returned with status code 200.

    • Finally, the browser caches the new response and the corresponding cache identifier.

  2. ETag (decides whether to use the cache based on whether the file's content has changed)

    • The ETag field is a unique identifier for the current resource file, generated by the server and returned when it responds to a request. The ETag is regenerated whenever the resource changes.

    • The process is similar to Last-Modified, except that the time comparison is replaced by a comparison of the resource's unique identifier.

    • The server adds an ETag field to the first response, which stores the unique identifier of the current resource file.

    • The next time the browser sends a request, it adds an If-None-Match field to the request header, whose value is the previous ETag value.

    • On receiving the request, the server compares this value with the ETag of the current resource file.

    • If they are the same, the resource has not been updated and an empty response with status code 304 is returned, telling the browser to read the data from the local cache.

    • If they differ, the new resource is returned with status code 200.

  3. Last-Modified compared with ETag

  • ETag is more accurate than Last-Modified. Last-Modified has a resolution of one second, so if the server modifies the file again within the same second, the Last-Modified value does not reflect the change and the client still gets the old data.

  • In terms of performance, ETag is inferior to Last-Modified: Last-Modified only records a time, whereas ETag requires the server to compute a hash value.

  • When both are present, the server gives ETag precedence during validation.
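
The server side of the negotiation cache might look roughly like the sketch below. The StoredResource shape and the validate function are hypothetical, entity tags are compared as exact strings (weak "W/" tags are not handled), and modification times are compared in whole seconds because HTTP dates carry no finer resolution.

```typescript
// The server's current view of a resource.
interface StoredResource {
  body: string;
  etag: string;         // quoted as it appears in the ETag header, e.g. "\"v1\""
  lastModified: number; // milliseconds since the epoch
}

// Validators sent by the client with a conditional request.
interface ConditionalHeaders {
  ifNoneMatch?: string;
  ifModifiedSince?: string;
}

interface ValidationResult {
  status: 200 | 304;
  body: string; // empty for 304
}

function validate(resource: StoredResource, headers: ConditionalHeaders): ValidationResult {
  const notModified: ValidationResult = { status: 304, body: "" };
  const fresh: ValidationResult = { status: 200, body: resource.body };

  // ETag is checked first; when If-None-Match is present, the date is ignored.
  if (headers.ifNoneMatch !== undefined) {
    const tags = headers.ifNoneMatch.split(",").map((tag) => tag.trim());
    return tags.includes("*") || tags.includes(resource.etag) ? notModified : fresh;
  }

  if (headers.ifModifiedSince !== undefined) {
    const since = Date.parse(headers.ifModifiedSince);
    const modifiedSeconds = Math.floor(resource.lastModified / 1000);
    if (!Number.isNaN(since) && modifiedSeconds <= Math.floor(since / 1000)) {
      return notModified;
    }
  }
  return fresh;
}
```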

Caching mechanisms

  • The strong cache takes precedence over the negotiation cache: the strong cache is checked first, and only then the negotiation cache.

  • If the strong cache (Expires or Cache-Control) takes effect, the cached copy is used directly and the negotiation cache is skipped.

  • If the Cache-Control value is not no-store and the strong cache is not hit, the negotiation cache is used (Last-Modified / If-Modified-Since and ETag / If-None-Match).

    • With the negotiation cache, the server decides whether the cached copy can be used. If the cached copy is no longer valid, the server returns 200 with the new resource and cache identifier, which the browser stores in its cache. If it is still valid, the server returns 304 and the browser keeps using the cached copy.

TCP three-way handshake to establish a connection

  • Purpose: to prevent an error when a stale connection request segment suddenly reaches the server.

  • The process is as follows:
    • First handshake: the client sends a segment with SYN=1 and Seq=X to the server port.
    • Second handshake: the server replies with an acknowledgement segment with SYN=1, ACK=1, acknowledgement number X+1, and Seq=Y.
    • Third handshake: the client replies with a segment with ACK=1, acknowledgement number Y+1, and Seq=X+1, which completes the handshake.
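
To make the numbering concrete, here is a small sketch that produces the three segments for given initial sequence numbers. HandshakeSegment and threeWayHandshake are hypothetical names. The sketch follows the standard TCP rules that each acknowledgement number is the peer's sequence number plus one, that the third segment's sequence number is the client's initial number plus one, and that sequence numbers wrap around at 2^32.

```typescript
interface HandshakeSegment {
  from: "client" | "server";
  syn: boolean;
  ack: boolean;
  seq: number;
  ackNumber?: number; // only meaningful when the ACK flag is set
}

// TCP sequence numbers are 32-bit and wrap around to 0.
function nextSeq(seq: number): number {
  return (seq + 1) >>> 0;
}

// Build the three handshake segments from the two initial sequence numbers (X and Y).
function threeWayHandshake(clientIsn: number, serverIsn: number): HandshakeSegment[] {
  return [
    // 1. SYN: the client proposes its initial sequence number X.
    { from: "client", syn: true, ack: false, seq: clientIsn },
    // 2. SYN+ACK: the server acknowledges X+1 and proposes Y.
    { from: "server", syn: true, ack: true, seq: serverIsn, ackNumber: nextSeq(clientIsn) },
    // 3. ACK: the client acknowledges Y+1; its own sequence number is now X+1.
    { from: "client", syn: false, ack: true, seq: nextSeq(clientIsn), ackNumber: nextSeq(serverIsn) },
  ];
}
```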

Send the HTTP request

  • An HTTP request message is made up of 4 parts: the request line, the request headers, a blank line, and the request data (request body).
  1. The request line contains the request method, the URL, and the protocol version
    • There are 8 commonly listed request methods: GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, and TRACE.

    • The URL is the requested address, in the form <protocol>://<host>:<port>/<path>?<parameters>.
    • The protocol version is the HTTP version number, written as major.minor. Common versions are HTTP/1.0, HTTP/1.1, and HTTP/2.
    POST /detanx HTTP/1.1 // Request line
  2. Request headers
    • The request headers carry additional information about the request as key/value pairs, one per line, with the key and value separated by a colon and a space.
  3. The request data
    • The request data can carry multiple request parameters and may itself contain carriage returns and newlines. Not all requests have request data, for example some GET requests (such as fetching a list of all countries).
    POST /detanx HTTP/1.1 // request line
    // request headers
    Host: localhost
    User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: zh-cn,zh;q=0.5
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    Referer: http://localhost/...
    // ---- blank line ----
    username=detanx&password=Aa123456 // request data
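
The four-part structure can be checked with a small parser. The sketch below is only illustrative: ParsedRequest and parseHttpRequest are hypothetical names, the input is assumed to use the CRLF line endings of the HTTP/1.1 wire format, and header names are lower-cased for lookup.

```typescript
interface ParsedRequest {
  method: string;
  target: string;
  version: string;
  headers: Map<string, string>; // keyed by lower-case header name
  body: string;
}

function parseHttpRequest(raw: string): ParsedRequest {
  // The first blank line separates the head (request line + headers) from the body.
  const separator = raw.indexOf("\r\n\r\n");
  const head = separator === -1 ? raw : raw.slice(0, separator);
  const body = separator === -1 ? "" : raw.slice(separator + 4);

  const [requestLine, ...headerLines] = head.split("\r\n");
  const parts = requestLine.split(" ");
  if (parts.length !== 3) {
    throw new Error(`Malformed request line: ${requestLine}`);
  }
  const [method, target, version] = parts;

  const headers = new Map<string, string>();
  for (const line of headerLines) {
    // Split at the first colon only, since values such as URLs contain colons too.
    const colon = line.indexOf(":");
    if (colon <= 0) throw new Error(`Malformed header line: ${line}`);
    headers.set(line.slice(0, colon).toLowerCase(), line.slice(colon + 1).trim());
  }
  return { method, target, version, headers, body };
}
```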

HTTP response

  • An HTTP response message is made up of 4 parts: the status line, the response headers, a blank line, and the response data (response body).
  1. The status line
    • The status line has 3 parts: the protocol version, the status code, and the status description. The protocol version matches the one in the request message, and the status description is a short explanation of the status code.

  2. Response headers
    • The response headers have the same format as the request headers; only the additional information they carry differs.
  3. The response data
    • Depending on the type of request, the response data may be a binary file stream, a JSON object, a string, an HTML file, and so on.
    HTTP/1.1 200 OK // status line
    // response headers
    Date: Sun, 17 Mar 2013 08:12:54 GMT
    Set-Cookie: PHPSESSID=c0huq7pdkmm5gg6osoe3mgjmm3; path=/
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    Content-Length: 4393
    Keep-Alive: timeout=5, max=100
    Connection: Keep-Alive
    Content-Type: application/json
    ...
    // ---- blank line ----
    {name: 'detanx'} // response data

The browser parses and renders the page

  • The browser parses and renders a page in five steps:
    • Parse the HTML to build the DOM tree
    • Parse the CSS to generate the CSS rule tree
    • Attach the CSS rule tree to the DOM tree to construct the render tree
    • Calculate the information for each node according to the render tree
    • Draw the page based on the calculated information

Parse the HTML to build the DOM tree

  • The browser reads the HTML document and builds the DOM tree. If, along the way, it encounters an externally linked style or script declaration without the defer or async attribute (see the supplement below), it pauses document parsing, creates a new network connection, and starts downloading the style and script files.
  • The HTML tags are parsed into the DOM tree structure. DOM tree construction is a depth-first traversal: all children of the current node are built before its next sibling node.
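
The depth-first order can be illustrated on a tiny hand-built tree. TreeNode and depthFirstOrder are hypothetical names for this sketch, which only records the order in which nodes are visited and does not parse any HTML.

```typescript
// A minimal stand-in for a DOM element: a tag name and its child nodes.
interface TreeNode {
  tag: string;
  children: TreeNode[];
}

// Visit a node, then all of its descendants, before moving on to its next sibling.
function depthFirstOrder(root: TreeNode): string[] {
  const order: string[] = [];
  const visit = (node: TreeNode): void => {
    order.push(node.tag);
    for (const child of node.children) visit(child);
  };
  visit(root);
  return order;
}

const sampleTree: TreeNode = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", children: [] }] },
    {
      tag: "body",
      children: [
        { tag: "div", children: [{ tag: "p", children: [] }] },
        { tag: "span", children: [] },
      ],
    },
  ],
};

console.log(depthFirstOrder(sampleTree).join(" > "));
// prints "html > head > title > body > div > p > span"
```

Here title is reached before body, and p before span, because each subtree is finished before the next sibling starts.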

Parse the CSS to generate the CSS rule tree

  • When CSS is read, parsing of the CSS rule tree is triggered.
  • The browser does not render until both the CSS rule tree and the DOM tree have been parsed.

Attach the CSS rule tree to the DOM tree to construct the render tree

  • Once the CSS rule tree and the DOM tree have both been parsed, the CSS rule tree is attached to the DOM tree to construct the render tree that the browser needs for rendering.
  • If the DOM tree is finished before the CSS rule tree, then once the CSS rule tree is built the page is redrawn so that the newly built CSS rules are applied to the render tree.

Calculate the information for each node according to the render tree

  • Layout: using the render objects in the render tree, the browser resolves properties such as position, overflow, z-index, and display, and calculates the position and size of each render tree node.
  • Reflow: after the layout is complete, if some part of the layout is found to have changed, it has to be laid out and rendered again. Finally, the operating system's native GUI API is called to finish drawing (repaint).
  • Render tree nodes: Gecko calls them frames, and WebKit calls them renderers.

Draw the page based on the calculated information

  • In the painting stage, the system traverses the render tree and calls each renderer's "paint" method to display its contents on the screen.

Supplement: the defer and async attributes of the script tag

  • defer: the script file is downloaded in parallel with parsing and executed after the document has been parsed.
    • defer applies only to external scripts. If a script tag has no src attribute and is just an inline script, do not use defer.
    • Multiple scripts declared with defer are downloaded and executed in document order.
    • defer scripts execute before the DOMContentLoaded and load events.
  • async (new in HTML5): the script file is downloaded asynchronously and executed as soon as the download finishes.
    • It applies only to external scripts, the same as defer.
    • Multiple scripts declared with async have no guaranteed order; each is downloaded and executed asynchronously.
    • async scripts execute before the load event, but their order relative to DOMContentLoaded is not guaranteed.
  • Note: the DOMContentLoaded event fires when the initial HTML document has been fully loaded and parsed (that is, the whole DOM has been built), without waiting for stylesheets, images, and subframes to finish loading. The load event does not fire until all resources of the page, including images and scripts, have loaded, so it fires later than DOMContentLoaded.
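
The ordering rules can be tried out with a toy timing model. This is a deliberately simplified sketch and not how a browser schedules scripts: ScriptTag, executionOrder, and the millisecond timestamps are hypothetical, script execution is treated as instantaneous, and an async script is assumed to run at the moment its download completes.

```typescript
type LoadMode = "defer" | "async";

interface ScriptTag {
  name: string;
  mode: LoadMode;
  downloadedAt: number; // time at which the download completes, in ms
}

// Return script names in the order they would run under this simplified model.
// `scripts` must be listed in document order.
function executionOrder(scripts: ScriptTag[], parsingDoneAt: number): string[] {
  const runs: { name: string; at: number }[] = [];
  let previousDeferAt = parsingDoneAt;
  for (const script of scripts) {
    if (script.mode === "async") {
      // async: runs as soon as its own download finishes.
      runs.push({ name: script.name, at: script.downloadedAt });
    } else {
      // defer: waits for parsing, its own download, and every earlier defer script.
      previousDeferAt = Math.max(previousDeferAt, script.downloadedAt);
      runs.push({ name: script.name, at: previousDeferAt });
    }
  }
  // Array.prototype.sort is stable, so defer scripts that tie keep document order.
  return runs.sort((a, b) => a.at - b.at).map((run) => run.name);
}
```

With parsing finishing at 200 ms, a defer script that downloads at 300 ms still runs before a later defer script that downloaded at 50 ms, while async scripts run purely in download order.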

Reflow and repaint

  • Reflow
    • When some (or all) elements of the render tree need to be rebuilt because their geometry, size, layout, or visibility has changed, this is a reflow. Every page reflows at least once, when the page first loads, because the render tree has to be built.
  • Repaint
    • When some elements of the render tree need to update properties that affect only their appearance and style, not the layout, such as background-color and color, this is a repaint.
  • A reflow always causes a repaint, but a repaint does not necessarily cause a reflow.
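
The distinction can be written down as a small classifier. This is a sketch under a strong assumption: only the two appearance-only properties named above are treated as paint-only, and every other property is conservatively assumed to affect layout. RenderWork and renderWorkFor are hypothetical names, and real browsers use far more detailed rules.

```typescript
type RenderWork = "none" | "repaint" | "reflow+repaint";

// Appearance-only properties named in the text; changing them does not alter layout.
const PAINT_ONLY_PROPERTIES = new Set<string>(["background-color", "color"]);

// Classify the work triggered by changing the given CSS properties on an element.
function renderWorkFor(changedProperties: string[]): RenderWork {
  if (changedProperties.length === 0) return "none";
  const paintOnly = changedProperties.every((property) => PAINT_ONLY_PROPERTIES.has(property));
  // A reflow is always followed by a repaint, so there is no "reflow only" result.
  return paintOnly ? "repaint" : "reflow+repaint";
}
```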

TCP four-way wave to close the connection

  • The process is as follows (active party: the browser; passive party: the server):
    • First wave: the active party sends a segment carrying FIN, ACK, and Seq to the passive party, indicating that it has no more data to send, and enters the FIN_WAIT_1 state.
    • Second wave: the passive party replies with a segment carrying ACK and Seq, acknowledging the close request. On receiving it, the active party enters the FIN_WAIT_2 state.
    • Third wave: the passive party sends a segment carrying FIN, ACK, and Seq to the active party to request closing the connection, and enters the LAST_ACK state.
    • Fourth wave: the active party sends a segment carrying ACK and Seq to the passive party and enters the TIME_WAIT state. The passive party closes the connection once it receives this segment. If the active party receives nothing further after waiting for a set period, it closes as well.
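
The state changes during the close can be sketched as a small transition table. TcpState, TeardownEvent, and runTeardown are hypothetical names. The CLOSE_WAIT state does not appear in the list above and is taken from the standard TCP state machine, where the passive party sits in it between the second and third wave. Only the transitions of an orderly close are modelled.

```typescript
type TcpState =
  | "ESTABLISHED"
  | "FIN_WAIT_1"
  | "FIN_WAIT_2"
  | "TIME_WAIT"
  | "CLOSE_WAIT"
  | "LAST_ACK"
  | "CLOSED";

type TeardownEvent = "sendFin" | "recvAck" | "recvFin" | "timeout";

// Allowed transitions, keyed by "state:event".
const TRANSITIONS: Record<string, TcpState> = {
  // Active party (the browser in the text)
  "ESTABLISHED:sendFin": "FIN_WAIT_1", // first wave sent
  "FIN_WAIT_1:recvAck": "FIN_WAIT_2",  // second wave received
  "FIN_WAIT_2:recvFin": "TIME_WAIT",   // third wave received, fourth wave sent
  "TIME_WAIT:timeout": "CLOSED",       // waiting period elapsed
  // Passive party (the server in the text)
  "ESTABLISHED:recvFin": "CLOSE_WAIT", // first wave received, second wave sent
  "CLOSE_WAIT:sendFin": "LAST_ACK",    // third wave sent
  "LAST_ACK:recvAck": "CLOSED",        // fourth wave received
};

// Apply a sequence of events starting from an established connection.
function runTeardown(events: TeardownEvent[]): TcpState {
  let state: TcpState = "ESTABLISHED";
  for (const event of events) {
    const next: TcpState | undefined = TRANSITIONS[`${state}:${event}`];
    if (next === undefined) {
      throw new Error(`Event ${event} is not valid in state ${state}`);
    }
    state = next;
  }
  return state;
}
```

Running the active party's events (sendFin, recvAck, recvFin, timeout) and the passive party's events (recvFin, sendFin, recvAck) both end in CLOSED, which matches the four waves described above.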
