1. What happens from URL input to page presentation
- DNS resolution: Resolves a domain name into an IP address
- TCP connection: TCP three-way handshake
- Sending an HTTP request
- The server processes the request and returns an HTTP response
- The browser parses and renders the page: it builds the DOM tree, style tree, and render tree, then performs layout and painting
- Disconnect: TCP four-way wave (connection teardown)
What is a URL
A Uniform Resource Locator (URL) is used to locate resources on the Internet. It provides an abstract way of identifying where a resource is located, so that operations such as creating, deleting, updating, and querying can be performed on that resource.
A URL such as http://www.w3school.com.cn/html/index.asp follows this syntax:
scheme://host.domain:port/path/filename
Each part is explained as follows:
- scheme: defines the type of protocol. Common protocols are http, https, ftp (File Transfer Protocol), and file (the protocol for local files). HTTP is the most common; HTTPS is the encrypted variant.
- host: defines the host name (the default host for HTTP is www). Other host names are possible, such as zhidao in https://zhidao.baidu.com or map in https://map.baidu.com.
- domain: defines the Internet domain name, such as w3school.com.cn or baidu.com.
- port: defines the port number on the host (the default is 80 for HTTP and 443 for HTTPS).
- path: defines the path on the server (if omitted, the document must be at the root of the web site).
- filename: defines the name of the document or resource.
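The parts listed above can be inspected with the standard URL class, which is available as a global in browsers and in Node.js. This is a minimal sketch; the port 8080 and the query string are assumptions added only to make every field visible.

```javascript
// Parse a URL into its parts with the WHATWG URL class.
// The port and query string below are illustrative additions.
const url = new URL("http://www.w3school.com.cn:8080/html/index.asp?lang=en");

const parts = {
  scheme: url.protocol.replace(":", ""), // protocol without the trailing colon
  hostname: url.hostname,                // host and domain together
  port: url.port,                        // empty string when the default port is used
  pathname: url.pathname,                // path plus filename
  query: url.search,
};

// A default port (80 for http) is normalized away by the parser.
const defaultPort = new URL("http://www.w3school.com.cn:80/").port;

console.log(parts, JSON.stringify(defaultPort));
```

Note that the URL class reports host and domain as a single hostname; splitting them apart is left to the caller.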
2. Domain Name Resolution (DNS)
After a web address is entered in the browser, the domain name must be resolved first, because the browser does not find the server by its domain name directly but by its IP address. A computer can be assigned an IP address as well as a host name and domain name, such as www.hackr.jp. Why not just use the IP address in the first place and skip resolution? To answer that, first consider what an IP address is.
2-1. The IP address
An IP address is an Internet Protocol address. It hides differences between physical addresses by assigning a logical address to every network and every host on the Internet. An IPv4 address is a 32-bit binary number; for example, 127.0.0.1 is the loopback address of the local machine. A domain name is a meaningful alias for an IP address that is easier to remember and communicate: a name made of letters and digits is easier for people to recall than a string of pure numbers. Computers, however, handle numbers far more easily than names. DNS was created to bridge this gap.
2-2. What is domain name resolution
DNS provides the service of looking up IP addresses by domain name, or reverse-looking up domain names from IP addresses. DNS is a network service backed by name servers, and domain name resolution is simply a lookup of the records stored on them. Example: baidu.com 220.114.23.56 (the server's external IP address). The server port, such as 80, is not stored in this address record; it comes from the URL or the protocol default.
2-3. How does the browser query the IP address of the URL by domain name
- The browser checks whether the hosts file contains an IP address for the domain name. If it does, the browser sends the HTTP request straight to that address. If not, it continues with the next step. (hosts file location on Windows: C:\Windows\System32\drivers\etc\hosts)
- The browser sends a DNS query for the domain name to the local DNS server. The local DNS server checks its cache and, if it finds a matching record, returns it. If not, it continues with the next step.
- Because the cache holds no matching record, the local DNS server sends a query to a root DNS server. The root server looks up the IP address of the top-level domain (TLD) server responsible for the name's top-level domain and returns it to the local DNS server.
- The local DNS server takes the TLD server's IP address from the reply and sends the domain name query to that address.
- The TLD server checks whether its cache has a matching record. If not, it looks up the address of the second-level domain server for the name and returns it to the local DNS server.
- The local DNS server takes the second-level domain server's IP address from the reply and sends the domain name query to that address.
- The second-level domain server checks whether its cache has a matching record. If not, it looks up the address of the third-level domain server for the name and returns it to the local DNS server.
- The local DNS server takes the third-level domain server's IP address from the reply and sends the domain name query to that address.
- The third-level domain server looks up the record in its DNS zone database and returns it.
- The local DNS server receives the reply from the third-level domain server, returns a DNS reply to the user, and saves the record in its cache.
- The browser now has the IP address for the domain name and can make the HTTP request.
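The referral chain described above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the server names (root, tld-com, ns-baidu) are hypothetical, no real DNS traffic is sent, and the address is the example value used earlier in this section.

```javascript
// Hypothetical name servers, modelled as plain objects (no real DNS traffic).
const servers = {
  root: { referrals: { com: "tld-com" } },
  "tld-com": { referrals: { "baidu.com": "ns-baidu" } },
  "ns-baidu": { records: { "www.baidu.com": "220.114.23.56" } },
};

// Cache kept by the local DNS server.
const localCache = new Map();

function resolve(name) {
  if (localCache.has(name)) {
    return { ip: localCache.get(name), asked: ["local cache"] };
  }
  const asked = [];
  let server = "root";
  while (server !== undefined) {
    asked.push(server);
    const { records = {}, referrals = {} } = servers[server];
    if (records[name] !== undefined) {
      localCache.set(name, records[name]); // save the answer for next time
      return { ip: records[name], asked };
    }
    // Follow the referral whose zone is a suffix of the queried name.
    const zone = Object.keys(referrals).find(
      (z) => name === z || name.endsWith("." + z)
    );
    server = zone === undefined ? undefined : referrals[zone];
  }
  return { ip: null, asked };
}

const first = resolve("www.baidu.com");
const second = resolve("www.baidu.com");
console.log(first.asked.join(" -> "), first.ip);
console.log(second.asked.join(" -> "), second.ip);
```

The first lookup walks root, TLD, and authoritative server in turn; the second is answered from the local cache without contacting any of them.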
Explanation of each domain name server
- The root server manages the root of the Internet's domain name hierarchy.
- The top-level domain, such as .com in https://www.baidu.com
- The second-level domain, such as baidu.com in https://www.baidu.com
- The third-level domain, such as www.baidu.com in https://a.www.baidu.com
2-4. Summary
The browser sends the domain name to the DNS server. The DNS server looks up the IP address corresponding to the domain name and returns it to the browser. The browser then uses that IP address to connect to the corresponding server and send the request.
3. TCP three-way handshake
The next step is to send an HTTP request to the server. This exchange has three phases: the TCP three-way handshake, the HTTP request and response, and closing the TCP connection.
Before the client sends data, it initiates a TCP three-way handshake to synchronize the sequence numbers and acknowledgment numbers of the client and server and to exchange TCP window size information.
3-1. The TCP three-way handshake process is as follows:
- The client sends a segment carrying SYN=1, Seq=X. (First handshake: initiated by the browser, telling the server "I am about to send a request.")
- The server replies with a segment carrying SYN=1, ACK=X+1, Seq=Y. (Second handshake: sent by the server, telling the browser "I am ready to receive; go ahead and send.")
- The client replies with a segment carrying ACK=Y+1, Seq=Z. (Third handshake: sent by the browser, telling the server "I am about to send; get ready to receive.")
Description of each field:
- SYN: the synchronize flag, used to establish a connection.
- Seq: the sequence number, used to order the data transmitted later.
- ACK: the acknowledgment of a received segment; its value is the sequence number the receiver expects next. Since X has been received, X+1 is expected next.
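The arithmetic of the three segments can be sketched as follows. This is a simplified model rather than a TCP implementation: the function name and object shape are assumptions, and real sequence numbers are 32-bit values that wrap around, which is ignored here.

```javascript
// Build the three handshake segments from the two initial sequence numbers.
// x is the client's initial sequence number, y is the server's.
function threeWayHandshake(x, y) {
  const syn = { from: "client", flags: "SYN", seq: x };
  const synAck = { from: "server", flags: "SYN,ACK", seq: y, ack: syn.seq + 1 };
  // The client's next sequence number (Z) equals what the server acknowledged.
  const ack = { from: "client", flags: "ACK", seq: synAck.ack, ack: synAck.seq + 1 };
  return [syn, synAck, ack];
}

const segments = threeWayHandshake(100, 300);
for (const segment of segments) console.log(segment);
```

With X=100 and Y=300, the server acknowledges 101 and the client acknowledges 301, matching the X+1 and Y+1 rule above.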
3-2. Why three handshakes
The third handshake prevents stale connection requests that reach the server late from causing errors.
Suppose host A sends a connection request that is delayed in the network and does not reach host B in time. A times out and retransmits the request, and B responds to the retransmission. Later, the delayed first request finally arrives at B, and B treats it as a new connection request. With only two handshakes, B would establish a connection for this stale request. With a three-way handshake, host A ignores the confirmation sent by B because it has no pending request, so B never receives the third segment and no TCP connection is established.
4. Send the HTTP request
After the TCP three-way handshake is complete, the HTTP request packet is sent. A request packet consists of a request line, request headers, and a request body.
4-1. Request line: contains the request method, URL, and protocol version
Common request methods include GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, and TRACE. The URL is the requested address, in the form <protocol>://<host>:<port>/<path>?<parameters>. The protocol version indicates the HTTP version number.
POST /chapter17/user.html HTTP/1.1
In the preceding line, POST is the request method, /chapter17/user.html is the URL, and HTTP/1.1 is the protocol and its version. HTTP/1.1 remains a widely used version.
4-2. Request headers: contain additional information about the request. They consist of name/value pairs, one per line, with the name and value separated by a colon (:).
The request headers give the server information about the client's request. They carry a lot of useful information about the client environment and the request body. Some of them are:
- Accept
  - Function: indicates the media types the browser can accept
  - Example: Accept: text/html means the browser accepts text/html content from the server. If the server cannot return text/html data, it should return a 406 (Not Acceptable) error.
  - Wildcard: * stands for any type. For example, Accept: */* indicates that the browser can handle all types.
- Accept-Encoding
  - Function: the browser declares the content encodings it accepts, usually compression methods: whether it supports compression and which methods (gzip, deflate) it supports. (Note: this is not character encoding.)
  - Example: Accept-Encoding: gzip, deflate, br
- Accept-Language
  - Function: the browser declares the languages it accepts.
  - Language is distinct from character set: Chinese is a language, and it has many character sets, such as BIG5, GB2312, and GBK.
  - Example: Accept-Language: zh-CN,zh;q=0.9
- Connection
  - Connection: keep-alive means that after a web page is opened, the TCP connection used to carry HTTP data between the client and the server is not closed. If the client requests another page from the server, the already established connection is reused.
  - Connection: close means that after a request completes, the TCP connection used to carry HTTP data between the client and the server is closed. When the client sends another request, a new TCP connection must be established.
- Host (this header field is required when sending a request)
  - Function: indicates the host name of the server to be accessed, for example www.baidu.com for Baidu. The value is taken from the URI being requested. For example, entering www.baidu.com in the browser produces Host: www.baidu.com
- Referer
  - Function: the URL of the page the browser visited last, that is, the page from which the current request was linked. When the browser sends a request to the web server, it usually includes the Referer, which tells the server which page the link came from, and the server can use that information. For example, if my home page links to a friend's site, his server can use the Referer to count how many users reach his site from my home page each day.
- User-Agent
  - Function: tells the HTTP server the name and version of the operating system and browser used by the client.
  - Example: User-Agent: Mozilla/5.0
- Cookie
  - Function: cookies store user information so that the server can identify the user (common on websites that require login). When the user logs in, a cookie is generated on the client to store relevant information, such as a user name or session ID. The browser then sends the cookie with later requests, and the server reads it to verify that the user is legitimate before allowing access to the corresponding pages. Cookies are not limited to this; many other kinds of information can be stored in them.
- If-Modified-Since
  - Function: the browser sends the last modified time of its cached page to the server, and the server compares it with the last modification time of the actual file on the server. If the times match, the server returns 304 and the client uses its local cache directly. If they differ, the server returns 200 with the new file content; the client discards the old file, caches the new one, and displays it in the browser.
  - Example: If-Modified-Since: Mon, 17 Aug 2015 12:03:33 GMT
- If-None-Match
  - Function: If-None-Match works together with ETag. The server adds ETag information to the HTTP response. When the user requests the resource again, the browser adds If-None-Match (carrying the ETag value) to the HTTP request. If the server verifies that the resource's ETag has not changed (the resource has not been updated), it returns a 304 status, telling the client to use its local cache. Otherwise it returns a 200 status with the new resource and its ETag. This mechanism improves the performance of the site.
  - Example: If-None-Match: W/"3119-1437038474000"
- Authorization
  - Function: when the client receives a WWW-Authenticate response from the web server, it uses this header to send its authentication information back to the server. Its main purpose is to verify authorization and meet the server's requirements.
- Proxy-Connection
  - Function: when a proxy server is used, this specifies whether the connection to the proxy is persistent. If the client-to-proxy connection and the proxy-to-server connection are handled differently, the setting may not take effect, but in most cases it still works.
4-3. Request body: can carry multiple request parameters. It follows the blank line (carriage return and line feed) that ends the headers. Not all requests have a request body.
name=tom&password=1234&realName=tomson
The body above carries three request parameters: name, password, and realName.
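To see how the request line, headers, and body fit together on the wire, here is a minimal sketch that assembles the raw text of the POST request used in this section. The helper name buildRequest and the host www.example.com are assumptions for illustration; a real client would use fetch or an HTTP library rather than build the text by hand.

```javascript
// Assemble the raw text of an HTTP/1.1 request (illustrative helper).
function buildRequest(method, path, headers, body = "") {
  const lines = [`${method} ${path} HTTP/1.1`]; // request line
  for (const [name, value] of Object.entries(headers)) {
    lines.push(`${name}: ${value}`); // one "name: value" header per line
  }
  // A blank line (CRLF CRLF) separates the headers from the body.
  return lines.join("\r\n") + "\r\n\r\n" + body;
}

// Form-encode the three request parameters.
const body = new URLSearchParams({
  name: "tom",
  password: "1234",
  realName: "tomson",
}).toString();

const request = buildRequest(
  "POST",
  "/chapter17/user.html",
  {
    Host: "www.example.com", // hypothetical host
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": new TextEncoder().encode(body).length,
  },
  body
);

console.log(request);
```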
5. The server processes the request and returns the HTTP packet
5-1. The server
The server is a high-performance computer in a network environment. It listens for service requests submitted by other computers (clients) on the network and provides the corresponding services, such as web, file download, mail, and video services. The client, by contrast, is mainly used to browse the web, watch videos, listen to music, and so on. The application that handles requests, the web server, is installed on the server. Common web server products include Apache, Nginx, IIS, and Lighttpd.
The web server acts as a dispatcher. Using its configuration files, it delegates each incoming request to the program on the server that handles that kind of request (such as CGI scripts, JSP scripts, Servlets, ASP scripts, server-side JavaScript, or another server-side technology) and returns the result of that back-end processing as the response.
5-2. MVC back-end processing stage
There are many frameworks for backend development, but most of them are built according to the MVC design pattern. MVC is a design pattern that divides an application into three core parts: model, view, and controller. Each of them handles its own tasks, separating input, processing, and output.
5-2-1. View
The view is the interface presented to the user; it is the shell of the program.
5-2-2. Model
The model is mainly responsible for data interaction. Of the three parts of MVC, the model has the most processing tasks. A model can provide data for multiple views.
5-2-3. Controller
The controller takes the user's instructions from the view layer, selects data from the model layer accordingly, and performs the corresponding operations on it to produce the final result. It acts as a manager: it receives requests from the view, decides which model component to call to process each request, and then determines which view displays the data returned by the model.
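The division of labor between the three parts can be sketched in a few lines. All names here (userModel, views, userController) and the in-memory user list are hypothetical; real frameworks add routing, templates, and a database behind the model.

```javascript
// Model: owns the data (a hypothetical in-memory user list).
const userModel = {
  users: [
    { id: 1, name: "tom" },
    { id: 2, name: "tomson" },
  ],
  findById(id) {
    return this.users.find((user) => user.id === id) ?? null;
  },
};

// Views: turn data into output for the user.
const views = {
  profile: (user) => `<h1>${user.name}</h1>`,
  notFound: () => "<h1>User not found</h1>",
};

// Controller: receives the request, asks the model, then picks a view.
function userController(request) {
  const user = userModel.findById(request.params.id);
  return user ? views.profile(user) : views.notFound();
}

console.log(userController({ params: { id: 1 } }));
console.log(userController({ params: { id: 9 } }));
```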
5-3. HTTP response packets
A response packet consists of a response line, response headers, and a response body.
(1) The response line contains the protocol version, status code, and status code description
The status code rules are as follows:
- 1xx: Informational: the request has been received and processing continues.
- 2xx: Success: the request was successfully received, understood, and accepted.
- 3xx: Redirection: further action must be taken to complete the request.
- 4xx: Client error: the request has a syntax error or cannot be fulfilled.
- 5xx: Server error: the server failed to fulfill a valid request.
(2) The response headers contain additional information about the response, as name/value pairs
(3) The response body contains the returned data, following the blank line (carriage return and line feed) that ends the headers. Not all responses contain a body.
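As a small illustration of the response line and the status code classes, the sketch below splits a response line into its three fields and maps the code to its class. The function name and the returned object shape are assumptions made for this example.

```javascript
// Split a response line into version, status code, and description,
// and classify the code by its first digit.
function parseStatusLine(line) {
  const match = /^(HTTP\/\d(?:\.\d)?) (\d{3})(?: (.*))?$/.exec(line);
  if (!match) throw new Error("Malformed response line: " + line);
  const code = Number(match[2]);
  const categories = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
  };
  return {
    version: match[1],
    code,
    reason: match[3] ?? "",
    category: categories[Math.floor(code / 100)] ?? "unknown",
  };
}

console.log(parseStatusLine("HTTP/1.1 200 OK"));
console.log(parseStatusLine("HTTP/1.1 404 Not Found"));
```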
6. The browser parses and renders the page
Now that the browser has the HTML response text, let's look at the browser rendering mechanism.
The browser parses and renders a page in five steps:
- Parse the HTML into a DOM tree
- Parse the CSS into a CSS rule tree
- Combine the DOM tree and CSS rule tree to generate the render tree
- Calculate the layout information for each node from the render tree
- Draw the page based on the calculated information
6-1. Parse the DOM tree according to HTML
The HTML tags are parsed into a DOM tree that mirrors the structure of the document. Building the DOM tree is a depth-first traversal: all children of the current node are built before its next sibling. If a script tag is encountered while reading the HTML document and building the DOM tree, construction is suspended until the script has finished executing (JavaScript runs on a single thread).
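The depth-first order can be made concrete with a tiny tree walk. The documentTree object below is a hypothetical, already-tokenized stand-in for a parsed page; a real parser builds nodes incrementally from the token stream.

```javascript
// A hypothetical document: each node lists its children in source order.
const documentTree = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", children: [] }] },
    {
      tag: "body",
      children: [
        { tag: "h1", children: [] },
        { tag: "p", children: [] },
      ],
    },
  ],
};

// Visit a node, then all of its children, before moving to its next sibling.
function buildOrder(node, order = []) {
  order.push(node.tag);
  for (const child of node.children) buildOrder(child, order);
  return order;
}

const order = buildOrder(documentTree);
console.log(order.join(" > "));
```

Here title is built before body, because head's subtree is completed before its sibling is started.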
6-2. Generate a CSS rule tree based on CSS parsing
JavaScript execution is paused while the CSS is being parsed, until the CSS rule tree is ready. The browser does not render until the CSS rule tree has been generated.
6-3. Generate rendering tree by combining DOM tree and CSS rule tree
After the DOM tree and CSS rule tree are all ready, the browser starts building the render tree. Simplifying CSS can also speed up the building of CSS rule trees, resulting in faster pages.
6-4. Calculate the information of each node according to the render tree (layout)
Redraw:
Changes to attributes such as an element's background color or text color, which do not affect the layout around or inside the element, only cause the browser to redraw.
Reflow:
If the size of an element changes, the render tree must be recalculated and the page rendered again.
When does reflow occur?
- Add or remove visible DOM elements
- Element position changes;
- Element size changes — margins, padding, borders, width, and height
- Content changes – such as changes in text or image size resulting in changes in the width and height of the calculated values;
- Page render initialization;
- The browser window size changes — when the resize event occurs;
6-5. Draw the page according to the calculated information
In the paint phase, the system traverses the render tree and calls the renderer’s “paint” method to display the renderer’s contents on the screen.
7. Disconnect
When data transfer is complete, the TCP connection must be closed, which starts the TCP four-way wave.
- The initiator sends the passive party a segment carrying FIN, ACK, and Seq, indicating that it has no more data to transmit, and enters the FIN_WAIT_1 state. (First wave: initiated by the browser and sent to the server: "I have finished sending the request message; get ready to close.")
- The passive party sends a segment carrying ACK and Seq, indicating that it accepts the close request. The initiator enters the FIN_WAIT_2 state. (Second wave: sent by the server, telling the browser: "I have received your request message and am preparing to close; you should too.")
- The passive party sends the initiator a segment carrying FIN, ACK, and Seq to request closing the connection, and enters the LAST_ACK state. (Third wave: sent by the server, telling the browser: "I have finished sending the response message; get ready to close.")
- The initiator sends the passive party a segment carrying ACK and Seq and enters the TIME_WAIT state. The passive party closes the connection on receiving this segment. If the initiator waits for a certain period without receiving anything further, it closes normally as well. (Fourth wave: sent by the browser, telling the server: "I have received the response message and am preparing to close; you should too.")
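The state changes during the four waves can be traced with a short simulation. This is a sketch of the bookkeeping only, not of real sockets; the function name is an assumption, and CLOSE_WAIT is the standard TCP state name for the passive party after the second wave, which the list above does not name.

```javascript
// Trace the connection states of both sides through the four waves.
function fourWayWave() {
  const state = { initiator: "ESTABLISHED", passive: "ESTABLISHED" };
  const log = [];
  const step = (from, flags, changes) => {
    Object.assign(state, changes);
    log.push({ from, flags, ...state }); // snapshot of both states after this wave
  };

  step("initiator", "FIN,ACK", { initiator: "FIN_WAIT_1" });
  step("passive", "ACK", { passive: "CLOSE_WAIT", initiator: "FIN_WAIT_2" });
  step("passive", "FIN,ACK", { passive: "LAST_ACK" });
  step("initiator", "ACK", { initiator: "TIME_WAIT", passive: "CLOSED" });

  // After waiting in TIME_WAIT without further segments, the initiator closes too.
  state.initiator = "CLOSED";
  return { log, finalState: state };
}

const teardown = fourWayWave();
for (const entry of teardown.log) console.log(entry);
console.log(teardown.finalState);
```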
8. Event Loop in browser
- First, the whole script starts executing as the first macro task. During execution, all code is divided into two parts: synchronous tasks and asynchronous tasks.
- Synchronous tasks go straight to the main thread and are executed in order.
- Asynchronous tasks are subdivided into macro tasks and micro tasks.
- A macro task enters the Event Table and registers its callback function there. When the specified event completes, the Event Table moves the function to the Event Queue.
- A micro task goes to a separate Event Table and registers its callback function there. When the specified event completes, the function is moved to the micro task Event Queue.
- When the tasks in the main thread have finished and the main thread is empty, the micro task Event Queue is checked. If it contains tasks, all of them are executed; if not, the next macro task is executed.
- This process repeats over and over. This is the Event Loop.
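The rule "one macro task, then all micro tasks" can be modelled with two plain arrays. This is a toy model written for illustration, not how a browser is implemented: the queue names and runEventLoop are assumptions, and timers, rendering, and I/O are left out.

```javascript
// A toy model of the loop: two arrays stand in for the two queues.
const macroQueue = [];
const microQueue = [];
const output = [];

function runEventLoop() {
  while (macroQueue.length > 0) {
    const macroTask = macroQueue.shift(); // take ONE macro task
    macroTask();
    while (microQueue.length > 0) {
      microQueue.shift()(); // then drain ALL micro tasks, including new ones
    }
  }
}

// The whole script is the first macro task.
macroQueue.push(() => {
  output.push("script start");
  macroQueue.push(() => output.push("macro task")); // like a timer callback
  microQueue.push(() => {
    output.push("micro task 1");
    microQueue.push(() => output.push("micro task 2")); // queued during the drain
  });
  output.push("script end");
});

runEventLoop();
console.log(output.join(", "));
// prints "script start, script end, micro task 1, micro task 2, macro task"
```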
Case study:
function test() {
  console.log(1)
  setTimeout(function () { // timer1
    console.log(2)
  }, 1000)
}
test();
setTimeout(function () { // timer2
  console.log(3)
})
new Promise(function (resolve) {
  console.log(4)
  setTimeout(function () { // timer3
    console.log(5)
  }, 100)
  resolve()
}).then(function () {
  setTimeout(function () { // timer4
    console.log(6)
  }, 0)
  console.log(7)
})
console.log(8)
Prints: 1,4,8,7,3,6,5,2
Resolution:
- JavaScript is executed sequentially from top to bottom
- Execution reaches test(); the test function is synchronous and runs directly, and console.log(1) prints 1
- Inside test, setTimeout is an asynchronous macro task; its callback, called timer1, goes to the macro task queue
- Next, the setTimeout after test() is also an asynchronous macro task; its callback, called timer2, goes to the macro task queue
- Then the Promise is executed; the new Promise executor is synchronous, runs directly, and prints 4
- The setTimeout inside new Promise is an asynchronous macro task; its callback, called timer3, goes to the macro task queue
- Promise.then is a micro task and goes to the micro task queue
- console.log(8) is synchronous, runs directly, and prints 8
- When the main-thread tasks are complete, the micro task queue is checked and the Promise.then callback is found
- Inside that callback, setTimeout is an asynchronous macro task; its callback, called timer4, goes to the macro task queue
- console.log(7) in the micro task is synchronous, runs directly, and prints 7
- The micro task completes, and the first loop ends
- The macro task queue is checked. It holds four timer macro tasks: timer1, timer2, timer3, and timer4. They become runnable in order of their delay, so the Event Queue order is timer2, timer4, timer3, timer1, and each is taken in turn and placed on the execution stack.
- Run timer2; console.log(3) is synchronous and prints 3
- No micro tasks are found. The second Event Loop ends
- Run timer4; console.log(6) is synchronous and prints 6
- No micro tasks are found. The third Event Loop ends
- Run timer3; console.log(5) is synchronous and prints 5
- No micro tasks are found. The fourth Event Loop ends
- Run timer1; console.log(2) is synchronous and prints 2
- No micro tasks or macro tasks remain. The fifth Event Loop ends
9. Redraw and reflow
When the style of an element changes, the browser must update the display of that element. There are two kinds of update: redraw and reflow.
Redraw (repaint):
When a style change does not affect layout, the browser updates the element with a redraw, which only requires repainting pixels at the UI level and is relatively cheap.
Reflow:
When an element's size or structure changes, or certain properties are triggered, the browser re-renders the page; this is called reflow. The browser has to recalculate the layout and then lay the page out again, so it is a heavy operation.
Actions that trigger reflow:
- Page first render
- The browser window size changes
- Element size, position, and content change
- Element font size changes
- Add or remove visible DOM elements
- Activating CSS pseudo-classes (such as :hover)
- Querying certain properties or calling certain methods
- clientWidth, clientHeight, clientTop, clientLeft
- offsetWidth, offsetHeight, offsetTop, offsetLeft
- scrollWidth, scrollHeight, scrollTop, scrollLeft
- getComputedStyle()
- getBoundingClientRect()
- scrollTo()
Reflow always triggers a redraw, but a redraw does not necessarily trigger a reflow. The cost of a redraw is low and the cost of a reflow is high.
How to reduce reflow:
CSS
- Avoid table layouts
- Apply animations to elements whose position property is absolute or fixed
JavaScript
- Avoid changing styles repeatedly; make all the changes at once
- Use a class for style changes whenever possible
- Reduce the number of DOM insertions and deletions; build the content as a string or a DocumentFragment and insert it into the DOM in one go
- As a last-resort optimization, set display: none on the element, change its styles, and then show it again
- Avoid repeatedly reading the properties or calling the methods listed above that trigger reflow; store the value in a variable where possible
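The last tip, storing a layout value in a variable, can be illustrated without a browser. In this sketch makeFakeElement is a hypothetical stand-in for a DOM element whose offsetWidth getter counts how often layout would be queried; in a real page each of those reads may force a reflow when styles have just been written.

```javascript
// Hypothetical stand-in for a DOM element: counts layout queries.
function makeFakeElement(width) {
  let layoutReads = 0;
  return {
    style: {},
    get offsetWidth() {
      layoutReads += 1; // in a browser, this read may force a reflow
      return width;
    },
    get layoutReads() {
      return layoutReads;
    },
  };
}

// Reads the layout property inside the loop: one read per item.
function resizeNaive(box, items) {
  for (const item of items) {
    item.style.width = box.offsetWidth / 2 + "px";
  }
}

// Reads the layout property once and reuses the cached value.
function resizeCached(box, items) {
  const half = box.offsetWidth / 2 + "px";
  for (const item of items) {
    item.style.width = half;
  }
}

const naiveBox = makeFakeElement(200);
const cachedBox = makeFakeElement(200);
const naiveItems = [{ style: {} }, { style: {} }, { style: {} }];
const cachedItems = [{ style: {} }, { style: {} }, { style: {} }];
resizeNaive(naiveBox, naiveItems);
resizeCached(cachedBox, cachedItems);
console.log(naiveBox.layoutReads, cachedBox.layoutReads);
```

Both versions produce the same widths; the cached version simply queries layout once instead of once per item.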
10. Storage
Business code often needs to store data. Storage can be divided into transient storage and persistent storage.
- Transient storage: the data is kept only in memory and is available only at run time
- Persistent storage: can be divided into browser-side and server-side
Browser:
- Cookie: usually stores user identity, login status, and similar data; it is carried automatically in HTTP requests, is limited to about 4 KB, and can be given an expiration time
- localStorage / sessionStorage: long-term storage / deleted when the window is closed; the size limit is about 4 to 5 MB
- IndexedDB
Server:
- Distributed cache, such as Redis
- Database
11. HTTP/HTTPS protocol
11-1. HTTP/1.0 defects:
- Connections cannot be reused: each one is closed after the request completes, and every new request pays for TCP slow start and the three-way handshake again
- Head-of-line blocking: a blocked request holds up the requests queued behind it
11-2. HTTP/1.1 improvements:
- Persistent connections (keep-alive by default), so a connection can be reused
- The Host header field identifies the virtual site being requested
New features:
- Resumable transfers (breakpoint resume)
- Identity authentication
- State management
- Caching
- Cache-Control
- Expires
- Last-Modified
- ETag
11-3. HTTP/2.0:
- Multiplexing
- Binary framing layer: sits between the application layer and the transport layer
- Header compression
- Server push
11-4. HTTPS: a secure network transmission protocol
- Certificate (public key)
- SSL encryption
- Port 443
11-5. Cache strategy: divided into strong cache and negotiated cache
11-5-1. Strong cache
- Cache-Control: the browser uses it to determine whether the cached copy has expired. If it has not, the strong cache is used directly. The max-age directive of Cache-Control takes priority over Expires.
- Expires: the browser uses it to determine whether the cached copy has expired.
When the cached copy has expired, the negotiated cache is used.
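The strong-cache decision can be sketched as a freshness check. This is a simplified model under stated assumptions: the function name isFresh and the shape of the cached object are hypothetical, header names are assumed to be lower-case, and directives such as no-cache and no-store as well as the Age header are ignored.

```javascript
// Decide whether a cached response can be used without contacting the server.
// cached.storedAt and now are millisecond timestamps.
function isFresh(cached, now) {
  const cacheControl = cached.headers["cache-control"] ?? "";
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (match) {
    // max-age is relative to when the response was stored and wins over Expires.
    return now - cached.storedAt < Number(match[1]) * 1000;
  }
  // Fall back to the absolute Expires time; a missing or invalid date is stale.
  const expires = Date.parse(cached.headers["expires"] ?? "");
  return Number.isNaN(expires) ? false : now < expires;
}

const withMaxAge = {
  storedAt: 0,
  headers: {
    "cache-control": "public, max-age=60",
    expires: "Thu, 01 Jan 1970 00:00:00 GMT", // already past, but max-age wins
  },
};
console.log(isFresh(withMaxAge, 30 * 1000)); // still within 60 seconds
console.log(isFresh(withMaxAge, 61 * 1000)); // older than 60 seconds
```

When the function returns false, the browser would fall through to the negotiated cache described next.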
11-5-2. Negotiated cache
1. Unique identifier scheme:
- ETag (carried in the response)
- If-None-Match (carried in the request; holds the ETag returned last time): the server uses it to determine whether the resource has been modified
ETag / If-None-Match: the ETag identifies each version of a resource uniquely, so a change to the resource changes its ETag. If the ETag value has changed, the resource has been modified. The server decides whether the cache is hit based on the If-None-Match value sent by the browser.
Disadvantage of Expires, which this scheme avoids:
Expires is an absolute time. If the client's local time is changed, the time difference between the server and the client grows, resulting in cache confusion.
2. Last-modified time scheme:
Last-Modified (response) & If-Modified-Since (request; holds the Last-Modified returned last time)
- If the times are consistent, 304 is returned to tell the browser to use the cache
- If they are inconsistent, the server returns the new resource
Last-Modified / If-Modified-Since: when the browser requests a resource for the first time, the server adds Last-Modified to the response headers; it indicates when the resource was last modified. When the browser requests the resource again, the request headers contain If-Modified-Since, which is the Last-Modified value returned earlier and stored with the cache. On receiving If-Modified-Since, the server decides whether the cache is hit based on the resource's last modification time.
Disadvantages of Last-Modified:
- If a file is rewritten periodically but its content does not change, the cache is still invalidated
- The finest granularity is one second; changes within the same second cannot be detected
ETag has a higher priority than Last-Modified
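The server side of the negotiated cache can be sketched as a single decision function. The function name respond and the resource object are assumptions for illustration, request header names are assumed to be lower-case, and ETag comparison is reduced to exact string equality.

```javascript
// Decide between 304 (use the cache) and 200 (send the resource again).
function respond(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  const ifModifiedSince = requestHeaders["if-modified-since"];

  let notModified = false;
  if (ifNoneMatch !== undefined) {
    // ETag has priority: If-Modified-Since is ignored when If-None-Match is sent.
    notModified = ifNoneMatch === resource.etag;
  } else if (ifModifiedSince !== undefined) {
    notModified =
      Date.parse(resource.lastModified) <= Date.parse(ifModifiedSince);
  }

  const headers = {
    ETag: resource.etag,
    "Last-Modified": resource.lastModified,
  };
  return notModified
    ? { status: 304, headers }
    : { status: 200, headers, body: resource.body };
}

// Hypothetical resource, using the example values from earlier sections.
const resource = {
  etag: 'W/"3119-1437038474000"',
  lastModified: "Mon, 17 Aug 2015 12:03:33 GMT",
  body: "<html>...</html>",
};

console.log(respond({}, resource).status);
console.log(respond({ "if-none-match": resource.etag }, resource).status);
```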
12. Cross domain
What is the same-origin policy?
Two URLs have the same origin when their protocol, port, and domain name are all the same. The same-origin policy restricts how pages from different origins can interact.
Restrictions on non-same-origin pages:
Cookies cannot be read, the DOM cannot be accessed, and Ajax requests cannot be sent.
What is cross-domain:
A request made between two origins that differ in protocol, port, or domain name.
Cross-domain solutions:
- JSONP: dynamically create a script tag; the src attribute of a script tag is not subject to cross-domain restrictions.
- CORS: the server adds some headers to the response:
- Access-Control-Allow-Origin: http://ip:port
- Reverse proxy with Nginx
- In development environments, use the webpack-dev-server proxy to work around cross-domain restrictions
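Both the same-origin comparison and the CORS response header can be sketched with the standard URL class. The function names, the example.com URLs, and the allow-list are assumptions for illustration; a real server would also handle preflight requests and credentials.

```javascript
// Two URLs share an origin when protocol, host, and port all match.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Return the CORS header for an allowed origin, or nothing otherwise.
// allowedOrigins is a hypothetical server-side allow-list.
function corsHeaders(requestOrigin, allowedOrigins) {
  return allowedOrigins.includes(requestOrigin)
    ? { "Access-Control-Allow-Origin": requestOrigin }
    : {};
}

console.log(isSameOrigin("http://example.com/a", "http://example.com:80/b"));
console.log(isSameOrigin("http://example.com/a", "https://example.com/a"));
console.log(corsHeaders("http://localhost:8080", ["http://localhost:8080"]));
```

The first comparison is true because 80 is the default port for http; the second is false because the protocols differ.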