What happens from the URL to the page presentation

Let’s start with a classic front-end interview question: what happens between entering a URL and the page being rendered?

  1. URL resolution: the browser parses the URL and uses DNS to look up the IP address;

① Web standards stipulate that URLs may contain only letters, digits, and a small number of special symbols. URLs can carry parameters, and ambiguity can occur if the URL is not escaped. For example, in ?key=value the key or the value may itself contain the = symbol. URL encoding is nominally UTF-8, but not every browser applies it in every case. In JS, encodeURIComponent and encodeURI guarantee UTF-8 encoding. The DNS resolution process is: hosts file (IP mapping) -> local DNS resolver (cache) -> DNS server configured on the computer -> global root DNS servers. On the front end, DNS lookups can be optimized by adding a dns-prefetch hint (a link tag with rel="dns-prefetch") to the head.
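To illustrate the escaping point above, here is a minimal sketch of the difference between the two functions. The parameter name, its value, and the example.com URL are made up for illustration.

```javascript
// Hypothetical parameter whose value itself contains "=" and "&".
const key = "filter";
const value = "a=1&b=2";

// encodeURIComponent escapes reserved characters, so the value stays a single parameter.
const query = `?${encodeURIComponent(key)}=${encodeURIComponent(value)}`;

// encodeURI keeps reserved characters such as ?, = and & intact,
// so it suits whole URLs rather than individual parameter values.
const wholeUrl = encodeURI("http://example.com/搜索?q=a b");

console.log(query);    // ?filter=a%3D1%26b%3D2
console.log(wholeUrl); // non-ASCII characters and the space are percent-encoded as UTF-8
```

As a rule of thumb, encodeURIComponent is applied to each key and value, and encodeURI only to a URL that is already correctly structured.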

  2. After the IP address is found, a TCP connection is established with the three-way handshake, and the HTTP request is sent.
  3. Once the connection is established, the HTML file/resource is requested. If it is in the cache, it is taken from there directly. Otherwise, the request goes to the backend.

When the browser successfully loads a resource for the first time, the server returns 200. The browser then not only downloads the resource but also caches it together with the response headers. The next time the resource is loaded, the strong cache is checked first. Cache-Control has the highest priority. For example, cache-control: no-cache goes straight to the negotiated cache step. With max-age=xxx, the current time is compared with the time of the last 200 response (the Date header). If max-age has not been exceeded, the strong cache is hit and the file is read directly from the local cache. If there is no Cache-Control field, Expires is compared instead.

The negotiated cache phase uses two fields that travel to the server in the request headers: If-None-Match and If-Modified-Since. If-None-Match is compared with the ETag first: if they match, the negotiated cache is hit and 304 is returned; if not, the new resource is returned with 200. Otherwise If-Modified-Since is compared with Last-Modified (the last modification time of the file as seen by the server): if they match, 304 is returned; if not, the file is returned with 200 and a new Last-Modified value.
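The two cache phases described above can be sketched roughly as follows. This is a simplified model, not how any particular browser or server implements it; the function names and the shape of the `cached` and `resource` objects are assumptions for illustration, and header names are assumed to be lowercase.

```javascript
// Strong cache (browser side): can the cached copy be used without a request?
// `cached.headers` holds the response headers stored with the last 200 response.
function isFresh(cached, now) {
  const cc = cached.headers["cache-control"] || "";
  if (/\bno-cache\b|\bno-store\b/.test(cc)) return false; // skip straight to negotiation
  const maxAge = cc.match(/max-age=(\d+)/);
  if (maxAge) {
    const date = Date.parse(cached.headers["date"]); // time of the last 200 response
    return now - date < Number(maxAge[1]) * 1000;
  }
  // No Cache-Control: fall back to the absolute Expires time.
  const expires = Date.parse(cached.headers["expires"] || "");
  return !Number.isNaN(expires) && now < expires;
}

// Negotiated cache (server side): 304 if the client's copy is still valid, else 200.
function negotiate(reqHeaders, resource) {
  const ifNoneMatch = reqHeaders["if-none-match"];
  if (ifNoneMatch !== undefined) {
    return ifNoneMatch === resource.etag ? 304 : 200; // ETag is checked first
  }
  const ifModifiedSince = reqHeaders["if-modified-since"];
  if (ifModifiedSince !== undefined) {
    return Date.parse(ifModifiedSince) >= Date.parse(resource.lastModified) ? 304 : 200;
  }
  return 200;
}
```

A real cache also has to handle details left out here, such as the Age header and other Cache-Control directives.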

  4. The server processes the request and returns the HTML;
  5. TCP closes the connection with its four-way termination.
  6. The browser parses the HTML: “Build DOM tree” -> “Build CSSOM tree” -> “Execute JS” -> “Generate the render tree by merging DOM and CSSOM” -> “Render” -> “Layout” -> “Paint”

At this point a complete request has officially ended. Note that resources loaded through the src attribute of tags such as img and script are fetched with GET requests.


Headers in requests and responses

The caching discussion above mentioned several header fields; let’s look at them:

A complete HTTP exchange consists of two parts, the request and the response. They break down into:

  • The request line
  • Request header
  • Request body
  • Status line (response)
  • Response header (response)
  • Response body (response)

Response headers

The “request headers” and “response headers” are the ones we care about. Let’s start with the response headers — they’re all controlled by the server for different functions. The more common ones are:

  • ETag: unique identifier of a resource
  • Last-Modified: the last modification time of the requested resource
  • Cache-Control
  • Expires
  • Date
  • The Access-Control-Allow-* family
  • Set-Cookie: the cookie the server uses to set state (it has its own expires attribute, which only controls the lifetime of the cookie; if not specified, the cookie lasts until the browser is closed). Its Secure and HttpOnly attributes can also be used for web security defense.

First, the Access-Control-Allow-* headers: they are used for cross-domain requests and specify what kinds of requests the server is willing to accept.

Then there is Date: the time at which the server sent the message. Its value is the time of the last 200 response, which the strong cache phase of the next request compares against max-age. It is a GMT time. Then there is Expires: an absolute time that specifies when a cached copy expires. Because an absolute time limits the scenarios in which it is useful, max-age was introduced: a relative time (relative to when the resource was last requested).

The most important one is Cache-Control. It can contain several directives:

  • public: the response can be cached by anyone (browsers and intermediate caches).
  • private: the response can only be cached by the client, not by shared caches.
  • no-store: the requested resource must not be cached at all.
  • no-cache: the resource is cached, but a request must still be made to the server (skip the strong cache and go directly to the negotiated cache, comparing ETag and Last-Modified)
  • max-age: how long, in seconds, the response stays fresh
  • s-maxage: like max-age, but for shared caches such as proxies
  • ...
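Since Cache-Control is a comma-separated list of directives, reading it usually starts with splitting it apart. The following is a minimal sketch of such a parser; `parseCacheControl` is a hypothetical helper, not part of any library, and it ignores edge cases such as quoted values containing commas.

```javascript
// Turn "public, max-age=30" into { public: true, "max-age": 30 }.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(",")) {
    const [rawName, rawValue] = part.trim().split("=");
    if (!rawName) continue; // skip empty segments
    const name = rawName.toLowerCase(); // directive names are case-insensitive
    if (rawValue === undefined) {
      directives[name] = true; // valueless directive such as no-store
    } else {
      const n = Number(rawValue);
      directives[name] = Number.isNaN(n) ? rawValue : n;
    }
  }
  return directives;
}

console.log(parseCacheControl("public, max-age=30, s-maxage=60"));
```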

Some classic questions

1. no-store and no-cache caught my attention: what are they for, and do they behave as their definitions say? After verification: with no-store, as defined, the resource is not cached in any way, and the server treats the next request as a “first request”. no-cache behaves like max-age=0: on the next request the browser still sends a request to the server (carrying the ETag), even though the resource is in the cache. If the resource has not changed, the server returns 304 (headers only) to indicate that the cached copy is still usable, and the browser reads the resource from the cache. Otherwise the server returns the new resource, which is cached again.

2. Why does no-cache have a higher priority than no-store? The author searched and did not find an answer to this question, but based on the description above I can guess: it keeps the file up to date while still taking performance into account. If the request finds that the file has not changed, there is no need to transfer the resource again; fetching it from the cache is faster.

3. Why does Cache-Control (when max-age is specified) take precedence over Expires? Because Expires works by comparing the client’s time with the time returned by the server. If the clock on either the client or the server is inaccurate, the forced (strong) cache fails outright, and its existence becomes meaningless. And as mentioned earlier, Expires is an absolute time, which means specifying the year, month, day, hour, minute, and second, which is inconvenient in most situations.

ETag, If-None-Match, Last-Modified, and If-Modified-Since are the header fields used in the cache negotiation phase. The ETag/If-None-Match pair takes precedence over the Last-Modified/If-Modified-Since pair, for the following reasons:

  • First, If-Modified-Since can only check time at second-level precision (or rather, the mtime recorded by Unix is only accurate to the second), so it cannot handle files that change very frequently.
  • Second, the Last-Modified date is unreliable. Sometimes a developer uploads all files to the server after fixing something, which resets the Last-Modified date of every file even though only a subset actually changed.
  • ETag, on the other hand, is a unique identifier of the resource on the server, generated by the developer. It is essentially a hash value produced by a hash function such as SHA-256.
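As a rough illustration of the last point, an ETag can be derived from the response body with Node's built-in crypto module. This is only one possible scheme; `makeEtag` and `respond` are hypothetical helpers, and real servers often use cheaper fingerprints (for example size plus modification time).

```javascript
const crypto = require("crypto");

// Hash the body so that any content change produces a different ETag.
function makeEtag(body) {
  const hash = crypto.createHash("sha256").update(body).digest("hex");
  return `"${hash}"`; // ETag values are quoted strings
}

// Compare the client's If-None-Match with the current ETag.
function respond(reqHeaders, body) {
  const etag = makeEtag(body);
  if (reqHeaders["if-none-match"] === etag) {
    return { status: 304, headers: { etag }, body: "" }; // headers only
  }
  return { status: 200, headers: { etag }, body };
}

const first = respond({}, "hello");
const second = respond({ "if-none-match": first.headers.etag }, "hello");
console.log(first.status, second.status); // 200 304
```

Because the hash depends only on the content, re-uploading an unchanged file keeps its ETag, which avoids the Last-Modified problem described above.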

Request headers

The “request headers” in HTTP requests are equally important. Here are some common ones:

  • Accept: the content types acceptable to the client (the related Accept-* headers cover encoding, language, and other settings)
  • Cache-Control: the caching mechanism to apply to the resource
  • Connection: whether a persistent connection is required (enabled by default in HTTP/1.1)
  • Cookie: when a request is sent, all cookies saved under the domain name of the request are sent along with it
  • Content-*: information about the request body, such as its length (Content-Length; the request body is the content after the two CRLF pairs that end the HTTP headers, such as form data submitted by POST, and the length does not include the request line or the HTTP headers), the Base64-encoded MD5 digest of the body (Content-MD5), and the MIME type of the body (Content-Type, for POST and PUT requests)
  • Date: the date and time at which the request was sent
  • Origin: protocol + domain name (and port) of the page that sent the request
  • Referer: the address of the page that sent the request
  • Upgrade: asks to switch the transport protocol (for example to websocket)
  • User-Agent: information about the client (browser and operating system)
  • If-None-Match: the ETag from the server’s previous response, used to determine whether the resource has changed
  • If-Modified-Since: asks whether the requested resource has been modified after the specified time
  • ...

Cache-Control appears in both the request headers and the response headers. How do the two work together? In effect, the response header (set by the backend) turns caching on, while the request header (set by the front end in Ajax) turns caching off. Let’s use Node as the server:

// Front-end code
const ajax = new XMLHttpRequest();
ajax.open('get', 'http://localhost:8083/assets');
// ajax.setRequestHeader('cache-control', 'max-age=0');
ajax.onreadystatechange = function () {
	// readyState 4 means the response has been fully received
	if (ajax.readyState == 4 && ajax.status == 200) {
		console.log(ajax.responseText);
	}
}
ajax.send()

If the setRequestHeader line is uncommented, the request header carries cache-control: max-age=0, so no request is served from the cache; each one goes directly to the server.

If the line stays commented out, the backend settings are decisive. (In this case, the request header does not contain cache-control.)

// Koa code
const Koa = require("koa");
const Router = require("koa-router");
const cors = require('koa2-cors');
// Static file service module
const staticFiles = require('koa-static');
const router = new Router();

const app = new Koa();

app.use(cors({
	origin: '*',
	credentials: true   // Allows cookies to be received in Koa
}))


// The HTML pages for the experiment live in this directory (the name is a placeholder)
app.use(staticFiles(__dirname + '/static'))
app.use(async (ctx, next) => {
	console.log('This is a global test', Date.now())
	await next()
})

router.get('/assets', async (ctx, next) => {
	console.log(ctx);
	// The backend turns caching on for 30 seconds
	ctx.set('cache-control', 'max-age=30');
	ctx.body = 'This is a test'
})

app.use(router.routes())
app.use(router.allowedMethods())

app.listen(8083);

In this case the response carries cache-control: max-age=30, so the browser can reuse the cached response for the next 30 seconds.

Two other fields in the request headers need to be highlighted. The first is Cookie: you’ve probably heard that cookies are automatically added to the request headers and sent to the server. Sometimes we also use tokens (JWT) in cooperation with the backend. This brings in the concepts of “simple requests” and “non-simple requests”:

  • Simple request: the request method is one of GET, POST, or HEAD, and the HTTP headers do not go beyond “Accept”, “Accept-Language”, “Content-Language”, “Last-Event-ID”, and “Content-Type”, with Content-Type limited to “application/x-www-form-urlencoded”, “multipart/form-data”, or “text/plain”;
  • Non-simple request: the request has special requirements for the server, such as the PUT or DELETE method, a Content-Type of application/json, or custom HTTP headers (such as a token header);
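The two definitions above can be turned into a small check. This sketch follows only the criteria listed in this article; `isSimpleRequest` is a hypothetical helper, and browsers apply additional rules that are not modeled here.

```javascript
const SIMPLE_METHODS = new Set(["GET", "POST", "HEAD"]);
const SIMPLE_HEADERS = new Set([
  "accept", "accept-language", "content-language", "last-event-id", "content-type",
]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
]);

// Returns true if the request would not need a preflight under the rules above.
function isSimpleRequest(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return false;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SIMPLE_HEADERS.has(lower)) return false; // custom header, e.g. a token
    if (lower === "content-type") {
      const mime = value.split(";")[0].trim().toLowerCase(); // drop "; charset=..."
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return false;
    }
  }
  return true;
}

console.log(isSimpleRequest("GET", { Accept: "text/html" }));                     // true
console.log(isSimpleRequest("POST", { "Content-Type": "application/json" }));     // false
console.log(isSimpleRequest("GET", { Authorization: "Bearer <token>" }));         // false
```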

They usually come up in CORS scenarios for “cross-domain requests” (these are Ajax requests, which are also browser requests, so they fall within the scope of this article). The second field is Origin: it is also closely tied to cross-domain requests, so let’s talk about that next.


Cross-domain solutions

Simply put, cross-domain means that the browser restricts the sharing of resources between two different origins. There are many cross-domain solutions for front-end JS; here are the two most important ones: CORS and JSONP.

Principle of CORS

For simple requests, the browser issues the CORS request directly. Specifically, an Origin field is added to the request headers (the browser adds it automatically). If the origin given in Origin is not an allowed one, the server returns a normal HTTP response. The browser sees that the response headers do not include the Access-Control-Allow-Origin field, knows something is wrong, and throws an error that is caught by the onerror callback of XMLHttpRequest.

Note: This case cannot be identified by the status code because the HTTP response may also have a status code of 200.

How is it determined whether the origin given in Origin is allowed? This is how CORS solves the cross-domain problem: the backend configures response headers that specify the allowed domain names, request methods, and so on:

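// Spring MVC code
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class CrosConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        // Apply the CORS rules to every path
        registry.addMapping("/**")
                .allowedOrigins("*")
                .allowedMethods("*")
                .allowedHeaders("*");
    }
}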

To make the browser carry cookies in a simple request, you must manually set it in Ajax:

xhr.withCredentials=true;

The server must also agree to it by returning the Access-Control-Allow-Credentials: true response header.

For cross-domain CORS requests that are not simple requests, the browser first sends a “preflight request” with the OPTIONS method to the same address. The browser asks the server whether the origin of the current web page is in the server’s allow list. The server handles the preflight request and adds the verification fields to the response headers. The browser checks the preflight response and sends the main request only if the check passes; otherwise it reports an error directly!
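The server side of a preflight can be sketched as a pure function that maps the OPTIONS request headers to a response. The `policy` object, the `handlePreflight` name, and the choice of status codes (204 on success, 403 on refusal) are assumptions for illustration; frameworks such as the CORS middleware shown earlier handle this for you.

```javascript
// Hypothetical allow list configured on the server.
const policy = {
  origins: ["http://localhost:3000"],
  methods: ["GET", "POST", "PUT", "DELETE"],
  headers: ["content-type", "authorization"],
};

// Decide how to answer an OPTIONS preflight, given lowercase request headers.
function handlePreflight(reqHeaders) {
  const origin = reqHeaders["origin"];
  const method = reqHeaders["access-control-request-method"];
  const requestedHeaders = (reqHeaders["access-control-request-headers"] || "")
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter(Boolean);

  const allowed =
    policy.origins.includes(origin) &&
    policy.methods.includes(method) &&
    requestedHeaders.every((h) => policy.headers.includes(h));

  // Without the Access-Control-Allow-* headers the browser refuses to send the main request.
  if (!allowed) return { status: 403, headers: {} };

  return {
    status: 204,
    headers: {
      "Access-Control-Allow-Origin": origin,
      "Access-Control-Allow-Methods": policy.methods.join(", "),
      "Access-Control-Allow-Headers": policy.headers.join(", "),
    },
  };
}
```

Echoing back the specific origin rather than “*” keeps the sketch compatible with requests that carry cookies.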

The server configuration for non-simple requests is similar to that for simple requests. One caveat: when the request carries credentials (cookies), Access-Control-Allow-Origin must be a specific origin and cannot be “*”!

Whether “*” also fails for non-simple requests without credentials is disputed: some say yes and some say no. The failure in the author’s project tests may be related to credentials or to some other configuration of the project.

JSONP

JSONP takes advantage of the fact that the src attribute of tags such as script and img is not constrained by the same-origin policy. It works by dynamically generating script tags. JSONP only supports GET requests!

The implementation pattern of JSONP is callback. That is, we must receive a callback to get the return value:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    <script type='text/javascript'>
      // The back end returns the directly executed method, which is equivalent to executing the method. Since the back end puts the returned data in the parameter of the method, it can get the res here.
      window.showLocation = function (res) {
        console.log(res)
        // Perform the Ajax callback
      }
    </script>
    <script src='http://127.0.0.1:8080?callback=showLocation' type='text/javascript'></script>
  </body>
</html>

As this code shows, the principle of JSONP is: define a callback function on the page, then have the remote service return a call to that function with the JSON data as its argument (in essence, the server sends back the function name applied to the result, and when the response arrives the front end executes that call directly as JS).

// Node.js server code
const http = require("http");
const server = http.createServer();

server.on("request", (req, res) => {
    // The response is executed as a script, so it is served as JavaScript
    res.setHeader("Content-Type", "application/javascript; charset=utf-8");
    // Note: the body is a call to the callback defined on the page
    res.end("showLocation('I am the return value')");
});

server.listen(8080, () => {
    console.log("Visit http://127.0.0.1:8080");
});

Why is JSONP done this way?

In general, we want this script tag to be created on demand, not hard-coded in the HTML and executed before the page is displayed, which is very inflexible. We can create a script tag dynamically with JS and pass the callback function name to the server in the URL. The server responds with a call to that function, passing the result as its argument. This gives us the flexibility to invoke the remote service whenever needed.

Shortcomings of JSONP?

JSONP’s strengths are ease of use, direct access to the response text, and support for two-way communication between the browser and the server; responses containing JSON-encoded data are automatically decoded (that is, executed). But this also means that JSONP depends on code from another domain. If that domain is not trustworthy, malicious code may well be embedded in the response, and there is no defense other than abandoning the JSONP call. Second, it is not easy to determine whether a JSONP request has failed. Although HTML5 added an onerror event to script elements, it is not yet well supported. To detect failure you have to use a timer to check whether a response has been received within a specified time.
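Both ideas, creating the script tag dynamically and using a timer to detect failure, can be combined in one helper. This is a minimal sketch under stated assumptions: the `jsonp` function and its parameters are hypothetical, the server is assumed to read the callback name from a `callback` query parameter as in the example above, and `win` and `doc` default to the browser globals but can be replaced for testing.

```javascript
function jsonp(url, callbackName, timeoutMs, win = window, doc = document) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");

    // Failure detection: no response within timeoutMs counts as an error.
    const timer = setTimeout(() => {
      // Leave a no-op behind so a late response does not throw when it runs.
      win[callbackName] = () => {};
      script.remove();
      reject(new Error("JSONP request timed out"));
    }, timeoutMs);

    // The server's response calls this function with the data.
    win[callbackName] = (data) => {
      clearTimeout(timer);
      delete win[callbackName];
      script.remove();
      resolve(data);
    };

    const separator = url.includes("?") ? "&" : "?";
    script.src = `${url}${separator}callback=${encodeURIComponent(callbackName)}`;
    doc.body.appendChild(script); // appending the tag triggers the GET request
  });
}
```

In a page this could be used as `jsonp("http://127.0.0.1:8080", "showLocation", 5000).then(console.log)`. The timer only tells you that no callback arrived in time; it cannot tell a slow server from a failed one.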