
 

Configuring HTTP caching and CDN caching has long been a common and important technique in web performance optimization. A well-configured HTTP cache and CDN cache relieve server load, ease network bottlenecks, and improve the user experience. A badly configured cache, however, can cause resources to update late, different users to see different versions, and even broken workflows. This article explains how HTTP caching and CDN caching work and how they are configured, so that you can judge what a reasonable cache configuration looks like, design a caching scheme for your own project, and analyze and fix cache problems when they occur.

First, let's look at a scenario.

Project A launched a new feature that included logic changes and page UI updates. Xiao Ming, a developer on project A, submitted the code and deployed it to pre-release. Xiao Hong, the product manager, started trying out the new feature. Strangely, when she opened the project she did not see it. Xiao Ming thought for a moment and asked her to refresh and try again. Sure enough, after the refresh the project showed the new feature. But now there was a new problem: the images in the project still seemed to be the old ones. Xiao Ming thought again, tinkered at his computer for a while, and asked Xiao Hong to try once more. This time the feature finally passed acceptance.

This case involves both the HTTP cache and the CDN cache, and it is a lesson in what not to do: in production we cannot ask every user to click refresh to see a new feature. So how do we solve these problems? Let's get to the substance.

HTTP cache

Introduction

The HTTP cache is a client-side cache. The browser, acting as the client, receives a response from the server, parses the response header fields to work out the caching rules, and caches the resource accordingly. When a later request matches a cached entry that is still valid, the browser reads the local cache directly and sends no request.

Caching rules

The HTTP caching rules are controlled by response header fields. The four key fields are Expires, Cache-Control, Last-Modified, and Etag. Expires and Cache-Control determine how long a cached response is stored, while Last-Modified and Etag determine whether the cache needs to be updated. Let's look briefly at how they differ.

  • Expires: the field used in HTTP/1.0 to control cache time. Its value is the date/time after which the response is considered expired.
  • Cache-Control: the field used in HTTP/1.1 to control cache time. Its main directives are:
    • public: the response may be cached by any cache, including the client that sent the request, proxy servers, and so on.
    • private: the response may be cached only for a single user and not by a shared cache (that is, a proxy server must not cache it).
    • max-age=<seconds>: the maximum time the cached response is considered fresh, in seconds, measured relative to the time of the request. Within this period the resource is read from the local cache without sending a request to the server. (max-age takes precedence when it appears together with Expires.)
    • s-maxage=<seconds>: same rule as max-age, and it overrides max-age and Expires, but it applies only to shared caches (such as proxies) and is ignored by private caches. (When it appears together with Expires or max-age, s-maxage takes precedence in shared caches.)
    • no-store: nothing from the response is cached; every access to the resource requires a complete response from the server.
    • no-cache: the response is cached but must be validated with the server before each reuse, to check whether the resource has been modified. (In practice this behaves much like max-age=0.)
  • Last-Modified: the date and time at which the origin server believes the resource was last modified. It is less accurate than Etag. It is used in conditional requests through the If-Modified-Since or If-Unmodified-Since request headers.
  • Etag: an identifier for a specific version of the resource, sent in the HTTP response header.
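
To make the directive list above concrete, here is a minimal sketch of how a client might split a Cache-Control value into its directives. The function name parse_cache_control is made up for this illustration, and the parsing is deliberately simplified (it assumes no quoted argument contains a comma).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, argument = part.partition("=")
        # Directive names are case-insensitive; flags such as no-store have no argument.
        directives[name.strip().lower()] = argument.strip().strip('"') if sep else None
    return directives
```

For example, parse_cache_control("public, max-age=600") yields one entry for the public flag (with no argument) and one for max-age with the argument "600".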

We can easily find an example using the Chrome developer console:

In the configuration shown in the figure:

  1. cache-control: max-age=31535000 means the response stays fresh for 31535000 seconds from the request time, or roughly 365 days (365 days is exactly 31536000 seconds). Within this period, repeat requests are served directly from the local cache without contacting the server.
  2. last-modified: Mon… Once the cached response has expired, the browser sends this value in the If-Modified-Since request header. If the server finds that the resource's modification time still matches, the existing cache continues to be used; if not, the cache is considered invalid.
  3. expires: Expires is the HTTP/1.0 field; when Cache-Control is also present, Cache-Control overrides it.

Caching process

How do these caching rules come into play? Let's look at a few key points.

Key point 1: whether the cache has expired

Whether a cached response has expired is judged from the caching rules in the last response for that resource. Note that this check is made by the client, not by the server. The cache is considered fresh only if conditions 1, 2, and 3 all hold; otherwise it has expired.

  1. Cache-Control contains max-age
  2. max-age > 0
  3. Current time < time of last request + max-age
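
The three conditions can be sketched as a single check. This is only an illustration: is_fresh and its parameters are hypothetical names, times are plain seconds, and real browsers also account for details such as the Age header.

```python
from typing import Optional


def is_fresh(max_age: Optional[int], last_request_time: float, now: float) -> bool:
    """Return True when the three freshness conditions all hold."""
    if max_age is None:      # condition 1: Cache-Control must carry max-age
        return False
    if max_age <= 0:         # condition 2: max-age must be greater than 0
        return False
    # condition 3: current time < time of last request + max-age
    return now < last_request_time + max_age
```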

Key point 2: asking the server whether the resource has been modified

When the client first caches a resource, it stores the Etag (if any) and Last-Modified (if any) along with it. The next time the resource is requested after the cache has expired, the client writes the Etag into the If-None-Match request header and the Last-Modified value into the If-Modified-Since request header. The server compares the Etag first and then Last-Modified. If the comparison passes, the cached copy is treated as unchanged and reused; if it fails, the resource is considered modified and the cache is invalid.
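
The server side of this exchange might look roughly like the sketch below. It is an assumption-laden simplification: revalidate is a hypothetical function, it compares a single Etag by exact string match (real servers also handle lists of tags and weak validators), and when If-None-Match is present it lets the Etag decide alone, which is how the HTTP specification orders the two checks.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def revalidate(request_headers: Mapping[str, str],
               current_etag: Optional[str],
               current_last_modified: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and current_etag is not None:
        # The Etag comparison comes first and decides the outcome on its own.
        return 304 if if_none_match == current_etag else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and current_last_modified is not None:
        cached = parsedate_to_datetime(if_modified_since)
        current = parsedate_to_datetime(current_last_modified)
        # Unchanged if the resource was not modified after the cached date.
        return 304 if current <= cached else 200

    return 200  # no validators to compare: send the full resource
```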

Key point 3: caching rules

The caching rules are expressed mainly through the Cache-Control and Expires fields; when both are present, Cache-Control wins. The details are as follows:

  1. Cache-Control: no-store. The resource is not cached.
  2. Cache-Control: no-cache. Cached, but treated as expired immediately.
  3. Cache-Control: max-age=0 (or s-maxage=0). Cached, but expired immediately (equivalent to no-cache).
  4. Cache-Control: max-age=seconds (or s-maxage=seconds), with seconds > 0. Cached for that many seconds, counted from the request time.
  5. Cache-Control: anything else. If the response carries no explicit caching information, heuristic expiration applies, and the cache time commonly defaults to 10% of the difference between the response's Date and its Last-Modified time, that is, max-age = (Date - Last-Modified) / 10. In other words: "No explicit HTTP Cache Lifetime information was provided. Heuristic expiration policies suggest defaulting to: 10% of the delta between Last-Modified and Date."
  6. Expires set to a past or invalid time. Cached, but expired immediately (equivalent to Cache-Control: no-cache).
  7. Expires set to a future time. Cached until that time.
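
The seven rules can be folded into one function that returns how long a browser cache may reuse a response. This is a sketch under stated assumptions: freshness_lifetime is a hypothetical name, header names are looked up with exactly the capitalization shown, response_time stands in for the response's Date and must be timezone-aware, and s-maxage is left out because it only concerns shared caches.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str],
                       response_time: datetime) -> Optional[float]:
    """Seconds the response may be reused without revalidation.

    Returns None when the response must not be stored at all (no-store).
    """
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.lower()] = argument

    if "no-store" in directives:                  # rule 1
        return None
    if "no-cache" in directives:                  # rule 2
        return 0.0
    if "max-age" in directives:                   # rules 3 and 4
        return max(0.0, float(directives["max-age"]))

    if "Expires" in headers:                      # rules 6 and 7
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            return max(0.0, (expires - response_time).total_seconds())
        except (TypeError, ValueError):
            return 0.0                            # invalid date: already expired

    if "Last-Modified" in headers:                # rule 5: heuristic expiration
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        return max(0.0, (response_time - last_modified).total_seconds() / 10)

    return 0.0
```

Checking Cache-Control before Expires is what makes Cache-Control win when both are present.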

Cache configuration

As the rules and flow chart above show, configuring the cache is not complicated. Etag and Last-Modified are used only for cache comparison (in practice you just need to make sure they are enabled), so the one thing we need to focus on is Cache-Control (Expires can be converted to max-age, so it is not discussed again). The schemes are as follows:

  1. Cache-Control: no-store. No caching; every resource is downloaded from the server on each access.
  2. Cache-Control: no-cache or Cache-Control: max-age=0. Comparison cache: the resource is cached, but on every access the client checks with the server whether it has been modified.
  3. Cache-Control: max-age=seconds (seconds > 0). Strong cache: the resource is cached, and for the given period it is read directly from the local cache.

Note: even a strongly cached resource can still be refetched. For example, Ctrl + F5 in Chrome is equivalent to triggering scheme 1 directly, and F5 or a WebView's refresh button is equivalent to triggering scheme 2 directly. These are user actions on the client, however, and should not be relied on in real projects.

In real projects scheme 1 is almost never seen, because it has no advantage over schemes 2 and 3. When choosing between scheme 2 and scheme 3, we treat different kinds of resources differently.

  • For images, CSS, JS, fonts, and other non-HTML resources, we can go straight to scheme 3 and make max-age as long as possible, as in the caching rules example, where cache-control: max-age=31535000 caches the resource for about 365 days. Note that this does not mean the resource has to stay unchanged for a year. The reason is that front-end build tools now add a version stamp to static resources (for example [hash] in webpack, or gulp-rev in gulp). Every change alters the file name or adds a query parameter, which in effect changes the request URL, so the cache update problem never arises.
  • For HTML resources, we recommend choosing a scheme based on how often the project is updated. HTML is the entry file for front-end resources: once it is strongly cached, the JS, CSS, images, and so on that it references cannot be updated either. For business projects under frequent maintenance, use scheme 2, or scheme 3 with a small max-age such as 3600 (expires after one hour). For campaign-style projects that will not change much after launch, scheme 3 is recommended, but max-age should not be too large; otherwise users will not receive a fix in time if a bug or unexpected problem appears.
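
One way to express this split in code is a small helper that picks a Cache-Control value from the file extension. Everything here is illustrative: cache_control_for, the extension set, and the html_max_age parameter are hypothetical, and the sketch assumes that every listed static file type carries a build hash in its name.

```python
import os

ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds

# Hypothetical list of extensions whose file names carry a build hash.
FINGERPRINTED_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".gif", ".svg", ".woff", ".woff2"}


def cache_control_for(path: str, html_max_age: int = 0) -> str:
    """Pick a Cache-Control value for the resource at the given path."""
    ext = os.path.splitext(path)[1].lower()
    if ext in FINGERPRINTED_EXTENSIONS:
        return f"max-age={ONE_YEAR}"       # scheme 3, long-lived
    if html_max_age > 0:
        return f"max-age={html_max_age}"   # scheme 3 with a short lifetime
    return "no-cache"                      # scheme 2: always validate
```

A frequently updated project would call cache_control_for("/index.html") and get no-cache, while a campaign page might pass html_max_age=3600.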

Beyond these considerations, other factors sometimes affect the cache configuration. During events such as the QQ red envelope campaign on New Year's Eve, high concurrency and heavy traffic put great pressure on the servers. In such cases we, as front-end developers, can adopt scheme 3 to avoid the extra traffic caused by users entering the page repeatedly.

Summary

When configuring the HTTP cache, we always need to do two things: first, understand the principles and rules of HTTP caching clearly; second, recognize that cache configuration is not a one-off task. Configuring different rules for different situations is what lets the HTTP cache deliver its full value.

CDN cache

The CDN cache is a server-side cache. CDN providers cache resources from the origin server on high-performance acceleration nodes across the country. When a user requests a resource, the request is scheduled to the nearest node and that node's IP is returned to the user, which optimizes access speed and experience for users in different locations.

Caching rules

Unlike the HTTP caching rules, these rules are not standardized; they are defined by each CDN provider. Taking Tencent Cloud as an example, opening the configuration for its CDN acceleration service shows the following panel.

 

As you can see, the only configuration items offered are the file type (or file directory) and the refresh time. The meaning is simple: files of each type are cached on the CDN nodes for the corresponding time.

CDN operation process

 

As the figure shows, the CDN cache configuration mainly takes effect in the cache-processing stage. Although the only configuration items are file type and cache time, the process is not trivial.

Let's first clarify one concept: going back to the origin. Many people assume that using a CDN means putting the resources on the CDN, but that is not the case. As shown in the figure, once a CDN is in use, our own server becomes the origin server. In general, the origin receives requests from a CDN node only when the node has no copy of the resource or its copy has become invalid. If we know the origin's address, we can also access the origin directly.

With that concept clear, the CDN process is not so complicated. Put simply: if the node does not have the resource, it reads it from the origin; if it does, it sends it straight to the user. Unlike the HTTP cache, the CDN has no equivalent of no-cache (max-age=0). When the cache time is set to 0, files of that type are treated as uncached, that is, every request is forwarded directly to the origin. Only when the cache time is greater than 0 and the cached copy has expired does the node check with the origin whether the resource has been modified.
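
The node behaviour described above can be modelled with a toy class. This is a sketch, not how any provider implements it: CdnNode and its members are hypothetical, time is passed in explicitly as seconds, and an expired entry is simply fetched again from the origin rather than compared with it.

```python
from typing import Callable, Dict, Tuple


class CdnNode:
    """Toy CDN node with one cache time (TTL) for every file."""

    def __init__(self, ttl_seconds: int, fetch_origin: Callable[[str], str]) -> None:
        self.ttl = ttl_seconds
        self.fetch_origin = fetch_origin
        self.store: Dict[str, Tuple[str, float]] = {}  # path -> (body, stored_at)
        self.origin_hits = 0

    def get(self, path: str, now: float) -> str:
        # A cache time of 0 means "uncached": skip the store entirely.
        if self.ttl > 0 and path in self.store:
            body, stored_at = self.store[path]
            if now < stored_at + self.ttl:
                return body  # still valid: answer without touching the origin
        # Missing, expired, or uncached: go back to the origin.
        self.origin_hits += 1
        body = self.fetch_origin(path)
        if self.ttl > 0:
            self.store[path] = (body, now)
        return body
```

With a cache time of 3600, a file changed on the origin shortly after the first request keeps being served in its old version until the hour is up; with a cache time of 0, every request reaches the origin.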

Cache configuration

Configuring the CDN cache is not difficult, and in general it is recommended to keep it consistent with the HTTP cache configuration. Note that the CDN cache configuration is affected by the HTTP cache configuration, and CDN providers do not all handle this the same way. Taking Tencent Cloud as an example, its cache configuration documentation calls out the following in particular.

How does that affect us?

  1. Suppose the HTTP cache is set to cache-control: max-age=600, that is, 10 minutes, but the CDN cache time for the file is 1 hour. The following can then happen: the file is updated on the server, and a user requests it 12 minutes after first accessing it. The browser cache has expired, so the request goes to the CDN, but the CDN copy is still valid. The response code is 304, the comparison says the cache is unmodified, and the resource is still the old one. Only after the hour has passed does a request return the latest resource.
  2. What if we do not set Cache-Control at all? In the HTTP cache section we said that a default cache time applies when Cache-Control is absent. In this case, however, the CDN provider explicitly adds cache-control: max-age=600 for us when the field is missing.

Note: problem 1 is not without a solution. If we have to modify a file within its cache period and do not want to affect the user experience, we can use the CDN provider's forced cache refresh feature. Keep in mind that this forced refresh updates only the server-side (CDN) cache; the HTTP cache in the browser still follows the rules in its own HTTP headers and is not affected.

Summary

Configuring the CDN cache is not complicated in itself. The complication is that the CDN cache configuration is affected by the HTTP cache configuration, and different CDN providers handle this interaction differently. In practice, look up the relevant notes in your CDN provider's documentation.

A combination of HTTP caching and CDN caching

Now that we understand the HTTP cache configuration and the CDN cache configuration separately, one more thing remains: understanding the flow of a request when the two are combined.

 

When a user visits our site, the request is handled first by the HTTP cache. If it passes the HTTP cache check, the response is returned to the user directly. If not, the CDN cache handles it next; once the CDN has finished, the response is returned to the client, which stores it according to the HTTP caching rules and presents it to the user. When analyzing a cache problem, we must analyze these two stages independently. Looking back at the failure case from the beginning, the first problem was clearly caused by an unreasonable HTTP cache configuration, which forced the user to refresh manually to get the updated resources.
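
The combined flow can be sketched as a two-layer lookup. All names below (Cache, lookup, request, and the parameters) are hypothetical, time is passed in as plain seconds, and conditional requests and the Age header are ignored, so treat it as a rough model only.

```python
from typing import Dict, Optional, Tuple

Cache = Dict[str, Tuple[str, float]]  # path -> (body, stored_at)


def lookup(cache: Cache, path: str, lifetime: float, now: float) -> Optional[str]:
    """Return the cached body if it is still within its lifetime."""
    entry = cache.get(path)
    if entry is not None and now < entry[1] + lifetime:
        return entry[0]
    return None


def request(path: str, now: float, browser: Cache, cdn: Cache,
            origin: Dict[str, str], max_age: float, cdn_ttl: float) -> Tuple[str, str]:
    """Return (body, who served it) for one request."""
    body = lookup(browser, path, max_age, now)   # stage 1: HTTP cache
    if body is not None:
        return body, "browser cache"
    body = lookup(cdn, path, cdn_ttl, now)       # stage 2: CDN cache
    served_by = "CDN node"
    if body is None:
        body = origin[path]                      # back to the origin
        cdn[path] = (body, now)
        served_by = "origin"
    browser[path] = (body, now)                  # client stores the response
    return body, served_by
```

Running it with max_age=600 and cdn_ttl=3600 shows the mismatch case: a request 720 seconds after the first one is answered by the CDN node with the old version, even though the origin has already changed.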

Conclusion

The HTTP cache and the CDN cache, as the client-side cache and the server-side cache respectively, together shape the path our web requests take. To configure caching well, first understand how each cache works and how it is configured, then analyze the caching layers for your particular project and handle each case on its merits.
