Before we begin

This article started as a talk I gave offline at FCC Shanghai and over WeChat for FCC Chengdu, and was later written up and updated. Since I don't know what background my readers have, and since I am a popular science writer (self-appointed), I decided to take it "from getting started to giving up". The rough outline:

  • What is a CDN
  • Why use a CDN
  • How access works
  • Architecture
  • Applications and pitfalls
  • Real-world CDNs

Covering every topic in detail is hard, the hard parts tend to get postponed, and the article would become very long, so everything here stays at the popular science level. If you want to go deeper, each section provides links for targeted further reading.

If anything is wrong, please point it out. Bold text marks a hyperlink.

What is a CDN

Let's start with a simple example:


"Agriculture in Africa is underdeveloped, everyone wants Jinkela." I believe you have probably seen the video from San Diego, USA, in which an American, an African and a Japanese fight over it. If Jinkela were produced in only one place, the cost of shipping it to Africa and the capacity pressure on that single producer would be huge.

So the answer is simple: we set up contract factories producing Jinkela in every country that needs it. Our CDN is the company that makes Jinkela, and each "node" is one of those factories.

CDN stands for content delivery network (also called a content distribution network). Its job is to reduce transmission latency by serving you from the nearest node. The Internet may have given us a global village, but the latency from China to Japan still differs from the latency from China to Taiwan, which you can see with ping and traceroute.

Advantages of a CDN

Faster access

As a classic front-end performance technique, a CDN is something most of you have probably used without giving it a thought. As mentioned above, the reduced latency alone goes a long way toward speeding things up. A real CDN, of course, does not have just one node per country as in the example; nodes are split by carrier, by province, or even by region.


Reduced load on the origin (server)

This one is easy to see: if the CDN can already return the data for me, the request never reaches the origin, so the load on the origin server goes down.

Withstanding attacks

With the load on the origin lightened, you can afford to laugh off a DDoS attack.


After Teacher Ruan was hit by a DDoS attack, he moved his content to GitHub…

I thought I would not need to update this part, but just recently Teacher Ruan published a DDoS defense guide, was attacked again, and the site went down. According to the defense guide it was a CC attack, and he then moved to Tencent Cloud, which is a slap in my face… In fact, CC attacks are not that hard to defend against, but that is outside the topic of this talk; if you are interested, we can chat about it later…

Inspired by Teacher Ruan, here is an architecture diagram, a concept sketch of a blog system that is basically the same as Jekyll and Hexo. My idea at the time was to pull out comments and the like, so that when I get hit, the blog itself is not affected. I went back and checked, and it turned out to be just that. Not bad!


So, since static resources can go through a CDN, can our APIs and our MVC pages go through one too? The answer is yes, and the key is something called whole-site acceleration.

What is a CDN, really? In essence it is a cache; the difference is that this cache sits on the network service provider's nodes.

The simplest model looks like this diagram, taken from the book Core Principles and Case Studies of Technical Architecture for Large Websites.


How access works

What happens between the moment we send a request and the moment it reaches a CDN node, and how does the CDN speed up our request?


This diagram also comes from the Internet.

First we type a URL into the address bar. The browser finds no local DNS cache entry for that domain, so it sends a query to the site's DNS server.

The site's DNS server has a CNAME record that points to the CDN provider (Alibaba Cloud, Tencent Cloud, Cloudflare, etc.), so the query is passed on to the CDN's intelligent DNS load-balancing system.

The load-balancing system resolves the domain name and returns the node that will respond fastest for this user. The user then sends the request to that node.

If the content is being accessed for the first time, the CDN server requests the data from the origin and caches it; otherwise it finds the data directly on the cache node and returns the result to the user.

For the simplest CDN, one DNS scheduling server and one node server are enough. In complex deployments there are multiple cache levels, with several caches working together.
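To make these steps concrete, here is a minimal sketch of the flow, under my own assumptions: `schedule` stands in for the CDN's DNS scheduler and simply picks the node with the lowest measured latency, `EdgeNode` is a toy cache node, and `fake_origin`, the node names and the latency numbers are all made up for illustration. A real scheduler weighs far more than latency.

```python
from typing import Callable, Dict, List, Tuple


class EdgeNode:
    """A toy CDN node: serve from cache, otherwise go back to the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # hypothetical origin client
        self._cache: Dict[str, bytes] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        if path in self._cache:
            return self._cache[path], "HIT"
        body = self._fetch_from_origin(path)  # first access: back to origin
        self._cache[path] = body
        return body, "MISS"


def schedule(latency_ms: Dict[str, float]) -> str:
    """Stand-in for the DNS scheduler: pick the lowest-latency node."""
    return min(latency_ms, key=lambda name: latency_ms[name])


origin_log: List[str] = []  # records every request that reaches the origin


def fake_origin(path: str) -> bytes:
    origin_log.append(path)
    return f"content of {path}".encode()


nodes = {name: EdgeNode(fake_origin) for name in ("shanghai", "tokyo")}
chosen = schedule({"shanghai": 8.0, "tokyo": 45.0})  # made-up latencies
first = nodes[chosen].get("/app.js")   # cache is empty, goes to origin
second = nodes[chosen].get("/app.js")  # served from the node's cache
```

Only the first request reaches the origin; the second is answered by the node itself, which is the whole point.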

Here’s something I wrote (actually excerpted) on my blog.

How a CDN is built

If you wanted to build a CDN yourself, what would you need to do? Remember the key point from earlier: a CDN is still a cache.


Let's borrow the Alibaba Cloud CDN architecture diagram. I have never built one myself, so if my explanation is wrong, please point it out.

Since it is a cache, load balancing plus cache scheduling is the obvious combination. Going by the access flow we just covered, the other main piece besides load balancing and caching is a central DNS scheduler.

Just like the multi-level cache designs in computer hardware and in back-end systems, each cache level is larger than the one before it and can hold more resources, but it also responds more slowly. If L1 misses, the request goes to L2, and only if L2 misses does it go back to the origin. This effectively avoids going back to the origin too often.
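The L1/L2 lookup order can be sketched as follows. This is only an illustration under my own assumptions: both levels are simple LRU caches, the capacities are tiny on purpose, and `tiered_get` and `fake_origin` are hypothetical names, not anything from Alibaba Cloud.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class LRUCache:
    """A small least-recently-used cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def tiered_get(key: str, l1: LRUCache, l2: LRUCache,
               fetch_from_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (body, source) where source is 'L1', 'L2' or 'ORIGIN'."""
    body = l1.get(key)
    if body is not None:
        return body, "L1"
    body = l2.get(key)
    if body is not None:
        l1.put(key, body)  # promote into the faster level
        return body, "L2"
    body = fetch_from_origin(key)  # both levels missed
    l2.put(key, body)
    l1.put(key, body)
    return body, "ORIGIN"


origin_calls: List[str] = []


def fake_origin(key: str) -> bytes:
    origin_calls.append(key)
    return key.encode()


l1, l2 = LRUCache(capacity=1), LRUCache(capacity=10)
trace = [tiered_get(k, l1, l2, fake_origin)[1] for k in ["/a", "/a", "/b", "/a"]]
```

In this run the last request for "/a" has already been evicted from the tiny L1, but the larger L2 still holds it, so the origin is not bothered again.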


As for the next diagram, which I could not have come up with myself: Alibaba mainly uses LVS + Tengine for load balancing and Swift for HTTP caching. That is what they say, at least, and it does not really concern me here; I only want to talk about the consistent hashing marked by the curly braces on the left.

Consistent hashing maps objects into a space of 2^32 buckets that forms a closed ring. The machines are mapped onto the same ring, for example by hashing their alias or IP address, and each object is stored on the nearest machine going clockwise. When a node is added or removed, only the objects next to it on the ring are recalculated and migrated clockwise; everything else stays where it is.

For the details, see this article: blog.csdn.net/cywosp/arti…
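The ring described above can be sketched in a few lines. This is a simplified version under my own assumptions: positions come from the first 4 bytes of an MD5 digest (so the space is 2^32), there are no virtual nodes, position collisions between nodes are ignored, and the IP addresses and paths are made up. I am not claiming this is how Alibaba Cloud implements it.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(key: str) -> int:
    """Map a string onto a ring of 2**32 positions."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes: List[str]) -> None:
        self._positions: List[int] = []   # sorted node positions on the ring
        self._owner: Dict[int, str] = {}  # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def locate(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # past the last node: wrap around the ring
        return self._owner[self._positions[idx]]


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
keys = [f"/img/{i}.png" for i in range(1000)]
before = {k: ring.locate(k) for k in keys}

ring.add("10.0.0.4")
after = {k: ring.locate(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

ring.remove("10.0.0.4")
restored = {k: ring.locate(k) for k in keys}
```

Every key in `moved` ends up on the newly added node and nothing else changes owner, which is exactly why a node joining or leaving does not invalidate the whole cache.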


With this key algorithm covered, we have more or less digested how the Alibaba Cloud CDN next door is put together.

Of course it’s still none of our business…

Applications and pitfalls

The most common application is accelerating static resources on the front end. In fact, with a CDN we could even build our own jsDelivr.

However, there are some basic rules we need to understand when using a CDN.

Cache settings

If you want the CDN to cache a resource for a longer time, there is a directive for exactly that: s-maxage. It sets the cache time for proxy servers and overrides max-age there, so we can use max-age for the local (browser) cache and s-maxage for the CDN cache time, which helps avoid serving dirty data.
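Here is a small sketch of how the two directives play together. The helper names `parse_cache_control` and `freshness_lifetime` are mine, the header value is just an example, and the logic deliberately ignores other directives such as no-store or private.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip() or None
    return directives


def freshness_lifetime(header: str, shared_cache: bool) -> Optional[int]:
    """Seconds a response may be reused, or None if the header does not say."""
    directives = parse_cache_control(header)
    s_maxage = directives.get("s-maxage")
    max_age = directives.get("max-age")
    if shared_cache and s_maxage is not None:
        return int(s_maxage)  # shared caches (CDN, proxy) prefer s-maxage
    if max_age is not None:
        return int(max_age)
    return None


header = "public, max-age=60, s-maxage=86400"
browser_ttl = freshness_lifetime(header, shared_cache=False)  # 60
cdn_ttl = freshness_lifetime(header, shared_cache=True)       # 86400
```

With a header like this, browsers revalidate after a minute while the CDN may keep its copy for a day.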

Cache hit ratio

For a cache, it is also important to know whether it is actually doing its job, and that is measured by the cache hit ratio. Even for static resources, the hit ratio drops after the cache is refreshed, so CDN resources are not suited to frequent refreshing. In other words, if the result of a request changes frequently, putting it behind a CDN is basically pointless.

Checking whether the cache was hit

Whether in our own development or when helping customers debug, one question always comes up: did the resource hit the CDN, and is the problem caused by the CDN? This is when we pull out a few tricks.


The CDNs of the major vendors return an X-Cache header that reads HIT or MISS. Some add abbreviations for Memory or Disk to show whether the hit came from memory or from disk, while Upstream or Miss means the cache was missed.

Resource prewarming

In cache design, prewarming is a very important step. When a CDN has just been switched on, nothing is cached on it yet, so a large number of requests go straight to the origin, which can easily bring the origin down. Prewarming means pushing the hot resources to the CDN ahead of time so that the first wave of requests already hits the cache.
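If you collect the X-Cache values from a batch of responses, you can also estimate the hit ratio discussed earlier. The sketch below is only a rough heuristic of my own: it treats a value as a hit when its first token contains HIT, and the sample values are invented. The exact format differs between vendors, so check your provider's documentation before relying on it.

```python
from typing import Iterable


def is_hit(x_cache: str) -> bool:
    """Treat a value as a hit when its first token contains 'HIT'."""
    tokens = x_cache.strip().upper().split()
    return bool(tokens) and "HIT" in tokens[0]


def hit_ratio(x_cache_values: Iterable[str]) -> float:
    """Fraction of responses that were served from the CDN cache."""
    values = list(x_cache_values)
    if not values:
        return 0.0
    return sum(is_hit(v) for v in values) / len(values)


# Made-up header values, only meant to resemble what vendors return.
samples = ["HIT", "MISS", "HIT from edge-1", "MISS from edge-2"]
ratio = hit_ratio(samples)
```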
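A toy illustration of the effect, under my own assumptions: the cache is a plain dict, `prewarm` and `serve` are hypothetical helpers, and requests arrive one at a time. Real vendors expose prewarming through their own consoles or APIs, which I am not reproducing here.

```python
from typing import Callable, Dict, Iterable, List


def prewarm(cache: Dict[str, bytes], hot_paths: Iterable[str],
            fetch_from_origin: Callable[[str], bytes]) -> int:
    """Fill the cache with known-hot resources; return how many were fetched."""
    fetched = 0
    for path in hot_paths:
        if path not in cache:
            cache[path] = fetch_from_origin(path)
            fetched += 1
    return fetched


def serve(cache: Dict[str, bytes], path: str,
          fetch_from_origin: Callable[[str], bytes]) -> bytes:
    """Serve a user request, going back to the origin only on a miss."""
    if path not in cache:
        cache[path] = fetch_from_origin(path)
    return cache[path]


origin_log: List[str] = []  # every request that reaches the origin


def fake_origin(path: str) -> bytes:
    origin_log.append(path)
    return path.encode()


cache: Dict[str, bytes] = {}
warmed = prewarm(cache, ["/index.html", "/app.js"], fake_origin)
origin_before_launch = len(origin_log)

for path in ["/index.html", "/app.js", "/index.html"]:  # launch traffic
    serve(cache, path, fake_origin)
origin_during_launch = len(origin_log) - origin_before_launch
```

The origin pays for the two prewarm fetches at a time of our choosing, and sees no traffic at all when the real requests arrive.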

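Vary

In addition, many CDNs do not support the Vary header, so the Vary: Origin that CORS relies on cannot be guaranteed. In that case, if you find that a response carrying one particular origin in its cross-origin header has been cached and is being served to everyone, you have to change Access-Control-Allow-Origin to * so that it matches every caller.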
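To see why a cache that ignores Vary: Origin breaks CORS, consider this sketch. Everything in it is hypothetical: `origin_server` is a made-up back end that echoes the caller's Origin, and the `honor_vary` switch simulates whether the CDN includes the Origin header in its cache key.

```python
from typing import Dict, Tuple


def origin_server(path: str, request_origin: str) -> Dict[str, str]:
    """Hypothetical back end that echoes the caller's Origin for CORS."""
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Vary": "Origin",
    }


def cache_key(path: str, request_origin: str, honor_vary: bool) -> Tuple[str, ...]:
    # A cache that honors "Vary: Origin" keys on the Origin header as well.
    return (path, request_origin) if honor_vary else (path,)


def fetch_via_cdn(cache: Dict[Tuple[str, ...], Dict[str, str]], path: str,
                  request_origin: str, honor_vary: bool) -> Dict[str, str]:
    key = cache_key(path, request_origin, honor_vary)
    if key not in cache:
        cache[key] = origin_server(path, request_origin)
    return cache[key]


naive: Dict[Tuple[str, ...], Dict[str, str]] = {}
correct: Dict[Tuple[str, ...], Dict[str, str]] = {}

# Site A requests the font first, then site B requests the same URL.
fetch_via_cdn(naive, "/font.woff2", "https://a.example", honor_vary=False)
fetch_via_cdn(correct, "/font.woff2", "https://a.example", honor_vary=True)
naive_for_b = fetch_via_cdn(naive, "/font.woff2", "https://b.example", honor_vary=False)
correct_for_b = fetch_via_cdn(correct, "/font.woff2", "https://b.example", honor_vary=True)
```

The naive cache hands site B the header that was generated for site A, and B's browser rejects the response. Returning * from the origin sidesteps the problem because the same header is then valid for every caller.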

Range

In addition, large files are usually loaded in pieces using the Range header. If the CDN does not fetch from the origin in ranges as well, it will repeatedly request the complete resource from the origin, and the CDN is wasted. Enabling Range back-to-origin reduces the wasted traffic, and with it configured correctly the CDN cache can be hit properly.

Keyless HTTPS

To avoid tampering and hijacking, a CDN must also use HTTPS, but that means handing the certificate and private key over to the CDN platform, which is a security risk.

So there is a keyless solution: the user runs their own private key server, and the CDN asks it to perform the signing. The picture shows Alibaba Cloud's implementation; you only need to deploy the KeyServer and its configuration on your own servers.
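One way to picture Range back-to-origin is a cache that stores fixed-size slices. The sketch below is my own simplification: `SLICE_SIZE` is absurdly small so the numbers are easy to follow, `fetch_range` is a hypothetical origin client that returns an inclusive byte range, and the end of the file is not handled specially.

```python
from typing import Callable, Dict, List

SLICE_SIZE = 4  # bytes per cached slice; tiny on purpose


class SlicingCache:
    """Cache a large file slice by slice instead of as one object."""

    def __init__(self, fetch_range: Callable[[int, int], bytes]) -> None:
        self._fetch_range = fetch_range  # fetches bytes lo..hi from origin
        self._slices: Dict[int, bytes] = {}

    def read(self, start: int, end: int) -> bytes:
        """Serve bytes start..end inclusive, like 'Range: bytes=start-end'."""
        first, last = start // SLICE_SIZE, end // SLICE_SIZE
        chunks = []
        for index in range(first, last + 1):
            if index not in self._slices:  # only missing slices hit origin
                lo = index * SLICE_SIZE
                self._slices[index] = self._fetch_range(lo, lo + SLICE_SIZE - 1)
            chunks.append(self._slices[index])
        data = b"".join(chunks)
        offset = start - first * SLICE_SIZE
        return data[offset:offset + (end - start + 1)]


FILE = b"0123456789abcdef"        # stands in for a large file on the origin
origin_bytes_sent: List[int] = []  # size of every origin response


def fetch_range(lo: int, hi: int) -> bytes:
    body = FILE[lo:hi + 1]
    origin_bytes_sent.append(len(body))
    return body


cdn = SlicingCache(fetch_range)
part1 = cdn.read(2, 5)  # needs slices 0 and 1
part2 = cdn.read(4, 7)  # slice 1 is already cached
```

Two overlapping range requests cost the origin 8 bytes in total here, instead of the full 16-byte file each time.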


When I gave this talk at FCC Shanghai, some attendees wanted to know more about this part. It involves quite a lot of HTTPS cryptography, so I will not expand on it for now. Please take a look at an article published by Cloudflare; I happened to find a translated version: www.zcfy.cc/article/key…

Real-world CDNs


First, failures happen: for example, when a node goes down, the direct result is lost users. Large companies in particular rely on CDNs to manage static resources at scale, so the consequences can be very serious.

Second, you get what you pay for. Back in the days before Alibaba Cloud, CDNs were very expensive. When Alibaba and Tencent brought CDN prices down, quality came down with them, and the poor access quality of some nodes leaves some users with a very bad network experience.

Finally, a bit of popular science: what is a hybrid CDN? The term sounds very high-end, but it simply means using CDNs from several vendors, possibly including our own, and choosing the best one. Sometimes, though, the service becomes hard to control and CDN quality degrades even further.

Conclusion

In essence a CDN is a cache, just one that sits especially close to you. As an ordinary user you can simply enjoy the benefits, but as a developer serving an enterprise you need to weigh not only the advantages of a CDN but also the pitfalls it brings. Only then are you a competent CDN user.