Preface

With the rapid development of the Internet, cache technology has been widely used. When it comes to performance problems, both inside and outside the industry, the phrase “cache it” is a common refrain.

That answer is one-sided, even half-informed; as professionals we need a deeper and broader understanding of caching.

Caching shows up at every layer of an application: from browser requests to reverse proxy servers, from in-process caches to distributed caches. Caching strategies and algorithms keep evolving as well. In this article, let's take a tour through the world of caching.

Main text

Caching is familiar to every developer: we use it to improve application performance. But where should the cache live, and how should it be used?

For a web site that needs better performance, caching can live in the browser, in the reverse proxy server, in the application process, or in a distributed caching system.

From the moment a user requests data to the moment it is returned, the data passes through the browser, the CDN, the proxy server, the application server, and the database. Caching can be applied at every one of these links.

Starting from the browser/client that requests the data, HTTP caching and the CDN can answer requests for data that rarely changes, and the proxy server (Nginx) can serve static resources through the reverse proxy.

Further down, the application server can retrieve data from an in-process (in-heap) cache, a distributed cache, and so on. If none of these caches hit, the request falls back to the source: the database.

The order of cache request is: user request → HTTP cache → CDN cache → proxy server cache → in-process cache → distributed cache → database.

It seems that caching can be added at almost every part of the architecture. Let's look at how caching is applied at each of these layers.

1. HTTP cache

When a user accesses the server through a browser, the request travels over HTTP. Caching these HTTP responses can reduce the pressure on the application server.

On the first request, the browser's local cache holds no data, so the data is fetched from the server and stored in the browser cache. Subsequent requests read from the local cache or go back to the server, depending on the cache policy.

The cache directives travel in HTTP headers. There are two common caching strategies:

  • Mandatory caching

  • Comparison caching

1.1. Mandatory caching

If the browser's local cache already holds the data and that data has not expired, it is used directly. Otherwise, the data has to be fetched from the server again.

This caching mechanism seems straightforward, so how do you determine whether cached data is still valid? There are two fields in the HTTP header to focus on here: Expires and Cache-Control.

Expires is the expiration time returned by the server. When a client first requests a resource, the server returns this expiration time along with it. When the client requests the resource again, it compares the request time with the expiration time.

If the request time is earlier than the expiration time, the cache is still valid and the data in the local cache can be used directly.

Otherwise, the data has expired and must be fetched from the server again; the newly returned expiration time is then stored.

This approach was common in HTTP/1.0 and is replaced by Cache-Control in HTTP/1.1.

Cache-Control has a max-age directive, measured in seconds, that tells the client how long the cached content stays fresh.

For example, suppose max-age is 60 seconds and the local cache is currently empty. After the client's first request, the data is put into the local cache.

If the client sends another request within those 60 seconds, it does not go to the application server but returns the data directly from the local cache. If more than 60 seconds have passed between the two requests, the data must be fetched from the server again.
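
As a concrete illustration, here is a minimal server-side sketch of these headers, assuming the javax.servlet API; the 60-second lifetime and the servlet itself are only for illustration:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ProfileServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Mandatory caching: the browser may reuse this response for 60 seconds
        // without contacting the server again.
        resp.setHeader("Cache-Control", "max-age=60");
        // Expires is the HTTP/1.0 equivalent; Cache-Control wins when both are present.
        resp.setDateHeader("Expires", System.currentTimeMillis() + 60_000L);
        resp.getWriter().write("profile data");
    }
}
```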

1.2. Comparison caching

Comparison caching decides whether to use the cache by comparing cache identifiers. On the first request, the server returns a cache identifier along with the data, and the browser stores both in its local cache. On the next request, the browser sends the stored identifier back to the server.

The server checks the identifier; if the data has not changed, it returns status code 304 to the browser.

The browser can then use the cached data; in that case the server returns only the header, not the body.

There are two identifier rules:

1.2.1. Last-Modified / If-Modified-Since rules

On the client's first request, the server returns the time the resource was last modified in the Last-Modified header. The client caches this field along with the resource.

Once Last-Modified has been saved, it is sent back as the If-Modified-Since field on the next request.

That is, when the client requests the server again, it sends the saved Last-Modified value along with the request for the resource; on the request it is named If-Modified-Since, but the content is the same.

The server receives the request and compares If-Modified-Since with the Last-Modified value it keeps for the resource:

  • If the server's Last-Modified time is later than the request's If-Modified-Since, the resource has been modified, and the full resource (header + body) is returned to the browser with status code 200.

  • If the resource's last modification time is earlier than or equal to If-Modified-Since, the resource has not been modified; only the header is returned, with status code 304. On receiving this, the browser uses the data from its local cache.

Note: Last-Modified and If-Modified-Since carry the same value; they are simply named differently on the server and client sides.
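
A minimal sketch of that server-side check, again assuming the javax.servlet API and a hypothetical getLastModifiedFromStore() lookup:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DocumentServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        long lastModified = getLastModifiedFromStore();               // hypothetical lookup
        long ifModifiedSince = req.getDateHeader("If-Modified-Since"); // -1 if the header is absent

        if (ifModifiedSince != -1 && lastModified <= ifModifiedSince) {
            // Not modified: header only, no body.
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);       // 304
            return;
        }
        // Modified (or first request): return the full body plus the new timestamp.
        resp.setDateHeader("Last-Modified", lastModified);
        resp.getWriter().write("document body");                        // 200 by default
    }

    private long getLastModifiedFromStore() {
        return System.currentTimeMillis() - 60_000L;                    // placeholder value
    }
}
```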

1.2.2. ETag / If-None-Match rules

The server generates an ETag for each resource when the client first requests it. The ETag is a unique Hash string generated for each resource, which changes as the resource changes. The ETag is then returned to the client, which caches both the requested resource and the ETag locally.

Once the ETag is saved, it is sent as the If-None-Match field on the next request.

When the browser requests the same resource again, it sends the resource's ETag to the server; on the request it is carried as If-None-Match, but the content is the same.

When the server receives the request, it compares If-None-Match with the ETag of the resource on the server:

  • If they differ, the resource has been modified; the full resource (header + body) is returned with status code 200.

  • If they match, the resource has not been modified; only the header is returned, with status code 304, and the browser uses the data from its local cache.

Note: ETag and If-None-Match carry the same value; they are simply named differently on the server and client sides.
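
And a corresponding sketch for the ETag path; using an MD5 hash of the body as the tag is just one possible choice, assumed here for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EtagServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String body = loadResource();                     // hypothetical lookup
        String etag = "\"" + md5Hex(body) + "\"";         // changes whenever the body changes

        if (etag.equals(req.getHeader("If-None-Match"))) {
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);   // 304, header only
            return;
        }
        resp.setHeader("ETag", etag);
        resp.getWriter().write(body);                     // 200 with the full body
    }

    private String loadResource() { return "resource content"; }

    private String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```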

2. CDN cache

HTTP caching is used to cache static data from the server to the client/browser.

If a CDN layer is added between the client and the server, the CDN can cache on behalf of the application server. When a request hits the CDN cache, there is no need to go to the application server at all, and the two HTTP caching policies described above can also be applied on the CDN servers.

The full name of CDN is Content Delivery Network.

Let’s see how it works:

  • The client sends the URL to the DNS server.

  • DNS directs requests to the DNS load balancer in the CDN through domain name resolution.

  • The DNS load balancer returns the IP address of the best CDN node to DNS, and DNS passes that IP address on to the client.

  • The client requests the nearest CDN node.

  • The CDN node returns the resources from the application server to the client and caches the static information. Note: the client’s next interaction is with the CDN cache, which can synchronize cached information with the application server.

The CDN accepts requests from clients as the server closest to them; it fronts multiple servers behind it and acts both as a cache and as a load balancer.

3. Load balancing cache

After client (HTTP) caching and CDN caching, we get closer to the application service itself: before reaching it, requests pass through the load balancer.

Although its main job is to balance load across application servers, it can also act as a cache. Infrequently modified data, such as user information or configuration, can be cached here and refreshed periodically by a service.

Taking Nginx as an example, let’s see how it works:

  • Before a user request reaches the application server, it hits the Nginx load balancer; if the requested information is found in the cache, it is returned to the user directly.

  • If no cached information is found, Nginx goes back to the application server to get the information.

  • In addition, there is a cache update service that periodically updates relatively stable information from the application server to the Nginx local cache.

4. In-process cache

Having passed through the client, the CDN, and the load balancer, we finally arrive at the application server. Applications are deployed on the application server and run as processes, so what does caching inside the process look like?

An in-process cache is also called a managed-heap cache: in Java, it lives on the JVM's managed heap and is therefore affected by the garbage-collection algorithm.

Since it lives in memory and serves data very quickly, hot data is usually placed here.

If the in-process cache misses, we go on to an out-of-process cache or a distributed cache. The advantage of the in-process cache is that it is the fastest, with no serialization or deserialization; the disadvantage is that it cannot be too large, because a large heap cache hurts garbage-collector performance.

Popular implementations include Ehcache, Guava Cache, and Caffeine. These frameworks make it easy to put hot data in an in-process cache.
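
As a small illustration, here is a minimal Caffeine sketch; the key format, size limit, expiration time, and loader method are assumptions for illustration, not anything prescribed above:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class UserCache {
    // Hot data lives on the JVM heap: bounded in size and expiring after 5 minutes,
    // so the cache cannot grow without limit and hurt the garbage collector.
    private final LoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .build(this::loadUserFromDatabase);

    public String getUser(String userId) {
        return cache.get(userId);          // loads and caches on a miss
    }

    private String loadUserFromDatabase(String userId) {
        return "user-" + userId;           // placeholder for the real query
    }
}
```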

Here we need to pay attention to the cache eviction strategies. Each implementation may offer different strategies, but the general ideas are the same:

  • FIFO (First In, First Out): the data that was put into the cache first is removed first.

  • LRU (Least Recently Used): removes the data that has gone unused for the longest time (see the sketch after this list).

  • LFU (Least Frequently Used): the data used least frequently within a period of time is removed from the cache.
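
To make LRU concrete, here is the classic LinkedHashMap-based sketch; it is only a toy illustration of the idea, not how Ehcache, Guava Cache, or Caffeine implement eviction internally:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny LRU cache: an access-ordered map that drops the least recently used entry
// once it exceeds the configured capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(capacity, 0.75f, true);   // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;       // evict when we grow past the limit
    }
}
```

A production cache would also need thread safety and careful capacity tuning, which this sketch ignores.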

In today's distributed architectures, each application instance keeps its own in-process cache, which can lead to data consistency problems.

Two options are recommended:

  • Message queue modification scheme

  • Timer modification scheme

4.1. Message queue modification scheme

After an application modifies its own cache and the database, it publishes a data-change notification to the message queue. Other applications subscribe to this notification and update their caches when they receive it.
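
A minimal sketch of this flow, with a hypothetical MessageQueue interface standing in for whatever broker (Kafka, RocketMQ, Redis pub/sub, and so on) a real project would use:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical broker interface, only for illustration.
interface MessageQueue {
    void publish(String topic, String message);
    void subscribe(String topic, Consumer<String> handler);
}

class UserService {
    private final ConcurrentHashMap<String, String> localCache = new ConcurrentHashMap<>();
    private final MessageQueue mq;

    UserService(MessageQueue mq) {
        this.mq = mq;
        // Every node listens for change notifications and drops its stale entry.
        mq.subscribe("user-changed", localCache::remove);
    }

    void updateUser(String userId, String userJson) {
        saveToDatabase(userId, userJson);     // 1. write the database
        localCache.put(userId, userJson);     // 2. refresh this node's cache
        mq.publish("user-changed", userId);   // 3. notify the other nodes
    }

    private void saveToDatabase(String userId, String userJson) { /* omitted */ }
}
```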

4.2. Timer modification scheme

To avoid coupling and reduce complexity, and when "real-time" consistency is not required, each application can start a timer that periodically pulls the latest data from the database and updates the cache.

The downside is that after one application updates the database, other nodes will read stale data until their timers fire. The timer frequency therefore has to be tuned carefully, and this scheme suits scenarios with low real-time requirements.
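
A minimal sketch of the timer approach using a ScheduledExecutorService; the 60-second period and the loadAllConfigFromDatabase() call are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConfigCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Every 60 seconds, pull the latest configuration and overwrite the local copy.
        scheduler.scheduleAtFixedRate(
                () -> cache.putAll(loadAllConfigFromDatabase()),
                0, 60, TimeUnit.SECONDS);
    }

    public String get(String key) {
        return cache.get(key);
    }

    private Map<String, String> loadAllConfigFromDatabase() {
        return Map.of("feature.enabled", "true");   // placeholder for the real query
    }
}
```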

What are the use scenarios for in-process caching?

  • Scenario 1: Read-only data, which can be loaded into memory at process startup. Of course, loading data into an out-of-process caching service such as Redis also solves this problem.

  • Scenario 2: Extremely high concurrency, such as flash-sale (seckill) traffic, where an in-process cache is worth considering.

5. Distributed cache

With in-process caching, the transition to out-of-process caching is natural. Unlike in-process caching, out-of-process caching exists outside of the process in which the application is running, has a larger cache capacity, and can be deployed to different physical nodes, usually as a distributed cache.

Distributed cache is a cache service separated from applications. It is an independent application or service and is isolated from local applications. Multiple applications can directly share one or more cache applications or services.

Since it is a distributed cache, the cached data is distributed among different cache nodes, and the size of the data cached by each cache node is usually limited.

Data is cached to different nodes. To facilitate access to these nodes, you need to introduce a caching proxy, similar to Twemproxy. It helps the request find the corresponding cache node.

And if cache nodes are added, the proxy only needs to recognize them and shard new cache data onto the new nodes, enabling horizontal scaling.

To improve cache availability, a Master/Slave design is added to the existing cache nodes. When cached data is written to the Master node, a copy is synchronized to the Slave node.

If the Master node fails, you can directly switch to the Slave node through the proxy. Then the Slave node becomes the Master node to ensure the normal operation of the cache.

Each cache node also provides a cache-expiration mechanism and periodically saves its contents to a snapshot file, which makes it easy to warm up the cache when restarting after a crash.

5.1. High performance

When the cache is distributed, data is allocated to each cache application/service according to certain rules.

If we call these cache applications/services cache nodes, each node can generally hold only a certain amount of data; for example, a Redis node might cache 2 GB of data.

If a large amount of data needs to be cached, it has to be spread across multiple cache nodes. With that many nodes, how does a client request know which node to access, and how is cached data distributed over them?

Cache proxy services solve these problems for us. Twemproxy, for example, not only routes requests to caches but also manages the cache nodes.

Here are three algorithms for sharding cached data so that the cache proxy can locate it easily.

5.1.1. Hashing algorithm

Hash tables are one of the most common data structures. The idea here is to hash the key of each data record, take the result modulo the number of cache nodes, and use the remainder to decide which node the record goes to.

For example, take three records R1, R2, and R3 with IDs 01, 02, and 03 respectively, and assume that hashing these IDs as keys still yields 01, 02, and 03.

To place these three records on three cache nodes, we take each result modulo 3; the remainder identifies the cache node where the record is stored.

The hash approach spreads data fairly evenly and the strategy is simple, but if you add a cache node, most of the existing data has to be remapped.
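
In code, the placement rule boils down to a hash plus a modulo, as in this minimal sketch (real deployments use stronger hash functions than String.hashCode()):

```java
public class ModuloSharding {
    // Maps a record's key to one of nodeCount cache nodes.
    static int nodeIndex(String key, int nodeCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"01", "02", "03"}) {
            System.out.println("record " + id + " -> node " + nodeIndex(id, 3));
        }
    }
}
```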

5.1.2. Consistent hashing algorithm

Consistent hashing maps data, by some characteristic value, onto a hash ring whose two ends are joined; the cache nodes are mapped onto the same ring.

To cache a piece of data, you find its position on the ring from its key: data is placed on the ring according to the hash of its own ID, and each item is served by the first cache node found clockwise from its position.

If a new piece of data with ID 115 were inserted at this point, it would simply be placed at the corresponding position on the ring.

Similarly, to add a cache node N4 at position 150, you just place it at the corresponding position on the ring; only the data between N4 and its neighbouring node needs to move to it.

This algorithm has relatively low overhead for adding cached data and cached nodes.
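
Here is a minimal consistent-hash ring built on a TreeMap; using CRC32 as the hash and omitting virtual nodes are simplifications, since production rings usually map each physical node to many virtual positions to even out the distribution:

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node)    { ring.put(hash(node), node); }
    public void removeNode(String node) { ring.remove(hash(node)); }

    // Walk clockwise (towards larger hash values) from the key's position;
    // wrap around to the first node if we fall off the end of the ring.
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```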

5.1.3. Range-based algorithm

In this approach, the data is divided into different intervals based on key values (such as ID), and each cache node is responsible for one or more intervals. It’s a little bit like consistent hashing.

For example, there are three cache nodes N1, N2, and N3, responsible for the intervals N1 (0, 100), N2 (100, 200), and N3 (300, 400) respectively.

Data is then placed into one of these intervals according to the value obtained by hashing its ID as the key.
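
A minimal sketch of range-based placement, keyed by each interval's lower bound; the contiguous 0/100/200 boundaries below are illustrative rather than taken from the example above:

```java
import java.util.TreeMap;

public class RangeSharding {
    // Each entry maps an interval's lower bound to the node responsible for it.
    private final TreeMap<Integer, String> ranges = new TreeMap<>();

    public RangeSharding() {
        ranges.put(0, "N1");     // N1 handles ids in [0, 100)
        ranges.put(100, "N2");   // N2 handles ids in [100, 200)
        ranges.put(200, "N3");   // N3 handles ids in [200, 300)
    }

    public String nodeFor(int id) {
        // floorEntry finds the interval whose lower bound is closest below the id.
        return ranges.floorEntry(id).getValue();
    }
}
```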

5.2. Availability

Every coin has two sides: while distributed caching brings high performance, we also need to pay attention to its availability. What potential risks do we need to guard against?

5.2.1. Cache avalanche

Cache entries expire, expired entries get cleared, and caches get updated; while any of this is happening, requests miss the cache and go straight to the database.

If the above situations occur frequently or simultaneously, a large number of requests will be sent directly to the database, resulting in database access bottlenecks. We call this a cache avalanche.

We can think about solutions from two angles:

On the caching side:

  • To avoid many keys expiring at the same time, set different expiration times for different keys (see the sketch after this list).

  • Add a mutex so that cache rebuilds are protected by a lock and only one thread updates the cache at a time.

  • Keep cache snapshots so that the cache can be rebuilt quickly after a failure.

  • Add an active/standby mechanism to the cache nodes, switching to the standby cache to keep working when the primary cache fails.
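
For the first point, spreading out expirations can be as simple as adding random jitter to each key's TTL when it is written; in this sketch the CacheClient interface is hypothetical and stands in for whatever Redis or Memcached client the project actually uses:

```java
import java.util.concurrent.ThreadLocalRandom;

public class JitteredWrite {
    private static final int BASE_TTL_SECONDS = 600;

    // Write a key with a 600-second base TTL plus up to 120 seconds of random jitter,
    // so keys written in the same burst do not all expire at the same moment.
    public static void put(CacheClient cacheClient, String key, String value) {
        int ttl = BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(120);
        cacheClient.setex(key, ttl, value);
    }

    // Hypothetical client interface; a real project would use Jedis, Lettuce, etc.
    public interface CacheClient {
        void setex(String key, int seconds, String value);
    }
}
```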

In terms of design, here are some suggestions for your reference:

  • Circuit-breaker mechanism: when a cache node stops working, the cache proxy must be told not to route requests to that node, which reduces user waiting and request times.

  • Rate-limiting mechanism: rate limiting can be applied at the access layer and the proxy layer. When the cache service cannot sustain the concurrency, the front end can queue requests or discard those it cannot serve.

  • Isolation mechanism: when a cache node is unavailable or still warming up, its requests are queued rather than routed to other cache nodes; because the node is isolated, its problems do not affect the other nodes. Once the cache is rebuilt, the queued requests are taken off the queue and processed in order.

5.2.2. Cache penetration

Caches usually store data as key-value pairs. If the value for a requested key does not exist, the request falls through to the database.

If the value never exists, every such request hits the database, putting it under pressure; and if someone deliberately exploits this gap with a flood of requests for non-existent keys, it becomes a real problem.

Solution 1: if a query for a key returns an empty value, cache the empty result anyway. As long as the value stays empty, subsequent queries will not hit the database.

Solution 2: hash all keys that could possibly exist into a sufficiently large bitmap (a Bloom filter); requests for non-existent keys are then intercepted by the filter, sparing the database (see the sketch below).
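
The two ideas can be combined. Here is a minimal sketch using Guava's BloomFilter; the in-process map, the queryDatabase() call, and the way existing IDs are pre-loaded are all assumptions for illustration:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class UserLookup {
    // Holds every id known to exist; requests for ids it rejects never reach the database.
    private final BloomFilter<String> knownIds =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    public UserLookup(List<String> allExistingIds) {
        allExistingIds.forEach(knownIds::put);    // pre-load the filter, e.g. at startup
    }

    public String getUser(String id) {
        if (!knownIds.mightContain(id)) {
            return null;                           // definitely absent: no cache or DB lookup
        }
        return cache.computeIfAbsent(id, key -> {
            String value = queryDatabase(key);     // may still be null on a false positive
            return value == null ? "" : value;     // cache the empty result too
        });
    }

    private String queryDatabase(String id) {
        return "user-" + id;                       // placeholder for the real query
    }
}
```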

5.2.3. Cache breakdown

Cache breakdown happens when a piece of cached data expires, or is in the middle of being rewritten, at exactly the moment it is receiving a very large number of concurrent requests, i.e. it is “hot” data.

This is the cache breakdown problem. The difference from a cache avalanche is that breakdown concerns a single cache entry, while an avalanche concerns many entries at once.

Solution: the root of the problem is that the cache is being read and rebuilt concurrently, so ensure that only one thread writes at a time; once it has written, the other requests can simply read from the cache.

A common approach is to use a mutex: when the cache entry expires, the cache is not rebuilt by every request at once; instead a mutex is acquired first, and once the cache has been written the lock is released so the other requests can read it (see the sketch below).
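
A minimal in-process sketch of that pattern using double-checked locking; a distributed deployment would use a distributed lock (for example one built on Redis SETNX) rather than a local ReentrantLock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class HotKeyCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ReentrantLock rebuildLock = new ReentrantLock();

    public String get(String key) {
        String value = cache.get(key);
        if (value != null) return value;

        rebuildLock.lock();                  // only one thread rebuilds the entry
        try {
            value = cache.get(key);          // double-check: another thread may have filled it
            if (value == null) {
                value = loadFromDatabase(key);   // hypothetical slow query
                cache.put(key, value);
            }
            return value;
        } finally {
            rebuildLock.unlock();
        }
    }

    private String loadFromDatabase(String key) {
        return "value-for-" + key;           // placeholder for the real query
    }
}
```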

Summary

To summarize, there are five strategies for cache design, starting with user requests:

  • HTTP cache

  • CDN cache

  • Load balancing cache

  • In-process cache

  • Distributed cache

The first two cache static data and the last three cache dynamic data:

  • HTTP caching includes mandatory caching and comparison caching.

  • CDN caching and HTTP caching are good partners.

  • The load balancer caches relatively stable resources, with a supporting service to refresh them.

  • In-process caching is efficient, but capacity is limited. There are two solutions to the cache synchronization problem.

  • Distributed caches are big and powerful; keep the three sharding algorithms in mind and guard against the three cache risks.
