Conventional I/O devices cannot keep up with the massive read and write requests of Internet applications. Hence the rise of caches, which exploit the high-speed read and write performance of memory to absorb massive query loads. Memory, however, is a scarce resource, and storing the full dataset in it is impractical. Memory and I/O are therefore combined: memory holds the hotspot data, while I/O devices hold the full dataset. Cache design involves many subtleties, and getting it wrong can have serious consequences. This article introduces three common problems with cache usage and provides corresponding solutions.

1. Cache penetration

In most Internet applications, caching is used as follows (see the sketch after the list):

  1. When the business system initiates a query request, it first checks whether the data exists in the cache.
  2. If it exists in the cache, the data is returned directly.
  3. If it does not exist in the cache, the database is queried and the result is returned.
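
A minimal sketch of this read path in Java, assuming an in-memory map stands in for a real cache (such as Redis) and a caller-supplied loader stands in for the database query; all names here are illustrative, not from a specific library:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loadFromDb; // stand-in for the real database query

    public CacheAside(Function<String, String> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    public String query(String key) {
        String value = cache.get(key);  // 1. check the cache first
        if (value != null) {
            return value;               // 2. cache hit: return directly
        }
        value = loadFromDb.apply(key);  // 3. cache miss: query the database
        if (value != null) {
            cache.put(key, value);      // populate the cache for next time
        }
        return value;
    }
}
```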

With that in mind, let’s talk about cache penetration.

1.1 What is Cache penetration?

Suppose the business system queries data that simply does not exist. When the query request arrives, the system first searches the cache; since the data is not in the cache, it then searches the database. Because the data does not exist there either, the database returns empty. This is cache penetration.

To sum up: a business system accessing data that does not exist at all is called cache penetration.

1.2 Hazards of Cache penetration

If there are massive requests querying data that does not exist at all, those requests will all fall on the database. The pressure on the database rises sharply and the system may crash. (Keep in mind that the database is the most fragile link in a business system; being I/O-bound, it collapses under even modest pressure, which is why we go to such lengths to protect it.)

1.3 Why does cache penetration occur?

There are many reasons for cache penetration, generally as follows:

  1. Malicious attacks: an attacker deliberately issues a large number of requests for non-existent data. Since none of this data is in the cache, the massive requests all fall on the database, which may crash it.
  2. Code logic errors: this one is squarely the programmer's fault; there is nothing more to say, it must be avoided during development!

1.4 Cache penetration Solution

Here are two ways to prevent cache penetration.

1.4.1 Caching null data

Cache penetration occurs because the cache holds no key for the empty data, so every request for it hits the database.

We can therefore modify the business system slightly: if a database query comes back empty, store the key in the cache with an empty value. When a subsequent query for that key arrives, the cache returns NULL directly without touching the database.
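
A sketch of this fix, extending the cache-aside example above; the placeholder value and the idea of a short expiry are illustrative choices, not a prescribed API:

```java
// Sentinel stored in the cache to mark "this key has no data in the database".
private static final String NULL_PLACEHOLDER = "__NULL__";

public String queryWithNullCaching(String key) {
    String value = cache.get(key);
    if (value != null) {
        // A cached placeholder means the database was already asked and had nothing.
        return NULL_PLACEHOLDER.equals(value) ? null : value;
    }
    value = loadFromDb.apply(key);
    if (value == null) {
        // Cache the empty result so repeated queries for this key no longer
        // reach the database. In a real cache, give it a short expiration time.
        cache.put(key, NULL_PLACEHOLDER);
        return null;
    }
    cache.put(key, value);
    return value;
}
```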

1.4.2 BloomFilter

The second way to avoid cache penetration is to use BloomFilter.

The idea is to add a barrier in front of the cache that records all the keys currently in the database, as shown in the following figure:

When the business system receives a query request, it first checks whether the key exists in the BloomFilter. If not, the data definitely does not exist in the database, so the system returns NULL directly without even checking the cache. If the key may exist (a BloomFilter can return false positives, but never false negatives), the normal flow continues: query the cache first, then fall back to the database on a miss.
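
As a sketch, here is what such a barrier could look like using Guava's BloomFilter; the sizing numbers are illustrative:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomGuard {
    private final BloomFilter<String> keys = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8),
            1_000_000,  // expected number of keys in the database
            0.01);      // acceptable false-positive probability

    // Call whenever a key is inserted into the database.
    public void register(String key) {
        keys.put(key);
    }

    // false means the key is definitely absent from the database; true may
    // rarely be a false positive, so the normal cache-then-database lookup
    // still runs afterwards.
    public boolean mayExist(String key) {
        return keys.mightContain(key);
    }
}
```

One caveat: a standard Bloom filter does not support deleting elements, so if keys can be removed from the database, the filter has to be rebuilt periodically.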

1.4.3 Comparison of the two schemes

Both solutions solve the problem of cache penetration, but the usage scenarios are different.

In a malicious attack, the query keys are typically all different, and there are vast numbers of them. The first scheme fares poorly here: it must cache a key for every piece of empty data, yet each attack key is usually requested only once, so caching a key for empty data offers no protection because that key is never queried a second time, while the cache fills up with useless entries. The second scheme is therefore the right choice when the keys of the empty data vary widely and the probability of repeat requests is low. Conversely, when the number of empty-data keys is limited and the probability of repeat requests is high, the first scheme should be chosen.

2. Cache avalanche

2.1 What is cache avalanche?

As we saw above, the cache effectively shields the database: it absorbs large numbers of query requests, keeping the fragile database out of harm's way.

If the cache goes down for some reason, the flood of query requests it would otherwise have fended off descends on the database like a pack of mad dogs. If the database cannot withstand this tremendous pressure, it collapses.

This is cache avalanche.

2.2 How to avoid cache avalanche?

2.2.1 Use a Cache cluster to ensure high cache availability

That is, take precautions before an avalanche happens so it cannot occur in the first place. PS: Distributed high availability is not the focus of today's discussion; articles on high availability will follow, so stay tuned.

2.2.2 Use Hystrix

Hystrix is an open-source “anti-avalanche tool” that reduces avalanche damage in three ways: circuit breaking, degradation, and rate limiting.

Hystrix is a Java class library. It adopts a command pattern, with a separate processor for each service that handles requests, and every request goes through its respective processor. The processor records the current service’s request failure rate. Once Hystrix finds that the failure rate has reached a preset value, it rejects all subsequent requests to that service and returns a preset result directly. This is called a “circuit breaker”. After a period of time, Hystrix lets a portion of requests through to the service again and recalculates the failure rate. If the failure rate now meets the preset value, the traffic switch is fully reopened; if it remains high, all requests continue to be rejected. This is called “rate limiting”. For rejected requests, Hystrix directly returns a preset default result, which is called “degradation”.
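
As a minimal sketch of this command model, here is a protected call wrapped in a HystrixCommand; the group name and the queryDatabase() helper are hypothetical stand-ins for the real service call:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class QueryCommand extends HystrixCommand<String> {
    private final String key;

    public QueryCommand(String key) {
        super(HystrixCommandGroupKey.Factory.asKey("QueryService"));
        this.key = key;
    }

    @Override
    protected String run() {
        return queryDatabase(key);  // the protected database/remote call
    }

    @Override
    protected String getFallback() {
        return null;                // degraded default for rejected/failed calls
    }

    private String queryDatabase(String key) {
        return "value-for-" + key;  // placeholder for the real query
    }
}
```

A caller runs it with `new QueryCommand("some-key").execute()`; when the circuit is open or run() fails, Hystrix returns the result of getFallback() instead.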

For a more detailed introduction to Hystrix, see: https://segmentfault.com/a/1190000005988895

3. Concentrated expiry of hotspot data

3.1 What is concentrated hotspot data expiry?

We generally set an expiration time on cached entries. Once the expiration time passes, the entry is deleted from the cache, which ensures the freshness of the data to a certain extent.

However, for hotspot data with a high volume of requests, the moment the cached entry expires, a large number of requests fall on the database at once, which may crash it. The process is shown in the figure below:

When a piece of hotspot data expires, the next query request for it goes to the database. However, between the moment that request is sent to the database and the moment the result is written back to the cache, every query request arriving in that window also falls on the database, because the data is still missing from the cache. This puts enormous strain on the database. Moreover, the cache is redundantly updated over and over as each of these queries completes.

3.2 Solutions

3.2.1 Mutex

We can use the cache’s locking mechanism: when the first query request misses, it takes a lock on that key in the cache. Other query requests arriving at the cache then cannot read the entry and block, waiting. When the first request finishes the database query and writes the fresh value to the cache, it releases the lock; the blocked query requests can then read the data directly from the cache.
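
A minimal in-process sketch of this idea; in a distributed deployment the lock would live in the cache itself (for example via Redis’s SETNX), and the map-based cache and loadFromDb() here are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MutexCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, Object> locks = new ConcurrentHashMap<>();

    public String query(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            value = cache.get(key);   // re-check: another thread may have
            if (value != null) {      // rebuilt the entry while we waited
                return value;
            }
            value = loadFromDb(key);  // only one thread reaches the database
            if (value != null) {
                cache.put(key, value);
            }
            return value;
        }
    }

    private String loadFromDb(String key) {
        return "value-for-" + key;    // placeholder for the real query
    }
}
```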

When a piece of hotspot data expires, only the first query request reaches the database; all other query requests block, which protects the database. However, because of the mutex, those requests wait, and system throughput drops. This tradeoff must be weighed against the actual business.

The mutex prevents the database from crashing when a single piece of hotspot data expires. In real services, though, a whole batch of hotspot data may expire at the same time. How do we prevent database overload in that scenario?

3.2.2 Set staggered expiry times

When storing these entries in the cache, we can stagger their expiration times so they do not all expire at once. For example, add a random offset to a base expiration time so the entries expire at different moments.
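
A sketch of such jitter, with illustrative numbers:

```java
import java.util.concurrent.ThreadLocalRandom;

// Base expiry of one hour, plus a random offset of up to ±5 minutes, so that
// entries written together do not expire together.
long baseTtlSeconds = 3600;
long jitterSeconds = ThreadLocalRandom.current().nextLong(-300, 301);
long ttlSeconds = baseTtlSeconds + jitterSeconds;
// e.g. with a Redis client: cache.setex(key, ttlSeconds, value);
```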