1. Cache avalanche:

1. What is cache avalanche?

If a large number of keys expire at the same moment, a large number of requests fall through to the database and put it under huge pressure. Under high concurrency, the database may go down almost instantly. If operations staff then restart the database right away, the new traffic immediately overwhelms it again. This is a cache avalanche.

2. Problem Analysis:

The core of a cache avalanche is that a massive number of keys become invalid at the same time. There are two main causes: first, Redis goes down; second, many keys were given the same expiration time. Now that we know the causes, what are the solutions?

3. Solutions:

(1) Beforehand:

① Staggered expiration: set different expiration times so that cache expirations are spread as evenly as possible over time. This avoids a cache avalanche caused by many keys expiring at once and sending a large number of requests to the database.

② Multi-level cache: when the first-level cache misses, fall back to the second-level cache, and give each cache level a different expiration time.

③ Never let hot data expire in the cache.

"Never expire" actually has two meanings:

  • Physical non-expiration: no expiration time is set on the hot key

  • Logical expiration: store the expiration time in the value corresponding to the key. If the value is found to be about to expire, rebuild the cache through an asynchronous background thread

④ Ensure the high availability of the Redis cache to prevent a cache avalanche caused by Redis downtime. You can use master-replica replication with Sentinel, or Redis Cluster, to avoid a total Redis outage.
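
The idea of giving keys different expiration times can be sketched as adding a random offset to a base TTL. This is a minimal illustration, not a definitive implementation: the function name `jittered_ttl` and the 30-minute base with a 5-minute spread are assumptions chosen for the example.

```python
import random


def jittered_ttl(base_seconds, spread_seconds, rng=random):
    """Return base_seconds plus a random offset in [0, spread_seconds]."""
    return base_seconds + rng.randint(0, spread_seconds)


# 1000 keys written in the same batch get TTLs spread over 30 to 35 minutes
# instead of all expiring in the same second.
ttls = [jittered_ttl(1800, 300) for _ in range(1000)]

# With a real client the result would be passed as the key's expiry,
# for example redis-py's set(name, value, ex=ttl).
```

A wider spread flattens the expiration peak further, at the cost of some keys living noticeably longer than others.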

(2) During the incident:

① Mutex: after a cache entry becomes invalid, use a mutex or a queue to control the number of threads that read the data and write the cache. For example, only one thread is allowed to query the data and write the cache for a given key, while the other threads wait. This approach blocks the other threads, so system throughput decreases.

② Use circuit breaking, rate limiting, and degradation. When traffic reaches a certain threshold, return a message such as "System congestion" directly, so that excessive requests do not hit the database and bring it down. In this way, at least some users can use the system normally, and the remaining users can still get a result after refreshing a few times.
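
The mutex approach can be sketched with one lock per key and a second cache check after the lock is acquired. This is a single-process illustration under stated assumptions: the `cache` dict stands in for Redis, and `load_from_db` is a hypothetical database query. With several service instances, an in-process lock is not enough and a distributed lock would be needed instead.

```python
import threading

cache = {}                       # stands in for Redis in this sketch
rebuild_locks = {}               # one rebuild lock per cache key
locks_guard = threading.Lock()   # protects the rebuild_locks dict itself
db_calls = 0


def load_from_db(key):
    """Hypothetical database query; counts how often it runs.

    It is only called while the per-key lock is held, so the counter
    is safe for this single-key demo.
    """
    global db_calls
    db_calls += 1
    return f"value-for-{key}"


def get(key):
    value = cache.get(key)
    if value is not None:
        return value
    with locks_guard:
        lock = rebuild_locks.setdefault(key, threading.Lock())
    with lock:
        # Double-check: another thread may have rebuilt the entry while we waited.
        value = cache.get(key)
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value


# 20 concurrent readers of the same missing key trigger a single database query.
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The waiting threads are blocked for the duration of the rebuild, which is the throughput cost the text mentions.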

(3) Afterwards:

① Enable the Redis persistence mechanism so that cached data can be recovered as quickly as possible. After a restart, Redis automatically loads the data from disk and restores it in memory.

 

2. Cache breakdown:

1. What is cache breakdown?

A cache breakdown is similar to a cache avalanche, but a cache avalanche is a large-scale key failure, while a cache breakdown is the failure of a single hot key. When that key expires, the large number of concurrent requests for it miss the cache and go straight to the database, causing a sharp increase in database pressure. This phenomenon is called cache breakdown.

2. Problem Analysis:

A hot key expires, and a large number of concurrent requests reach the database. We can therefore approach the problem from two sides: first, consider not setting an expiration time on the hot key; second, consider reducing the number of requests that reach the database.

3. Solutions:

(1) After a cache entry becomes invalid, use a mutex or a queue to control the number of threads that read the data and write the cache. For example, only one thread is allowed to query the data and write the cache for a given key, while the other threads wait. This approach blocks the other threads, so system throughput decreases.

(2) Never let hot data expire in the cache.

"Never expire" actually has two meanings:

  • Physical non-expiration: no expiration time is set on the hot key

  • Logical expiration: store the expiration time in the value corresponding to the key. If the value is found to be about to expire, rebuild the cache through an asynchronous background thread
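
Logical expiration can be sketched as storing the expiry timestamp next to the value and letting a background thread rebuild the entry while readers keep receiving the old value. This is a minimal sketch: the `cache` dict stands in for Redis entries that have no TTL, `load_from_db` is a hypothetical query, and for simplicity the refresh starts once the logical expiry has passed instead of shortly before it.

```python
import threading
import time

cache = {}              # key -> (value, logical_expire_at); no physical TTL
refreshing = set()      # keys with a rebuild already in progress
refreshing_guard = threading.Lock()
refresh_threads = {}    # kept only so that callers (and tests) can join a rebuild


def load_from_db(key):
    """Hypothetical database query."""
    return f"fresh-{key}"


def put(key, value, ttl_seconds):
    cache[key] = (value, time.time() + ttl_seconds)


def _refresh(key, ttl_seconds):
    try:
        put(key, load_from_db(key), ttl_seconds)
    finally:
        with refreshing_guard:
            refreshing.discard(key)


def get(key, ttl_seconds=60.0):
    value, expire_at = cache[key]   # hot keys are assumed to be pre-loaded
    if time.time() >= expire_at:
        with refreshing_guard:
            start = key not in refreshing
            if start:
                refreshing.add(key)
        if start:
            worker = threading.Thread(target=_refresh, args=(key, ttl_seconds))
            refresh_threads[key] = worker
            worker.start()
    return value                    # possibly stale, but never a cache miss


put("hot", "stale-hot", ttl_seconds=-1)   # already logically expired
first = get("hot")                        # returns the stale value, starts a rebuild
refresh_threads["hot"].join()
second = get("hot")                       # returns the rebuilt value
```

The trade-off is that readers may briefly see stale data, in exchange for never blocking on a rebuild.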

 

3. Cache penetration:

1. What is cache penetration?

Cache penetration means that the data requested by the user exists neither in the cache nor in the database. Every such request misses the cache and therefore has to query the database, which also returns nothing. If a malicious attacker constantly requests data that does not exist in the system, a large number of requests fall on the database in a short time, putting excessive pressure on it and possibly even bringing it down.

2. Problem Analysis:

The crux of cache penetration is that the requested key cannot be found in Redis. The fundamental difference between cache penetration and cache breakdown is that here the key passed in does not exist in Redis at all, whereas in a breakdown the key existed but has expired. If an attacker sends a large number of nonexistent keys, the resulting flood of database queries can be fatal. In day-to-day development, therefore, validate request parameters carefully and return an error directly for illegal parameters and for keys that cannot possibly exist.

3. Solutions:

(1) Store invalid keys in Redis:

If the data is found neither in Redis nor in the database, we save the key to Redis with value="null" and a very short expiration time. When another request for this key arrives, we return null directly without querying the database. This approach has a weakness, however: if the nonexistent keys are random and different every time, saving them in Redis achieves nothing.

(2) Use bloom filter:

If the Bloom filter reports that a key is not present, the key definitely does not exist. If it reports that a key is present, the key probably exists (there is a certain false-positive rate). So we can place a Bloom filter in front of the cache and store every key from the database in it. Before querying Redis, check the Bloom filter for the key; if the key is not present, return immediately and do not let the request reach the database. This avoids query pressure on the underlying storage system.
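
To make the "definitely absent / probably present" behavior concrete, here is a small self-contained Bloom filter sketch. The class, its default sizes, and the `lookup` helper are illustrative assumptions, not a production design; in practice an existing Bloom filter library or Redis module would normally be used.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Load every key that exists in the database into the filter.
known_keys = BloomFilter()
for user_id in range(1000):
    known_keys.add(f"user:{user_id}")


def lookup(key):
    if not known_keys.might_contain(key):
        return None                  # definitely absent: skip Redis and the database
    return f"value-for-{key}"        # placeholder for the normal cache-then-database path
```

Keys that were added are never rejected; a small fraction of keys that were never added may still pass the filter and reach the cache and database.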

How to choose: in some malicious attacks the keys are mostly random, so the first scheme would end up caching placeholder entries for a huge number of nonexistent keys. It is not suitable in that case; use the Bloom filter scheme to screen out those keys first. In short, prefer the second scheme when there are many abnormal keys with a low repetition rate, and prefer the first scheme when the set of empty keys is limited and they are requested repeatedly.

 

4. Cache preheating:

1. What is cache preheating?

Cache preheating means loading the relevant data into the cache system in advance when the system goes online. This avoids the situation where user requests first have to query the database and only then populate the cache; users query the preheated cache data directly.

Without preheating, Redis starts out empty. In the early period after the system goes online, highly concurrent traffic goes straight to the database and puts it under pressure.

2. Cache preheating solution:

(1) When the amount of data is small, load it into the cache when the project starts;

(2) When the amount of data is large, use a scheduled task script to refresh the cache;

(3) When the amount of data is too large to load in full, give priority to loading hotspot data into the cache in advance.
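
Prioritizing hotspot data during preheating can be sketched as loading only the most frequently accessed keys. The `access_counts` statistics, the `preheat` function, and the loader passed to it are hypothetical; where the hotness data comes from depends on the system.

```python
import heapq


def preheat(cache, load_from_db, access_counts, max_entries):
    """Load the max_entries most frequently accessed keys into the cache."""
    hottest = heapq.nlargest(max_entries, access_counts, key=access_counts.get)
    for key in hottest:
        cache[key] = load_from_db(key)
    return hottest


cache = {}   # stands in for Redis
access_counts = {"item:1": 950, "item:2": 12, "item:3": 480, "item:4": 3}
warmed = preheat(cache, lambda key: f"row-for-{key}", access_counts, max_entries=2)
```

The same function could be called once at startup for small data sets or from a scheduled task for larger ones.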

 

5. Cache degradation:

Cache degradation means that when the cache fails or the cache server goes down, the service does not fall back to the database; instead, it returns default data or reads from the service's own in-memory data. Degradation is generally a lossy operation, so its impact on the business should be kept as small as possible.

In practice, it is common to also keep some hot data in the service's own memory. Then, if the cache becomes abnormal, the service can use its in-memory data directly and avoid putting huge pressure on the database.
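
A minimal sketch of this fallback is shown here. `FlakyRemoteCache` and `CacheUnavailable` are hypothetical stand-ins for a real cache client and its connection error, and `local_hot_data` and `DEFAULT_VALUE` are assumed names for the in-memory copy and the default response.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache server is down."""


class FlakyRemoteCache:
    """Stand-in for a remote cache client that can be switched off."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise CacheUnavailable(key)
        return self.data.get(key)


local_hot_data = {"banner": "local-banner"}   # hot data kept in service memory
DEFAULT_VALUE = "default"


def get_with_degradation(remote, key):
    try:
        return remote.get(key)      # normal path (a miss returns None)
    except CacheUnavailable:
        # Degraded path: do not query the database; serve local or default data.
        return local_hot_data.get(key, DEFAULT_VALUE)


remote = FlakyRemoteCache()
remote.data["banner"] = "remote-banner"
healthy = get_with_degradation(remote, "banner")     # served by the remote cache
remote.available = False
degraded = get_with_degradation(remote, "banner")    # served from service memory
fallback = get_with_degradation(remote, "unknown")   # served as the default value
```

The local copy may be older than the remote cache, which is part of the loss that degradation accepts.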

 

  • Reproduced from: blog.csdn.net/a745233700/…