Redis cache penetration, cache breakdown, and cache avalanche: causes and solutions

  • Cache penetration: the data for the key does not exist in the data source at all. Every request for that key misses the cache and is passed through to the data source, which can overwhelm it. For example, a lookup by a non-existent user ID finds nothing in either the cache or the database; if an attacker exploits this hole, the flood of requests may crush the database.
  • Cache breakdown: the data for the key exists, but its entry has expired in Redis. If a large number of concurrent requests arrive at that moment, they all load the data from the back-end DB and write it back to the cache, and that burst of concurrent loads can overwhelm the back-end DB in an instant.
  • Cache avalanche: a cache server restarts, or a large number of keys expire at the same point in time, so a wave of requests lands on the back-end system (such as the DB) all at once.

Cache penetration solution

Some data simply does not exist and therefore can never be found in the cache. Since the cache is populated passively on misses, and since, for fault-tolerance reasons, a value that cannot be found in the storage layer is not written back to the cache, every request for such data goes all the way to the storage layer, and the cache loses its purpose.

There are several effective ways to solve the cache penetration problem. The most common is a Bloom filter: hash all possibly existing keys into a sufficiently large bitmap, so that a key that definitely does not exist is filtered out by the bitmap, sparing the underlying storage system the query pressure. A cruder approach (the one we took) is to cache the empty result even when a query returns nothing (whether the data does not exist or the system failed), but with a short expiration time of no more than five minutes.
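As a concrete sketch of the Bloom-filter approach, here is a minimal example using Guava's BloomFilter, assuming the set of valid user IDs can be preloaded at startup (the constructor argument, the loadUserFromDB stub, and the in-process cache field are all illustrative):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UserLookup {
    // sized for 1,000,000 IDs with a 1% false-positive rate
    private final BloomFilter<String> validIds = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public UserLookup(Iterable<String> allValidUserIds) {
        // preload every ID known to exist in the data source
        for (String id : allValidUserIds) {
            validIds.put(id);
        }
    }

    public String getUser(String id) {
        // a "no" from the filter is definite: the ID cannot exist,
        // so reject it before touching the cache or the database
        if (!validIds.mightContain(id)) {
            return null;
        }
        return cache.computeIfAbsent(id, this::loadUserFromDB);
    }

    private String loadUserFromDB(String id) {
        return "user:" + id;  // illustrative stub for the real query
    }
}

A false positive (about 1% here) only costs an occasional wasted DB lookup; the filter never returns a false negative, which is what makes the early reject safe.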

Pseudocode for the cruder, cache-the-empty-result approach:

public object GetProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";

    String cacheValue = CacheHelper.Get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    }
    cacheValue = GetProductListFromDB();
    if (cacheValue == null) {
        // the DB returned nothing: cache a default empty value as well,
        // so repeated requests for the missing data stop hitting the DB
        cacheValue = string.Empty;
    }
    CacheHelper.Add(cacheKey, cacheValue, cacheTime);
    return cacheValue;
}
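The short expiration on the cached empty value is what keeps this approach tolerable: if the missing data is created later, callers see the stale "empty" answer only until the entry expires (at most five minutes here), after which the next miss reloads it from the DB.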

Cache breakdown solution

A key may be hit with very high concurrency at certain points in time; it is a very "hot" piece of data. When such a key expires, there is a problem to consider: cache breakdown.

Use a mutex key

A common industry practice is to use a mutex. In short, when the cache misses because the value has expired, instead of loading from the DB immediately, first try to set a mutex key with an operation whose return value signals success (such as Redis's SETNX or Memcache's ADD). If the set succeeds, load the DB and repopulate the cache; otherwise, retry the whole get-from-cache method.

SETNX stands for "SET if Not eXists", and it can be used to implement a lock.

public String get(String key) {
    String value = redis.get(key);
    if (value == null) {  // the cached value has expired
        // set a 3-minute timeout on the mutex so the lock cannot be
        // held forever if its holder crashes before deleting it
        if (redis.setnx(key_mutex, 1, 3 * 60) == 1) {  // we got the lock
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
        } else {
            // another thread is rebuilding the cache:
            // back off briefly, then retry the whole method
            sleep(50);
            return get(key);
        }
    }
    return value;
}
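One detail worth noting: the setnx-with-timeout call above is pseudocode. Since Redis 2.6.12, the SET command can write the value and the expiry atomically with the NX and EX options, which avoids the classic race where a client crashes between SETNX and EXPIRE and the lock never expires. A minimal sketch with the Jedis client (3.x or later); the key name and timeout mirror the pseudocode above:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class MutexKey {
    // try to take the rebuild lock: the write succeeds only if key_mutex
    // does not exist, and the 180-second expiry is set in the same command
    public static boolean tryLock(Jedis jedis) {
        String reply = jedis.set("key_mutex", "1",
                SetParams.setParams().nx().ex(180));
        return "OK".equals(reply);  // null reply: someone else holds the lock
    }
}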

Memcache code:

if (memcache.get(key) == null) {
    // 3-minute timeout on the mutex, to avoid a crashed holder
    // leaving the lock in place forever
    if (memcache.add(key_mutex, 3 * 60 * 1000) == true) {
        value = db.get(key);
        memcache.set(key, value);
        memcache.delete(key_mutex);
    } else {
        sleep(50);
        retry();
    }
}
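Memcached's add, like SETNX, is atomic and succeeds only when the key does not already exist, so exactly one caller wins the mutex while the rest fall into the sleep-and-retry branch; the 3-minute timeout is the safety net for a lock holder that crashes before calling delete.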

Other approaches: left for the reader to fill in.

Cache avalanche solution

The avalanche effect of mass cache failure on the underlying system is terrifying! Most system designers therefore use locking or queuing to guarantee that a large number of threads cannot read and write the database at the same time, keeping a flood of concurrent requests from landing on the underlying storage system when the cache fails. Another simple solution is to spread out the cache expiration times: for example, add a random value of 1-5 minutes on top of the original expiration time, as in the sketch that follows, so that expiration times rarely coincide and a collective-expiry event is hard to trigger.
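A minimal sketch of that jitter (assuming a Redis client with a setex(key, seconds, value) method; names are illustrative):

import java.util.concurrent.ThreadLocalRandom;

public class JitteredTtl {
    // base expiry plus a random 1-5 minutes, so keys cached at the
    // same moment do not all expire at the same moment
    public static int ttlSeconds(int baseSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextInt(1, 6) * 60;
    }
}

// usage (illustrative): redis.setex(cacheKey, JitteredTtl.ttlSeconds(1800), cacheValue);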

For locking and queuing, the pseudocode is as follows:

public object GetProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    String lockKey = cacheKey;

    String cacheValue = CacheHelper.get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    }
    synchronized (lockKey) {
        // double-check: another thread may have rebuilt the cache
        // while we were waiting for the lock
        cacheValue = CacheHelper.get(cacheKey);
        if (cacheValue == null) {
            cacheValue = GetProductListFromDB();
            CacheHelper.Add(cacheKey, cacheValue, cacheTime);
        }
    }
    return cacheValue;
}

Locking and queuing only relieves pressure on the database; it does not raise system throughput. Suppose that under high concurrency the key is locked while the cache is being rebuilt: of the latest 1000 requests, 999 are blocked, and users are left waiting until they time out. It treats the symptom, not the cause!

Note: in a distributed environment, the concurrency problem here turns into a distributed-lock problem as well; and since threads simply block, the user experience is poor. For these reasons, this approach is rarely used in truly high-concurrency scenarios!

Cache-flag pseudocode (refresh the real cache in the background when a shorter-lived flag expires):

public object GetProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    // the flag key expires earlier than the data key
    String cacheSign = cacheKey + "_sign";

    String sign = CacheHelper.Get(cacheSign);
    // the actual value is cached for twice as long as the flag
    String cacheValue = CacheHelper.Get(cacheKey);
    if (sign != null) {
        return cacheValue;  // not expired yet: return immediately
    } else {
        // the flag has expired: reset it, return the stale value,
        // and refresh the real cache in the background
        CacheHelper.Add(cacheSign, "1", cacheTime);
        ThreadPool.QueueUserWorkItem((arg) -> {
            // usually an SQL query here
            cacheValue = GetProductListFromDB();
            CacheHelper.Add(cacheKey, cacheValue, cacheTime * 2);
        });
        return cacheValue;
    }
}

Explanation:

  • Cache flag: records whether the cached data has expired. If it has, another thread is notified to refresh the cache of the actual key in the background.
  • Cached data: its expiration time is twice that of the cache flag; for example, the flag is cached for 30 minutes and the data for 60. This way, when the flag expires, the actual cache can still return the old data to the caller until the background thread finishes the refresh.
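One caveat with this scheme: the Get and Add on the flag are not atomic, so under heavy concurrency several threads can observe the expired flag at the same time and each enqueue a background refresh. An atomic set-if-absent (such as SETNX) on the flag closes that window.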

To sum up, four solutions to cache avalanche were proposed here: use locks or queues; refresh the cache via an expiration flag; set different cache expiration times for different keys; and a so-called "level 2 cache" (a local cache in front of the remote one), sketched below.
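The "level 2 cache" option is only named above; a minimal sketch of its usual shape, assuming a small in-process map in front of Redis (the redisGet, redisSet, and loadFromDB stubs stand in for a real client and DAO, and this L1 has no eviction, which a real one would need):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoLevelCache {
    // L1: in-process; absorbs hot-key reads and survives a Redis outage
    private final Map<String, String> local = new ConcurrentHashMap<>();

    public String get(String key) {
        String value = local.get(key);          // 1. local cache
        if (value == null) {
            value = redisGet(key);              // 2. remote cache
            if (value == null) {
                value = loadFromDB(key);        // 3. storage layer
                if (value != null) redisSet(key, value);
            }
            if (value != null) local.put(key, value);
        }
        return value;
    }

    // illustrative stubs for the remote cache and the DB
    private String redisGet(String key) { return null; }
    private void redisSet(String key, String value) { }
    private String loadFromDB(String key) { return null; }
}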

For a business system, it is always case-by-case analysis: there is no best solution, only the most appropriate one.

Other cache issues, such as the cache filling up and data loss, can be studied on your own: LRU eviction handles memory overflow, and Redis's RDB and AOF persistence ensure data safety under certain conditions.
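As a pointer for that self-study, the relevant knobs are ordinary redis.conf directives; the values below are illustrative, not recommendations:

# cap memory; beyond this limit, eviction kicks in
maxmemory 2gb
# evict the least-recently-used keys when over the limit
maxmemory-policy allkeys-lru
# RDB: snapshot to disk if at least 1 key changed in 900 seconds
save 900 1
# AOF: append every write command to a log file
appendonly yes
# fsync the AOF once per second (at most ~1 second of writes lost)
appendfsync everysec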