Preface

When designing a cache system, we have to consider three classic failure modes: cache penetration, cache breakdown, and cache avalanche.

Cache penetration (a key that does not exist)

Cache penetration means querying for data that does not exist. The cache is only written passively on a miss, and for fault tolerance, if nothing is found in the storage layer, nothing is written to the cache. As a result, every query for non-existent data falls through to the storage layer, defeating the purpose of the cache. Under heavy traffic this can bring the DB down, and if someone deliberately floods the application with requests for non-existent keys, it becomes a real vulnerability.

The solution

There are several effective ways to solve cache penetration. The most common is a Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap, so that a query for data that definitely does not exist is intercepted by the bitmap and never reaches the underlying storage system. A cruder approach (the one we used) is to cache the empty result even when a query returns nothing (whether because the data does not exist or because of a system failure), but with a short expiration time, no more than five minutes.
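A minimal sketch of that cruder approach in Python with redis-py (the `load_from_db` callback, the sentinel value, and the TTLs are illustrative assumptions, not from the original article):

```python
import redis

r = redis.Redis()

EMPTY = b"__EMPTY__"   # sentinel meaning "the storage layer has no such row"
EMPTY_TTL = 300        # cache a miss for at most five minutes
CACHE_TTL = 3600       # normal entries can live longer


def get_or_load(key, load_from_db):
    """Read-through cache that also caches empty results, briefly."""
    value = r.get(key)
    if value is not None:
        return None if value == EMPTY else value

    row = load_from_db(key)  # assumed to return bytes/str, or None if absent
    if row is None:
        # Cache the miss too, so repeated queries for a non-existent key
        # stop hammering the DB, but only for a short time.
        r.set(key, EMPTY, ex=EMPTY_TTL)
        return None

    r.set(key, row, ex=CACHE_TTL)
    return row
```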

P.S. A Bloom filter can be installed for Redis as described in the following article:

Redis Bloom filter – pk.com.cn – Blog park www.cnblogs.com

Cache avalanche (a batch of expired keys)

A cache avalanche occurs when many cache entries are given the same expiration time, so they all become invalid at the same moment and every request is forwarded to the DB, which is overloaded instantly.

The solution

The avalanche effect that a mass cache failure has on the underlying system is terrifying. Most system designers therefore use a lock or a queue to guarantee single-threaded (single-process) cache writes, so that a large number of concurrent requests cannot fall on the underlying storage system when entries fail. A simpler measure is to add a random value to the base expiration time, say 1 to 5 minutes, so that expiration times rarely coincide and a collective failure is hard to trigger.
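A sketch of the jitter idea in Python with redis-py (the base TTL is an assumed value; the 1 to 5 minute window follows the text):

```python
import random
import redis

r = redis.Redis()

BASE_TTL = 3600  # the expiration time all keys would otherwise share


def set_with_jitter(key, value):
    # Spread expirations across a random 1-5 minute window so that keys
    # written in the same batch do not all become invalid at once.
    jitter = random.randint(60, 300)
    r.set(key, value, ex=BASE_TTL + jitter)
```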

Cache breakdown (an expired key)

Some keys are set to expire but hold very “hot” data that may be accessed with extremely high concurrency at certain points in time. This raises a problem worth considering: cache “breakdown”. The difference from a cache avalanche is that breakdown concerns a single key, whereas an avalanche concerns many keys.

When such a key expires at some point in time, a large number of concurrent requests for it arrive, all discover that the cache has expired, and all try to load the data from the back-end DB and write it back to the cache. This burst of concurrency can overwhelm the back-end DB instantly.

The solution

1. Use mutex keys

A common industry practice is to use a mutex. In simple terms, when the cache misses, instead of loading from the DB immediately, first try to set a mutex key using an operation that succeeds only if the key does not yet exist (such as Redis SETNX or Memcache ADD). If the operation succeeds, load from the DB and set the cache; otherwise, retry the whole get-from-cache method.

SETNX, which stands for “SET if Not eXists”, can be used to implement a lock. SETNX itself cannot set an expiration time; an atomic alternative, SET with the NX and EX options, only arrived in Redis 2.6.12, so here are two versions of the code for reference:
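Since the original snippets are not reproduced here, below is a sketch of both versions in Python with redis-py (the lock TTL, key names, and the `load_from_db` callback are illustrative assumptions):

```python
import time
import redis

r = redis.Redis()

LOCK_TTL = 3     # seconds the mutex survives if its holder crashes
CACHE_TTL = 600  # normal cache expiration


def get_with_mutex(key, load_from_db):
    """On a cache miss, only the caller holding the mutex loads the DB."""
    value = r.get(key)
    if value is not None:
        return value

    lock_key = "mutex:" + key
    # Redis >= 2.6.12: SET with NX + EX acquires the lock and sets its
    # expiration in a single atomic step.
    if r.set(lock_key, 1, nx=True, ex=LOCK_TTL):
        try:
            value = load_from_db(key)        # hit the storage layer once
            r.set(key, value, ex=CACHE_TTL)  # repopulate the cache
        finally:
            r.delete(lock_key)               # release the mutex
        return value

    # Someone else is rebuilding the cache: back off, then retry the
    # whole get method.
    time.sleep(0.05)
    return get_with_mutex(key, load_from_db)


def acquire_lock_legacy(lock_key):
    """Pre-2.6.12 variant: SETNX then EXPIRE. Not atomic, so the lock can
    leak if the process dies between the two calls."""
    if r.setnx(lock_key, 1):
        r.expire(lock_key, LOCK_TTL)
        return True
    return False
```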

2. Use mutex keys “early”:

Store a timeout value (timeout1) inside the cached value itself, smaller than the actual memcache timeout (timeout2). When a read finds that timeout1 has passed, it extends timeout1 and writes it back to the cache first, then loads the data from the database and sets it to the cache. The pseudocode is as follows:
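A Python sketch of this embedded-timeout scheme with redis-py (values are stored as JSON; the concrete timeouts and `load_from_db` are illustrative assumptions):

```python
import json
import time
import redis

r = redis.Redis()

CACHE_TTL = 600    # timeout2: the real expiration given to Redis/Memcache
LOGICAL_TTL = 540  # timeout1: the earlier expiration embedded in the value
EXTEND_BY = 60     # grace period granted while one caller rebuilds


def rebuild(key, load_from_db):
    value = load_from_db(key)
    entry = {"value": value, "timeout1": time.time() + LOGICAL_TTL}
    r.set(key, json.dumps(entry), ex=CACHE_TTL)
    return value


def get_with_early_refresh(key, load_from_db):
    raw = r.get(key)
    if raw is None:
        return rebuild(key, load_from_db)  # genuine miss

    entry = json.loads(raw)
    if time.time() >= entry["timeout1"]:
        # timeout1 has passed: push it forward and write it back first, so
        # concurrent readers keep serving the old value while we rebuild.
        entry["timeout1"] = time.time() + EXTEND_BY
        r.set(key, json.dumps(entry), ex=CACHE_TTL)
        return rebuild(key, load_from_db)

    return entry["value"]
```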

3. “Never expire”:

Here “never expires” has two meanings:

(1) On the Redis side, no expiration time is actually set, which guarantees that the hot key can never expire; this is “physical” non-expiration. (2) From a functional point of view, if the key never expires, doesn’t the data go stale? So we store an expiration time inside the value associated with the key. When a read finds that this logical expiration time has been reached, a background asynchronous thread rebuilds the cache; this is “logical” expiration.
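A minimal Python sketch of logical expiration with redis-py (the JSON encoding, key names, and rebuild thread are illustrative assumptions; keys are assumed to be pre-warmed with `set_logical`):

```python
import json
import threading
import time
import redis

r = redis.Redis()
LOGICAL_TTL = 600  # expiration carried inside the value; Redis gets no TTL


def set_logical(key, value):
    entry = {"value": value, "expire_at": time.time() + LOGICAL_TTL}
    r.set(key, json.dumps(entry))  # note: no physical expiration is set


def get_logical(key, load_from_db):
    entry = json.loads(r.get(key))  # key is assumed to be pre-warmed
    if time.time() >= entry["expire_at"]:
        # Logically expired: return the stale value right away and let a
        # background thread rebuild. The mutex (see solution 1) ensures
        # only one rebuilder runs at a time.
        if r.set("mutex:" + key, 1, nx=True, ex=3):
            def rebuild():
                set_logical(key, load_from_db(key))
                r.delete("mutex:" + key)
            threading.Thread(target=rebuild).start()
    return entry["value"]
```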

From a practical point of view, this approach is very performance-friendly. The only downside is that while the cache is being rebuilt, other threads (those not doing the rebuilding) may read old data, which is tolerable for most Internet applications.

4. Resource Protection:

With Netflix Hystrix you can isolate resources into their own thread pools to protect the main thread pool; the same isolation idea can be applied to cache rebuilding.
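Hystrix itself is a Java library; the following Python sketch shows the same isolation idea with a plain semaphore, capping concurrent DB loads and failing fast to a fallback for the rest (the limit and the `fallback` callback are illustrative assumptions, not Hystrix’s actual API):

```python
import threading

# Allow at most 10 concurrent DB loads. Extra callers degrade immediately
# instead of queuing up on the database, which is the essence of
# Hystrix-style resource isolation.
db_slots = threading.BoundedSemaphore(10)


def load_with_isolation(key, load_from_db, fallback):
    if not db_slots.acquire(blocking=False):
        return fallback(key)  # fail fast: serve a default or stale value
    try:
        return load_from_db(key)
    finally:
        db_slots.release()
```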

Four solutions: there is no best one, only the most suitable one.

Conclusion

For business systems, the analysis is always case by case: there is no best solution, only the most suitable one.

Finally, the other common cache-system problems, a full cache and data loss, also need to be analyzed against the specific business. An LRU eviction strategy is usually used to handle overflow, and Redis’s RDB and AOF persistence strategies can be used to guarantee data safety where circumstances require it.