Cache avalanche and cache penetration are not the same concept

Cache avalanche

When the cache is invalid or the cache data is not ready, the access of high concurrent requests cannot be blocked, and the access to the database leads to the database downtime or delay. The database is depended on by a large number of other services, resulting in the service crash of a large area, which ultimately leads to the crash of the whole system or website.

Solution:

1. Distributed lock: only one thread can obtain the lock, and determine whether the cache data exists after obtaining it. If there is no cache data and updating the cache, there is data acquisition, other threads block, and release the lock after completing the cache update or data acquisition

2. Data preheating: the data will be cached in advance, and the data will not be cached according to the user’s request. If the data volume is large, the operation and maintenance will be manually triggered after startup

3. Two-tier cache degradation strategy: when cache C1 fails, request cache backup C2, C1 has a short failure time, and C2 has a long failure time

4. Off-peak update cache: update cache and off-peak of user request peak, and customize update strategy

5, set different expiration time, so that the cache expiration time is as uniform as possible, so that a large number of users will not cache failure at the same time

Cache avalanches occur only for very high concurrency scenarios, moderate traffic scenarios are not severe, and then cache penetration is very severe for moderate traffic scenarios

The cache to penetrate

When a data is not hit, a large number of requests are put directly into the database

Solution:

1. Cache empty data and set an appropriate expiration time to update the cache

2. Bloom filters out requests that cannot have cached values

Bloom filter: Hash all possible data into a bitmap large enough that a non-existent data will be intercepted by the bitmap, thus avoiding the query pressure on the underlying storage system

A long piece of information is converted into a bit vector by multiple hash functions, and the existence of the information is determined according to the result of the vector.

Advantages: Space saving: the bit is very space saving, a very long thing becomes several bits save time: the judgment and operation bits are fast, and the hash algorithm is quickly converted into bit vector disadvantages: certain accuracy is sacrificed

Here’s how guava implements the Bloom filter:

private BloomFilter<String> bf; Bf = bloomfilter. create(funnel.stringfunnel (charsets.utf_8), allUsers.size()); bf.put(userDto.getUserName()); if(! bf.mightContain(randomUser)){ System.out.println("bloom filter don't has this user"); return; }Copy the code

Integrated solution

Note: Some ideas refer to jingdong home best practice

This diagram is mainly distributed locking and caching null values to avalanche, penetration

Here mainly explain the details of the distributed lock: lock the thread to have the opportunity to update the cache, but didn’t get the thread lock in the absence of a “deadlock problem” can only wait for the first thread gets the lock will cache the data ready, enjoy the cached data directly so there are two kinds of situations to lock acquisition is different:

Case 1: Concurrent requests (multiple requests) correspond to the same QUERY object in the DB

In this case, concurrent requests can compete for the same lock. For example, if the configuration information queried by users in a province is consistent, the key can be in the unit of province, for example, 110config, and all concurrent requests from users in a province will compete for the 110config lock

Case 2: Concurrent requests (multiple requests) correspond to multiple DB query objects (extreme case 1-to-1)

This kind of situation, can’t all concurrent requests for the lock, this logic is not correct, each user has its own order data, for example, each user to query the database results are also different, so the key is not to save for the unit, otherwise the cached results is certainly not correct Also can not be each user (concurrent requests, Each user has a request, rather than each user request) each have a lock for many times, if 1 w request at the same time, there is 1 w request, if again set 1 w lock, cache and the failure will also have 1 w requests into the DB, DB can’t afford to, so this kind of situation can make the 1 w, competing for 128 locks, grab the lock request, It’s possible to update the cache

// Number of locks The less the number of locks, the more fierce the competition between each user for locks, the less traffic directly to the database, the better the protection of the database, if too small, and affect the system throughput, Public static final String[] LOCKS = new String[128]; Static {for (int I = 0; i < 128; i++) { LOCKS[i] = "lock_" + i; }} try{if(exists(cacheKey)){return cacheData; } String[] locks = orderRediskey.locks; Hash hash hash hash hash hash hash hash hash hash hash hash hash hash hash //a % (2^n) is equal to a & (2^n -1) and 2^n -1 is equal to %. Int index = userID.hashCode () & (locks.length - 1); lock(locks[index]); // Contention lock if(exists(cacheKey)){return cacheData; } data=getFromDB(); Addcahce (cacheKey, NULL, ttl_NULL); if(data == null){addCAhce (cacheKey,null, ttl_NULL); }else{// Cache data and set cache expiration time addCAhce (cacheKey,data,ttl_notnull); } return data; } the finally {unlock (the locks [index])}Copy the code