Reposted from: cloud.tencent.com/developer/a…

Cache penetration

Using a cache

The main purpose of adding a cache is to relieve pressure on the database and improve system performance

The query flow when a database is combined with a cache (sketched in code below):

  • When a user sends a request, query the cache first; if the data exists in the cache, return it directly
  • If no data exists in the cache, query the database
  • If the data exists in the database, put it in the cache and return it to the user
  • If no data exists in the database, return a failure
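A minimal sketch of this read-through flow, with in-memory maps standing in for the real cache and database:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadThroughCacheDemo {
    // Stand-ins for the real cache (e.g. Redis) and the real database.
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    private static final Map<String, String> database = new ConcurrentHashMap<>();

    static String query(String key) {
        // 1. Query the cache first; if the data is there, return it directly.
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        // 2. Cache miss: query the database.
        value = database.get(key);
        if (value != null) {
            // 3. Found in the database: put it in the cache, then return it.
            cache.put(key, value);
            return value;
        }
        // 4. Not in the database either: report failure.
        return null;
    }

    public static void main(String[] args) {
        database.put("user:1", "Alice");
        System.out.println(query("user:1")); // first call hits the database
        System.out.println(query("user:1")); // second call is served from the cache
        System.out.println(query("user:2")); // exists nowhere -> null
    }
}
```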

What is cache penetration

Cache penetration occurs when the requested data exists in neither the cache nor the database. Every such request therefore falls through to the database, so the database is queried on every user request, and a large number of such requests in a short time can bring the database down.

The outcome of such a request is the same every time: the data is not found in the cache, the database has to be queried, and since the data is not found in the database either, nothing can be placed in the cache. Every time the request comes in, the database is queried again.

Each request takes the same route: miss the cache, then hit the database. Obviously the cache does not help at all; it is as if every request penetrates straight through it to the database.

This is what we call the cache penetration problem.

If the cache is penetrated like this and a very large number of requests reach the database directly, the database may fail under the strain.

Solutions

1. Verify parameters

Validate the parameters of incoming requests up front.

For example, suppose legal user IDs are of the form 15XXxxxx, starting with 15. If a user passes an ID starting with 16, such as 16232323, parameter validation fails and the request is intercepted directly. In this way, some maliciously forged user IDs can be filtered out.
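A minimal sketch of such an up-front check, assuming a hypothetical rule that legal IDs are eight digits starting with 15:

```java
public class ParamCheck {
    // Hypothetical format rule: "15" followed by six digits. The real rule
    // depends on your actual ID scheme.
    static boolean isValidUserId(String userId) {
        return userId != null && userId.matches("15\\d{6}");
    }

    public static void main(String[] args) {
        System.out.println(isValidUserId("15232323")); // true: allowed through
        System.out.println(isValidUserId("16232323")); // false: intercepted before any cache/DB lookup
    }
}
```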

2. Bloom filter

If the amount of data is small, we can put every key from the database into an in-memory map.

This makes it very quick to tell whether the requested data exists. If it does, the request is allowed to go on to the cache; if it does not, the request is rejected.

But if there is too much data, tens of millions or hundreds of millions of records, putting it all in memory takes up far too much space.

So, is there a way to reduce the memory footprint?

A: That’s where the Bloom filter comes in.

Under the hood, a Bloom filter stores data in a bit array whose elements default to 0.

When the Bloom filter is first initialized, it runs every key that exists in the database through a series of hash functions (for example, three of them). Each key maps to multiple positions, and the bits at those positions are set to 1.

Later, when a request for a key arrives, the same hash functions are used to compute its positions (see the sketch after this list):

  • If the bit is 1 at every computed position, the key may exist in the database, and the request is allowed to continue
  • If the bit is 0 at any computed position, the key definitely does not exist in the database, and the request can be rejected and returned directly
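A minimal Bloom filter sketch along these lines, using a simplified double-hashing scheme to derive the k positions (an illustration, not a production implementation):

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;   // the underlying bit array, all 0 by default
    private final int size;      // m: number of bits
    private final int hashCount; // k: number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // keep the second hash odd and non-zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Called at initialization for every key that exists in the database.
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    // false means "definitely absent"; true means "probably present".
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false; // any 0 bit -> the key cannot be in the database
            }
        }
        return true; // every bit is 1 -> the key may be in the database
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 3);
        filter.add("user:1");
        System.out.println(filter.mightContain("user:1"));   // true: let the request continue
        System.out.println(filter.mightContain("user:999")); // almost certainly false: reject it
    }
}
```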

Using a Bloom filter does solve the cache penetration problem, but it also introduces two new problems:

  1. False positives (misjudgments) occur.
  2. Keeping the filter in sync with data updates is a problem.

When the data is initialized, the hash functions compute multiple positions for each key, and the bits at those positions are set to 1.

Hash functions have collisions, which means that different keys may map to the same position.

For example, suppose key1 and key2 both map to index 2: that position has a hash collision.

With tens or hundreds of millions of records, hash collisions in a Bloom filter become very noticeable.

If every one of a key's hashed positions happens to have already been set to 1 by other keys, then even though the key does not exist in the database, the Bloom filter reports that it does.

In other words, when a Bloom filter says a key exists, it may be wrong (a false positive).

But when it says a key does not exist, the key definitely does not exist in the database.

In general, the false-positive rate of a Bloom filter is fairly low. Even if a small number of misjudged requests reach the database directly, as long as their volume is not large, the impact on the database is small.

In addition, if you want to reduce the false-positive rate, you can increase the number of hash functions appropriately, for example from 3 to 5.
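For reference, the textbook approximation of the false-positive rate after inserting n keys into m bits with k hash functions is:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad k_{\mathrm{opt}} = \frac{m}{n} \ln 2
```

Note that adding hash functions only helps up to roughly k_opt; beyond that, too many bits are set to 1 and the false-positive rate starts rising again.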

In fact, the most serious problem with Bloom filters is this: when the data in the database is updated, the Bloom filter must be updated synchronously. But the filter and the database are two separate data sources, so their data may become inconsistent.

For example, a new user is added to the database, and that user's key needs to be synchronized to the Bloom filter in real time. Suppose the synchronization fails because of a network exception.

The user's requests are then rejected because the Bloom filter has no data for that key. A perfectly normal user ends up blocked.

Obviously, some businesses cannot tolerate normal users being blocked. So whether to use a Bloom filter depends on the actual business scenario: it helps us solve the cache penetration problem, but it also introduces new problems.

3. Cache null values

When a user ID is found in neither the cache nor the database, cache that user ID anyway, but with an empty value. Subsequent requests for the same user ID then fetch the empty value from the cache and return directly, without querying the database again.

The key point is that the result is placed in the cache whether or not data is retrieved from the database; if no data is retrieved, the cached value is simply empty.
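A minimal sketch of caching empty values, again with in-memory maps standing in for the real cache and database:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class NullValueCacheDemo {
    // Sentinel stored in the cache when the database has no data for a key.
    private static final String NULL_PLACEHOLDER = "";

    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    private static final Map<String, String> database = new ConcurrentHashMap<>();

    static Optional<String> query(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            // Either real data or the empty marker -- no database query in either case.
            return NULL_PLACEHOLDER.equals(cached) ? Optional.empty() : Optional.of(cached);
        }
        String value = database.get(key);
        // Put the result in the cache whether or not the database had data.
        cache.put(key, value != null ? value : NULL_PLACEHOLDER);
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        System.out.println(query("user:404")); // queries the DB once, caches the empty marker
        System.out.println(query("user:404")); // answered from the cache; the DB is not touched again
    }
}
```

In practice it is common to give these empty entries a short expiration time, so that they do not occupy cache space forever and real data can replace them once it appears.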

Cache breakdown

What is cache breakdown

Cache breakdown involves hot data. For example, suppose users are buying a popular product at some online mall.

To keep access fast, the mall system normally puts the product information in the cache. But at some point the cached entry may reach its expiration time and become invalid.

If a large number of users request that same product at that moment, and the product is no longer in the cache, all of those requests suddenly go straight to the database, which may become overloaded and go down.

Solutions

1. Locking

The root cause of database stress is that too many requests are accessing the database at the same time.

If we can ensure that, for a given productId, only one request at a time is allowed to query the database for the product information, the problem is solved. A sketch follows.
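A minimal single-JVM sketch of this idea, using one lock object per productId and a double-check after acquiring the lock:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheBreakdownLockDemo {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    private static final Map<String, String> database = new ConcurrentHashMap<>();
    // One lock object per productId, so different products do not block each other.
    private static final Map<String, Object> locks = new ConcurrentHashMap<>();

    static String getProduct(String productId) {
        String value = cache.get(productId);
        if (value != null) {
            return value; // cache hit: no locking needed
        }
        Object lock = locks.computeIfAbsent(productId, id -> new Object());
        synchronized (lock) {
            // Double-check: another thread may have rebuilt the cache while we waited.
            value = cache.get(productId);
            if (value != null) {
                return value;
            }
            // Only one thread per productId reaches the database at a time.
            value = database.get(productId);
            if (value != null) {
                cache.put(productId, value);
            }
            return value;
        }
    }
}
```

Across multiple application instances, the same idea requires a distributed lock instead, for example one built on Redis SET with the NX option.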

2. Automatic renewal

Cache breakdown is caused by the expiration of a key. So let's change our thinking: can we automatically renew a key shortly before it is about to expire?

A: Yes. We can use a scheduled job to renew specified keys automatically.

For example, suppose a ranking feature sets its cache expiration time to 30 minutes, while a job runs every 20 minutes to refresh the cache and reset the expiration to 30 minutes. This ensures that the cache never expires.
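A minimal sketch of such a renewal job using a scheduled executor; loadRankingFromDb and putInCacheWithTtl are hypothetical stand-ins for the real database query and cache write:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CacheRenewalJob {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Every 20 minutes, rebuild the ranking cache and reset its 30-minute TTL,
        // so the key is always refreshed before it has a chance to expire.
        scheduler.scheduleAtFixedRate(CacheRenewalJob::refreshRankingCache,
                0, 20, TimeUnit.MINUTES);
    }

    static void refreshRankingCache() {
        Object ranking = loadRankingFromDb();
        putInCacheWithTtl("ranking", ranking, 30 * 60);
    }

    // Stand-in: load the fresh ranking data from the database.
    static Object loadRankingFromDb() { return new Object(); }

    // Stand-in: write to the cache with an expiration, e.g. SET key value EX 1800 in Redis.
    static void putInCacheWithTtl(String key, Object value, int ttlSeconds) { }
}
```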

3. Never let the cache expire

For especially popular keys, we can simply skip the expiration time so that they never expire.

For example, since there are not many IDs of hot products participating in a seckill (flash sale) activity, we can cache them without setting an expiration time.

Before the seckill activity begins, we use a program to query the product data from the database in advance and synchronize it to the cache, pre-warming the cache.

Some time after the seckill activity ends, we can manually delete these now-useless cache entries.
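A minimal sketch of this pre-warming and cleanup flow; loadProductFromDb is a hypothetical stand-in for the real database query:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SeckillCachePreheat {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Run once before the seckill activity starts: load every participating
    // product from the database into the cache, with no expiration time.
    static void preheat(List<String> hotProductIds) {
        for (String id : hotProductIds) {
            cache.put(id, loadProductFromDb(id)); // no TTL: the entry never expires
        }
    }

    // Run manually some time after the activity ends, to reclaim the space.
    static void cleanup(List<String> hotProductIds) {
        hotProductIds.forEach(cache::remove);
    }

    // Stand-in for the real database query.
    static String loadProductFromDb(String id) { return "product-" + id; }
}
```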

Cache avalanche

Cache avalanche is an upgraded version of cache breakdown: breakdown is a single hot key expiring, while avalanche is many hot keys expiring at the same time. Clearly, a cache avalanche makes the problem even worse.

There are currently two common cache avalanche scenarios:

  1. A large number of hot cache entries expire at the same time. This can send a flood of requests to the database, which is likely to fail under the pressure.
  2. The cache server goes down, perhaps because of a hardware problem or a network problem in the machine room. In short, the entire cache becomes unavailable.

The bottom line is the same: a large number of requests bypass the cache and access the database directly.

Solutions

1. Add a random number to the expiration time

To solve the cache avalanche problem, we need to avoid simultaneous cache failures in the first place.

This means we should avoid giving large numbers of keys the same expiration time.

You can add a random number of 1 to 60 seconds to the expiration time.

actual expiration time = base expiration time + random number of 1 to 60 seconds

In this way, even if many keys are written at the same moment under high concurrency, the random component ensures that not too many of them expire at the same instant. A sketch follows.
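A minimal sketch of adding this jitter:

```java
import java.util.concurrent.ThreadLocalRandom;

public class RandomTtl {
    // Base TTL plus a random 1-60 seconds, so keys written at the same moment
    // do not all expire at the same moment.
    static int ttlWithJitter(int baseTtlSeconds) {
        return baseTtlSeconds + ThreadLocalRandom.current().nextInt(1, 61);
    }

    public static void main(String[] args) {
        // Pass the result as the cache TTL, e.g. the EX argument of a Redis SET.
        System.out.println(ttlWithJitter(1800)); // somewhere in [1801, 1860]
    }
}
```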

2. High availability

For the scenario where the cache server goes down, we can build in some high-availability architecture during the early stage of system design.

For example, if you use Redis, you can use sentinel mode or cluster mode to avoid the entire Redis service becoming unavailable because of a single node failure.

With sentinel mode, when a master goes offline, one of its slaves is automatically promoted to master and continues to handle requests in place of the offline master.

3. Service degradation

What if the Redis service still fails even after we build a high-availability architecture?

This is where service downgrades come in.

We need to configure some default fallback data.

There is a global switch in the program. For example, if 10 requests fail to obtain data from Redis within the last minute, the switch is turned on; subsequent requests then fetch the default fallback data directly from the configuration center. A sketch follows.
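A minimal sketch of such a global switch; the thresholds and the queryRedis / defaultFromConfigCenter helpers are hypothetical stand-ins, and the time-window handling is deliberately simplified:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DegradationSwitch {
    // Hypothetical thresholds: trip the switch after 10 Redis failures within one minute.
    private static final int FAILURE_THRESHOLD = 10;
    private static final long WINDOW_MILLIS = 60_000;

    private final AtomicInteger failures = new AtomicInteger();
    private volatile long windowStart = System.currentTimeMillis();
    private volatile boolean open = false; // true -> serve fallback data only

    void recordRedisFailure() {
        long now = System.currentTimeMillis();
        if (now - windowStart > WINDOW_MILLIS) { // start a new one-minute window
            windowStart = now;
            failures.set(0);
        }
        if (failures.incrementAndGet() >= FAILURE_THRESHOLD) {
            open = true; // degrade: stop hitting Redis
        }
    }

    String getData(String key) {
        if (open) {
            return defaultFromConfigCenter(key); // switch is on: fallback path only
        }
        try {
            return queryRedis(key);
        } catch (RuntimeException e) {
            recordRedisFailure();
            return defaultFromConfigCenter(key);
        }
    }

    // Stand-in for the real Redis read (here it always fails, to exercise the fallback).
    String queryRedis(String key) { throw new RuntimeException("redis down"); }

    // Stand-in for reading the default fallback data from the configuration center.
    String defaultFromConfigCenter(String key) { return "default-" + key; }
}
```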