1. Flowchart description
Caching is widely used in projects because it supports high concurrency and high performance. For reads, almost everyone follows the same process: look up the cache first; on a hit, return the data; on a miss, query the database, write the result into the cache, and return it.
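A minimal sketch of that read process, assuming an in-memory `HashMap` as a stand-in for Redis and a caller-supplied loader function as a stand-in for the database query. The class name `CacheFirstReader` and the `dbReads` counter are hypothetical, added only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory stand-ins: "cache" plays the role of Redis,
// "loader" plays the role of the database query.
class CacheFirstReader {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;
    int dbReads = 0; // counts how often the database was queried

    CacheFirstReader(Function<String, String> loader) {
        this.loader = loader;
    }

    String read(String key) {
        // 1. Try the cache first
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit
        }
        // 2. Cache miss: query the database
        dbReads++;
        String fromDb = loader.apply(key);
        // 3. Backfill the cache so later reads hit it
        if (fromDb != null) {
            cache.put(key, fromDb);
        }
        return fromDb;
    }
}
```

Reading the same key twice queries the database only once; the second read is served from the cache.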
2. The question
When data lives in both the cache and the database, a write raises a question: should you operate on the database first or on the cache first? Let's look at the problems each order can cause.
3. Update strategies
3.1 Update the database first and then the cache
This scheme is widely rejected. Why? There are two reasons.
3.1.1 Reason one (thread safety)
Suppose thread A and thread B both perform an update at the same time:
- 1) Thread A updates the database
- 2) Thread B updates the database
- 3) Thread B updates the cache
- 4) Thread A updates the cache
Thread A's cache update should have happened before thread B's, but because of network delays or other reasons, B updated the cache first. The cache now holds A's value while the database holds B's, which is dirty data. This scheme is therefore ruled out.
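The interleaving above can be replayed deterministically. This sketch uses two `HashMap`s as hypothetical stand-ins for the database and the cache and simply executes the four steps in the order listed.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of "update the database, then update the cache"
// with two concurrent writers. The maps are hypothetical stand-ins.
class UpdateThenSetCacheRace {
    /** Returns {database value, cache value} after the interleaving. */
    static String[] replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        String key = "price";

        db.put(key, "A");    // 1) Thread A updates the database
        db.put(key, "B");    // 2) Thread B updates the database
        cache.put(key, "B"); // 3) Thread B updates the cache
        cache.put(key, "A"); // 4) Thread A's delayed cache update lands last

        return new String[] {db.get(key), cache.get(key)};
    }
}
```

The result is `{"B", "A"}`: the database holds B's write while the cache holds A's stale one.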
3.1.2 Reason two (business scenarios)
- (1) If your business has many database writes and few reads, this scheme updates the cache over and over before the data is ever read, wasting performance.
- (2) If the value written to the cache is not the raw database value but the result of a series of complex calculations, recomputing it after every database write wastes performance. Deleting the cache is clearly the better choice here.
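To make reason two concrete, this sketch counts how often a hypothetical `expensiveCompute` step runs (it just doubles the value, standing in for the "complex calculations") under each write strategy. The `HashMap` fields are in-memory stand-ins for the database and cache.

```java
import java.util.HashMap;
import java.util.Map;

// Compares computation counts for "update the cache" vs "delete the cache"
// on a write-heavy workload. All names here are hypothetical.
class WriteHeavyCost {
    private final Map<String, Integer> db = new HashMap<>();
    private final Map<String, Integer> cache = new HashMap<>();
    int computations = 0;

    // Stand-in for the expensive calculation done before caching
    private int expensiveCompute(int raw) {
        computations++;
        return raw * 2;
    }

    // Strategy 3.1: update the database, then recompute and update the cache
    void writeAndUpdateCache(String key, int raw) {
        db.put(key, raw);
        cache.put(key, expensiveCompute(raw));
    }

    // Alternative: update the database, then just delete the cache entry
    void writeAndDeleteCache(String key, int raw) {
        db.put(key, raw);
        cache.remove(key);
    }

    // Read path: compute only on a cache miss
    int read(String key) {
        Integer cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        int value = expensiveCompute(db.get(key));
        cache.put(key, value);
        return value;
    }
}
```

With 100 writes followed by one read, `writeAndUpdateCache` runs the computation 100 times, while `writeAndDeleteCache` runs it once, on the read that actually needs it.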
Next comes the most debated question: should you delete the cache first and then update the database, or update the database first and then delete the cache?
3.2 Delete the cache before updating the database
This scheme can lead to inconsistency. Suppose request A performs an update while request B performs a query at the same time. The following can happen:
- 1) Request A performs a write and deletes the cache
- 2) Request B finds that the cache entry does not exist
- 3) Request B queries the database and gets the old value
- 4) Request B writes the old value into the cache
- 5) Request A writes the new value to the database
The cache and the database are now inconsistent. Worse, if the cache has no expiration time, the stale data stays there indefinitely.
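A deterministic replay of these five steps, with `HashMap`s as hypothetical stand-ins for the database and Redis, shows the end state.

```java
import java.util.HashMap;
import java.util.Map;

// Replay of "delete the cache, then update the database" racing with a read.
// The maps are hypothetical stand-ins for the database and Redis.
class DeleteThenUpdateRace {
    /** Returns {database value, cache value} after the interleaving. */
    static String[] replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        String key = "stock";
        db.put(key, "old");
        cache.put(key, "old");

        // 1) Request A deletes the cache
        cache.remove(key);
        // 2) Request B misses the cache ...
        boolean miss = !cache.containsKey(key);
        // 3) ... and reads the old value from the database
        String readByB = db.get(key);
        // 4) Request B writes the old value back into the cache
        if (miss) {
            cache.put(key, readByB);
        }
        // 5) Request A finally writes the new value to the database
        db.put(key, "new");

        return new String[] {db.get(key), cache.get(key)};
    }
}
```

The result is `{"new", "old"}`: the database is up to date but the cache serves the stale value.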
So what's the solution? Use the delayed double deletion strategy:
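/** Delayed double deletion (pseudocode; redis and db are placeholder clients) */
public void write(String key, Object data) throws InterruptedException {
    // 1. Delete the cache first
    redis.delKey(key);
    // 2. Update the database
    db.updateData(data);
    // 3. Sleep for 1 second so in-flight reads finish writing back stale data
    Thread.sleep(1000);
    // 4. Delete the cache again to evict any stale value written meanwhile
    redis.delKey(key);
}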
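To see why the second delete helps, this sketch replays the same five steps from the race above with hypothetical `HashMap` stand-ins (not a real Redis client), then adds the delayed second delete and a later read. The repair works only if request B's stale write-back (step 4) lands before the second delete, which is why the sleep has to be long enough.

```java
import java.util.HashMap;
import java.util.Map;

// Replay of delayed double deletion. The maps are hypothetical stand-ins.
class DelayedDoubleDeleteReplay {
    /** Returns the value a reader sees after the second delete. */
    static String replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        String key = "stock";
        db.put(key, "old");
        cache.put(key, "old");

        // 1) Request A: first delete
        cache.remove(key);
        // 2)-4) Request B misses, reads the old value, writes it back
        if (!cache.containsKey(key)) {
            cache.put(key, db.get(key));
        }
        // 5) Request A updates the database
        db.put(key, "new");
        // 6) After sleeping, request A deletes the cache again
        cache.remove(key);
        // 7) A later read misses and reloads the fresh value
        String value = cache.get(key);
        if (value == null) {
            value = db.get(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

After the second delete, the next read returns `"new"` instead of the stale `"old"`.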
3.2.1 How do you determine the sleep time?
So where does the one second come from, and how long should you actually sleep?
Measure how long the read-path business logic in your project takes. The goal is for the write request's second delete to run after any concurrent read request has finished, so that it removes the dirty data that read wrote into the cache.
If Redis or the database uses master/slave replication, also account for the replication delay. As a rule of thumb, sleep for the read logic's duration plus a few hundred milliseconds, for example 1 second.
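The rule of thumb above is simple arithmetic. This hypothetical helper, using `java.time.Duration`, adds the measured read-logic time, an assumed replication lag, and a safety margin; the sample numbers are illustrative, not measurements.

```java
import java.time.Duration;

// Hypothetical helper: sleep = read-logic time + replication lag + margin
class SleepTime {
    static Duration sleepFor(Duration readLogic, Duration replicationLag, Duration margin) {
        if (readLogic.isNegative() || replicationLag.isNegative() || margin.isNegative()) {
            throw new IllegalArgumentException("durations must be non-negative");
        }
        return readLogic.plus(replicationLag).plus(margin);
    }
}
```

For example, 500 ms of read logic, 200 ms of replication lag and a 300 ms margin give `Duration.ofSeconds(1)`, the one second used above.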
3.3 Update the database before deleting the cache
First, some background. A common caching pattern, the Cache-Aside Pattern, specifies:
- Miss: the application first tries to get the data from the cache. If it is not there, the application retrieves the data from the database and, on success, puts it into the cache.
- Hit: the application retrieves the data from the cache and returns it.
- Update: the application first saves the data to the database and, on success, invalidates (deletes) the cache entry.
Facebook's paper "Scaling Memcache at Facebook" describes the same strategy: update the database first, then delete the cache.
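/** Update the database, then delete the cache (pseudocode; redis and db are placeholder clients) */
public void write(String key, Object data) {
    // 1. Update the database
    db.updateData(data);
    // 2. Delete the cache entry; the next read reloads it from the database
    redis.delKey(key);
}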
3.3.1 Does this scheme avoid concurrency problems?
No. Suppose there are two requests, A querying and B updating. The following can happen:
- 1) The cache entry has just expired
- 2) Request A queries the database and gets the old value
- 3) Request B writes the new value to the database
- 4) Request B deletes the cache
- 5) Request A writes the old value it read into the cache
If this happens, dirty data does occur.
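Again, a deterministic replay with hypothetical `HashMap` stand-ins shows the end state of this interleaving.

```java
import java.util.HashMap;
import java.util.Map;

// Replay of "update the database, then delete the cache" racing with a read.
// The maps are hypothetical stand-ins for the database and Redis.
class UpdateThenDeleteRace {
    /** Returns {database value, cache value} after the interleaving. */
    static String[] replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        String key = "stock";
        db.put(key, "old");

        // 1) The cache entry has just expired, so the cache is empty
        // 2) Request A misses the cache and reads the old value
        String readByA = db.get(key);
        // 3) Request B writes the new value to the database
        db.put(key, "new");
        // 4) Request B deletes the cache (nothing to delete yet)
        cache.remove(key);
        // 5) Request A writes the old value it read into the cache
        cache.put(key, readByA);

        return new String[] {db.get(key), cache.get(key)};
    }
}
```

The result is `{"new", "old"}`, the same kind of inconsistency as in section 3.2, but it requires a narrower timing window.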
3.3.2 What is the probability of concurrency problems?
For dirty data to appear, step (4) must happen before step (5), which requires the database write in step (3) to finish faster than the database read in step (2). In practice, database reads are much faster than writes, so step (3) is unlikely to be shorter than step (2), and this situation rarely occurs.
3.3.3 How can the concurrency problem be solved?
First, set an expiration time on the cache. Second, apply the delayed deletion strategy from section 3.2 asynchronously, so that the second delete runs after the read request has finished.
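A minimal sketch of the asynchronous variant, assuming Java's `ScheduledExecutorService` runs the second delete after a delay instead of blocking the writer with `Thread.sleep`. The `ConcurrentHashMap` fields are hypothetical stand-ins for the database and Redis, and `AsyncDelayedDelete` is an illustrative name. Because a scheduled task can be lost (for example, on a process restart), pairing it with an expiration time, the first measure above, keeps staleness bounded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins: "db" for the database, "cache" for Redis.
class AsyncDelayedDelete {
    final Map<String, String> db = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long delayMillis;

    AsyncDelayedDelete(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Update the database, delete the cache, then schedule a second delete
    // without blocking the caller.
    ScheduledFuture<?> write(String key, String value) {
        db.put(key, value);
        cache.remove(key);
        return scheduler.schedule(() -> {
            cache.remove(key); // evicts any stale value written by a slow reader
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

If a slow reader writes a stale value into `cache` shortly after `write` returns, the scheduled task removes it once the delay elapses.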
4. Discussion of the best solution
4.1 Scheme one
In theory, setting an expiration time on the cache is what guarantees eventual consistency. In this scheme, every item stored in the cache gets an expiration time. All writes treat the database as the source of truth, and cache operations are best-effort: if the database write succeeds but the cache update fails, then once the entry expires, subsequent reads load the new value from the database and backfill the cache.
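A minimal sketch of scheme one, assuming an in-process cache whose entries expire after `ttlMillis`. The clock is injected as a `LongSupplier` so expiry can be shown without waiting, and the loader stands in for the database query. `TtlCache` is a hypothetical class, not Redis's API; Redis itself provides expiration through commands such as `EXPIRE` or `SET` with the `EX` option.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache: entries expire, and expired reads reload from the DB.
class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAt;

        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Function<String, String> loader;

    TtlCache(long ttlMillis, LongSupplier clock, Function<String, String> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    String get(String key) {
        long now = clock.getAsLong();
        Entry e = entries.get(key);
        if (e != null && now < e.expiresAt) {
            return e.value; // fresh hit, possibly stale but bounded by the TTL
        }
        // Miss or expired: read the database and backfill with a new TTL
        String value = loader.apply(key);
        if (value != null) {
            entries.put(key, new Entry(value, now + ttlMillis));
        } else {
            entries.remove(key);
        }
        return value;
    }
}
```

If the database changes and the cache update is lost, reads keep returning the old value until the entry expires, then pick up the new one. The inconsistency window is at most the TTL.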
4.2 Scheme two
Data in Redis never expires; instead, a background update task ("scheduled code" or "queue-driven code") reads the database and pushes the latest data to Redis. This approach treats Redis as the storage layer: callers do not know the actual data source behind it, only that Redis is the one place to fetch the data. When the actual data source changes, the background task writes the update to Redis. Redis and the actual data source can still be inconsistent in this case. With a scheduled task, the maximum inconsistency window is the task's execution interval; with a queue, it depends on the delay between producing and consuming messages. Common queues (or equivalents) include Redis (yes, Redis again), Kafka, AMQ, RMQ, the binlog, log files, and Alibaba Canal.
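A minimal sketch of the queue-driven variant, assuming an in-process `LinkedBlockingQueue` in place of a real queue such as Kafka or a binlog feed, and `ConcurrentHashMap` stand-ins for the database and Redis. `QueueDrivenRefresher` and its methods are hypothetical. Writers touch only the database and publish the changed key; a background thread re-reads the database and pushes the latest value into the cache; readers only read the cache.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical queue-driven cache refresher.
class QueueDrivenRefresher {
    final Map<String, String> db = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<String> changes = new LinkedBlockingQueue<>();
    private static final String STOP = "__STOP__"; // poison pill ends the worker
    private final Thread worker = new Thread(this::run);

    void start() {
        worker.start();
    }

    // Writers update the database and publish the changed key
    void write(String key, String value) {
        db.put(key, value);
        changes.add(key);
    }

    // Readers only ever look at the cache
    String read(String key) {
        return cache.get(key);
    }

    private void run() {
        try {
            while (true) {
                String key = changes.take();
                if (STOP.equals(key)) {
                    return;
                }
                // Re-read the database so the cache gets the latest value
                String latest = db.get(key);
                if (latest != null) {
                    cache.put(key, latest);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Processes all pending changes, then stops the worker
    void stopAndJoin() throws InterruptedException {
        changes.add(STOP);
        worker.join();
    }
}
```

Because the single worker re-reads the database instead of trusting a value carried in the event, the cache converges on the latest database value once the queue drains; until then, the lag is the queue delay described above.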