
1. The opening words

I remember that, a long time ago, an Alibaba interviewer asked me how to keep a database and a cache consistent. As it happened, I had just finished Geek Time's Redis Core Technology and Practice by Jiang Dejun, which has a chapter on the ins and outs of cache consistency. I walked the interviewer through examples one by one, from the causes of the consistency problem to the data inconsistencies that arise in various extreme scenarios, and left him rather stunned.

Of course, I passed that round of the interview. In the end I turned down their offer because we could not agree on the salary. Maybe it just was not meant to be, so we went our separate ways.

The topic of this article is cache consistency: the common Redis read and write strategy, why cache inconsistency occurs, and how to keep the database and the cache consistent.

Without further ado, let’s get to the point.

2. Bypass cache mode

When we use Redis as a cache, we generally adopt the Cache Aside Pattern (called bypass cache mode in this article) as the read and write strategy.

In bypass cache mode, a cache layer is added between the client and the database. On a read, the cache is checked first; on a write, the cached data must be maintained as well. “Maintaining” here can mean either deleting the cached entry or updating it, but we generally choose deletion to ensure data consistency.

To better understand bypass cache mode, the read and write flows are described below.

2.1 Bypass Cache Mode Read process

  • The client initiates a read request and the cache is queried first. On a cache hit, the cached data is returned directly
  • On a cache miss, the database is queried, the query result is written to the cache, and the result is returned to the client
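
The read steps above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the `cache` and `database` dictionaries are in-memory stand-ins for Redis and MySQL, and the key `item:1` and the function name `read` are made up for the example.

```python
# In-memory stand-ins for Redis and MySQL (assumptions for illustration only).
cache = {}
database = {"item:1": 10}


def read(key):
    """Cache Aside read path: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is not None:
        return value              # cache hit: return the cached value directly
    value = database.get(key)     # cache miss: query the database
    if value is not None:
        cache[key] = value        # write the query result to the cache
    return value


first = read("item:1")    # miss: loaded from the database, then cached
second = read("item:1")   # hit: served from the cache
```

After the first call the value is in `cache`, so the second call never touches `database`.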

2.2 Bypass Cache Mode Write Process

  • The client initiates a write request, writes data to the database, and deletes the corresponding data from the cache
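
A minimal sketch of this write path, again using plain dictionaries as hypothetical stand-ins for MySQL and Redis (the names `write`, `cache`, and `database` are illustrative, not part of any library):

```python
# In-memory stand-ins for Redis and MySQL (assumptions for illustration only).
cache = {"item:1": 10}
database = {"item:1": 10}


def write(key, value):
    """Cache Aside write path: write the database, then delete the cache entry."""
    database[key] = value     # 1. write the new value to the database
    cache.pop(key, None)      # 2. delete (do not update) the cached copy


write("item:1", 9)
```

The next read for `item:1` misses the cache and reloads the value 9 from the database.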

Some readers may ask: on a write, why delete the corresponding cache entry instead of updating it?

This comes down to consistency between the cache and the database. The cache and the database are two independent pieces of middleware with no shared atomicity of the kind a database transaction provides, so while handling a request it is quite possible for the database operation to succeed and the cache operation to fail. In addition, the order in which cache updates arrive can leave the cache and the database inconsistent.

Both cases are described below.

2.3 Exception of Cache Update

First, consider the case where the cache and the database become inconsistent because of the order of cache updates.

Suppose an item has an inventory of 10, stored in both the database and the cache, and two users each buy one item, so the inventory must be decremented. When user 1 places an order, the inventory changes from 10 to 9. When user 2 places an order, it changes from 9 to 8. The final inventory should be 8.

Suppose the write process updates the corresponding cache entry after updating the database, instead of deleting it. Both database updates complete in order, but something goes wrong when the cache is updated in Redis: user 1's cache update arrives later than user 2's. The cache ends up with an inventory of 9 while the database holds 8, so the cache and the database are inconsistent.

The above abnormal flow is shown as follows:

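sequenceDiagram
    autonumber
    Client ->> Application: User 1 places an order
    Client ->> Application: User 2 places an order
    Application ->> MySQL: User 1's request updates the inventory to 9
    Application ->> MySQL: User 2's request updates the inventory to 8
    Application ->> Redis: User 2's request updates the cache to 8
    Application ->> Redis: User 1's request updates the cache to 9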
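
The interleaving can be replayed step by step in a few lines. This is a deterministic simulation rather than real concurrent code: the dictionaries stand in for MySQL and Redis, and the helper `update_database` is hypothetical.

```python
# In-memory stand-ins for MySQL and Redis (assumptions for illustration only).
database = {"stock": 10}
cache = {"stock": 10}


def update_database(new_value):
    """Write the new inventory to the database and return it."""
    database["stock"] = new_value
    return new_value


# Both orders reach the database in the expected order...
value_from_user1 = update_database(9)   # user 1: 10 -> 9
value_from_user2 = update_database(8)   # user 2: 9 -> 8

# ...but the cache updates arrive in the opposite order.
cache["stock"] = value_from_user2       # user 2's cache update lands first
cache["stock"] = value_from_user1       # user 1's late update overwrites it

update_result = (database["stock"], cache["stock"])   # database vs. cache

# With deletion instead of update, the arrival order no longer matters:
cache.pop("stock", None)                # user 2's deletion
cache.pop("stock", None)                # user 1's late deletion
```

With updates, the database holds 8 and the cache holds 9. With deletions, the cache is simply empty and the next read reloads 8 from the database.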

The second abnormal flow is that the database is updated but every request to update the cache fails. The cache then still holds 10, and subsequent read requests read the wrong inventory, which again leaves the cache and the database inconsistent.

Therefore, in bypass cache mode we generally update the database and delete the corresponding cached data, rather than update it, to keep the database and the cache consistent.

In reality, however, there are extreme scenarios that can cause the database to be inconsistent with the cache, which is the focus of this article.

3. Cache consistency

The cache consistency discussed above refers to consistency between the data in the database and the data in the cache. It actually covers two situations:

  • On a cache hit, the cached data must match the database data
  • On a cache miss, the data loaded from the database into the cache must be the latest version; that is, the cache must not be filled with a value that the database has since replaced

4. The database is inconsistent with the cache

When describing bypass cache mode, I deliberately left the order of the database update and the cache deletion on a write request vague, because different orders lead to different problems. The cache inconsistency scenarios for each order are analyzed below.

4.1 Delete the cache before updating the database

In the first case, the cache is deleted and then the database is updated. In this case, the sequence diagram is shown below:

sequenceDiagram
    autonumber
    Client ->> Application: The client initiates a write request
    Application ->> Redis: The server deletes the corresponding cache
    Application ->> MySQL: The server updates the database
    Redis -->> Application: Response to the cache deletion
    MySQL -->> Application: Response to the database update
    Application ->> Client: The server responds to the client's write request

In the sequence diagram above, the two dashed lines mark where exceptions can occur: the cache deletion can fail, or the database update can fail. Each case leads to a different cache inconsistency scenario:

4.1.1 Cache deletion succeeds, database update fails

The sequence diagram of the scenario where the cache is deleted successfully and the database fails to be updated is as follows:

sequenceDiagram
    autonumber
    Client ->> Application: The client initiates a write request
    Application ->> Redis: The server deletes the corresponding cache
    Application ->> MySQL: The server updates the database
    MySQL -->> Application: The database update fails
    Application ->> Client: The server responds to the client's write request

In this scenario the cache entry is invalidated and the database write fails. From the user's point of view, the data is unchanged after the operation.

Of course, this scenario is caused by a database service failure. With or without a cache, O&M personnel need to investigate the database service in the production environment. It has little to do with data consistency and is listed here only as an exception scenario.

4.1.2 Cache deletion fails, database update succeeds

The sequence diagram of the scenario where the cache fails to be deleted and the database is updated successfully is as follows:

sequenceDiagram
    autonumber
    Client ->> Application: The client initiates a write request
    Application ->> Redis: The server deletes the corresponding cache (the deletion fails)
    Application ->> MySQL: The server updates the database
    MySQL ->> Application: The database is updated successfully
    Application ->> Client: The server responds to the client's write request

In this scenario the database is updated normally, but the cache still holds the old value, so the database and the cache are inconsistent. Subsequent read requests hit the old value stored in the cache.

Of course, this scenario is caused by a cache service failure. When data problems are reported in large numbers, O&M personnel need to investigate the cache service in production. Although this is a cache inconsistency scenario, it has nothing to do with the cache read/write strategy, so it is also listed only as an exception scenario.

4.1.3 Concurrent Scenario

The two cases above involve cache or database service failures, which are generally unlikely. When they do occur, responsibility lies with the service provider and the priority is to restore the service as soon as possible, so developers need not worry about them too much. What developers really need to watch for is cache inconsistency caused by multi-threaded concurrency.

Assume that both the cache service and the database service are healthy. An anomaly is still possible under concurrency:

  • Thread A deletes the cache successfully but has not yet updated the database
  • Thread B issues a read request, misses the cache, and loads the value from the database. Because thread A has not finished its update, the database still holds the old version, so thread B reads the old value
  • Following bypass cache mode, thread B writes the old value to the cache
  • Thread A then updates the database

From then on, subsequent reads hit the cache and do not see the latest value in the database.

The sequence diagram below illustrates this scenario with the inventory example.

sequenceDiagram
    autonumber
    Thread A ->> Application: Thread A initiates a write request
    Application ->> Redis: Thread A deletes the cache
    Thread B ->> Application: Thread B initiates a read request
    Application ->> Redis: Thread B queries the cache and misses
    Application ->> MySQL: Thread B queries the database
    Note right of MySQL: The current database value is 10
    Application ->> Redis: Thread B writes the database value to the cache
    Note right of Redis: The cached value is 10
    Application ->> MySQL: Thread A updates the database
    MySQL ->> Application: Thread A's database update succeeds
    Note right of MySQL: The inventory is now 9
    Application ->> Thread A: Thread A's write request succeeds
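
The same interleaving can be replayed as a deterministic, single-threaded simulation. The dictionaries are hypothetical stand-ins for MySQL and Redis, and each statement is labelled with the thread that would run it.

```python
# In-memory stand-ins for MySQL and Redis (assumptions for illustration only).
database = {"stock": 10}
cache = {"stock": 10}

# Thread A (writer), step 1: delete the cache entry.
cache.pop("stock", None)

# Thread B (reader): cache miss, so load from the database and fill the cache.
value_read_by_b = cache.get("stock")
if value_read_by_b is None:
    value_read_by_b = database["stock"]   # the database still holds the old 10
    cache["stock"] = value_read_by_b      # the old value goes back into the cache

# Thread A (writer), step 2: update the database.
database["stock"] = 9
```

At the end the database holds 9 but the cache holds 10, and nothing in this flow will correct the cache until the entry expires or is deleted again.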

4.2 Update the database before deleting the cache

Now for the second case, where the database is updated before the cache is deleted. Again there are three cases to discuss:

  • The database update succeeds but the cache deletion fails: this is the same as the second case of “delete the cache first, then update the database”.
  • The database update fails but the cache deletion succeeds: this is the same as the first case of “delete the cache first, then update the database”.
  • Concurrent scenarios

Assume that the cache and database services are healthy. When the database is updated before the cache is deleted, an anomaly can occur under concurrency: thread A writes to the database successfully but has not yet deleted the cached data. At this moment thread B issues a read request and hits the cache. The cached data is the old version, so thread B reads the old value. Thread A then deletes the cached data. In this scenario the data read by thread B is not the latest version in the database; that is, during that window the cache is inconsistent with the database.

The sequence diagram below illustrates this scenario with the inventory example.

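sequenceDiagram
    autonumber
    Thread A ->> Application: Thread A initiates a write request
    Application ->> MySQL: Thread A updates the database
    Note right of MySQL: The inventory is now 9
    Thread B ->> Application: Thread B initiates a read request
    Application ->> Redis: Thread B queries the cache
    Redis -->> Application: Thread B hits the cache
    Note right of Redis: The cached value is still 10
    Application ->> Redis: Thread A deletes the cache
    Redis ->> Application: The cache is deleted successfully
    Application ->> Thread A: Thread A's write request succeeds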
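
Replayed as a deterministic simulation (dictionaries as hypothetical stand-ins for MySQL and Redis, statements labelled with the thread that would run them), the flow looks roughly like this:

```python
# In-memory stand-ins for MySQL and Redis (assumptions for illustration only).
database = {"stock": 10}
cache = {"stock": 10}

# Thread A (writer), step 1: update the database.
database["stock"] = 9

# Thread B (reader): hits the cache before thread A has deleted the entry.
value_read_by_b = cache.get("stock")      # stale value

# Thread A (writer), step 2: delete the cache entry.
cache.pop("stock", None)

# A later read misses the cache and reloads the current value.
value_read_later = cache.get("stock")
if value_read_later is None:
    value_read_later = database["stock"]
    cache["stock"] = value_read_later
```

In this sketch only thread B sees the stale value; once thread A's deletion has run, the next read reloads 9 from the database.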

5. Cache inconsistency solution

With the cache inconsistency scenarios covered, here are solutions for the different cases:

  • Retry mechanism: if the database write or the cache deletion fails, retry the failed operation. If it still fails after a certain number of attempts, roll back the database operation, restore the cache data manually, and throw a business exception.
  • Delayed double deletion: when both the database write and the cache deletion succeed but run concurrently with other requests, the two operations are not atomic, so a read can slip in between them and leave the cache inconsistent. A delay can address this: delete the cache first, then write to the database, and then delete the cache again after a certain delay.
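
As a rough sketch of the retry mechanism in the first bullet, the helper below retries any failing operation a fixed number of times and raises if it never succeeds. The names `retry` and `flaky_delete`, the attempt count, and the delay are all assumptions for illustration; the rollback and manual cache repair mentioned above are left to the caller.

```python
import time


def retry(operation, max_attempts=3, delay_seconds=0.01):
    """Call operation() until it succeeds or max_attempts is exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:          # in real code, catch specific errors
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)   # short pause before the next attempt
    # All attempts failed: surface a business-level error to the caller.
    raise RuntimeError(f"operation failed after {max_attempts} attempts") from last_error


# Demo: a cache deletion that fails twice before succeeding.
cache = {"stock": 10}
calls = {"count": 0}


def flaky_delete():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated cache outage")
    cache.pop("stock", None)
    return True


deleted = retry(flaky_delete)
```

If every attempt fails, the `RuntimeError` is the point at which the caller would roll back the database write.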

Note that the delay in delayed double deletion has to be chosen so that the second deletion runs after the database write has completed. This time can only be estimated, so the delayed double deletion strategy can still end up inconsistent in extreme scenarios.

  • After the cache is deleted for the first time, another read can miss the cache and write an older version back into it, leaving the data inconsistent
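
Delayed double deletion might be sketched as follows. This is only an illustration: the dictionaries stand in for MySQL and Redis, the function name and the 0.1 second delay are arbitrary assumptions, and `threading.Timer` is just one way to schedule the second deletion.

```python
import threading

# In-memory stand-ins for MySQL and Redis (assumptions for illustration only).
database = {"stock": 10}
cache = {"stock": 10}


def write_with_delayed_double_delete(key, value, delay_seconds=0.1):
    """Delete the cache, write the database, then delete the cache again later."""
    cache.pop(key, None)                  # first deletion
    database[key] = value                 # database write
    # Second deletion, scheduled to run after the estimated delay.
    timer = threading.Timer(delay_seconds, cache.pop, args=(key, None))
    timer.start()
    return timer


timer = write_with_delayed_double_delete("stock", 9)

# Simulate a concurrent reader that put the old value back into the cache.
cache["stock"] = 10

timer.join()   # wait for the second deletion (only needed in this demo)
```

The second deletion clears the stale 10, so the next read reloads 9 from the database. If the delay were shorter than the time the stale write takes to land, the stale value would survive, which is why the delay can only be estimated.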

If the business requires strong consistency between the cache and the database, the only option is to serialize read and write requests with locks.
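
A minimal sketch of lock-based serialization, assuming a single process: one `threading.Lock` guards both the read path and the write path, so a read can never run between a database write and its cache deletion. The dictionaries are stand-ins for MySQL and Redis, and a deployment with several application instances would need a distributed lock instead, which is outside this sketch.

```python
import threading

# In-memory stand-ins for MySQL and Redis (assumptions for illustration only).
database = {"stock": 10}
cache = {}
lock = threading.Lock()   # process-local lock shared by readers and writers


def read(key):
    with lock:                        # reads wait for any in-flight write
        value = cache.get(key)
        if value is None:
            value = database[key]
            cache[key] = value
        return value


def write(key, value):
    with lock:                        # database write and cache deletion as one unit
        database[key] = value
        cache.pop(key, None)


# Run two writers and several readers concurrently.
threads = [threading.Thread(target=write, args=("stock", v)) for v in (9, 8)]
threads += [threading.Thread(target=read, args=("stock",)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_value = read("stock")
```

Whichever writer runs last, the cache ends up matching the database, at the cost of every request queuing behind the same lock.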

6. Summary

This article covered bypass cache mode and how it is generally used, analyzed the scenarios in which the database and the cache become inconsistent, and presented solutions for them.

7. Reference materials

  • Redis Core Technology and Practice (Geek Time)
  • Redis Deep Adventures: Core Principles and Practical Applications
