Foreword

In our back-end development, we often discuss the question of how to ensure cache and database consistency.

I believe many people are unfamiliar with this question, or still have doubts such as:

  • When updating data, do I update the database first and then delete the cache, or delete the cache first and then update the database?
  • Should you consider introducing message queues to ensure data consistency?
  • Can delayed double delete be used? What will happen if it is used?

The following article will explain the above problems clearly. Let’s look at the outline first.

Why cache?

For small businesses, or businesses that don’t handle many requests per day, introducing a cache only makes the system more complex. In short, don’t introduce caching unless you actually need it, rather than using it for its own sake. The simple architecture model is as follows:

However, as the business of the company grows and the number of requests for projects increases, there will be performance problems if the simple architecture above is used to support it.

At this point it is time to introduce a cache to improve performance. The upgraded architecture is as follows:

As you can see, the middleware used for caching is Redis: it offers high performance along with data structures that are rich yet simple, so most needs can be met without resorting to special data types.

Caching schemes and how to use caching

So let’s look at a relatively simple and straightforward caching scheme:

  • Flush the full amount of database data to the cache, and do not set the expiration time;
  • Then, when writing, only the database is updated, not the cache;
  • There is also a scheduled task that periodically updates the database’s incremental data to the cache.
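The scheme above can be sketched with in-memory stand-ins for the database and the cache (all names here are illustrative, not a real Redis client):

```python
# In-memory stand-ins for the database and the cache.
database = {"x": 1, "y": 2}
cache = {}

def full_flush():
    """Flush all database data into the cache, with no expiration."""
    cache.update(database)

def write(key, value):
    """Writes touch only the database; the cache waits for the next refresh."""
    database[key] = value

def read(key):
    """All reads are served from the cache."""
    return cache.get(key)

full_flush()          # initial full load
write("x", 100)       # update the database only
print(read("x"))      # still 1: stale until the scheduled task runs
full_flush()          # the scheduled task refreshes the cache
print(read("x"))      # now 100
```

The window between `write` and the next `full_flush` is exactly the inconsistency window the disadvantages list describes.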

Advantages of the scheme:

  • All requests are in the cache, no need to check the database;
  • Requests go directly to Redis, which is very fast.

But the disadvantages are also obvious:

  • The cache utilization is low, and the data that is not commonly used is kept in the cache, occupying the memory.
  • Cache and database data may be inconsistent, depending on how often scheduled tasks refresh the cache.

In short, the above solution suits business scenarios with small traffic and low requirements on data consistency.

What about scenarios with heavy traffic and data consistency requirements?

When do data consistency problems arise?

Before we talk about data consistency, let’s look at how to improve cache utilization.

As mentioned above, keeping infrequently used data in the cache occupies memory and lowers cache utilization. To improve utilization, an obvious solution comes to mind: keep only recently accessed data, which we call “hot data”, in the Redis cache. The concrete implementation is as follows:

  • Writing data is still writing to the database;
  • A read request reads from the cache first; on a cache miss, it reads from the database and flushes the result into the cache.
  • In addition, data flushed into the cache needs to be set to expire.
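The three steps above can be sketched as follows (an in-memory sketch; the `TTL` value and all names are illustrative):

```python
import time

database = {"x": 1}
cache = {}            # key -> (value, expire_at)
TTL = 60              # assumed expiration time, in seconds

def write(key, value):
    # Writes still go to the database only.
    database[key] = value

def read(key):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                      # cache hit
    value = database[key]                    # cache miss: read the database
    cache[key] = (value, time.time() + TTL)  # flush into the cache with a TTL
    return value

print(read("x"))   # miss: reads 1 from the database and caches it
write("x", 2)
print(read("x"))   # hit: still 1 until the cached entry expires
```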

With an expiration time set, data that is not accessed frequently is evicted over time, leaving mostly frequently accessed “hot” data in the cache and thus improving cache utilization. The eviction strategy involved here is the LRU algorithm; for details, see my earlier article: How much do you know about the LRU cache elimination algorithm?
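As a sketch of the LRU idea (this is a minimal in-process cache; note that Redis itself uses an approximate LRU, not this exact structure):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: the least recently used entry is evicted first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used

lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" becomes most recently used
lru.put("c", 3)     # capacity exceeded: evicts "b"
print(lru.get("b")) # None
```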

Next, look at data consistency.

If you want to ensure consistency between the cache and the database, you can’t use the timed task refresh method mentioned above.

In other words, when modifying a piece of data, we must operate not only on the database but also on the cache.

You might ask: when updating data, should I update the cache first and then the database, or the database first and then the cache? There are only two possible orders:

  1. Update the database first, then the cache;
  2. Update the cache first, then update the database;

So which order should we choose? Let’s discuss case by case.

Setting concurrency aside, under normal circumstances either option works: whether you update the database first or the cache first, the two end up consistent. It’s the exceptional cases we need to worry about.

Update the database first, then the cache

Update the database first, then the cache: if the database update succeeds but the cache update fails, the database holds the new value while the cache holds the old data.

Then a read request comes in and reads the old data from the cache (until the cache expires). Only after the cache expires is the latest value read from the database and the cache rebuilt, at which point the cached data is up to date.

If users read the data before the cache expires, they see stale data; the modification only takes effect after a while. This affects the business.

Update the cache first, then update the database

Update the cache first, then the database: if the cache update succeeds but the database update fails, the cache holds the new value while the database holds the old data.

Then a read request comes in; while it hits the cache it reads the new, correct value. But once the cache expires, the old data is read from the database and the cache is rebuilt, so the cached data becomes old again.

When users read the data again, the modification appears to have been reverted, which also affects the business.

To sum up the two schemes above:

Whichever is updated first, if the second update fails, the business is affected to some extent. So how do we solve this problem? We will continue the analysis and give corresponding solutions later.

Data consistency problems in concurrent scenarios

Let’s first set the preconditions: we use the “update the database first, then the cache” scheme, and both updates succeed.

Given the above, what if there is concurrency?

Let’s start with the following scenario:

There are two threads, A and B, which need to update the same data (assuming that data X is updated) in the following order:

  1. A updates the database: X = 1;
  2. B updates the database: X = 2;
  3. B updates the cache: X = 2;
  4. A updates the cache: X = 1;

According to this execution order, the value of X in the cache is 1 while the value in the database is 2. The two are inconsistent, which is not what we expect.
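The interleaving above can be replayed step by step (no real threads are needed; only the order of operations matters):

```python
database = {}
cache = {}

# Replay the four steps from the text in order.
database["X"] = 1   # 1. A updates the database, X = 1
database["X"] = 2   # 2. B updates the database, X = 2
cache["X"] = 2      # 3. B updates the cache, X = 2
cache["X"] = 1      # 4. A updates the cache, X = 1 (A's late write wins)

print(cache["X"], database["X"])   # cache holds 1, database holds 2
```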

Similarly, with the “update the cache first, update the database later” scenario, there are similar problems, and I won’t go into details here.

In addition, if the cache is updated on every data modification but the cached data is not necessarily read soon afterwards, then, as mentioned above, the cache may hold a lot of rarely accessed data, occupying memory, and cache utilization stays low.

Moreover, in some cases the value written to the cache is not copied directly from the database; it may be computed from the database data first and then written to the cache, so every update also pays that computation cost.

Therefore, the “update the database + update the cache” scheme not only has low cache utilization but also hurts performance; the costs outweigh the benefits.

Therefore, we need to consider another solution: deleting the cache.

Can cache deletion be consistent?

Similarly, there are two ways to remove the cache:

  1. Delete the cache first, then update the database;
  2. Update the database first, then delete the cache;

According to the above analysis, if the operation fails in the second step, the data will be inconsistent.

I won’t repeat it here.

Let’s focus on concurrency and how to deal with it.

Delete the cache first, then update the database

Using the same concurrency example as above (with minor modifications):

Let’s say X is 1, and we have threads A and B.

  1. A wants to update the data (X = 2) and deletes the cache first;
  2. B reads the cache, finds nothing, and reads the data from the database (X = 1);
  3. A writes the new value (X = 2) to the database;
  4. B writes the old data (X = 1) to the cache;

According to this execution order, the final value of X in the cache is 1 (old data) while the value in the database is 2 (new value): the data is inconsistent.
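Replaying this interleaving step by step makes the stale write-back visible:

```python
database = {"X": 1}
cache = {"X": 1}

del cache["X"]                  # 1. A (writing X = 2) deletes the cache first
b_read = database["X"]          # 2. B misses the cache, reads X = 1 from the DB
database["X"] = 2               # 3. A writes the new value to the database
cache["X"] = b_read             # 4. B writes the stale value back to the cache

print(cache["X"], database["X"])  # cache holds 1, database holds 2
```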

It can be seen that the scheme of “delete the cache first, then update the database” still has data inconsistency when “read + write” concurrency occurs.

Update the database first, then delete the cache

Threads A and B also operate concurrently:

  1. X does not exist in the cache; X = 1 exists in the database;
  2. A reads the database and gets X = 1;
  3. B updates the database: X = 2;
  4. B deletes the cache;
  5. A writes X = 1 (the old value) to the cache;

Finally, the value of X in the cache is 1 (old value), and the value in the database is 2 (new value), and the data is inconsistent.
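This interleaving can also be replayed step by step; note that it only causes trouble if A’s cache write (step 5) lands after B’s delete (step 4):

```python
database = {"X": 1}
cache = {}                      # 1. X is not in the cache

a_read = database["X"]          # 2. A reads the database, gets X = 1
database["X"] = 2               # 3. B updates the database, X = 2
cache.pop("X", None)            # 4. B deletes the cache (a no-op here)
cache["X"] = a_read             # 5. A writes the stale X = 1 to the cache

print(cache["X"], database["X"])  # cache holds 1, database holds 2
```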

This is theoretically possible, but in practice very unlikely, because all three of the following conditions must hold for it to happen:

  1. The cache entry has expired, i.e., the data is not in the cache;
  2. A read request and a write request for the same data arrive concurrently;
  3. Updating the database and deleting the cache (steps 3 and 4 above) takes less time than reading the database and writing the cache (steps 2 and 5 above).

Based on years of development experience, the probability of condition 3 occurring is actually relatively low.

This is because a database write usually acquires a lock first, so it generally takes longer than a database read, making it unlikely that steps 3 and 4 finish before steps 2 and 5.

To sum up, “update the database first, then delete the cache” scheme can ensure data consistency to a certain extent.

Therefore, in normal development, we should adopt this scheme to operate the database and cache.

At this point the concurrency problem is resolved, and we return to the problem mentioned earlier: the second step failing or throwing an exception, which leaves the data inconsistent.

How do I ensure that updating the database and deleting the cache are both successful?

As previously analyzed, any failure in step 2, whether updating or deleting the cache, will result in inconsistencies between the database and the cache.

So how to solve this problem?

Scheme 1: Retry

The first solution that comes to mind is simply to retry after a failure. However, synchronous retry blocks the current request and hurts throughput, so I won’t go into it further here.

Scheme 2: Retry asynchronously

Asynchronous retry simply means throwing the failed operation into a “message queue”, where a dedicated consumer retries it until it succeeds.
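A minimal sketch of the idea, using Python’s in-process `queue.Queue` as a stand-in for a real message queue, with a simulated one-off cache failure (all names here are illustrative):

```python
import queue

database = {"X": 1}
cache = {"X": 1}
retry_queue = queue.Queue()   # stand-in for a real message queue

fail_once = {"count": 1}      # simulate one transient cache failure

def delete_cache(key):
    if fail_once["count"] > 0:
        fail_once["count"] -= 1
        raise ConnectionError("cache unavailable")
    cache.pop(key, None)

def update(key, value):
    database[key] = value          # step 1: update the database
    try:
        delete_cache(key)          # step 2: delete the cache
    except ConnectionError:
        retry_queue.put(key)       # on failure, enqueue for async retry

def consumer():
    # A dedicated consumer retries until deletion succeeds.
    while not retry_queue.empty():
        key = retry_queue.get()
        try:
            delete_cache(key)
        except ConnectionError:
            retry_queue.put(key)   # redeliver on failure

update("X", 2)
print("X" in cache)   # True: the delete failed, the stale value remains
consumer()
print("X" in cache)   # False: the asynchronous retry succeeded
```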

At this point some of you might point out that writing to the message queue can also fail, right? Besides, introducing a message queue increases both system complexity and maintenance cost; is it worth it?

That’s a good question. Only by continually thinking, raising questions, and solving them do we make progress.

Before answering, think about this: if, instead of handing the retry to a message queue, the failing thread simply keeps retrying in process until it succeeds, then should the process restart in the meantime, the retry request is lost and the data remains inconsistent, with no chance of repair.

So instead, we hand the step-2 operation off to a message queue and let a separate consumer perform the retries.

To review the features of message queues:

  • Guaranteed reliability: messages written to the queue are not lost until they are successfully consumed, even across service restarts;
  • Guaranteed delivery: the downstream consumer pulls messages from the queue, and a message is removed only after it is consumed successfully; otherwise it keeps being redelivered (which matches our retry scenario).

As for write queue failures and message queue maintenance costs:

  • Write-queue failure: the probability that the cache operation and the queue write both fail at the same time is low.
  • Maintenance cost: message-queue components are mature and already widely used in company projects, so the added maintenance cost is small.

If you really don’t want to write to a message queue in your application, there is another consistency solution: subscribe to the database change log and operate on the cache from there.

In other words, when a service wants to change data, it only needs to change the database; it does not need to touch the cache at all.

The cache operation is left to Canal, a middleware that subscribes to the database change log.

Canal is a mature open-source middleware from Alibaba. I won’t go into the details here; if you are interested, you can look it up yourself.
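As a rough sketch of the flow (the event shape and handler name here are assumed for illustration; they are not Canal’s real client API):

```python
database = {}
cache = {"user:1": {"name": "old"}}

# A hypothetical handler for change-log events, in the spirit of what a
# binlog subscriber like Canal would deliver to a downstream consumer.
def on_binlog_event(event):
    if event["type"] in ("update", "delete"):
        cache.pop(event["key"], None)   # just invalidate the cache entry

# The service only writes the database...
database["user:1"] = {"name": "new"}
# ...and the subscriber invalidates the cache when the change event arrives.
on_binlog_event({"type": "update", "key": "user:1"})

print("user:1" in cache)   # False: the stale entry was invalidated
```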

The architecture model is as follows:

Advantages:

  • No need to worry about failing to write to the message queue: if the write to MySQL succeeds, the binlog is guaranteed to contain the change;
  • Automatic delivery downstream: once configured, Canal automatically posts the database change log to the downstream message queue; see my earlier article on using Canal to synchronize MySQL data to ElasticSearch.

Of course, Canal’s own high availability and stability still need to be maintained.

From here, we can draw the following conclusions:

To ensure consistency between the database and the cache, you are advised to update the database first and then delete the cache, combined with a “message queue” or a “change-log subscription” to retry failures.

Conclusion

According to what has been said above, the following points can be summarized:

  1. In scenarios with large service volumes, caching can improve performance.
  2. After adding a cache, we must consider cache-database consistency; the recommended scheme is “update the database first, then delete the cache”;
  3. In the “update the database first, then delete the cache” scheme, to guarantee that both steps succeed, combine it with a “message queue” or a “change-log subscription”; both essentially ensure data consistency through “retry”.

In addition, share some tips:

  1. In many cases performance and strong consistency cannot both be achieved; for the sake of performance, “eventual consistency” is usually adopted.
  2. Cache-database consistency issues revolve around: cache utilization, concurrency, and whether both the cache and database operations succeed.
  3. In failure scenarios, the common way to ensure consistency is to “retry”; synchronous retry hurts throughput, so asynchronous retry is usually used.

If you still want to read more quality technical articles, welcome to follow my public account “Go Keyboard man”.