In real business systems there are plenty of scenarios that call for data caching, such as a flash-sale product page that receives heavy concurrent read traffic. This is classic "hot" data: it is updated infrequently but read very frequently. Such data should be cached as much as possible, so that fewer requests hit the database and the pressure on the database drops.
Why cache
Caching exists for the sake of being "fast." Let's demonstrate with a code example.
In my demo code repository, I added two endpoints, getStockByDB and getStockByCache, which query an item's inventory from the database and from the cache respectively.
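For reference, here is a minimal sketch of what those two endpoints might look like. Only the endpoint names come from the article; the Spring Boot wiring, SQL, table, and cache key format are my assumptions.

```java
// Hypothetical sketch of the two demo endpoints; only the names
// getStockByDB / getStockByCache come from the article.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class StockController {

    @Autowired
    private JdbcTemplate jdbcTemplate;          // direct MySQL access

    @Autowired
    private StringRedisTemplate redisTemplate;  // Redis cache access

    // Every request goes straight to the database.
    @GetMapping("/getStockByDB")
    public Integer getStockByDB(@RequestParam long sid) {
        return jdbcTemplate.queryForObject(
                "SELECT count FROM stock WHERE id = ?", Integer.class, sid);
    }

    // Check the cache first; fall back to the database on a miss.
    @GetMapping("/getStockByCache")
    public Integer getStockByCache(@RequestParam long sid) {
        String cached = redisTemplate.opsForValue().get("stock:" + sid);
        if (cached != null) {
            return Integer.valueOf(cached);
        }
        Integer count = jdbcTemplate.queryForObject(
                "SELECT count FROM stock WHERE id = ?", Integer.class, sid);
        redisTemplate.opsForValue().set("stock:" + sid, String.valueOf(count));
        return count;
    }
}
```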
We then tested concurrent requests with JMeter.
Note that my tests are not rigorous; they are meant as a rough comparison, not a reference for real service performance.
When JMeter was first set to 10,000 concurrent requests, a large number of errors were reported: 98% of the 10,000 requests failed outright. Quite nerve-racking ~
The following error occurs when the log is opened:
Spring Boot's embedded Tomcat has a built-in cap on concurrent requests, which defaults to 200. A single instance cannot serve 10,000 concurrent requests with that setting. You can of course raise the limit, though a small machine may still buckle.
After changing to the following configuration, my small machine could fetch the inventory through the cache and return all 10,000 concurrent requests successfully:
server.tomcat.max-threads=10000
server.tomcat.max-connections=10000
As you can see, the throughput is 668 requests per second without caching:
With caching, the throughput is 2177 requests per second:
In this "very loose" comparison, caching improved single-machine performance more than threefold. With more machines and higher concurrency, the advantage of caching should be even more pronounced, because the pressure on the database grows far greater.
After this little experiment, I went to check on the MySQL instance running on my tiny Tencent Cloud server, worried that such high traffic had knocked it over. A burst like this might even get flagged as abnormal attack traffic ~
I use a Tencent Cloud server with 1 CPU core, 4 GB RAM and 2 Mbps bandwidth, bought on promotion, very cheap. Free advertising here; Tencent Cloud, feel free to contact me and pay up ;)
What kind of data is suitable for caching
Cache data that is large but changes infrequently, such as product details and comments. Frequently changing data is a poor fit for caching: on the one hand it increases system complexity (cache updates, dirty cache data), and on the other it adds some instability (the cache system itself must be maintained).
In extreme cases, however, you may need to cache data that does change, for example when a page must display a quasi-real-time inventory count, or in other special business scenarios. Then you need to make sure the cache holds no dirty data (or at least not for long), which is a deeper discussion.
The pros and cons of caching
Whether or not to cache is a tradeoff.
Advantages of turning caching on:
- It can shorten the response time of the service and bring a better user experience.
- It can increase the throughput of the system, which further improves the user experience.
- It reduces the pressure on the database and keeps peak traffic from overwhelming it and taking the whole online service down. BOOM!
Caching also introduces a number of additional problems:
- There are many kinds of caches: in-process memory, Memcached, Redis. If you are not familiar with the one you pick, maintenance becomes harder than with a pure database system.
- Distributed cache systems must also be considered: distributed Redis, for example, has plenty of pitfalls, which undoubtedly adds to the system's complexity.
- In special scenarios where high accuracy is required of cached data, consistency between the cache and the database must be handled carefully.
This article will focus on caching and database consistency.
How to ensure cache and database consistency
Having said so much about the necessity of caching: using a cache is not as simple as it looks. I always thought it was, until I ran into a scenario that required the cache and the database to stay strongly consistent, and discovered that keeping database data and cache data consistent is a genuinely sophisticated topic.
From the ancient days of hardware caches and operating-system caches, caching has been a discipline of its own. The industry has debated this problem for a long time, and the debate continues. I have gone through a lot of the literature, and it really does come down to tradeoffs. It is worth talking through.
The discussion below introduces several viewpoints, and I will write code along the way to verify the issues raised.
Delete the cache instead of updating it
According to most people, you should not update the cache on a write; you should delete it, let the next read miss the cache, read the database, and write the fresh value back into the cache.
“Analysis of distributed double-write consistency schemes for databases and caches”
Reason one: thread safety
If requests A and B both perform update operations, the following can happen:
(1) Thread A updates the database
(2) Thread B updates the database
(3) Thread B updates the cache
(4) Thread A updates the cache
A wrote the database before B, so A's cache update should also land first; but for network reasons B updates the cache before A, and A's stale value then overwrites it. The cache ends up holding dirty data, so this scheme is ruled out.
Reason two: the business scenario
There are two points:
(1) If your business writes to the database often but reads rarely, this scheme keeps updating the cache with data that may never be read before it changes again, wasting performance.
(2) If the value written to the cache is not the value written to the database but the result of a series of complex computations, then recomputing the cached value after every database write is also a waste. Clearly, deleting the cache is a better fit.
In fact, if the business is very simple (just take a value from the database and write it to the cache), updating the cache directly works fine. But deleting the cache is simple and costs at most one extra cache miss, which makes it the recommended general approach.
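To make the recommended pattern concrete, here is a minimal cache-aside sketch: the write path deletes the cache, and the read path rebuilds it on a miss. The service class, SQL, key format, and the ten-minute expiration are illustrative assumptions, not the article's code.

```java
// Cache-aside sketch: writes delete the cache, reads refill it on a miss.
import java.time.Duration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class StockService {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Autowired
    private StringRedisTemplate redisTemplate;

    // Write path: update the database, then delete (not update) the cache.
    public void updateStock(long sid, int count) {
        jdbcTemplate.update("UPDATE stock SET count = ? WHERE id = ?", count, sid);
        redisTemplate.delete("stock:" + sid);
    }

    // Read path: on a cache miss, load from the database and refill the cache.
    public int getStock(long sid) {
        String cached = redisTemplate.opsForValue().get("stock:" + sid);
        if (cached != null) {
            return Integer.parseInt(cached);
        }
        Integer count = jdbcTemplate.queryForObject(
                "SELECT count FROM stock WHERE id = ?", Integer.class, sid);
        // An expiration time acts as a backstop against dirty data.
        redisTemplate.opsForValue().set("stock:" + sid,
                String.valueOf(count), Duration.ofMinutes(10));
        return count;
    }
}
```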
Delete the cache first, or update the database first
So the question is, do we delete the cache first and then update the database, or do we update the database first and then delete the cache?
Let's see what the big names have to say.
【58 Shenjian Architecture Series 】 Two or three Details of Cache Architecture Design
For a pair of operations that cannot be wrapped in a transaction, you inevitably face the question of which one to do first. The rule of thumb: do first the operation that causes the least damage to the business if the second one fails and an inconsistency occurs.
If you delete the cache first and then the database write fails, the only cost is one extra cache miss.
If you write the database first and then the cache delete fails, the database holds new data while the cache holds old data, and the two are inconsistent.
This reasoning is sound as far as it goes, but it does not fully consider dirty reads under concurrent requests. Let's look again at "Analysis of distributed double-write consistency schemes for databases and caches".
Delete the cache first, then update the database
This scheme can leave data inconsistent under concurrent requests.
Suppose request A performs an update while request B performs a query at the same time. The following can happen:
(1) Request A performs a write operation and deletes the cache
(2) Request B queries the cache and finds nothing there
(3) Request B queries the database and gets the old value
(4) Request B writes the old value into the cache
(5) Request A writes the new value into the database
This leads to inconsistency. Moreover, if you do not set an expiration policy on the cache, the data stays dirty forever.
So deleting the cache first and then updating the database is not a one-size-fits-all solution.
Isn’t there a concurrency problem in updating the database and then deleting the cache?
There is one. Suppose there are two requests, a query A and an update B. The following can happen:
(1) The cache happens to have just expired
(2) Request A queries the database and gets an old value
(3) Request B writes the new value to the database
(4) Request B deletes the cache
(5) Request A writes the old value it read into the cache
If that happens, there will indeed be dirty data.
But what are the odds of that happening?
For this to happen, the write in step (3) must take less time than the read in step (2), so that step (4) can precede step (5). But think about it: database reads are generally much faster than writes, so a write completing before a concurrent read is hard to come by in practice.
Updating the database first and then deleting the cache can still cause problems, but for the reasons above the probability is much lower.
(Addendum: what if I use "update the database first, then delete the cache" and set no expiration policy? Since updating the database and deleting the cache are not atomic, if the program dies after updating the database but before deleting the cache, and there is no expiration policy, the data stays dirty forever.)
So, if you just want basic cache/database double-write consistency, then in most cases, without any extra design or extra work: update the database first, then delete the cache!
What if I want the database to be strongly consistent with the cached data
So, what if I wanted to guarantee absolute consistency? Here’s the conclusion:
There is no way to achieve absolute consistency; this is determined by the CAP theorem. Cache systems are suited to scenarios that do not require strong consistency, so they fall under AP in CAP.
So we have to compromise: we can achieve the eventual consistency described by the BASE theory.
Eventual consistency means that all replicas of the data reach a consistent state after a period of synchronization. Its essence is that the system guarantees the consistency of the final data, rather than real-time strong consistency.
The solutions below deal with the dirty data produced by the two double-write strategies above (delete cache first then update database / update database first then delete cache) in order to guarantee eventual consistency.
Cache delayed double delete
Question: with "delete the cache first, then update the database", how do we avoid dirty data?
Answer: adopt the delayed double delete strategy.
As mentioned above, if you delete the cache first and then update the database, and no expiration policy is set on the cache, the data can stay dirty forever.
So how does delayed double delete solve this?
(1) Delete the cache first
(2) Then write the database (these two steps are the same as before)
(3) Sleep for one second, then delete the cache again
Doing this deletes any dirty cache data generated within that one second.
So how is this one second determined? How long should we sleep?
You should evaluate how long your project's read-data business logic takes, then set the write request's sleep time to that read time plus a few hundred milliseconds. The goal is to ensure the read request finishes first, so that the write request can delete the dirty cache data the read request left behind.
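As a concrete illustration, here is a sketch of the synchronous delayed double delete, continuing the hypothetical StockService fields (jdbcTemplate, redisTemplate) from the earlier sketch; the one-second sleep stands in for "read time plus a few hundred ms".

```java
// Synchronous delayed double delete (sketch, same hypothetical StockService).
public void updateStockDoubleDelete(long sid, int count) throws InterruptedException {
    redisTemplate.delete("stock:" + sid);                                        // (1) delete cache
    jdbcTemplate.update("UPDATE stock SET count = ? WHERE id = ?", count, sid);  // (2) write DB
    Thread.sleep(1000);  // (3) wait roughly "read time + a few hundred ms"...
    redisTemplate.delete("stock:" + sid);                                        // ...then delete again
}
```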
What if you use MySQL's read/write separation architecture?
OK, in that case the data inconsistency arises as follows. Again, two requests: request A performs an update and request B performs a query.
(1) Request A performs a write operation and deletes the cache
(2) Request A writes the data to the database
(3) Request B queries the cache and finds no value there
(4) Request B queries the slave database; master/slave replication has not finished yet, so it reads the old value
(5) Request B writes the old value into the cache
(6) Replication completes, and the slave database now holds the new value
This is how the data becomes inconsistent. The delayed double delete strategy still applies; just change the sleep time to the master/slave replication lag plus a few hundred milliseconds.
What about the throughput lost to this synchronous second delete?
OK, then make the second delete asynchronous: start a separate thread and delete asynchronously. The write request no longer has to sleep before returning, which recovers the throughput.
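Here is one way the asynchronous variant might look, again as a sketch on the same hypothetical service; it uses a ScheduledExecutorService so the write path returns without sleeping.

```java
// Asynchronous second delete (sketch, same hypothetical StockService).
// Requires java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}.
private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

public void updateStockAsyncDoubleDelete(long sid, int count) {
    redisTemplate.delete("stock:" + sid);                                        // (1) delete cache
    jdbcTemplate.update("UPDATE stock SET count = ? WHERE id = ?", count, sid);  // (2) write DB
    // (3) The second delete fires later on another thread; the caller never sleeps.
    scheduler.schedule(() -> redisTemplate.delete("stock:" + sid), 1, TimeUnit.SECONDS);
}
```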
Therefore, with "delete the cache first, then update the database", the delayed double delete strategy ensures that dirty data survives only for a bounded period before being overwritten by correct data.
With "update the database first, then delete the cache", dirty cache data is rare but possible. The same delayed double delete strategy applies: after request A writes the stale old value into the cache, delete the cache once more to make sure the dirty entry is removed.
What if cache deletion fails: Retry mechanism
It seems all the problems are solved, but one has not been considered yet: what if the cache delete operation itself fails? For example, if the second delete fails, the dirty data is still not cleaned up.
The solution is to add a retry mechanism to ensure successful cache deletion.
Refer to the solution diagrams given by the author Lonely Yan:
Solution 1:
The process is shown below
(1) Update the database data;
(2) The cache delete fails due to some problem;
(3) Send the key that needs to be deleted to a message queue;
(4) Consume the message and obtain the key to be deleted;
(5) Retry the delete until it succeeds.
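A minimal sketch of Solution 1's retry idea. To avoid pretending any particular MQ client API, it uses an in-process BlockingQueue as a stand-in for the real message queue; in production this would be RocketMQ, Kafka, or similar, and the class and method names here are mine.

```java
// Retry sketch: a BlockingQueue stands in for a real message queue.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.springframework.data.redis.core.StringRedisTemplate;

public class CacheDeleteRetrier {

    private final BlockingQueue<String> failedKeys = new LinkedBlockingQueue<>();
    private final StringRedisTemplate redisTemplate;

    public CacheDeleteRetrier(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // Business code calls this when a cache delete fails.
    public void submit(String key) {
        failedKeys.offer(key);
    }

    // Consumer loop: take each failed key and retry until the delete succeeds.
    public void consumeForever() throws InterruptedException {
        while (true) {
            String key = failedKeys.take();
            try {
                redisTemplate.delete(key);  // "false" just means the key was already gone
            } catch (Exception e) {
                failedKeys.offer(key);      // delete failed again: re-queue and retry later
            }
        }
    }
}
```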
However, this first scheme has the drawback of intruding heavily on the business code. Hence Solution 2: start a subscriber that subscribes to the database's binlog to capture the data being operated on, then have a separate, non-business program consume that information and delete the cache.
Solution 2:
The process is as follows:
(1) Update the database data
(2) The database writes the operation to its binlog
(3) The subscriber extracts the needed data and the key
(4) A separate piece of non-business code obtains this information
(5) It attempts to delete the cache and the delete fails
(6) The information is sent to a message queue
(7) The key is retrieved from the message queue and the delete is retried.
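A heavily simplified skeleton of this binlog-subscriber approach. Real implementations typically rely on a tool such as Alibaba's canal; the BinlogEvent type and onEvent callback below are hypothetical stand-ins rather than a real library API.

```java
// Hypothetical binlog-subscriber skeleton (not a real library API).
import org.springframework.data.redis.core.StringRedisTemplate;

public class StockBinlogSubscriber {

    // Hypothetical event carrying the changed table and primary key.
    record BinlogEvent(String table, long id) {}

    private final StringRedisTemplate redisTemplate;
    private final CacheDeleteRetrier retrier;  // from the previous sketch

    public StockBinlogSubscriber(StringRedisTemplate redisTemplate,
                                 CacheDeleteRetrier retrier) {
        this.redisTemplate = redisTemplate;
        this.retrier = retrier;
    }

    // Invoked for every row change the subscriber receives.
    public void onEvent(BinlogEvent event) {
        if (!"stock".equals(event.table())) {
            return;  // only stock-table changes affect this cache
        }
        String key = "stock:" + event.id();
        try {
            redisTemplate.delete(key);  // delete happens in non-business code
        } catch (Exception e) {
            retrier.submit(key);        // push failed deletes to the retry queue
        }
    }
}
```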
Further reading
There are four design patterns for updating the cache:
- Cache aside
- Read through
- Write through
- Write behind caching
Here is Chen Hao's summary article for further study: coolshell.cn/articles/17…
Summary
Quoting the conclusion of Chen Hao's "Routine of Cache Update" as a summary:
Distributed systems either guarantee consistency through protocols such as 2PC or Paxos, or do their utmost to reduce the probability of concurrent dirty data.
Cache systems fit scenarios that do not require strong consistency, so they belong to AP in CAP and to the BASE theory.
Heterogeneous databases cannot be kept strongly consistent; just shrink the inconsistency time window as much as possible to achieve eventual consistency.
And don't forget to set an expiration time; it's the backstop.
Conclusion
This article has summarized and discussed the double-write consistency of caches and databases.
The content of this article can be summarized as follows:
- For data that is read frequently and written rarely, use caching.
- Keeping the database and cache consistent can result in a decrease in system throughput.
- Keeping the database and cache consistent can lead to complex business code logic.
- Caching cannot be absolutely consistent with the database, but it can be eventually consistent.
- When cache/database consistency must be ensured, consider how strong the consistency requirement really is, choose the appropriate solution, and avoid over-design.