Preface

This article is part of my ongoing series. It discusses the data-caching problems we run into in the everyday development of large systems, extends the discussion to double-write consistency between the database and the cache, and provides implementation code for every scheme for your reference.

The main content of this article

  • Data cache
    • Why cache
    • What kind of data is suitable for caching
    • The pros and cons of caching
  • How to ensure cache and database consistency
    • Delete the cache instead of updating it
    • Cache first, or database first
    • What if I want the database and cache to be strongly consistent
  • Cache and database consistency practices
    • Hands-on: delete the cache first, then update the database
    • Hands-on: update the database first, then delete the cache
    • Hands-on: cache delayed double delete
    • Hands-on: retry mechanism for cache deletion
    • Hands-on: read the binlog to delete the cache asynchronously

Writing is hard work; all I ask is a follow. Welcome to follow my original-content tech WeChat account: Back-end Technology Ramblings (QR code at the bottom of the article).

The project source code is here

Github.com/qqxx6661/mi…

Data cache

In real business scenarios there are plenty of places where data should be cached. Take a flash-sale product page: it receives a lot of concurrent traffic for what we can call "hot" data. Such data shares one characteristic: it is updated rarely but read very frequently. Data like this should be cached whenever possible, so that fewer requests hit the database and the pressure on it drops.

Why cache

Caching exists for one reason: to be "fast." Let's demonstrate with an example in code.

In my demo code repository, I added two endpoints, getStockByDB and getStockByCache, which query an item's inventory from the database and from the cache respectively.

We then fired concurrent requests at them with JMeter. (Please refer to my earlier article on using JMeter: click here)

Note that my tests are not rigorous: they are meant only as a side-by-side comparison, not as a reference for real service performance.

Here is the code for the two interfaces:

/**
 * Query inventory: via the database
 */
@RequestMapping("/getStockByDB/{sid}")
@ResponseBody
public String getStockByDB(@PathVariable int sid) {
    int count;
    try {
        count = stockService.getStockCountByDB(sid);
    } catch (Exception e) {
        LOGGER.error("Error querying inventory: [{}]", e.getMessage());
        return "Inventory query failed";
    }
    LOGGER.info("Item id: [{}] remaining stock: [{}]", sid, count);
    return String.format("Item id: %d remaining stock: %d", sid, count);
}

/**
 * Query inventory: via the cache
 * Cache hit: return the stock directly
 * Cache miss: query the database, write the result to the cache, and return it
 */
@RequestMapping("/getStockByCache/{sid}")
@ResponseBody
public String getStockByCache(@PathVariable int sid) {
    Integer count;
    try {
        count = stockService.getStockCountByCache(sid);
        if (count == null) {
            count = stockService.getStockCountByDB(sid);
            LOGGER.info("Cache miss, querying database and writing to cache");
            stockService.setStockCountToCache(sid, count);
        }
    } catch (Exception e) {
        LOGGER.error("Error querying inventory: [{}]", e.getMessage());
        return "Inventory query failed";
    }
    LOGGER.info("Item id: [{}] remaining stock: [{}]", sid, count);
    return String.format("Item id: %d remaining stock: %d", sid, count);
}

When JMeter was first set to 10,000 concurrent requests, a large number of errors were reported: 98% of the 10,000 requests failed outright. Quite alarming~

Opening the logs revealed the following error:

Spring Boot's embedded Tomcat has a built-in limit on concurrent requests; the default is 200. For 10,000 concurrent requests, a single machine at that setting simply isn't enough. You can of course raise the limit here, but your small machine may still fall over.

After changing it to the following configuration, my little machine could return all 10,000 concurrent requests successfully when fetching inventory from the cache:

server.tomcat.max-threads=10000
server.tomcat.max-connections=10000

As you can see, the throughput is 668 requests per second without caching:

With caching, the throughput is 2177 requests per second:

In this "very loose" comparison, caching more than tripled the throughput of a single machine. With more machines and higher concurrency, the pressure on the database grows, so the performance advantage of caching should be even more pronounced.

After this little experiment, I went to check on the MySQL instance hanging off my tiny Tencent Cloud server, afraid that the sudden traffic had knocked it over. A burst like this might even be flagged as abnormal attack traffic~

I use a Tencent Cloud 1C4G2M server, bought during a promotion, very cheap. Free advertising here; Tencent Cloud, please contact me about payment after reading ;)

What kind of data is suitable for caching

Cache data that draws heavy traffic but changes infrequently, such as product details and comments. Frequently changing data is a poor fit for caching: on the one hand it increases system complexity (cache updates, dirty cache data), and on the other it adds some instability (one more cache system to maintain).

In extreme cases, however, you may need to cache data that does change, for example when a page must display a near-real-time inventory count, or in other special business scenarios. Then you have to make sure the cache is (nearly always) free of dirty data, and that is where the deeper discussion begins.

The pros and cons of caching

Whether or not to cache is itself a tradeoff.

Advantages of turning caching on:

  • It shortens the service's response time, giving a better user experience.
  • It increases the system's throughput, which likewise improves the user experience.
  • It reduces pressure on the database, preventing peak traffic from overwhelming it and taking the whole online service down with a BOOM!

Caching also introduces a number of additional problems:

  • There are many kinds of cache, whether in-process memory, memcached, or Redis. If you are not familiar with the one you pick, maintenance gets harder than with a plain database-only system.
  • Distributed caching has to be considered too; a distributed Redis deployment, for example, comes with plenty of pitfalls, which undoubtedly adds complexity to the system.
  • In special scenarios where cache accuracy requirements are very high, consistency between the cache and the database must also be considered.

This article will focus on caching and database consistency.

How to ensure cache and database consistency

Having said so much about why caching is necessary: actually using a cache is not the simple matter I always assumed it was. It wasn't until I hit a scenario that required the cache and the database to stay strongly consistent that I learned keeping database data and cache data consistent is a refined discipline in its own right.

From the ancient days of hardware caches and operating-system caches, caching has been a science of its own. The industry has discussed this problem for a long time, and the debate is still ongoing. I've been through a lot of the literature, and it really is a matter of tradeoffs. It's worth talking through.

The discussion below introduces several viewpoints, and I'll write code along the way to verify the issues raised.

Delete the cache instead of updating it

The prevailing view is that on a write you should not update the cache but delete it: let the next read find the cache empty, read the database, and write the value back into the cache.

“Analysis of distributed double-write consistency schemes for databases and caches”

Reason one: thread safety

Suppose requests A and B both perform an update. The following can happen:

(1) Thread A updates the database

(2) Thread B updates the database

(3) Thread B updates the cache

(4) Thread A updates the cache

A should have updated the cache before B, but because of network delays B updated it first. The cache is now left holding A's stale value: dirty data. Updating the cache is therefore ruled out.

Reason two: business scenario

There are two points:

(1) If your business writes the database often and reads the data rarely, this scheme keeps updating the cache before the data is ever read, wasting performance.

(2) If the value written to the cache is not the raw database value but the result of a series of complex computations, then recomputing it after every database write is wasted work. Deleting the cache is clearly the better fit.

To be fair, if the business is very simple (just fetch a value from the database and write it to the cache), then updating the cache works fine. But deleting the cache is simple, costs at most one extra cache miss, and is the recommended general approach. A concrete sketch of the pattern follows.
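To make the cache-aside pattern concrete, here is a minimal sketch in the spirit of the demo endpoints above. It is illustrative only: the key format and the StockMapper DAO are my own assumptions, not the repo's actual code.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

// Hypothetical DAO, standing in for the project's real data layer
interface StockMapper {
    int selectCountById(int sid);
    void updateCountById(int sid, int count);
}

@Service
public class StockCacheAsideService {

    @Autowired
    private StringRedisTemplate stringRedisTemplate;

    @Autowired
    private StockMapper stockMapper;

    // Read path: try the cache; on a miss, read the database and backfill the cache
    public int getStock(int sid) {
        String key = "stock_" + sid; // assumed key format
        String cached = stringRedisTemplate.opsForValue().get(key);
        if (cached != null) {
            return Integer.parseInt(cached);
        }
        int count = stockMapper.selectCountById(sid);
        stringRedisTemplate.opsForValue().set(key, String.valueOf(count));
        return count;
    }

    // Write path: update the database, then delete (not update) the cache
    public void updateStock(int sid, int newCount) {
        stockMapper.updateCountById(sid, newCount);
        stringRedisTemplate.delete("stock_" + sid);
    }
}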

Cache first, or database first

So the question is, do we delete the cache first and then update the database, or do we update the database first and then delete the cache?

Let's see what the experts have to say.

【58 Shenjian Architecture Series 】 Two or three Details of Cache Architecture Design

For two operations that cannot be made transactional, you inevitably face the question of which to perform first. The guiding principle: order them so that, if an inconsistency does occur, the business impact is smaller.

If the cache is evicted first and the database write then fails, the only consequence is a single extra cache miss.

Now suppose the database is written first and the cache evicted second. If the write in step one succeeds but the eviction in step two fails, the DB holds new data while the cache holds old data: the two are inconsistent.

That reasoning is sound, but it does not fully consider dirty reads under concurrent requests. Let's look again at "Analysis of distributed double-write consistency schemes for databases and caches".

Delete the cache first, then update the database

This scheme can leave the data inconsistent under concurrent requests.

Suppose request A performs an update while request B simultaneously performs a query. The following can happen:

(1) Request A performs its write and deletes the cache

(2) Request B queries and finds the cache empty

(3) Request B queries the database and gets the old value

(4) Request B writes the old value into the cache

(5) Request A writes the new value into the database

This leads to inconsistency. Worse, if you don't set an expiration policy on the cache, the data stays dirty forever.

So deleting the cache first and then updating the database is not a silver bullet either.

Isn’t there a concurrency problem in updating the database and then deleting the cache?

There is. Suppose there are two requests: request A queries and request B updates. The following can happen:

(1) The cache entry has just expired

(2) Request A queries the database and gets the old value

(3) Request B writes the new value into the database

(4) Request B deletes the cache

(5) Request A writes the old value it read into the cache

If that happens, there will indeed be dirty data.

But what are the odds of that happening?

The precondition is that the write in step (3) completes faster than the read in step (2), so that step (4) can happen before step (5). But think about it: database reads are much faster than writes, so a write finishing before an in-flight read is hard to come by in practice.

Updating the database first and then deleting the cache can still go wrong, but for the reasons above the probability is much lower.

So if all you want is basic cache/database double-write consistency, then in most cases, without any extra design or extra work: update the database first, then delete the cache!

What if I want the database and cache to be strongly consistent

So, what if I wanted to guarantee absolute consistency? Here’s the conclusion:

There is no way to achieve absolute consistency; that is dictated by the CAP theorem. Caching suits scenarios that do not demand strong consistency, so it falls on the AP side of CAP.

So we have to compromise: we can settle for the eventual consistency described by the BASE theory.

Eventual consistency means that all replicas of the data in the system reach a consistent state after a period of synchronization. Its essence is that the system guarantees the data become consistent eventually, rather than guaranteeing strong consistency at every moment.

The main job, then, is to deal with the dirty data produced by the two double-write strategies above (delete cache then update database / update database then delete cache), so as to guarantee eventual consistency.

Cache delayed double delete

Question: with delete-the-cache-first-then-update-the-database, how do we avoid dirty data?

Answer: adopt the delayed double-delete strategy.

As mentioned above, if the cache is deleted and then the database is updated, the data will always be dirty if you do not set an expiration time policy for the cache.

So how does delayed double delete work?

(1) Delete the cache first

(2) Then write the database (these two steps are the same as before)

(3) Sleep for 1 second, then delete the cache again

Doing this deletes any dirty cache data produced during that 1 second.

So how is that one second determined? Exactly how long should you sleep?

You should measure how long your project's read-data business logic takes, then set the write request's sleep time to that duration plus a few hundred milliseconds. The goal is to ensure the read request finishes first, so the write can delete whatever dirty cache data that read produced.

What if you use MySQL's read-write separation (primary/replica) architecture?

In that case, inconsistency arises as follows. Again two requests: request A updates, request B queries.

(1) Request A performs its write and deletes the cache

(2) Request A writes the data to the primary database

(3) Request B queries the cache and finds nothing there

(4) Request B queries the replica; primary/replica sync has not completed yet, so it reads the old value

(5) Request B writes the old value into the cache

(6) Replication completes, and the replica now holds the new value

That is how the data end up inconsistent. The delayed double-delete strategy still works; just set the sleep time to the primary/replica sync delay plus a few hundred milliseconds.

What about the throughput loss this synchronous eviction strategy causes?

Simple: make the second delete asynchronous. Start your own thread and delete there, so the write request doesn't need to sleep before returning. That restores the throughput.

So, if you delete the cache first and then update the database, the delayed double-delete strategy ensures dirty data survives only briefly before being overwritten by correct data.

With update-database-then-delete-cache, dirty cache data is rare, but it can still happen. The same delayed double-delete strategy applies: after request A writes the stale old value into the cache, delete the cache once more to make sure the dirty entry is gone.

What if cache deletion fails: Retry mechanism

It looks like every problem is solved, but one hasn't been considered yet: what if the cache-delete operation itself fails? If, say, the second delete in delayed double delete fails, the dirty data never gets cleaned up.

The solution is to add a retry mechanism to ensure successful cache deletion.

Refer to the scheme diagrams given by 孤独烟, the author quoted above:

Scheme 1:

The process is as follows:

(1) Update the database

(2) The cache delete fails for whatever reason

(3) Send the key to be deleted to a message queue

(4) Consume the message and obtain the key to delete

(5) Retry the delete until it succeeds

This scheme has a drawback, though: it intrudes heavily on the business code. Hence scheme 2: start a subscriber that follows the database's binlog to learn which data was modified, then, in a separate non-business program, receive that information and delete the cache.

Scheme 2:

The process is as follows:

(1) Update the database

(2) The database writes the operation to its binlog

(3) The subscriber extracts the affected data and key

(4) A separate piece of non-business code obtains that information

(5) It attempts to delete the cache, and the delete fails

(6) The information is sent to a message queue

(7) The key is pulled from the message queue again and the delete is retried

For reading the binlog you can use Canal, middleware open-sourced by Alibaba.

Good. At this point we've combed through the whole train of thought on cache double-write consistency. Below is my rough hands-on code for several of these ideas, for anyone who needs a reference.

Cache and database consistency practices

Hands-on: delete the cache first, then update the database

We add an endpoint to the flash-sale code: delete the cache first, then update the database.

Added to OrderController:

/**
 * Place an order: delete the cache first, then update the database
 */
@RequestMapping("/createOrderWithCacheV1/{sid}")
@ResponseBody
public String createOrderWithCacheV1(@PathVariable int sid) {
    int count = 0;
    try {
        // Delete the stock cache
        stockService.delStockCountCache(sid);
        // Decrement stock and create the order, in one transaction
        count = orderService.createPessimisticOrder(sid);
    } catch (Exception e) {
        LOGGER.error("Purchase failed: [{}]", e.getMessage());
        return "Purchase failed, insufficient stock";
    }
    LOGGER.info("Purchase succeeded, remaining stock: [{}]", count);
    return String.format("Purchase succeeded, remaining stock: %d", count);
}

Added the following to stockService:

@Override
public void delStockCountCache(int id) {
    String hashKey = CacheKey.STOCK_COUNT.getKey() + "_" + id;
    stringRedisTemplate.delete(hashKey);
    LOGGER.info("Deleted cache for item id: [{}]", id);
}

The rest of the code involved was covered in the previous three articles; you can grab the full project source directly from GitHub, so I won't repeat it here.

Hands-on: update the database first, then delete the cache

To update the database first and then delete the cache, the code simply reverses the order of the two operations:

/**
 * Place an order: update the database first, then delete the cache
 */
@RequestMapping("/createOrderWithCacheV2/{sid}")
@ResponseBody
public String createOrderWithCacheV2(@PathVariable int sid) {
    int count = 0;
    try {
        // Decrement stock and create the order, in one transaction
        count = orderService.createPessimisticOrder(sid);
        // Delete the stock cache
        stockService.delStockCountCache(sid);
    } catch (Exception e) {
        LOGGER.error("Purchase failed: [{}]", e.getMessage());
        return "Purchase failed, insufficient stock";
    }
    LOGGER.info("Purchase succeeded, remaining stock: [{}]", count);
    return String.format("Purchase succeeded, remaining stock: %d", count);
}

Hands-on: cache delayed double delete

How best to do the delayed delete? With a thread pool: perform the second delete in a separate thread, rather than calling Thread.sleep on the request thread, which would block the user's request.

The flow: delete the cache, update the database, then delete the cache again after a delay.

New interface in OrderController:

private static final int DELAY_MILLSECONDS = 1000;

/**
 * Place an order: delete the cache, update the database, then delete the cache again after a delay
 */
@RequestMapping("/createOrderWithCacheV3/{sid}")
@ResponseBody
public String createOrderWithCacheV3(@PathVariable int sid) {
    int count;
    try {
        // Delete the stock cache
        stockService.delStockCountCache(sid);
        // Decrement stock and create the order, in one transaction
        count = orderService.createPessimisticOrder(sid);
        // Delete the cache again after a delay
        cachedThreadPool.execute(new delCacheByThread(sid));
    } catch (Exception e) {
        LOGGER.error("Purchase failed: [{}]", e.getMessage());
        return "Purchase failed, insufficient stock";
    }
    LOGGER.info("Purchase succeeded, remaining stock: [{}]", count);
    return String.format("Purchase succeeded, remaining stock: %d", count);
}

New thread pool in OrderController:

// Thread pool for the delayed cache delete
private static ExecutorService cachedThreadPool = new ThreadPoolExecutor(
        0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());

/**
 * Task that deletes the cache again after a delay
 */
private class delCacheByThread implements Runnable {
    private int sid;

    public delCacheByThread(int sid) {
        this.sid = sid;
    }

    public void run() {
        try {
            LOGGER.info("Async delete of cache for item id: [{}], firing after [{}] ms", sid, DELAY_MILLSECONDS);
            Thread.sleep(DELAY_MILLSECONDS);
            stockService.delStockCountCache(sid);
            LOGGER.info("Deleted cache for item id: [{}] a second time", sid);
        } catch (Exception e) {
            LOGGER.error("delCacheByThread failed", e);
        }
    }
}

To experiment, call the createOrderWithCacheV3 interface:

In the log you can see the two cache deletions:
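A side note on the design: sleeping inside a pool thread occupies that thread for the full delay. A ScheduledExecutorService expresses the same delayed second delete without an explicit sleep; a minimal sketch of this variant (my own alternative, not the repo's code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// One shared scheduler replaces the cachedThreadPool + sleep combination
private static final ScheduledExecutorService DELAY_POOL = Executors.newScheduledThreadPool(4);

// Schedule the second cache delete; no thread is blocked while waiting
private void scheduleCacheDelete(int sid) {
    DELAY_POOL.schedule(() -> stockService.delStockCountCache(sid),
            DELAY_MILLSECONDS, TimeUnit.MILLISECONDS);
}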

Hands-on: retry mechanism for cache deletion

As mentioned above, to handle delete failures we need a message queue to retry the delete operation. Here we use RabbitMQ: we need code that sends the key to the queue, and a resident consumer that consumes the messages. Spring's RabbitMQ integration is fairly simple; the minimal integration code follows.

Add the RabbitMQ dependency to pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-amqp</artifactId>
</dependency>

Write a RabbitMqConfig:

@Configuration
public class RabbitMqConfig {

    @Bean
    public Queue delCacheQueue() {
        return new Queue("delCache");
    }
}

Add a consumer:

@Component
@RabbitListener(queues = "delCache")
public class DelCacheReceiver {

    private static final Logger LOGGER = LoggerFactory.getLogger(DelCacheReceiver.class);

    @Autowired
    private StockService stockService;

    @RabbitHandler
    public void process(String message) {
        LOGGER.info("DelCacheReceiver received message: " + message);
        LOGGER.info("DelCacheReceiver deleting cache: " + message);
        stockService.delStockCountCache(Integer.parseInt(message));
    }
}

New interface in OrderController:

/**
 * Place an order: update the database first, then delete the cache; on delete failure, send the key to the message queue for a retry
 * @param sid
 * @return
 */
@RequestMapping("/createOrderWithCacheV4/{sid}")
@ResponseBody
public String createOrderWithCacheV4(@PathVariable int sid) {
    int count;
    try {
        // Decrement stock and create the order, in one transaction
        count = orderService.createPessimisticOrder(sid);
        // Delete the stock cache
        stockService.delStockCountCache(sid);
        // Delete the cache again after a delay
        cachedThreadPool.execute(new delCacheByThread(sid));
        // Assume the delayed delete failed: send the key to the message queue for a retried delete
        sendDelCache(String.valueOf(sid));
    } catch (Exception e) {
        LOGGER.error("Purchase failed: [{}]", e.getMessage());
        return "Purchase failed, insufficient stock";
    }
    LOGGER.info("Purchase succeeded, remaining stock: [{}]", count);
    return String.format("Purchase succeeded, remaining stock: %d", count);
}
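The controller above calls a sendDelCache helper that the snippet doesn't show. A minimal sketch of what it might look like, assuming Spring AMQP's RabbitTemplate and the delCache queue declared earlier (the body is my reconstruction, not copied from the repo):

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;

@Autowired
private RabbitTemplate rabbitTemplate;

// Publish the item id to the delCache queue; DelCacheReceiver picks it up and retries the delete
public void sendDelCache(String message) {
    LOGGER.info("Sending message: " + message);
    rabbitTemplate.convertAndSend("delCache", message);
}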

Visit createOrderWithCacheV4:

As you can see, we first place the order and then delete the cache; then, assuming the delayed cache delete failed, we send a message to the message queue for a retry, and the consumer deletes the cache again once it receives the message.

Hands-on: read the binlog to delete the cache asynchronously

Here we use Canal, open-sourced by Alibaba, to read the binlog and delete the cache asynchronously.

I wrote a separate introductory article on Canal that uses exactly this read-the-binlog-to-delete-the-cache example. You can jump to it here: Quick Start with Canal, Alibaba's open-source MySQL middleware.
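To give a flavor of what that looks like, here is a minimal Canal client sketch that watches row changes on a stock table and deletes the matching cache entry. The table name, primary-key column, and cache-key format are assumptions for illustration; see the linked article for the full worked example.

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import java.net.InetSocketAddress;

public class BinlogCacheDeleter {

    public static void main(String[] args) throws Exception {
        // Connect to a local Canal server; "example" is Canal's default destination
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        connector.subscribe("seckill\\.stock"); // assumed schema.table filter
        connector.rollback();

        while (true) {
            Message message = connector.getWithoutAck(100); // fetch up to 100 entries
            long batchId = message.getId();
            if (batchId == -1 || message.getEntries().isEmpty()) {
                Thread.sleep(1000); // nothing new, back off briefly
                continue;
            }
            for (CanalEntry.Entry entry : message.getEntries()) {
                if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                    continue; // skip transaction begin/end entries
                }
                CanalEntry.RowChange rowChange =
                        CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                if (rowChange.getEventType() != CanalEntry.EventType.UPDATE) {
                    continue; // only react to updates in this sketch
                }
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    for (CanalEntry.Column column : rowData.getAfterColumnsList()) {
                        if ("id".equals(column.getName())) { // assumed primary-key column
                            deleteStockCache(column.getValue());
                        }
                    }
                }
            }
            connector.ack(batchId); // confirm the batch so Canal advances its cursor
        }
    }

    private static void deleteStockCache(String id) {
        // In the real service this would call stockService.delStockCountCache(...)
        System.out.println("delete cache key: STOCK_COUNT_" + id);
    }
}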

Further reading

There are four design patterns for updating the cache:

  • Cache aside
  • Read through
  • Write through
  • Write behind caching

Here is Chen Hao's summary article for further study:

Coolshell. Cn/articles / 17…

Summary

To wrap up, here is the conclusion of Chen Hao's "The Routines of Cache Updating":

Distributed systems either guarantee consistency through protocols like 2PC or Paxos, or do everything they can to reduce the probability of concurrent dirty data.

Caching suits scenarios without strong consistency requirements, so it belongs to AP in CAP, and to the BASE theory.

Heterogeneous data stores cannot be strongly consistent; we can only shrink the inconsistency window as much as possible and achieve eventual consistency.

And don't forget to set an expiration time on the cache; it's the safety net.

Conclusion

This article summarized and discussed double-write consistency between caches and databases.

The content of this article can be summarized as follows:

  • For data that is read often and written rarely, use a cache.
  • Keeping the database and cache consistent costs system throughput.
  • Keeping the database and cache consistent complicates the business code.
  • A cache can never be absolutely consistent, but it can be eventually consistent.
  • When you do need cache/database consistency, consider carefully how strong the requirement really is, choose the appropriate solution, and avoid over-design.

My ability is limited, and mistakes and omissions are inevitable in writing; reasoned discussion and corrections are welcome.

Writing is hard work; all I ask is a follow. Welcome to follow my original-content tech WeChat account: Back-end Technology Ramblings (QR code at the bottom of the article).

References

  • Cloud.tencent.com/developer/a…
  • www.jianshu.com/p/2936a5c65…
  • www.cnblogs.com/rjzheng/p/9…
  • www.cnblogs.com/codeon/p/82…
  • www.jianshu.com/p/0275ecca2…
  • www.jianshu.com/p/dc1e5091a…
  • Coolshell. Cn/articles / 17…

Follow me

I'm a back-end development engineer, focusing on back-end development, data security, crawlers, IoT, edge computing, and related areas. Happy to connect and chat.

You can find me on all the major platforms:

  • WeChat official account: Back-end Technology Ramblings
  • GitHub: @qqxx6661
  • CSDN: @Pretty Three Knives
  • Zhihu: @Back-end Technology Ramblings
  • Jianshu: @Pretty Three Knives
  • Juejin: @Pretty Three Knives
  • Tencent Cloud+ Community: @Back-end Technology Ramblings

What my original articles mainly cover

  • Back-end development practices
  • Java Interview Knowledge
  • Design pattern/data structure/algorithm problem solving
  • Reading notes / anecdotes / programmer life

Personal WeChat official account: Back-end Technology Ramblings

If this article is helpful to you, please like and bookmark it.