The interview began

Hello, young man, I see you have MySQL and Redis on your resume, so let's focus on those two today. Redis and MySQL are key players in back-end development, and in practice the two go hand in hand: to improve performance and response time, Redis usually stores the hot data, while MySQL stores all the data to guarantee persistence. In other words, Redis holds a subset of MySQL's data.

How do you notify Redis when the persisted data in MySQL changes? In other words, how do you keep the cache and the database double-write consistent?


Hello, interviewer. Our approach in development is to update the database first and then delete the corresponding cache. When a later request misses the cache, it reads the data from MySQL and writes it back to Redis.
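A minimal Java sketch of this cache-aside flow, assuming hypothetical `Cache` and `UserDao` wrappers over Redis and MySQL (the type names, key format, and 300-second TTL are illustrative, not from the interview):

```java
// Hypothetical minimal types standing in for a Redis client wrapper,
// a MySQL DAO, and a domain object (illustrative names, not a real API).
interface Cache {
    <T> T get(String key, Class<T> type);
    void set(String key, Object value, int ttlSeconds);
    void delete(String key);
}

interface UserDao {
    User findById(long id);
    void update(User user);
}

record User(long id, String name) {}

// Cache-aside sketch: the write path updates MySQL and then deletes the
// Redis key; the read path falls back to MySQL on a miss and repopulates Redis.
class UserService {
    private final Cache cache;
    private final UserDao db;

    UserService(Cache cache, UserDao db) {
        this.cache = cache;
        this.db = db;
    }

    // Write path: update the database first, then delete the cached copy.
    void updateUser(User user) {
        db.update(user);                   // 1. persist to MySQL
        cache.delete("user:" + user.id()); // 2. invalidate the cache entry
    }

    // Read path: cache hit returns directly; a miss reads MySQL and writes back.
    User getUser(long id) {
        String key = "user:" + id;
        User cached = cache.get(key, User.class);
        if (cached != null) {
            return cached;                 // cache hit
        }
        User fromDb = db.findById(id);     // cache miss: read MySQL
        if (fromDb != null) {
            cache.set(key, fromDb, 300);   // write back with a TTL in seconds
        }
        return fromDb;
    }
}
```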

So why delete the cache instead of updating it?


First, as shown in the figure below, consider updating the cache instead: request A occurs before request B, so A's cache update should also happen before B's. Due to network delays, however, B's cache update completes before A's, so A's stale value overwrites B's newer one and leaves dirty data in the cache.

Secondly, in a business scenario with many database writes and few reads, the update-cache scheme refreshes the cache frequently even though the data may never be read, which wastes performance.

Would it be a problem to delete the cache first and then update the database?


As shown in the figure below, request A performs a write and deletes the cache first. Request B then finds the cache empty, queries the database, gets the old value, and writes that old value back into the cache. Only after that does request A write the new value into the database.

This situation leads to data inconsistency. Moreover, if you do not set an expiration policy on the cache, the cached data stays dirty indefinitely.

Well, all of these schemes do have problems in a concurrent environment. Is updating the database first and then deleting the cache free of concurrency problems?


The answer is: not necessarily. Concurrency problems can still occur, as shown in the figure below.

If Step 3 (the database write) finishes faster than Step 2 (the database read), then Step 4 may happen before Step 5 and the cache ends up dirty. In general, though, database reads are much faster than writes (as MySQL's concurrent read/write throughput shows: on the same hardware, concurrent reads are several times more efficient than concurrent writes), so this race is unlikely.

Therefore, if you only need basic cache/database double-write consistency, in most cases updating the database first and then deleting the cache is sufficient, without much extra design or work.

Besides the problem you just described, does updating the database first and then deleting the cache have any other issues?


If MySQL uses a read-write separation (master/slave) architecture: request A updates the data on the master and deletes the cache, but master/slave replication has not yet completed. When request B then hits a cache miss, it still reads the old value from the slave and writes it back into the cache, which also causes data inconsistency.

You just mentioned that updating the database first and then deleting the cache can still cause data inconsistency. How do you solve that?


Use delayed double delete. As shown in the figure below, request A updates the database as before. To cover the case where the first cache deletion happens before request B writes the old value into the cache, request A sleeps for a while after updating the database (for example 100 to 200 ms, depending on the actual business scenario) and then deletes the cache a second time; this basically guarantees that no dirty data remains in the cache. The master/slave architecture is handled the same way: request A does not stop at the deletion made right after updating the master; the delayed second deletion waits until master/slave replication has completed before finally removing the cached data.
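A minimal Java sketch of the delayed double delete, reusing the hypothetical `Cache` and `UserDao` wrappers from the earlier sketch; the 200 ms delay is an assumed value that should be tuned against real read and replication latencies:

```java
import java.util.concurrent.TimeUnit;

// Delayed double delete sketch (hypothetical wrappers, assumed delay value).
class DelayedDoubleDeleteService {
    private final Cache cache;
    private final UserDao db;

    DelayedDoubleDeleteService(Cache cache, UserDao db) {
        this.cache = cache;
        this.db = db;
    }

    void updateUser(User user) throws InterruptedException {
        String key = "user:" + user.id();

        db.update(user);                   // 1. update MySQL
        cache.delete(key);                 // 2. first cache deletion

        TimeUnit.MILLISECONDS.sleep(200);  // 3. wait for in-flight readers /
                                           //    master-slave replication
        cache.delete(key);                 // 4. second, delayed deletion
    }
}
```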

However, making request A sleep for a period of time affects the RT of the interface and reduces the throughput of the system. How do you solve that?


A more elegant approach is to do it asynchronously: use a thread pool and have request A hand the work to a separate thread that waits out the delay and then deletes the cache. You could also push the key onto a message queue and have a consumer delete it asynchronously, but adding a message-queue layer just for asynchronous cache deletion complicates the system design and can introduce other problems.
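A sketch of the asynchronous variant using a `ScheduledExecutorService`, again with the hypothetical wrappers; the pool size and delay are assumed values. The request thread returns as soon as the second deletion is scheduled, so interface RT is unaffected:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Asynchronous variant: the second deletion runs later on a small scheduler
// thread pool instead of blocking the request thread.
class AsyncDoubleDeleteService {
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(2);
    private final Cache cache;
    private final UserDao db;

    AsyncDoubleDeleteService(Cache cache, UserDao db) {
        this.cache = cache;
        this.db = db;
    }

    void updateUser(User user) {
        String key = "user:" + user.id();

        db.update(user);     // 1. update MySQL
        cache.delete(key);   // 2. first cache deletion (synchronous)

        // 3. schedule the second deletion 200 ms later; the caller returns now
        scheduler.schedule(() -> cache.delete(key), 200, TimeUnit.MILLISECONDS);
    }
}
```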

You've been talking about deleting the cache, but what if the cache deletion fails?


Add a retry mechanism to ensure that the cache deletion eventually succeeds.
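As a sketch, a bounded retry with simple backoff might look like this (the attempt count and backoff are assumptions; in practice a key that still fails could also be recorded for later compensation):

```java
import java.util.concurrent.TimeUnit;

// Retry sketch for cache deletion, using the hypothetical Cache wrapper.
class CacheDeleteRetry {
    private final Cache cache;

    CacheDeleteRetry(Cache cache) {
        this.cache = cache;
    }

    // Returns true as soon as the delete succeeds, false after maxAttempts.
    boolean deleteWithRetry(String key, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                cache.delete(key);
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    // give up; surface the failure for monitoring/compensation
                    System.err.println("cache delete failed for " + key + ": " + e);
                    return false;
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(50L * attempt); // simple backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```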

What if I absolutely require consistency between the database and the cache?


Absolute consistency cannot be achieved; this is dictated by the CAP theorem. A cache is suited to scenarios that do not require strong consistency, so it falls under AP in CAP.

CAP is the classical theorem of distributed systems, standing for Consistency, Availability, and Partition tolerance.

According to the BASE theory (Basically Available, Soft state, Eventually consistent), the cache and the database can only achieve eventual consistency.

The end of the interview

It's getting late, so let's stop here for today. I can tell this young man has a thorough grasp of this area, and our company is short of talent like you, so why not sign the offer right now? At this point you take the offer with one hand while waving the other: no can do, Boss Ma's people over in Shenzhen are also anxiously waiting for my reply and have been pressing me for days. Hearing this, the interviewer wonders which team that could possibly be.

Summary

Using a cache well is not easy, especially when the cache and the database must stay consistent: keeping database data and cached data in sync is a subtle topic in its own right. From the early days of hardware caches and operating-system caches, caching has been a discipline of its own. The industry has debated this issue for a long time without a definitive answer, because ultimately it is a matter of tradeoffs. I am Xiaoxia Lufei; I love technology and sharing.