Reprinted from: mp.weixin.qq.com/s/5rnkD-2cG…

Preface

In April, a good friend interviewed at Meituan and was asked: how can consistency between Redis and MySQL be guaranteed when both are written? In other words, how do you keep the cache and the database consistent in a double-write scenario? This article walks through how to answer that question.

What is consistency?

Consistency means that data agrees with itself. In a distributed system, it can be understood as the data values across multiple replica nodes being the same.

  • Strong consistency: the most intuitive level. Whatever is written can be read back immediately, which gives the best user experience, but implementing it usually has a large impact on system performance
  • Weak consistency: after a write succeeds, the system does not guarantee that the new value can be read immediately, nor does it promise when the data will become consistent. It only tries to converge within some time window (for example, at the second level)
  • Eventual consistency: a special case of weak consistency, in which the system guarantees that consistency is reached within a certain period of time. It is singled out here because it is the most widely adopted consistency model for data in large distributed systems

Three classic caching patterns

Caching improves performance and relieves database pressure, but it can also introduce data inconsistency. How is caching typically used? There are three classic caching patterns:

  • Cache-Aside
  • Read-Through/Write-Through
  • Write-Behind

Cache-Aside Pattern

Cache-Aside, also known as the bypass cache pattern, is designed to minimize data inconsistency between the cache and the database.

Cache-Aside read flow

A Cache-Aside read request proceeds as follows:

  1. Read the cache first. On a hit, return the cached data directly
  2. On a miss, read the database, put the fetched data into the cache, and return the response
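The read flow above can be sketched with plain in-memory dicts standing in for Redis and MySQL (the key and value below are made up for illustration):

```python
# In-memory stand-ins for Redis (cache) and MySQL (db).
cache = {}
db = {"user:1": "alice"}

def read(key):
    value = cache.get(key)
    if value is not None:      # cache hit: return directly
        return value
    value = db.get(key)        # cache miss: read the database
    if value is not None:
        cache[key] = value     # backfill the cache for subsequent reads
    return value
```

The first `read("user:1")` misses the cache, loads the value from `db`, and backfills it; later reads are served from the cache.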

Cache-Aside write flow

A Cache-Aside write request proceeds as follows:

When updating, update the database first and then delete the cache.
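The write side can be sketched the same way; note that the cache entry is deleted, not updated (dict stand-ins and hypothetical keys again):

```python
cache = {"user:1": "old"}
db = {"user:1": "old"}

def write(key, value):
    db[key] = value        # 1. update the database first
    cache.pop(key, None)   # 2. then delete the cached entry

write("user:1", "new")
# the next Cache-Aside read will miss and backfill the fresh value from db
```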

Read-Through/Write-Through

In Read-Through/Write-Through mode, the application treats the cache as the primary data store: it interacts only with an abstract cache layer, and the cache layer talks to the database.

Read-Through

The Read-Through flow is as follows:

  1. Read data from the cache; on a hit, return it directly
  2. On a miss, the cache provider loads it from the database, writes it into the cache, and returns the response

Doesn’t this look just like Cache-Aside? The difference is that Read-Through adds a layer: the cache provider.

Read-Through is really just a thin wrapper over Cache-Aside. It makes application code cleaner while reducing the load on the data source.
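A minimal sketch of that wrapper, assuming a `loader` callable that reads the underlying database (the class name and keys are made up for illustration):

```python
class ReadThroughCache:
    """Cache provider that loads from the data source itself on a miss,
    so callers never touch the database directly."""

    def __init__(self, loader):
        self._store = {}       # the cached data
        self._loader = loader  # reads the underlying database

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # load and cache on a miss
        return self._store[key]

db = {"user:1": "alice"}
users = ReadThroughCache(loader=db.get)
```

The caller only ever sees `users.get(...)`; the database access is hidden inside the provider.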

Write-Through

In Write-Through mode, when a write request occurs, the cache abstraction layer updates both the data source and the cached data, as follows:
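A matching Write-Through sketch, where one `put` call updates the data source and the cache synchronously (a dict again stands in for the database):

```python
class WriteThroughCache:
    """Cache layer that writes to the database and the cache in one
    synchronous call."""

    def __init__(self, db):
        self._store = {}
        self._db = db

    def put(self, key, value):
        self._db[key] = value     # update the data source...
        self._store[key] = value  # ...and the cached copy, synchronously

    def get(self, key):
        return self._store.get(key)

db = {}
users = WriteThroughCache(db)
users.put("user:1", "alice")
```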

Write-Behind (asynchronous cache write)

Write-Behind is similar to Read-Through/Write-Through in that the cache provider is responsible for reading and writing both the cache and the database. The difference is that Read-Through/Write-Through updates the cache and the database synchronously, while Write-Behind only updates the cache and writes to the database asynchronously, not directly.

In this mode, the consistency between the cache and the database is not strong, so use it with caution in systems that require high consistency. It does suit write-heavy scenarios; MySQL’s InnoDB buffer pool works in this mode.
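A Write-Behind sketch: writes touch only the cache, and dirty keys are flushed to the database later. Here the flush is triggered manually for illustration; a real implementation would batch it on a timer or on eviction:

```python
class WriteBehindCache:
    """Writes update only the cache; dirty keys are persisted later."""

    def __init__(self, db):
        self._store = {}
        self._db = db
        self._dirty = set()

    def put(self, key, value):
        self._store[key] = value  # update the cache immediately
        self._dirty.add(key)      # mark the key for a later flush

    def flush(self):
        for key in self._dirty:   # persist dirty entries in one batch
            self._db[key] = self._store[key]
        self._dirty.clear()

db = {}
users = WriteBehindCache(db)
users.put("user:1", "alice")
# until flush() runs, the database has not seen the write at all
users.flush()
```

The window between `put` and `flush` is exactly where the weak consistency of this pattern lives.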

When operating on the cache, should I delete the cache or update it?

In everyday development we generally use the Cache-Aside pattern. Some people ask: in Cache-Aside, why does a write request delete the cache instead of updating it?

Should we delete or update the cache when we operate on it? Let’s start with an example:

  1. Thread A initiates a write operation and updates the database first
  2. Thread B initiates another write operation and also updates the database
  3. Due to network timing, thread B updates the cache first
  4. Thread A then updates the cache

Now the cache holds A’s data (stale) while the database holds B’s data (fresh): the data is inconsistent and dirty. If the cache is deleted instead of updated, this dirty-data problem does not occur.

Updating the cache has two disadvantages compared with deleting it:

  • If the cached value is the result of a complex calculation, updating the cache on every write wastes performance.
  • In write-heavy, read-light scenarios, the data is often updated again before it is ever read, which also wastes performance.

In a double-write scenario, which comes first: the database or the cache?

In Cache-Aside mode, some may wonder why a write request operates on the database first. Why not operate on the cache first?

Suppose there are two requests: A performs an update, and B performs a query (read).

  1. Thread A initiates a write operation; its first step is to delete the cache
  2. Thread B then initiates a read operation and gets a cache miss
  3. Thread B goes on to read the DB and gets the old value
  4. Thread B backfills the cache with the old value
  5. Thread A finally writes the latest value to the DB

The result is cache/database inconsistency: the cache holds old data while the database holds new data. That is why Cache-Aside chooses to operate on the database first rather than the cache.

  • Some may ask: operating on the database first and then deleting the cache is not atomic either. Can that not also lead to inconsistency? It can, but in this ordering the probability of dirty data is much lower, because it additionally requires the cache deletion to fail (or similarly rare timing). Draw the operation flow chart and analyze it yourself. Next, let’s look at how to ensure consistency when cache deletion fails.

Can the database and the cache be made strongly consistent?

In fact, there is no way to guarantee absolute consistency between the database and the cache.

  • Use locks? Hold a lock during concurrent writes and forbid any read from writing to the cache while it is held?
  • Encapsulate cache-and-database updates behind CAS-style optimistic locking, updating the cache via a Lua script?
  • Distributed transactions, such as 3PC or TCC?

Fundamentally, this is dictated by the CAP theorem. Cache systems fit scenarios that do not require strong consistency, i.e. the AP side of CAP. In my opinion, business scenarios that pursue absolute consistency should not introduce a cache at all.

The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance.

However, weak consistency and final consistency can be guaranteed through some scheme optimization.

Three schemes to ensure consistency between the database and the cache

Cache delayed double delete

Some may say that you do not necessarily have to operate on the database first: the cache delayed double-delete strategy can also keep the data consistent. What is delayed double delete?

  1. Delete the cache first
  2. Then update the database
  3. Sleep for a moment (say, 1 second) and delete the cache again

The flow sleeps “for a moment”, but how long should that be? Is one second always right?

The sleep time ≈ the time the read path’s business logic takes + a few hundred milliseconds. This ensures that any in-flight read request has finished, so the second delete can remove the stale cache entry that read may have backfilled.

This scheme is acceptable: dirty data can only exist during the short sleep window (for example, that one second), which most businesses can tolerate. But what if the second cache deletion fails? The cache and the database may still end up inconsistent. Should we set a natural expiry time on the key so it expires on its own? Then the business must tolerate inconsistency until expiry. Or is there something better?
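The three steps can be sketched as follows (in-memory dicts stand in for Redis and MySQL, and the delay is shortened for illustration):

```python
import time

cache = {"user:1": "old"}
db = {"user:1": "old"}

def delayed_double_delete(key, value, delay=0.05):
    cache.pop(key, None)   # 1. delete the cache first
    db[key] = value        # 2. then update the database
    time.sleep(delay)      # 3. sleep ~ read-path latency + a few hundred ms,
    cache.pop(key, None)   #    then delete again to evict any stale value
                           #    a concurrent read may have backfilled

delayed_double_delete("user:1", "new")
```

In a real service the second delete would run off the request thread (e.g. via a scheduled task) so the write request is not blocked by the sleep.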

Cache-deletion retry mechanism

Whether you use delayed double delete or Cache-Aside (update the database first, then delete the cache), the second-step cache deletion may fail and leave the data inconsistent. We can optimize with this scheme: if the deletion fails, retry it several times. In other words, introduce a retry mechanism for cache deletion:

  1. A write request updates the database
  2. Cache deletion fails for some reason
  3. The key that failed to be deleted is placed on a message queue
  4. A consumer pulls the message from the queue and obtains the key to delete
  5. The cache deletion is retried
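The five steps can be sketched with a `deque` standing in for the message queue and a deliberately failing first deletion (all names below are illustrative):

```python
from collections import deque

cache = {"user:1": "old"}
db = {"user:1": "old"}
retry_queue = deque()   # stands in for a real message queue

def delete_cache(key, fail=False):
    if fail:                          # simulate a transient Redis failure
        raise RuntimeError("redis unavailable")
    cache.pop(key, None)

def write(key, value):
    db[key] = value                   # 1. update the database
    try:
        delete_cache(key, fail=True)  # 2. cache deletion fails
    except RuntimeError:
        retry_queue.append(key)       # 3. enqueue the failed key

def consume():
    while retry_queue:
        key = retry_queue.popleft()   # 4. pull the key from the queue
        delete_cache(key)             # 5. retry the deletion

write("user:1", "new")
consume()
```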

Asynchronously delete the cache by reading the binlog

The retry mechanism works, but it intrudes heavily into business code. An alternative optimization is to subscribe to the database’s binlog and asynchronously evict keys from the cache.

Take MySQL as an example:

  • Alibaba’s open-source Canal can be used to collect binlog events and send them to an MQ queue

    A consumer then processes the update messages, acknowledging them via the MQ’s ACK mechanism, and deletes the corresponding cache entries, ensuring consistency between the cache and the database
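A rough sketch of the idea: the business write path never touches the cache; a separate consumer reacts to binlog events. Canal’s real API is not shown here; the event shape below is made up for illustration:

```python
from collections import deque

cache = {"user:1": "old"}
binlog_mq = deque()   # stands in for Canal publishing binlog events to MQ

def emit_binlog_update(table, key):
    # in reality Canal parses MySQL's binlog and publishes this event for us
    binlog_mq.append({"table": table, "key": key})

def cache_invalidator():
    # independent consumer: deletes stale cache entries, then acks the message
    while binlog_mq:
        event = binlog_mq.popleft()
        cache.pop(event["key"], None)

emit_binlog_update("user", "user:1")
cache_invalidator()
```

Because the invalidation logic lives entirely in the consumer, the business code that writes to MySQL needs no cache-related changes at all.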
