Introduction: In April this year, I attended an offline sharing session on “database and cache consistency” and learned a lot. Based on that session and past project experience, this article summarizes the consistency issues between the database and the cache.

Introduction to caching

Caching is a universal technique for speeding up access and shortening response time (RT), and it is everywhere: the browser cache, the CDN cache, the application server’s cache, and even the database itself can be considered a kind of cache when it keeps hot data in memory. Even the CPU has multiple levels of cache to accelerate data access. In short, to speed up data access, shorten the access path, and reduce the cost of fetching data, we use caching and trade space for time. Getting back to the point, this article focuses on consistency between the database and the cache on the application server.

Four common solutions

Solution 1: Cache Aside Pattern

Read requests

  1. Read the cache first, then the database
  2. If the cache hits, return the data
  3. If the cache misses, read the database, write the data into the cache, and return it

Write requests

  1. Write the data to the database
  2. Delete the cache

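A minimal sketch of the read and write paths above, using plain dicts to stand in for the cache (e.g. Redis) and the database (the keys and values are illustrative):

```python
cache = {}
db = {"user:1": "alice"}

def read(key):
    if key in cache:            # 1-2. cache hit: return the cached data
        return cache[key]
    value = db.get(key)         # 3. cache miss: read the database...
    if value is not None:
        cache[key] = value      # ...write the data into the cache...
    return value                # ...and return it

def write(key, value):
    db[key] = value             # 1. write the database
    cache.pop(key, None)        # 2. delete (not update) the cache
```

After a write, the cache entry is gone; the next read repopulates it from the database.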
It is important to delete the cache rather than update it during the write request; the cache is repopulated on the next read request. Why delete the cache instead of updating it? If the cache is updated, there is no guarantee that concurrent writes will execute in order, so old data may overwrite new data. Although the Cache Aside Pattern is widely used, it has some problems.

Problem 1

As shown above, process A writes data to the database at time T1 and deletes the cache at time T2. In high-concurrency scenarios, read requests between T1 and T2 will read data from the cache that is inconsistent with the database.

Problem 2

As shown in the figure above, process A writes data to the database at time T1 but fails to delete the cache at time T2 (the cause of the failure is out of scope here). In this case, the database and the cache can remain inconsistent for a long time.

Problem 3

As shown in the figure above, process A is a read request and process B is a write request.

  1. Process A misses the cache, then reads the value A from the database.
  2. Process A is switched out for some reason.
  3. Process B writes the value B into the database.
  4. Process B deletes the cache.
  5. Process A is scheduled again and continues, writing the value A into the cache.

At this point the database holds B while the cache holds A: the data is inconsistent.

Solution 2: Delayed double delete

Solution 2 builds on Solution 1.

Read requests are handled the same as in Solution 1.

Write requests

  1. Delete the cache
  2. Write the data to the database
  3. Sleep (usually a few hundred milliseconds, or a random value within a range, depending on the business scenario)
  4. Delete the cache again

Compared with the Cache Aside Pattern, the write request in this solution deletes the cache one extra time, before writing the database, which avoids the inconsistency in the T1–T2 window. The short wait before the final cache deletion avoids Problem 3 of Solution 1: inconsistency caused by context switches and similar delays.

Solution 3: Distributed locks

Read requests

  1. Read the cache first, then the database
  2. If the cache hits, return the data
  3. If the cache misses, acquire the lock (retrying multiple times if necessary)
  4. Read the database and write the data into the cache
  5. Release the lock

Write requests

  1. Acquire the lock
  2. After the lock is acquired, write the data to the database
  3. Delete the cache
  4. Release the lock
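The steps above can be sketched as follows. A `threading.Lock` stands in for a real distributed lock (e.g. one built on Redis `SET NX` or ZooKeeper); the double-check after acquiring the lock keeps concurrent readers from hitting the database repeatedly:

```python
import threading

lock = threading.Lock()   # stand-in for a distributed lock
cache = {}
db = {"k": 1}

def read(key):
    if key in cache:                 # 1-2. cache hit: return the data
        return cache[key]
    with lock:                       # 3. miss: acquire the lock (blocks here;
        if key in cache:             #    a distributed lock would retry)
            return cache[key]        # another reader already filled the cache
        value = db.get(key)          # 4. read the database...
        cache[key] = value           #    ...and write the data into the cache
        return value                 # 5. lock released on exit

def write(key, value):
    with lock:                       # 1. acquire the lock
        db[key] = value              # 2. write the database
        cache.pop(key, None)         # 3. delete the cache
                                     # 4. lock released on exit
```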

Locking guarantees consistency, but it significantly hurts data-access throughput, so it is often not the ideal solution. It can still be considered in low-write, high-read business scenarios.

Solution 4: Delete the cache based on Binlog subscription

Read requests

  1. Read the cache first, then the database
  2. If the cache hits, return the data
  3. If the cache misses, read the database, write the data into the cache, and return it

Write requests

Write only to the database.

For cache invalidation, we subscribe to the database log. For example, Alibaba’s open-source Canal can subscribe to MySQL’s binlog and publish the change events to an MQ. The consumer side consumes the events, compares each changed row with the data in the cache, and deletes inconsistent entries from the cache. If a deletion fails, it can be retried multiple times.
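A hedged sketch of the consumer side of this pipeline. Binlog change events (as a tool like Canal would deliver them through MQ) are modeled here as plain dicts with assumed `key` / `new_value` fields; real Canal messages have a different schema:

```python
cache = {"user:1": "old"}  # illustrative cache entry, now stale

def on_binlog_event(event):
    """Compare the changed row against the cache; delete on mismatch."""
    key = event["key"]
    cached = cache.get(key)
    if cached is not None and cached != event["new_value"]:
        cache.pop(key, None)   # inconsistent: evict; the next read repopulates
        # if the delete fails, re-enqueue the event and retry

# a write changed user:1 in the database; the binlog event arrives via MQ
on_binlog_event({"key": "user:1", "new_value": "new"})
```

Entries that already match the database are left untouched, so consistent reads are not needlessly evicted.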

In this solution, the cache-invalidation logic is removed from the business code, so business development can focus on the business; but additional components must be introduced, which raises maintenance costs.

Conclusion

These are the common solutions for keeping database and cache data consistent. One thing they all have in common is deleting the cache rather than updating it. Whichever solution you choose, it is hard to keep the database and the cache exactly in sync at all times. Therefore, each solution can be supplemented with asynchronous reconciliation logic that periodically checks whether the database and the cache agree, and deletes cache entries that do not.
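The reconciliation job described above can be sketched as a simple periodic sweep (the dicts again stand in for the real cache and database):

```python
def reconcile(cache, db):
    """Periodic job: delete cache entries that disagree with the database."""
    for key in list(cache):            # list() so we can delete while iterating
        if cache[key] != db.get(key):
            cache.pop(key, None)       # drop the stale entry; the next read
                                       # repopulates it from the database

cache = {"a": 1, "b": 2}
db = {"a": 1, "b": 3}   # "b" has drifted out of sync
reconcile(cache, db)
```

In production this sweep would run on a schedule (e.g. a cron task) and typically sample keys rather than scan everything.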

We sincerely invite you to follow the WeChat official account “Boat Row”. I publish technical articles every week; let’s learn and progress together.