Cache and database consistency issues
Preface
In production, we often find that 80% of our traffic is driven by 20% of our data. This is what we call hot data, and the phenomenon is also known as the 80/20 rule. Such uneven access patterns make caching one of the most effective techniques for improving overall system performance, but once we introduce a cache, we inevitably have to consider consistency between the cache and the database.
For data consistency problems in distributed systems, see my blog post: Data Consistency Problems in Distributed Systems.
Main text
Cache and database consistency issues
Cache and database double-write consistency problems
- Strong consistency: the cache and the database are consistent at all times.
- Eventual consistency: the cache and the database may be inconsistent for a short period, but they converge, and later queries return correct data.
Solutions to cache consistency
- Delayed double-delete strategy
- Update the cache via message queues
- Synchronize the MySQL database to Redis via binlog
Delayed double-delete strategy
With the delayed double-delete strategy, a write operation performs the following steps (a sketch follows the list):
- Delete (invalidate) the cache;
- Write to the database;
- Sleep for about 1 second, then delete the cache again.
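A minimal sketch of this write path in Python, using redis-py and PyMySQL; the `user` table, the `user:{id}` key scheme, and the connection details are illustrative assumptions, not from the original article:

```python
import time
import redis
import pymysql

r = redis.Redis(host="localhost", port=6379)

def update_user_name(user_id, new_name):
    """Delayed double delete: delete the cache, write the database,
    wait out in-flight reads, then delete the cache again."""
    cache_key = f"user:{user_id}"

    # Step 1: delete (invalidate) the cache
    r.delete(cache_key)

    # Step 2: write the new value to the database
    conn = pymysql.connect(host="localhost", user="app",
                           password="secret", database="demo")
    try:
        with conn.cursor() as cur:
            cur.execute("UPDATE user SET name = %s WHERE id = %s",
                        (new_name, user_id))
        conn.commit()
    finally:
        conn.close()

    # Step 3: sleep long enough for any concurrent read to finish,
    # then delete again to evict stale data that read may have cached
    time.sleep(1)
    r.delete(cache_key)
```

Note that sleeping inline blocks the write request for a whole second; in practice the second delete is often handed off to a background thread or a delayed task queue instead.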
Next, let's examine why we should delete the cache first and write the database afterwards.
1. Concurrency (disadvantages of writing to the database and then updating the cache)
Suppose requests A and B update the same data at the same time:
- Thread A updates the database;
- Thread B updates the database;
- Thread B updates the cache;
- Thread A updates the cache (delayed by a network fluctuation).
A should have updated the cache before B, but because of the network delay B's update lands first and A then overwrites the cache with stale data. Worse, the dirty data stays until the cache entry is invalidated or expires, so the service can be affected for a long time.
2. Business considerations (why we delete the cache instead of updating it)
- The purpose of a cache is to speed up reads. If writes keep refreshing a cache entry that no read ever hits, that work is wasted. Cache entries should therefore be populated lazily by read operations, while write operations simply delete (invalidate) them.
- Sometimes the cached value is derived through extra computation (serialization, aggregation, and so on); recomputing it eagerly on every write would also waste resources.
Deleting the cache first and then writing the database is not perfect either; it is just the more reasonable choice. It has the following failure case (the read path is sketched after this list):
- Write request A deletes the cache;
- Read request B finds that the cache entry does not exist;
- Read request B queries the database and gets the old value;
- Read request B writes the old value into the cache;
- Write request A writes the new value to the database (if nothing further is done, the cache now holds the old value while the database holds the new one).
This leaves the cache and the database inconsistent.
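To make the race concrete, here is a sketch of the cache-aside read path that request B follows (same illustrative schema and key scheme as above); the final `set` is exactly where the old value can be written back:

```python
import json
import redis
import pymysql

r = redis.Redis()

def get_user(user_id):
    """Cache-aside read: try the cache, fall back to the database,
    then populate the cache."""
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: read from the database
    conn = pymysql.connect(host="localhost", user="app",
                           password="secret", database="demo")
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SELECT id, name FROM user WHERE id = %s", (user_id,))
            row = cur.fetchone()
    finally:
        conn.close()

    if row is not None:
        # If a concurrent write deleted the cache and has not yet written
        # the database, this set puts the OLD value back into the cache
        r.set(cache_key, json.dumps(row), ex=600)
    return row
```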
The delayed double delete solves exactly this race, where the cache is deleted before the database is written: after writing the database, write request A sleeps for about one second and then deletes the cache again:
- With this approach, the data may be inconsistent for up to about 1 second after the first write (less the time the read request takes). Once the second delete runs, the next read repopulates the cache from the database, restoring consistency between the database and the cache.
- The 1-second delay is there to let the in-flight read request finish (usually a few hundred milliseconds), so that the second delete removes any dirty data that read wrote into the cache.
There is also an extreme case: if the second cache delete fails, the cache and the database stay inconsistent indefinitely. Two safeguards help (see the sketch after this list):
- Set an expiration time on cache entries;
- Add a retry mechanism, or use a message queue, to make sure the delete eventually succeeds.
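A sketch of both safeguards with redis-py; the message-queue hand-off is left as a comment because any concrete queue client here would be an assumption:

```python
import time
import redis

r = redis.Redis()

CACHE_TTL = 600  # safety net: even orphaned stale entries expire on their own

def cache_set(key, value):
    # Every cache write carries a TTL, so any inconsistency is time-bounded
    r.set(key, value, ex=CACHE_TTL)

def delete_cache_with_retry(key, attempts=3, backoff=0.1):
    """Retry the delete with exponential backoff; if it still fails,
    hand the key to a message queue for asynchronous retry."""
    for i in range(attempts):
        try:
            r.delete(key)
            return True
        except redis.RedisError:
            time.sleep(backoff * (2 ** i))
    # e.g. mq.publish("cache-delete-retry", key)  # hypothetical queue client
    return False
```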
Update the cache via message queues
Message queue middleware can be used to keep the database and the cache consistent:
- The cache is updated asynchronously, which reduces coupling in the system;
- However, it can break the ordering of data changes (messages may be consumed out of order);
- The implementation cost is relatively high.
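As an illustration, a RabbitMQ consumer built with pika that applies database-change messages to Redis; the queue name `db-changes` and the `{"key": ..., "value": ...}` message format are assumptions:

```python
import json
import pika
import redis

r = redis.Redis()

def on_change(ch, method, properties, body):
    """Apply one database-change message to the cache, then ack it."""
    event = json.loads(body)
    r.set(event["key"], json.dumps(event["value"]))
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="db-changes", durable=True)
channel.basic_consume(queue="db-changes", on_message_callback=on_change)
channel.start_consuming()
```

Deleting the key instead of setting it would sidestep the ordering problem noted above, at the cost of an extra cache miss on the next read.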
Use the binlog to synchronize the MySQL database to Redis
Every change made to the MySQL database is recorded in the binlog: inserts, deletes, and updates are all logged there, and MySQL's own replication synchronizes data between servers by replaying the binlog. We can consume the same binlog to keep Redis in sync.
- When MySQL is not under heavy load, the synchronization latency is low;
- Completely decoupled from the business code;
- Preserves the ordering of changes, solving the timing problem;
- The cost is relatively high.
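This is the approach taken by tools such as Alibaba's Canal. A minimal sketch using the python-mysql-replication library to tail the binlog and mirror a hypothetical `user` table into Redis (connection settings, table name, and key scheme are all illustrative):

```python
import json
import redis
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
)

r = redis.Redis()

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,            # must be unique among the MySQL server's replicas
    only_events=[DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent],
    blocking=True,            # keep tailing the binlog like a replica
    resume_stream=True,
)

for event in stream:
    if event.table != "user":     # mirror only the table we care about
        continue
    for row in event.rows:
        if isinstance(event, DeleteRowsEvent):
            r.delete(f"user:{row['values']['id']}")
        else:
            values = (row["after_values"] if isinstance(event, UpdateRowsEvent)
                      else row["values"])
            r.set(f"user:{values['id']}", json.dumps(values, default=str))
```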