Background
Caching is a widely used technique in software development, and caching database results comes up in almost every project. How to guarantee cache consistency is also a recurring interview question. This article summarizes the common consistency schemes so that you can choose the right one for different requirements.
What is a cache
Storage media differ in speed. Caching is a technique that temporarily keeps results from slow storage in fast storage.
This discussion focuses on the database cache scenario and uses Redis as a cache in front of MySQL as the running example.
Why cache is needed
Storage systems such as MySQL usually support full ACID properties. Because they must guarantee reliability, durability and so on, their performance is generally limited. A high volume of concurrent queries puts pressure on MySQL, which can make the database unstable and increase latency. According to the principle of locality, roughly 80% of requests hit about 20% of the data (the hot data). In read-heavy, write-light scenarios, adding a cache layer greatly improves system throughput and robustness.
The problem
The stored data can change over time, so the copy in the cache can become inconsistent with it. How long an inconsistency can be tolerated has to be analyzed case by case, but most services require at least eventual consistency.
Redis as a MySQL cache
In the usual development pattern, MySQL is the storage and Redis is the cache that speeds up reads and protects MySQL. But when data in MySQL is updated, how is Redis kept in sync?
Synchronizing with strong consistency is too expensive. If you need strong consistency, do not use a cache; just read from MySQL. With a cache, the goal is usually eventual consistency.
The solutions
Plan 1
Rely only on key expiration: when MySQL is updated, Redis is not updated. This approach is simple to implement, but the cache can stay inconsistent for a long time. If reads are very frequent and the expiration time is long, stale data will be served for a long time.
Advantages
- Low development cost; easy to implement.
- Low maintenance cost; problems are relatively unlikely.
Disadvantages
- It depends entirely on the expiration time: too short and cache misses become frequent, too long and updates are delayed for a long time (inconsistency).
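The behavior of Plan 1 can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions: `TTLCache` plays the role of Redis, a plain dict plays the role of MySQL, and `FakeClock`, `read`, `write_plan1` and the 60-second expiration are hypothetical names and values chosen for the example, not part of any real client library.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: every key carries an expiry timestamp."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


class FakeClock:
    """Manually advanced clock so the example needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def read(key, cache, db, ttl=60):
    """Read through the cache; on a miss, load from the database and cache it."""
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        if value is not None:
            cache.set(key, value, ttl)
    return value


def write_plan1(key, value, db):
    """Plan 1 write path: update the database only, never the cache."""
    db[key] = value


clock = FakeClock()
cache = TTLCache(clock)
db = {"a": 1}  # dict standing in for a MySQL table

first = read("a", cache, db)   # miss: loaded from the database and cached
write_plan1("a", 5, db)        # database changes, cache is untouched
stale = read("a", cache, db)   # still served from the cache
clock.now += 61                # let the key expire
fresh = read("a", cache, db)   # miss again: reloaded from the database
```

Between the write and the expiration, readers see the old value; the length of that window is exactly the expiration time, which is the trade-off described above.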
Plan 2
Build on Plan 1: keep the key expiration, and also update Redis whenever MySQL is updated.
Advantages
- Compared with Plan 1, the update delay is smaller.
Disadvantages
- If the MySQL update succeeds but the Redis update fails, this degrades to Plan 1.
- In high-concurrency scenarios, each business server has to connect to both MySQL and Redis. This doubles connection usage and can easily lead to too many connections.
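A minimal sketch of the Plan 2 write path, again with in-memory stand-ins. `FlakyCache` and `write_plan2` are hypothetical names for this example; the stand-in ignores the expiration time, whereas a real Redis key would still expire.

```python
class FlakyCache:
    """Dict-backed stand-in for Redis whose writes can be made to fail."""

    def __init__(self):
        self.data = {}
        self.fail_writes = False

    def set(self, key, value, ttl):
        if self.fail_writes:
            raise ConnectionError("cache unavailable")
        # The stand-in ignores ttl; a real Redis key would still expire.
        self.data[key] = value


def write_plan2(key, value, db, cache, ttl=60):
    """Update the database first, then refresh the cache on a best-effort basis.

    Returns True if the cache was refreshed, False if only the database was.
    """
    db[key] = value
    try:
        cache.set(key, value, ttl)
        return True
    except ConnectionError:
        # No retry: the old entry stays until it expires, as in Plan 1.
        return False


db, cache = {}, FlakyCache()
ok = write_plan2("a", 1, db, cache)        # both stores updated
cache.fail_writes = True
degraded = write_plan2("a", 5, db, cache)  # database updated, cache left stale
```

After the second call the database holds 5 while the cache still holds 1, which is the degradation to Plan 1 listed among the disadvantages.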
Plan 3
Building on Plan 2, the synchronous Redis write is replaced by a message queue: the Redis update is handed to Kafka, the message queue guarantees reliable delivery, and a separate consumer service updates Redis asynchronously.
Advantages
- The message queue client can share a single handle, and many clients also buffer messages locally before sending, which effectively solves the too-many-connections problem of Plan 2.
- The message queue decouples the cache update logic from the business logic.
- The message queue itself is reliable; with manual commits, for example, each update can be consumed and applied to Redis at least once.
Disadvantages
- It still does not solve the ordering problem. Suppose several business servers handle two requests for the same row, for example a = 1 and a = 5. If MySQL executes a = 1 first but Kafka receives a = 5 first, the data will be inconsistent.
- Introducing a message queue and an additional consumer service adds considerable cost.
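The ordering problem can be reproduced with a small simulation. This is a sketch under assumptions: a `deque` stands in for a Kafka topic, plain dicts stand in for MySQL and Redis, and `consume` is a hypothetical consumer loop rather than a real Kafka client call.

```python
from collections import deque

db = {}          # stand-in for MySQL
cache = {}       # stand-in for Redis
topic = deque()  # stand-in for a Kafka topic

# Two business servers handle writes to the same row: a = 1 and a = 5.
# MySQL happens to apply them in the order a = 1, then a = 5 ...
db["a"] = 1
db["a"] = 5

# ... but the two servers publish their messages in the opposite order.
topic.append(("a", 5))
topic.append(("a", 1))


def consume(topic, cache):
    """Apply queued updates to the cache in the order they were enqueued."""
    while topic:
        key, value = topic.popleft()
        cache[key] = value


consume(topic, cache)
```

The consumer faithfully applies what it received, yet the cache ends up with 1 while the database holds 5. The queue preserves its own order, not the order in which MySQL committed the writes.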
Plan 4
Update Redis by subscribing to the binlog. The consumer service we build acts as a MySQL slave: it subscribes to the binlog, parses out the updated content, and then applies it to Redis.
Advantages
- When MySQL is not under heavy load, the latency is low.
- It is completely decoupled from the business logic.
- The ordering problem is solved.
Disadvantages
- Building a separate synchronization service and introducing a binlog synchronization mechanism is expensive.
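Why the binlog approach avoids the ordering problem can be sketched as follows. The `RowChange` record and its `position` field are a simplified, hypothetical event format invented for this example, not the actual binlog format, and `apply_binlog` assumes the events have already been parsed and arrive in commit order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowChange:
    """Hypothetical, already-parsed binlog event."""
    position: int  # log position; increases in MySQL commit order
    key: str
    value: object  # None represents a deleted row


def apply_binlog(events, cache, last_applied=0):
    """Apply events to the cache in log order and return the new position.

    Events at or before last_applied are skipped, so replaying part of the
    log after a restart does not roll the cache back.
    """
    for event in events:
        if event.position <= last_applied:
            continue
        if event.value is None:
            cache.pop(event.key, None)
        else:
            cache[event.key] = event.value
        last_applied = event.position
    return last_applied


cache = {}  # stand-in for Redis
events = [RowChange(1, "a", 1), RowChange(2, "a", 5), RowChange(3, "b", 7)]

pos = apply_binlog(events, cache)
pos = apply_binlog(events[:2], cache, last_applied=pos)  # replay is a no-op
```

Because there is a single log written by MySQL itself, the cache sees a = 1 and a = 5 in the same order in which the database committed them.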
Summary
Choosing a plan
- Start by identifying the product's latency requirements. If they are extremely strict and the data changes frequently, do not use a cache.
- Generally speaking, Plan 1 is enough. The author consulted 4 or 5 teams, and they basically all use Plan 1: data that is suitable for caching is usually read-heavy and write-light, and the business can tolerate some delay. Plan 1 adds no development cost and is in fact quite practical.
- If you want updates to show up more promptly, choose Plan 2. There is no need to add a retry guarantee, since a failed Redis update simply falls back to the expiration behavior of Plan 1.
- Plan 3 and Plan 4 are for businesses with strict latency requirements; one is push mode and the other is pull mode. Plan 4 is more reliable. If you are willing to invest in message-processing logic anyway, it is better to go straight to Plan 4.
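The selection rules above can be condensed into a small helper. This is only one possible reading of the bullets: the `staleness_tolerance` labels and the function name `choose_plan` are hypothetical and chosen for the example.

```python
def choose_plan(staleness_tolerance, data_changes_often=False):
    """Map the selection rules to a recommendation.

    staleness_tolerance is a hypothetical label:
      "high"   - delays up to the key expiration time are acceptable
      "medium" - updates should usually show up promptly
      "low"    - updates must show up promptly and in the right order
      "none"   - stale reads are not acceptable
    """
    if staleness_tolerance == "none" and data_changes_often:
        return "no cache"
    if staleness_tolerance == "high":
        return "Plan 1"
    if staleness_tolerance == "medium":
        return "Plan 2"
    if staleness_tolerance in ("low", "none"):
        return "Plan 4"
    raise ValueError(f"unknown tolerance: {staleness_tolerance!r}")


default_choice = choose_plan("high")
strict_choice = choose_plan("low")
```

Plan 3 never appears as a result because, following the last bullet, a team that is prepared to build a consumer service is better served by Plan 4.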
Conclusion
In general, Plan 1 is sufficient. If the latency requirements are strict, choose Plan 4. In an interview, the interviewer will usually ask step by step, from simple to complex; work through the plans in the same order, and both sides will come away satisfied.