When interviewing, you will often run into a scene like this.
1. Scenario analysis
Interviewer: What is the QPS of your service?
Me: We get quite a lot of traffic at peak, about 30,000 QPS.
Interviewer: Can your servers handle that much traffic on their own? Do you use a cache?
Me: Yes, we use Redis as a cache. The interface queries the cache first and only falls back to the database on a cache miss (the read path is sketched below). This takes load off the database and speeds up queries.
Interviewer: The data is stored in two places. How do you keep the data consistent when you update it?
As you can see, a good interviewer will not ask you for a data-consistency solution directly. Instead, he or she will lead you to it from a concrete scenario. If you have not dealt with this before and have no real production experience, it is hard to give an organized, well-reasoned answer.
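As a baseline for the schemes below, here is a minimal sketch of that cache-aside read path in Java. It assumes the Jedis client, a MySQL user table reachable over JDBC, and an illustrative key name user:{id}:age with a 60-second TTL; none of these names come from the dialogue above, and the connection settings are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import redis.clients.jedis.Jedis;

public class CacheAsideRead {

    // Placeholder connection settings; replace with your own.
    private static final String JDBC_URL = "jdbc:mysql://localhost:3306/demo";

    static String readAge(Jedis jedis, long userId) throws SQLException {
        String key = "user:" + userId + ":age";

        // 1. Query the cache first.
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;
        }

        // 2. Cache miss: fall back to the database.
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "root", "");
             PreparedStatement ps = conn.prepareStatement("SELECT age FROM user WHERE id = ?")) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no such user
                }
                String age = rs.getString("age");
                // 3. Write the value back with an expiration time so a stale
                //    entry eventually falls out on its own.
                jedis.setex(key, 60, age);
                return age;
            }
        }
    }
}
```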
There are generally four ways to ensure data consistency:
- Update the cache first, then the database.
- Update the database first, then the cache.
- Delete the cache first, then update the database.
- Update the database first, then delete the cache.
Each option is discussed in detail:
2. Solutions
2.1 Update the cache first and then the database
If two write requests arrive concurrently, the execution can look like this:
- Write request 1 updates the cache and sets age to 1
- Write request 2 updates the cache and sets age to 2
- Write request 2 updates the database and sets age to 2
- Write request 1 updates the database and sets age to 1
The cache ends up with age = 2 while the database ends up with age = 1, so the two are inconsistent. This solution is not feasible.
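For concreteness, here is what that write path looks like as a sketch, using the same assumed Jedis client, user table, and key name as the read example above; writeAge is a hypothetical helper, not code from the article.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import redis.clients.jedis.Jedis;

public class CacheFirstWrite {

    static void writeAge(Jedis jedis, Connection conn, long userId, int age) throws SQLException {
        // Step 1: update the cache.
        jedis.set("user:" + userId + ":age", String.valueOf(age));

        // Step 2: update the database.
        try (PreparedStatement ps = conn.prepareStatement("UPDATE user SET age = ? WHERE id = ?")) {
            ps.setInt(1, age);
            ps.setLong(2, userId);
            ps.executeUpdate();
        }
        // The race from the trace above: request 1 finishes step 1 (cache = 1),
        // request 2 runs both steps (cache = 2, db = 2), then request 1 finishes
        // step 2 (db = 1). Cache and database now disagree.
    }
}
```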
2.2 Update the database first and then the cache
If two write requests arrive concurrently, the execution can look like this:
- Write request 1 updates the database and sets age to 1
- Write request 2 updates the database and sets age to 2
- Write request 2 updates the cache and sets age to 2
- Write request 1 updates the cache and sets age to 1
The database ends up with age = 2 while the cache ends up with age = 1, so the two are inconsistent. This solution is not feasible.
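The write path is the same two steps in the opposite order; again a sketch with the same assumed names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import redis.clients.jedis.Jedis;

public class DbFirstWrite {

    static void writeAge(Jedis jedis, Connection conn, long userId, int age) throws SQLException {
        // Step 1: update the database.
        try (PreparedStatement ps = conn.prepareStatement("UPDATE user SET age = ? WHERE id = ?")) {
            ps.setInt(1, age);
            ps.setLong(2, userId);
            ps.executeUpdate();
        }
        // Step 2: update the cache. If request 1 reaches this line after
        // request 2 does, the cache is left with age = 1 while the database
        // already holds age = 2.
        jedis.set("user:" + userId + ":age", String.valueOf(age));
    }
}
```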
2.3 Delete the cache first and then update the database
If a read request and a write request arrive concurrently, the execution can look like this:
- The write request deletes the cache
- The read request misses the cache, reads the old value from the database, and writes it back into the cache
- The write request updates the database
The cache ends up holding the old data while the database holds the new data, so the two are inconsistent. This solution is not feasible.
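Sketched with the same assumed names, the write path and the window it leaves open look like this.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import redis.clients.jedis.Jedis;

public class DeleteThenUpdateWrite {

    static void writeAge(Jedis jedis, Connection conn, long userId, int age) throws SQLException {
        // Step 1: delete the cached value.
        jedis.del("user:" + userId + ":age");

        // <-- A concurrent read can slip in here: it misses the cache, loads
        //     the OLD age from the database, and writes it back into the cache.

        // Step 2: update the database with the new value.
        try (PreparedStatement ps = conn.prepareStatement("UPDATE user SET age = ? WHERE id = ?")) {
            ps.setInt(1, age);
            ps.setLong(2, userId);
            ps.executeUpdate();
        }
        // From here on the cache keeps serving the old value until it is
        // deleted again or expires.
    }
}
```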
2.4 Update the database first and then delete the cache
This scheme does not have the write-write problem of the previous ones: because the database is updated first and the cache is only deleted, never set, concurrent writes cannot leave a wrong value in the cache; the next read simply reloads the latest value from the database.
However, data inconsistency can occur during concurrent reads and writes.
- The read request misses the cache and reads the old value from the database
- The write request updates the database and deletes the cache
- The read request writes the old value back into the cache
The cache ends up holding the old data while the database holds the new data, so the two are inconsistent.
In practice the probability of this happening is very low: for the stale write-back to win, the read request's cache write would have to finish after the write request's database update and cache delete, even though writing the cache is an in-memory operation that is orders of magnitude faster than writing the database.
To cover this extreme case we still need a bottom-line safeguard: set an expiration time on cached entries so stale data eventually expires on its own. This scheme therefore gives weak, eventual consistency rather than strong consistency.
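Putting it together, here is a sketch of this write path with the same assumed Jedis client, table, and key name; the expiration time that acts as the safety net is set on the read side, as in the cache-aside read sketch earlier.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import redis.clients.jedis.Jedis;

public class UpdateThenDeleteWrite {

    static void writeAge(Jedis jedis, Connection conn, long userId, int age) throws SQLException {
        // Step 1: update the database first, so the source of truth is correct.
        try (PreparedStatement ps = conn.prepareStatement("UPDATE user SET age = ? WHERE id = ?")) {
            ps.setInt(1, age);
            ps.setLong(2, userId);
            ps.executeUpdate();
        }

        // Step 2: delete the cache; the next read repopulates it from the
        // database and sets a TTL, which is the bottom-line safeguard against
        // the rare stale write-back described above.
        jedis.del("user:" + userId + ":age");
    }
}
```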
3. Summary and reflection
Some readers may wonder: why not wrap the cache update and the database update in a transaction (for example, with a transaction annotation) to get strong consistency, so that none of these schemes would be a problem?
Local transactions are indeed feasible when the service runs on a single machine. In practice, however, a service is deployed on dozens or hundreds of machines, and sometimes a layer of local in-process cache is added on top of the Redis cache to cope with extreme query loads. In that situation a local transaction cannot help.
With multiple copies of the same data spread across multiple machines, strong consistency would require distributed transactions, and for cache updates that complexity is rarely worth the cost.
However, in other scenarios, such as updating order status or user assets, we need strong data consistency no matter the cost. The common implementation approaches are:
- Two-phase commit (2PC)
- TCC (Try-Confirm-Cancel)
- Local message table
- MQ transaction messages
- Distributed transaction middleware
In the next article, we will analyze the advantages and disadvantages of these schemes in detail.