Original public account: “Bigsai”
Preface
For the web, growing user counts and traffic push a project's technology and architecture to evolve. Roughly the following situations arise:

- Concurrency and page views are modest, and MySQL alone can support the business logic. In that case you may not need a cache at all; at most you cache some static pages.
- Concurrency rises noticeably, the database comes under pressure, and some data is updated rarely but queried repeatedly, or its queries are slow. Here caching pays off: store the hot data as key-value pairs in Redis, so that a hit is served from fast Redis instead of passing through the slower DB.
- Other problems appear as well; you can also raise a system's capacity with static page caching, CDN acceleration, and even load balancing, but those are not covered here.
The idea of caching is everywhere
Let’s start with an algorithmic problem to understand what caching means.
Question 1:

- Input a number n (n < 20) and compute n!.
Analysis of 1:

- Consider only the algorithm and ignore numeric overflow. We know that n! = n * (n-1) * (n-2) * ... * 1 = n * (n-1)!, so the problem can be solved with a recursive function.
```java
static long jiecheng(int n) {
    if (n == 1 || n == 0) return 1;
    return n * jiecheng(n - 1);
}
```
So each input costs about n multiplications.

Question 2:
- Input t groups of data (possibly hundreds or thousands), each group containing one xi (xi < 20); compute xi! for each group.
Analysis of 2:

- With plain recursion, each of the t inputs xi costs xi multiplications, so the total work is the sum of all xi. When the xi are large or t is large this becomes a heavy burden: the worst case is on the order of O(t * n).
- So let's change the approach: tabling. Tabling is often used in ACM-style problems to handle multiple groups of input and output, to store graph-search results, and to record paths. For factorials we only need to fill an array once from front to back, then answer every query by reading the array value directly. The idea is very clear:
```java
import java.util.Scanner;

public class Test {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int t = sc.nextInt();
        long[] jiecheng = new long[21];
        jiecheng[0] = 1;
        for (int i = 1; i < 21; i++) {
            jiecheng[i] = jiecheng[i - 1] * i;
        }
        for (int i = 0; i < t; i++) {
            int x = sc.nextInt();
            System.out.println(jiecheng[x]);
        }
    }
}
```
- Building the table once is O(n), and each subsequent query is an O(1) array read. This is the same idea as a cache: the data is computed once and stored in the jiecheng[21] array, and every later access simply reads the stored value.
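The same table idea can also be written as an on-demand cache (memoization): compute on a miss, store the result, and serve later hits straight from the map. A minimal sketch (the class and method names are the author's own illustration, not from the original):

```java
import java.util.HashMap;
import java.util.Map;

public class MemoFactorial {
    // key: n, value: n! -- this map plays the role of the cache
    private static final Map<Integer, Long> cache = new HashMap<>();

    static long factorial(int n) {
        if (n <= 1) return 1;
        Long hit = cache.get(n);           // cache hit: no recomputation
        if (hit != null) return hit;
        long value = n * factorial(n - 1); // cache miss: compute once
        cache.put(n, value);               // store for future hits
        return value;
    }

    public static void main(String[] args) {
        System.out.println(factorial(20)); // 2432902008176640000
    }
}
```

Unlike the precomputed table, the map only fills in values that are actually requested, which is closer to how a real cache behaves.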
Application scenarios of caching
Caching suits high-concurrency scenarios and increases a service's capacity. The core idea is to move frequently accessed or expensive-to-query data from a slower medium to a faster one, e.g. hard disk -> memory. Most relational databases read and write from disk, with limited efficiency and resources, while Redis operates in memory, so their read/write speeds differ enormously. When concurrency climbs and the relational database hits its performance ceiling, strategically placing the frequently accessed data in Redis improves the system's throughput and concurrency.
For typical sites and scenarios, a relational database is slow in two main places:

- Disk read/write I/O performance is poor
- Some queries involve a large amount of computation

Caching reduces both the disk I/O and the relational database's computation. Its speed likewise comes from two aspects:

- It is memory-based, so reads and writes are fast
- A hash lookup locates the result directly, with no recomputation
So for any reasonably large site, caching is very much necessary, and Redis is undoubtedly one of the best choices.
Issues that need attention
Improper use of caching causes many problems, so several details need careful thought and design. The hardest of them, data consistency, is analyzed separately below.
Cache or not
A project should not use a cache just for the sake of using one; caching does not fit every scenario. If the data consistency requirement is very high, or the data changes frequently but is rarely queried, or there is simply no concurrency, plain queries need no cache; adding one may waste resources and make the project bloated and hard to maintain. And once a Redis cache is in the picture, data consistency issues have to be considered.
Rational cache design
Cache design frequently runs into multi-table queries. When caching the result of a multi-table query as key-value pairs, think about what is reasonable: should the keys be split or combined? If there are many possible combinations but only a few common ones, the common ones can simply be cached directly. The concrete design should follow the project's business needs; there is no absolute standard.
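One common convention (an illustration only, not a fixed standard) is to build the key from the business entity plus the query conditions, so that a multi-table or multi-condition query maps to one deterministic key:

```java
public class CacheKeys {
    // Hypothetical key scheme: entity:condition:value:condition:value
    // e.g. an order list filtered by user and page becomes one cache entry.
    static String orderListKey(long userId, int page) {
        return "order:user:" + userId + ":page:" + page;
    }

    public static void main(String[] args) {
        System.out.println(orderListKey(42L, 1)); // order:user:42:page:1
    }
}
```

A deterministic key builder like this keeps every caller of the same query hitting the same cache entry, instead of each call site inventing its own key format.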
Expiration policy selection

- The cache holds relatively hot, commonly used data, and Redis resources are limited, so a sensible policy is needed for removing expired entries. From studying operating systems we know the classic replacement algorithms used in a computer's cache: first-in first-out (FIFO), least recently used (LRU), the optimal algorithm (OPT), and least frequently used (LFU). Redis cache eviction can borrow the same ideas: time-based FIFO is the easiest to implement, and Redis itself supports global key-expiration and eviction policies.
- The expiration time should also be set sensibly for the system at hand; a time that is too long or too short are both bad. Too short, and the cache hit rate stays low; too long, and a lot of unpopular data sits in Redis without being released.
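Of the algorithms above, LRU is easy to sketch in Java: a LinkedHashMap in access order evicts the least recently used entry once capacity is exceeded. This is a toy stand-in (comparable in spirit to an LRU eviction policy, not Redis's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny LRU cache: iteration order follows access order, and the
// eldest (least recently used) entry is dropped once we exceed capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when over capacity
    }

    public static void main(String[] args) {
        LruCache<String, Integer> c = new LruCache<>(2);
        c.put("a", 1);
        c.put("b", 2);
        c.get("a");             // touch "a": it becomes most recently used
        c.put("c", 3);          // over capacity: evicts "b", the LRU entry
        System.out.println(c.keySet()); // [a, c]
    }
}
```

The `removeEldestEntry` hook is the whole policy: because the map is in access order, the eldest entry is always the least recently used one.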
Data consistency problems ★
Data consistency was mentioned above: if the consistency requirement is high, caching is not recommended. Let's comb through how cached data is handled, since data consistency problems come up constantly with a Redis cache. For a cache there are several cases:

Read

Read path: read from Redis; on a miss, fetch from MySQL and refill the Redis cache. The following flow chart describes the usual scenario, which is uncontroversial:
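That read path can be sketched as follows; here plain ConcurrentHashMaps stand in for Redis and MySQL (hypothetical stand-ins for illustration, not real clients):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideRead {
    // hypothetical in-memory stand-ins for Redis and MySQL
    static final Map<String, String> redis = new ConcurrentHashMap<>();
    static final Map<String, String> mysql = new ConcurrentHashMap<>();

    // Read path: try the cache first; on a miss, load from the DB
    // and populate the cache for the next reader.
    static String read(String key) {
        String value = redis.get(key);
        if (value != null) return value;          // cache hit
        value = mysql.get(key);                   // cache miss: go to the DB
        if (value != null) redis.put(key, value); // refill the cache
        return value;
    }
}
```

With a real Redis client the structure is identical; only the two map lookups become GET/SET calls against Redis and a SQL query against MySQL.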
Write 1: update the database first, then update the cache (normal, low concurrency)

Update the database record, then update the Redis cache. This is the usual practice: the cache is derived from the database and taken from it.

Problems can still occur, though: if the cache update fails (a crash, Redis being down, and so on), the database and Redis fall out of sync: new data in the DB, old data in the cache.
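Write 1's failure mode is easy to see in code. In this sketch (map stand-ins for MySQL and Redis, with a flag simulating a Redis outage; all names are the author's illustration), a failure between the two steps leaves stale data in the cache:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteThenUpdateCache {
    // hypothetical in-memory stand-ins for MySQL and Redis
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Write 1: update the database first, then update the cache.
    static void write(String key, String value, boolean cacheWriteFails) {
        db.put(key, value);          // step 1: DB update succeeds
        if (cacheWriteFails) return; // simulated Redis failure: stale cache stays
        cache.put(key, value);       // step 2: refresh the cache
    }
}
```

If step 2 never runs, every reader keeps getting the old cached value until the entry expires, which is exactly the inconsistency described above.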
Write 2: delete the cache first, then write the database (low-concurrency optimization)

Problem solved

This scheme avoids the Redis-write-failure problem of write 1: instead of updating the cache, delete it. Ideally the next access finds Redis empty, goes to MySQL for the latest value, and refills the cache. That holds, however, only at low concurrency, not at high concurrency.
Existing problems

Write 2 looks like it fixes the problem of a failed Redis write, but this seemingly good solution has its own flaw under high concurrency. In write 1 we discussed the dirty data left behind when the DB update succeeds and the cache update fails. The ideal in write 2 is that the deleted cache entry gets refilled by the next thread with the fresh value. The question is: what if that next thread arrives too early, at exactly the wrong moment?

With multiple threads you cannot know who runs first, who is fast, and who is slow. As the figure above shows, the Redis cache can end up inconsistent with MySQL: a reader that slips in between the cache deletion and the DB update refills the cache with the old value. You could lock the key, but something as heavy as a lock hurts concurrency too much, so avoid it if you can. So under high concurrency the cache can still end up holding old data while the DB holds new data, and the problem persists for as long as the cache entry does not expire.
Write 3: delayed dual-delete policy
This is the delayed double delete policy, which mitigates the inconsistency in write 2 caused by a reading thread slipping in during the MySQL update. The recipe: delete the cache -> update the database -> after a delay (a few hundred ms, asynchronously) delete the cache again. Even if the write 2 problem occurs during the update and leaves inconsistent data behind, the delayed second delete (the exact delay depends on the business, usually a few hundred ms) quickly removes it.
However, this scheme still has holes: the second delete can itself fail, and under heavy concurrent reads and writes the repeated cache misses put pressure on MySQL, among other issues. You can of course push the second delete through a message queue such as MQ to make it asynchronous. In practice no scheme covers every case, which is why even experienced designers get criticized over details in this step. As a novice the author will not embarrass himself further here; you are welcome to contribute your own scheme.
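The delayed double delete can be sketched with a scheduled executor; as before, maps stand in for Redis and MySQL (hypothetical names, and the delay here is a parameter rather than the few hundred ms a real system would tune):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedDoubleDelete {
    // hypothetical in-memory stand-ins for Redis and MySQL
    static final Map<String, String> redis = new ConcurrentHashMap<>();
    static final Map<String, String> mysql = new ConcurrentHashMap<>();

    // daemon thread so the scheduler does not block JVM shutdown
    static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // delete cache -> update DB -> delete cache again after a delay,
    // so a stale value refilled by a concurrent reader is wiped out.
    static void write(String key, String value, long delayMillis) {
        redis.remove(key);        // first delete
        mysql.put(key, value);    // update the database
        scheduler.schedule(() -> redis.remove(key), // delayed second delete
                delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Even if a reader refills the cache with the old value between the two deletes, the second delete clears it, and the next read repopulates from the now-updated database.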
Write 4: operate on the cache directly, periodically write back to SQL (suits high concurrency)

When a burst of concurrent writes arrives, the previous schemes, even with a message queue for asynchrony, hardly give the user a comfortable experience, and large-scale SQL operations also put heavy pressure on the system. So another option is to operate on the cache directly and write the cache back to SQL periodically. Redis is a non-relational, in-memory KV store, so it is much faster than a traditional relational database.

This design suits high-concurrency business where the Redis data is primary and the MySQL data is secondary, inserted periodically (much like a backup store). Of course, such high-concurrency systems often have their own ordering requirements for reads and writes, which may be enforced with message queues and locks; guarding against the uncertainty that high concurrency and multithreading bring to data and ordering improves the business's reliability.
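The write-behind shape described above, in the same map-stand-in style (hypothetical names; a real system would run the flush from a scheduled job and batch the SQL):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteBehindCache {
    // Redis is the primary store here; MySQL is refreshed in batches.
    static final Map<String, String> redis = new ConcurrentHashMap<>();
    static final Map<String, String> mysql = new ConcurrentHashMap<>();

    // Writes hit the cache only: memory-speed, no SQL on the hot path.
    static void write(String key, String value) {
        redis.put(key, value);
    }

    // Periodically (e.g. from a scheduled job) flush the cache to the DB.
    static void flushToDb() {
        mysql.putAll(redis);
    }
}
```

The trade-off is explicit: between flushes, MySQL lags behind Redis, so this only fits business that accepts the database as a periodically refreshed secondary copy.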
In short, the higher the concurrency and the stricter the data consistency requirement, the more complex the consistency design that must be considered. The above is the author's own study of (and free-wheeling speculation about) the Redis data consistency problem; if any explanation or understanding is unreasonable, corrections from the experts are welcome!
Finally, if you found this useful, please like, bookmark, and share, and follow the original public account “Bigsai”; reply with the password “bigsai” to get a pile of advanced material!