Distributed locks in practice with Redis

Background

Many Internet applications have scenarios that require locking, such as flash sales (seckill), globally incrementing IDs, and floor-number generation. Most solutions are built on a database. Redis runs in a single-process, single-threaded mode and uses a queue to turn concurrent access into serial access, so multiple clients connected to Redis do not compete with each other. In addition, Redis provides commands such as SETNX and GETSET that make a distributed locking mechanism easy to implement.

Redis commands

SETNX key value (SET if Not eXists): sets key to value and returns 1 if and only if key does not exist. If the key already exists, SETNX does nothing and returns 0.

GETSET key value: sets key to value and returns the old value of the key. It returns an error if the key exists but is not a string, and nil if the key does not exist.

GET key: returns the string value associated with key, or nil if the key does not exist.

DEL key [key …]: deletes one or more keys.

We do not need many tools, only good ones: the distributed lock relies on just these four commands. The implementation, however, has many details that need careful thought, because in a distributed system with many concurrent processes, an error at any one point can cause a deadlock that blocks every process.
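To make the reply values of the four commands concrete, here is a minimal sketch. FakeRedis is a hypothetical in-memory stand-in written for this example, not a real client library; it only mirrors the replies described above, with Python's None standing in for nil.

```python
class FakeRedis:
    """Hypothetical in-memory model of SETNX, GETSET, GET and DEL."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        # Set only if the key is absent; reply 1 on success, 0 otherwise.
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def getset(self, key, value):
        # Write the new value and reply with the old one (None if absent).
        old = self.data.get(key)
        self.data[key] = value
        return old

    def get(self, key):
        return self.data.get(key)

    def delete(self, *keys):
        # Reply with the number of keys that were actually removed.
        removed = 0
        for key in keys:
            if key in self.data:
                del self.data[key]
                removed += 1
        return removed


r = FakeRedis()
first = r.setnx("foo.lock", "100")   # key absent: value stored
second = r.setnx("foo.lock", "200")  # key present: nothing changes
old = r.getset("foo.lock", "300")    # stores "300", replies with the old value
current = r.get("foo.lock")
removed = r.delete("foo.lock")
missing = r.get("foo.lock")          # key is gone, so None (nil)
```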

Locking implementation

SETNX can be used directly to take a lock. For example, to lock the keyword foo, a client calls SETNX on the key foo.lock with the current timestamp as the value.

If 1 is returned, the client has obtained the lock and can proceed. After the operation is complete, it runs DEL foo.lock to release the lock.

If 0 is returned, foo is already locked by another client. A non-blocking caller can simply return. A blocking caller enters a retry loop until it acquires the lock or the retry times out. The ideal is beautiful, but reality is cruel: locking with SETNX alone has a race condition and can deadlock in certain cases.
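The basic flow can be sketched as follows. This is only an illustration under stated assumptions: FakeRedis is a hypothetical in-memory stand-in for the server, and the function names acquire and release, as well as the retry_timeout and retry_sleep parameters, are invented for this example. It deliberately has the weakness just mentioned: if the holder never calls release, nobody else can ever get the lock.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for the two commands used here."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


def acquire(r, key, blocking=True, retry_timeout=1.0, retry_sleep=0.01):
    """Try SETNX; optionally retry until retry_timeout seconds have passed."""
    deadline = time.monotonic() + retry_timeout
    while True:
        if r.setnx(key, str(int(time.time()))) == 1:
            return True          # reply 1: we hold the lock
        if not blocking or time.monotonic() >= deadline:
            return False         # reply 0 and no (more) retries
        time.sleep(retry_sleep)  # wait before the next attempt


def release(r, key):
    r.delete(key)


r = FakeRedis()
got_first = acquire(r, "foo.lock")                   # lock is free
got_second = acquire(r, "foo.lock", blocking=False)  # already held
release(r, "foo.lock")
got_third = acquire(r, "foo.lock", blocking=False)   # free again
```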

Dealing with deadlock

In the process above, a deadlock occurs if the client holding the lock runs for too long, has its process killed, or crashes with some other exception, so that the lock is never released. The lock therefore needs a timeliness check. When we take the lock, we store the current timestamp as its value; a client then compares the current time with the timestamp stored in Redis, and if the difference exceeds a threshold it considers the lock expired. This prevents the lock from being held indefinitely. Under high concurrency, however, if several clients detect the expired lock at the same time and each simply deletes it and then locks again through SETNX, a race condition can arise in which multiple clients acquire the lock simultaneously.

1. C1 acquires the lock and crashes. C2 and C3 call SETNX, get 0, and read the timestamp of foo.lock. By comparing timestamps, both find that the lock has timed out.
2. C2 sends DEL to foo.lock.
3. C2 sends SETNX to foo.lock and acquires the lock.
4. C3 sends DEL to foo.lock, which deletes C2's lock.
5. C3 sends SETNX to foo.lock and acquires the lock.
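This interleaving can be replayed step by step. The sketch below is a hypothetical single-process simulation: FakeRedis is an in-memory stand-in written for this example, and the timestamps and the EXPIRE value are made-up numbers chosen only to make the lock look stale.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for SETNX, GET and DEL."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


KEY = "foo.lock"
EXPIRE = 10          # assumed lock lifetime, in the same unit as the timestamps
r = FakeRedis()

r.setnx(KEY, 100)    # C1 locks at time 100, then crashes without DEL

# At time 200, C2 and C3 both fail SETNX and both see a timed-out lock.
c2_sees_stale = r.setnx(KEY, 200) == 0 and 200 > r.get(KEY) + EXPIRE
c3_sees_stale = r.setnx(KEY, 200) == 0 and 200 > r.get(KEY) + EXPIRE

# C2 acts on its check: DEL, then SETNX.
r.delete(KEY)
c2_got_lock = r.setnx(KEY, 200) == 1

# C3 acts on its own, now outdated, check: its DEL removes C2's lock.
r.delete(KEY)
c3_got_lock = r.setnx(KEY, 200) == 1
```

Both c2_got_lock and c3_got_lock end up true, which is exactly the double acquisition the steps describe.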

At this point both C2 and C3 hold the lock, which is a race condition, and under higher concurrency even more clients may acquire it. So DEL cannot be used directly when a lock times out. Fortunately we have GETSET. Suppose there is now another client, C4; let's see how GETSET avoids this situation.

1. C1 acquires the lock and crashes. C2 and C3 call SETNX, get 0, and then call GET to read the timestamp T1 of foo.lock.
2. C4 sends the GETSET command to foo.lock, writing the current timestamp, and gets back the old timestamp T2 stored in foo.lock.
3. If T1 = T2, C4 has acquired the lock.
4. If T1 != T2, another client C5 called GETSET before C4 and took the lock, so C4 did not obtain it. C4 can only sleep and enter the next cycle.
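The comparison can be replayed as a small simulation. This is a sketch under assumptions: FakeRedis is a hypothetical in-memory stand-in, the timestamps are made-up numbers, and both C4 and C5 are assumed to have read T1 with GET before either of them calls GETSET.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for SETNX, GET and GETSET."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def get(self, key):
        return self.data.get(key)

    def getset(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        return old


KEY = "foo.lock"
r = FakeRedis()
r.setnx(KEY, 100)             # C1 locks at time 100, then crashes

t1_c4 = r.get(KEY)            # C4 reads T1 = 100 and judges it timed out
t1_c5 = r.get(KEY)            # C5 reads the same T1

t2_c5 = r.getset(KEY, 200)    # C5 is first: the old value is still 100
c5_got_lock = t2_c5 == t1_c5  # T1 = T2, so C5 holds the lock

t2_c4 = r.getset(KEY, 201)    # C4 is second: the old value is C5's 200
c4_got_lock = t2_c4 == t1_c4  # T1 != T2, so C4 must sleep and retry

final_value = r.get(KEY)      # C4's write remains, only slightly later than C5's
```

Unlike the DEL-then-SETNX sequence, only one of the two clients sees its own T1 come back from GETSET, so only one of them treats the lock as acquired.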

The only remaining question is whether C4 writing a new timestamp into foo.lock affects the lock. In fact, the time difference between C4 and C5 is very small and both write valid timestamps into foo.lock, so the lock is not affected.

To make the lock more robust, the client that acquired the lock should call GET again before running the critical business logic and compare the returned value T1 with the timestamp T0 that it wrote, in case the lock has been unexpectedly removed by a DEL in some other situation.

The steps and cases above are easy to find in other resources. Client handling and failures can be complicated: not only crashes, but also a client that is blocked for a long time and then tries to execute DEL while the lock is held by another client. Improper handling can also lead to deadlock, and a badly chosen sleep can let a large number of concurrent requests overwhelm Redis. The most common problems are the following.

Which logic should be used when GET returns nil?

First option: timeout logic

1. C1 acquires the lock, finishes processing, and is about to DEL the lock. Before the DEL, C2 uses SETNX to write timestamp T0 to foo.lock, finds that another client holds the lock, and moves on to GET.
2. C2 sends GET to foo.lock and gets the return value T1 (nil).
3. C2 evaluates T0 > T1 + expire and enters the GETSET flow.
4. C2 calls GETSET to write T0 to foo.lock and gets back the previous value T2.
5. If T2 = T1, C2 obtains the lock; if T2 != T1, it does not.

Second option: SETNX logic

1. C1 acquires the lock, finishes processing, and is about to DEL the lock. Before the DEL, C2 uses SETNX to write timestamp T0 to foo.lock, finds that another client holds the lock, and moves on to GET.
2. C2 sends GET to foo.lock and gets the return value T1 (nil).
3. C2 loops back into the next round of SETNX logic.

Both options seem fine, but the first is logically flawed. When GET returns nil, the lock was removed, not timed out, so the client should lock through the SETNX logic. With the first option, a lock that has simply been released is taken through GETSET instead of the normal SETNX path, and if the comparison is not handled properly this goes wrong, as the next question shows.
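The recommended behaviour, going back to SETNX when GET returns nil, might look like the sketch below. Everything here is an assumption made for illustration: FakeRedis is an in-memory stand-in, its before_get hook exists only to simulate another client's DEL arriving between C2's SETNX and GET, and the lock function, its max_rounds parameter, and the timestamps are invented. A real retry loop would also sleep between rounds.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in with a hook to inject a foreign command."""

    def __init__(self):
        self.data = {}
        self.before_get = None  # runs once before the next GET, then clears

    def setnx(self, key, value):
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def get(self, key):
        if self.before_get is not None:
            hook, self.before_get = self.before_get, None
            hook()
        return self.data.get(key)

    def getset(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        return old

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


def lock(r, key, now, expire, max_rounds=3):
    """Return how the lock was obtained ("setnx" or "getset"), or None."""
    for _ in range(max_rounds):
        if r.setnx(key, now) == 1:
            return "setnx"
        t1 = r.get(key)
        if t1 is None:
            continue  # lock was released, not timed out: start over with SETNX
        if now > t1 + expire and r.getset(key, now) == t1:
            return "getset"
    return None


KEY = "foo.lock"
r = FakeRedis()
r.setnx(KEY, 100)                       # C1 holds the lock
r.before_get = lambda: r.delete(KEY)    # C1's DEL lands just before C2's GET
how = lock(r, KEY, now=105, expire=10)  # C2: SETNX fails, GET is nil, SETNX again
```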

What happens when GETSET returns nil?

1. C1 and C2 both call GET. C1 gets T1. At this point C3, which has a better network, quickly acquires the lock and then deletes it with DEL, so C2 gets T2 (nil). C1 and C2 both enter the timeout handling logic.
2. C1 sends GETSET to foo.lock and gets the return value T11 (nil).
3. C1 compares T1 with T11, finds that they differ, and concludes that it did not acquire the lock.
4. C2 sends GETSET to foo.lock and gets the return value T22 (the timestamp written by C1).
5. C2 compares T2 with T22, finds that they differ, and concludes that it did not acquire the lock.

At this point both C1 and C2 believe they did not acquire the lock. In fact C1 did acquire it, but its logic only compares the GET and GETSET values and does not consider the case where GETSET returns nil. Such edge cases have two causes. First, when multiple clients are connected to Redis, the commands a single client sends are not executed back to back: two commands that look consecutive from one client may have many commands from other clients, such as DEL and SETNX, inserted between them on the Redis server. Second, the clocks of the clients are out of sync, or not strictly synchronized.
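The scenario can be replayed to compare the plain equality check with one that also treats a nil reply from GETSET as success. This is a sketch, not the author's code: FakeRedis is a hypothetical in-memory stand-in, the timestamps are made up, and "nil means acquired" is one way to read the observation that C1 actually holds the lock.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for the four commands."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def get(self, key):
        return self.data.get(key)

    def getset(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        return old

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


KEY = "foo.lock"
r = FakeRedis()
r.setnx(KEY, 100)          # an old, timed-out lock

t1 = r.get(KEY)            # C1 reads T1 = 100
r.delete(KEY)              # C3 took the lock, finished, and released it with DEL
t2 = r.get(KEY)            # C2 reads T2 = nil

t11 = r.getset(KEY, 200)   # C1: the key was absent, so the reply is nil
t22 = r.getset(KEY, 201)   # C2: the reply is the timestamp C1 just wrote

# Plain equality check: both clients conclude they failed.
naive_c1 = t11 == t1
naive_c2 = t22 == t2

# Also accept a nil reply: the key did not exist, so the write took the lock.
fixed_c1 = t11 is None or t11 == t1
fixed_c2 = t22 is None or t22 == t2
```

With the extra nil check, C1 recognises that it holds the lock while C2 still correctly reports failure.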

Timestamp problem

The value of foo.lock is a timestamp, so for the lock to work correctly across multiple clients, the clocks of all servers must be synchronized. A client whose clock is off will misjudge lock timeouts, which leads to race conditions. The lock timeout also depends strictly on the timestamp, and the timestamp itself has limited precision. If the precision is one second, a typical cycle of locking, doing the work, and unlocking completes within a second, and the cases described above occur easily. It is therefore best to raise the precision to milliseconds, so that the lock is reliable at the millisecond level.
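A small sketch of why precision matters: two lock cycles that start within the same second write identical second-precision timestamps, so a comparison of lock values cannot tell them apart, while millisecond timestamps differ. The helper name now_millis and the two sample instants are assumptions chosen for illustration.

```python
import time


def now_millis():
    """Current Unix time in whole milliseconds, usable as the lock value."""
    return time.time_ns() // 1_000_000


# Two hypothetical lock cycles 750 ms apart, given in nanoseconds since the epoch.
first_ns = 1_700_000_000_120_000_000
second_ns = first_ns + 750_000_000

# Second precision: both cycles produce the same lock value.
same_second = first_ns // 10**9 == second_ns // 10**9

# Millisecond precision: the two lock values are distinguishable.
same_milli = first_ns // 10**6 == second_ns // 10**6
```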

Distributed lock issues

1. A timeout mechanism is required. If the client holding the lock crashes, the lock must expire; otherwise no other client can obtain it and the result is a deadlock.
2. Timestamps across clients cannot be guaranteed to be strictly consistent, so under certain conditions locks may get crossed. The mechanism should be moderate and able to tolerate low-probability events.
3. Lock only the critical steps. A good habit is to prepare the relevant resources first, for example connect to the database, and only then acquire the lock.
4. If the logic depends strictly on the lock state while the lock is held, it is best to check the lock at the key steps. In our tests, however, each lock check cost several milliseconds under high concurrency, while our entire processing while holding the lock takes less than 10 milliseconds, so we chose not to do the lock check.
5. Sleep is a science. To reduce the pressure on Redis, the client must sleep between attempts when retrying to acquire the lock. The sleep time needs to be calculated sensibly from your own Redis QPS and the lock processing time.
6. As for why we do not use Redis mechanisms such as MULTI, EXPIRE, and WATCH, see the references.

Lock test data

Without sleep

In the first setup, lock retries do not sleep. For a single request, we measured the lock, execute, and unlock times, and both lock and unlock are very fast. We then load-tested this method with ab, using a concurrency of 100 and 1000 requests in total:

ab -n 1000 -c 100 'http://sandbox6.wanke.etao.com/test/test_sequence.php?tbpm=t'

We can see that the time to acquire the lock and the execution time while holding it both become longer, and the time to delete the lock rises to nearly 10ms. Why is this?

1. After acquiring the lock, our execution logic calls Redis again, and under high concurrency Redis operations become significantly slower.
2. The lock deletion time grows from 0.2ms to 9.8ms, a slowdown of nearly 50 times.

In this case we reached a QPS of 49. It turned out that QPS depends on the total number of requests: with a concurrency of 100 and 100 requests in total, QPS exceeded 110.

With sleep

For a single request, performance is comparable to that without the sleep mechanism. We then ran the load test under the same conditions.

The lock acquisition time is significantly longer, and the lock release time is significantly shorter, only half of that without sleep. The execution time became longer because we re-created the database connection during execution. We can also compare the command load on Redis.

In the figure, the tall, thin part is the load without the sleep mechanism, and the short, wide part is the load with it. The load is reduced by about 50%. The sleep approach has another drawback: QPS drops noticeably, to only 35 under our load-test conditions, and some requests time out. Even so, we decided to use sleep, mainly to keep Redis from being crushed under high concurrency. Unfortunately we have run into that before, so we will definitely use the sleep mechanism.

This article is reprinted from CSDN: https://blog.csdn.net/ugg/article/details/41894947

References

www.worlduc.com/FileSystem/…
redis.io/commands/se…
www.blogjava.net/caojianhua/…
