Click on the blue letters above to follow us

This is the second original article of white Mouse

This article contains 10,263 words. Estimated reading time: 26 minutes.

Writing in the front

Before we look at the implementation of distributed locks, we should first consider some of the issues that must be considered when using distributed locks.

  • Mutual exclusion: Only one process can hold the lock at any time.

  • Deadlock prevention: Even if a process crashes while holding a lock and cannot voluntarily release the lock, there must be other ways to release the lock so that other processes can acquire it.

  • Lock and unlock must be the same process.

  • Lock renewal issues.

Common distributed lock implementation schemes

  • Distributed lock based on Redis

  • Distributed lock based on Zookeeper

This paper adopts the first scheme, which is based on Redis distributed lock implementation scheme.

Redis implements distributed lock main steps

  1. Specify a key as the lock marker, store it in Redis, and specify a unique user id as value.

  2. The value can be set only when the key does not exist. This ensures that only one client process obtains the lock at a time, meeting the mutual exclusion feature.

  3. Set an expiration time to prevent the key from being deleted due to system exceptions.

  4. After services are processed, the key must be cleared to release the lock. The value of the key must be verified. Only the lock holder can release the lock

The above implementation steps take into account the mutual exclusion, deadlock prevention, lock and unlock must be the same process, but the lock renewal cannot be realized. Therefore, bloggers use Redisson to realize distributed lock of Redis. With the help of Redisson WatchDog mechanism, the problem of lock renewal can be solved well. Redisson is also an official recommended distributed lock implementation scheme of Redis, which is relatively simple to implement.

Redisson implements distributed locking

The specific implementation code has been uploaded to the blogger’s warehouse, and friends in need can reply [distributed lock code] to get the code cloud or GitHub project download address in the public number.

The following analyzes the underlying principle of Redisson’s distributed lock implementation from five aspects, including locking mechanism, lock mutually exclusive mechanism, Watch Dog mechanism, reentrant locking mechanism and lock release mechanism.

Lock principle

Locking is done using a Lua script like this:

1if (redis.call('exists', KEYS[1]) == 0) then " + 2 "redis.call('hincrby', KEYS[1], ARGV[2], 1); " + 3 "redis.call('pexpire', KEYS[1], ARGV[1]); " + 4 "return nil; " + 5 "end; " + 6"if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " + 7 "redis.call('hincrby', KEYS[1], ARGV[2], 1); " + 8 "redis.call('pexpire', KEYS[1], ARGV[1]); " + 9 "return nil; " +10 "end; " +11"return redis.call('pttl', KEYS[1]);" 12Copy the code

Here KEYS[1] represents the key that you lock, for example, the key that you set to lock is “myLock”.

1// create a lock2RLock lock = redisson.getLock("myLock");Copy the code

Here ARGV[1] represents the default lifetime of the lock key, which is 30 seconds. ARGV[2] represents the locked client ID, a string generated by UUID and threadId: 285475DA-9152-4c83-822A-67EE2f116a79:52. And the last one is for reentrant counting, which we’ll talk about.

Let’s look at the storage structure of locks in Redis:

1127.0.0.1:6379> HGETALL myLock21) "285475da-9152-4c83-822a-67ee2f116a79:52"32) "1"Copy the code

If exists myLock does not exist, a lua statement is used to determine whether a lock exists. How do you lock it? Use the hincrby command to set up a hash structure similar to the following in Redis:

1127.0.0.1:6379> HINCRBY myLock 285475da-9152-4c83-822a-67ee2f116a79:52 12(integer) 1Copy the code

The pexpire myLock 30000 command is executed to set the lifetime of the lock key myLock (default is 30 seconds).

What if a second client requests a lock? This is the lock mutex mechanism.

Lock mutually exclusive mechanism

At this point, what if client 2 tries to lock? First, the first if judgment executes exists myLock and finds that the lock key myLock already exists. And then the second if decides if the hash data structure for myLock lock key contains the ID of client 2, which it obviously doesn’t because it contains the ID of client 1. So, client 2 will execute:

1return redis.call('pttl', KEYS[1]);Copy the code

A number that represents the remaining lifetime of the lock key myLock.

Let’s take a look at the main process for Redissson to implement locking:

1@Override 2 public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException { 3 long time = unit.toMillis(waitTime); 4 long current = System.currentTimeMillis(); 5 long threadId = Thread.currentThread().getId(); Long TTL = tryAcquire(leaseTime, unit, threadId); 8 // lock acquired 9 if (ttl == null) {10 return true; 14 time -= system.currentTimemillis () -current; 15 if (time <= 0) {16 acquireFailed(threadId); 17 return false; 18 }1920 current = System.currentTimeMillis(); 2122 /**23 *2. Subscribe lock release event and block await lock release with await method, effectively solve the problem of waste resources of invalid lock application: The current thread subscribs to the lock release event via Redis's channel when the lock is held by another resource based on the amount of information. Once the lock is released, it sends a message notifying the waiting thread to compete. Unsubscribe and return failed to acquire lock.27 * When this.await returns true, 28 */29 RFuture<RedissonLockEntry> subscribe = subscribe(threadId); 30 // inside the await method we block with CountDownLatch and subscribe asynchronously (with Netty's Future) 31 if (! subscribeFuture.await(time, TimeUnit.MILLISECONDS)) {32 if (! subscribeFuture.cancel(false)) {33 subscribeFuture.onComplete((res, e) -> {34 if (e == null) {35 unsubscribe(subscribeFuture, threadId); 36}} 37); 38 }39 acquireFailed(threadId); 40 return false; 45}4243 try {44 // Calculate the total time to obtain the lock. If the time is greater than or equal to the maximum waiting time, the lock fails to be obtained. 46 if (time <= 0) {47 acquireFailed(threadId); 48 return false; 4950}5152 /**53 * 3. After receiving the lock release signal, within the maximum waiting time, try again and again to obtain the lock 54 * successful, immediately return true, 55 * If the lock has not been obtained within the maximum waiting time, it is considered to have failed to obtain the lock. 56 */57 while (true) {58 long currentTime = system.currentTimemillis (); TTL = tryAcquire(leaseTime, unit, threadId); 62 // lock acquired63 if (ttl == null) {64 return true; 67 time -= system.currentTimemillis () -currentTime; 68 if (time <= 0) {69 acquireFailed(threadId); 70 return false; CurrentTime = system.currentTimemillis (); currentTimeMillis(); 77 the if (TTL > = 0 && TTL (time) {/ / if the remaining time (TTL) less than 78 wait time, in the TTL time, from the Entry of the signal to obtain a license (unless is interrupted or there has been no available license). 79 getEntry(threadId).getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS); GetEntry (threadId).getlatch ().tryacquire (time, timeunit.milliseconds); 86 time -= system.currentTimemillis () -currentTime; 87 if (time <= 0) {88 acquireFailed(threadId); 89 return false; 90 }91 }92 } finally {93 // 7. Unsubscribe to unlock messages regardless of whether the lock is acquired. 95 }96// return get(tryLockAsync(waitTime, leaseTime, unit)); 97}Copy the code
  1. If null is returned, the lock is successfully locked. If a value is returned, the lock already exists. TTL is the remaining lifetime of the lock.

  2. If client 2 fails to acquire the lock at this point, then the thread ID of client 2 (essentially the process ID) is used to subscribe to the lock release event via Redis’s channel. If the lock release event is not notified while waiting and the lock fails to be acquired when the maximum waiting time is exceeded, return false (line 39). If you wait to be notified of a lock release event, you begin a cycle of retries to acquire the lock.

  3. Each time the loop tries to acquire the lock and get the remaining lifetime of the existing lock. If the lock was taken in the retry, it is returned directly. The implementation uses the Semaphore to block the thread if the lock is currently occupied and waiting for a lock release message. When the lock is released and the lock release message is released, the Semaphore release() method is called and a thread in the waiting queue blocked by the Semaphore can continue to try to acquire the lock.

There is one detail in the above process, and it is necessary to explain the key points of distributed locking:

When the lock is being occupied, the process waiting for the lock does not acquire the lock through a while(true) loop, but uses Redis’ publishing-subscription mechanism to block the process waiting for the lock through await method, which effectively solves the problem of wasting resources in invalid lock application.

Lock renewal mechanism

The default lifetime of the lock key on client 1 is 30 seconds. What if client 1 wants to keep the lock for longer than 30 seconds?

Redisson provides a renewal mechanism that starts a Watch Dog as soon as client 1 successfully locks.

1private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, long threadId) { 2 if (leaseTime ! = -1) { 3 return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG); 4 } 5 RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG); 6 ttlRemainingFuture.onComplete((ttlRemaining, e) -> { 7 if (e ! = null) { 8 return; 9 }1011 // lock acquired12 if (ttlRemaining == null) {13 scheduleExpirationRenewal(threadId); 14}}); 16 return ttlRemainingFuture; 17}Copy the code

Lose,

From the above source code, we can see that leaseTime must be -1 before the Watch Dog mechanism can be enabled. That is, if you want to enable the Watch Dog mechanism, you must use the default lock time of 30s. If you set your own time, the lock will automatically release after this time, and will not be extended.

1private void scheduleExpirationRenewal(long threadId) { 2 ExpirationEntry entry = new ExpirationEntry(); 3 ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry); 4 if (oldEntry ! = null) { 5 oldEntry.addThreadId(threadId); 6 } else { 7 entry.addThreadId(threadId); 8 renewExpiration(); 9 }10}1112protected RFuture<Boolean> renewExpirationAsync(long threadId) {13 return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,14 "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +15 "redis.call('pexpire', KEYS[1], ARGV[1]); " +16 "return 1; " +17 "end; " +18 "return 0;" ,19 Collections.<Object>singletonList(getName()),20 internalLockLeaseTime, getLockName(threadId)); 21}Copy the code

Watch Dog mechanism is actually a background timed task thread. After acquiring the lock successfully, the thread holding the lock will be put into a redissonLock. EXPIRATION_RENEWAL_MAP. Then check every 10 seconds (internalLockLeaseTime / 3) that if client 1 still holds the lock key, Is actually traversal EXPIRATION_RENEWAL_MAP inside thread id and then according to the thread id to check in Redis, if there will be the key to extend the time), then it will continue to extend survival time of lock key.

If the service is down, there will be no Watch Dog thread, and the expiration time of the key will not be extended. After 30 seconds, the key will automatically expire, and other threads can acquire the lock without permanently locking it.

Reentrant locking mechanism

Redisson also supports reentrant locking, such as code like this:

1@Override 2public void lock() { 3 RLock lock = redissonSingle.getLock("myLock"); 4 try { 5 lock.lock(); 6 7 // Execute business 8 doBusiness(); 910 lock.lock(); 1112 } catch (Exception e) {13 e.printStackTrace(); 14} finally {15 // unlock 16 lock.unlock(); 17 lock.unlock(); 18 Logger. info(" Task completed, release lock!" ); 19}} 20Copy the code

Let’s look at the lua code:

1if (redis.call('exists', KEYS[1]) == 0) then " + 2 "redis.call('hincrby', KEYS[1], ARGV[2], 1); " + 3 "redis.call('pexpire', KEYS[1], ARGV[1]); " + 4 "return nil; " + 5 "end; " + 6"if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " + 7 "redis.call('hincrby', KEYS[1], ARGV[2], 1); " + 8 "redis.call('pexpire', KEYS[1], ARGV[1]); " + 9 "return nil; " +10 "end; " +11"return redis.call('pttl', KEYS[1]);" 12Copy the code

The first if statement is definitely not true. Exists myLock indicates that the lock key already exists. The second if judgment is true, since the ID contained in the hash data structure of myLock is the ID of client 1, then the reentrant lock logic is executed, using: Hincrby myLock 285475da-9152-4c83-822a-67EE2f116a79:52 1 Increase the lock count of client 1 by 1. The myLock data structure now looks like this:

1127.0.0.1:6379> HGETALL myLock21) "285475da-9152-4c83-822a-67ee2f116a79:52"32) "2"Copy the code

The hash key is the name of the lock, the field is the ID of the client, and the value is the number of times the client locks.

Is it?

Here’s a question. If locking supports reentrant locking, what about unlocking?

Lock release mechanism

A distributed lock is released by lock.unlock(). Let’s look at the lock release process code:

1@Override 2public RFuture<Void> unlockAsync(long threadId) { 3 RPromise<Void> result = new RedissonPromise<Void>(); RFuture<Boolean> Future = unlockInnerAsync(threadId); 7 future.onComplete((opStatus, e) -> {8 cancelationRenewal (threadId); 910 if (e ! = null) {11 result.tryFailure(e); 12 return; 13 }1415 if (opStatus == null) {16 IllegalMonitorStateException cause = new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: "17 + id + " thread-id: " + threadId); 18 result.tryFailure(cause); 19 return; 20 }2122 result.trySuccess(null); 23}); 2425 return result; 26}Copy the code

1protected RFuture<Boolean> unlockInnerAsync(long threadId) { 2 return commandExecutor.evalWriteAsync(getName(), 4 "if (redis. Call ('hexists', KEYS[1], ARGV[3]) == 0) then " + 5 "return nil;" + 6 "end; "+ 7 // Decrement the hash value of the lock to 0 before deleting it 8 // Publish an UNLOCK_MESSAGE 9 to the channel name redisson_lock__channel "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +10 "if (counter > 0) then " +11 "redis.call('pexpire', KEYS[1], ARGV[2]); " +12 "return 0; " +13 "else " +14 "redis.call('del', KEYS[1]); " +15 "redis.call('publish', KEYS[2], ARGV[1]); " +16 "return 1; "+17 "end; " +18 "return nil;" ,19 Arrays.<Object>asList(getName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId)); 20}Copy the code

From the above code, there are three main steps to release the lock:

  1. Delete the lock (note the reentrant lock, which is examined in detail in the script above).

  2. Broadcast a lock release message notifying the blocking waiting process (publish an UNLOCK_MESSAGE to the channel name redisson_lock__channel).

  3. Disable the Watch Dog mechanism, that is, delete the thread ID in redissonLock. EXPIRATION_RENEWAL_MAP and cancel the scheduled task thread of the Netty.

Scheme advantages

  1. Redisson solved the lock renewal problem well through the Watch Dog mechanism.

  2. Compared with Zookeeper, Redisson has higher performance based on Redis and is suitable for scenarios that require high performance.

  3. Redisson works a little better than the original SET MyLock userId NX PX milliseconds + lua implementation. While the basic principles are the same, it helps us mask the internal execution details.

  4. Some optimizations are also made in the process of waiting for lock application to reduce invalid lock application and improve resource utilization.

Solution disadvantages

  1. The biggest problem with Redisson is that if you lock a Redis Master instance, the Master will be copied asynchronously to its slave instance. However, once the Master breaks down, the slave becomes the Master during the Master/slave switchover. Then, when client 2 tries to lock, it completes the lock on the new Master, and client 1 thinks it has successfully added the lock. This will cause multiple clients to complete the lock on a distributed lock, and the system will definitely have problems in business semantics, resulting in the production of various dirty data. Therefore, this is the biggest defect of Redis distributed lock caused by Redis Cluster or Master Slave asynchronous replication of Redis master-slave architecture (when the Redis Master instance goes down, it may cause multiple clients to complete locking at the same time).

  2. There are some opinions that using the Watch Dog mechanism to open a timed thread to continuously extend the lock time will cause loss to the system (this is just a saying on the network, the blogger has checked a lot of information and does not think there is a great loss in the system based on the actual production, this is just for your reference).

conclusion

The above is based on the analysis of all principles of Redis using Redisson to achieve distributed lock, hoping to help friends deepen their understanding of distributed lock. In fact, after the analysis of the source code found that based on Redis own manual implementation of a simple version of the distributed lock tool is not very difficult, interested partners can try.

reference

  • https://juejin.cn/post/6844904110295089160#heading-6

  • https://juejin.cn/post/6844904045899939853#heading-0

  • https://blog.csdn.net/tianyaleixiaowu/article/details/96112684

  • https://www.cnblogs.com/qdhxhz/p/11046905.html

If you want to follow my updated articles and the dry goods I share in real time, you can follow my official account we are all guinea pigs. There are some organized original high-quality brain maps in the public account, which not only contain the knowledge of technical points, but also comb out more underlying principles. You can get them by responding to “technical brain maps”, “interview questions” and “development tools” respectively in the public account.

New | interview often ask Java virtual during data area

Thank you for your attention, if you like, you can click on the bottom right to see, and you are welcome to share this article with more friends, thank you!

Make progress every day!!

In the early 2020.04.13