Why distributed locks?

Before we get to that, let’s look at a business scenario:

System A is an e-commerce system, currently deployed on a single machine. The system has an interface for placing orders, but before an order can be placed it must check the inventory to make sure there is enough stock.

Because the system has a certain level of concurrency, the inventory of goods is loaded into Redis in advance, and the inventory in Redis is updated when a user places an order.

The system architecture is as follows:

However, there is a problem with this: suppose at some point the inventory of an item in Redis is 1 and two requests arrive at the same time. One of them reaches step 3 in the figure above and updates the inventory to 0, but step 4 has not yet been executed.

The other request reaches step 2, finds that the inventory is still 1, and continues with step 3.

As a result, 2 items are sold when there is only 1 item in stock.

That is obviously wrong! This is the classic inventory oversell problem.

At this point, it’s easy to think of a solution: lock steps 2, 3, and 4 so that only after one thread has finished them can another thread come in and execute step 2.

As shown above, use Java’s synchronized or ReentrantLock to acquire the lock before step 2 and release it after step 4.

Thus, steps 2, 3, and 4 are “locked” and threads can only execute them sequentially.

But the good times did not last long: the system’s concurrency soared and one machine could no longer handle the load. Now we need to add a machine, as shown below:

After adding a machine, the system looks like the figure above. Oh no!

Suppose the requests from two users arrive at the same time but land on different machines. These two requests can then execute simultaneously, and the inventory oversell problem appears again.

Why is that? Because the two A systems in the figure above run in two different JVMs, the locks they acquire are valid only for threads within their own JVM and have no effect on threads in the other JVM.

Therefore, the problem here is that the native locking mechanism provided by Java fails in a multi-machine deployment scenario.

This is because the locks on the two machines are not the same lock (they live in different JVMs).

So, we just have to make sure that the locks on both machines are the same, and the problem is solved, right?

This is where distributed locks take the stage. The idea behind a distributed lock is:

Provide a globally unique “thing” in the whole system for acquiring locks. Whenever any system needs to take a lock, it asks this “thing” for the lock, so that different systems are effectively contending for the same lock.

This “thing” could be Redis, Zookeeper, or a database.

A textual description is not very intuitive, so let’s look at the picture below:

Through the above analysis, we know that Java’s native locking mechanism cannot guarantee thread safety when the system is deployed across multiple machines, so we need a distributed locking scheme.

So how do you implement distributed locks? Keep reading!

Distributed lock based on Redis

Having analyzed why distributed locks are needed, let’s now look at how a distributed lock is actually implemented. (Later we will also look at how Redisson implements distributed locking.)

One of the most common solutions is to use Redis for distributed locking.

The idea of using Redis for distributed locking is as follows: to acquire the lock, set a value in Redis to indicate the lock is held; to release the lock, delete that key.

The code looks like this:

// Acquire the lock:
SET anyLock unique_value NX PX 30000

// Release the lock: execute a Lua script.
// Releasing involves two commands (GET and DEL), which are not atomic on their own,
// so we rely on Redis's support for atomic Lua script execution.
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

This approach has several main points:

  • Be sure to use the SET key value NX PX milliseconds command

    If not, e.g. setting the value first and then setting the expiration time separately, the operation is not atomic. The client may crash before the expiration time is set, leaving the key around forever and causing a deadlock.

  • Value must be unique

    When releasing the lock, verify that the value matches the value stored when locking before deleting the key.

    This avoids the following situation: suppose client A acquired the lock with an expiration time of 30 seconds. After 35 seconds the lock has already expired and been released automatically, and client B may have acquired it. When A then tries to release “its” lock, the value check ensures that A cannot delete client B’s lock.
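To make these two points concrete, here is a minimal sketch of acquiring and releasing such a lock with the Jedis client (the client choice, assumed Jedis 3.x API, key name, 30-second timeout, and class name are my own assumptions for illustration; production code would add retries and error handling):

import java.util.Collections;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLockSketch {
    private static final String RELEASE_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then " +
        "    return redis.call('del', KEYS[1]) " +
        "else " +
        "    return 0 " +
        "end";

    // Try to acquire the lock: SET key value NX PX 30000 in a single atomic command
    public static boolean tryLock(Jedis jedis, String key, String uniqueValue) {
        String result = jedis.set(key, uniqueValue, SetParams.setParams().nx().px(30000));
        return "OK".equals(result);
    }

    // Release the lock: delete the key only if it still holds our unique value,
    // done atomically via a Lua script so we never delete someone else's lock
    public static void unlock(Jedis jedis, String key, String uniqueValue) {
        jedis.eval(RELEASE_SCRIPT,
                Collections.singletonList(key),
                Collections.singletonList(uniqueValue));
    }
}

The unique value is typically a fresh random string per lock attempt, for example tryLock(jedis, "anyLock", UUID.randomUUID().toString()).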

In addition to thinking about how the client implements distributed locking, we also need to consider how Redis itself is deployed.

Redis can be deployed in three ways:

  • Stand-alone mode
  • Master-slave + Sentinel election mode
  • Redis cluster mode

In stand-alone mode there is a single point of failure: if that Redis instance goes down, locking stops working entirely.

In master-slave mode, a lock is only written to one node (the master) when it is acquired. Even with Sentinel providing high availability, if the master fails and a master/slave switchover happens before the lock is replicated, the lock can be lost.

The author of Redis considered this problem as well and proposed the RedLock algorithm, which goes roughly like this:

Assume a deployment with 5 independent Redis master nodes. A lock is acquired through the following steps:

  • Get the current timestamp in milliseconds
  • Try to create the lock on each master node in turn, using a short timeout for each attempt (typically tens of milliseconds) so that one unreachable node does not block the whole process
  • The lock must be created on a majority of the nodes, e.g. 3 out of 5 (n / 2 + 1)
  • The client then computes how long acquiring the lock took; if that time is shorter than the lock’s validity period, the lock is considered successfully established
  • If acquiring the lock fails, the lock is removed from all nodes in turn
  • Whenever someone else already holds the distributed lock, you have to keep polling to try to acquire it (a rough sketch of these steps is shown below)
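The steps above can be sketched roughly as follows. This is only an illustration: the RedisNode interface and its tryLockOnNode/unlockOnNode methods are hypothetical placeholders, and a real implementation (for example Redisson's RedissonRedLock) handles per-node request timeouts, clock drift, and retries far more carefully.

import java.util.List;

public class RedLockSketch {
    // Hypothetical single-node client: returns true if SET key value NX PX succeeded on that node
    interface RedisNode {
        boolean tryLockOnNode(String key, String value, long ttlMillis);
        void unlockOnNode(String key, String value);
    }

    // Rough sketch of the acquisition steps described above
    public static boolean tryRedLock(List<RedisNode> nodes, String key, String value, long ttlMillis) {
        long start = System.currentTimeMillis();              // step 1: current timestamp
        int acquired = 0;
        for (RedisNode node : nodes) {                        // step 2: try each master in turn
            if (node.tryLockOnNode(key, value, ttlMillis)) {
                acquired++;
            }
        }
        long elapsed = System.currentTimeMillis() - start;    // time spent acquiring
        int quorum = nodes.size() / 2 + 1;                    // step 3: majority, e.g. 3 of 5
        if (acquired >= quorum && elapsed < ttlMillis) {
            return true;                                      // step 4: lock established within the validity window
        }
        for (RedisNode node : nodes) {                        // step 5: failed, release on all nodes
            node.unlockOnNode(key, value);
        }
        return false;
    }
}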

However, this algorithm is quite controversial; it may run into problems and cannot guarantee that the locking process is always correct.

Another way: Redisson

In addition to implementing a Redis distributed lock yourself on top of the Redis client’s native API, you can also use the open-source framework Redisson.

Redisson is an enterprise-level open source Redis Client that also provides distributed lock support. I highly recommend it. Why?

Recall from above that when you write the locking code yourself, you set the value in Redis with the following command.

  • SET anyLock unique_value NX PX 30000

The timeout here is set to 30 seconds. But what if the business logic is not finished within 30 seconds? The key expires, and another thread may acquire the lock.

In that case the second thread gets in before the first one has finished its business logic. So we would have to maintain (extend) the expiration time ourselves, which is a lot of trouble.

How does Redisson handle this? First, feel how pleasant Redisson is to use:

Config config = new Config();
config.useClusterServers()
    .addNodeAddress("redis://192.168.31.101:7001")
    .addNodeAddress("redis://192.168.31.101:7002")
    .addNodeAddress("redis://192.168.31.101:7003")
    .addNodeAddress("redis://192.168.31.102:7001")
    .addNodeAddress("redis://192.168.31.102:7002")
    .addNodeAddress("redis://192.168.31.102:7003");

RedissonClient redisson = Redisson.create(config);


RLock lock = redisson.getLock("anyLock");
lock.lock();
lock.unlock();

It’s as simple as that: we only need lock() and unlock() from its API to get a working distributed lock, and it takes care of many details for us:

  • All Redisson instructions are executed through Lua scripts, and Redis executes Lua scripts atomically

  • Redisson sets the default expiration time of a key to 30s. What if a client holds a lock for more than 30s?

    Redisson has a “watchdog”: after you acquire the lock, it resets the key’s expiration time back to 30 seconds every 10 seconds.

    This way, as long as the lock is still held, the key never expires and no other thread can grab the lock.

  • Redisson’s “watchdog” logic ensures that no deadlocks occur.

    (If the machine holding the lock goes down, the watchdog goes down with it. The lock then expires automatically after 30 seconds, and other threads can acquire it.)

Here is the implementation code:

// Lock logic
private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, final long threadId) {
    if (leaseTime != -1) {
        return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    }
    // Call a Lua script to set the key and its expiration time
    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    ttlRemainingFuture.addListener(new FutureListener<Long>() {
        @Override
        public void operationComplete(Future<Long> future) throws Exception {
            if (!future.isSuccess()) {
                return;
            }

            Long ttlRemaining = future.getNow();
            // lock acquired
            if (ttlRemaining == null) {
                // Watchdog logic
                scheduleExpirationRenewal(threadId);
            }
        }
    });
    return ttlRemainingFuture;
}
 
 
<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    internalLockLeaseTime = unit.toMillis(leaseTime);
 
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
              "if (redis.call('exists', KEYS[1]) == 0) then " +
                  "redis.call('hset', KEYS[1], ARGV[2], 1); " +
                  "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                  "return nil; " +
              "end; " +
              "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                  "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                  "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                  "return nil; " +
              "end; " +
              "return redis.call('pttl', KEYS[1]);",
                Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}
 
 
 
// The watchdog will eventually call here
private void scheduleExpirationRenewal(final long threadId) {
    if (expirationRenewalMap.containsKey(getEntryName())) {
        return;
    }

    // This task runs after a delay of 10 seconds (internalLockLeaseTime / 3)
    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {

            // This operation resets the expiration time of the key to 30s
            RFuture<Boolean> future = renewExpirationAsync(threadId);

            future.addListener(new FutureListener<Boolean>() {
                @Override
                public void operationComplete(Future<Boolean> future) throws Exception {
                    expirationRenewalMap.remove(getEntryName());
                    if (!future.isSuccess()) {
                        log.error("Can't update lock " + getName() + " expiration", future.cause());
                        return;
                    }

                    if (future.getNow()) {
                        // Reschedule itself: this method is called again,
                        // extending the expiration time for as long as the lock is held
                        scheduleExpirationRenewal(threadId);
                    }
                }
            });
        }
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);

    if (expirationRenewalMap.putIfAbsent(getEntryName(), new ExpirationEntry(threadId, task)) != null) {
        task.cancel();
    }
}
Redisson also provides support for the RedLock algorithm through RedissonRedLock, which combines several independent locks:

RedissonClient redisson = Redisson.create(config);

RLock lock1 = redisson.getFairLock("lock1");
RLock lock2 = redisson.getFairLock("lock2");
RLock lock3 = redisson.getFairLock("lock3");

RedissonRedLock multiLock = new RedissonRedLock(lock1, lock2, lock3);
multiLock.lock();
multiLock.unlock();

Summary: This section analyzed how to implement a distributed lock with Redis and some of its limitations, and then introduced the Redis client framework Redisson, which I recommend; it spares you from handling many details yourself.

Distributed lock based on ZooKeeper

Besides Redis, ZooKeeper is another common way to implement a distributed lock.

Before introducing ZooKeeper’s mechanism for implementing distributed locks, let’s take a quick look at what ZK is:

Zookeeper is a centralized service that provides configuration management, distributed collaboration, and naming.

The ZK data model looks like this: ZK consists of a tree of nodes, called znodes, which behave much like a file system; each znode represents a directory. Znodes have several useful features:

  • Ordered nodes: If we have a parent node that is /lock, we can create children under that parent node.

    ZooKeeper provides an optional sequencing feature. For example, you can create child nodes named “/lock/node-” and request sequencing; when ZooKeeper creates each child node, it automatically appends an increasing integer based on the current number of children.

    That is, the first child node created is /lock/node-0000000000, the next is /lock/node-0000000001, and so on.

  • Temporary node: A client can create a temporary node. After a session ends or times out, ZooKeeper automatically deletes this node.

  • Event listening (watches): when reading data, we can also set a watch on the node. When the node’s data or structure changes, ZooKeeper notifies the client. ZooKeeper supports the following four events:

    • Node created
    • Node deleted
    • Node data changed
    • Child nodes changed
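As a quick illustration of the ordered-node, ephemeral-node, and watch features above, here is a minimal sketch using the ZooKeeper Java client (the connection string, session timeout, and class name are placeholders of mine, and the /lock parent node is assumed to exist):

import org.apache.zookeeper.*;

public class ZnodeFeaturesSketch {
    public static void main(String[] args) throws Exception {
        // Connect to ZooKeeper (address and session timeout are placeholders)
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Ordered + ephemeral node: ZooKeeper appends an increasing sequence number,
        // and removes the node automatically when this session ends or times out
        String path = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println(path);  // e.g. /lock/node-0000000000

        // Watch: the callback fires once when that node is deleted or its data changes
        zk.exists(path, event -> System.out.println("event: " + event.getType()));
    }
}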

Based on these ZK features, we can arrive at a scheme for implementing a distributed lock with ZK:

  1. Using ZK’s ephemeral and sequential nodes, each thread that wants the lock creates an ephemeral sequential node in ZK, for example under the /lock/ directory.

  2. After creating its node, the thread fetches all child nodes under the /lock directory and checks whether the node it created has the smallest sequence number of them all.

  3. If the node created by the current thread is the node with the smallest node serial number, the lock is considered successful.

  4. If the node created by the current thread does not have the smallest sequence number, it adds a watch on the node immediately before it in the sequence.

    For example, if the current thread’s node is /lock/003 and the full list of nodes is [/lock/001, /lock/002, /lock/003], then it adds a watch on /lock/002.

When a lock is released, the node with the next sequence number is woken up, and it performs step 3 again to determine whether it now has the smallest sequence number.

For example, when /lock/001 is released, /lock/002 receives the watch event. The remaining node set is [/lock/002, /lock/003], so /lock/002 now has the smallest sequence number and acquires the lock.

The whole process is as follows:

That is the overall idea of the implementation. Production-grade code is more involved, but a simplified sketch is shown below.
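This sketch of steps 1 to 4 uses the raw ZooKeeper client. The class and method names are mine for illustration; it assumes the /lock parent node already exists and omits error handling, reentrancy, and timeouts, which the Curator implementation discussed below takes care of.

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ZkLockSketch {
    private final ZooKeeper zk;
    private String ourPath;   // e.g. /lock/node-0000000003

    public ZkLockSketch(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws Exception {
        // Step 1: create an ephemeral sequential node under /lock
        ourPath = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            // Step 2: list all children and sort them by sequence number
            List<String> children = zk.getChildren("/lock", false);
            Collections.sort(children);
            String ourName = ourPath.substring("/lock/".length());
            int ourIndex = children.indexOf(ourName);
            // Step 3: if our node is the smallest, we hold the lock
            if (ourIndex == 0) {
                return;
            }
            // Step 4: otherwise watch the node immediately before ours
            String prevPath = "/lock/" + children.get(ourIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = zk.exists(prevPath, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    latch.countDown();
                }
            });
            if (stat != null) {
                latch.await();   // wait until the previous node is deleted, then re-check
            }
            // if stat == null the previous node is already gone; loop and re-check
        }
    }

    public void unlock() throws Exception {
        zk.delete(ourPath, -1);  // deleting our node releases the lock and wakes the next waiter
    }
}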

Introducing Curator

Curator is an open-source ZooKeeper client that also provides an implementation of distributed locks.

It is very simple to use:

InterProcessMutex interProcessMutex = new InterProcessMutex(client,"/anyLock");
interProcessMutex.acquire();
interProcessMutex.release();

The core source code of the distributed lock is as follows:

private boolean internalLockLoop(long startMillis, Long millisToWait, String ourPath) throws Exception
{
    boolean haveTheLock = false;
    boolean doDelete = false;
    try {
        if ( revocable.get() != null ) {
            client.getData().usingWatcher(revocableWatcher).forPath(ourPath);
        }

        while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock ) {
            // Get the sorted set of all current nodes
            List<String> children = getSortedChildren();
            // Get the name of the current node
            String sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash
            // Determine whether the current node is the smallest node
            PredicateResults predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
            if ( predicateResults.getsTheLock() ) {
                // Got the lock
                haveTheLock = true;
            } else {
                // Register a watcher on the node immediately before the current node
                String previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();
                synchronized (this) {
                    Stat stat = client.checkExists().usingWatcher(watcher).forPath(previousSequencePath);
                    if ( stat != null ) {
                        if ( millisToWait != null ) {
                            millisToWait -= (System.currentTimeMillis() - startMillis);
                            startMillis = System.currentTimeMillis();
                            if ( millisToWait <= 0 ) {
                                doDelete = true;    // timed out - delete our node
                                break;
                            }
                            wait(millisToWait);
                        } else {
                            wait();
                        }
                    }
                    // else it may have been deleted (i.e. lock released). Try to acquire again
                }
            }
        }
    } catch ( Exception e ) {
        doDelete = true;
        throw e;
    } finally {
        if ( doDelete ) {
            deleteOurPath(ourPath);
        }
    }
    return haveTheLock;
}

The underlying principle of Curator’s distributed lock is essentially the same as what we analyzed above. The diagram below illustrates the principle in detail:

Summary:

This section introduced the distributed lock scheme implemented with ZooKeeper, the basic use of the open-source ZK client Curator, and a brief look at its implementation principle.

Comparing the advantages and disadvantages of the two schemes

Having learned both distributed lock implementations, this section discusses the pros and cons of the Redis and ZK approaches.

Redis distributed locks have the following disadvantages:

  • The way it acquires the lock is simple and crude: if the lock cannot be acquired, the client keeps retrying, which costs performance.
  • The design orientation of Redis means its data is not strongly consistent, which may cause problems in some extreme cases. The lock model is not robust enough.
  • Even with the RedLock algorithm, correctness cannot be guaranteed 100% in some complex scenarios. For a discussion of RedLock, see “How to do distributed locking”.
  • In short, a Redis distributed lock has to keep polling to acquire the lock, which is comparatively expensive.

On the other hand, distributed locks using Redis are common in many enterprises, and for the most part, they don’t have “extremely complex scenarios.”

Therefore, using Redis as a distributed lock is also a good solution. The most important point is that Redis has high performance and can support highly concurrent lock acquisition and release operations.

For ZK distributed locks:

  • Zookeeper is designed for distributed coordination and consistency. The lock model is robust, easy to use and suitable for distributed locking.
  • If you can’t get the lock, you just need to add a listener instead of polling all the time.

However, ZK also has its disadvantages: if many clients frequently acquire and release locks, the pressure on the ZK cluster becomes heavier.

Summary:

To sum up, both Redis and ZooKeeper have their advantages and disadvantages, and these trade-offs can serve as reference factors when we make a technology choice.

Advice

From the previous analysis, there are two common solutions for implementing distributed locks: Redis and ZooKeeper, each of which has its own advantages. How to choose?

Personally, I prefer the lock implemented by ZK:

Because Redis has potential pitfalls that may lead to incorrect data. However, the choice ultimately depends on the specific situation in your company.

If the company already has a ZK cluster available, ZK is preferred. But if there is only a Redis cluster and no practical way to set up and maintain a ZK cluster, then implementing the lock with Redis is also perfectly feasible.

In addition, the system designer may reason that the system already uses Redis and that introducing yet another external dependency is not worth it, and therefore choose Redis.

This is an architectural judgment call for the system designer.