This distributed lock interpretation, can be regarded as the second?

Hi, I’m Hei, long time no see. Today we have a good talk about some concepts and implementation of distributed lock. Yes, You can think of ZooKeeper. Redis has it

preface

Those of you who know something about multithreading are generally familiar with the concept of locking.

In multi-threaded concurrent scenarios, locking is usually required to ensure that only one thread can operate a service, data, or variable at the same time. Such as synchronized or Lock.

With the evolution of architecture and business development, our applications are often not deployed on a single server, but in a distributed cluster architecture, where multiple applications are the same.

For example, when an e-commerce website sells goods, the quantity of goods is limited, and each user purchases one product, the inventory of goods needs to be reduced by 1.

But let’s think for a moment, in the “Double 11” such a popular activity, there may be a large number of users to buy the same product, and at the same time, the inventory of the product is reduced by 1, if not locked, it is very likely to cause overselling of the product.

To ensure the security and consistency of shared data in this distributed scenario, distributed locks are required. The commodity inventory in the above example is shared data.

What is distributed locking

As the name implies, distributed locking ensures that only one thread of an application can operate shared data at a time in a distributed scenario. Used to ensure the security and consistency of shared data.

What requirements should a distributed lock meet

Now let’s analyze, what are the requirements that we need to implement a distributed lock?

First and most fundamentally, we want to ensure that only one thread of an application can perform a lock method at a time, or acquire a lock; (One application thread executes)
Then our distributed lock may have many servers to acquire, so we must be able to high-performance acquisition and release; (High performance)
Services obtained by a distributed lock cannot be unavailable so that all services cannot obtain or release locks. Therefore, high availability requirements must be met. (High availability)
If an application does not release the lock after obtaining it, the service itself may be suspended and cannot be released for a long time. As a result, other services cannot obtain the lock. (Lock failure mechanism to prevent deadlocks)
If an application successfully acquires the lock, it can also successfully acquire the lock again. (Reentrancy)
When a service comes to acquire a lock, assuming that the lock has already been acquired by another service, we need to be able to return a failure without waiting forever. (Non-blocking characteristics)

These are some of the basic requirements that all distributed locks must meet.

What are the implementation methods

So what approach can we take to implement distributed locks? At present, there are three common methods:

Database based implementation
Based on theZooKeeperimplementation
Based on theRedisimplementation

Let’s take a look at how each of the three approaches implements distributed locking.

Database-based

There are two ways to implement distributed locks using a database.

The first is based on database table implementation.

For example, we have the following table to store distributed lock records:

CREATE TABLE `methodLock` ( 
    `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',  
    `method_name` varchar(64) NOT NULL COMMENT 'Locked method name',
    `desc` varchar(1024) NOT NULL DEFAULT 'Remarks',
    `update_time` timestamp NOT NULL DEFAULT now() ON UPDATE now() COMMENT 'Save time'.PRIMARY KEY (`id`),
    UNIQUE KEY `uidx_method_name` (`method_name `)
) ENGINE=InnoDB COMMENT='Distributed Locking approach';
Copy the code

When one of our services wants to execute a method that requires distributed locking, it executes an insert statement, inserting a record into the table.

insert into methodLock(method_name,desc) values ('saleProduct'.'Sell products to reduce inventory');
Copy the code

Because we added a unique constraint to method_name when we defined the table, the database is guaranteed that only one service will succeed if multiple services perform this operation when we insert the record, and we assume that only the successfully inserted service has acquired the lock and can continue executing the method.

When the method completes and the lock needs to be released, a delete statement is executed to delete the inserted record from the table.

delete from methodLock where method_name = 'saleProduct';
Copy the code

The other is database – based exclusive locking.

In addition to the above by the way of insertion deletion, the use of exclusive lock to achieve distributed lock. We can use the table in the above method to insert a record in advance for the distributed method.

insert into methodLock(method_name,desc) values ('saleProduct'.'Sell products to reduce inventory');
Copy the code

When an application thread needs to lock this method, the following statement is used:

select method_name from methodLock where method_name = 'saleProduct' for update
Copy the code

When for UPDATE is used after a query statement, the database adds an exclusive lock to the record during the query, and no other thread can add a lock to the record.

We can assume that when the data is queried, the distributed lock is acquired and the logic in the method is then executed. After the method completes execution, the transaction is committed and the lock is automatically released.

Of course, instead of creating a table to insert data, you can simply lock the data to be distributed.

For example, if we want to operate the commodity inventory, there will be a table in the database, such as: T_product_quantity, before we reduce the inventory of a commodity, we can query the record through the following SQL:

select product_no,quantity where product_no = xxx for update
Copy the code

This query also adds an exclusive lock to the inventory record for the product, after which no other thread can lock the record.

After the transaction is committed, the lock will be released automatically. If an exception occurs during the operation, the transaction is rolled back and the lock is automatically released.

Using the above database based distributed lock approach is relatively simple. But let’s look back and see if this approach meets the requirements that we listed above for distributed locks.

Heavy_check_mark :heavy_check_mark:
High performance & high availability
- Because it is implemented based on the database, high performance and high availability depend on the database, requiring multi-machine deployment, master/slave synchronization, and master/standby switchover.
The failure mechanism
- Manual deletion is required, and no invalid mechanism is available. If you want to support the invalidation mechanism, you need to add scheduled tasks that are periodically deleted based on the update time of records.
reentrancy
- None because the lock record persists after a thread has successfully acquired the lock.
- You can add fields to record the information of application nodes and threads that possess the lock, and determine whether the lock obtained by the current thread is reentrant when acquiring the lock again.
Non-blocking characteristic
- If yes, the system returns failure if the lock fails to be obtained.
- However, timeout acquisition scenarios cannot be met. For example, the lock cannot be obtained within 5 seconds and then fails.

It can be found that although this method can meet the basic distributed locking capability, it still needs to be optimized for some problems in actual use, which will become more and more complex and have certain performance problems. Therefore, database – based distributed locking is generally not recommended.

Based on a ZooKeeper

Distributed locks can also be implemented based on ZooKeeper. Here we need to lay down some basic knowledge of ZK. In ZK, data is stored in data nodes called Znode, which stores all data in memory. The data model formed by all data is a Znode Tree. Nodes of different levels are divided by slash “/”, such as /zoo/cat, similar to file system structure.

ZNode

Data nodes in ZK are divided into the following four types:

Persistent node

A persistent node is the default node type of ZK. After a node is created, it persists regardless of whether the client is disconnected from the server.

Temporary node

The temporary node is deleted after the client is disconnected from the server.

Order node

As the name implies, sequential nodes have a sequence, and when a node is created, ZK assigns each node a sequential number based on when it was created.

Temporary sequence node

Temporary sequential nodes are a combination of temporary and sequential nodes. Each node is created with an order number and is deleted when the client is disconnected from the ZK server.

ZK distributed lock implementation principle

There is no API in ZK similar to Lock or Synchronized, which relies on temporary sequential nodes to implement distributed locking.

Acquiring a lock

You first need to create a persistent node in ZKParentLockRepresents a distributed lock node.
When the first client comes to get the lock, it’s right hereParentLockCreates a sequential temporary node under the001-Node, and then check all temporary sequential nodes under ParentLock to determine whether the current node is the first node. If so, the lock is successful.

When the second client then comes to acquire the lock, a sequential temporary node is also created under the ParentLock node002-Node“And then decide if you’re number one, because that’s number one001-NodeSo this is going to be ahead of you001-NodeSign up for aWatcherFor listening in001-NodeNode, the client fails to lock and enters the waiting state.

When there’s a third client, it’s the same thing because it’s newly created003-NodeIf it’s not in the first place, it goes to the 0 in front of it02-NodeSign up for aWatcherAnd so on.

Notice that there is a chain structure, similar to the AQS structure in JUC.

Release the lock

There are two scenarios for releasing locks. One is that the locks are released after services are processed. Another is when the client disconnects from the server.

First, the client explicitly deletes the data node in the ZK when it is released normally. For example, Client 1 deletes 001-node when service processing is complete.

The disconnection between the client and the server may occur after the client successfully obtains the lock, and an exception occurs during execution, or the application crashes, or the network is abnormal. In this case, ZK will automatically delete the corresponding Node Node.

After the 001-node is deleted, Client 2 receives a notification immediately. In this case, Client 2 checks the Node list again to determine whether it is the first one. If it is the first one, Client 2 holds the lock, indicating that the lock is successfully added.

After Client 2 releases the lock, Client 3 does the same.

These are the basic principles and procedures for implementing distributed locks using ZooKeeper. The overall process can be simplified as shown below.

ZkClient: ZkKeeper-3.4.6. jar: zkclient-0.1.jar: ZkClient-0.1. jar: ZkClient-0.1. jar: ZkClient-0.1. jar: ZkClient-0.1. jar: ZkClient-0.1. jar

You can also use third-party packaged toolkits such as Curator, Menagerie, etc.

From the above we can see that using ZooKeeper to achieve distributed lock can basically meet all our requirements for distributed lock. It is important to pay attention to that sequential temporary nodes must be used instead of temporary nodes, because the use of temporary nodes may cause herd effect.

Based on the Redis

Using Redis for distributed locks is also an especially common option. And there are multiple ways to do it. So let’s go through them one by one.

First: SETNX+EXPIRE

This approach is probably the first thing most friends would think of, obtaining the lock through SETNX and then adding the timeout through the EXPIRE command. A big problem with this approach is that the two commands are not atomic operations and need to interact with Redis twice. The client may hang after the first command is executed, resulting in no timeout period, and the lock will remain there forever.

To solve this problem, a second solution was born.

Second: SETNX+VALUE

In this way, the VALUE is stored in the expiration time calculated by the client, which is stored in Redis once by using the SETNX command.

public boolean getLock(String key,Long expireTime){
    long expireTime = System.currentTimeMills()+expireTime;
    String value = String.valueOf(expireTime);
    // Lock succeeded
    if(jedis.setnx(key,value)==1) {return true;
    }
    // Get the lock value
    String currentValueStr = jedis.get(key);
    // If the expiration time is shorter than the system time, the system time has expired
    if(currentValueStr ! =null && Long.parseLong(currentValueStr) < System.currentTimeMillis()) {
        // The lock has expired. Get the expiration time of the previous lock and set the expiration time of the current lock
        String oldValueStr = jedis.getSet(key_resource_id, expiresStr);
        if(oldValueStr ! =null && oldValueStr.equals(currentValueStr)) {
            // In the case of multi-threaded concurrency, a thread can be locked only if its value is the same as the current value
            return true; }}// In other cases, lock failure is returned
    return false;
}
Copy the code

This approach resolves the non-atomic problem of the first scheme by assigning the timeout time to value. But there are problems with this approach:

When the lock expires, if multiple threads try to lock it at the same time, the lock may be successfully locked by all threads.
If multiple threads are locked successfully, they can be unlocked because there is no thread identifier in the lock.
The timeout period is calculated on the client. Clocks on different clients may be different. As a result, no lock has timed out on the locking client, but the lock has timed out on the other client.

Third: Use Lua scripts

Also to solve the atomicity problem in the first scenario, we can use Lua scripts to ensure the atomicity of SETNX+EXPIRE operations.

if redis.call('setnx',KEYS[1],ARGV[1= =])1 then
    redis.call('expire',KEYS[1],ARGV[2])
else
    return 0
end;
Copy the code

In Java code, locking is performed using jedis.eval().

public boolean getLock(String key,String value){
    String lua_scripts = "if redis.call('setnx',KEYS[1],ARGV[1]) == 1 "
        + "then redis.call('expire',KEYS[1],ARGV[2]) return 1 else return 0 end";   
    Object result = jedis.eval(lua_scripts,Collections.singletonList(key),Collections.singletonList(value));
    return result.equals(1L);
}
Copy the code

This approach completely avoids the problem of not setting the timeout period for interrupts after locking. There are also no clock inconsistencies and high concurrency cases where multiple threads are locked. But is there nothing wrong with this approach? The answer is no, see the chart below.

When service A is successfully locked and in the process of executing services, the lock expires. In this case, service A does not feel that the lock expires.

Service B then acquires the lock and succeeds;

Then, service A finishes processing, releases the lock, successfully releases it, and service B thinks it still has the lock and executes the code.

Is it all messed up? Think you locked it, but you didn’t.

Thought he unlocked successfully, in fact, the solution is someone else’s lock;

The problems of this scheme are mainly caused by two points: the lock expires and is released, and the business is not finished; Locks do not have unique identifiers.

SET NX PX EX + unique identifier

To solve the problem of deleting a lock by mistake, the client can generate a unique ID as a value in the lock when adding the lock, and determine the identity before deleting the lock. The locking logic is as follows:

public boolean getLock(String key,String uniId,Long expireTime){
    / / lock
    return jedis.set(key, uniId, "NX"."EX", expireTime) == 1;
}

/ / unlock
public boolean releaseLock(String key,String uniId){
    // Since get and del operations are not atomic, lua scripts are used
    String lua_script = "if redis.call('get',KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) +"else return 0  end;"; Object result = jedis.eval(lua_scripts,Collections.singletonList(key),Collections.singletonList(uniId)); return result.equals(1L); }Copy the code

In this way, the lock is deleted by mistake. However, the lock expires due to timeout, but the service is not finished.

Fifth: Redission framework

So how to deal with the problem that the lock expires and the business is not finished?

After the lock is successfully added, a daemon thread can be started and the timeout period of the lock can be extended until the service processing is complete. The lock can be released to prevent the lock from being released before the service processing is complete.

The Redission framework uses this mechanism to solve this problem.

When a thread acquires a lock, it has saved the data in Redis with the Lua script if the lock is successful.

Then, when the lock is successfully added, the Watch Dog starts to check whether the lock is still held every 10 seconds. If so, the lock timeout period will be extended.

If the lock fails to be acquired initially, it is acquired in a loop.

Is there any problem with the Redis lock? That’s a little small, my friend. There’s still a problem.

These schemes are only discussed in Redis single machine mode, if Redis is a cluster mode, there will be some problems, but the problem is not very outrageous, we will briefly explain.

In clustered mode, the Master node will synchronize data to the Salve node. If we successfully lock the Master node, the Master node will hang before the synchronization to the Salve node, and then another Salve node will be upgraded to the Master node. There is no lock data on this node;

When another client thread attempts to acquire the same lock, it succeeds. In our application, two threads will acquire the lock at the same time, and the lock will be unsafe.

In order to solve this problem, the author of Redis came up with an advanced distributed lock algorithm, called RedLock.

Sixth: RedLock+Redission

Redis Distributed Lock Redis Distributed Lock Redis Distributed Lock Redis Distributed Lock Redis Distributed Lock

The core principles of RedLock are as follows:

Select multiple Master nodes in the Redis cluster to ensure that these Master nodes do not break down at the same time.
In addition, each Master node is independent from each other and data is not synchronized.
Lock and unlock using the same method as Redis singleton.

So how does RedLock ensure safety in the event of a node failure?

Assume there are N Master nodes in the cluster. First, obtain the current timestamp.
The client obtains locks by using the same keys and values in sequence, and the acquisition time should be shorter than the lock timeout time. For example, if the timeout period is 5s, the lock acquisition time is 1s at most. If the timeout period exceeds 1s, the lock is abandoned and the next lock is acquired.
The client subtracts the timestamp of the first step after acquiring all available locks. The time difference must be less than the lock timeout time, and at least N/2 + 1 nodes must be successfully acquired, which indicates that the lock is successfully acquired. Otherwise, the lock fails to be acquired.
If the lock is successfully acquired, the lock is valid for the original timeout time minus the third time difference;
If the lock fails to be obtained, all nodes must be unlocked, regardless of whether the lock is successfully applied to the node to prevent any fish from escaping the net.

The Redssion library implements the RedLock solution, so if your Redis is clustered, take a look at how to use it.

IO /topics/dist…

With the above six Redis-based approaches, how do we choose? The following principles can be followed:

If Redis is deployed on a single machine, Redission framework is used. The Watch Dog mechanism can be enabled according to the scenario when Redis is locked. Unlock using Lua script atom deletion;
In cluster deployment, RedLock is recommended.

conclusion

This content is mainly to explain the principle and implementation scheme of distributed lock, is based on the database, the ZooKeeper and Redis three options, and that there are different choices of different features and its problems, and hope that through this article can let you to distributed lock has a more comprehensive understanding, in the actual development process to do “, if only one keeps the color theme in mind Don’t panic.

I am xiao Hei, a programmer in the Internet “casual”. Public number [Xiaohei said Java]

Original is not easy, need a little positive feedback, like + favorites + attention, three even go a wave ~ ❤