Before we get started: this is a technical article, but yesterday was such a miserable day that I can't help but briefly recount it.

I had an 11 a.m. flight to Dali yesterday, but as I was about to head out at 9, I found that my door's combination lock was broken and would open without the code. With the driver urging me to leave, I called the landlord and customer service to get someone to repair it. That was problem number one.

Then the flight was delayed; we took off at 11:20 and didn't begin the descent until almost 4 p.m. The descent was brutally turbulent, and I honestly thought it was all over. It was also the first time I've gotten airsick on a plane; I almost threw up. That was problem number two.

At almost 4 p.m. the plane was finally about to land, wheels nearly on the ground, when it suddenly pulled up and climbed again. It turned out there was a force 8 gale in Dali and the captain didn't dare land… That was problem number three.

With no telling when we'd get the go-ahead to continue to Dali, there was nothing for it but to get off the plane and make the rest of the trip by transfer. I just managed to catch the last 7 p.m. connection; otherwise I'd have been waiting until after 9. After all that, I reached the hotel a little past 9. The hotel, at least, was good and didn't disappoint me.

By the end of the day I was completely drained. All right, that's enough of that. Here we go.

Tell me about distributed locks. Why do we need them?

For a standalone system, conventional locks such as synchronized or ReentrantLock do the job. For a distributed cluster, however, a simple local lock cannot solve the problem, so a distributed lock is needed. Usually we bring in a third-party component or service for this, such as a database, Redis, or Zookeeper.

In general, a distributed lock should be mutually exclusive, deadlock-free, reentrant, and so on.

Mutual exclusion means that only one client can hold the lock on the same resource at any time.

Deadlock-free means a lock timeout mechanism must be in place, so that the lock is still released even when something goes wrong and no deadlock occurs.

Reentrant means the same thread can acquire the lock multiple times.
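
To make reentrancy concrete, here is a minimal local-lock sketch using ReentrantLock; a distributed reentrant lock applies the same idea, tracking a hold count per client and thread (class and method names here are illustrative):

import java.util.concurrent.locks.ReentrantLock;

public class ReentrancyDemo {
    private static final ReentrantLock LOCK = new ReentrantLock();

    static void outer() {
        LOCK.lock();
        try {
            inner(); // the same thread acquires the same lock again
        } finally {
            LOCK.unlock();
        }
    }

    static void inner() {
        LOCK.lock(); // would deadlock here if the lock were not reentrant
        try {
            System.out.println("hold count: " + LOCK.getHoldCount()); // prints 2
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        outer();
    }
}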

How do you implement one with a database, Redis, or Zookeeper?

A database can implement it with either an optimistic lock or a pessimistic lock.

With optimistic locking, we usually keep a version number column in the table and make each update conditional on that version, which is efficient. Pessimistic locking goes through SELECT ... FOR UPDATE, but it brings plenty of problems: it takes row-level locks, so under high concurrency it can lead to deadlocks, client connection timeouts, and the like. It is generally not recommended.
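
As a sketch of the optimistic variant, assuming a hypothetical resource table with id and version columns, the update only wins if the version we read is still current:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticLockDemo {
    // Returns true if our conditional update won, false if another client got there first.
    static boolean tryUpdate(Connection conn, long id, long expectedVersion) throws SQLException {
        String sql = "UPDATE resource SET owner = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "client-a");
            ps.setLong(2, id);
            ps.setLong(3, expectedVersion);
            return ps.executeUpdate() == 1; // 0 rows means the version has moved on
        }
    }
}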

Redis implements the lock with the SET command. Prior to version 2.6.12, it took two separate commands, roughly like this (key and value names are placeholders):
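
# lock (two separate commands, not atomic)
SETNX lock_key unique_value
EXPIRE lock_key 10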

SETNX returns success only if the key does not exist; otherwise it returns failure.

However, this splits acquiring the lock and setting the expiration time into two steps, and they are not atomic. If the program crashes or the service goes down after the lock is acquired but before the expiration is set, the key never expires, we get a deadlock, and no other thread can ever obtain the lock.

Later versions of Redis extended the SET command to do both in one, so the atomicity problem disappears; alternatively, the two steps can be wrapped in a Lua script.

The format of the set command is as follows:
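
SET key value NX EX 10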

Key indicates the key of the distributed lock

Value indicates the value of the distributed lock. Generally, different values are set for different clients

NX means the set succeeds only if the key does not exist; if the key already exists, it fails

EX gives the expiration time in seconds and PX gives it in milliseconds; in the example above it is 10 seconds

Zookeeper implements the lock by creating ephemeral sequential nodes.

  1. When a resource needs to be locked, the client creates an ephemeral sequential node under the lock's parent node.
  2. Client A locks first: it checks whether the node it just created is the smallest one under the parent. If so, the lock is acquired; subsequent locking threads block and wait.
  3. Client B then tries to lock as well. Since client A already holds the lock, client B finds its node is not the smallest, so it registers a watcher on the node immediately before its own.
  4. When client A finishes, releasing the lock simply means deleting its node. That triggers the watch event and client B is notified; client B again checks whether it is now the smallest node, and if so, it acquires the lock.
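
In practice you rarely hand-roll this. As an illustration, here is a minimal sketch using Apache Curator's InterProcessMutex recipe, which implements exactly this ephemeral-sequential-node scheme (the connection string and lock path are placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockDemo {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/my-resource");
        lock.acquire();         // blocks until the lock is held
        try {
            // critical section
        } finally {
            lock.release();     // deletes our node, notifying the next waiter
        }
        client.close();
    }
}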

You said switching to the SET command solved the problem? Could there still be other problems?

Although SET solves the atomicity problem, two problems remain.

Lock timeout problem

For example, client A acquires the lock with a 3-second timeout, but after 3s its logic still hasn't finished executing and the lock is released. Client B then acquires the lock too, and two clients end up in the critical section at once.

This leads to concurrency problems; if the code is not idempotent, bugs follow.

Accidental lock deletion

This is a related problem. Client A acquires the lock with a 3-second timeout; after 3s its logic still hasn't finished and the lock expires, so client B acquires the lock. Client A then finishes and releases "its" lock, but what it actually deletes is client B's lock.

Do you have any good solutions to these two problems?

Lock timeout

There are two solutions to this.

  1. For the lock timeout problem, we can roughly estimate how long the business logic usually takes and set a sensibly larger timeout based on that estimate, which avoids the problem in most cases.
  2. Automatic renewal: another thread keeps extending the holding time of a lock that is about to expire, as in the sketch below.
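
Here is a minimal sketch of option 2 under some assumptions: a plain Jedis client, a lock stored as a simple key, and a fixed 30-second TTL. A scheduled task renews the TTL only while the value still matches ours, and stops once the lock is gone:

import java.util.Arrays;
import java.util.Collections;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class LockRenewer {
    // Renew only if the lock still holds our value; returns 0 otherwise.
    private static final String RENEW_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
      + "  return redis.call('pexpire', KEYS[1], ARGV[2]) "
      + "else return 0 end";

    public static void startRenewal(JedisPool pool, String key, String value) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (Jedis jedis = pool.getResource()) {
                Object r = jedis.eval(RENEW_SCRIPT,
                        Collections.singletonList(key),
                        Arrays.asList(value, "30000")); // extend by 30s each time
                if (Long.valueOf(0L).equals(r)) {
                    scheduler.shutdown(); // lock released or lost: stop renewing
                }
            }
        }, 10, 10, TimeUnit.SECONDS);     // renew every 10s, i.e. TTL / 3
    }
}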

Accidental lock deletion

Each client's lock must only be unlocked by that client itself. Generally we generate a random value to use with the SET command, and when unlocking we use a Lua script to check whether the current lock really belongs to us:

# lock
SET key random_value NX EX 10

# unlock (Lua script, executed atomically via EVAL)
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
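
Tied together from Java, assuming the Jedis client (the key name is illustrative, and the SetParams builder may differ slightly across Jedis versions):

import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleRedisLock {
    private static final String UNLOCK_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
      + "  return redis.call('del', KEYS[1]) "
      + "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            String key = "lock:order";                   // illustrative key
            String value = UUID.randomUUID().toString(); // unique per client
            // lock: SET key value NX EX 10
            String ok = jedis.set(key, value, SetParams.setParams().nx().ex(10));
            if ("OK".equals(ok)) {
                try {
                    // critical section
                } finally {
                    // unlock atomically, and only if we still own the lock
                    jedis.eval(UNLOCK_SCRIPT,
                            Collections.singletonList(key),
                            Collections.singletonList(value));
                }
            }
        }
    }
}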

Do you know the RedLock algorithm?

In a Redis master-slave architecture, replication is asynchronous. If the master locks successfully but crashes before the command is synchronized to the slave, the slave is promoted to master with no lock data on it, so another client can still lock successfully.

To solve this problem, the author of Redis proposed RedLock.

RedLock requires multiple (at least two, usually five) completely independent Redis master nodes, with no master/slave synchronization or data replication between them.

The main steps are as follows:

  1. Get the current Unix time.
  2. Try to acquire the lock on each node in sequence. If the total time spent acquiring is less than the lock's timeout and more than half of the nodes granted the lock, the lock is acquired. Each per-node attempt also has its own short timeout, so the client doesn't hang waiting for a response from a node that is down. For example, with 5 nodes and an expiration time of 100ms, if the first node takes 10ms, the second 20ms, and the third 30ms, the remaining validity of the lock is 100 - (10 + 20 + 30) = 40ms.
  3. If acquisition fails, release the lock on all nodes.
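
A rough pseudocode-style sketch of that flow, not a production implementation; RedisNode and its methods are hypothetical stand-ins for per-node SET ... NX PX and DEL:

import java.util.List;

public class RedLockSketch {
    static boolean lock(List<RedisNode> nodes, String key, String value, long ttlMs) {
        long start = System.currentTimeMillis();
        int acquired = 0;
        for (RedisNode node : nodes) {
            if (node.tryLock(key, value, ttlMs)) acquired++; // SET key value NX PX ttl
        }
        long elapsed = System.currentTimeMillis() - start;
        long validity = ttlMs - elapsed;            // e.g. 100 - (10 + 20 + 30) = 40ms
        boolean success = acquired >= nodes.size() / 2 + 1 && validity > 0;
        if (!success) {
            for (RedisNode node : nodes) node.unlock(key, value); // release everywhere
        }
        return success;
    }

    interface RedisNode { // hypothetical abstraction over one independent master
        boolean tryLock(String key, String value, long ttlMs);
        void unlock(String key, String value);
    }
}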

So what’s wrong with RedLock?

In fact, RedLock has a number of problems, so it is generally not recommended; use Redisson instead. RedLock's main problems are as follows.

Performance and resource cost

Multiple nodes must be locked and unlocked separately. Distributed locks are usually applied in high-concurrency scenarios, so the extra round trips take time and hurt performance to some extent. And because multiple independent nodes are required, it consumes more resources; simply put, it is expensive.

Node crash restart

For example, say there are five nodes, 1 through 5, and persistence is not enabled. Client A successfully locks nodes 1, 2, and 3, a majority. Then node 3 crashes and restarts, losing its lock information, and client B successfully locks nodes 3, 4, and 5, also a majority.

Now two clients, A and B, hold the same lock at the same time. The problem is back; how do we solve it?

  1. The approach suggested by the Redis author is delayed restart: after node 3 goes down, don't bring it back immediately; wait longer than the lock's validity time, i.e. restart it only after every lock it granted has expired. But this kind of manual intervention is hard to carry out in practice.
  2. The second option is to enable persistence, which costs performance. For example, with AOF flushing every second, at most one second of data is lost; avoiding any loss at all would hurt performance badly.

GC and network latency

Martin Kleppmann raised many objections to RedLock; I'll only give the GC/network example here. (There are quite a few more; I won't walk through them one by one. The post is at: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html.)

In the diagram from that post, client 1 acquires the lock and then hits a long GC pause, during which the lock expires; client 2 then acquires the lock. When client 1 resumes, both clients believe they hold the lock and write data, causing problems.

The clock jumps

Similarly, suppose a network partition leaves nodes 4 and 5 in a separate subnet, and the clock on node 3 jumps forward (whether changed manually or by time synchronization), so the lock on it expires early. Another client can then successfully lock nodes 3, 4, and 5, and the problem appears again.

Can you suggest a good solution?

As mentioned above, a better choice is Redisson, an open-source Java Redis client. It supports standalone, sentinel, and cluster deployments; it solves lock timeout (via renewal), fair and unfair locks, reentrancy, and other problems; and it also implements RedLock. It is one of the officially recommended clients.
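
Basic usage is pleasantly simple; a minimal sketch in single-server mode (the address is a placeholder):

import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonLockDemo {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RLock lock = redisson.getLock("myLock");
        // wait up to 5s to acquire; with no leaseTime given, the watchdog renews it
        if (lock.tryLock(5, TimeUnit.SECONDS)) {
            try {
                // critical section
            } finally {
                lock.unlock();
            }
        }
        redisson.shutdown();
    }
}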

How does Redisson work?

Locking and reentrancy

First of all, locking and unlocking are implemented with Lua scripts, which has the advantage of being compatible with older Redis versions while keeping each operation atomic.

In the script, KEYS[1] is the lock's key, ARGV[2] is the lock's value in the format uuid:threadId, and ARGV[1] is the expiration time.
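
Redisson's lock script looks roughly like this (reconstructed from the description above; details vary across versions):

if (redis.call('exists', KEYS[1]) == 0) then
    redis.call('hincrby', KEYS[1], ARGV[2], 1);
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then
    redis.call('hincrby', KEYS[1], ARGV[2], 1);
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
return redis.call('pttl', KEYS[1]);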

The locking logic is easy to follow: if the key does not exist, store the lock as a hash and set its expiration time; otherwise, if the same client is reentering, increment its count by 1.

The redis.call('hincrby', KEYS[1], ARGV[2], 1) call is what makes the lock reentrant: each re-entry adds 1 to the count stored in the hash.

unlock

  1. If the lock key does not exist at all, return directly.
  2. If the key exists but the field does not match, the lock belongs to someone else and must not be released; return nil.
  3. Otherwise release once: decrement the reentrant count by 1. If it is still greater than 0, refresh the expiration time; otherwise delete the lock. A simplified sketch follows.
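
A simplified sketch of that unlock script (Redisson's real script also publishes a message to wake blocked waiters; argument positions here are illustrative):

-- not our lock (key missing or field mismatch): return nil
if (redis.call('hexists', KEYS[1], ARGV[2]) == 0) then
    return nil;
end;
-- release once: reentrant count -1
local counter = redis.call('hincrby', KEYS[1], ARGV[2], -1);
if (counter > 0) then
    redis.call('pexpire', KEYS[1], ARGV[1]);  -- still held: refresh expiration
    return 0;
else
    redis.call('del', KEYS[1]);               -- fully released: delete the lock
    return 1;
end;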

watchdog

The watchdog solves the lock timeout problem. It is essentially a background task that automatically extends the lock's expiration time, by default every 10 seconds.

That interval is internalLockLeaseTime / 3, where internalLockLeaseTime defaults to 30 seconds, hence the 10 seconds.

Finally, how to choose different scenarios in actual production?

First, for low-concurrency and relatively simple scenarios, a database optimistic lock or unique primary key solves most problems.

Next, Redis-based distributed locks offer high performance, but implementing one yourself is troublesome: you have to deal with lock renewal, Lua scripting, reentrancy, and a series of other tricky problems.

In standalone mode, Redis is a single point of failure.

With a master-slave architecture or sentinel mode, failover can lose locks, which is what motivated RedLock. But RedLock has too many problems and is not recommended; use Redisson instead.

However, whichever approach you choose, Redis does not give strong consistency, so some extreme scenarios can still go wrong.

Zookeeper, by contrast, guarantees data consistency by design, so its reliability is higher and it avoids the failure modes Redis suffers from; implementing a lock on it is relatively simple, but its performance is somewhat worse than Redis.

But in practice, of course, we just use whatever the boss says. Not my call.