Let me first introduce two concepts

Safety property: the application never enters a bad or unexpected state (e.g. invalid call parameters, an array subscript out of bounds)

Liveness property: the application eventually makes progress, i.e. something good eventually happens (for example, a lock that is held is eventually released so that it can be acquired again)

The three properties that a valid distributed lock must satisfy

  1. Safety property: mutual exclusion, i.e. only one client can hold the lock at any given time
  2. Liveness property A: deadlock freedom, i.e. it must always be possible to eventually acquire the lock, even if the client that held it crashes or gets partitioned away
  3. Liveness property B: fault tolerance, i.e. clients can acquire and release locks as long as the majority of the Redis nodes are up

Why a failover-based implementation is not enough

The simplest solution is to create a key on a single instance and give it an expiration time (TTL), so that the lock is eventually released even if the client never releases it; the client deletes the key when it releases the lock. This looks fine, but there is a problem: what happens when the primary Redis node fails? The obvious answer is to add a slave node and fail over to it when the master becomes unavailable. Unfortunately this does not work, because Redis replication is asynchronous, so mutual exclusion is no longer guaranteed.

There is an obvious race condition under this scheme:

  1. Client A obtains the lock on the master node
  2. The master crashes before the write of the key has been replicated to the slave
  3. The slave is promoted to master
  4. Client B acquires the same lock that client A still holds: a SAFETY VIOLATION of the mutual exclusion property described above

Despite the drawback described above, this scheme can still be used in some special scenarios: if, when a failure occurs, several clients holding the lock at the same time has no significant impact on the system or the business logic, this replication-based solution is acceptable. Otherwise it is better to use the Redlock algorithm covered later in this article.

Correct implementation with a single instance

Before addressing the single point of failure of the single-instance setup, let's look at how to acquire the lock correctly in this simple case: SET resource_name my_random_value NX PX 30000

  • The key resource_name is created only if it does not already exist (NX option), with an expiration time of 30000 milliseconds (PX option)
  • The value is a random value that must be unique across all clients and all lock requests. Its purpose is to make lock release safe: a client must only be able to delete its own lock, so that client A can never remove a lock that is now held by client B. A simple Lua script tells Redis to delete the key only if it exists and its value is equal to the value held by the releasing client (a client-side sketch follows the script)
if redis.call("get", KEYS[1]) == ARGV[1] then
	return redis.call("del", KEYS[1])
else
	return 0
end
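
For illustration, here is a minimal client-side sketch of this single-instance scheme using the redis-py library. The key name, TTL, connection parameters, and the way the random token is generated are assumptions made for the example, not part of the scheme itself.

import uuid
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed single Redis instance

RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire(key, ttl_ms=30000):
    # A random token, unique per client and per lock request
    token = str(uuid.uuid4())
    # SET key token NX PX ttl_ms: create the key only if it does not already exist
    if r.set(key, token, nx=True, px=ttl_ms):
        return token
    return None

def release(key, token):
    # Delete the key only if it still holds our token, so we never
    # remove a lock that has meanwhile been acquired by another client
    return r.eval(RELEASE_SCRIPT, 1, key, token)

# Usage (illustrative):
#   token = acquire("resource_name")
#   if token: ...do the work..., then release("resource_name", token)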

It is very important that a client cannot delete a lock created by another client. Consider this scenario: client A acquires the lock and then blocks on some long operation for longer than the lock's validity time (the key's expiration time), so the key expires and is removed; client B then acquires the lock by creating the key again. If client A simply deleted the key when it finished, it would delete a lock that now belongs to client B. The script above guarantees that a client can only release the lock it is still holding. It is also essential that the random value is effectively unique across all clients and all requests.

Next we extend the above algorithm to the distributed case

Redlock algorithm

In the distributed version of the algorithm we assume N Redis nodes. These nodes are completely independent masters, with no replication or any other coordination scheme between them. Assume N=5, that is, five Redis masters deployed on different machines (or virtual machines).

To acquire the lock, the client performs the following steps (a sketch of the whole procedure follows the list):

  1. Get the current time in milliseconds
  2. Try to acquire the lock sequentially on all N nodes, using the same key name and random value on every node. Each per-node request uses a timeout that is small compared with the total lock expiration time: for example, if the lock expiration time is 10 s, the per-node request timeout might be in the range of 5 to 50 ms. This prevents the client from blocking for a long time on a node that is down; if an instance is not available, we should move on to the next instance as soon as possible
  3. The client computes how long it took to acquire the lock (the current time minus the time recorded in step 1). The lock is considered acquired if and only if the client obtained it on a majority of the nodes (at least 3 out of 5) and the total time spent is less than the lock validity time
  4. If the lock was acquired, its validity time is the initial validity time minus the time spent acquiring it
  5. If the lock acquisition failed for any reason (either a majority of nodes could not be locked, or the remaining validity time is not positive), the client tries to release the lock on all nodes (even the nodes it believed it had not locked)
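
A minimal sketch of these five steps, again in Python with redis-py; the node addresses, the per-node timeout, and the clock drift allowance are illustrative assumptions, and error handling is reduced to the bare minimum.

import time
import uuid
import redis

# Five independent masters; addresses and ports are placeholders for the example
NODES = [redis.Redis(host="127.0.0.1", port=p, socket_timeout=0.05)  # 50 ms per-node timeout
         for p in (6379, 6380, 6381, 6382, 6383)]
QUORUM = len(NODES) // 2 + 1        # majority: 3 out of 5
CLOCK_DRIFT_FACTOR = 0.01           # assumed allowance for clock drift

RELEASE_SCRIPT = ('if redis.call("get", KEYS[1]) == ARGV[1] then '
                  'return redis.call("del", KEYS[1]) else return 0 end')

def unlock_all(key, token):
    # Step 5: release on every node, even those where acquisition failed
    for node in NODES:
        try:
            node.eval(RELEASE_SCRIPT, 1, key, token)
        except redis.RedisError:
            pass

def lock(key, ttl_ms=10000):
    token = str(uuid.uuid4())
    start = time.monotonic()                        # step 1: current time
    acquired = 0
    for node in NODES:                              # step 2: same key/value on each node
        try:
            if node.set(key, token, nx=True, px=ttl_ms):
                acquired += 1
        except redis.RedisError:
            pass                                    # unreachable node: move on quickly
    elapsed_ms = (time.monotonic() - start) * 1000  # step 3: time spent acquiring
    drift_ms = ttl_ms * CLOCK_DRIFT_FACTOR
    validity_ms = ttl_ms - elapsed_ms - drift_ms    # step 4: remaining validity
    if acquired >= QUORUM and validity_ms > 0:
        return token, validity_ms
    unlock_all(key, token)                          # step 5: failed, release everywhere
    return None, 0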

Is this algorithm asynchronous

The algorithm relies on the assumption that, even without any clock synchronization between processes, each process's local clock advances at roughly the same rate, and that any error is negligible compared with the automatic lock release time. This assumption is very close to how real-world computers behave: every computer has a local clock, and we usually assume the drift between different machines is small. At this point we need to refine the mutual exclusion rule: the client must finish all of its work within the lock validity time minus the clock drift allowance. For more information, read the interesting paper Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.

Retry on failure

When a client fails to acquire the lock, it should retry only after a random delay, to desynchronize clients that would otherwise try to acquire the lock for the same resource at the same time; such a tie can lead to a split-brain situation in which nobody gets the lock. Also, the faster a client tries to acquire the majority of the nodes, the smaller the window in which a split brain can occur, so ideally the client should send its lock requests to all nodes in parallel. It is also worth stressing that a client that fails to acquire the majority must release the partially acquired lock on all nodes in parallel as soon as possible, so that other clients do not have to wait for the keys to expire before the lock can be acquired again. (If a network partition prevents the client from reaching the Redis nodes, that availability penalty has to be paid anyway until the keys expire automatically.)
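
As a sketch of the retry-with-random-delay idea, building on the lock function from the previous example (the delay bounds and retry count are arbitrary choices for illustration):

import random
import time

def lock_with_retry(key, ttl_ms=10000, attempts=3):
    for _ in range(attempts):
        token, validity_ms = lock(key, ttl_ms)   # the Redlock sketch above
        if token is not None:
            return token, validity_ms
        # random delay so competing clients do not retry in lockstep
        time.sleep(random.uniform(0.05, 0.2))
    return None, 0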

Release the lock

Releasing the lock is simple: the client just runs the release script on all the nodes, regardless of whether it believes it previously acquired the lock on a given node.

Safety argument

Is the algorithm safe? We can examine what happens in a few different scenarios. Assume a client managed to acquire the lock on a majority of the nodes. Each of those nodes now contains a key with the same time to live. However, the key was set at a different time on each node, so the keys will also expire at different times. In the worst case, the first key was set at time T1 (sampled just before the first request was sent) and the last key was set at time T2 (sampled just after the last server replied). We can then be sure that the key that expires first will exist for at least MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT. All the other keys expire later, so we are sure that at least until that point in time all the keys exist simultaneously. During the time in which the majority of nodes hold the key, no other client can grab the lock, since N/2+1 SET NX operations cannot succeed while N/2+1 keys already exist. So once a lock has been acquired, it cannot be acquired again at the same time (which would violate the safety property).
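
As a worked example with made-up numbers: if TTL = 10000 ms, the last key was set 200 ms after the first one (T2 - T1 = 200 ms), and we allow 20 ms for clock drift, then MIN_VALIDITY = 10000 - 200 - 20 = 9780 ms, i.e. all the keys are guaranteed to coexist for at least 9780 ms.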

We also need to make sure that multiple clients cannot acquire the lock at the same time. If the time a client spent acquiring the majority of the nodes is close to, or greater than, the maximum lock validity time, the client considers the lock invalid and unlocks all the nodes, so we only need to consider the case where the majority was acquired in less than the validity time. In that case, by the argument above, no other client can re-acquire the lock during the MIN_VALIDITY window; multiple clients could only have locked the majority of the nodes in a time greater than the TTL, which would make the lock invalid.

Liveness argument

System availability is based on three characteristics:

  1. Automatic lock release (based on key expiration): eventually the keys expire, so the lock always becomes available to be acquired again
  2. In practice, clients release the lock voluntarily when their work is done, so we normally do not have to wait for a key to expire before the lock can be acquired again
  3. When a client retries the acquisition, it waits for a time that is comparatively longer than the time needed to acquire the majority of the nodes, which reduces the probability of several clients contending for the lock at the same time and producing a split-brain state

However, we lose availability for the TTL duration whenever a network partition occurs, so if partitions happen back to back, the unavailability can persist indefinitely. This happens every time a client acquires a lock and gets partitioned away before it can release it. Basically, if the network partition is continuous, the system remains unavailable for just as long.

Performance, recovery after failure, and fsync

When Redis is used as a distributed lock service, many users require not only low latency but also high throughput (the number of lock/unlock operations per second). To meet this requirement the client can talk to the N servers in parallel using multiplexing, or, more simply, put the sockets in non-blocking mode, send all the commands at once, and then read all the replies at once, assuming that the round-trip time between the client and the different Redis nodes is similar.
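
A sketch of the parallel variant, reusing the NODES list from the Redlock sketch above and a thread pool to issue the SET to all nodes at once (purely illustrative; multiplexed non-blocking sockets would achieve the same effect without threads):

import redis
from concurrent.futures import ThreadPoolExecutor

def try_set(node, key, token, ttl_ms):
    try:
        return bool(node.set(key, token, nx=True, px=ttl_ms))
    except redis.RedisError:
        return False

def acquire_parallel(key, token, ttl_ms=10000):
    # Send the SET to all N nodes concurrently instead of one after another
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        results = pool.map(lambda node: try_set(node, key, token, ttl_ms), NODES)
    return sum(results)   # number of nodes on which the lock was obtained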

To support recovery after a crash we also have to consider persistence. Suppose a client successfully acquired the lock on 3 of the 5 nodes, and one of those 3 nodes then restarts without persistence: at that moment there are again 3 nodes on which the same key can be set, so another client can acquire the lock as well, violating the mutual exclusion safety property.

If AOF persistence is enabled, the situation is much better. For example, we can upgrade a server by sending it SHUTDOWN and restarting it: because Redis expirations are implemented semantically, time virtually continues to elapse while the server is off, so the keys still expire as required. But what about an unclean shutdown? If Redis is configured, as by default, to fsync the AOF to disk once per second, it is possible that some keys are lost after a restart. In theory, if we want the lock to be safe across any kind of restart, we must set fsync=always, but that sacrifices so much performance that it puts us in the same league as traditional CP-style distributed lock systems.

Things are not as bad as they look at first glance, though. Basically, the algorithm remains safe as long as a node that crashes and restarts no longer participates in any lock that is currently active, i.e. every lock active at the time of the restart was obtained by locking nodes other than the one that rejoined. To guarantee this, it is enough to make a crashed instance unavailable for at least the maximum lock validity time, so that all the old lock keys on that node have expired by the time it comes back. Such a delayed restart basically solves the safety problem even without persistence, but note that it can reduce availability: if a majority of nodes crash, the whole system becomes unable to grant locks for the TTL duration.
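
For reference, AOF persistence and the fsync policy discussed above are controlled by two redis.conf directives (values shown for illustration):

# Enable AOF persistence
appendonly yes
# Default fsync policy: about once per second (may lose up to the last second of writes)
appendfsync everysec
# appendfsync always  (safest for locks, but at a large performance cost)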

Make the algorithm more reliable: extended locks

If the work performed by the client consists of small steps, the lock can be created with a relatively small TTL and then extended (renewed) when it is about to expire, refreshing its validity time. Technically this does not change the nature of the algorithm, but the number of extension attempts should be bounded, otherwise one of the liveness properties (availability) is violated.
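
One possible way to implement the extension, in the same style as the release script earlier: refresh the key's TTL only if it still holds the caller's token, using PEXPIRE. This is a sketch; node is a redis-py connection like those in the earlier examples, and in the Redlock setting the extension would be sent to all the nodes and counted against the same majority rule as acquisition.

EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""

def extend(node, key, token, ttl_ms):
    # Refresh the TTL only if the key still holds our token
    return node.eval(EXTEND_SCRIPT, 1, key, token, ttl_ms)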