Redlock (Redis distributed lock) principle analysis

Redlock: Full name Redis Distributed Lock; That is, the distributed lock implemented by Redis;

Usage scenario: Multiple services can have only one request from a user at a time and within a period of time (to prevent concurrent attacks on key services).

 

IO /topics/dist…

This lock algorithm realizes the situation of multiple REDis instances. Compared with single REDis node, it has the advantage of preventing the whole service from stopping when a single node failure occurs. And in the multi-node lock design, and multi-node collapse and other accidents have their own unique design methods;

 

Concepts related to this blog or official document:

1.TTL: Time To Live; The expiration time or valid lifetime of only the redis key

The clock drifts. Refers to the difference between two computers (or between two processes) when the time flow rate between two computers is basically the same; If the computer is too far away, the clock drift value will be too large

 

The minimum requirements for ensuring the effectiveness and security of distributed locks are as follows:

1. The incompatible; Only one client can acquire the lock at any time

2. Release the deadlock. The lock can be released even if the service or partition that locks the resource crashes

3. Fault tolerance; As long as most redis nodes (more than half) are in use, the client can acquire and release locks

 

Redis master/slave cannot really implement Redlock:

For example, after clientA acquired the lock, the master Redis crashed in the process of copying data to slave Redis, resulting in no replication to slave Redis. Then, an upgraded primary Redis is elected from Redis, resulting in that the new primary Redis does not have the lock set by clientA. ClientB tries to obtain the lock and succeeds in obtaining the lock, resulting in the mutual exclusion failure.

The reason for this failure is that if you upgrade from redis to primary Redis immediately after TTL, or if you upgrade to primary Redis immediately but after TTL, you can execute the task of obtaining the lock, then the mutual exclusion effect can be successfully generated. Is this the way to implement redis master-slave Redlock?

 

The correct way to implement distributed locks in a single instance of Redis (atomicity is important) :

1. When setting a lock, use the set command, which includes setnx and expire functions, to perform atomic operations. Set a random value for the key, return True only if the key does not exist, and set the key expiration time (preferably in milliseconds).

SET key_name my_random_value NX PX 30000 # NX SET if not exist and return True, otherwise SET False PX 30000 indicates that the key expires after these millisecondsCopy the code

2. After obtaining a lock and completing related services, you need to delete the lock set by yourself (you can only delete the lock set by yourself, but not others).

Reason for deletion: To ensure high utilization efficiency of server resources, the lock is not deleted until it expires automatically.

Delete method: it is best to use Lua script to delete (Redis guarantees that no other operations are performed when executing this script to ensure the atomicity of the operation), the code is as follows; The logic is to get the key first, delete the key if it exists and the value is self-set. Otherwise skip;

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end
Copy the code

The python code is as follows:

redis.eval(f"""if redis.call("get",KEYS[1]) == ARGV[1] then return redis.call("del",KEYS[1]) else return 0 end""", 1, redis_key, random_val)
Copy the code

 

The algorithm flow chart is as follows:

                                          

 

Distributed lock algorithm (RedLock) implemented by multi-node Redis: Effectively prevent single point of failure

Suppose you have five completely independent Redis primary servers

1. Obtain the current timestamp

2. The client attempts to obtain the locks of all Redis services by using the same key and value in sequence. The acquisition time is much shorter than the lock expiration time, so as not to wait too long for the closed Redis services. And try to get the next redis instance.

For example, if the TTL is 5s, the maximum duration of lock acquisition is set to 1s, so if the lock cannot be acquired within a second, the lock is abandoned and the next lock is attempted

3. The client subtracts the time of the first step from the time after obtaining all available locks. The time difference must be less than the TTL time and at least three Redis instances successfully obtain locks

4. If the lock is successfully acquired, the actual time of the lock is TTL minus the time difference of the third step. For example, if TTL is 5s and it takes 2s to acquire all the locks, the actual lock duration is 3s(minus clock drift);

5. If the client fails to acquire the lock for some reason, it starts unlocking all instances of Redis. Less than three locks may have been obtained and must be released to prevent other clients from obtaining locks

The algorithm is shown as follows:

                              

 

 

Is RedLock an asynchronous algorithm?

You can view it as a synchronization algorithm; Because even though there is no synchronized clock between processes (multiple computers), the time flow rate of each process is roughly the same; Moreover, clock drift is small relative to TTL and can be ignored, so it can be regarded as a synchronization algorithm. (The algorithm takes into account clock drift, because if two computers are on opposite sides of the earth, the clock drift is very large.)

 

RedLock failed and retry

If the client cannot obtain the lock, try again after a random time. The set command should be sent concurrently to all instances of Redis at the same time. In addition, the client that has obtained the lock should release the lock in time after completing the task to save time.

 

RedLock releases the lock

Because the lock release will determine whether the value of the lock is set by itself, if so, will be deleted. Therefore, it is very simple to release the lock, just issue the lock release command to all instances, regardless of the success of the lock release;

 

Safety arguments:

1. Assume that the client obtains all instances. All instances contain the same key and TTL, but each instance cannot expire at the same time because the set command time is different. If T1 is used before the first set command and T2 is used after the last set command, the minimum time for the client to obtain the lock is TTL-(T2-T1)- clock drift.

2. If the value of N/2+ 1 is equal to half, the value of N/2+ 1 is equal to half. If the value of N/2+ 1 is equal to half, the value of N/2+ 1 is equal to half

3. If a client locks most instances for longer than or close to the lock expiration time, the lock is considered invalid and the Redis instance is unlocked (no service is performed). As long as more than half of the locks are successfully acquired within the TTL, the locks are valid. Otherwise it is invalid

 

The system has three characteristics of activity

1. Automatically release the lock

2. Release the lock automatically when the lock fails to be acquired (less than half) or the task is completed without waiting for the lock to expire automatically

3. The time before the client tries to obtain the lock (the interval between the first retry and the second retry) is longer than the time used to obtain the lock for the first time.

4. The number of retries for obtaining the lock must be limited

 

RedLock performance and crash recovery solutions

1. If Redis does not have the persistence function, after clientA successfully obtains the lock, all Redis restarts and clientB can obtain the lock again, which violates the exclusive mutual exclusion of the lock.

2. If the AOF permanency storage is enabled, things will be better. For example: When we restart Redis, because the redis expiration mechanism is in accordance with the Unix timestamp, so after the restart, it will expire in accordance with the specified time. However, the default mode of AOF synchronization to disk is once per second. If AOF is powered off within one second, data will be lost, and immediate restart will cause lock mutual exclusion failure. However, if the disk synchronization mode uses Always(every write command is synchronized to the disk), the performance deteriorates sharply. Therefore, there must be a trade-off between the complete validity and performance of the lock.

3. The effective solution to ensure the full effectiveness of lock and efficient performance and even if the power is off is to keep the default redis synchronization to disk mode per second, redis should wait for TTL time after stopping for any reason and then restart (scientific name: delayed restart); The disadvantage is that the service is equivalent to the suspended state during TTL time;

 

Conclusion:

1. The TTL duration must be longer than the time of normal service execution + time consumed by obtaining all REDis services + clock drift

2. The time consumed for obtaining all services of Redis should be much shorter than the TTL time, and the number of successful locks should be more than half of the total number :N/2+1

3. The time for attempting to acquire each Redis instance lock is much shorter than the TTL time

4. There must be a limit on the number of attempts to obtain all locks

5. After redis crashes (one or all), delay the TTL to restart Redis

6. In the realization of multiple Redis nodes, the single node distributed lock algorithm should be combined to achieve the joint implementation

 

The flow chart of Redis distributed lock algorithm found on the network is as follows (not recommended) :

Reasons not recommended:

1. According to the flow chart, it can be seen that the process is complicated

2. Use the older setnx method to obtain locks and expire methods (atomic operations cannot be guaranteed)

3. Redis single point, error compatibility cannot be achieved;

 

 

The following is the interpretation of the official website (English level is not enough, if there is any understanding problem, please point out) :

Source: www.cnblogs.com/rgcLOVEyaya…