A distributed lock

In the submission service of course-se, to prevent the same user from submitting again within a specified window (5 seconds), the developer implemented a distributed lock based on Redis. This business scenario is often referred to as throttling.

When I first read this code, I wondered why Redis was needed at all, when a Map could maintain the mapping between each user and their remaining cooldown. On reflection, I had overlooked the fact that the server is deployed as a cluster, so requests from the same user can be forwarded to different Node processes; we therefore need a distributed lock service rather than an in-process one.

To provide a distributed lock service, we need to guarantee the following:

  • Mutual exclusion: only one client can hold the lock at a time.
  • Deadlock avoidance: the lock can eventually be acquired by some client, even if the client holding it crashes or otherwise fails to release it.

Single-node lock

The existing implementation in course-se assumes that the Redis service is a single node. Below we discuss, in turn, how it ensures mutual exclusion and avoids deadlocks.

In the implementation, the acquire method of the Lock class performs the mutual exclusion step. It calls the atomic GETSET command to determine whether the key has already been set: if the old value is null, the key was unset and the lock is obtained; otherwise the lock is occupied and cannot be acquired. When the client does obtain the lock, it sets an expiration time on the value, to avoid deadlocks caused by a client that never unlocks manually. The core code is as follows.

async acquire() {
    const key = this.genKey();
    // If the old value is null, the lock is free and setting the value acquires it.
    // Otherwise the lock is occupied and cannot be acquired.
    const val = await this.tryExec('getset', key, '1');
    // Set an expiration time on the value to avoid deadlocks when the
    // client holding the lock never unlocks manually.
    await this.tryExec('expire', key, this.expireAfter);
    return val === null;
}

genKey() {
    const { curUser: { user_id }, asgn: { asgn_id }, ce: { isExam } } = this.paramData;
    return `submit-asgn:${user_id}-${asgn_id}-${Number(isExam)}`;
}
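For context, here is a hypothetical sketch of how a submission handler might use this Lock class for throttling. The constructor signature, handler name, and error message are my assumptions, not code from course-se.

async function handleSubmit(paramData) {
    // Hypothetical usage: the Lock constructor taking paramData is an assumption.
    const lock = new Lock(paramData);
    const acquired = await lock.acquire();
    if (!acquired) {
        // The same user submitted again within the 5-second window.
        throw new Error('Please wait a few seconds before submitting again.');
    }
    // ... process the submission; the lock expires on its own after expireAfter.
}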

In fact, the above code is problematic. Consider the following scenario: there are two service nodes, node1 and node2. node1 receives a user request and successfully calls GETSET to obtain the lock, but exits unexpectedly before setting the expiration time (say 5 s). When node2 then receives a request from the same user, its acquire call fails to obtain the lock but still executes EXPIRE, renewing the lock left behind by node1. As long as the user keeps sending the same request within the expiration window, the TTL keeps being refreshed and the lock is never released, resulting in a deadlock.

To solve this problem, we replace the GETSET + EXPIRE combination with the single atomic command SET key value NX PX expireAfter, and determine whether the lock was obtained based on whether the command's result is empty.

async acquire() {
    const key = this.genKey();
    // If ret is not null, the lock was obtained; otherwise the lock is occupied.
    // The PX option makes the value expire after a period of time, avoiding deadlocks.
    const ret = await this.tryExec('set', key, '1', 'nx', 'px', this.expireAfter);
    return ret !== null;
}

The release function releases the lock, as follows.

async release() {
    return this.tryExec('del', this.genKey());
}

The release code above is also potentially problematic. Imagine the following scenario: node1 has obtained the lock (with a 5 s expiration), but for various reasons takes 6 s to finish its work, so at the 5th second the lock expires. If node2 acquires the lock at that point, then at the 6th second the release manually called by node1 will delete the lock now held by node2, allowing yet other nodes to acquire it.

The solution is to check whether the lock is held by the current client before releasing it. Concretely, when locking we set the value of the key to a unique identifier of the current client, and before releasing we check that the value still equals that identifier.

However, the existing Redis command set gives us no way to combine GET and DEL into one atomic operation, so we have to use a Lua script.

// Define a Lua script that performs GET and DEL as one atomic operation.
this.redis.defineCommand('lua_unlock', {
    numberOfKeys: 1,
    lua: `
        local remote_value = redis.call("get", KEYS[1])
        if (not remote_value) then
            return 0
        elseif (remote_value == ARGV[1]) then
            return redis.call("del", KEYS[1])
        else
            return -1
        end`
});
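With the script defined, acquire and release must agree on the unique identifier. Below is a minimal sketch of the two Lock methods, assuming the class keeps its ioredis client on this.redis and that a token field is free for us to use; both assumptions are mine, not course-se's actual code.

const { randomUUID } = require('crypto');

async acquire() {
    const key = this.genKey();
    // Store a per-client token instead of the constant '1', so that
    // release can verify ownership before deleting the key.
    this.token = randomUUID();
    const ret = await this.tryExec('set', key, this.token, 'nx', 'px', this.expireAfter);
    return ret !== null;
}

async release() {
    // lua_unlock (defined above) atomically compares the stored value
    // with this client's token and deletes the key only on a match.
    return this.redis.lua_unlock(this.genKey(), this.token);
}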

Multi-node lock

The single-node lock above breaks down if Redis is deployed in master-slave mode, because master-slave replication is asynchronous: when the master fails before the lock has been replicated, the slave has no record of it, which can lead to multiple clients holding the lock at once. Concretely:

  1. node1 acquires the lock on the Master node.
  2. The Master crashes before the lock created by node1 is replicated to the Slave.
  3. Through the Sentinel mechanism, the Slave is promoted to Master; at this point it has no record of the lock held by node1.
  4. node2 acquires the same lock, which node1 still holds, on the promoted node.

In this case, we can use the Redlock algorithm to build a more robust distributed lock. Suppose we have N Redis Master nodes (say N = 5) that are completely independent of one another, with no distributed coordination algorithm between them. The process by which a client acquires the lock is then as follows (a sketch of this flow appears after the list):

  1. The client gets the current time in milliseconds.
  2. The client requests the lock on each node in turn, using the same key and a random value. In this step, each per-node request uses a timeout that is much smaller than the total lock release time: for example, if the automatic release time is 10 seconds, the per-node request timeout might be in the range of 5-50 ms. This prevents the client from blocking too long on a failed Master node.
  3. The client computes the total time spent in step 2. The lock is acquired successfully only if the client obtained the lock on a majority of the Master nodes (three of the five in this case) and the total elapsed time does not exceed the lock release time.
  4. If the client successfully obtains the lock, its effective validity time is the initial lock release time minus the time consumed in step 2.
  5. If the client fails to acquire the lock, either because it obtained fewer than a majority of the locks in step 2 (at least N/2 + 1 are required) or because the total elapsed time exceeded the lock release time, it releases the lock on every Master node, including those where it believes the acquisition failed.
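For illustration, here is a minimal sketch of this flow, assuming masters is an array of independent ioredis clients; the helper names withTimeout and redlockAcquire are hypothetical, and production code should prefer a vetted library such as the one introduced below.

// Reject a promise that does not settle within ms milliseconds (step 2's per-node timeout).
const withTimeout = (promise, ms) => Promise.race([
    promise,
    new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), ms)),
]);

async function redlockAcquire(masters, key, token, ttlMs) {
    const start = Date.now();                         // step 1: current time in ms
    let acquired = 0;
    for (const client of masters) {
        try {
            // step 2: same key and random token on every node, short per-node timeout
            const ret = await withTimeout(client.set(key, token, 'px', ttlMs, 'nx'), 50);
            if (ret !== null) acquired += 1;
        } catch (err) {
            // A timeout or node failure simply counts as "not acquired" on that node.
        }
    }
    const elapsed = Date.now() - start;               // step 3: total time spent
    const quorum = Math.floor(masters.length / 2) + 1;
    if (acquired >= quorum && elapsed < ttlMs) {
        return { ok: true, validity: ttlMs - elapsed }; // step 4: remaining validity
    }
    // step 5: on failure, release on every node, even where acquisition "failed"
    await Promise.all(masters.map((c) => c.del(key).catch(() => {})));
    return { ok: false };
}

For brevity the sketch releases with a plain DEL; a real implementation should use the check-and-delete Lua script from the previous section, so that a node never deletes a lock it does not own.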

As you can see, the flow of the Redlock algorithm is quite clear. In actual production, we can use the redlock npm package to provide a robust distributed lock service.

npm install --save redlock

See the documentation for details on how to use Redlock.
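As a pointer, here is a minimal usage sketch based on the redlock package's classic API (version 4); the hostnames are placeholders, and the API differs across major versions, so check the documentation for the version you install.

const Redis = require('ioredis');
const Redlock = require('redlock');

// Independent Redis masters; the hosts below are placeholders.
const clients = [
    new Redis({ host: 'redis-1' }),
    new Redis({ host: 'redis-2' }),
    new Redis({ host: 'redis-3' }),
];

const redlock = new Redlock(clients, {
    retryCount: 3,   // how many times to retry before giving up
    retryDelay: 200, // milliseconds between retries
});

async function throttledSubmit(resource) {
    // Acquire the lock for 5000 ms, do the work, then release it.
    const lock = await redlock.lock(resource, 5000);
    try {
        // ... handle the submission ...
    } finally {
        await lock.unlock();
    }
}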