Distributed 3: Solutions in high concurrency scenarios and implementation of distributed locking

This is the 15th day of my participation in the More text Challenge. For more details, see more text Challenge

This is the third installment of the Distributed column on the skills that must be mastered in the evolution of individual applications to microservices. I want to summarize some of my learning on distributed technology and share some of my thoughts on distributed technology.

background

Often encountered in project concurrent scenarios, such as the most typical seconds kill scenario, suppose the back-end have a inventory list, when large number of requests to come over, will experience reading inventory + inventory reduction of two steps, such as inventory is 100, two threads first read inventory 100 respectively, and then each perform minus 1 and write back to the database, the database data is 99 instead of the expected 98, This situation, if not dealt with by technical means, can easily lead to oversold inventory.

Another example is done before I leave system, there is a table to store the back-end please employees disposable vacation days, ask for leave when there are multiple application being approval at the same time, they are going to read + minus the operation of the holiday vacation, if two threads are read data to the holiday, and then decrease respectively carry out the operation of the holiday, it is easy to lead to less holiday buckle.

I’ll summarize some of my own thoughts on typical scenarios in this development.

Idea 1: SQL optimization

To simulate this scenario, assume that the inventory table is called stock, the number of items is called num, and the new number calculated by the business code is new_num=num-1. The original SQL is executed as:

update stock set num=new_num where id = id=#{id};
Copy the code

We can improve SQL so that the database updates the data based on its current values, such as writing like this:

update stock set num=num- 1 where id = #{id};
Copy the code

This may seem to solve the problem, but if the amount of inventory is 1, and the two threads decrease by 1, the inventory will be deducted to -1, and the concurrency problem will not be solved completely.

Idea 2: Code lock

Since there is still a problem with both threads holding data at the same time, there is no way to get them to execute in sequence, and it is easy to think of adding a lock, such as Java’s synchronized keyword, before checking the inventory.

This is a good guarantee of serialization, ensuring that only one thread at a time performs the inventory and inventory section, but it also has many disadvantages:

It is impossible to achieve fine-grained control. If some people kill commodity A and some kill commodity B, they both need to use the seckill method, which does not conflict with each other but can only be executed serial, and access will become very slow.
Since @service in Spring is a singleton, you can write it this way. If you use Python’s Django framework, you can’t do similar control at the code level.
It only supports single point (single machine, server environment) and cannot be scaled out horizontally. At present, it is basically deployed in clusters and can only take effect on one machine.

Perhaps we need a third-party mechanism to implement such a lock.

Mysql-based pessimistic locking

Because all threads are accessing the same table, the database can be locked, the typical approach is pessimistic lock and optimistic lock two kinds.

Pessimistic locking uses select… For update syntax, for example:

select * from stock where id=#{id} for update;
Copy the code

Note: To use pessimistic locking, place the business code in the transaction.

Pessimistic locks lock data rows when acquiring data. Other transactions must wait for the end of the original transaction to acquire the lock.

Use the selec… MySQL InnoDB defaults to row-level Lock, so MySQL will only perform Row Lock if the primary key is “explicitly” specified. Otherwise, MySQL will perform Table Lock (Lock the entire data form).

The disadvantage of pessimistic locking is that because the table is locked, the concurrency is not high. If the concurrency is too high, the database will be overstressed and break down.

Idea 4: Optimistic locking based on MySQL

Compared with pessimistic lock, optimistic lock assumes that data generally does not cause conflict, so when data is submitted for update, it will formally check whether data conflict or not. If a conflict is found, it will return error information to the user and let the user decide what to do.

Optimistic locking can be implemented in two ways.

Use version numbers to implement optimistic locking

There are two ways to implement the version number, one is the data version mechanism, the other is the timestamp mechanism. The details are as follows.

It is implemented using the data Version recording mechanism, which is the most common implementation of optimistic locking. What is data version? This is to add a version identifier to the data, usually by adding a version field of a numeric type to the database table. When data is read, the value of the version field is read together, and the version value is increased by one each time the data is updated. When we submit the update, we judge the current version of the corresponding database table and compare it with the version value fetched for the first time. If the current version number of the database table is the same as the version value fetched for the first time, it will be updated; otherwise, it is regarded as outdated data.

If optimistic locking is used, the SQL change of the lock inventory is as follows:

1. Query the product information

select num,version from stock where id=#{id};
Copy the code

The amount deducted. new_num = num -1
Modify the database.

update stock set num=new_num, version=version+1 where id=#{id} and version=#{version};
Copy the code

If the version has changed, the update fails. You can retry or report an error according to the specific service.

Use conditional restrictions to implement optimistic locking

This applies to only update is to do data security check, suitable for inventory model, buckle share and rollback share, higher performance, very convenient and practical.

SQL is changed to:

update stock SET num = num - #{buyNum} where id = #{id} and num - #{buyNum} > = 0;
Copy the code

Note: Optimistic lock update operations, preferably with the primary key or unique index update, which is a row lock, otherwise the update will lock the table.

Optimistic locks have higher concurrency capability than pessimistic locks, so they are used more. However, as the concurrency increases, the pressure on the database is also greater, because if the update fails and the database is retried, the database may be constantly queried.

Idea 5: Distributed locking based on Redis

Since database lock implementations never get around concurrency issues, we turned to third-party middleware such as Redis.

By the way, if concurrency is not high, a pessimistic or optimistic lock based on MySQL can solve the problem, and it is relatively easy to maintain, no additional components need to be introduced, and system availability is high. By doing architecture, we are actually doing trade-offs.

Thread A sets the key (such as goods_1) to 1 in Redis. If thread B also finds the key (such as goods_1) to 1, thread B will not get the lock until thread A releases it.

We use the redis-py library to create a Lock class with two methods: acquire a Lock and release a Lock. We use the redis-py library to create a Lock class with two methods: acquire a Lock and release a Lock.

import redis
class Lock:
    def __init__(self, name) :
        self.redis_client = redis.Redis(host="127.0.0.1")
        self.name = name

    def acquire(self) :
        # If null or None indicates that the lock was acquired
        if not self.redis_client.get(self.name):
            self.redis_client.set(self.name, 1) 
            return True
        else:
            # Return False if you do not want to block the polling lock
            while True:
                import time
                time.sleep(1)
                if self.redis_client.get(self.name):
                    return True

    def release(self) :
        self.redis_client.delete(self.name)
Copy the code

Idea 6: Redis distributed lock issues and improvements

Ensure atomicity

If not self.redis_client.get(self.name), the distributed lock will be invalid. If not self.redis_client.get(self.name), the distributed lock will be invalid.

So we want to make sure that we have atomicity in the reading and setting of the values.

How to solve it? Redis naturally provides the setnx command to guarantee atomic operations, which sets the specified value for the specified key if it does not exist.

So we can update the Lock code above as follows:

import redis
class Lock:
    def __init__(self, name) :
        self.redis_client = redis.Redis(host="127.0.0.1")
        self.name = name

    def acquire(self) :
        if self.redis_client.setnx(self.name, 1) :Return 1 if no setting is present, 0 if no, this is atomic
            return True
        else:
            # Return False if you do not want to block the polling lock
            while True:
                import time
                time.sleep(1)
                if self.redis_client.setnx(self.name, 1) :Note that this area should also be changed
                    return True

    def release(self) :
        self.redis_client.delete(self.name)
Copy the code

Expiration time of the lock

Let’s think about it further. Suppose a thread gets the lock, but for some reason (such as the network can’t connect to Redis, code exceptions, power outages) does not release the lock. As a result, no other thread can get the lock and is in a state of waiting forever, resulting in a deadlock.

A better solution is to set the expiration time of the lock. We use the set function provided by the Redis-py library to add atomic operations and expiration parameters.

self.redis_client.set(self.name, self.id, nx=True, ex=15)
Copy the code

The renewal of lock

However, the expiration setting creates a new problem:

If the current thread does not finish executing after a certain period of time and the key expires, another thread will get the lock and continue executing, and the program will not be safe.

In addition, another thread may finish executing first and release the current lock, freeing the lock that belonged to the original thread.

Therefore, if the current thread does not finish executing, it should renew the lease at an appropriate time and reset the expiration time.

As a rule of thumb, if the expiration time is 15s, renew the lease 2/3 of the time, which is 10 seconds later to reset the expiration time to 15s. You can start a new thread in the program to complete the lease renewal process on a regular basis.

Delete lock security

As mentioned earlier, there is a risk that another thread will release the lock of the current thread.

We can set the value of the lock to uUID instead of 1, and determine whether it is the uUID of the current thread when it is released, so as to prevent it from being released by another thread.

So the distributed lock code now evolves to look like this:

import uuid
import redis
class Lock:
    def __init__(self, name, id=None) :
        self.id = uuid.uuid4()
        self.redis_client = redis.Redis(host="192.168.0.104")
        self.name = name

    def acquire(self) :
        # Set the expiration time to 15s, return 1 if there is no set, return 0 if there is no set, this is atomic operation
        if self.redis_client.set(self.name, self.id, nx=True, ex=15) :Start a thread and then periodically refresh the expiration. This is best done using lua scripts as well
            return True
        else:
            while True:
                import time
                time.sleep(1)
                if self.redis_client.set(self.name, self.id, nx=True, ex=15) :return True

    def release(self) :
        Select * from lock where id = id; select * from lock where id = id; select * from lock where id = id
        # This code is unsafe and atomizes get and DELETE operations, but Redis can use lua scripts to complete the operation and atomize the operation
        id = self.redis_client.get(self.name)
        if id == self.id:
            self.redis_client.delete(self.name)
        else:
            print("You cannot delete locks that are not your own.")

Copy the code

My above code is not complete, because to atomize get and DELETE operations, Redis does not have relevant operation instructions, which will still cause security problems, and the same will happen in the process of lease renewal.

However, Redis provides lua scripting support for such user scenarios, where users can send lua scripts to the server to perform custom actions and get the response data from the script. The Redis server executes the Lua script atomically in a single thread, ensuring that the lua script is processed without being interrupted by any other requests.

Summary: Redis distributed locks

So, let’s summarize the problems that distributed locking needs to solve:

Mutually exclusive: Only one client can own the lock at any time. Multiple clients cannot acquire the lock at the same time
Deadlock: The client that acquired the lock is down for some reason and cannot release the lock. Other clients cannot acquire the lock. A mechanism is needed to prevent this type of problem
- For example, the code is abnormal, so it can’t run to release
- Current server network problems, such as the release, suddenly can not access Redis
- Server power failure
Security: A lock can only be deleted by the user who holds the lock, but not by other users
Fault tolerance: When some nodes break down, clients can still acquire or release locks

Third party open source library for Python

I mentioned the need to use Lua scripts to solve security problems, but Lua scripts can be difficult to write, here I recommend a third-party open source library on the network, there is a very complete Python implementation of Redis distributed locking method.

Github: github.com/ionelmc/pyt…

See github.com/ionelmc/pyt… This file

I post a flowchart of the library here:

The overall implementation of the library is the same as we talked about earlier, but there are some differences in the details. After reading the source code, I summarize as follows:

To reduce pointless polling, a separate Redis list will be created, using RedisblpopDirective to move out and get the first element of the list. If the list has no elements, the list will be blocked until the wait times out or an eject element is found.
Renewal is a little bit harder to read, but basically, it starts a thread to renew the lease, assuming that the lock is 15s, and the renewal check interval is 2/3 of that 15s, which is 10s, and what renewal does is reset the timeout of the lock to 15s, and continue the check after 10s.
The library provides optional parameters that should be used with caution if the business code is blocked and the lease renewal may cause the code to wait indefinitely.
Using the__enter__.__exit__Magic method, client call supportwithThe implementation.

So far, Redis lock part of the distributed lock has been all introduced, its advantages are simple to use, while high performance, and Redis is also a very common component, do not need additional maintenance.

In addition, since stand-alone Redis also has the risk of failure, we will use Redis Cluster or Redis Sentinel in production to ensure high availability.

Idea 8: Distributed locks based on ZooKeeper

In addition to the Redis implementation, ZooKeeper can also implement distributed locking. However, if zooKeeper is not used in the project, I don’t see the need to introduce additional components, and I won’t go into much detail about this approach.

Thread 9: Redission in Java

For Java code, you can use the Redission package to implement distributed locking. Redis Distributed Locking

The example code is as follows:

// 1. Construct redisson to implement the necessary Config for distributed locking
Config config = new Config();
config.useSingleServer().setAddress("Redis: / / 127.0.0.1:5379").setPassword("123456").setDatabase(0);
// 2. Construct the RedissonClient
RedissonClient redissonClient = Redisson.create(config);
// 3. Acquire lock object instances (not guaranteed in thread order)
RLock rLock = redissonClient.getLock(lockKey);
try {
    /** * 4. Attempts to acquire a lock * waitTimeout Maximum wait time for attempts to acquire a lock. If the value exceeds this value, the lock is considered to have failed * leaseTime Time for holding the lock. Ensure that the business can be processed within the lock period) */
    boolean res = rLock.tryLock((long)waitTimeout, (long)leaseTime, TimeUnit.SECONDS);
    if (res) {
        // The lock is successfully acquired and the business is processed here}}catch (Exception e) {
    throw new RuntimeException("aquire lock fail");
}finally{
    // Unlock at the end, no matter what
    rLock.unlock();
}
Copy the code

Subsequent planning

Due to time, this article only focused on the distributed locking theory, did not involve the complete project code, later I will do a complete demo for this piece, to simulate the seckill scene, welcome to continue to pay attention, one key three connection ~