I took the day off work to finish this article quickly. Could I be any sweeter?

Read it over a cup of tea. And please don't just read for free: how about a like, a share, and a follow?

Introduction

Why learn distributed locking?

The simplest reason is that programmers hired through open recruitment have to face interviews. Look at how many public accounts keep coming back to the topic of distributed locks, and you can see how important it is. In exam terms this is a free-points question, and it would be a pity to miss it.

Will fresh graduates be asked about it too? Not necessarily, but if you can answer it, the interviewer will definitely think that much more highly of you.

Third, distributed locking is a must-have skill on any system of slightly larger scale. Let's take a look.

Problems that distributed locks solve

A distributed lock is an important primitive in a distributed environment. It lets different processes operate on shared resources in a mutually exclusive manner. It is commonly introduced into large projects as an SDK and addresses two main types of problems:

  • Improving efficiency: the lock is used to avoid unnecessary repeated processing, for example to prevent an idempotent task from being picked up by several executors at once. Here the correctness requirement on the lock is not high;
  • Ensuring correctness: the lock is used to prevent race conditions from causing logic errors, for example when distributed locks are used directly to implement deduplication and idempotency mechanisms. Here a lock failure has serious consequences, so the correctness requirement on the lock is high.

Locks in Java:

Locks are a very common tool in development, and you are surely familiar with them: pessimistic locks, optimistic locks, exclusive locks, fair locks, unfair locks, and many more concepts. If you do not understand Java locks, you can refer to this article on Java locks (https://tech.meituan.com/2018/11/15/java-lock.html). The article is very thorough, but a beginner who knows the lock concepts may, for lack of practical work experience, still not know the real scenarios in which locks are used. In Java, thread safety can be achieved with the volatile and synchronized keywords and with ReentrantLock.

In a distributed system, Java's locks cannot lock code running on two machines at the same time, so a distributed lock is required. Skilled use of distributed locks is also a necessary skill for development at big tech companies.

1. Interviewer: Have you ever come across a situation where you need to use distributed locks?

Problem analysis:

This question mainly serves as an introduction. First understand which scenarios need a distributed lock and which problems it solves; on that basis it is easier to understand how distributed locks are implemented.

A distributed lock is generally needed when the following conditions hold:

  1. The system is a distributed system, so Java's in-process locks can no longer provide mutual exclusion.
  2. Shared resources are manipulated, such as a user's unique record in the database.
  3. Access is concurrent, meaning multiple processes operate on the shared resource at the same time.

Me:

Let me give an example of a distributed lock scenario from one of my projects:

Spending points is a feature of many systems, such as credit cards and e-commerce websites, where points can be exchanged for gifts. The "spend points" operation is a typical case that requires a lock.

Event A: Taking the exchange of points for a gift as an example, the complete point-spending process can be divided into 3 steps:

A1: The user selects the product, initiates the exchange, and submits the order.

A2: The system reads the user's remaining points and judges whether they are sufficient.

A3: The system deducts the user's points.

Event B: The system issues points to the user, also in 3 steps:

B1: Calculate the points earned by the user that day.

B2: Read the user's current points.

B3: Add the earned points to the current points.

So the question is: what happens if spending points and issuing points happen at the same time?

Assume that while the user is spending points, an offline task is calculating points and issuing them to the user (for example, based on the amount the user spent that day), so the two things are going on at the same time. The following logic is a little convoluted, so please be patient.

User U has 1000 points (the record holding the user's points can be understood as the shared resource), and this exchange will consume 999 points.

Without locking: when event A reaches step A2 it reads 1000 points, judges that the remaining points are enough for this exchange, and then performs A3 to deduct the points (1000 - 999 = 1). The correct result is that the user has 1 point left after the exchange (101 once event B's 100 points are added). However, event B is executing at the same time, issuing 100 points to user U. With two threads running concurrently and no lock, the interleaving A2 -> B2 -> A3 -> B3 is possible: before A3 completes (deducting the points, 1000 - 999), the thread of event B reads user U's total as 1000. B3 then writes 1000 + 100 = 1100 and overwrites A's deduction, so user U ends up with 1100 points and has exchanged a 999-point gift for nothing, which is clearly not the expected result.
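The interleaving above can be replayed deterministically. The following is only a minimal sketch, not code from the project: the class name LostUpdateDemo and the field storedPoints are made up for illustration, and the two threads are simulated by running their steps in the order A2 -> B2 -> A3 -> B3 on a single thread.

```java
class LostUpdateDemo {
    // Shared resource: user U's point balance (stands in for the database record).
    static int storedPoints;

    // Replays the interleaving A2 -> B2 -> A3 -> B3 without any lock.
    static int runInterleaving() {
        storedPoints = 1000;
        int cost = 999;    // points consumed by event A
        int reward = 100;  // points issued by event B

        int readByA = storedPoints;       // A2: event A reads 1000
        if (readByA < cost) {
            throw new IllegalStateException("not enough points");
        }
        int readByB = storedPoints;       // B2: event B also reads 1000
        storedPoints = readByA - cost;    // A3: event A writes 1
        storedPoints = readByB + reward;  // B3: event B writes 1100 and overwrites A3
        return storedPoints;
    }

    public static void main(String[] args) {
        // The deduction made in A3 is lost.
        System.out.println("final balance: " + runInterleaving()); // prints "final balance: 1100"
    }
}
```

With a lock around each read-then-write pair, B2 could not run between A2 and A3, and the balance would end at 101.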

Some people may ask: how could it be such a coincidence that a user's points are modified by two operations at the same moment? CPUs are fast, but as long as there are enough users and concurrency is high enough, Murphy's law kicks in sooner or later, and hitting this bug is only a matter of time. Fraudsters may even exploit the bug deliberately to farm points. To eliminate this hidden danger, a developer must understand how to use locks.

(Writing code is a serious business!)

Java itself provides two built-in lock implementations: synchronized, which is implemented by the JVM, and the Lock implementations provided by the JDK. Many atomic operation classes are also thread-safe. These are enough to implement locking when your application is standalone or runs in a single process.
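For the single-process case, a minimal sketch with ReentrantLock could look like the following. PointsAccount and SingleProcessLockDemo are hypothetical names, and an in-memory int stands in for the user's points record; the point is only that each read-then-write sequence runs while holding the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

class PointsAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private int points;

    PointsAccount(int initialPoints) {
        this.points = initialPoints;
    }

    // Event A: check the balance and deduct while holding the lock.
    boolean spend(int cost) {
        lock.lock();
        try {
            if (points < cost) {
                return false;
            }
            points -= cost;
            return true;
        } finally {
            lock.unlock(); // always release, even if an exception is thrown
        }
    }

    // Event B: read the balance and add to it while holding the lock.
    void add(int reward) {
        lock.lock();
        try {
            points += reward;
        } finally {
            lock.unlock();
        }
    }

    int balance() {
        lock.lock();
        try {
            return points;
        } finally {
            lock.unlock();
        }
    }
}

class SingleProcessLockDemo {
    // Runs event A and event B on two threads of the same JVM.
    static int runConcurrently() {
        PointsAccount account = new PointsAccount(1000);
        Thread a = new Thread(() -> account.spend(999));
        Thread b = new Thread(() -> account.add(100));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account.balance();
    }

    public static void main(String[] args) {
        // Whichever thread gets the lock first, the result is 1000 - 999 + 100.
        System.out.println(runConcurrently()); // prints 101
    }
}
```

This only works because both threads share the same lock object in one JVM, which is exactly what is no longer true once the code runs on several machines.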

However, the systems of Internet companies are almost all distributed. Java's synchronized or Lock can then no longer meet the locking requirements of a distributed environment, because the code is deployed on multiple machines. Distributed locks emerged to solve this problem. Multiple physical machines cannot share memory, so the common solution is to coordinate through a shared storage layer outside the application processes, and the practical implementations are distributed locks based on Redis or ZooKeeper.

(I can't analyze it in any more detail. Can the interviewer still be dissatisfied?)

2. Interviewer: What are the common solutions for distributed locks?

Me: There are 3 common approaches!

  1. Distributed locks based on Redis, which many large companies build their own extensions on.
  2. Based on ZooKeeper.
  3. Based on a database, such as MySQL.

3. Interviewer: Talk about the Redis distributed lock implementation method

Problem analysis:

At present, there are two main ways to implement distributed locks: 1. based on a Redis cluster; 2. based on a ZooKeeper cluster.

Master these two first and you will basically have no problem in interviews.

Overall there are roughly three kinds of locks: the DB distributed lock, the Redis distributed lock, and the ZooKeeper distributed lock.

I:

1. Distributed lock based on Redis

Method 1: Use the setnx command to lock

public static void wrongGetLock1(Jedis jedis, String lockKey, String requestId, int expireTime) {
    // Step 1: Lock
    Long result = jedis.setnx(lockKey, requestId);
    if (result == 1) {
        // Step 2: Set the expiration time
        jedis.expire(lockKey, expireTime);
    }
}

Code explanation:

  • The setnx command means "set if not exists". If lockKey does not exist, the key is stored in Redis. A result of 1 means the key was set successfully; any other result means the key already existed and the set failed.
  • expire() sets the expiration time to prevent deadlocks. If a lock were never deleted after being set, it would exist forever and cause a deadlock.

(At this point, I need to stress a "but" to the interviewer.)

Think about it: where is the flaw in my method above? I continue explaining to the interviewer…

Locking and setting the expiration time are two separate steps, so setnx and expire do not form one atomic operation. If the program fails after the first step, jedis.setnx(lockKey, requestId), has succeeded, the second step, jedis.expire(lockKey, expireTime), is never executed. The lock then has no expiration time, and a deadlock may occur. How can this problem be improved?

Improvement:

import redis.clients.jedis.Jedis;

public class RedisLockDemo {

    private static final String LOCK_SUCCESS = "OK";
    private static final String SET_IF_NOT_EXIST = "NX";
    private static final String SET_WITH_EXPIRE_TIME = "PX";

    /**
     * Get the distributed lock.
     *
     * @param jedis      Redis client
     * @param lockKey    lock key
     * @param requestId  identifier of the requesting client
     * @param expireTime expiration time in milliseconds (PX)
     * @return whether the lock was acquired
     */
    public static boolean getLock(Jedis jedis, String lockKey, String requestId, int expireTime) {
        // Two steps in one: a single SET command takes the lock (NX) and sets the expiration time (PX).
        // This five-argument set() is the Jedis 2.x form; it replies "OK" on success and null otherwise.
        String result = jedis.set(lockKey, requestId, SET_IF_NOT_EXIST, SET_WITH_EXPIRE_TIME, expireTime);
        return LOCK_SUCCESS.equals(result);
    }
}

Code explanation:

Locking and setting the expiration time are merged into a single command in one line of code, which makes the operation atomic.

(Before the interviewer can even ask a follow-up, he is already satisfied.)

Interviewer: What about unlocking?

Me:

Releasing a lock means deleting the key.

Use the del command to unlock:

public static void unLock(Jedis jedis, String lockKey, String requestId) {
    // Step 1: Use requestId to check whether the client unlocking is the client that locked
    if (requestId.equals(jedis.get(lockKey))) {
        // Step 2: If the lock has changed hands by this point, another client's lock is deleted by mistake
        jedis.del(lockKey);
    }
}

Code explanation: the get check and jedis.del(lockKey) are two separate commands, not one atomic operation. In theory, the lock may expire right after the if check in step 1 passes and then be acquired by another thread. Executing jedis.del(lockKey) at that moment releases someone else's lock, which is wrong. Of course, this is a very extreme case. If there are no other business operations between step 1 and step 2 of the unLock method, putting the code above online probably won't cause a real problem, because concurrency is usually not high enough to expose the defect.

But writing code is rigorous work, and if it can be made perfect, it should be. The following improvement addresses the problem in the code above.

Code improvements:

import java.util.Collections;

import redis.clients.jedis.Jedis;

public class RedisTool {

    private static final Long RELEASE_SUCCESS = 1L;

    /**
     * Release the distributed lock.
     *
     * @param jedis     Redis client
     * @param lockKey   lock key
     * @param requestId identifier of the requesting client
     * @return whether the release succeeded
     */
    public static boolean releaseDistributedLock(Jedis jedis, String lockKey, String requestId) {
        // Lua script: delete the key only if its value still equals this client's requestId.
        String script = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
        Object result = jedis.eval(script, Collections.singletonList(lockKey), Collections.singletonList(requestId));
        return RELEASE_SUCCESS.equals(result);
    }
}

Code explanation:

With the Jedis client's eval method and a one-line Lua script, the check and the delete are executed as one atomic operation, which solves the atomicity problem of the unLock method above.
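To see why the release has to compare requestId, here is a small in-memory model. It is not Redis: a ConcurrentHashMap stands in for the server, expiry is triggered by hand, and the names InMemoryLockModel, OwnerCheckedReleaseDemo, "points:U", "client-A" and "client-B" are all made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

class InMemoryLockModel {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Models SET key value NX: succeeds only if the key is absent.
    boolean tryLock(String lockKey, String requestId) {
        return store.putIfAbsent(lockKey, requestId) == null;
    }

    // Models the compare-and-delete script: removes the key only if it still
    // holds this requestId; remove(key, value) does both steps atomically.
    boolean release(String lockKey, String requestId) {
        return store.remove(lockKey, requestId);
    }

    // Models the key expiring on the server.
    void expire(String lockKey) {
        store.remove(lockKey);
    }

    String owner(String lockKey) {
        return store.get(lockKey);
    }
}

class OwnerCheckedReleaseDemo {
    static String scenario() {
        InMemoryLockModel model = new InMemoryLockModel();
        model.tryLock("points:U", "client-A");  // A takes the lock
        model.expire("points:U");               // A's lock expires while A is still working
        model.tryLock("points:U", "client-B");  // B takes the lock
        boolean releasedByA = model.release("points:U", "client-A"); // A tries to unlock late
        return releasedByA + "/" + model.owner("points:U");
    }

    public static void main(String[] args) {
        // A's late release is rejected and B still owns the lock.
        System.out.println(scenario()); // prints "false/client-B"
    }
}
```

An unconditional delete in release() would have removed client-B's lock in this scenario.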

4. Interviewer: Talk about the implementation principle of distributed lock based on ZooKeeper

Me:

Again, take the example of spending and accumulating points: event A and event B both need to modify the points at the same time, and they run on two different machines. According to the correct business logic, one machine should finish its operation before the other one executes. This rules out the interleaving A2 -> B2 -> A3 -> B3, in which the more points a user spends, the more points they end up with (I'd probably cry at the thought of my boss getting angry).

What to do? Use a ZooKeeper distributed lock.

After a machine receives the request, it first obtains a distributed lock on ZooKeeper (that is, it creates a znode in ZK) and then performs the operation. When another machine tries to create the same znode, it finds that it cannot, because someone else has already created it, and it has to wait until the first machine finishes before it can get the lock.

ZooKeeper also has sequential nodes: if we create 3 nodes under /lock/, the ZK cluster creates them in the order in which the requests were initiated, as /lock/0000000001, /lock/0000000002, and /lock/0000000003. The trailing number increases with each node, and the node name is completed by ZK.

ZK also has a node type called the ephemeral (temporary) node, which is created by a client and automatically deleted when that client disconnects from the ZK cluster. EPHEMERAL_SEQUENTIAL is an ephemeral sequential node.

Whether a node exists in ZK can serve as the lock state, which is enough to implement a distributed lock. The basic logic is as follows:

  1. The client calls the create() method to create an ephemeral sequential node named "/dsm-locks/lockname/lock-".
  2. The client calls the getChildren("lockname") method to get all the child nodes that have been created.
  3. After the client obtains the paths of all child nodes, it checks whether the node it created in step 1 has the smallest sequence number. If it does, the client is considered to have obtained the lock, because no other client is ahead of it.
  4. If the node it created is not the smallest of all the nodes, the client watches the node with the largest sequence number that is smaller than its own and waits. When the watched node changes, the client fetches the child nodes again and checks whether it has now obtained the lock.
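Steps 3 and 4 boil down to a small piece of pure logic, sketched below without any ZooKeeper dependency. The helper nodeToWatch and the class SequentialLockLogic are hypothetical names; the sketch assumes all child names share one prefix and carry the fixed-width, zero-padded sequence suffix, so sorting the names as strings orders them by sequence number. A real client would obtain the list from getChildren() and set a watch on the returned node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class SequentialLockLogic {
    // Returns empty if ownNode has the smallest sequence number (lock obtained);
    // otherwise returns the node just ahead of ownNode, which is the one to watch.
    static Optional<String> nodeToWatch(List<String> children, String ownNode) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // zero-padded suffixes: string order equals numeric order
        int index = sorted.indexOf(ownNode);
        if (index < 0) {
            throw new IllegalArgumentException("own node is not among the children: " + ownNode);
        }
        if (index == 0) {
            return Optional.empty();
        }
        return Optional.of(sorted.get(index - 1));
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("lock-0000000003", "lock-0000000001", "lock-0000000002");
        System.out.println(nodeToWatch(children, "lock-0000000001")); // prints "Optional.empty"
        System.out.println(nodeToWatch(children, "lock-0000000003")); // prints "Optional[lock-0000000002]"
    }
}
```

Because each waiting client watches only its immediate predecessor, releasing a lock wakes up one client rather than all of them.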

Releasing the lock is relatively simple: just delete the child node you created. You still need to handle exceptions such as a failed node deletion.

Interviewer: What are the advantages and disadvantages of ZK and Redis?

First, Redis:

  1. Redis only guarantees eventual consistency, because replication between replicas is asynchronous (SET is a write and GET is a read; a Redis cluster generally uses a read/write-splitting architecture, so there is master-slave synchronization delay). After a master-slave switch, some data may not have been replicated yet, and the lock may be lost. Redis is therefore not recommended for business that requires strong consistency; ZK is recommended instead.
  2. Among all the approaches, the Redis cluster has the lowest response time. As concurrency and business volume grow, the response time increases noticeably (a shared cluster is a significant factor), but its peak QPS is the highest and there are almost no exceptions.

Next, ZK:

  1. In a ZooKeeper cluster, locking relies on ephemeral nodes, whose life cycle ends when the session between the client and the cluster ends. Therefore, if a client has a network problem and is disconnected from the ZooKeeper cluster, the session times out and the lock is released by mistake (so another thread can acquire a lock that is still in use). ZooKeeper therefore cannot guarantee complete consistency either.
  2. ZK has good stability: response time jitter is very small and no exceptions occur. However, as concurrency and business volume grow, the response time gets noticeably worse and QPS drops significantly.

How to choose? (For reference only, based on my personal experience)

Key indicator Redis ZK
Sensitive to response time √
High concurrency √
Need read/write lock √
Need fair lock √
Need unfair lock √

Tips

To use distributed locks, one of two conditions must be met:

  1. The business itself does not require strong consistency and can accept the occasional lock being acquired repeatedly by other threads.
  2. The business itself requires strong consistency. If a lock is acquired repeatedly by mistake, a degradation plan must be in place to ensure consistency.

Whether you use ZooKeeper or Redis, in extreme cases (for example, the whole ZK cluster fails, or the Redis master fails before the slave is fully synchronized), a resource that is already locked may be locked again by another client. The probability of this is extremely low and depends mainly on the reliability of the ZK cluster and the Redis cluster.

Distributed lock based on MySQL

Distributed locks can also be implemented at the database level.

Method one:

Use a lock table in MySQL: create a table with a unique key on the KEY to be locked, so that the same KEY can be inserted only once. Contention for the lock is thus handed to the database: for the same KEY, the database ensures that only one node can insert successfully and all other nodes fail to insert.

DB distributed lock implementation: the lock relies on the uniqueness of the primary key id. Taking the lock means inserting a row into a table, and the id of that row is the distributed lock. For example, once one request has inserted a row with id 1, other concurrent requests that want to insert the same row must wait until the first request finishes and deletes the row with id 1 before they can insert. This gives the behavior of a distributed lock.

This is a very simple way to lock and unlock, pseudo-code:

def lock:
    exec sql: insert into lockedOrder (XXX) values (XXX)
    if result == true:
        return true
    else:
        return false

def unlock:
    exec sql: delete from lockedOrder where order_id='order_id'

Method 2:

An idempotent operation with a sequence number + timestamp can be thought of as a lock that will not be released.

Interviewer: Do you know the distributed locking frameworks of the big companies in the industry?

Me: (It's time to show the breadth of my knowledge. This is my chance to show off.)

1. Google: Chubby

Chubby is a distributed coordination system that internally uses Paxos to coordinate Master and Replicas.

Chubby Lock Service is used in projects such as GFS, BigTable, etc. Its primary design goal is high reliability, not high performance.

Chubby is used as a coarse-grained lock, for example for master election. Locks are typically held for hours or days, not seconds.

Chubby provides an API similar to that of a file system. Creating a file path on Chubby takes the lock. Chubby uses a lock delay and sequence numbers to optimize the locking mechanism. The lock delay means that when a client releases the lock abnormally, Chubby still treats the client as holding the lock for a while. The sequence number means that the lock holder requests a sequence number (containing several attributes) from the Chubby server and passes it along whenever it needs to use the lock. The Chubby server then checks the validity of the sequence number, including whether the number is still valid.

2. JD.com: SharkLock

SharkLock is a distributed lock implemented on top of Redis. Lock exclusivity is implemented with the SETNX primitive, and timeout and renewal mechanisms are used to force lock release.

3. Ant Financial: SOFAJRaft-RheaKV distributed lock

RheaKV is an embedded, distributed, highly available, strongly consistent KV storage library implemented on top of SOFAJRaft and RocksDB.

RheaKV provides a lock interface. To optimize data reads and writes, it offers different locking features for different storage types. RheaKV provides a watchdog scheduler to control automatic lock renewal, which prevents a lock from being released before its task has completed and prevents a lock from never being released and causing a deadlock.

4. Netflix: Curator

Curator is a client wrapper for ZooKeeper, and its distributed lock is implemented entirely on ZooKeeper.

Creating an EPHEMERAL_SEQUENTIAL node in ZooKeeper is regarded as taking a lock. The EPHEMERAL nature of the node ensures that the lock is forcibly released when the lock holder is disconnected from ZooKeeper. The SEQUENTIAL nature of the node avoids the herd effect when many clients compete for the lock.

Conclusion

Which distributed lock implementation to use depends on the business scenario. If the reads and writes of a system interface are entirely memory-based, Redis is clearly the better fit, while MySQL table locks or row locks are clearly not appropriate. Between the memory-based Redis lock and the ZK lock, the choice depends on what the existing environment offers and which technology the architect understands better. The principle is to choose the one you know best; the goal is to solve the problem.

References

Distributed locks with Redis

https://tech.meituan.com/2018/11/15/java-lock.html
