Abstract: In a single-process system, when there are multiple threads that can change a variable at the same time, it is necessary to synchronize the variable or code block, so that the variable can be modified in a linear execution to eliminate concurrent variable modification, and synchronization is essentially achieved by locking.

This article is from the huawei cloud community “can’t use distributed lock? From scratch based on ETCD distributed lock”, original author: AOho.

Why do we need distributed locks?

In a single-process system, when there are multiple threads that can change a variable at the same time, it is necessary to synchronize the variable or code block so that the variable can be modified linearly to eliminate the concurrent modification of the variable. Synchronization is essentially done by locking. In order to realize the multiple threads in the same block of code at a time can have only one thread can perform, so need somewhere to make a mark, the mark must be each thread can be seen, when there is no tag can set the tag, the rest of the subsequent threads found has a tag is waiting for the end of the thread that owns tag synchronized code block cancel to try to set up again after marking.

In distributed environment, data consistency is always a difficult problem. Distributed environments are more complex than single-process environments. The biggest difference between distributed and stand-alone environments is that it is not multi-threaded but multi-process. Because multiple threads can share heap memory, they can simply take memory as the token storage location. Processes may not even be on the same physical machine, so you need to store tags in a place where all processes can see them.

A common scenario is the second kill scenario, where the order service deploys multiple service instances. If there are 4 seconds kill goods, the first user buys 3, the second user buys 2, ideally the first user can buy successfully, the second user prompts the purchase failure, and vice versa. However, the actual situation may be that both users get the inventory of 4, the first user buys 3, and before the inventory is updated, the second user places 2 orders and the inventory is updated to 2, resulting in business logic error.

In the scenario above, the inventory of goods is a shared variable, and in a high concurrency situation, access to resources needs to be mutually exclusive. In a stand-alone environment, such as the Java language, there are many apis for concurrent processing. However, these apis cannot be used in a distributed environment. Because distributed systems have the characteristics of multi-threading and multi-process, and are distributed on different machines, The synchronized and lock keywords lose the effect of the original lock. Relying on the apis provided by these languages alone is not enough to implement distributed locking, so we need to think of other ways to implement distributed locking.

Common locking schemes are as follows:

  • Implement distributed lock based on database

  • Distributed lock based on Zookeeper

  • Distributed lock based on cache, such as Redis, ETCD, etc

Let’s briefly introduce the implementation of these locks, and focus on the implementation of etCD lock method.

Database-based locks

There are also two ways of database based lock implementation, one is based on database table, the other is based on database exclusive lock.

Add and delete based on database tables

Add and delete based on the database table is the simplest way, first create a lock table mainly contains the following fields: method name, timestamp and other fields.

To lock a method, insert a related record into the table. It is important to note that method names have a uniqueness constraint. If multiple requests are submitted to the database at the same time, and the database guarantees that only one operation will succeed, we can assume that the thread that succeeded in the operation has acquired the lock of the method and can execute the method body content. After the command is executed, delete the record.

The above scheme can be optimized, such as the application of master and slave databases, two-way data synchronization. If the primary library fails, the application service is quickly switched to the secondary library. In addition, it can also record the host information and thread information of the current machine, so the next time to obtain the lock, the first query database, if the host information and thread information of the current machine can be found in the database, directly allocate the lock to the thread, to achieve reentrant lock.

Database based exclusive locking

We can also implement distributed locking through exclusive locking of the database. InnoDB engine based on Mysql can use the following methods to implement lock operation:

public void lock(){ connection.setAutoCommit(false) int count = 0; while(count < 4){ try{ select * from lock where lock_name=xxx for update; If (the result is not empty){// get the lock return; }}catch(Exception e){} sleep(1000); count++; } throw new LockException(); }Copy the code

Add for UPDATE to the end of the query statement, and the database adds an exclusive lock to the database table during the query. When an exclusive lock is added to a record, other threads cannot add an exclusive lock to that record. Anything else that does not acquire the lock will block on the select statement. The lock can be acquired before the timeout, or the lock is not acquired before the timeout.

The thread that obtains the exclusive lock can obtain the distributed lock. After obtaining the lock, it can execute the business logic and release the lock after executing the business.

Summary based on database lock

The above two methods are dependent on a table of the database, one is through the existence of records in the table to determine whether there is a lock, the other is through the database exclusive lock to achieve distributed lock. Advantage is direct use of the existing relational database, simple and easy to understand; The disadvantages are the overhead of operating the database, performance issues, and SQL execution timeout exceptions to consider.

Based on a Zookeeper

Zookeeper-based temporary nodes and sequence features enable distributed locking.

When a method is locked, a unique temporary ordered node is generated in the directory corresponding to the specified node on Zookeeper. To obtain the lock, you only need to determine whether the node in the ordered node is the one with the smallest serial number. When the business logic execution is done releasing the lock, it simply removes the temporary node. This method can also avoid deadlock problems caused by locks that cannot be released due to service downtime.

Netflix has an open-source Zookeeper client framework that you can use for yourself. InterProcessMutex, provided by Curator, is an implementation of a distributed lock. Acquire method acquires lock, release method releases lock. In addition, lock release, blocking lock, reentrant lock and other problems can be effectively solved.

To implement blocking locks, the client can create sequential nodes in Zookeeper and bind the listener Watch to the nodes. Once a node changes, Zookeeper notifies the client. The client can check whether the node it creates has the smallest serial number among all nodes. If so, it can obtain the lock and execute the service logic.

The Distributed lock implementation of Zookeeper also has some drawbacks. Distributed locking may not perform as well as a cache-based implementation. Because every time in the process of lock creation and release, the dynamic creation and destruction of instantaneous nodes to achieve lock function.

In addition, the creation and deletion of nodes in Zookeeper can only be performed by the Leader node, which then synchronizes data to other nodes in the cluster. In a distributed environment, network jitter inevitably occurs, which causes the session connection between the client and the Zookeeper cluster to be interrupted. In this case, the Zookeeper server deletes the temporary node thinking that the client has hung up. Other clients can obtain distributed locks, resulting in inconsistent simultaneous lock acquisition.

Implement distributed lock based on cache

Compared with the distributed locking scheme based on database, the implementation based on cache can perform better in terms of performance and access speed. And many caches can be clustered to solve a single point of problem. There are several kinds of cache-based locks, such as memcached and Redis. This article focuses on implementing distributed locks based on ETCD.

Distributed lock through ETCD TXN

Distributed locking through ETCD also needs to meet the requirements of consistency, mutual exclusion, and reliability. The transaction TXN, lease and watch listening features in ETCD can realize the distributed lock based on ETCD.

Thought analysis

The transaction features of ETCD help us achieve consistency and mutual exclusion. IF (create_REVISION = 0, create_REVISION = 0, create_revision = 0, create_revision = 0) Because the key already exists, its create_revision version number is not 0. IF the IF condition is met, then is used to perform the PUT operation, otherwise the else statement returns the result of a lock capture failure. Of course, in addition to determining IF based on whether the key was successfully created, you can also create keys with the same prefix and compare revisions of those keys to determine which request the distributed lock belongs to.

After obtaining the distributed lock, the client needs to release the lock in time if an exception occurs. So we need a lease, and we need to specify the lease time when we apply for a distributed lock. When the lease expires, the lock is automatically released, ensuring service availability. Is that enough? If a time-consuming operation is initiated by the client during service logic execution, the lease expires if the operation is not completed. As a result, other requests obtain distributed locks, causing inconsistency. In this case, the lease is renewed, that is, the lease is renewed so that the client can maintain heartbeat with the ETCD server.

The specific implementation

Based on the above analysis, we draw the flow chart of implementing etCD distributed lock, as shown below:

Etcd distributed lock based on Go language, the test code is as follows:

Func TestLock(t *testing.T) {// Client config = clientv3. config {Endpoints: []string{"localhost:2379"}, DialTimeout: If client, err = clientv3.new (config); err ! = nil { fmt.Println(err) return } // 1. Lease = clientv3.newlease (client) if leaseGrantResp, err = lease.Grant(context.todo (), 5); err ! = nil {panic(err)} leaseId = leaseGrantResp. CancelFunc = context.withcancel (context.todo ()) // 3. CancelFunc = context.withcancel (context.todo ()) CancelFunc () defer lease.Revoke(context.todo (), leaseId) if keepRespChan, err = lease.KeepAlive(CTX, leaseId); err ! = nil {panic(err)} // Go func() {for {select {case keepResp = < -keeprespchan: If keepRespChan == nil {FMT.Println(" the lease has expired ") goto END} else {// FMT.Println(" the lease has expired ") keepResp.ID) } } } END: // create transaction TXN = kv.txn (context.todo ()) //if no key exists, Then sets it, If(clientv3.Compare(clientv3.createrevision ("lock"), "=", 0)). Then(clientv3.opput ("lock", "g", Clientv3.withlease (leaseId))).else (clientv3.opget ("lock")) // Commit transaction if txnResp, err = txn.mit (); err ! = nil { panic(err) } if ! TxnResp.Succeeded {fmt.println (" Lock occupied :", String (txNresp.responses [0].getresponserange ().kvs [0].value)) return} // Execute business logic after lock, Println(" processing task ") time.sleep (5 * time.second)}Copy the code

The expected result is as follows:

=== RUN TestLock Process task: 7587848943239472601:7587848943239472601:7587848943239472601: 7587848943239472601 -- PASS: TestLock (5.10s) PASSCopy the code

In general, the implementation process of etCD distributed lock as described above is divided into four steps:

  • Client initialization and connection establishment;

  • Create lease, automatically renew lease;

  • Create transaction, acquire lock;

  • Execute the business logic and finally release the lock.

When creating a lease, you need to create a cancelable lease, mainly for release upon exit. The steps for releasing the lock are in the defer statement above. When the defer lease is closed, the key corresponding to the distributed lock will be released.

summary

This paper mainly introduces the case of distributed lock based on ETCD. First this paper introduces the background and necessity of distributed lock, distributed architecture is different from the monomer architecture, involves many services between multiple instances of the call, across processes under the condition of the programming language used to own concurrency primitives have no way to realize the consistency of the data, so the distributed lock, used to solve the operation in a distributed environment resources. Then it introduces two ways to realize distributed lock based on database: add and delete of data table and exclusive lock of database. Distributed locking can also be implemented with zooKeeper-based temporary nodes and sequential features, both of which have more or less performance and stability drawbacks.

Then this paper focuses on the implementation of distributed lock scheme based on ETCD, according to the characteristics of ETCD, using transaction TXN, lease and watch monitoring to achieve distributed lock.

In our case, the lock grab failed and the client returned directly. After the lock is released, or the client that holds the lock exits due to a failure, how can other locks acquire the lock quickly? Therefore, the above code can be improved based on the monitoring feature of Watch. Students can try it by themselves.

Click to follow, the first time to learn about Huawei cloud fresh technology ~