The lock
As I mentioned earlier, one of the most difficult aspects of concurrent programming is controlling concurrent access to shared resources. If synchronization is done poorly, inconsistencies arise easily and lead to errors in business logic; if access to shared resources is controlled too strictly, performance suffers badly. In concurrent programming we can, on the one hand, learn exquisite ideas and structural design from source code written by experts; on the other hand, with a solid grasp of concurrency fundamentals we can combine design patterns and architectural ideas to build high-quality, high-concurrency, high-performance systems. Concurrent programming depends heavily on design patterns and architecture design, and therefore requires substantial accumulation of experience and knowledge.
As mentioned earlier, concurrent access to shared resources can easily cause inconsistencies, and the two well-known approaches are pessimistic locking and optimistic locking. Pessimistic locking is just what its name suggests: it assumes that concurrent access will inevitably lead to inconsistent state, so the resource must be locked before any operation, serializing concurrent threads so they access it one by one. Optimistic locking, on the other hand, assumes that in most cases concurrent access will not cause inconsistencies, so the resource can be accessed without locking, and conflicts are detected and handled only when they actually occur. This is my personal understanding and may not be perfectly precise; for rigorous definitions of pessimistic and optimistic locking you can look up more formal references.
Let’s look at how pessimistic locking and optimistic locking are implemented in Java. Pessimistic locking has two main implementations, synchronized and Lock, while optimistic locking is mainly realized through CAS operations. A general comparison between synchronized and Lock:
1. synchronized relies on the underlying JVM implementation, while Lock is implemented in plain Java code; their implementation approaches are quite different.
2. synchronized is still widely used in everyday programming, but in serious concurrent systems Lock is noticeably better:
a. In newer JDK versions synchronized has been heavily optimized, so the raw performance difference between synchronized and Lock is no longer significant.
b. synchronized does not support interruption or timeout: once a thread blocks on a synchronized monitor, it stays blocked until it can obtain the resource, and interrupting it has no effect, which can hurt the performance of a concurrent system badly. Lock supports interruption, timeout, and non-blocking attempts to acquire the lock, extending synchronized well, so Lock is significantly superior to synchronized in terms of flexibility.
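As a concrete illustration of the optimistic, CAS-based approach mentioned above, here is a minimal sketch (my own example, not from the JDK source): a counter that reads, computes, and then retries the CAS on conflict instead of locking up front.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger();

    public int increment() {
        int prev, next;
        do {
            prev = value.get();   // read the current value without locking
            next = prev + 1;      // compute the new value
        } while (!value.compareAndSet(prev, next)); // retry if another thread raced us
        return next;
    }
}
```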
The java.util.concurrent.locks package contains quite a few Lock implementation classes; the commonly used ones are ReentrantLock and ReadWriteLock (with its implementation class ReentrantReadWriteLock). ReentrantLock is by far the most commonly used lock, so in this section we take ReentrantLock as the example for analyzing how Lock is implemented.
ReentrantLock
A ReentrantLock is a reentrant mutex: a thread can repeatedly acquire a lock it already holds. Locks are generally designed to support reentrancy, because otherwise deadlocks occur easily. For example, if lock B did not support reentrancy and thread A, while already holding lock B, tried to acquire lock B again, thread A would block until lock B is released; but lock B is held by thread A itself, so thread A could never be woken up to acquire lock B, resulting in a deadlock.
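To make the reentrancy point concrete, here is a small hypothetical example of my own: the same thread acquires the lock twice, which works with ReentrantLock but would deadlock with a non-reentrant lock exactly as described above.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ReentrancyDemo {
    private final ReentrantLock lock = new ReentrantLock();

    public void outer() {
        lock.lock();     // first acquisition
        try {
            inner();     // re-enters the same lock: no blocking
        } finally {
            lock.unlock();
        }
    }

    private void inner() {
        lock.lock();     // a non-reentrant lock would deadlock here
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    }
}
```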
ReentrantLock is implemented mainly on top of the AbstractQueuedSynchronizer (AQS) class. AbstractQueuedSynchronizer is an abstract class, and inside ReentrantLock there are two implementation classes, NonfairSync and FairSync, corresponding to the non-fair lock and the fair lock respectively. The class structure is as follows:
(Class diagram: ReentrantLock holds a Sync field; the abstract class Sync extends AbstractQueuedSynchronizer; NonfairSync and FairSync both extend Sync.)
The ReentrantLock class contains a variable of type Sync, and its core behavior is essentially delegated to Sync. The default constructor creates a NonfairSync, i.e. a non-fair lock.
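The relevant constructors, paraphrased from OpenJDK 8 (details vary slightly across JDK versions):

```java
// ReentrantLock delegates everything to its Sync field (paraphrased from OpenJDK 8).
private final Sync sync;

public ReentrantLock() {
    sync = new NonfairSync();                         // non-fair by default
}

public ReentrantLock(boolean fair) {
    sync = fair ? new FairSync() : new NonfairSync(); // fair lock on request
}
```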
Principle of lock acquisition
When ReentrantLock is used, the general flow is: 1. Call lock() to apply for the lock resource. 2. After acquiring the lock, operate on the shared resource. 3. Call unlock() to release the lock.
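A minimal usage sketch (my own example) of that lock/operate/unlock flow; unlock() goes in finally so the lock is released even if the critical section throws:

```java
import java.util.concurrent.locks.ReentrantLock;

public class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;

    public void increment() {
        lock.lock();          // 1. apply for the lock resource
        try {
            count++;          // 2. operate on the shared resource
        } finally {
            lock.unlock();    // 3. release the lock, even on exception
        }
    }
}
```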
The source of ReentrantLock.lock() is very simple: it just calls sync.lock(), which shows that the core mechanism of ReentrantLock is implemented in Sync. As mentioned above, Sync has two subclasses in ReentrantLock, for the fair lock and the non-fair lock respectively. Let's first look at the non-fair NonfairSync implementation:
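Paraphrased from OpenJDK 8, NonfairSync looks roughly like this:

```java
// OpenJDK 8, paraphrased: the non-fair lock tries to grab the lock
// immediately with a CAS before falling back to the AQS queue.
static final class NonfairSync extends Sync {
    final void lock() {
        if (compareAndSetState(0, 1))                        // barge: CAS state 0 -> 1
            setExclusiveOwnerThread(Thread.currentThread()); // record the owner
        else
            acquire(1);                                      // contended: fall back to AQS
    }

    protected final boolean tryAcquire(int acquires) {
        return nonfairTryAcquire(acquires);
    }
}
```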
If the CAS on state succeeds, Sync’s exclusiveOwnerThread variable records the current thread for subsequent reentrancy checks. If the CAS fails, there is contention: another thread already holds the lock, the current thread has failed to acquire it, and execution enters the acquire(1) branch. As you can see, the core of ReentrantLock is determining whether the lock is occupied from the value of the state field.
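acquire() itself, paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: try once, then queue up and block;
// if the thread was interrupted while parked, re-assert the interrupt.
public final void acquire(int arg) {
    if (!tryAcquire(arg) &&
        acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
        selfInterrupt();
}
```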
The logic of AbstractQueuedSynchronizer.acquire() is roughly as follows:
1. First call tryAcquire(). The purpose of this method is:
a. Spin once more to try to acquire the lock; return true on success.
b. Check whether the thread holding the lock is the current thread; if so, add 1 to state and return true, indicating a successful reentrant acquisition.
c. If neither of the above succeeds, return false, indicating the lock could not be acquired.
Note: tryAcquire() in AbstractQueuedSynchronizer itself simply throws an exception; the logic actually invoked here is implemented in the subclass NonfairSync, which delegates to Sync.nonfairTryAcquire(). The source is as follows:
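nonfairTryAcquire(), defined in Sync and paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: acquire if free, or bump the hold count on reentry.
final boolean nonfairTryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {                                    // lock is free: spin once more
        if (compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) { // reentrant acquisition
        int nextc = c + acquires;
        if (nextc < 0)                               // hold count overflow
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;                                    // contended: caller must queue
}
```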
2. If tryAcquire() fails, acquireQueued(addWaiter(Node.EXCLUSIVE), arg) is called. Its job is to append the thread that failed to acquire the lock to a doubly linked list and put the thread to sleep; when the lock becomes available, one thread near the head of the list is woken to compete for the lock. The source is as follows:
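acquireQueued(), paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: loop forever; only the node whose predecessor
// is head may try for the lock, everyone else parks.
final boolean acquireQueued(final Node node, int arg) {
    boolean failed = true;
    try {
        boolean interrupted = false;
        for (;;) {
            final Node p = node.predecessor();
            if (p == head && tryAcquire(arg)) { // qualified and won the lock
                setHead(node);                  // this node becomes the new head
                p.next = null;                  // drop the old head for GC
                failed = false;
                return interrupted;
            }
            if (shouldParkAfterFailedAcquire(p, node) &&
                parkAndCheckInterrupt())        // sleep until unparked
                interrupted = true;
        }
    } finally {
        if (failed)
            cancelAcquire(node);                // e.g. tryAcquire threw
    }
}
```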
After addWaiter(Node.EXCLUSIVE) adds the current thread’s node to the Queue, acquireQueued() carries the core lock-acquisition logic. Several design points are worth spelling out:
a. The method is an infinite for loop that exits only when the lock is acquired via tryAcquire(). A thread that fails to acquire the lock does not spin endlessly, however; parkAndCheckInterrupt() puts it to sleep, and when it is woken it runs another iteration of the loop to see whether it can acquire the lock. Once the lock is obtained, the Queue’s head pointer is moved to the current node and the previous head is discarded.
b. When competing for the lock, tryAcquire() is guarded by the p == head check, which tests whether the current thread’s predecessor node is the head node; only a thread whose predecessor is the head is qualified to call tryAcquire() and compete for the lock. The reasoning: threads that failed to acquire the lock join the Queue in order, from head to tail; if head is not null, its node represents the thread occupying the lock, and when that thread releases the lock it wakes only its successor node, not every thread in the Queue. Each release therefore wakes a single thread. If hundreds or thousands of threads all competed for the lock at once, system performance would pay a heavy price, so this design effectively avoids such a performance storm.
c. Before putting the thread to sleep, the method spins once more to try to acquire the lock, and only if that fails again does the thread actually sleep. Putting a thread to sleep is comparatively expensive: it involves a context switch, and the woken thread may be scheduled onto a different CPU; since L1 and L2 caches are per-CPU, the cache hit rate drops, which hurts performance noticeably. So the thread is kept awake whenever possible.
d. When a thread releases the lock, it wakes the successor of the Queue’s head node. The woken thread still has to compete for the lock, and its rival is any thread that has just called lock() but has not yet entered the Queue. Performance-wise this is a win: if the newly arriving thread acquires the lock immediately, it saves the cost of creating a node and enqueuing it, while the woken thread merely goes back to sleep at far lower cost. And because each release wakes exactly the head’s successor, threads already in the Queue are still served in order rather than starving indefinitely.
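The parking primitive used here, paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: block the current thread until it is unparked
// (or interrupted), then report and clear the interrupt status.
private final boolean parkAndCheckInterrupt() {
    LockSupport.park(this);
    return Thread.interrupted();
}
```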
This is how ReentrantLock acquires lock resources unfairly. The fair-lock flow is basically the same; the only difference is the locking logic inside tryAcquire(), shown below:
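FairSync.tryAcquire(), paraphrased from OpenJDK 8; note the extra hasQueuedPredecessors() check:

```java
// OpenJDK 8, paraphrased: identical to nonfairTryAcquire() except that a
// fair lock refuses to barge while other threads are queued ahead of it.
protected final boolean tryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (!hasQueuedPredecessors() &&        // no one waiting ahead of us
            compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) {
        int nextc = c + acquires;
        if (nextc < 0)
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}
```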
A fair lock performs the CAS that sets state from 0 to 1 only when the lock is available (state == 0) and no thread is waiting for it ahead in the Queue; otherwise the current thread is appended to the tail of the Queue. Comparing FairSync.tryAcquire() with nonfairTryAcquire(): the non-fair version attempts the CAS as soon as it sees the lock is available, while the fair version applies only when the lock is available and no queued predecessors exist; otherwise the thread is appended to the Queue’s tail.
Besides lock(), ReentrantLock also provides lockInterruptibly(), tryLock(), tryLock(long timeout, TimeUnit unit), and so on. Here is a brief overview of each.
ReentrantLock.lockInterruptibly(): according to the earlier source analysis, if a thread does not obtain the lock it sleeps via LockSupport.park(); when the lock is released, the releasing thread calls LockSupport.unpark() to wake a dormant thread to compete for the lock. LockSupport.park() also returns when the thread is interrupted. The difference is that lockInterruptibly() checks whether it was woken by an interrupt and, if so, throws an exception immediately, while lock() simply goes back to competing for the lock. The concrete implementation is in the source:
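The interruptible path, paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased.
public void lockInterruptibly() throws InterruptedException {
    sync.acquireInterruptibly(1);
}

// In AbstractQueuedSynchronizer:
public final void acquireInterruptibly(int arg) throws InterruptedException {
    if (Thread.interrupted())             // bail out if already interrupted
        throw new InterruptedException();
    if (!tryAcquire(arg))
        doAcquireInterruptibly(arg);
}

// Same loop as acquireQueued(), except an interrupt aborts the wait
// with an exception instead of merely setting a flag:
private void doAcquireInterruptibly(int arg) throws InterruptedException {
    final Node node = addWaiter(Node.EXCLUSIVE);
    boolean failed = true;
    try {
        for (;;) {
            final Node p = node.predecessor();
            if (p == head && tryAcquire(arg)) {
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return;
            }
            if (shouldParkAfterFailedAcquire(p, node) &&
                parkAndCheckInterrupt())
                throw new InterruptedException(); // the key difference
        }
    } finally {
        if (failed)
            cancelAcquire(node);
    }
}
```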
ReentrantLock.tryLock(): makes a single attempt via nonfairTryAcquire(); it returns true if the lock was acquired and false otherwise, without adding the thread to the Sync Queue and without blocking.
ReentrantLock.tryLock(long timeout, TimeUnit unit): tries to acquire the lock within the given time and returns false if the lock cannot be obtained before the timeout expires.
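Both tryLock variants, paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: a single non-blocking attempt...
public boolean tryLock() {
    return sync.nonfairTryAcquire(1);
}

// ...and a timed attempt that parks in the queue for at most the given time.
public boolean tryLock(long timeout, TimeUnit unit)
        throws InterruptedException {
    return sync.tryAcquireNanos(1, unit.toNanos(timeout));
}
```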
Conclusion
From the source-code analysis of ReentrantLock, the core of the Lock mechanism is manipulating the state field with atomic CAS operations. state == 0 means the lock is available, and acquiring the lock means setting state from 0 to 1 with a CAS; if state is greater than 0 the CAS fails, meaning the lock is occupied and cannot be acquired. When acquisition fails, the handling differs depending on interruptibility and timeout, but roughly:
1. The thread that failed to acquire the lock is wrapped into a Node; the Node both links into the doubly linked Queue and carries extra status information used to control the node.
2. Threads are woken from the Queue in order from head to tail, skipping nodes marked CANCELLED, since those represent invalid threads. Note that only a single thread is woken at a time, and being woken only entitles that thread to compete for the lock; it may still have to contend with newly arriving threads that have not yet entered the Queue. This is a characteristic of the non-fair lock, designed primarily for performance.
3. If the woken thread wins the race it exits the for loop and returns; otherwise it goes back to sleep.
Finally, a rough flow chart helps form a clearer picture of the overall implementation process.
Principle of lock release
Let’s examine how ReentrantLock releases lock resources. Release does not distinguish between fair and non-fair locks. The main job is to decrement state; when state reaches 0 the lock is fully released, and a thread waiting in the Queue is woken to acquire it.
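unlock() and Sync.tryRelease(), paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased.
public void unlock() {
    sync.release(1);
}

// In ReentrantLock.Sync: decrement the hold count; the lock is only
// truly free once the count drops back to zero.
protected final boolean tryRelease(int releases) {
    int c = getState() - releases;
    if (Thread.currentThread() != getExclusiveOwnerThread())
        throw new IllegalMonitorStateException(); // only the owner may unlock
    boolean free = false;
    if (c == 0) {                                 // fully released
        free = true;
        setExclusiveOwnerThread(null);
    }
    setState(c);
    return free;
}
```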
release() is implemented in AbstractQueuedSynchronizer:
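Paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: if the lock became free, wake head's successor.
public final boolean release(int arg) {
    if (tryRelease(arg)) {
        Node h = head;
        if (h != null && h.waitStatus != 0) // someone may be waiting
            unparkSuccessor(h);             // wake exactly one thread
        return true;
    }
    return false;
}
```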
In-depth analysis
That completes how ReentrantLock acquires and releases a lock. The more you study this code, the more you will be amazed at the ingenuity of Doug Lea’s concurrency design; the IO model we will discuss later also contains great work by Doug Lea. If you don’t know of him, he is worth a bit of research.
Understanding the flow does not necessarily mean you fully understand ReentrantLock: you may know what it does without knowing the considerations behind why it is done that way. Concurrent programming is, after all, far more complex than single-threaded programming, and it is hard to keep every possible interleaving of running threads in view, which is exactly how the root causes of bugs slip in. We have already walked through the addWaiter() method above; now let’s take a much closer look at it and, hopefully, gain a deeper feel for concurrent programming.
What addWaiter() does: it wraps the thread that failed to acquire the lock into a Node and appends it to the tail of the Queue. The Queue has no dedicated data structure; it is a doubly linked list whose enqueue and dequeue operations are simulated through head, tail, next, and prev. The source for appending the current node to the tail of the list is as follows:
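addWaiter(), paraphrased from OpenJDK 8; the three numbered comments correspond to the three steps listed next:

```java
// OpenJDK 8, paraphrased.
private Node addWaiter(Node mode) {
    Node node = new Node(Thread.currentThread(), mode);
    Node pred = tail;
    if (pred != null) {
        node.prev = pred;                     // 1. point prev at the old tail
        if (compareAndSetTail(pred, node)) {  // 2. CAS tail to the new node
            pred.next = node;                 // 3. point the old tail's next at it
            return node;
        }
    }
    enq(node);                                // fast path failed: retry in a loop
    return node;
}
```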
1. Point the prev of the current node to the tail node. 2. Use the CAS atomic operation to point the tail pointer to the current node. 3. Point next of the previous tail node to the current node.
The schematic diagram is as follows:
What if we reversed steps 1 and 3 when appending a node: first point the old tail’s next to the current node, then CAS the tail pointer to the current node, and finally set prev? Because multiple threads may apply for the lock and append nodes at the same time, two things can happen: 1. After thread t1 sets the old tail’s next, thread t2 sets it too, overwriting t1’s pointer. 2. t1 then wins the CAS on tail, so t2’s CAS fails, and t1, having succeeded, goes on to set its prev.
As you can see, concurrency leaves the appended t1 node in a broken state: normally Node1’s next should point to t1, but it has been overwritten to point to t2. So swapping steps 1 and 3 is problematic under concurrent access.
What if steps 1 and 2 were swapped instead, doing the CAS first, then prev, then next? First the CAS atomic operation points tail at the current node, as shown in the following diagram:
At this moment the new tail node is still isolated: its prev and next have not been set, so it has no links to the rest of the list. That is dangerous for any other thread that needs to traverse the doubly linked list at the same time, for example unparkSuccessor(), which is called when the lock is released:
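unparkSuccessor(), paraphrased from OpenJDK 8; note the backward traversal:

```java
// OpenJDK 8, paraphrased: wake the first non-cancelled node after `node`.
private void unparkSuccessor(Node node) {
    int ws = node.waitStatus;
    if (ws < 0)
        compareAndSetWaitStatus(node, ws, 0);

    Node s = node.next;
    if (s == null || s.waitStatus > 0) { // next missing or cancelled:
        s = null;                        // search backwards from tail
        for (Node t = tail; t != null && t != node; t = t.prev)
            if (t.waitStatus <= 0)
                s = t;                   // remember the earliest valid node
    }
    if (s != null)
        LockSupport.unpark(s.thread);
}
```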
unparkSuccessor() contains a traversal that walks from tail to head. If that traversal runs while the new tail node is still isolated, it clearly cannot find nodes correctly, so the order with steps 1 and 2 swapped is also problematic under concurrency. Notice, by the way, that unparkSuccessor() searches for the next valid node after head not from head to tail but in the opposite direction, from tail to head; searching forward from head would surely find the node faster, so why go the other way? Looking at this code alone you will never see the reason; it becomes clear in the analysis of the normal flow below.
I won’t walk through the remaining wrong orderings; they fail in much the same way. Let’s instead analyze why the order used in the source causes no problems under concurrent execution. Assume two threads fail to acquire the lock at the same time and both need to append themselves to the tail of the Sync Queue: 1. Thread t1 and thread t2 each point their own node’s prev at the current tail, Node1. In the wrong orderings above, multiple threads mutated the shared node Node1 at the same time and left it in an inconsistent state; by setting prev first, each thread operates only on its own node, so there is no multi-thread contention at this step, as shown in the diagram below:
2. Both t1 and t2 then attempt the CAS on tail against Node1, and exactly one of them succeeds. Suppose t1 wins: tail now points at t1, and t1 goes on to set Node1’s next to t1. Because t1’s CAS succeeded, t2’s CAS necessarily fails. The schematic diagram is as follows:
3. The enq() method uses CAS plus an infinite loop to make sure t2’s node is eventually appended to the tail of the Sync Queue. Each iteration reads the latest tail, points t2’s prev at it, tries to CAS tail over to t2, and finally points the previous tail’s next at t2. Here the latest tail is t1, so t2 ends up appended after t1. This guarantees nodes are appended correctly even under high concurrency, without the inconsistent states seen before.
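enq() itself, paraphrased from OpenJDK 8:

```java
// OpenJDK 8, paraphrased: CAS in an infinite loop until the node is appended.
private Node enq(final Node node) {
    for (;;) {
        Node t = tail;
        if (t == null) {                      // queue empty: lazily create head
            if (compareAndSetHead(new Node()))
                tail = head;
        } else {
            node.prev = t;                    // prev first, as analyzed above
            if (compareAndSetTail(t, node)) {
                t.next = node;                // next last; may briefly lag behind
                return t;
            }
        }
    }
}
```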
4. Why does the search for the next valid node after head run from tail to head rather than from head to tail? If you have followed the analysis so far, this is easy to explain. Suppose t1 has set its prev to Node1 and the CAS has pointed tail at t1; the Queue is then structured as follows. If unparkSuccessor() runs at this moment, Node0 finds its successor Node1; if Node1 is an invalid (cancelled) node, the search has to move on to Node1’s successor, but Node1’s next has not been set yet, so nothing can be found going forward. Going backwards from tail always works, because a node’s prev is set before the node ever becomes reachable through tail.
This deep dive into addWaiter() should give you a better sense of how tricky concurrent programming can be: every step, and the order of every step, matters. Doug Lea’s clever design just makes it look easy.
At this point, a relatively complete analysis of the ReentrantLock flow is done. In the sections that follow, we will analyze other Lock implementation classes and the underlying implementation mechanism of synchronized.