In the last article I learned about synchronized. Its defining characteristic is that only one thread at a time can acquire an object's monitor and enter the synchronized block; all other threads must wait outside, giving mutual exclusion. The obvious downside is low efficiency: many threads sit idle while one executes. Since we cannot change the fact that only one thread runs at a time, the way to optimize a lock is to shorten the time spent acquiring and holding it.

1. Optimistic and pessimistic locks

This question comes up in interviews all the time: "Please briefly explain optimistic and pessimistic locking." Almost everyone who has interviewed has been asked it, so let's look at what these two kinds of locks are. First of all, neither is a specific lock; each is a strategy.

  • Pessimistic locking

    As the name implies, it is pessimistic: every time it goes to get the data it assumes someone else will modify it, so it locks first, and anyone else who wants the data blocks until the lock is released. It is strongly exclusive. Traditional relational databases use this kind of locking heavily: row locks, table locks, read locks, write locks, and so on all lock before operating. In Java, synchronized and ReentrantLock are both pessimistic locks. synchronized is implemented by the JVM, while ReentrantLock is implemented in the JDK; roughly speaking, the difference is like an OS-provided mechanism versus one written in user code. The implementation of the former is harder to see, while the source code of the latter can be read directly. ReentrantLock lets you specify whether the lock is fair or unfair, whereas synchronized can only be an unfair lock; a fair lock is one where the thread that waited first acquires the lock first (see the sketch below).
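    Below is a tiny runnable illustration (mine, not from the original article) of the fairness choice ReentrantLock offers; FairnessDemo is a hypothetical name:

import java.util.concurrent.locks.ReentrantLock;

public class FairnessDemo {
    private static final ReentrantLock FAIR = new ReentrantLock(true); // FIFO hand-off to waiters
    private static final ReentrantLock UNFAIR = new ReentrantLock();   // default: non-fair

    public static void main(String[] args) {
        FAIR.lock();
        try {
            System.out.println("fair lock isFair(): " + FAIR.isFair());     // true
        } finally {
            FAIR.unlock();
        }
        System.out.println("default lock isFair(): " + UNFAIR.isFair());    // false
    }
}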

  • Optimistic locking

    As the name implies, it is very optimistic: every time it goes to get the data it assumes nobody else will modify it, so it does not lock. Only when updating does it check whether anyone else changed the data in the meantime, using a version number or a similar mechanism. Optimistic locks suit read-heavy applications and improve throughput; a database that provides something like a write_condition mechanism is supplying an optimistic lock. In Java, the atomic variable classes in the java.util.concurrent.atomic package are implemented with CAS, one form of optimistic locking.

Usage scenarios

Optimistic locking fits scenarios with few writes (read-heavy), where conflicts are rare, so it saves lock overhead and raises system throughput. Once conflicts become frequent, the upper layer keeps retrying, which actually hurts performance; for a large number of writes (write-heavy scenarios), pessimistic locking is the better fit.

1.1 Implementation of optimistic lock

Optimistic locking is generally implemented by version number mechanism or CAS algorithm

  • Version number mechanism

    Generally a version field is added to the table to record how many times the row has been modified; every modification increments it. When thread A wants to update the value, it reads the version along with the data, and the update is committed only if the version it read still equals the version currently in the database; otherwise the update is retried until it succeeds. A sketch follows below.
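    Here is a minimal sketch of that retry loop over JDBC, assuming a hypothetical table accounts(id, balance, version); the class, table, and method names are illustrative only:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class VersionedUpdate {
    // Retries until the UPDATE succeeds against an unchanged version.
    static void addToBalance(Connection conn, long id, long delta) throws SQLException {
        while (true) {
            long balance, version;
            // Read the row together with its current version
            try (PreparedStatement read = conn.prepareStatement(
                    "SELECT balance, version FROM accounts WHERE id = ?")) {
                read.setLong(1, id);
                try (ResultSet rs = read.executeQuery()) {
                    if (!rs.next()) throw new SQLException("no such row: " + id);
                    balance = rs.getLong(1);
                    version = rs.getLong(2);
                }
            }
            // The UPDATE only succeeds if nobody changed the version meanwhile
            try (PreparedStatement write = conn.prepareStatement(
                    "UPDATE accounts SET balance = ?, version = version + 1 " +
                    "WHERE id = ? AND version = ?")) {
                write.setLong(1, balance + delta);
                write.setLong(2, id);
                write.setLong(3, version);
                if (write.executeUpdate() == 1) return; // version matched, done
            }
            // 0 rows updated: another writer won the race; re-read and retry
        }
    }
}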

  • CAS (Compare and Swap) algorithm

CAS is a well-known lock-free algorithm. Lock-free programming means synchronizing a variable between multiple threads without locks and without blocking any thread, which is why it is also called non-blocking synchronization.

The CAS operation involves three operands:

  • The memory value V to be read or written

  • The expected value A for comparison

  • The new value B to be written

The CAS instruction changes the value at memory address V to B if and only if the current value of V equals the expected value A; otherwise it does nothing. The entire compare-and-swap is a single atomic operation. (Plain increment, by contrast, is not atomic.)

    Usage scenarios

    java.util.concurrent (J.U.C), added in JDK 1.5, is built on CAS. Compared with synchronized, CAS is the common way to implement non-blocking algorithms; the classes in the atomic package are almost all implemented with CAS. Let's use AtomicInteger as an example to see how thread safety is ensured without a lock.

public class AtomicInteger extends Number implements java.io.Serializable {
    // Abridged from the JDK source; the unsafe and valueOffset fields
    // (initialized in a static block, omitted here) are what CAS works on.
    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset; // memory offset of the value field

    private volatile int value;

    public final int get() {
        return value;
    }

    public final int getAndIncrement() {
        for (;;) {                            // spin until the CAS succeeds
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return current;               // return the old value
        }
    }

    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
}
  1. Even without a lock, the value field is declared volatile, which guarantees the value is visible across threads, so get() can simply read the variable directly. Now let's see how i++ is done.

  2. getAndIncrement reads the current value from memory, adds 1, and then performs a CAS with the result; if the CAS succeeds the old value is returned, otherwise it retries until it succeeds.

  3. compareAndSet goes through JNI (Java Native Interface) down to a CPU instruction. The following should be the corresponding assembly; I don't really understand it, so I won't try to explain it. If you know what it means, please leave a comment.

template<>
template<typename T>
inline T Atomic::PlatformXchg<4>::operator()(T exchange_value,
                                             T volatile* dest,
                                             atomic_memory_order order) const {
  STATIC_ASSERT(4 == sizeof(T));
  // alternative for InterlockedExchange
  __asm {
    mov eax, exchange_value;
    mov ecx, dest;
    xchg eax, dword ptr [ecx];
  }
}
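Before moving on to the drawbacks, here is a small runnable demo (mine, not from the original article) showing the lock-free counter staying correct under contention; CasDemo is a hypothetical name:

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counter.getAndIncrement(); // retried internally on CAS failure
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get());     // always 80000, with no locks
    }
}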
Disadvantages of CAS:

1. The ABA problem: if memory address V initially holds value A, and still holds A when we are about to swap, can we conclude that no other thread changed it? No: its value may have been changed to B and then back to A in the meantime, yet the CAS would conclude it was never touched. This loophole is called the "ABA" problem of CAS. Since Java 1.5 the JDK's atomic package has provided AtomicStampedReference to address it: its compareAndSet first checks that the current reference equals the expected reference and that the current stamp equals the expected stamp, and only if both match does it atomically set the reference and stamp to the given new values; a small demonstration follows after this list of drawbacks.

public boolean compareAndSet(V   expectedReference, // expected reference
                             V   newReference,      // updated reference
                             int expectedStamp,     // expected stamp
                             int newStamp)          // updated stamp

2. Long spin times cost CPU: as seen in getAndIncrement, CAS runs inside a for(;;) loop, so a failed CAS simply keeps trying. If the CAS stays unsuccessful for a long time, the spinning can incur significant CPU overhead. 3. Atomicity is guaranteed for only one shared variable: a loop of CAS operations makes operations on a single shared variable atomic, but not operations spanning several shared variables. However, JDK 1.5 also provides the AtomicReference class to guarantee atomicity between reference objects, so several variables can be bundled into one object and CASed together.
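Here is a small self-contained demonstration (mine, not from the original) of how the stamp defeats ABA; AbaDemo is a hypothetical name:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String current = ref.get(stampHolder);   // reads value and stamp together
        int stamp = stampHolder[0];              // stamp = 0

        // Another thread changes A -> B -> A, bumping the stamp each time
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // The reference is "A" again, but the stamp check exposes the change
        boolean swapped = ref.compareAndSet(current, "C", stamp, stamp + 1);
        System.out.println(swapped);             // false: stamp no longer matches
    }
}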

2. Lock optimization

As we all know, once locks are used in concurrent code, blocking appears and performance naturally drops. Lock optimization means improving performance given that blocking exists: minimizing the obstruction locks cause. It cannot remove the cost of lock blocking itself; that is impossible. General performance work comes down to reducing time spent and reducing complexity, and the lock-optimization techniques summed up by those before us are the following:

  • Reduce lock holding time
  • Reduce lock granularity
  • Lock separation
  • Lock coarsening
  • Lock elimination

2.1 Reduce lock holding time

Using a lock makes other threads wait. If the lock is held for a shorter time over a smaller scope, other threads acquire it sooner and collision time is minimized. Take the most familiar example, a singleton:

 public synchronized static SingleDog getInstance() {
    if (singleDog == null) {
        singleDog = new SingleDog();
    }
    return singleDog;
 }


Here every incoming thread takes the lock before even checking whether the instance already exists, which adds unnecessary locking. To reduce it, we optimize as follows:

 public static SingleDog getInstance() {
    if (singleDog == null) {                    // first check, no lock taken
        synchronized (SingleDog.class) {
            if (singleDog == null) {            // second check, under the lock
                singleDog = new SingleDog();
            }
        }
    }
    // note: for this double-checked pattern to be safe under the Java
    // memory model, the singleDog field should be declared volatile
    return singleDog;
 }

When the instance already exists we return it straight away, without taking an unnecessary lock.

2.2 Reduce lock granularity

Splitting a large object (one that many threads may access) into smaller objects greatly increases parallelism and reduces lock contention. With less contention, the success rate of biased and lightweight locks also rises. The clearest example is ConcurrentHashMap (though I have never needed it on Android); the counterexample is Hashtable, where every method is locked. Incidentally, it seems Facebook recently released F14, an improved hash table.

This section uses ConcurrentHashMap to introduce lock-granularity reduction. JDK 1.7 and 1.8 implement it differently. In 1.7 the map is built on segment locks, and the maximum concurrency equals the number of segments. JDK 1.8 drops the segment-lock scheme in favor of one large array, further improving concurrency. To improve addressing performance under heavy hash collisions, Java 8 also converts a bucket's linked list (O(N) lookup) into a red-black tree (O(log N) lookup) once the list grows past a threshold (8), and replaces the per-segment ReentrantLock with CAS + synchronized. In structure, ConcurrentHashMap is similar to HashMap; the differences are the list-to-tree conversion and the thread-safety machinery.

final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        // 1. compute the hash value
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            // 2. the target bin is empty: insert the value with CAS, no lock
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            // 3. the table is resizing: help with the transfer
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                // 4. lock only the head node of this bin
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                // ... remainder of the method omitted

The put of ConcurrentHashMap thus narrows the lock down to the linked list (or red-black tree) in one slot of the array. A simple example: with a Hashtable holding 100 entries, put is synchronized on the whole table, so every writing thread must wait for the previous one no matter which entry it touches. With ConcurrentHashMap, if the data is split evenly across 10 bins of 10 entries each, a write first locates its bin and then locks only that bin's list of 10 entries, cutting the lock granularity to 1/10 and greatly improving performance.

2.3 Lock separation

Locks can be split by the nature of the synchronized operation into read locks and write locks. Read locks are not mutually exclusive with each other, which improves concurrency. This one is simple, as the table below shows:

              Read lock     Write lock
Read lock     accessible    inaccessible
Write lock    inaccessible  inaccessible

The read-write separation idea extends further: locks can be separated whenever the operations do not affect each other. A typical example is LinkedBlockingQueue, where take and put are naturally independent: data is written at the tail and taken from the head, and each end holds its own separate lock, as the sketch below shows.
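The real LinkedBlockingQueue is more involved; below is a simplified, hand-written skeleton of the same two-lock pattern (TwoLockQueue and its members are my names, not the JDK source). It omits capacity bounds and the notFull condition:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class TwoLockQueue<E> {
    static class Node<E> { E item; Node<E> next; Node(E e) { item = e; } }

    private Node<E> head = new Node<>(null);   // sentinel; head.next is first element
    private Node<E> tail = head;
    private final AtomicInteger count = new AtomicInteger();

    private final ReentrantLock putLock = new ReentrantLock();
    private final ReentrantLock takeLock = new ReentrantLock();
    private final Condition notEmpty = takeLock.newCondition();

    public void put(E e) {
        int c;
        putLock.lock();                        // only writers contend here
        try {
            tail.next = new Node<>(e);
            tail = tail.next;
            c = count.getAndIncrement();       // old count
        } finally {
            putLock.unlock();
        }
        if (c == 0) {                          // queue was empty: wake a taker
            takeLock.lock();
            try { notEmpty.signal(); } finally { takeLock.unlock(); }
        }
    }

    public E take() throws InterruptedException {
        E item;
        takeLock.lock();                       // only readers contend here
        try {
            while (count.get() == 0) notEmpty.await();
            Node<E> first = head.next;         // unlink the first real node
            head = first;
            item = first.item;
            int c = count.getAndDecrement();   // old count
            if (c > 1) notEmpty.signal();      // more left: wake another taker
        } finally {
            takeLock.unlock();
        }
        return item;
    }
}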

2.4 Lock coarsening

To keep threads running concurrently, each thread should hold a lock for as short a time as possible and release it as soon as its block finishes, so subsequent threads can get the resource quickly. But every lock request, synchronization, and release consumes system resources itself, which works against performance, much as frequently creating objects does in Java. So when a lock is acquired and released over and over in quick succession, the JVM coarsens it, merging the repeated lock operations into one larger scope:

// Before coarsening: the lock is acquired and released on every iteration
for (int i = 0; i < 100; i++) {
    synchronized (this) {
        // ...
    }
}
// After coarsening: one lock around the whole loop
synchronized (this) {
    for (int i = 0; i < 100; i++) {
        // ...
    }
}

// Adjacent synchronized blocks are merged the same way:
synchronized (this) { A(); }
synchronized (this) { B(); }
// becomes
synchronized (this) { A(); B(); }

2.5 Lock elimination

Lock elimination is something the compiler does: at compile time it removes locks on objects that cannot possibly be shared. Sometimes locking is performed on code that can never actually be contended, often without us realizing it, because the locks are not written by us; they live inside JDK classes that get pulled in when we use them, so we pay for synchronization even where no two threads could ever need it. Under the right conditions, the VM eliminates these locks.

For example, StringBuffer is a thread-safe string-manipulation class:

 public static String createStringBuffer(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb.toString();
    }

Written this way, the method returns a plain String; the StringBuffer never escapes the method, so no outside caller can ever synchronize on it, and its internal locking is meaningless. The situation changes only if we write:

 public static StringBuffer createStringBuffer(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb;
    }

Now the StringBuffer itself is returned, so multiple threads outside may keep calling append on it, and that is where the lock becomes necessary. Lock elimination is configured through JVM parameters, and of course only in server mode:

-server -XX:+DoEscapeAnalysis -XX:+EliminateLocks

3. Lock optimization policies for VMS

Lock upgrade process: lock-free -> biased lock -> lightweight lock -> heavyweight lock

Lock optimization was introduced in JDK 1.6. Before 1.6, synchronized was simply a heavyweight lock, implemented underneath with the operating system's mutex.

3.1 Lock-free

Lock-free means no resource is locked: all threads can reach and try to modify the same resource, but only one thread's modification succeeds at a time.

The implementation principle of lock-free is the CAS described earlier. Threads keep trying to modify the shared resource: if there is no conflict, the modification succeeds; if there is, a for(;;) loop retries it. When many threads modify the same resource, one of them is fastest and succeeds first, and the ones that failed keep retrying in the loop until they too succeed. A sketch of this pattern follows.
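Here is a minimal sketch of that read-compute-CAS-retry pattern, tracking a running maximum with AtomicLong; the class name LockFreeMax is mine:

import java.util.concurrent.atomic.AtomicLong;

public class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    public void submit(long value) {
        for (;;) {                                      // spin until success or no-op
            long current = max.get();
            if (value <= current) return;               // nothing to do
            if (max.compareAndSet(current, value)) return;  // our CAS won
            // CAS lost: another thread updated max first; loop and retry
        }
    }

    public long get() {
        return max.get();
    }
}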

3.2 Biased locking

Biased locking targets a synchronized block that is held by the same thread over a long stretch: that thread gets to reacquire the lock automatically, cutting the cost of acquiring it, much like a regular at a restaurant who no longer needs to order because the owner knows to bring "the usual".

As seen previously, the object header's Mark Word has a 25-bit region, 23 bits of which store the ID of this "regular customer" thread. The thread then no longer locks and unlocks via CAS when entering and leaving the synchronized block; it merely checks whether the Mark Word stores a bias toward the current thread.

The purpose of biased locking is to avoid unnecessary lightweight-lock work when there is no multi-thread contention: acquiring and releasing a lightweight lock takes multiple CAS atomic instructions, whereas a biased lock needs only a single CAS when the ThreadID is installed.

If the ThreadID does not match, there is multi-thread contention and the lock can no longer be biased toward one thread, so it inflates into a lightweight lock to ensure fair competition between threads.

Biased locking is enabled by default in JDK 6 and later JVMs. It can be turned off with the JVM argument -XX:-UseBiasedLocking, after which the program enters the lightweight-lock state by default.

3.3 Lightweight Lock

When a biased lock is accessed by more than one thread, it is upgraded to a lightweight lock; the other threads try to acquire it by spinning rather than blocking, which improves performance.

When a thread enters a synchronized block and the object is lock-free (flag bits "01", biased-lock bit "0"), the VM first creates a Lock Record space in the current thread's stack frame to hold a copy of the lock object's current Mark Word, then copies the Mark Word from the object header into that record.

After the copy succeeds, the VM uses a CAS atomic instruction to update the object's Mark Word to a pointer to the Lock Record, and points the owner field in the Lock Record at the object's Mark Word.

If the update succeeds, the thread owns the object's lock, and the object's Mark Word lock bits are set to "00", meaning the object is in the lightweight-locked state. If the update fails, the VM first checks whether the object's Mark Word already points into the current thread's stack frame: if so, the current thread already holds the lock and can enter the block directly; otherwise, multiple threads are competing for the lock.

If there is currently just one waiting thread, it waits by spinning. But once the spinning exceeds a certain count, or one thread holds the lock, a second is spinning, and a third arrives, the lightweight lock inflates to a heavyweight lock.

3.4 Spin lock

When a thread fails to acquire a contended lock, it is not suspended immediately; instead it spins a few times and then tries to acquire the lock again.

Blocking or waking up a Java thread requires the operating system to switch CPU state, which takes processor time. If synchronizing the contents of a code block is too simple, state transitions can take longer than user code execution.

So when a thread loses the competition for a lock, it is not suspended right away: it performs a few spins (empty operations) and tries again. If the previous holder has finished by then, this thread gets the lock without ever being suspended, saving the suspend/resume switch and improving system performance.

If contention is heavy, however, and the lock still cannot be obtained after several spins, the spin lock inflates into a heavyweight lock, which keeps the wasted spinning from dragging down overall system performance.

  • In JDK 1.6, spinning is controlled by -XX:+UseSpinning and can be switched on or off
  • In JDK 1.7 the parameter was removed and spinning became a built-in, adaptive behavior
  • If the synchronized block runs long, spins tend to fail, which degrades system performance
  • If the synchronized block is very short, spins tend to succeed, saving the thread suspend/resume time and improving system performance

A user-level sketch of the spinning idea follows.
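As a rough illustration of what spinning means, here is a hand-rolled user-level spin lock (SpinLock is a hypothetical class of mine; the JVM's own spinning for synchronized happens inside HotSpot, not in Java code like this):

import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // Busy-wait (spin) instead of parking the thread
        while (!owner.compareAndSet(null, current)) {
            Thread.onSpinWait();   // JDK 9+: hint to the CPU that we are spinning
        }
    }

    public void unlock() {
        // Only the owner may release the lock
        owner.compareAndSet(Thread.currentThread(), null);
    }
}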

3.5 Heavyweight Locks

At this stage the Mark Word stores a pointer to a heavyweight lock (monitor), and threads waiting for the lock actually block. This is the traditional synchronized behavior.

So the overall lock-upgrade flow is: 1. use a biased lock if possible; 2. when the biased lock sees contention, upgrade to a lightweight lock; 3. when competition for the lightweight lock fails, try spinning; 4. when the spin attempts also fail, upgrade to a heavyweight lock that suspends threads at the operating-system level.

Here’s a picture from Meituan’s article to summarize

Conclusion

This article is a set of notes summarized from many articles I read while learning about locks. Everyone writes in their own style, and summarizing things myself makes them more familiar and easier to review. Given the length and my own level, there are bound to be mistakes; please point them out.

References:

  • Java CAS algorithm for optimistic locking (Compare and Swap)
  • Java concurrency: optimistic locks, pessimistic locks, and CAS as one implementation of optimistic locking
  • Java High Concurrency in Practice (9): lock optimization and notes
  • Java Efficient Concurrency (4): the Java "lock" story that must be told (Meituan)