Locking is one of the most common synchronization methods. In a high-concurrency environment, intense lock contention can lead to performance degradation.

A single-task or single-threaded application spends its resources mainly on the task itself: it does not need to keep parallel data structures consistent, nor spend time on thread switching and scheduling. A multi-threaded application, besides the functional work, must also maintain information specific to the multi-threaded environment, such as per-thread metadata, thread scheduling, and context switches. Parallel computing improves system performance not because it “does less work”, but because it can schedule tasks more sensibly and make full use of every CPU.

How to improve lock performance

Reduce lock holding time

For applications that use locks for concurrency control, the time a single thread holds a lock is directly related to system performance under contention: the longer a lock is held, the more intense the competition for it.

To put it simply: suppose 100 people have to fill out a form, but there is only one pen. If each person only starts figuring out what to write after picking up the pen, everyone holds the pen for a long time and the total time grows accordingly.

Therefore, reducing the time a lock is held reduces mutual exclusion between threads. Take the following code as an example:

public synchronized void synMethod() {
    method1();
    mainMethod();
    method2();
}

Suppose only mainMethod() actually needs synchronization, while method1() and method2() do not. Because the whole method is declared synchronized, every thread holds the lock for the entire call, which lengthens the program's total execution time. We can optimize it as follows:

public void synMethod() {
    method1();
    synchronized(this) {
        mainMethod();
    }
    method2();
}

Now only the call to mainMethod() is synchronized, so the lock is held for a relatively short time. A shorter holding time reduces the likelihood of lock conflicts and improves the concurrency of the system.

Reduce lock granularity

Reducing lock granularity is another effective way to weaken contention between threads. A classic use of this technique is the segmented implementation of ConcurrentHashMap (JDK 7 and earlier). The traditional Hashtable is thread-safe because it locks the entire table in every method. ConcurrentHashMap performs much better because internally it is subdivided into several smaller hash tables, called segments. By default a ConcurrentHashMap has 16 segments, which in the ideal case allows up to a 16-fold improvement in concurrent performance.

To add an entry to a ConcurrentHashMap, we do not lock the whole map. We first use the key's hashCode to determine which segment the entry belongs to, lock only that segment, and then perform the put(). When multiple threads perform put() operations, they can run in parallel as long as they do not hit the same segment.

However, reducing lock granularity introduces a new problem: acquiring a global view becomes more expensive. For example, computing size() may require locking all of the segments. Under high concurrency, ConcurrentHashMap's size() can still perform worse than a fully synchronized HashMap's, even though size() first attempts a lock-free summation and only resorts to locking if that fails.

Reducing the lock granularity means narrowing the scope of the locked object, thereby reducing the chance of lock conflicts and improving the system's capacity for concurrency.
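
To make the idea concrete, here is a minimal lock-striping sketch in the spirit of the segment design. It is an illustration of the technique, not ConcurrentHashMap's actual code; the class and method names are invented.

import java.util.HashMap;

// A toy "segmented" map: each stripe has its own table and lock,
// so put() calls that hash to different stripes run in parallel.
public class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    private final HashMap<K, V>[] tables;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        tables = (HashMap<K, V>[]) new HashMap[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            tables[i] = new HashMap<>();
        }
    }

    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    public V put(K key, V value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {        // lock one stripe, not the whole map
            return tables[i].put(key, value);
        }
    }

    public int size() {                  // the global-view problem:
        int total = 0;                   // size() must visit every stripe
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) {
                total += tables[i].size();
            }
        }
        return total;
    }
}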

Replace the exclusive lock with a read-write lock

A read-write lock can effectively reduce lock contention and improve system performance. Suppose threads A1, A2, and A3 perform write operations while threads B1, B2, and B3 perform read operations. With a reentrant lock (ReentrantLock) or an intrinsic (synchronized) lock, read-read, read-write, and write-write accesses are all serialized. But since concurrent reads do not compromise data integrity, making readers wait for each other is unreasonable.

Therefore, you can use the read-write lock ReadWriteLock to improve system performance. For example:
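
A minimal sketch, assuming a simple string cache guarded by ReentrantReadWriteLock (the class and its fields are illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Many readers may hold the read lock at once; the write lock is exclusive.
public class ReadWriteCache {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();          // shared: B1, B2, B3 read in parallel
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();         // exclusive: A1, A2, A3 serialize here
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}

Here B1, B2, and B3 can hold the read lock at the same time, while A1, A2, and A3 still serialize on the write lock.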

 

Lock coarsening

In general, to keep concurrency between threads effective, each thread should hold a lock for as short a time as possible and release it immediately after using the shared resource, so that other threads waiting on the lock can obtain it and proceed as soon as possible. However, if the same lock is requested and released repeatedly in quick succession, those lock operations themselves consume resources, and the virtual machine may coarsen the adjacent synchronized regions into a single, wider one.

Before coarsening:

public void synMethod() {
    synchronized(this) {
        method1();
    }
    synchronized(this) {
        method2();
    }
}

After the optimization:

public void synMethod() {
    synchronized(this) {
        method1();
        method2();
    }
}

In particular, watch for lock requests made inside a loop; they are prime candidates for coarsening.

Before coarsening:

public void synMethod() {
    for (int i = 1; i < n; i++) {
        synchronized(lock) {
            // do sth ...
        }
    }
}

After the optimization:

synchronized(lock) {
    for (int i = 1; i < n; i++) {
        // do sth ...
    }
}

Lock optimization by the JVM

 

Biased locking

Biased locking is an optimization of lock operations. The main idea: once a thread acquires a lock, the lock enters biased mode. When the same thread requests the lock again, it needs no further synchronization, which saves a large number of lock-request operations.

However, this works poorly under heavy contention, because then a different thread is likely to request the lock each time, so the bias mode keeps failing; in that case it is better not to enable biased locking at all. Biased locking can be enabled with the JVM parameter -XX:+UseBiasedLocking.

Lightweight lock

If biased locking fails, the virtual machine does not immediately suspend the thread; it first tries an optimization called lightweight locking. A lightweight lock works simply: the object header is made to point to a lock record in the stack of the thread that holds the lock, which is enough to determine whether a thread owns the object's lock. If the thread acquires the lightweight lock successfully, it enters the critical section. If it fails, another thread grabbed the lock first, and the current thread's lock request inflates into a heavyweight lock.

Spin locks

After lock inflation begins, the virtual machine makes one last effort to avoid actually suspending the thread at the operating-system level: spin locking. The current thread cannot acquire the lock for the moment, but suspending it outright may not be worth the cost, so the virtual machine lets the thread run a few empty loops; if the lock becomes available within those spins, the thread enters the critical section successfully.

Heavyweight lock

If the spins also fail to acquire the lock, the thread is actually suspended at the operating-system level, and the lock is upgraded to a heavyweight lock.

Lock elimination

When the Java virtual machine JIT-compiles code, it scans the runtime context and removes locks on objects that cannot possibly be contended. Lock elimination saves meaningless lock requests. For example:

public String[] createArrays() {
    Vector<String> vector = new Vector<>();
    for (int i = 1; i < 100; i++) {
        vector.add(Integer.toString(i));
    }
    return vector.toArray(new String[]{});
}

In the code above, vector is a local variable defined inside createArrays(); it is allocated on the thread's stack and is thread-private, so there can be no contention for it. All the lock synchronization inside Vector is therefore unnecessary, and if the virtual machine detects this, it removes those locks.

A key technique behind lock elimination is escape analysis, which observes whether a variable escapes a given scope. In the example above, vector never escapes createArrays(), so the virtual machine can remove its locks. If createArrays() returned the vector itself instead of a String array, the vector would be considered to have escaped the method and could be accessed by other threads, so the locks could not be eliminated. For example:

public Vector<Integer> createList() {
    Vector<Integer> vector = new Vector<>();
    for (int i = 1; i < 100; i++) {
        vector.add(i);
    }
    return vector;
}

ThreadLocal

Besides controlling access to a shared resource, we can also add resources so that every thread works on its own object, which keeps things thread-safe. To put it simply: if 100 people need to fill out a form, we can hand out 100 pens, one per person, and the filling speed increases greatly.

 
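A minimal sketch of the intended code, assuming many threads share one SimpleDateFormat (the class name, pool size, and date string are illustrative):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// SimpleDateFormat keeps mutable state while parsing, so concurrent
// parse() calls corrupt each other.
public class UnsafeDateParse {
    private static final SimpleDateFormat sdf =
        new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 1000; i++) {
            es.submit(() -> {
                try {
                    sdf.parse("2024-01-01 10:00:00");
                } catch (ParseException | NumberFormatException e) {
                    e.printStackTrace();   // fails intermittently
                }
            });
        }
        es.shutdown();
    }
}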

Without synchronization, the code above throws exceptions such as java.lang.NumberFormatException: multiple points and java.lang.NumberFormatException: For input string: "", because SimpleDateFormat is not thread-safe unless it is locked. An alternative to locking is ThreadLocal, which assigns each thread its own SimpleDateFormat.

 
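A minimal sketch of the ThreadLocal fix under the same assumptions: each thread lazily creates its own SimpleDateFormat through ThreadLocal.withInitial(), so no instance is ever shared.

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One SimpleDateFormat per thread: no shared mutable state, no lock.
public class ThreadLocalDateParse {
    private static final ThreadLocal<SimpleDateFormat> tl =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 1000; i++) {
            es.submit(() -> {
                try {
                    tl.get().parse("2024-01-01 10:00:00");
                } catch (ParseException e) {
                    e.printStackTrace();   // no longer occurs
                }
            });
        }
        es.shutdown();
    }
}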

Assigning a different object to each thread is done at the application level; ThreadLocal itself merely acts as a simple container.

How ThreadLocal works

The set() method:

public void set(T value) {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        map.set(this, value);
    else
        createMap(t, value);
}

We first get the current thread object, then use getMap() to obtain that thread's ThreadLocalMap and store the value into it. ThreadLocalMap can be thought of simply as a Map whose key is the ThreadLocal object itself (this) and whose value is the data to be stored. Since every thread owns its own map, the data stays thread-private.

The get() method:

public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T) e.value;
            return result;
        }
    }
    return setInitialValue();
}

It obtains the current thread's ThreadLocalMap, then looks up the actual data using the ThreadLocal itself as the key.

If you want objects to be reclaimed promptly, call ThreadLocal.remove() to delete the variable. Otherwise, large objects that are set in a ThreadLocal and never removed may cause a memory leak.
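
A minimal sketch of the cleanup pattern (the buffer and its size are illustrative). This matters most in thread pools, where worker threads, and therefore their ThreadLocalMaps, live for a long time:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// In a pooled thread, a ThreadLocal value stays reachable until remove().
public class ThreadLocalCleanup {
    private static final ThreadLocal<byte[]> buffer =
        ThreadLocal.withInitial(() -> new byte[1024 * 1024]);

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(2);
        es.submit(() -> {
            try {
                byte[] b = buffer.get();   // 1 MB private to this worker
                // ... use b ...
            } finally {
                buffer.remove();           // let the large object be GC'd
            }
        });
        es.shutdown();
    }
}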

Lock-free

Concurrency strategies can be pessimistic or optimistic. Locking is pessimistic, while going without locks is an optimistic strategy: it uses a technique called Compare And Swap (CAS) to detect conflicts between threads. When a conflict is detected, the current operation is retried until it succeeds.

Compare and swap (CAS)

The CAS algorithm takes three operands, CAS(V, E, N): V is the variable to update, E is the expected value, and N is the new value. V is set to N only if the current value of V equals E; in any case, the actual current value of V is returned. When multiple threads use CAS on the same variable simultaneously, only one wins and updates it; all the others fail. A failing thread is not suspended: it is simply told that it lost and may try again, or abort the operation.
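
A minimal sketch of the retry protocol built on CAS, using the standard AtomicInteger.compareAndSet() idiom (the counter class itself is illustrative):

import java.util.concurrent.atomic.AtomicInteger;

// Lock-free increment: read the current value (E), compute the new
// value (N), and CAS; if another thread changed V in between, the
// CAS fails and we simply retry.
public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        for (;;) {
            int expected = value.get();             // E: what we think V is
            int next = expected + 1;                // N: the value we want
            if (value.compareAndSet(expected, next)) {
                return next;                        // CAS won: V was E, now N
            }
            // CAS lost: another thread updated V first; loop and retry
        }
    }
}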

Thread-safe integer (AtomicInteger)

AtomicInteger lives in the java.util.concurrent.atomic package. It can be thought of as an Integer, except that, unlike Integer, it is mutable and thread-safe: any modification is carried out with CAS instructions. Here are the common methods of AtomicInteger:

public final int get()                                 // get the current value
public final void set(int newValue)                    // set the value
public final int getAndSet(int newValue)               // set newValue and return the old value
public final boolean compareAndSet(int expect, int u)  // if the current value is expect, set it to u
public final int getAndIncrement()                     // increment and return the old value
public final int getAndDecrement()                     // decrement and return the old value
public final int getAndAdd(int delta)                  // add delta and return the old value
public final int incrementAndGet()                     // increment and return the new value
public final int decrementAndGet()                     // decrement and return the new value
public final int addAndGet(int delta)                  // add delta and return the new value

Internally, AtomicInteger holds one core field:

private volatile int value;

Example:

 
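A minimal sketch (thread count and iteration count are illustrative): ten threads each increment a shared AtomicInteger 10,000 times, and the result is always exactly 100,000.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Concurrent increments on AtomicInteger lose no updates.
public class AtomicIntegerDemo {
    private static final AtomicInteger count = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        ExecutorService es = Executors.newFixedThreadPool(10);
        for (int t = 0; t < 10; t++) {
            es.submit(() -> {
                for (int i = 0; i < 10000; i++) {
                    count.incrementAndGet();     // CAS-based, no lock
                }
            });
        }
        es.shutdown();
        es.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(count.get());         // always 100000
    }
}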

As you can see, AtomicInteger is thread-safe in the case of multiple threads.

Lock-free object reference (AtomicReference)

AtomicReference is very similar to AtomicInteger, except that AtomicInteger wraps an integer while AtomicReference wraps an ordinary object reference; that is, it lets you modify an object reference in a thread-safe way.

In general, a thread decides whether its write can proceed by checking that the object's current value matches the expected value. There is, however, a special case: before the thread writes its new value, another thread changes the value twice, with the second change restoring the original value. The first thread then has no way to tell that the value was ever modified, and its update succeeds anyway. Consider the following scenario:

 
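A minimal code sketch of the scenario (the values are illustrative): another thread changes the reference from 100 to 50 and back to 100, and the stale compareAndSet still succeeds.

import java.util.concurrent.atomic.AtomicReference;

// The reference goes 100 -> 50 -> 100 behind the first thread's back,
// yet compareAndSet(100, 99) still succeeds: the ABA problem.
public class AbaDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Integer> ref = new AtomicReference<>(100);

        Thread other = new Thread(() -> {
            ref.compareAndSet(100, 50);   // first change: 100 -> 50
            ref.compareAndSet(50, 100);   // second change: back to 100
        });
        other.start();
        other.join();

        // This thread cannot tell the value was modified twice in between.
        System.out.println(ref.compareAndSet(100, 99));   // true
    }
}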

Object reference with a timestamp (AtomicStampedReference)

An AtomicReference cannot solve this because the object's modification history is lost along the way. If we additionally record a state value that changes with every modification, a thread can correctly detect that the object was modified even when it has been changed back to its original value.

An AtomicStampedReference maintains both an object reference and a stamp. Whenever the reference is modified, the stamp must be updated as well, and a compareAndSet succeeds only if both the reference and the stamp match their expected values. So even if the reference is changed back to its original value, the altered stamp exposes the intermediate modifications.
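
A minimal sketch of the same scenario fixed with AtomicStampedReference (values again illustrative): every update advances the stamp, so the stale compareAndSet is rejected.

import java.util.concurrent.atomic.AtomicStampedReference;

// Same 100 -> 50 -> 100 sequence, but each update also bumps the
// stamp, so the out-of-date CAS is detected and rejected.
public class StampedDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicStampedReference<Integer> ref =
            new AtomicStampedReference<>(100, 0);

        int stamp = ref.getStamp();               // the stamp we expect: 0

        Thread other = new Thread(() -> {
            int s = ref.getStamp();
            ref.compareAndSet(100, 50, s, s + 1);       // 100 -> 50
            ref.compareAndSet(50, 100, s + 1, s + 2);   // back to 100
        });
        other.start();
        other.join();

        // The value is 100 again, but the stamp is now 2, not 0.
        System.out.println(ref.compareAndSet(100, 99, stamp, stamp + 1));   // false
    }
}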