A single thread, once scheduled onto the CPU, is almost never scheduled out. But when the number of runnable threads far exceeds the number of CPUs, the operating system eventually schedules a running thread out so that other threads can use the CPU, which causes context switches.
When threads contend for a lock, a thread blocked waiting for the contended lock is typically suspended by the JVM and swapped out. Even CPU-intensive programs incur more context switches if they block frequently.
Multithreading is necessary in some scenarios, but the performance overhead it adds is also real: it introduces context switching into the system. So how can we optimize the context switching in multithreaded programs?
Optimize lock contention

Competition among multiple threads for a lock causes context switches, and the more threads blocked by lock contention, the more frequent the context switches and the higher the system's performance cost. In multithreaded programming, then, locks themselves are not the source of performance overhead; contended locks are.
- Reduce lock holding time
The longer a lock is held, the more threads are left waiting for the contended resource to be released. Synchronized lock resources not only introduce context switches between threads but may also add context switches between processes.
You can move code unrelated to the lock out of the synchronized block, especially operations that are expensive or may block, as the sketch below shows.
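A minimal sketch of this idea (the class and its expensiveTransform helper are hypothetical): the costly, lock-independent work is hoisted out of the critical section so the lock is held only for the shared-state update.

```java
public class HoldTimeDemo {
    private final Object lock = new Object();
    private long total;

    // Before: the expensive call runs while the lock is held.
    public void slowAdd(String input) {
        synchronized (lock) {
            String processed = expensiveTransform(input); // needs no shared state
            total += processed.length();
        }
    }

    // After: only the shared-state update is guarded, shortening lock hold time.
    public void fastAdd(String input) {
        String processed = expensiveTransform(input); // moved out of the critical section
        synchronized (lock) {
            total += processed.length();
        }
    }

    private String expensiveTransform(String s) {
        return s.trim().toLowerCase(); // stand-in for costly, lock-independent work
    }
}
```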
- Reduce the granularity of locks
A synchronized lock guarantees the atomicity of the object it guards, so we can consider splitting the lock into finer granularity to avoid having all threads compete for a single lock resource. There are two concrete techniques:
Lock separation. Unlike a traditional exclusive lock, a read-write lock separates locking into two locks: a read lock and a write lock. The rule is that the read lock can be shared, while the write lock is exclusive.
The benefit is that under multithreaded reading, read-read access is not mutually exclusive, while read-write and write-write access are. A traditional exclusive lock makes no distinction between reads and writes, so read-read, read-write, and write-write are all mutually exclusive. Therefore, in scenarios where reads far outnumber writes, lock separation avoids resource contention under highly concurrent reads and thereby avoids context switches.
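A minimal sketch of lock separation using the JDK's ReentrantReadWriteLock (the cache class itself is hypothetical): concurrent readers share the read lock, while a writer takes the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyCache {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Map<String, String> map = new HashMap<>();

    public String get(String key) {
        rwLock.readLock().lock();   // shared: many readers can hold this at once
        try {
            return map.get(key);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        rwLock.writeLock().lock();  // exclusive: blocks all readers and writers
        try {
            map.put(key, value);
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}
```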
Lock segmentation. When we use a lock to guarantee the atomicity of a collection or a large object, we can consider splitting the lock object further. For example, versions of ConcurrentHashMap prior to Java 1.8 used lock segmentation.
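The idea can be sketched with a hypothetical striped counter, similar in spirit to the Segment design of pre-1.8 ConcurrentHashMap: each stripe has its own lock, so threads touching different stripes never contend.

```java
public class StripedCounter {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    private final long[] counts = new long[STRIPES];

    public StripedCounter() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    public void increment(int key) {
        int stripe = (key & 0x7fffffff) % STRIPES; // map the key to one stripe
        synchronized (locks[stripe]) {             // contend only within that stripe
            counts[stripe]++;
        }
    }

    public long total() {
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }
}
```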
- Replace contended locks with non-blocking optimistic locking
The volatile keyword guarantees visibility and ordering. Volatile reads and writes do not cause context switches, so their cost is low. However, volatile cannot guarantee the atomicity of compound operations on a variable, because it provides no mutual exclusion.
CAS (compare-and-swap), by contrast, is an atomic compare-then-act operation: a lock-free algorithm that keeps reads and writes of a shared variable consistent. A CAS operation has three operands: the memory value V, the expected old value A, and the new value B. If and only if V equals A, V is set to B; otherwise nothing is done, and either way the CAS does not cause a context switch. Java's atomic classes (java.util.concurrent.atomic) use the CAS algorithm to update data without additional locking.
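A minimal sketch of the CAS retry loop using the JDK's AtomicInteger (the counter class is hypothetical); compareAndSet performs the "if V equals A, set V to B" step atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Lock-free increment: retry the CAS until no other thread interferes.
    public int increment() {
        for (;;) {
            int current = value.get();  // expected old value A (V read from memory)
            int next = current + 1;     // new value B
            if (value.compareAndSet(current, next)) { // if V == A, set V = B atomically
                return next;
            }
            // CAS failed: another thread changed V; loop and retry without blocking.
        }
    }
}
```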
Since JDK 1.6, the JVM has graded synchronized locks into biased, lightweight, and heavyweight locks, escalating along that path. The JIT compiler further optimizes synchronized blocks during dynamic compilation through lock elimination and lock coarsening.
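A classic illustration of both JIT optimizations (the class is hypothetical): StringBuffer.append() is synchronized, so adjacent appends are candidates for coarsening into a single lock/unlock pair, and when escape analysis proves the buffer never leaves the thread, the lock can be eliminated entirely.

```java
public class JitLockDemo {
    // The buffer is local and never escapes this method, so escape analysis
    // lets the JIT remove the locking inside the synchronized append() calls.
    public String build() {
        StringBuffer local = new StringBuffer();
        local.append("a");  // three adjacent lock/unlock pairs...
        local.append("b");  // ...can be coarsened into one...
        local.append("c");  // ...or eliminated entirely via escape analysis.
        return local.toString();
    }
}
```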
In Java, threads can communicate by combining an object's wait() method with its notify() or notifyAll() method. A thread that calls wait() blocks until another thread calls notify() or notifyAll() on the same object; the notified threads then return from wait().
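As a concrete reference for the analysis below, here is a minimal producer-consumer sketch (the Shelf class and its stock field are hypothetical):

```java
public class Shelf {
    private final Object lock = new Object();
    private int stock;

    public void consume() throws InterruptedException {
        synchronized (lock) {
            while (stock == 0) {
                lock.wait();      // release the lock and block: a context switch
            }
            stock--;
        }
    }

    public void produce() {
        synchronized (lock) {
            stock++;
            lock.notifyAll();     // wake every blocked consumer: more switches
        }
    }
}
```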
When the consumer first acquires the lock and finds nothing in stock, it executes Object.wait(): the thread suspends and blocks, causing a context switch.
When the producer acquires the lock and executes notifyAll() to wake the blocked consumer threads, another context switch occurs.
The awakened thread must re-acquire the corresponding object's monitor lock before it can continue, and it may have to compete for that lock with other newly active threads, which can cause further context switches.
If multiple consumer threads are blocked at the same time, notifyAll() wakes all of them. Consumers woken prematurely for items that are still out of stock simply block again, causing unnecessary context switches.
First, in consumer scenarios where waking a single thread suffices, we can use Object.notify() instead of Object.notifyAll(). Because Object.notify() wakes only one waiting thread, it avoids prematurely waking blocked threads whose condition is still unmet, reducing context switches.
Second, after the producer calls Object.notify()/notifyAll() to wake other threads, it should release the monitor lock as soon as possible rather than holding it for lengthy processing. That way, the awakened threads do not have to wait long for the lock when they re-apply for the object's monitor.
Finally, to avoid waiting indefinitely, we often use Object.wait(long) to set a wait timeout. But the thread cannot tell whether wait(long) returned because of a timeout or because it was notified, so it must try to acquire the lock again either way, adding a context switch. I recommend replacing synchronized monitor locks and wait/notify with the Lock and Condition interfaces. This not only solves the problem that Object.wait(long) cannot distinguish the two cases (a timed Condition.await reports whether it timed out), it also solves the problem of threads being woken too early.
The await(), signal(), and signalAll() methods defined by the Condition interface are equivalent to Object.wait(), Object.notify(), and Object.notifyAll(), respectively.
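A minimal sketch of a bounded buffer built on Lock and Condition (the class is hypothetical). Separate notFull/notEmpty conditions avoid waking the wrong threads, and the timed await makes a timeout distinguishable from a signal, unlike Object.wait(long).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer<T> {
    private final Lock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();  // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;

    public BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();   // analogous to Object.wait()
            }
            items.add(item);
            notEmpty.signal();     // wake one consumer only, not every waiter
        } finally {
            lock.unlock();
        }
    }

    public T take(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (items.isEmpty()) {
                if (nanos <= 0L) {
                    return null;   // timed out: distinguishable from a signal
                }
                nanos = notEmpty.awaitNanos(nanos); // remaining wait time
            }
            T item = items.poll();
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```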
Do not set a thread pool's thread count too large, because if the total number of worker threads in the pool exceeds the number of processors on the system, the result is excessive context switching.
Some thread pool factory methods do not expose the thread count directly. For instance, Executors.newCachedThreadPool() creates a pool that reuses idle internal threads to handle submitted tasks and creates a new thread whenever none is free (bounded only by Integer.MAX_VALUE). If such a pool meets a workload of large, time-consuming tasks, it can create too many worker threads, leading to frequent context switches. This type of pool is therefore suitable only for large numbers of short-lived, non-blocking tasks.
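A minimal sketch of explicit sizing (the class name is hypothetical): for CPU-bound work, bounding the pool at the processor count keeps the number of runnable threads close to the number of CPUs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizingDemo {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Bounded pool, unlike newCachedThreadPool(): worker count never exceeds cores.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < 100; i++) {
            final int task = i;
            pool.submit(() ->
                    System.out.println("task " + task + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown(); // stop accepting tasks; let queued ones drain
    }
}
```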
Use coroutines to implement non-blocking waits

A coroutine is lighter-weight than a thread. Unlike processes and threads, which are managed by the operating system kernel, coroutines are controlled entirely by the program itself and execute in user mode. Coroutines bring significant performance gains by avoiding the context switches that thread switching incurs.
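Java itself gained a comparable facility only with virtual threads (Project Loom, standard since Java 21). A minimal sketch assuming Java 21+: a blocking sleep parks the virtual thread in user mode instead of descheduling an OS thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // Each submitted task runs on its own virtual thread, scheduled in
        // user mode by the JVM rather than by the operating system kernel.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // parks the virtual thread, not an OS thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for the submitted tasks to complete
    }
}
```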
Reduce Java Virtual Machine garbage collection

Many JVM garbage collectors (e.g., the Serial and ParNew collectors) produce memory fragmentation when reclaiming old objects and therefore need to compact memory, moving surviving objects in the process. Moving an object means its memory address changes, so threads must be paused before the object is moved and woken again once the move completes. Reducing the frequency of JVM garbage collection can thus effectively reduce context switching.