
CPU-intensive

CPU-intensive (also called compute-intensive) means that the system's hard disk and memory perform much better than its CPU. In this situation the CPU runs at close to 100% load: I/O can be completed in a very short time, while the CPU still has a large amount of computation to process, so CPU load stays high.

For example, a program that does nothing but run incrementing calculations tends to have very high CPU usage. This may be because the task itself needs little access to I/O devices, or because the program is multithreaded and the time spent waiting for I/O is therefore masked.

For CPU-intensive applications, set the thread pool size to N+1. For computationally intensive tasks, a system with N processors usually achieves optimal utilization with a thread pool of N+1 threads: the extra thread ensures that CPU clock cycles are not wasted even when a compute-intensive thread is occasionally suspended because of a page fault or for some other reason. (Java Concurrency in Practice)

So let's write a CPU-intensive demo and try out the results with different numbers of threads.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SrcTest {
    // Thread pool size to test; vary this (1, 2, 4, 8, 14, 20) between runs
    static int threadNum = 20;
    final static int taskNum = 200;
    static ExecutorService executorService = Executors.newFixedThreadPool(threadNum);
    static CountDownLatch endGate = new CountDownLatch(taskNum);

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();

        for (int i = 0; i < taskNum; i++) {
            executorService.submit(cpuCal());
        }
        endGate.await();
        long end = System.currentTimeMillis();

        System.out.println("Thread pool size: " + threadNum
                + "; CPU cores: " + Runtime.getRuntime().availableProcessors()
                + "; Total time: " + (end - start));
        executorService.shutdown();
    }

    public static Runnable cpuCal() {
        return () -> {
            long start = System.currentTimeMillis();
            // An arbitrary CPU-bound loop: build a short string 10,000,000 times
            for (int i = 0; i < 10000000; i++) {
                StringBuffer sb = new StringBuffer();
                sb.append(i).append(",");
            }
            long end = System.currentTimeMillis();
            System.out.println("Thread ID: " + Thread.currentThread().getId()
                    + "; Elapsed time: " + (end - start));
            endGate.countDown();
        };
    }
}

Execution results

threadNum=1

threadNum=2

threadNum=4: at 4 threads, CPU usage shows 4 of the 8 logical CPUs on my machine spiking to full load

threadNum=8: at 8 threads, CPU usage shows all 8 logical CPUs on my machine running at full load

threadNum=14, threadNum=20

Results table

Configuration of the experimental system:

Number of physical CPU cores: 4 (sysctl hw.physicalcpu)
Number of logical CPU cores: 8 (sysctl hw.logicalcpu)

With hyper-threading enabled, you have four cores and eight threads

Number of threads    Total time (ms)    Average time per task (ms)
1                    63007              320
2                    35828              345
4                    24252              430
8                    21340              700
14                   22837              1100
20                   22081              1920

Conclusion

From the above experimental data we can conclude that, in a CPU-intensive scenario, the total elapsed time is lowest when the number of threads equals the number of CPU cores. As the number of threads grows, the average time per task increases, because more threads means more thread context switches, which eat into performance. Once the number of threads exceeds the number of logical CPU cores, the total time no longer drops noticeably and even rises slightly, while the per-task time keeps increasing.

So the final conclusion: in a CPU-intensive scenario, overall efficiency is highest when the number of threads equals the number of logical CPU cores.

In practice we usually set the pool size to the number of CPU cores + 1. The extra thread ensures that CPU clock cycles are not wasted even when a compute-intensive thread is occasionally paused because of a page fault or for some other reason.
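As a minimal sketch of applying this rule (the class and variable names here are my own, not from the demo above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuPoolSizing {
    public static void main(String[] args) {
        // N = number of logical CPU cores reported by the JVM
        int n = Runtime.getRuntime().availableProcessors();
        // N + 1 threads for CPU-bound work
        ExecutorService cpuPool = Executors.newFixedThreadPool(n + 1);
        System.out.println("CPU-bound pool size: " + (n + 1));
        cpuPool.shutdown();
    }
}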

IO-intensive

If the application is IO-intensive, set the thread pool size to 2N+1 (where N is the number of CPU cores).

If only one application is deployed on the server and it uses only one thread pool, this estimate may be reasonable, but you still need to test it yourself.
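A minimal sketch of the 2N+1 rule of thumb (again, the class and variable names are my own):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoPoolSizing {
    public static void main(String[] args) {
        // N = number of logical CPU cores reported by the JVM
        int n = Runtime.getRuntime().availableProcessors();
        // 2N + 1 threads for IO-bound work
        ExecutorService ioPool = Executors.newFixedThreadPool(2 * n + 1);
        System.out.println("IO-bound pool size: " + (2 * n + 1));
        ioPool.shutdown();
    }
}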

Then I found an estimation formula in a document on server performance IO optimization:

Optimal number of threads = ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs

It follows that the higher the proportion of thread wait time, the more threads are needed; the higher the proportion of thread CPU time, the fewer threads are needed.
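To make the formula concrete, here is a small sketch; the 80 ms / 20 ms wait-to-CPU split is only an assumed example, not a measured value:

public class OptimalThreadEstimate {
    public static void main(String[] args) {
        // Assumed example figures, not measurements:
        double waitTime = 80;   // average time a task spends waiting on IO (ms)
        double cpuTime = 20;    // average time a task spends on the CPU (ms)
        int cpus = Runtime.getRuntime().availableProcessors();

        // Optimal number of threads = ((wait time + CPU time) / CPU time) * number of CPUs
        int optimal = (int) Math.round(((waitTime + cpuTime) / cpuTime) * cpus);
        System.out.println("Estimated optimal number of threads: " + optimal);
        // With 8 CPUs this gives (100 / 20) * 8 = 40 threads.
        // When wait time and CPU time are roughly equal, the formula reduces to
        // about 2N, which is close to the 2N+1 rule of thumb above.
    }
}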

Experiment (omitted)

Hybrid

Hybrid tasks can be split into an IO-intensive part and a CPU-intensive part, each processed in its own thread pool. As long as the two parts take a similar amount of time to execute, this is more efficient than executing them serially. If their execution times differ greatly, however, the part that finishes first has to wait for the slower one, so the total time still depends on the slower part; add the cost of splitting and merging the tasks and the gain may not be worth the trouble.
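Here is a minimal sketch of this split, assuming a task made of a blocking IO read and a CPU-heavy computation run in parallel on separate pools; ioRead and cpuCompute are placeholders, not methods from this article:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MixedTaskDemo {
    // Separate pools: IO waits stay in the IO pool, computation stays in the CPU pool
    static final ExecutorService ioPool = Executors.newFixedThreadPool(
            2 * Runtime.getRuntime().availableProcessors() + 1);
    static final ExecutorService cpuPool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors() + 1);

    static String ioRead() {              // placeholder for a blocking IO call
        return "raw data";
    }

    static long cpuCompute() {            // placeholder for a CPU-heavy computation
        long sum = 0;
        for (int i = 0; i < 1000000; i++) { sum += i; }
        return sum;
    }

    public static void main(String[] args) {
        // Run the IO sub-task and the CPU sub-task in parallel, then merge the results;
        // total time is roughly the time of the slower sub-task
        CompletableFuture<String> ioPart = CompletableFuture.supplyAsync(MixedTaskDemo::ioRead, ioPool);
        CompletableFuture<Long> cpuPart = CompletableFuture.supplyAsync(MixedTaskDemo::cpuCompute, cpuPool);

        String merged = ioPart.thenCombine(cpuPart, (data, value) -> data + " / " + value).join();
        System.out.println(merged);

        ioPool.shutdown();
        cpuPool.shutdown();
    }
}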

Why does thread context switching cost performance?

The concept of context switching

Let's first explain what a context switch is. In a multitasking system, the number of jobs is usually greater than the number of CPUs. To give the user the impression that these tasks are running simultaneously, the CPU allocates a time slice to each task. When a slice ends, the CPU saves the current task's state and moves the running task into the ready (or suspended, or terminated) state, and another ready task is selected to become the current task; later the CPU can come back and continue the task it suspended. A context switch is exactly this process of recording and restoring the state of running programs so that the CPU can complete the switch: it stops processing the currently running program and saves its exact position so that execution can resume there later.

Context switching costs

Context switching causes the CPU to shuttle back and forth between registers and the run queue. This cost can be divided into two types:

Type of loss     Description
Direct loss      CPU registers must be saved and loaded, scheduler code must be executed, the TLB must be reloaded, and the CPU pipeline must be flushed
Indirect loss    Data sharing between the caches of multiple cores

Reference documentation

What are CPU intensive and IO intensive?

Calculate the optimal number of thread pools under concurrency

How can I reasonably estimate the thread pool size?