Many of you have probably seen the theory of a thread count setting:

  • CPU-intensive program – number of cores + 1
  • I/O-intensive program – number of cores × 2

Really? Really? Does anyone really plan the number of threads according to this theory?

Small tests of thread count and CPU utilization

Let's start with a basic theory from operating systems and computer architecture (not stated rigorously, but easy to understand): **a CPU core can only execute one thread's instructions at a time.** So in theory, as long as one thread keeps executing instructions, it can drive one core to full utilization.

To verify this, let's write an endless-loop example:

Test environment: AMD Ryzen 5 3600, 6-core, 12-Threads

public class CPUUtilizationTest {
	public static void main(String[] args) {
		// Do nothing
		while (true) {}
	}
}

After running the example, take a look at the current CPU utilization. As you can see from the graph, my core 3 is already running at full utilization.

How many more threads should I open based on the above theory?

public class CPUUtilizationTest {
	public static void main(String[] args) {
		for (int j = 0; j < 6; j++) {
			new Thread(new Runnable() {
				@Override
				public void run() {
					while (true) {}
				}
			}).start();
		}
	}
}

In this case, the CPU utilization of cores 1/2/5/7/9/11 is full. What about 12 threads? Will that run all the cores to full utilization? The answer must be yes.

What happens if I increase the number of threads in the example above to 24? As you can see from the figure above, CPU utilization is the same as in the previous step, 100% for all cores, but the load has increased from 11.x to 22.x (see Scoutapm.com/blog/unders…), indicating that the CPU is busier and the threads' tasks cannot be executed in a timely manner.

Modern CPUs are basically multi-core, such as the AMD Ryzen 5 3600 tested here, with 6 cores and 12 threads (hyper-threading). We can simply think of it as a 12-core CPU, so it can do 12 things at once without interruption.

If the number of threads to execute is greater than the number of cores, then it needs to be scheduled by the operating system. The operating system allocates CPU time slices to each thread and then switches them continuously to achieve the effect of “parallel” execution.

But is it really faster? As the example above shows, a single thread can drive a core to full utilization. If every thread is this "bossy", executing instructions without ever giving the CPU idle time, and the number of threads executing at the same time is greater than the number of CPU cores, the operating system will switch threads more frequently to ensure that every thread gets a chance to run.

However, switching has a cost: each switch involves saving and restoring registers, updating memory page tables, and other operations. Although the cost of a single switch is trivial compared with an I/O operation, if there are too many threads switching too frequently, even to the point where switching time per unit of time exceeds the program's execution time, then too much CPU is wasted on context switching instead of running the program, and the loss outweighs the gain.
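The effect is easy to observe with a rough benchmark (a sketch of my own, not from a rigorous harness; class and method names are mine): split a fixed amount of pure computation across N busy threads and time it. With far more threads than cores, the extra context switches tend to add overhead rather than speed.

```java
import java.util.ArrayList;
import java.util.List;

public class SwitchCostSketch {
	// Sink to discourage the JIT from eliminating the busy loop entirely
	static volatile long sink;

	static long spin(long iterations) {
		long x = 0;
		for (long i = 0; i < iterations; i++) {
			x += i; // pure CPU work, no I/O
		}
		return x;
	}

	// Run totalWork iterations split evenly across nThreads and return elapsed ms
	static long timeWithThreads(int nThreads, long totalWork) throws InterruptedException {
		List<Thread> threads = new ArrayList<>();
		long start = System.nanoTime();
		for (int i = 0; i < nThreads; i++) {
			Thread t = new Thread(() -> sink = spin(totalWork / nThreads));
			threads.add(t);
			t.start();
		}
		for (Thread t : threads) {
			t.join();
		}
		return (System.nanoTime() - start) / 1_000_000;
	}

	public static void main(String[] args) throws InterruptedException {
		int cores = Runtime.getRuntime().availableProcessors();
		long work = 2_000_000_000L;
		System.out.println("threads = cores: " + timeWithThreads(cores, work) + " ms");
		System.out.println("threads = 20x:   " + timeWithThreads(cores * 20, work) + " ms");
	}
}
```

On a machine with nothing else running, the second line is usually no faster, and often slower, than the first.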

The example above is a bit extreme and would be unlikely under normal circumstances.

Most programs have some I/O operations when they run, such as reading and writing files, receiving and sending network messages, etc. These I/O operations need to wait for feedback. For example, during network read/write operations, packets need to be sent or received. During the waiting process, the thread is in the waiting state and the CPU does not work. The operating system then dispatches the CPU to execute instructions from other threads, making perfect use of the idle CPU period and increasing CPU utilization.

In the example above, the program loops over and over doing nothing, and the CPU has to keep executing instructions, leaving little free time. What happens if you insert an I/O operation, and the CPU is idle during the I/O operation? Let’s look at the results for a single thread:

public class CPUUtilizationTest {
	public static void main(String[] args) throws InterruptedException {
		for (int n = 0; n < 1; n++) {
			new Thread(new Runnable() {
				@Override
				public void run() {
					while (true) {
						// After every 100 million empty loop iterations, sleep 50ms
						// to simulate I/O waiting and thread switching
						for (int i = 0; i < 100_000_000; i++) {
						}
						try {
							Thread.sleep(50);
						} catch (InterruptedException e) {
							e.printStackTrace();
						}
					}
				}
			}).start();
		}
	}
}

Wow, core 9, the only one with any load, sits at 50% utilization, half of the 100% we saw without the sleep. Now adjust the number of threads to 12 and see: single-core usage is around 60%, not far from the single-thread result, and no core is full yet. Increase the number of threads to 18: at this point single-core utilization is close to 100%. So when the I/O operations in a thread do not occupy CPU resources, the operating system can schedule the CPU to execute more threads at the same time.

Now increase the frequency of I/O events and see: reduce the loop count by half to 50_000_000, still with 18 threads. At this point, the utilization of each core is only about 70%.

Summary of thread count and CPU utilization

To better understand the relationship between thread count, program behavior, and CPU state, let's quickly summarize the examples above:

  1. A single "extreme" thread (one continuously performing "computation") can drive a single core to full utilization, and a multi-core CPU can simultaneously execute at most as many such "extreme" threads as it has cores
  2. If every thread is that "extreme" and the number of threads executing at the same time exceeds the number of cores, the result is unnecessary switching, higher load, and slower execution
  3. When pause operations such as I/O occur, the CPU becomes idle, and the operating system schedules it to execute other threads, improving CPU utilization and running more threads in the same period
  4. The higher the frequency of I/O events, or the longer the wait/pause time, the more idle time the CPU has, the lower its utilization, and the more threads the operating system can schedule onto it

Formula for thread count planning

All of this was to build intuition; now let's look at the definition from the book. Java Concurrency in Practice gives a formula for calculating the number of threads:


Ncpu = number of CPU cores


Ucpu = target CPU utilization, 0 <= Ucpu <= 1


W/C = ratio of wait time to compute time

If you want the program to run to the target CPU utilization, the formula for the number of threads required is:


Nthreads = Ncpu * Ucpu * (1 + W/C)

The formula is clear, so let’s try the example above:

If I expect a target utilization of 90% (across all cores), the number of threads required is:

12 cores * 0.9 utilization * (1 + 50 (ms of sleep) / 50 (ms for the 50_000_000-iteration loop)) ≈ 22

Now set the thread count to 22 and see what happens: CPU utilization is around 80+, close to expectations. The slightly lower actual utilization is not surprising, given the number of threads, some context-switch overhead, and the lack of rigor in the test case.

Inverting the formula, you can also calculate CPU utilization from the number of threads:


Ucpu = Nthreads / (Ncpu * (1 + W/C))

22 threads / (12 cores * (1 + 50 (ms of sleep) / 50 (ms of compute))) ≈ 0.9
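
Both directions of the formula can be wrapped in a tiny helper (a sketch; the class and method names are mine, the formulas are the book's):

```java
public class ThreadSizing {
	// Nthreads = Ncpu * Ucpu * (1 + W/C)
	static int threadsFor(int ncpu, double targetUtilization, double waitTime, double computeTime) {
		return (int) Math.round(ncpu * targetUtilization * (1 + waitTime / computeTime));
	}

	// Ucpu = Nthreads / (Ncpu * (1 + W/C))
	static double utilizationFor(int nthreads, int ncpu, double waitTime, double computeTime) {
		return nthreads / (ncpu * (1 + waitTime / computeTime));
	}

	public static void main(String[] args) {
		// The worked example above: 12 cores, 90% target, 50ms wait vs ~50ms compute
		System.out.println(threadsFor(12, 0.9, 50, 50));    // prints 22
		System.out.println(utilizationFor(22, 12, 50, 50)); // ≈ 0.92
	}
}
```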

While the formula is nice, in real programs it is generally difficult to measure wait time and compute time accurately, because programs are far more complex than pure "computation". A piece of code mixes memory reads and writes, computation, I/O, and other operations, making these two metrics hard to pin down, so calculating the thread count from the formula alone is too idealistic.

The number of threads in a real program

What is the appropriate number of threads (thread pool size) in a real application, or in some Java business systems?

Conclusion first: there is no fixed answer. First set your expectations, such as target CPU utilization, load, GC frequency, and so on, then test and adjust your way to a reasonable thread count.

Take a normal Spring Boot based business system as an example, with the default Tomcat container + HikariCP connection pool + G1 collector, and suppose it also has a business scenario that needs multiple threads (or a thread pool) to execute business logic asynchronously or in parallel.

If I plan the number of threads by the formula above, the error will be large. Tomcat has its own thread pool, HikariCP has its own background threads, the JVM has compiler threads, and even G1 has its own background threads. These threads also run in the current process, on the current host, and consume CPU resources.
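
You can see these pre-existing threads for yourself with a standard JDK API (a quick sketch; on a Spring Boot app the output would include Tomcat workers, HikariCP housekeeping, JIT and GC threads):

```java
import java.util.Map;

public class ThreadInventory {
	public static void main(String[] args) {
		// Snapshot of every live thread in this JVM
		Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
		System.out.println("live threads: " + all.size());
		all.keySet().stream()
			.sorted((a, b) -> a.getName().compareTo(b.getName()))
			.forEach(t -> System.out.printf("%-35s daemon=%b state=%s%n",
					t.getName(), t.isDaemon(), t.getState()));
	}
}
```

Even a bare `main` shows several JVM background threads before you have created any of your own.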

Therefore, under the interference of the environment, it is difficult to accurately plan the number of threads by the formula alone, so it must be verified by testing.

The process is like this:

  1. Check whether other processes interfere with the current host
  2. Analyze whether there are other running or potentially running threads on the current JVM process
  3. Set a goal
    1. Target CPU utilization – how high am I willing to let CPU usage go?

    2. Target GC frequency/pause time – GC frequency increases after multi-threaded execution, what is the maximum allowable frequency and pause time?

    3. Execution efficiency – for example, in batch processing, how many threads do I have to open per unit of time to complete processing in time

  4. Check the key points of the link for bottlenecks. With too many threads, some nodes on the link may run short of resources, leaving a large number of threads waiting (e.g. third-party interface rate limiting, limited connection pool size, middleware unable to support the pressure, etc.)
  5. Keep increasing or decreasing the number of threads under test until you reach the highest setting that still meets the requirements, and you have your thread count

One more thing! The approach to thread counts varies from scenario to scenario:

  1. Tomcat's maxThreads means different things under blocking and non-blocking I/O
  2. Dubbo still uses a single connection by default, and distinguishes I/O threads (pools) from business threads (pools). I/O threads are generally not the bottleneck, so they don't need to be numerous, but business threads easily become one
  3. Redis 6.0 is also multi-threaded, but only for I/O; "business" processing (command execution) is still single-threaded

So, don't obsess over how many threads to set. There is no standard answer; you must combine the scenario with a goal and find the most appropriate thread count through testing.

Some students may wonder: “There is no pressure on our system, we don’t need such an appropriate number of threads, it’s just a simple asynchronous scenario, and it doesn’t affect other functions of the system.”

That's perfectly normal. Many internal business systems don't need much performance; running stably and meeting demand is enough. In that case, my recommended thread count is: the number of CPU cores.

Appendix

Java gets the number of CPU cores

// Get the number of logical cores; e.g. with 6 cores / 12 threads, returns 12
Runtime.getRuntime().availableProcessors();

Linux gets the number of CPU cores

# Number of physical cores = number of physical CPUs x cores per physical CPU
# Number of logical CPUs = number of physical CPUs x cores per physical CPU x hyper-threads per core

# Check the number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

# Check the number of cores per physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq

# Check the number of logical CPUs
cat /proc/cpuinfo | grep "processor" | wc -l

Original is not easy, reprint please contact the author. If my article helped you, please like/bookmark/follow it ❤❤❤❤❤❤