A web server has a configuration parameter: the number of worker threads.

A service usually has the same configuration parameter: the number of worker threads.

Experienced architects know how to tune this parameter for optimal system performance: some services are set to 2 times the number of CPU cores, some to 8 times, and some to 32 times.

What the thread count should be based on is the topic of this article.

Should the number of worker threads be as large as possible?

The answer is clearly no:

  • A server has a limited number of CPU cores, so the number of threads that can truly run concurrently is limited; setting 1000 worker threads on a single-core CPU is meaningless

  • Thread context switching is expensive, and performance degrades if threads switch too frequently

Does a thread occupy the CPU while it is in sleep()?

No. While the thread sleeps, the CPU is released to other threads that need it.

And not just sleep(): blocking calls in network programming, such as:

  • a blocking accept(), waiting for a client to connect

  • a blocking recv(), waiting for a downstream packet to arrive

also give up the CPU.
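
To see this concretely, here is a minimal Java sketch (mine, not from the original article) that uses ThreadMXBean to report how much CPU time a sleeping thread actually consumes; the answer is close to zero:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class SleepCpuDemo {
        public static void main(String[] args) throws Exception {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();

            // A thread that is "busy" for 5 seconds, but spends it all in sleep()
            Thread sleeper = new Thread(() -> {
                try { Thread.sleep(5000); } catch (InterruptedException ignored) {}
            });
            sleeper.start();

            Thread.sleep(2000);   // let the sleeper run for a while

            // Near zero: sleep() hands the CPU back to the scheduler,
            // just like a blocking accept() or recv() would
            System.out.printf("sleeper CPU time so far: %d ns%n",
                    mx.getThreadCpuTime(sleeper.getId()));
            sleeper.join();
        }
    }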

Does it make sense to use multiple threads on a single-core CPU?

On a single-core CPU, can setting multiple threads improve concurrency?

It makes sense to use multithreading even on a single core, and in most cases it improves concurrency:

  • Multithreaded code can be clearer, for example: an IO thread receives and sends packets, worker threads process tasks, and a timeout thread performs timeout detection

  • If a task constantly consumes CPU, adding threads does not increase concurrency. For example, the following loop never yields the CPU and drives usage to 100%:

     while(1){ i++; }

  • Generally speaking, a worker thread does not occupy the CPU for computation the whole time. Even on a single core, adding worker threads improves concurrency, because while one thread is waiting, the others can keep working, as the sketch below illustrates
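
Here is a rough Java sketch of that third point (the 50ms figures are invented for illustration): each task computes for about 50ms and then waits for about 50ms, and even when the JVM is confined to one core (e.g. with taskset -c 0 java SingleCoreDemo on Linux), the two-thread pool finishes noticeably faster, because one thread computes while the other waits:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class SingleCoreDemo {
        // A task that computes for ~50 ms, then waits ~50 ms (standing in for IO)
        static void task() {
            long end = System.nanoTime() + 50_000_000L;
            while (System.nanoTime() < end) { /* burn CPU */ }
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
        }

        static long runWith(int threads, int tasks) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.nanoTime();
            for (int i = 0; i < tasks; i++) pool.submit(SingleCoreDemo::task);
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws Exception {
            System.out.println("1 thread : " + runWith(1, 10) + " ms"); // ~1000 ms
            System.out.println("2 threads: " + runWith(2, 10) + " ms"); // ~550 ms even on one core
        }
    }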

How many common service threading models are there?

Understanding the common service threading models helps in understanding how services achieve concurrency. Generally speaking, there are two common threading models in Internet services:

  • IO threads are decoupled from worker threads through task queues

  • Pure asynchronous

The first is the model in which IO threads and worker threads are decoupled through queues.

[Figure: IO threads (producers) decoupled from worker threads (consumers) through a task queue]
As shown above, most web servers and service frameworks use this model of queue decoupling between IO threads and worker threads:

  • A few IO threads listen for upstream requests and send and receive packets (producers)

  • One or more task queues serve as the data channel, a shared critical resource, that asynchronously decouples the IO threads from the worker threads

  • Multiple worker threads execute the real tasks (consumers)

This threading model is widely used and fits most scenarios. Its defining characteristic is that worker threads execute tasks synchronously and block internally, so concurrency can be increased by adding worker threads. Today's discussion focuses on how many worker threads to set in this model to achieve maximum concurrency.
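
A minimal Java sketch of this model (the class and variable names are mine; real frameworks differ): one IO thread acts as the producer, a BlockingQueue is the critical resource, and a handful of worker threads are the consumers:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class QueueDecoupledServer {
        // Task queue: the critical resource decoupling IO threads from workers
        static final BlockingQueue<String> taskQueue = new LinkedBlockingQueue<>(1024);

        public static void main(String[] args) {
            // IO thread (producer): a real server would accept()/recv() here
            Thread ioThread = new Thread(() -> {
                for (int i = 0; ; i++) {
                    try {
                        taskQueue.put("request-" + i);   // blocks if the queue is full
                        Thread.sleep(10);                // simulate packet arrival rate
                    } catch (InterruptedException e) { return; }
                }
            });
            ioThread.start();

            // Worker threads (consumers): execute the real tasks synchronously
            int workers = 4;   // the number this article teaches you to choose
            for (int w = 0; w < workers; w++) {
                new Thread(() -> {
                    while (true) {
                        try {
                            String req = taskQueue.take();   // blocks if the queue is empty
                            // ... local computation, cache/RPC/DB access would go here ...
                            System.out.println(Thread.currentThread().getName() + " handled " + req);
                        } catch (InterruptedException e) { return; }
                    }
                }).start();
            }
        }
    }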

The second is the purely asynchronous threading model.

With no blocking anywhere, this model can achieve high throughput with only a small number of threads. Its disadvantages are:

  • In single-threaded mode, it is difficult to take advantage of multiple CPU cores

  • Programmers are more used to writing synchronous code; the callback style is less readable and more demanding to write (see the sketch after this list)

  • The framework is more complex, often requiring server-side send/receive components, a server-side queue, client-side send/receive components, a client-side queue, a context manager, a state machine, and a timeout manager
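
To make the readability point concrete, here is a hedged Java sketch using CompletableFuture (just one way to write async code; the article does not prescribe any particular framework). A flow that would read top to bottom as four synchronous statements becomes a chain of callbacks:

    import java.util.concurrent.CompletableFuture;

    public class AsyncStyleDemo {
        public static void main(String[] args) {
            CompletableFuture
                .supplyAsync(() -> "user-42")                    // stand-in for a cache read
                .thenApply(user -> user + ":profile")            // local computation
                .thenCompose(key -> CompletableFuture
                        .supplyAsync(() -> key + ":rpc-result")) // stand-in for a downstream RPC
                .thenAccept(System.out::println)                 // send the response
                .join();   // only so this demo JVM does not exit before the chain completes
        }
    }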

However, this model is not the focus of today’s discussion.

In the first model, where IO threads are decoupled from worker threads by a queue, how exactly do the worker threads work?

Understanding how a worker thread works is very helpful for quantifying the thread count:

[Figure: worker thread processing timeline; pink segments mark local computation (occupying the CPU), orange segments mark waits on cache, RPC, and DB (not occupying the CPU)]
This is a typical worker thread's processing flow; from start to finish, handling a task involves seven steps:

(1) Take a task from the work queue and do some initial local computation, such as HTTP protocol parsing, parameter parsing, and parameter validation;

(2) Access the cache to fetch some data;

(3) After the cache data arrives, do some local computation related to the business logic;

(4) Call a downstream service via RPC to fetch some data, or hand related work off to the downstream service;

(5) After the RPC call returns, do some more local business-logic computation;

(6) Access the DB for some data operations;

(7) After the database operation, do some finishing work, which is again local business-logic computation;
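
In code, one pass of this flow might look like the following runnable sketch. The millisecond figures are invented (chosen to match the 1:1 example used later), with a busy-spin standing in for local computation and sleep() standing in for the blocking cache/RPC/DB calls:

    public class WorkerLoop {
        // One pass of the seven-step flow for a single task
        static void handle(String request) throws InterruptedException {
            compute(25);        // (1) parse/validate the request: occupies CPU
            Thread.sleep(30);   // (2) wait on the cache: CPU is free
            compute(25);        // (3) business logic on cache data: occupies CPU
            Thread.sleep(30);   // (4) wait on the downstream RPC: CPU is free
            compute(25);        // (5) business logic on the RPC result: occupies CPU
            Thread.sleep(40);   // (6) wait on the DB: CPU is free
            compute(25);        // (7) assemble the response: occupies CPU
        }

        // Busy-spin for roughly ms milliseconds to simulate local computation
        static void compute(long ms) {
            long end = System.nanoTime() + ms * 1_000_000;
            while (System.nanoTime() < end) { /* burn CPU */ }
        }

        public static void main(String[] args) throws Exception {
            handle("request-1");   // ~100 ms of CPU time, ~100 ms of waiting
        }
    }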

Analyzing the timeline of the entire process, we find:

  • In steps 1, 3, 5, and 7 (the pink segments of the timeline in the figure above), the thread occupies the CPU for local business-logic computation

  • In steps 2, 4, and 6 (the orange segments of the timeline in the figure above), the thread waits for results from the cache, service, and DB and does not occupy the CPU. Decomposing further, this "waiting time" has three parts:

    2.1) The request travels over the network to the downstream cache, service, or DB

    2.2) The downstream cache, service, or DB processes the task

    2.3) The cache, service, or DB sends the response back to the worker thread over the network

How do we quantify and set the number of worker threads properly?

From the analysis above, during execution a worker thread spends:

  • part of its time computing, which occupies the CPU

  • part of its time waiting, which does not occupy the CPU

Through quantitative analysis, for example from log statistics, you can measure the ratio of these two parts over a worker thread's whole execution. For example:

  • the CPU time spent computing (the pink segments) is 100ms

  • the waiting time, during which the CPU is free (the orange segments), is also 100ms

So this thread's compute-to-wait ratio is 1:1, i.e., it spends 50% of its time computing (occupying the CPU) and 50% waiting (not occupying the CPU):

  • On a single-core CPU, setting 2 worker threads makes full use of the CPU and drives it to 100%

  • On N cores, 2N worker threads make full use of the CPU and drive it to N * 100%
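
In practice these numbers come from aggregated logs; as a hedged illustration, here is how a single request's wall time could be split into CPU time and wait time with ThreadMXBean, reusing the WorkerLoop sketch from earlier:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class RatioProbe {
        public static void main(String[] args) throws Exception {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long wall0 = System.nanoTime();
            long cpu0  = mx.getCurrentThreadCpuTime();

            WorkerLoop.handle("sample-request");   // the seven-step sketch from above

            long cpuMs  = (mx.getCurrentThreadCpuTime() - cpu0) / 1_000_000;
            long wallMs = (System.nanoTime() - wall0) / 1_000_000;
            // For WorkerLoop's invented numbers this prints roughly x = 100, y = 100,
            // the 1:1 compute-to-wait ratio of the example above
            System.out.printf("compute x = %d ms, wait y = %d ms%n", cpuMs, wallMs - cpuMs);
        }
    }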

Key point!!

Here is the conclusion:

On an N-core server, if the local computation time per task is x and the waiting time is y, then setting the number of worker threads (the thread pool size) to N * (x + y) / x maximizes CPU utilization.
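
As a sketch (the helper below is mine, not a standard API), the formula translates directly into code; note that an x : y ratio of 1 : 1 reproduces the 2N figure above:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSize {
        // N cores, x ms of computation and y ms of waiting per task
        static int workerThreads(int n, double x, double y) {
            return (int) Math.ceil(n * (x + y) / x);
        }

        public static void main(String[] args) {
            int n = Runtime.getRuntime().availableProcessors();
            System.out.println(workerThreads(n, 100, 100)); // 1:1 ratio -> 2N threads
            System.out.println(workerThreads(n, 100, 300)); // 1:3 ratio -> 4N threads
            System.out.println(workerThreads(n, 5, 195));   // mostly waiting -> 40N threads

            // Sizing a thread pool with the formula:
            ExecutorService workers = Executors.newFixedThreadPool(workerThreads(n, 100, 100));
            workers.shutdown();
        }
    }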

Generally speaking, services that are not CPU-intensive (encryption/decryption, compression/decompression, searching, and sorting are CPU-intensive) bottleneck on back-end database access or RPC calls, and their local CPU computation time is very small, so setting dozens or hundreds of worker threads improves throughput. For example, with x = 5ms of computation and y = 195ms of waiting on an 8-core machine, the formula gives 8 * (5 + 195) / 5 = 320 threads.

Did you pick up a new skill?

Something to think about:

How is the number of threads set at your company?

Voice-over: randomly set to 200?