It all starts with the CPU
You may be wondering: why start with the CPU when the topic is multithreading? The reason is simple: stripped of the fancy concepts, the CPU lets you see the essence of the problem more clearly.
The CPU does not know the concepts of threads, processes, etc.
The CPU only knows two things:

- Fetch an instruction from memory
- Execute the instruction, then go back to step 1
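The two-step loop above can be sketched as a toy simulation. This is not a real CPU, just an illustration: the "CPU" does nothing but fetch the instruction the `pc` register points to and execute it (the instruction set and register names here are made up for the example).

```python
# A toy "CPU": it only fetches and executes, over and over.
memory = [
    ("SET", "a", 1),   # a = 1
    ("ADD", "a", 2),   # a = a + 2
    ("HALT",),
]

registers = {"pc": 0, "a": 0}

def run():
    while True:
        instr = memory[registers["pc"]]  # 1. fetch the instruction pc points to
        registers["pc"] += 1             # pc advances to the next instruction by default
        op = instr[0]                    # 2. execute it, then loop back to step 1
        if op == "SET":
            registers[instr[1]] = instr[2]
        elif op == "ADD":
            registers[instr[1]] += instr[2]
        elif op == "HALT":
            return

run()
print(registers["a"])  # → 3
```

Notice there is no mention of processes or threads anywhere in the loop; those concepts live above the CPU.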
You see, the CPU really does not know the concept of process, thread, etc.
The next question is: where does the CPU fetch instructions from? The answer lies in a register called the Program Counter (PC for short). Don’t think of registers as anything mysterious; you can simply think of a register as memory, just much faster to access.
What is stored in the PC register? The address of an instruction in memory. Which instruction? The next instruction the CPU will execute.
So who sets the address of the instruction in the PC register?
By default, the address in the PC register is automatically advanced to the next instruction, which makes sense, because most of the time the CPU executes instructions one after another. When it encounters an if or else, this sequential order is broken: the CPU updates the value in the PC register according to the computed result, so that it jumps to the instruction that actually needs to be executed next.
You’re smart enough to ask, well, how is the initial value set in the PC?
Before we can answer that question, we need to know where the instructions the CPU executes come from. The instructions in memory are loaded from an executable stored on disk, and the executable is generated by the compiler. And what does the compiler generate those machine instructions from? The functions we define.
Functions are compiled into the instructions the CPU executes, so how do we get the CPU to execute a function? Obviously, we only need to find the first instruction the function was compiled into, that is, the function's entry point.
As you can see by now, if we want the CPU to execute a function, we simply write the address of the first machine instruction corresponding to the function into the PC register, and the function will be executed by the CPU.
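Extending the toy CPU from before makes this concrete: "executing a function" is nothing more than writing the function's entry address into `pc`. Again, this is a sketch with a made-up instruction set, not real hardware.

```python
# Toy sketch: a "function" is just instructions at some address;
# calling it means writing its entry address into pc.
memory = [
    ("HALT",),          # address 0
    ("SET", "x", 42),   # address 1: entry of our "function"
    ("JMP", 0),         # address 2: a branch rewrites pc directly
]

registers = {"pc": 0, "x": 0}

def run(entry):
    registers["pc"] = entry  # point pc at the function's first instruction
    while True:
        instr = memory[registers["pc"]]
        registers["pc"] += 1
        if instr[0] == "SET":
            registers[instr[1]] = instr[2]
        elif instr[0] == "JMP":
            registers["pc"] = instr[1]  # jumps overwrite pc instead of pc += 1
        elif instr[0] == "HALT":
            return

run(entry=1)
print(registers["x"])  # → 42
```

The CPU never cared that address 1 was "a function"; we simply pointed `pc` there.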
You might be wondering, what does this have to do with threads?
From CPU to operating system
In the previous section we learned how the CPU works: to have the CPU execute a particular function, we just need to write the address of the function's first machine instruction into the PC register. Even without an operating system we could make the CPU execute a program this way, but it would be a very tedious process. We would need to:
- Find a suitably sized region of memory and load the program into it
- Locate the function entry point and set the PC register so the CPU starts executing the program
These two steps are by no means easy, and if a programmer did them manually every time he ran a program, he would go crazy, so a smart programmer might want to write a program to automate them.
Machine instructions need to be loaded into memory to be executed, so we must record the start address and length of that memory region. We also need to find the function's entry address and write it to the PC register. Clearly this calls for a data structure to record the information:
```
struct *** {
    void* start_addr;
    int len;

    void* start_point;
    ...
};
```
Then it’s name time.
This data structure needs a name. What information does it record? What a program looks like once it has been loaded from disk into memory. So let's just call it a Process. Our guiding principle is to sound mysterious without being easy to understand; I call it the "Rule of Incomprehensibility."
And so the process was born.
The first function executed by the CPU is also given a name. The first function to be executed sounds more important, so let’s call it main.
The program that completes the two steps above should also have a name. Following the "Rule of Incomprehensibility," this "simple" program is called the Operating System.
Thus the operating system was born, and programmers no longer had to manually load programs to run them.
Now that you have processes and an operating system, everything looks perfect.
From single core to multi-core, how to make full use of multi-core
Like so many things, CPUs evolved over time from a single core to multiple cores.
Now suppose we want to write a program that takes advantage of multiple cores. How do we do it?
Some students may say, “Isn’t there a process? Why not open a few more processes?” It sounds reasonable, but there are mainly the following problems:
- Processes take up memory space (as you saw in the previous section). If multiple processes are created from the same executable, the contents of their memory regions are almost identical, which is obviously a waste of memory
- Computing tasks can be complex and may involve inter-process communication. Since each process lives in its own address space, communication between processes must go through the operating system, which increases both programming difficulty and system overhead
What to do?
From process to thread
Let's think about this problem more carefully. A so-called process is just a region of memory that holds the machine instructions the CPU executes and runtime information such as the function call stack. To make a process run, we write the address of the main function's first machine instruction into the PC register, and the process is up and running.
The disadvantage of a process is that it has only one entry function, main, so the machine instructions in a process can only be executed by one CPU. Is there a way to have multiple CPUs execute the machine instructions of the same process?
Think about it: if we can write the address of main's first instruction into the PC register, is main really any different from other functions?
The answer is no, the main function is special only because it is the first function executed by the CPU, nothing else. We can point the PC register to main, and we can point the PC register to any function.
When we point the PC register to a function other than main, a thread is born.
This opens up the possibility of multiple entry functions within a process, meaning that machine instructions belonging to the same process can be executed by multiple CPUs at the same time.
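This is exactly what thread APIs expose: you hand the system an entry function other than main, and a new flow of execution starts there. A minimal sketch in Python (note that in CPython the GIL limits true parallelism for CPU-bound work, but the structure is the same in any language):

```python
import threading

results = {}  # ordinary process memory, visible to every thread

def worker_a():  # an entry function that is not main
    results["a"] = sum(range(100))

def worker_b():  # another entry function
    results["b"] = sum(range(50))

# Each thread begins at its own entry function, but both share the
# process's address space: they write into the same `results` dict.
t1 = threading.Thread(target=worker_a)
t2 = threading.Thread(target=worker_b)
t1.start(); t2.start()
t1.join(); t2.join()

print(results["a"], results["b"])  # → 4950 1225
```

No copying of the executable's memory, no operating-system-mediated message passing: the threads communicate simply by touching shared memory.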
Note that this is a different concept from a process. Creating a process means finding a suitable region of memory, loading the program into it, and pointing the CPU's PC register at the main function, so there is only one flow of execution in the process.
Now, however, multiple CPUs can simultaneously execute multiple entry functions under the same roof (the memory region occupied by the process), meaning there can now be multiple flows of execution within one process.
But "flow of execution" sounds a little too easy to understand, so following the "Rule of Incomprehensibility," let's call it a thread instead.
That’s where threads come in.
The operating system maintains a bundle of information for each process, used to record, among other things, the memory space the process occupies; call this dataset A.
Similarly, the operating system maintains a bundle of information for each thread, used to record the thread's entry function, stack, and so on; call this dataset B.
Clearly dataset B is smaller than dataset A. Unlike creating a process, creating a thread does not require searching memory for a suitable region, because a thread runs inside its process's address space, and that address space was already created when the program started. Since threads are created while the program (process) is running, the address space already exists by the time a thread starts and can be used directly. This is one of the reasons (though not the only one) why textbooks say creating a thread is faster than creating a process.
It is important to note that with the concept of threads, we only need to create multiple threads after the process has started to keep all the CPUs busy; this is the root of so-called high performance and high concurrency.
It’s as simple as creating the right number of threads.
Also noteworthy: because threads share the process's memory address space, communication between threads does not need to go through the operating system. This brings great convenience to programmers, but also endless trouble. Most multithreading bugs stem from exactly this: communication between threads is so convenient that it is very easy to get wrong. The root cause is that the CPU has no concept of threads when executing instructions; the mutual exclusion and synchronization problems of multithreaded programming must be solved by the programmer. Space does not permit a detailed treatment of mutual exclusion and synchronization here; most operating systems textbooks cover them in depth.
Finally, note that although the discussion of threads above involves multiple CPUs, you do not need multiple cores to use multithreading. Multiple threads can be created on a single core, because threads are an operating-system-level concept that has nothing to do with how many cores there are; the CPU executing machine instructions is not even aware of which thread those instructions belong to. Even with only one CPU, the operating system can, through thread scheduling, let every thread advance "simultaneously": it hands out CPU time slices back and forth among the threads, so multiple threads appear to run at the same time even though at any given moment only one thread is actually running.
Threads and Memory
In the previous discussion we saw the relationship between threads and the CPU: point the CPU's PC register at a thread's entry function and the thread runs. This is why we must specify an entry function when creating a thread, and creating a thread looks much the same in any programming language:
```
// set DoSomething as the thread's entry function
thread = CreateThread(DoSomething);
thread.run();
```
So what does thread have to do with memory?
We know that when a function executes, its runtime data, such as parameters, local variables, and the return address, is stored on the stack. Before the concept of threads appeared, a process had only one flow of execution and therefore only one stack, and at the bottom of that stack sits the process's entry function, main. Suppose main calls funcA, and funcA calls funcB: the stack grows frame by frame accordingly.
What about threads?
With threads, a process has multiple entry points, that is, multiple flows of execution at the same time. A process with a single flow of execution needs one stack to save its runtime information, so obviously multiple flows of execution need multiple stacks, one to preserve the runtime information of each. In other words, the operating system allocates a stack for each thread inside the process's address space; it is critical to realize that each thread has its own stack.
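A small experiment makes "each thread has its own stack" tangible: two threads run the same entry function with a local variable of the same name, exist at the same time, and yet never clobber each other's copy, because each local lives in its own thread's stack frame.

```python
import threading
import time

seen = {}  # shared heap memory, just for collecting the results

def entry(n):
    local_var = n * 10  # a local: it lives on this thread's private stack
    time.sleep(0.1)     # keep both threads alive at the same moment
    seen[n] = local_var # each thread still sees only its own copy

threads = [threading.Thread(target=entry, args=(i,)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)  # → {1: 10, 2: 20} — the two locals never interfered
```

Global data (like `seen` here) is shared; stack data is per-thread. That split is the heart of this section.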
It is also worth noting that thread creation consumes process memory space.
Use of threads
Now that we have the idea of threads, how do we as programmers use them?
From a lifecycle perspective, threads handle two types of tasks: long tasks and short tasks.
1. Long-lived tasks
As the name implies, these are tasks that live for a long time. For example, while editing text in Word we need to save the document to disk, and writing data to disk is a task. A good approach is to create a dedicated disk-writing thread whose life cycle matches Word's own: the writer thread is created whenever Word is opened and destroyed when Word is closed. This is a long-lived task.
This scenario is ideal for creating dedicated threads to handle specific tasks, which is relatively simple.
There are long tasks, and there are short tasks.
2. Short-lived tasks
The concept is simple: tasks with a short processing time, such as a network request or a database query, can be handled and completed quickly. Short-lived tasks are therefore common in all kinds of servers, such as web servers, database servers, file servers, and mail servers, which is also the scenario most familiar to programmers in the Internet industry, and the one we will focus on.
This scenario has two characteristics: the processing time of each task is short, and the number of tasks is huge.
What if you were asked to handle this type of task?
This is easy, you might think. When the server receives a request, it creates a thread to handle the task and then destroys the thread.
This method is often referred to as thread-per-request: a thread is created for each incoming request.
If it is a long task, this method can work well. However, for a large number of short tasks, this method is simple to implement but has several disadvantages:
- As we saw in the previous sections, threads are an operating system concept (we are not discussing user-mode threads or coroutines here), so creating a thread is done by the operating system, and creating and destroying threads takes time
- Each thread needs its own stack, so creating a large number of threads consumes too much memory and other system resources
It's like being a factory owner (a happy thought) with a lot of orders in hand. Suppose that for each new batch of orders you recruited a group of workers, and since the products are simple, the workers finished quickly, so you fired them all once the batch was done, then painfully recruited again when the next order arrived: working for 5 minutes and hiring for 10 hours. Unless you are trying to run your business into the ground, you wouldn't do this. The better strategy is to hire a group of workers and keep them on site: they process orders when orders come in, and sit idle when there are none.
That’s where thread pools come in.
From multithreading to thread pools
The concept of a thread pool is very simple: create a number of threads up front and never release them. Tasks are submitted to these threads for processing, so there is no need to frequently create and destroy threads; and since the number of threads in the pool is usually fixed, it won't consume too much memory either. The idea is reuse and controllability.
How does a thread pool work
You might ask, how do I submit tasks to a thread pool? How do these tasks get assigned to threads in the thread pool?
Obviously, the queue from data structures is a natural fit for this scenario: those who submit tasks are the producers, and the threads that consume tasks are the consumers. In fact, this is the classic producer-consumer problem.
Now you know why operating system courses and job interviews ask this question, because if you don’t understand the producer-consumer problem, you essentially can’t write thread pools correctly.
Due to space limitations, this blogger does not intend to explain the producer-consumer problem in detail here; refer to operating systems materials for that. Instead, let's talk about what a typical task submitted to a thread pool looks like.
Typically, a task submitted to a thread pool consists of two parts: 1) the data to be processed; 2) the function that processes the data.
```
struct task {
    void* data;      // the data the task carries
    handler handle;  // how to process the data
};
```
(Note: you can also think of the struct in the code as a class, and the task as an object.)
When the producer writes data to the queue, a thread in the thread pool is woken up. The thread retrieves the struct (or object) from the queue, takes the data in it as a parameter, and calls the handler function:
```
while (true) {
    struct task* t = GetFromQueue();  // blocks if the queue is empty
    t->handle(t->data);               // process the data
}
```
That’s the core of the thread pool.
Understanding this will give you an idea of how thread pools work.
The number of threads in the thread pool
Now that we have a thread pool, what is the number of threads in the pool?
Think about this for yourself before moving on.
If you can see this, you’re not asleep.
Too few threads in the pool fail to make full use of the CPU; too many cause performance degradation through excessive memory usage, scheduling overhead, thread-switching costs, and so on. So the number of threads should be neither too many nor too few. What should it be?
To answer that question, you need to know what kinds of tasks the thread pool handles. Some of you might say, didn't you already name two kinds, long-lived and short-lived? That classification was from the life-cycle perspective. From the perspective of the resources a task consumes, there are also two types, and (true to the Rule of Incomprehensibility) they are called CPU-intensive and I/O-intensive.
1. CPU-intensive
CPU-intensive means tasks that do not rely on external I/O, such as scientific computing or matrix operations. In this case, as long as the number of threads is roughly equal to the number of cores, CPU resources can be fully utilized.
2. I/O intensive
This type of task may not take up much of the computing time, and most of the time is spent on things like disk I/O, network I/O, and so on.
This case is a little more complicated. You need to use performance testing tools to estimate the time spent waiting on I/O, WT (wait time), and the CPU computing time, CT (computing time). For an N-core machine, the appropriate number of threads is approximately N * (1 + WT/CT). Assuming I/O wait time equals compute time, you need roughly 2N threads to make full use of the CPU. Note that this is only a theoretical value; the exact number should be determined by testing against real business scenarios.
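As a back-of-the-envelope check, the rule of thumb above can be written out directly (`suggested_threads` is just a name for this sketch, not a standard API):

```python
def suggested_threads(cores, wait_time, compute_time):
    """N * (1 + WT/CT): a theoretical starting point, not a guarantee."""
    return round(cores * (1 + wait_time / compute_time))

# Pure CPU-bound work: no I/O waiting, so one thread per core.
print(suggested_threads(8, wait_time=0, compute_time=10))   # → 8

# I/O-bound work that waits as long as it computes: roughly 2N threads.
print(suggested_threads(8, wait_time=10, compute_time=10))  # → 16
```

If WT dwarfs CT (say, threads that mostly sleep on network calls), the formula suggests many more threads than cores, which is exactly when the other limits below start to bite.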
Of course, CPU utilization is not the only consideration. As the number of threads grows, memory footprint, scheduling overhead, the number of open files, open sockets, open database connections, and so on all need to be taken into account.
So there's no one-size-fits-all formula; you have to decide case by case.
Thread pools are not a panacea
A thread pool is just one way of using multithreading, so the problems multithreading faces, such as deadlocks and race conditions, cannot be avoided by a thread pool either. For these, too, operating systems materials have the answers, which is why the fundamentals matter so much, folks.
Best practices for thread pool use
A thread pool is a powerful weapon in a programmer's hands; you can find thread pools in almost every server at Internet companies. But before using one, you need to consider:
- Understand your tasks well: are they long-lived or short-lived, CPU-intensive or I/O-intensive? If you have both kinds, putting each type in its own thread pool may work better, since it lets you determine the right number of threads for each
- If a task in the thread pool performs I/O, be sure to set a timeout for it, otherwise the thread processing the task may block forever
- Tasks in the thread pool are best not made to wait synchronously for the results of other tasks
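The timeout advice in the list above is easy to demonstrate with Python's standard-library pool (the `slow_io_task` function here is a made-up stand-in for a hung network or disk call):

```python
import concurrent.futures
import time

def slow_io_task():
    time.sleep(1)    # pretend this is a network call that hangs
    return "done"

pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
future = pool.submit(slow_io_task)
try:
    result = future.result(timeout=0.1)  # don't wait forever for the result
except concurrent.futures.TimeoutError:
    result = "timed out"

print(result)  # → timed out
pool.shutdown(wait=False)
```

One caveat: the timeout abandons the *wait*, not the task; the worker thread is still blocked until its own call returns. That is why the deeper fix is timeouts on the I/O operations themselves, with the `future.result` timeout as a safety net.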
Conclusion
In this section, we started with the CPU and worked our way up to the commonly used thread pool, from bottom to top, from hardware to software. Notice that this article mentions no specific programming language: threads are not a language-level concept (again, user-mode threads aside), and once you truly understand threads, you can use them in any language. Understand the Tao first, and the techniques follow.
One last thing
Welcome to follow my public account [calm as code], where plenty of Java-related articles and learning materials are collected and updated.
If you think this was well written, give it a like and a follow! Follow so you don't get lost, and more updates will keep coming!