preface

There is a saying that interviews demand rocket science while the day job is tightening screws. I believe many of us know this feeling: most day-to-day work is CRUD, yet interviewers insist on asking about fundamentals, for example: do you understand the GMP model?

Origin of the GMP model

Multithreading in concurrent scenarios

I believe that when you program in Go, the thing that leaves the deepest impression is Go's concurrency support, and the reason Go supports concurrency so well is its scheduler model, GMP.

Before we get to GMP, we need to understand some of the differences between processes, threads, and coroutines.

Mainstream operating systems are basically multi-process/multi-threaded, but this structure has a cost: switching between threads incurs overhead. Linux, for example, treats threads and processes in much the same way, so switching between threads takes about as long as switching between processes.

As the number of threads grows, this extra overhead adds up and the CPU cannot be used to full effect: a CPU that appears 100% busy may have an actual utilization of only 60%, with the remaining 40% spent on thread switching.

At the same time, for today's high-concurrency scenarios, creating one thread per task is not feasible: on a 32-bit system a process has a 4 GB address space, and each thread occupies roughly 4 MB of it, so too many threads will quickly exhaust memory.

Improved coroutines

The concept of the coroutine was developed to address some of these drawbacks of multi-threading.

Threads can be divided into kernel threads and user threads (i.e. coroutines). The operating system's CPU scheduler only sees kernel threads, and each kernel thread is bound to at least one user thread (coroutine).

So, are other kinds of bindings possible? Of course.

1:1 binding

This is the simplest and most primitive binding: each coroutine is bound to one kernel thread through a scheduler.

Its biggest advantage is that it is simple to implement, since coroutine scheduling is left entirely to the kernel.

But the drawback is equally obvious: handing every coroutine over to the kernel adds exactly the switching overhead mentioned earlier.

N:1 binding

This is an improved binding method: N coroutines are bound to one kernel thread through a scheduler, and switching between coroutines is done by that user-space scheduler, which avoids the overhead of kernel-level scheduling. However, two problems remain:

  1. If one of the coroutines bound to the thread blocks (e.g. in a system call), the whole thread blocks, and none of its other coroutines can run

  2. Because everything runs on a single kernel thread, multi-core CPU resources cannot be exploited

N:M binding

This is the best binding scheme: it overcomes the drawbacks of both approaches above. With it, handling concurrency comes down to optimizing the user-space scheduler, which brings us to the design of the GMP model.

Design of GMP model

In Go, the N:M binding scheme is used, but Go adds a lot of optimization on top of it.

Go optimization of coroutines

The first optimization targets the coroutine itself. In Go, coroutines are no longer called coroutines but goroutines, and the cost of switching between them is reduced so that goroutine-to-goroutine switches are much lighter.

A newly created goroutine needs only a few kilobytes of stack, so it consumes far less memory than a thread.
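This can be observed with a rough sketch using the standard runtime package. The measurement below is only approximate (the exact figure depends on Go version and platform), but it should come out at a few kilobytes per goroutine:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// approxStackBytesPerGoroutine parks n goroutines and reports the rough
// growth in stack memory per goroutine. The number is illustrative only.
func approxStackBytesPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var ready sync.WaitGroup
	done := make(chan struct{})
	ready.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			ready.Done()
			<-done // keep the goroutine (and its stack) alive
		}()
	}
	ready.Wait()

	runtime.ReadMemStats(&after)
	close(done)
	return (after.StackSys - before.StackSys) / uint64(n)
}

func main() {
	fmt.Printf("~%d bytes of stack per goroutine\n", approxStackBytesPerGoroutine(10000))
}
```

A thread, by contrast, typically needs megabytes of stack, which is why spawning tens of thousands of goroutines is routine in Go while tens of thousands of threads are not.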

The meaning of the GMP

Let's first introduce what G, M, and P each stand for.

G – goroutine

M — kernel thread

P — Logic processor

P, the logical processor, holds the resources needed to run goroutines. That is, a G can only execute when bound to a P, and a P can only execute Gs when bound to an M.

Introduction to the GMP model

The part of the figure we care about is the part above the operating system scheduler. By default, kernel threads are initialized according to the number of CPUs on the server, and this can also be adjusted in code. By default, the maximum number of threads is 10000.

The other is P: the number of Ps is set by the environment variable GOMAXPROCS (or the runtime.GOMAXPROCS function).

There is no fixed relationship between the number of Ms and the number of Ps. Ps are created at startup according to the configuration and never exceed GOMAXPROCS. Ms, however, are created on demand: if the goroutine an M is executing blocks and no dormant thread is available, a new thread is created.

Each P has its own local queue, which is used to store goroutines. Each local queue has a storage limit: at most 256 goroutines. When a local queue overflows, a portion of its goroutines is moved to the global queue (more on this later).

The global queue is a queue protected by a lock: whenever a P wants to obtain goroutines from the global queue, it must first acquire that lock.

Design of scheduler

During the design process, the scheduler mainly follows four strategies

  1. Reusing threads
  2. Exploiting parallelism
  3. Preemption
  4. A global goroutine queue

I’ll explain each strategy in detail below.

Reusing threads

Two mechanisms are used to reuse threads:

  1. The work-stealing mechanism

The main effect of this mechanism is that when the local queue of the P bound to a kernel thread is empty, the P steals goroutines from the local queues of other Ps and executes them.

  2. The hand-off mechanism

Under this mechanism, if a blocking operation occurs while G1 is executing, then in order to keep the rest of P's local queue running, the scheduler first tries to wake a sleeping thread; if no dormant thread is available in the thread queue, it creates a new one.

Once the new thread is started, the P bound to the blocked M1 is handed off to it, while the blocked G stays bound to the original thread until the blocking operation completes.

The new thread will take over the P and execute.

Note: after the P is handed off, the original M1 goes to sleep. When G1's blocking operation completes and G1 needs to continue, M1 tries to acquire a free P and put G1 into that P's queue. If no free P is available, M1 enters the dormant thread queue and G1 is placed in the global queue, waiting for some other P to pull it out and execute it.

Exploiting parallelism

As mentioned earlier, the number of Ps can be set by GOMAXPROCS. When it is greater than 1, multiple Ps are bound to multiple threads, and the queues of all Ps are executed in parallel.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var w sync.WaitGroup // wait for both goroutines to finish

	// If GOMAXPROCS is not set, the default value (the number of CPUs) is used.

	w.Add(2) // two goroutines will run
	go func() {
		defer w.Done() // notify the WaitGroup when the function returns
		for i := 0; i < 3; i++ {
			for char := 'A'; char < 'A'+26; char++ {
				fmt.Printf("%c", char) // prints uppercase letters A to Z
			}
		}
	}()
	go func() {
		defer w.Done()
		for i := 0; i < 3; i++ {
			for char := 'a'; char < 'a'+26; char++ {
				fmt.Printf("%c", char) // prints lowercase letters a to z
			}
		}
	}()
	w.Wait() // block until both goroutines have called Done
}

The result of one execution looks like this (the exact interleaving varies from run to run):

aABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLbcdefghijklMNOPQRSTUVWXYZABCDEFGmnopqrstuvwxyzabcdefghijklHIJmnopqrstuvwxyzabcdeK LMNOPQfghijklmnopqrstuvwxyzRSTUVWXYZ

We can see that the letters are printed in interleaved order: the uppercase and lowercase letters are printed concurrently, and because printing is an I/O operation, the two goroutines alternate.

Preemption

We mentioned earlier that Go optimizes coroutines, and preemption is one of those goroutine-level optimizations.

We know that a traditional coroutine holds the CPU while it runs; if it never yields, the coroutines queued behind it never get to execute.

Goroutines, by contrast, cannot occupy the CPU for more than about 10 ms. Once a goroutine exceeds 10 ms, it is made to yield the CPU, and other goroutines preempt it for execution. This ensures fairness and prevents a single goroutine from taking up too much CPU time.
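Preemption can be demonstrated with a small sketch. The busy loop below contains no function calls or channel operations, so it never yields voluntarily; since Go 1.14 the runtime preempts it asynchronously anyway, and the second goroutine still makes progress (on older Go versions this example could hang the other goroutines):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// otherGoroutineRuns reports whether a second goroutine gets CPU time while
// a tight loop occupies the only P; true demonstrates asynchronous preemption.
func otherGoroutineRuns() bool {
	runtime.GOMAXPROCS(1) // one P, so the goroutines must share it

	var progressed atomic.Bool
	go func() {
		for { // busy loop with no yield points
		}
	}()
	go func() {
		progressed.Store(true) // can only run if the busy loop is preempted
	}()

	time.Sleep(100 * time.Millisecond)
	return progressed.Load()
}

func main() {
	fmt.Println("other goroutine ran:", otherGoroutineRuns())
}
```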

The global goroutine queue

As the figure shows, while M1 is executing G1, M2's P has an empty local queue. To obtain a G to execute, it first tries to steal from other Ps' local queues; if those have no Gs either, it turns to the global queue, locking and unlocking it to take a G out and execute it.

Go func scheduling process

From the picture we can see the whole process of its execution

  1. When the program executes go func, a goroutine is created. The goroutine finds the thread that is executing, M1, and tries to put itself in the local queue of M1's P; if that local queue is full, it puts itself in the global queue, waiting for some P to pull it out and execute it.

  2. Once a goroutine is in a local queue, P schedules it for execution. If P's local queue is empty, P first tries to steal goroutines from another P's queue; if the other queues have none either, it turns to the global queue.

  3. When this Goroutine starts executing, two things happen:

    3-1: If the goroutine runs without blocking, it simply executes; if it is not finished within its time slice, it is put back into the local queue to be run again later.

    3-2: If the goroutine blocks, the scheduler checks the queue of dormant threads: it wakes one if available, and creates a new thread otherwise. The logical processor P (and its queue) that was bound to M1 is moved to the new thread M3, which continues executing the queue.

    When the blocked goroutine finishes blocking, M1 tries to acquire a new free P to continue; if it cannot, the goroutine goes to the global queue and M1 is put into the sleep queue. If the number of idle threads already exceeds GOMAXPROCS, the thread is destroyed rather than put to sleep.

Points to note

  1. Goroutines can only be executed by an M, never by a P; P is only responsible for scheduling.

  2. An M must be bound to a P in order to run Gs, and their relationship is 1:1.

  3. An M executes goroutines in a loop: it keeps taking the next G until its P's queue is empty.

The Go execution lifecycle

There are two very important concepts in the Go lifecycle: M0 and G0.

The introduction of M0

Here are a few features of the M0

  1. It is the main thread, numbered 0, started when the process starts; there is exactly one per process, and it is globally unique

  2. It is placed in a global variable and does not need to be allocated on the heap

  3. Its main job is to run initialization and execute the first goroutine, the main goroutine

  4. Once main has started, M0 behaves like any other thread and must compete for resources

The introduction of G0

  1. Every time an M is started, a goroutine called G0 is created for it. G0 is per-thread: every newly created thread gets its own G0, so every M has one

  2. Again, it is stored in global variables

  3. Unlike the func that follows a go statement, G0 does not point to any user function

  4. Its main role is to schedule other goroutines: the work of moving a G from a P onto an M is actually carried out on G0

Execute a main function

So with M0 and G0 in mind, what does executing the simplest program look like?

package main

import "fmt"

func main() {
	fmt.Print("hello world")
}

This program simply prints hello world; let's walk through its execution steps.

  1. First, when the program starts, a process is created, which then creates the unique main thread M0; and as mentioned above, every M creates a G0 to bind to, so M0's G0 is created as well

  2. Then, depending on the system configuration, the logical processors (Ps) are created, along with their local queues and the global queue

  3. main itself is also a goroutine; once everything is initialized, the main G is created

  4. To get main scheduled for execution, G0 unbinds itself from M0, and M0 then looks for a free P to bind to

  5. When the binding is complete, main is placed in the local queue of P and executed

  6. Scheduling then continues as a polling loop, and the whole program exits once main has finished executing

This is the whole process of executing a simple Go function

Finally

This article mainly draws on Liu Danbing's videos on Bilibili; if you want to watch them, search for Liu Danbing there.

Finally, this is my last article of the Go theme month. I have gained a lot from more than a month of study. Although the event is over, I will keep studying Go in depth and keep writing articles. I love Go!

And finally: if this article helped you, give it a like!