This article records my understanding of the Go scheduler and of how to trace it, with particular attention to an easily overlooked problem: the order in which goroutines are executed. Many articles about the Go scheduler do not mention this point, so I am sharing it here. Corrections and discussion are welcome.
What is a scheduler
For readers who are new to operating systems and high-level languages, here is a plain explanation of what a scheduler is. Scheduling means assigning multiple programs to a limited number of CPUs in a reasonable way so that every program gets to run, which makes them appear to execute concurrently. For example, a computer may have only four or even two CPU cores, yet it can run dozens of programs at once, thanks to the operating system's scheduler. The operating system, however, schedules processes and threads. A thread is often described as a lightweight process, but each thread still needs memory on the order of megabytes, and if the two threads being switched belong to different processes, the process has to be switched as well, so the CPU can spend a lot of time on scheduling. To make better use of the CPU, Go supports high concurrency natively through goroutines, which the Go scheduler schedules onto threads at the language level.
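As a rough illustration of the idea that many goroutines share a small number of CPUs, here is a minimal sketch. The function name runMany is a hypothetical helper introduced only for this example; runtime.NumCPU and runtime.GOMAXPROCS are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runMany (hypothetical helper) starts n goroutines that each add their
// index to a shared total, and returns the total once all have finished.
func runMany(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			total += v
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	// Passing 0 only queries the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	// Far more goroutines than CPUs; the Go scheduler multiplexes them.
	fmt.Println("sum:", runMany(10000))
}
```

The program starts ten thousand goroutines even though the machine has only a handful of cores, which would be much more expensive with one OS thread per task.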
Golang’s scheduler
Go's scheduler is implemented in the runtime. Every Go program starts a runtime, which is set up before our own code executes and is responsible for scheduling goroutines. We write our entry point in main because runtime.main calls main.main. The Go scheduler was rewritten in 2012 and now uses the G-P-M model, but let's look at the old G-M scheduler first to better appreciate the current one.
G-M model
The old scheduler used the G-M model, in which G is a goroutine and M is an operating system thread. It had the following problems:
- Multiple Ms access a shared global G queue, which must be protected by a mutex on every access, resulting in heavy lock contention and blocking.
- Locality is poor: if G1 running on M1 creates G2, G2 may be handed over to M2 for execution, even though G1 and G2 are related and would be better run on the same M.
- Each M has its own mcache (memory allocation state), which consumes a lot of memory and has poor locality.
- System calls block threads, which wastes CPU.
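The lock contention problem of a single global queue can be sketched with a toy model. This is not the old runtime's code; globalQueue, pop, and runGM are hypothetical names used only to show that every worker thread must take the same mutex before it can obtain work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// globalQueue is a toy stand-in for the single run queue of the G-M model.
type globalQueue struct {
	mu    sync.Mutex
	tasks []func()
}

// pop removes the next task. Every worker has to take the same lock here,
// which is where contention appears as the number of workers grows.
func (q *globalQueue) pop() (func(), bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.tasks) == 0 {
		return nil, false
	}
	t := q.tasks[0]
	q.tasks = q.tasks[1:]
	return t, true
}

// runGM drains a queue of nTasks tasks using nWorkers workers (the "M"s)
// and returns how many tasks were executed.
func runGM(nWorkers, nTasks int) int {
	var executed int64
	q := &globalQueue{}
	for i := 0; i < nTasks; i++ {
		q.tasks = append(q.tasks, func() { atomic.AddInt64(&executed, 1) })
	}
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				task, ok := q.pop()
				if !ok {
					return // queue is empty, worker exits
				}
				task()
			}
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt64(&executed))
}

func main() {
	fmt.Println("tasks executed:", runGM(4, 1000))
}
```

All four workers serialize on q.mu for every task they fetch, which is the cost that per-P local queues are meant to avoid.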
G-P-M model
Later, the Go developers improved the scheduler by introducing the G-P-M model, which adds P (a logical processor) between G and M:
- Each P has a local queue that holds the goroutines waiting to run.
- Each P is bound to an M, which is the entity that actually executes the goroutines in P.
- Normally, M fetches a G to execute from the local queue of its bound P.
- When the local queue of the P bound to M is full, new goroutines are placed in the global queue.
- Ms are reusable and do not need to be created and destroyed repeatedly; the work stealing and hand off strategies keep threads running efficiently.
- When the local queue of the P bound to M is empty, M steals Gs from the local queues of other Ps, i.e. work stealing. When there is no G to steal from any other P, M fetches Gs from the global queue into the local queue.
- When a G blocks in a syscall, its M blocks too. P then unbinds from that M, i.e. hand off, and looks for an idle M; if there is none, a new M is created.
- When a G blocks on a channel or on network I/O, M does not block; it looks for another runnable G. When the blocked G is ready again, it becomes runnable and re-enters a P's queue to wait for execution.
- The mcache (memory allocation state) now lives in P, so a G can be scheduled across Ms without the poor locality that cross-M scheduling used to cause.
- Gs are scheduled preemptively. Unlike an operating system, which schedules threads by time slice, the Go scheduler has no time slice concept. A G gives up its M either by blocking or by being preempted, and under cooperative preemption a G can only be preempted at a function call. In the extreme case, a G that spins in an endless loop without calling any function hogs a P and an M, and the scheduler can do nothing about it. (Since Go 1.14 the runtime also supports asynchronous preemption, which largely removes this limitation.)
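The queue rules in the list above (local queue first, then work stealing, then the global queue) can be sketched as a small single-threaded simulation. This is a toy model that follows the description in this article, not the runtime's actual algorithm; the types p and sched, the constant localCap, and the function demo are all hypothetical, and stealing a single G from the tail of the victim's queue is a simplification.

```go
package main

import "fmt"

// localCap is a made-up capacity for the toy local queues.
const localCap = 4

// p is a toy logical processor with a bounded local run queue.
type p struct {
	id    int
	local []string
}

// sched is a toy scheduler: a set of Ps plus one global queue.
type sched struct {
	ps     []*p
	global []string
}

// put enqueues g on pp's local queue, or on the global queue when it is full.
func (s *sched) put(pp *p, g string) {
	if len(pp.local) < localCap {
		pp.local = append(pp.local, g)
		return
	}
	s.global = append(s.global, g)
}

// next picks the next G for pp: local queue, then steal, then global queue.
func (s *sched) next(pp *p) (g, from string, ok bool) {
	if len(pp.local) > 0 {
		g, pp.local = pp.local[0], pp.local[1:]
		return g, "local", true
	}
	for _, victim := range s.ps {
		if victim != pp && len(victim.local) > 0 {
			// Simplification: steal one G from the tail of the victim's queue.
			last := len(victim.local) - 1
			g, victim.local = victim.local[last], victim.local[:last]
			return g, fmt.Sprintf("stolen from P%d", victim.id), true
		}
	}
	if len(s.global) > 0 {
		g, s.global = s.global[0], s.global[1:]
		return g, "global", true
	}
	return "", "", false
}

// demo queues six Gs on P0 and lets the idle P1 pull work until none is left.
func demo() []string {
	p0, p1 := &p{id: 0}, &p{id: 1}
	s := &sched{ps: []*p{p0, p1}}
	for i := 1; i <= 6; i++ {
		s.put(p0, fmt.Sprintf("G%d", i)) // G5 and G6 overflow to the global queue
	}
	var log []string
	for {
		g, from, ok := s.next(p1)
		if !ok {
			break
		}
		log = append(log, g+" ("+from+")")
	}
	return log
}

func main() {
	for _, entry := range demo() {
		fmt.Println(entry)
	}
}
```

In this sketch P1 has no work of its own, so it first empties P0's local queue by stealing and only then turns to the global queue.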
Strange execution sequence of the Go scheduler
Do you feel you have a basic understanding of how the Go scheduler works? Here is a pitfall to watch out for. What does the following code print?
package main

import "fmt"

func main() {
	done := make(chan bool)
	values := []string{"a", "b", "c"}
	for _, v := range values {
		fmt.Println("--->", v)
		// Pass v as an argument so each goroutine gets its own copy.
		go func(u string) {
			fmt.Println(u)
			done <- true
		}(v)
	}
	// wait for all goroutines to complete before exiting
	for range values {
		<-done
	}
}
Think it through before looking at the answer.
The actual output is:
---> a
---> b
---> c
c
b
a
The Go scheduler example code is available in the learn-golang example project, which is updated continuously; readers who want to learn Go systematically may want to follow it.
Your first reaction might be: shouldn't it print a, b, c? Why does c come first? We created three goroutines (for a, b, and c) in that order, so we might expect them to sit in the local queue of P and run in the same order. Why is the goroutine for c always executed first? The reason is that when several tasks are created on the same logical processor, they are placed in the same task queue in order, but the most recently created task is actually put into a special "next" slot. It therefore has the highest priority and is most likely to run first. In other words, among multiple goroutines created from the same goroutine, the last one created is most likely to be executed first. This is an implementation detail: Go does not guarantee any execution order between goroutines, and the order of the remaining goroutines can vary.
This explanation comes from the discussion of goroutine execution order listed in the references.
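One way to observe this behavior is to restrict the program to a single P so that all goroutines share one local queue. The sketch below is an assumption-laden experiment, not a guarantee: startOrder is a hypothetical helper, and the order it reports depends on the Go version and runtime internals, so no expected output is given.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// startOrder (hypothetical helper) launches one goroutine per value and
// returns the order in which the goroutines actually ran.
func startOrder(values []string) []string {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		order []string
	)
	for _, v := range values {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			mu.Lock()
			order = append(order, u)
			mu.Unlock()
		}(v)
	}
	wg.Wait()
	return order
}

func main() {
	// Use a single P so every new goroutine lands on the same processor.
	runtime.GOMAXPROCS(1)
	fmt.Println(startOrder([]string{"a", "b", "c"}))
}
```

If the last-created goroutine shows up first in the printed slice, that matches the explanation above. Code that needs a specific order should enforce it explicitly with channels or other synchronization.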
Method of viewing the status of the scheduler
GODEBUG is a very powerful Go runtime environment variable. By passing it different combinations of key1=value1,key2=value2,… pairs, you make the Go runtime print different debugging information. For example, setting GODEBUG to schedtrace=1000 prints the status of the goroutine scheduler every 1000 ms. Here is how to use GODEBUG to view the status of the Go scheduler in a running program.
The environment used here is the Windows Subsystem for Linux (WSL) on Windows 10. Notes on setting up and using WSL, along with the code, are collected in the learn-golang project; the code can also be found in the Bird's Nest article.
package main

import (
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(10)
	for i := 0; i < 10; i++ {
		go work(&wg)
	}
	wg.Wait()
	// Wait to see the global run queue deplete.
	time.Sleep(3 * time.Second)
}

// work sleeps for a second and then keeps one P busy with a long loop.
func work(wg *sync.WaitGroup) {
	time.Sleep(time.Second)
	var counter int
	for i := 0; i < 1e10; i++ {
		counter++
	}
	wg.Done()
}
Build and run:
go build 01_GODEBUG-schedtrace.go
GODEBUG=schedtrace=1000 ./01_GODEBUG-schedtrace
Results:
SCHED 0ms: gomaxprocs=4 idleprocs=1 threads=5 spinningthreads=1 idlethreads=0 runqueue=0 [4 0 4 0]
SCHED 1000ms: gomaxprocs=4 idleprocs=4 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 0]
SCHED 2007ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 3025ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 4033ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 5048ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 6079ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 7081ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 8092ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]
SCHED 9113ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 10129ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 11134ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 12157ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 13170ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 14183ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 15187ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 1 0 1]
SCHED 16187ms: gomaxprocs=4 idleprocs=2 threads=8 spinningthreads=0 idlethreads=5 runqueue=0 [0 0 0 0]
SCHED 17190ms: gomaxprocs=4 idleprocs=2 threads=8 spinningthreads=0 idlethreads=5 runqueue=0 [0 0 0 0]
SCHED 18193ms: gomaxprocs=4 idleprocs=2 threads=8 spinningthreads=0 idlethreads=5 runqueue=0 [0 0 0 0]
SCHED 19196ms: gomaxprocs=4 idleprocs=2 threads=8 spinningthreads=0 idlethreads=5 runqueue=0 [0 0 0 0]
SCHED 20200ms: gomaxprocs=4 idleprocs=4 threads=8 spinningthreads=0 idlethreads=6 runqueue=0 [0 0 0 0]
SCHED 21210ms: gomaxprocs=4 idleprocs=4 threads=8 spinningthreads=0 idlethreads=6 runqueue=0 [0 0 0 0]
SCHED 22219ms: gomaxprocs=4 idleprocs=4 threads=8 spinningthreads=0 idlethreads=6 runqueue=0 [0 0 0 0]
Don't panic at the amount of output. Once you know what each field means, it is easy to read:
- SCHED 1000ms: time elapsed since the program started.
- gomaxprocs=4: number of logical processors (P) used by the program; by default it is derived from the number of CPU cores.
- idleprocs=4: number of idle Ps.
- threads=8: total number of threads (M) in the program, including those executing a G and those that are idle.
- spinningthreads=0: number of spinning threads. A spinning M has found no G in the local queue of its bound P or in the global queue; instead of being destroyed, it keeps looking for a G to steal, which reduces the number of threads that have to be created.
- idlethreads=3: number of idle threads.
- runqueue=0: number of Gs in the global queue.
- [0 0 0 6]: number of Gs in the local queue of each P. My machine has four cores, so there are four Ps.
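To make the field layout concrete, here is a minimal sketch that parses one schedtrace summary line into a struct. The type schedLine and the function parseSchedLine are hypothetical names for this example, and the parser assumes the line format shown in the output above; other Go versions may print additional fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// schedLine holds the fields of one summary line printed by schedtrace.
type schedLine struct {
	Elapsed string         // e.g. "2007ms"
	Fields  map[string]int // gomaxprocs, idleprocs, threads, ...
	LocalQ  []int          // length of each P's local run queue
}

// parseSchedLine parses a line such as
// "SCHED 2007ms: gomaxprocs=4 ... runqueue=0 [0 0 0 6]".
func parseSchedLine(line string) (schedLine, error) {
	out := schedLine{Fields: map[string]int{}}
	// Split off the bracketed per-P queue lengths, if present.
	if i := strings.Index(line, "["); i >= 0 {
		inner := strings.Trim(line[i:], "[] ")
		for _, f := range strings.Fields(inner) {
			n, err := strconv.Atoi(f)
			if err != nil {
				return out, err
			}
			out.LocalQ = append(out.LocalQ, n)
		}
		line = line[:i]
	}
	tokens := strings.Fields(line)
	if len(tokens) < 2 || tokens[0] != "SCHED" {
		return out, fmt.Errorf("not a SCHED line: %q", line)
	}
	out.Elapsed = strings.TrimSuffix(tokens[1], ":")
	// The remaining tokens are key=value pairs with integer values.
	for _, tok := range tokens[2:] {
		kv := strings.SplitN(tok, "=", 2)
		if len(kv) != 2 {
			continue
		}
		n, err := strconv.Atoi(kv[1])
		if err != nil {
			return out, err
		}
		out.Fields[kv[0]] = n
	}
	return out, nil
}

func main() {
	line := "SCHED 2007ms: gomaxprocs=4 idleprocs=0 threads=8 spinningthreads=0 idlethreads=3 runqueue=0 [0 0 0 6]"
	parsed, err := parseSchedLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("elapsed:", parsed.Elapsed)
	fmt.Println("threads:", parsed.Fields["threads"])
	fmt.Println("local queues:", parsed.LocalQ)
}
```

A parser like this makes it easy to plot queue lengths over time or to flag lines where one P's local queue is much longer than the others.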
The summary lines printed by schedtrace are enough to judge the overall health of a program. To get detailed scheduling information for every goroutine, M, and P, add scheddetail to GODEBUG:
GODEBUG=schedtrace=1000,scheddetail=1 ./01_GODEBUG-schedtrace
The results are as follows:
SCHED 0ms: gomaxprocs=4 idleprocs=4 threads=7 spinningthreads=0 idlethreads=2 runqueue=0 gcwaiting=0 nmidlelocked=0 stopwait=0 sysmonwait=0
P0: status=0 schedtick=7 syscalltick=1 m=-1 runqsize=0 gfreecnt=0
P1: status=0 schedtick=2 syscalltick=1 m=-1 runqsize=0 gfreecnt=0
P2: status=0 schedtick=1 syscalltick=1 m=-1 runqsize=0 gfreecnt=0
P3: status=0 schedtick=1 syscalltick=1 m=-1 runqsize=0 gfreecnt=0
M6: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=true lockedg=-1
M5: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=true lockedg=-1
M4: p=-1 curg=33 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=true lockedg=-1
M3: p=-1 curg=49 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=true lockedg=-1
M2: p=-1 curg=17 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=true lockedg=-1
M1: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=1 dying=0 spinning=false blocked=false lockedg=-1
M0: p=-1 curg=14 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=true lockedg=-1
G1: status=4(semacquire) m=-1 lockedm=-1
G2: status=4(force gc (idle)) m=-1 lockedm=-1
G3: status=4(GC sweep wait) m=-1 lockedm=-1
G4: status=4(sleep) m=-1 lockedm=-1
G5: status=4(sleep) m=-1 lockedm=-1
G6: status=4(sleep) m=-1 lockedm=-1
G7: status=4(sleep) m=-1 lockedm=-1
G8: status=4(sleep) m=-1 lockedm=-1
G9: status=4(sleep) m=-1 lockedm=-1
G10: status=4(sleep) m=-1 lockedm=-1
G11: status=4(sleep) m=-1 lockedm=-1
G12: status=4(sleep) m=-1 lockedm=-1
G13: status=4(sleep) m=-1 lockedm=-1
G14: status=3() m=0 lockedm=-1
G33: status=3() m=4 lockedm=-1
G17: status=3() m=2 lockedm=-1
G49: status=3() m=3 lockedm=-1
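The numeric status values on the G lines can be decoded with a small lookup table. The mapping below reflects my reading of the goroutine status constants in the runtime source (runtime2.go); treat it as an assumption, since these are internal values that may differ between Go versions. The names gStatusNames and gStatusName are hypothetical.

```go
package main

import "fmt"

// gStatusNames maps the numeric G status printed by scheddetail to a
// readable name. The values are assumed from the runtime's internal
// constants and are not part of any stable API.
var gStatusNames = map[int]string{
	0: "idle",
	1: "runnable",
	2: "running",
	3: "syscall",
	4: "waiting",
	6: "dead",
}

// gStatusName returns a readable name for a G status code.
func gStatusName(code int) string {
	if name, ok := gStatusNames[code]; ok {
		return name
	}
	return fmt.Sprintf("unknown(%d)", code)
}

func main() {
	// The two status codes that appear in the trace above.
	for _, code := range []int{3, 4} {
		fmt.Printf("G status=%d means %s\n", code, gStatusName(code))
	}
}
```

Under this mapping, the goroutines shown with status=4 are waiting (the text in parentheses gives the wait reason, such as sleep), and those with status=3 are inside a system call on the M listed in their m field.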
References:
Dabin Go scheduler series
Also about the Goroutine scheduler
Nest Go scheduler tracking
Go scheduler details
Goroutine execution order discussion