Go has four core components, all of which live in the runtime: the scheduling system, the GC, goroutines, and channels. A thorough understanding of these essentials helps us understand the language itself!

1. Development history of Go scheduling model

  • Single-threaded scheduler (version 0.x)
    • It contained just over 40 lines of code;
    • Only one active thread could exist in the program, forming the G-M model.
  • Multi-threaded scheduler (version 1.0)
    • Allowed multi-threaded programs to run;
    • A global lock led to serious contention.
  • Work-stealing scheduler (version 1.1)
    • Introduced the processor P, forming the current G-M-P model;
    • Implemented a work-stealing scheduler on top of the processor P;
    • In some cases a Goroutine would not give up its thread, causing starvation (a single P plus a spinning goroutine);
    • Excessive stop-the-world (STW) garbage collection could keep the program from doing useful work for long periods.
  • Preemptive scheduler (version 1.2 to present)
    • Cooperation-based preemptive scheduler – 1.2 to 1.13
      • The compiler inserts a preemption check at function calls; at each call the runtime checks whether the current Goroutine has a pending preemption request, achieving cooperative preemptive scheduling.
      • A Goroutine that makes no function calls can still pause the whole program, because garbage collection waits on it and holds resources for a long time;
    • Signal-based preemptive scheduler – 1.14 to present
      • Implements true preemptive scheduling based on signals;
      • Garbage collection triggers preemption when scanning stacks;
      • Preemption points still do not cover every edge case;
  • Non-uniform memory access (NUMA) scheduler · proposal
    • Partitions the runtime's resources across nodes;
    • The implementation is complex and remains off the agenda to this day;

Most of the history above belongs to early, pre-release Go, so our focus is on released Go — that is, Go's GPM model!

  • G: Goroutine, the executor we create with the go keyword in a Go program;
  • M: Machine, a worker thread — a thread of the process in the traditional sense;
  • P: Processor, an artificial abstraction of the local resources needed to execute Go code. Go code can only execute while an M is bound to a P; an M gives up its P only when it blocks or when a system call takes too long.

Reference: Scheduling system design essentials

2. GM model

The role of P is more than a queue abstraction; if P were merely a queue, its status would be only a subset of M's. In the GM model our core focus is G and M — this is also the classic thread-pool model, whose goal is to schedule G onto M efficiently!

The following is a scheduler I wrote in Go; let's look at its design ideas, as well as its problems!

package main

import (
	"fmt"
	"os"
	"os/signal"
	"time"

	"go.uber.org/atomic"
)

func main() {
	sig := make(chan os.Signal, 1) // Listen for program signals
	signal.Notify(sig, os.Interrupt, os.Kill)
	down := make(chan struct{}) // Program shutdown signal

	threadNum := 2                        // The number of worker threads
	taskQueue := make(chan func(), 1<<20) // Task queue with a size limit

	addTask := func(foo func()) {
		select {
		case <-down:
		case taskQueue <- foo:
		}
	}

	schedule := func() {
		for x := 0; x < threadNum; x++ {
			go func() {
				for {
					select {
					case <-down:
						return
					case foo := <-taskQueue:
						foo()
					}
				}
			}()
		}
	}

	schedule() // Start the scheduler

	count := &atomic.Int64{}
	for x := 0; x < 1; x++ {
		addTask(func() { // Add tasks in a loop
			for {
				addTask(func() {
					count.Add(1)
				})
			}
		})
	}

	// Wait for the program to finish
	select {
	case <-sig:
		close(down)
		fmt.Println("Program exit, exec CTRL + C")
	case <-time.After(time.Second):
		close(down)
		fmt.Printf("Scheduled count: %v\n", count.Load())
	}
}

1. Observations

1. First test condition: the scheduler starts only two threads — one ends up adding tasks in a loop while the other executes tasks in a loop

➜ go-tool git:(master) ✗ bin/app
Scheduled count: 5078714
➜ go-tool git:(master) ✗ bin/app
Scheduled count: 5043506

2. Next test condition: the scheduler starts three threads — two executing tasks and one adding tasks

➜ go-tool git:(master) ✗ bin/app
Scheduled count: 4333959
➜ go-tool git:(master) ✗ bin/app
Scheduled count: 4359804

3. Continuing the test: start ten threads — one adding tasks, nine executing tasks

➜ go-tool git:(master) ✗ bin/app
Scheduled count: 1663691
➜ go-tool git:(master) ✗ bin/app
Scheduled count: 1692096

4. Now let's add some blocking tasks

addTask(func() {
	count.Inc()
	time.Sleep(time.Second * 2)
})

Running it, we can see the scheduler becomes completely unusable

➜ go-tool git:(master) ✗ bin/app

2. Problems

1. As M keeps increasing, the number of completed tasks keeps decreasing. Why? Interested readers can attach pprof and take a look — in fact a large amount of time is spent waiting on locks!

2. What if an M runs something like Sleep? Can this scheduler still sustain scheduling at that magnitude?

On how to use pprof: add this code at the top of main:

file, err := os.OpenFile("/Users/fanhaodong/go/code/go-tool/main/prof.pporf", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
if err != nil {
	panic(err)
}
if err := pprof.StartCPUProfile(file); err != nil {
	panic(err)
}
defer pprof.StopCPUProfile()

Now let's look at go tool pprof main/prof.pporf

Showing top 10 nodes out of 36
      flat  flat%   sum%        cum   cum%
     2.45s 63.80% 63.80%      2.45s 63.80%  runtime.usleep
     0.40s 10.42% 74.22%      0.40s 10.42%  runtime.pthread_cond_wait
     0.28s  7.29% 81.51%      0.28s  7.29%  runtime.(*waitq).queuesudog
     0.25s  6.51%    ...        ...    ...  main.main.func2.1
     0.10s  2.60% 95.05%      0.10s  2.60%  runtime.procyield
     0.07s  1.82% 96.88%      0.07s  1.82%  runtime.(*waitq).dequeue
     0.03s  0.78% 97.66%      0.03s  0.78%  main.main.func3
     0.02s  0.52% 98.70%      1.72s 44.79%  runtime.selectgo

You can see that the actual code execution time is only 0.17s + 0.02s and the rest of the time is blocked!

3. GPM model

1. GM model problem

1. In the GM model all G's are put into a single queue, so every M competes for the lock when taking a task, and we also compete for the lock when inserting a G. The fix is to reduce contention on the single shared resource — sharding, or "bucketing": give each thread its own queue

2. The GM model has no task state beyond runnable; a blocked task should be fully suspended and then woken up when it becomes ready

  • The run queue holds all runnable tasks
  • Threads execute tasks in the running state
  • A running task that blocks must yield, enter the blocked state, and be queued again once runnable

2. How GPM optimizes GM (core)

1. Introduce P

The problem here: if we allocate many threads, the number of threads grows and so does the number of queues. That also stresses the scheduler, because it must traverse every thread's queue to distribute tasks and to steal tasks, as we will see later!

We know the CPU's maximum parallelism really depends on the number of CPU cores. So we do not need a queue per thread: even if we allocated them, the threads would still block heavily when scheduled, because the CPU cannot actually run them all at once!

Go therefore allocates only as many queues as there are CPUs — this is the concept of P. You can think of P as the real resource allocator, while M is very light and just executes code. An M can execute tasks only by binding a P (this is mandatory)!

The benefits of this:

  • Threads stay light, so destroying and creating them is cheap — thread creation is a blocking kernel call, as discussed later
  • CPU parallelism can be fully utilized if all resources are allocated on a fixed number of P’s

2. GPM scheduler

1. First of all, a scheduler really schedules tasks in different states. Go marks each G with a state — roughly runnable, running, blocked, and so on — so the question becomes how to schedule G's across all these states. Solving how to handle a blocked G essentially solves the G scheduling problem! A G can block on:

  • Send and receive on a channel
  • Network I/O operations
  • Blocking system calls
  • Timers or Sleep
  • A mutex (sync.Mutex)

In fact, these cases can be divided into:

  • User-mode blocking
  • Kernel-mode blocking that does not suspend the current thread
  • Kernel-mode blocking that suspends the current thread

2. User-mode blocking in Go generally relies on the gopark function; this logic is tightly bound to the Go scheduler

Source: golang.org/src/runtime…

func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
	if reason != waitReasonSleep {
		checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
	}
	mp := acquirem()
	gp := mp.curg
	status := readgstatus(gp)
	if status != _Grunning && status != _Gscanrunning {
		throw("gopark: bad g status")
	}
	mp.waitlock = lock
	mp.waitunlockf = unlockf
	gp.waitreason = reason
	mp.waittraceev = traceEv
	mp.waittraceskip = traceskip
	releasem(mp)
	// can't do anything that might move the G between Ms here.
	mcall(park_m)
}

// park continuation on g0.
func park_m(gp *g) {
	_g_ := getg()

	if trace.enabled {
		traceGoPark(_g_.m.waittraceev, _g_.m.waittraceskip)
	}

	casgstatus(gp, _Grunning, _Gwaiting)
	dropg()

	if fn := _g_.m.waitunlockf; fn != nil {
		ok := fn(gp, _g_.m.waitlock)
		_g_.m.waitunlockf = nil
		_g_.m.waitlock = nil
		if !ok {
			if trace.enabled {
				traceGoUnpark(gp, 2)
			}
			casgstatus(gp, _Gwaiting, _Grunnable)
			execute(gp, true) // Schedule it back, never returns.
		}
	}
	schedule() // scheduler
}
  • It marks the current g's state from running -> waiting
  • Then it starts scheduling: the schedule() method
    • First it checks for special cases: GC or trace
    • It may check the global queue first (done only occasionally in the code, driven by a counter)
    • It gets a g from the local queue of the P bound to the current g
    • It gets a runnable G from elsewhere (priority: local queue -> global queue -> netpoller -> steal G from other P's; see the findrunnable method)
    • Finally it executes the G it found
  • The counterpart: gopark puts the current goroutine into a waiting state and releases the lock; calling goready makes the goroutine runnable again so it can be scheduled.

3. For the netpoller (NIO model), the kernel calls are actually non-blocking, so Go maintains a netpoller to hold these blocked G's until the kernel reports readiness! When are they woken? They must wait for the scheduler to poll and schedule them!

n, err := syscall.Read(fd.Sysfd, p)
if err != nil {
	n = 0
	if err == syscall.EAGAIN && fd.pd.pollable() {
		if err = fd.pd.waitRead(fd.isFile); err == nil { // suspend the current g
			continue
		}
	}
	// ...
}

4. With kernel-mode blocking (which usually suspends the thread, and the thread must later be woken by the kernel), P can only give up the thread and find a new thread to run on!

For the new thread: an idle thread is reused if one exists; otherwise a new thread is created.

On what happens when the kernel wakes the thread: because of the GPM model, the G needs a P binding, so it tries to find an available P; if no P is available, the G is marked runnable and put into the global queue!

I could not locate the exact code for how G resumes after a kernel wake-up — sorry, the logic is not entirely clear to me! The key question is how to find an available P, and the advantage of a fixed number of P's is that the search time is bounded! Why not rebind the previously bound P? Because of the context switch!

5. If we understand all of the above, we understand Go's basic scheduling model

  • What is the priority order for choosing a G to run
  • When do P and M establish their binding
  • When does P steal G

Savor the reference articles slowly for the answers!

3. How to prevent G from occupying P for a long time

If one G takes too long to execute, how can other G's be scheduled fairly? This brings up the two classic scheduling approaches: cooperative scheduling and preemptive scheduling. The concepts are simple to explain: cooperative scheduling relies on the running task voluntarily giving up the CPU; preemptive scheduling relies on the scheduler forcibly interrupting the running task.

1. Idling code

For example, take the following code — my local version is go1.13.5

package main

import (
	"fmt"
	"runtime"
)

func main() {
	go func() {
		for {
		}
	}()
	runtime.Gosched() // The main goroutine yields its thread
	fmt.Println("close")
}

2. Existing problems

Run with GOMAXPROCS=1 so that only one P is configured globally

➜ go-tool git:(master) ✗ GODEBUG=schedtrace=1 GOMAXPROCS=1 bin/app
SCHED 0ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=0 [2]
SCHED 1ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 2ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 4ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 7ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 13ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]

You can see that the main function never gets to execute! The spinning goroutine has monopolized the program's only P

Remark:

  • GODEBUG=schedtrace=1 prints the scheduler state every 1ms
  • SCHED: debug output flag string, indicating this line comes from the goroutine scheduler
  • 1ms: time elapsed from program start to this log line
  • gomaxprocs: the number of P's
  • idleprocs: the number of P's in the idle state; gomaxprocs minus idleprocs gives the number of P's executing Go code
  • threads: the number of OS threads used by the scheduler, plus runtime threads such as sysmon
  • spinningthreads: the number of OS threads in the spinning state
  • idlethreads: the number of OS threads in the idle state
  • runqueue=1: the number of G's in the scheduler's global queue
  • [3 4 0 10]: the number of G's in the local queues of the 4 P's, respectively

3. Upgrade 1.14+

But if I switch to version 1.14+ — if you are interested, you can use my docker images, pulled directly as fanhaodong/golang:1.15.11 and fanhaodong/golang:1.13.5

[root@647af84b3319 code]# GODEBUG=schedtrace=1 GOMAXPROCS=1 bin/app
SCHED 0ms: gomaxprocs=1 idleprocs=0 threads=2 spinningthreads=0 idlethreads=0 runqueue=0 [0]
SCHED 1ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=0 [0]
SCHED 2ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=0 [0]
SCHED 3ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 4ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 5ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 6ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 7ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 8ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 10ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
SCHED 13ms: gomaxprocs=1 idleprocs=0 threads=3 spinningthreads=0 idlethreads=1 runqueue=1 [1]
close

4. Memory allocation on G

First, we know that in G/M/P, a G can be unbound from its M or P, so where should its data and variables live?! This is exactly what escape analysis decides!

1. Under what circumstances will escape

package main

type Demo struct {
	Main string
}

func main() {
	go func() {
		demo := Demo{}
		go func() {
			test(demo)
		}()
	}()
}

func test(demo Demo) {}

As the output shows, demo does not escape, because it is copied by value into the callee's own stack space

[root@647af84b3319 code]#  go build -gcflags "-N -l -m" -o bin/app deno/main2.go
# command-line-arguments
deno/main2.go:13:11: demo does not escape
deno/main2.go:6:5: func literal escapes to heap
deno/main2.go:8:6: func literal escapes to heap
Copy the code

Remark:

In the command, -gcflags "-N -l -m": -N disables optimization, -l disables inlining, and -m prints escape-analysis information

So let’s go ahead and change it to this

package main

type Demo struct {
	Main string
}

func main() {
	go func() {
		demo := Demo{}
		go func() {
			test(&demo)
		}()
	}()
}

func test(demo *Demo) {}

You can see that the Demo object has now escaped to the heap! In other words, even if the G is executed by another M, its memory allocation poses no problem.

[root@647af84b3319 code]#  go build -gcflags "-N -l -m" -o bin/app deno/main2.go
# command-line-arguments
deno/main2.go:13:11: demo does not escape
deno/main2.go:7:3: moved to heap: demo
deno/main2.go:6:5: func literal escapes to heap
deno/main2.go:8:6: func literal escapes to heap
Copy the code

2. The for-loop goroutine variable-capture problem

So demo is actually copied to the heap — that is how an escaping value survives across G's. The same thing happens with a for loop

func main() {
	for x := 0; x < 10; x++ {
		go func() {
			fmt.Println(x)
		}()
	}
}

Building it, you can see that x has escaped to the heap, so all of the goroutines reference the same object

➜ go-tool git:(master) ✗ go build -gcflags "-l -m" -o bin/app deno/main2.go
# command-line-arguments
deno/main2.go:10:6: moved to heap: x
deno/main2.go:11:6: func literal escapes to heap
deno/main2.go:12:15: main.func1 ... argument does not escape
deno/main2.go:12:15: x escapes to heap
deno/main2.go:16:11: test demo does not escape
Copy the code

How do you solve it? It’s really easy

  • Copy the data directly (shadowing)
for x := 0; x < 10; x++ {
	x := x // clone
	go func() {
		fmt.Println(x)
	}()
}
  • Pass it as a parameter (recommended)
for x := 0; x < 10; x++ {
	go func(x int) {
		fmt.Println(x)
	}(x)
}

References

Also about the Goroutine scheduler

Illustration of Go runtime scheduler

Go Language Review: From Go 1.0 to Go 1.13

The original Go language

Scheduling system design essentials

Scalable Go Scheduler Design Doc