Most questions about goroutines focus on:
- How is a goroutine different from a thread? How does it work?
- What is so good about it?
- What should I pay attention to when using it?
There are more questions than these, but let's start with the basics:
package main

func hello(msg string) {
	println(msg)
}

func main() { // main.go:7
	go hello("hello world") // main.go:8
}
The compiled form is:
"".main STEXT size=91 args=0x0 locals=0x28
	0x0000 00000 (main.go:7)	TEXT	"".main(SB), ABIInternal, $40-0
	0x0000 00000 (main.go:7)	MOVQ	(TLS), CX
	0x0009 00009 (main.go:7)	CMPQ	SP, 16(CX)
	0x000d 00013 (main.go:7)	JLS	84
	0x000f 00015 (main.go:7)	SUBQ	$40, SP
	0x0013 00019 (main.go:7)	MOVQ	BP, 32(SP)
	0x0018 00024 (main.go:7)	LEAQ	32(SP), BP
	0x001d 00029 (main.go:7)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (main.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (main.go:7)	FUNCDATA	$3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x001d 00029 (main.go:8)	PCDATA	$2, $0
	0x001d 00029 (main.go:8)	PCDATA	$0, $0
	0x001d 00029 (main.go:8)	MOVL	$16, (SP)
	0x0024 00036 (main.go:8)	PCDATA	$2, $1
	0x0024 00036 (main.go:8)	LEAQ	"".hello·f(SB), AX
	0x002b 00043 (main.go:8)	PCDATA	$2, $0
	0x002b 00043 (main.go:8)	MOVQ	AX, 8(SP)
	0x0030 00048 (main.go:8)	PCDATA	$2, $1
	0x0030 00048 (main.go:8)	LEAQ	go.string."hello world"(SB), AX
	0x0037 00055 (main.go:8)	PCDATA	$2, $0
	0x0037 00055 (main.go:8)	MOVQ	AX, 16(SP)
	0x003c 00060 (main.go:8)	MOVQ	$11, 24(SP)
	0x0045 00069 (main.go:8)	CALL	runtime.newproc(SB)
	0x004a 00074 (main.go:9)	MOVQ	32(SP), BP
	0x004f 00079 (main.go:9)	ADDQ	$40, SP
	0x0053 00083 (main.go:9)	RET
	0x0054 00084 (main.go:9)	NOP
	0x0054 00084 (main.go:7)	PCDATA	$0, $-1
	0x0054 00084 (main.go:7)	PCDATA	$2, $-1
	0x0054 00084 (main.go:7)	CALL	runtime.morestack_noctxt(SB)
	0x0059 00089 (main.go:7)	JMP	0
	0x0000 65 48 8b 0c 25 00 00 00 00 48 3b 61 10 76 45 48  eH..%....H;a.vEH
	0x0010 83 ec 28 48 89 6c 24 20 48 8d 6c 24 20 c7 04 24  ..(H.l$ H.l$ ..$
	0x0020 10 00 00 00 48 8d 05 00 00 00 00 48 89 44 24 08  ....H......H.D$.
	0x0030 48 8d 05 00 00 00 00 48 89 44 24 10 48 c7 44 24  H......H.D$.H.D$
	0x0040 18 0b 00 00 00 e8 00 00 00 00 48 8b 6c 24 20 48  ..........H.l$ H
	0x0050 83 c4 28 c3 e8 00 00 00 00 eb a5                 ..(........
	rel ... go.string."hello world"+0
	rel 70+4 t=8 runtime.newproc+0
	rel 85+4 t=8 runtime.morestack_noctxt+0
We start a goroutine in main. Behind the scenes, the go statement is compiled into a call to runtime.newproc, whose body is as follows:
// Create a new g to run fn with siz bytes of arguments,
// and put it on the queue of g's waiting to run.
// The compiler turns a go statement into a call to this function.
//go:nosplit
func newproc(siz int32, fn *funcval) {
	// The arguments start one pointer length after the address of fn.
	argp := add(unsafe.Pointer(&fn), sys.PtrSize)
	// Get the currently running g (the goroutine executing the go statement).
	gp := getg()
	// getcallerpc returns the program counter (PC) of its caller's caller,
	// that is, the address of the instruction that follows the call to newproc.
	pc := getcallerpc()
	// Create the goroutine object on the g0 system stack.
	// The parameters passed are the fn entry, the start address of the
	// arguments (argp), their length (siz), the calling g (gp), and the caller's PC.
	systemstack(func() {
		// Prototype:
		// func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintptr)
		// callerpc is the address of the go statement that created the new g.
		// The new g is put on the queue of g's waiting to run.
		newproc1(fn, (*uint8)(argp), siz, gp, pc)
	})
}
Each M in Go has two kinds of stack: the system stack (the stack of g0), which is used to run the runtime's own logic, and the G stack, which is used to run a goroutine's code. More on that later. For now it is enough to know that newproc switches to the system stack with systemstack and runs newproc1 there, because we are about to create a new G to run fn.
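argp is the address where the first argument of the go statement starts. narg is the size of the arguments in bytes. callergp is the goroutine that executed the go statement (it is obtained with getg() in newproc before the switch to the system stack, so it is not g0). callerpc is the return address of the newproc call, that is, the address of the instruction that follows it: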
func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintptr) {
	_g_ := getg()

	if fn == nil {
		_g_.m.throwing = -1 // do not dump full stacks
		throw("go of nil func value")
	}
	_g_.m.locks++ // disable preemption because it can be holding p in a local var
	siz := narg
	siz = (siz + 7) &^ 7

	// We could allocate a larger initial stack if necessary.
	// Not worth it: this is almost always an error.
	// 4*sizeof(uintreg): extra space added below
	// sizeof(uintreg): caller's LR (arm) or return address (x86, in gostartcall).
	if siz >= _StackMin-4*sys.RegSize-sys.RegSize {
		throw("newproc: function arguments too large for new goroutine")
	}

	_p_ := _g_.m.p.ptr() // the current P
	newg := gfget(_p_)   // try to reuse a free g cached on P
	// Initialization phase: if there is no free g, create one.
	if newg == nil {
		newg = malg(_StackMin)           // create a g with a stack of _StackMin bytes
		casgstatus(newg, _Gidle, _Gdead) // move the newly created g from _Gidle to _Gdead
		allgadd(newg)                    // add the g, now in _Gdead state, to allgs
	}
	if newg.stack.hi == 0 {
		throw("newproc1: newg missing stack")
	}

	if readgstatus(newg) != _Gdead {
		throw("newproc1: new g is not Gdead")
	}

	totalSize := 4*sys.RegSize + uintptr(siz) + sys.MinFrameSize // extra space in case of reads slightly beyond frame
	totalSize += -totalSize & (sys.SpAlign - 1)                  // align to spAlign
	sp := newg.stack.hi - totalSize
	spArg := sp
	if usesLR {
		// caller's LR
		*(*uintptr)(unsafe.Pointer(sp)) = 0
		prepGoExitFrame(sp)
		spArg += sys.MinFrameSize
	}
	if narg > 0 {
		memmove(unsafe.Pointer(spArg), unsafe.Pointer(argp), uintptr(narg))
		// This is a stack-to-stack copy. If write barriers
		// are enabled and the source stack is grey (the
		// destination is always black), then perform a
		// barrier copy. We do this *after* the memmove
		// because the destination stack may have garbage on
		// it.
		if writeBarrier.needed && !_g_.m.curg.gcscandone {
			f := findfunc(fn.fn)
			stkmap := (*stackmap)(funcdata(f, _FUNCDATA_ArgsPointerMaps))
			if stkmap.nbit > 0 {
				// We're in the prologue, so it's always stack map index 0.
				bv := stackmapdata(stkmap, 0)
				bulkBarrierBitmap(spArg, spArg, uintptr(bv.n)*sys.PtrSize, 0, bv.bytedata)
			}
		}
	}

	memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
	newg.sched.sp = sp
	newg.stktopsp = sp
	newg.sched.pc = funcPC(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
	newg.sched.g = guintptr(unsafe.Pointer(newg))
	gostartcallfn(&newg.sched, fn)
	newg.gopc = callerpc
	newg.ancestors = saveAncestors(callergp)
	newg.startpc = fn.fn
	if _g_.m.curg != nil {
		newg.labels = _g_.m.curg.labels
	}
	if isSystemGoroutine(newg, false) {
		atomic.Xadd(&sched.ngsys, +1)
	}
	newg.gcscanvalid = false
	casgstatus(newg, _Gdead, _Grunnable)

	if _p_.goidcache == _p_.goidcacheend {
		// Sched.goidgen is the last allocated id,
		// this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
		// At startup sched.goidgen=0, so main goroutine receives goid=1.
		_p_.goidcache = atomic.Xadd64(&sched.goidgen, _GoidCacheBatch)
		_p_.goidcache -= _GoidCacheBatch - 1
		_p_.goidcacheend = _p_.goidcache + _GoidCacheBatch
	}
	newg.goid = int64(_p_.goidcache)
	_p_.goidcache++
	if raceenabled {
		newg.racectx = racegostart(callerpc)
	}
	if trace.enabled {
		traceGoCreate(newg, newg.startpc)
	}
	// Put the newly created g on the local queue of P, or on the global
	// queue if the local queue is full. true means put it in the runnext
	// slot so that it is the next g to run.
	runqput(_p_, newg, true)

	if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 && mainStarted {
		wakep()
	}
	_g_.m.locks--
	if _g_.m.locks == 0 && _g_.preempt { // restore the preemption request in case we've cleared it in newstack
		_g_.stackguard0 = stackPreempt
	}
}
That is, at the very beginning there is no free G available on p (gfget returns nil), so a new G with a small stack of _StackMin bytes is created by malg:
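The two rounding expressions in newproc1 are easy to misread, so here is a minimal sketch of what they compute. The helper names roundUp8 and alignUp are made up for this example, and the values 41 and 16 are arbitrary; only the bit tricks themselves come from the runtime code above. In the compiled main, `MOVL $16, (SP)` passes siz = 16, the size of the string header for "hello world", which is already a multiple of 8.

```go
package main

import "fmt"

// roundUp8 mirrors `siz = (siz + 7) &^ 7`: round n up to a multiple of 8.
func roundUp8(n int32) int32 {
	return (n + 7) &^ 7
}

// alignUp mirrors `totalSize += -totalSize & (sys.SpAlign - 1)`:
// round n up to a multiple of align, which must be a power of two.
func alignUp(n, align uintptr) uintptr {
	return n + (-n & (align - 1))
}

func main() {
	for _, n := range []int32{0, 1, 8, 13, 16} {
		fmt.Printf("roundUp8(%d) = %d\n", n, roundUp8(n))
	}
	// 41 is an arbitrary size and 16 an assumed alignment, for illustration only.
	fmt.Println("alignUp(41, 16) =", alignUp(41, 16))
}
```

Both helpers rely on the alignment being a power of two, which is why a mask can replace a division.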
// Allocate a new g, with a stack big enough for stacksize bytes.
func malg(stacksize int32) *g {
	newg := new(g)
	if stacksize >= 0 {
		stacksize = round2(_StackSystem + stacksize)
		systemstack(func() {
			newg.stack = stackalloc(uint32(stacksize))
		})
		newg.stackguard0 = newg.stack.lo + _StackGuard
		newg.stackguard1 = ^uintptr(0)
	}
	return newg
}
If stacksize is non-negative, a stack is allocated and stored in the stack field of g. The g structure is defined as follows:
type g struct {
	// Stack parameters.
	// stack describes the actual stack memory: [stack.lo, stack.hi).
	// stackguard0 is the stack pointer compared in the Go stack growth prologue.
	// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
	// stackguard1 is the stack pointer compared in the C stack growth prologue.
	// It is stack.lo+StackGuard on g0 and gsignal stacks.
	// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
	stack       stack   // offset known to runtime/cgo
	stackguard0 uintptr // offset known to liblink
	stackguard1 uintptr // offset known to liblink

	_panic         *_panic // innermost panic - offset known to liblink
	_defer         *_defer // innermost defer
	m              *m      // current m; offset known to arm liblink
	sched          gobuf
	syscallsp      uintptr        // if status==Gsyscall, syscallsp = sched.sp to use during gc
	syscallpc      uintptr        // if status==Gsyscall, syscallpc = sched.pc to use during gc
	stktopsp       uintptr        // expected sp at top of stack, to check in traceback
	param          unsafe.Pointer // passed parameter on wakeup
	atomicstatus   uint32
	stackLock      uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
	goid           int64
	schedlink      guintptr
	waitsince      int64      // approx time when the g become blocked
	waitreason     waitReason // if status==Gwaiting
	preempt        bool       // preemption signal, duplicates stackguard0 = stackpreempt
	paniconfault   bool       // panic (instead of crash) on unexpected fault address
	preemptscan    bool       // preempted g does scan for gc
	gcscandone     bool       // g has scanned stack; protected by _Gscan bit in status
	gcscanvalid    bool       // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
	throwsplit     bool       // must not split stack
	raceignore     int8       // ignore race detection events
	sysblocktraced bool       // StartTrace has emitted EvGoInSyscall about this goroutine
	sysexitticks   int64      // cputicks when syscall has returned (for tracing)
	traceseq       uint64     // trace event sequencer
	tracelastp     puintptr   // last P emitted an event for this goroutine
	lockedm        muintptr
	sig            uint32
	writebuf       []byte
	sigcode0       uintptr
	sigcode1       uintptr
	sigpc          uintptr
	gopc           uintptr         // pc of go statement that created this goroutine
	ancestors      *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
	startpc        uintptr         // pc of goroutine function
	racectx        uintptr
	waiting        *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
	cgoCtxt        []uintptr      // cgo traceback context
	labels         unsafe.Pointer // profiler labels
	timer          *timer         // cached timer for time.Sleep
	selectDone     uint32         // are we participating in a select and did someone win the race?

	// Per-G GC state

	// gcAssistBytes is this G's GC assist credit in terms of
	// bytes allocated. If this is positive, then the G has credit
	// to allocate gcAssistBytes bytes without assisting. If this
	// is negative, then the G must correct this by performing
	// scan work. We track this in bytes to make it fast to update
	// and check for debt in the malloc hot path. The assist ratio
	// determines how this corresponds to scan work debt.
	gcAssistBytes int64
}
malg creates a new g and allocates a stack of round2(_StackSystem + stacksize) bytes for it, then sets the two stack guards:
newg.stackguard0 = newg.stack.lo + _StackGuard
newg.stackguard1 = ^uintptr(0)
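To make the stack size concrete, here is a minimal sketch of the rounding that malg applies. The loop follows the idea of the runtime's round2 (round up to a power of two). The constants are assumptions for this example and are not shown in the article: 2048 for _StackMin and 0 for _StackSystem, which is what I would expect on most non-Windows, non-Plan 9 targets in Go versions of this era.

```go
package main

import "fmt"

// Assumed values, not taken from the article.
const (
	stackMin    = 2048 // stands in for _StackMin
	stackSystem = 0    // stands in for _StackSystem
)

// round2 returns the smallest power of two that is >= x.
func round2(x int32) int32 {
	s := uint(0)
	for int32(1)<<s < x {
		s++
	}
	return int32(1) << s
}

func main() {
	// Size of the stack that malg(_StackMin) would request under the assumptions above.
	fmt.Println(round2(stackSystem + stackMin))
	// A size that is not already a power of two gets rounded up.
	fmt.Println(round2(3000))
}
```

Keeping stack sizes at powers of two lets the stack allocator serve them from a few fixed size classes.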
The status of the newly created g is then changed from _Gidle to _Gdead, and the g, now in the _Gdead state, is added to the allgs slice:
var (
	allgs    []*g
	allglock mutex
)

func allgadd(gp *g) {
	if readgstatus(gp) == _Gidle {
		throw("allgadd: bad status Gidle")
	}

	lock(&allglock)
	allgs = append(allgs, gp)
	allglen = uintptr(len(allgs))
	unlock(&allglock)
}
Next, the sched field of the new g is initialized. Its type is the gobuf structure:
type gobuf struct {
	// The offsets of sp, pc, and g are known to (hard-coded in) libmach.
	//
	// ctxt is unusual with respect to GC: it may be a
	// heap-allocated funcval, so GC needs to track it, but it
	// needs to be set and cleared from assembly, where it's
	// difficult to have write barriers. However, ctxt is really a
	// saved, live register, and we only ever exchange it between
	// the real register and the gobuf. Hence, we treat it as a
	// root during stack scanning, which means assembly that saves
	// and restores it doesn't need write barriers. It's still
	// typed as a pointer so that any other writes from Go get
	// write barriers.
	sp   uintptr
	pc   uintptr
	g    guintptr
	ctxt unsafe.Pointer
	ret  sys.Uintreg
	lr   uintptr
	bp   uintptr // for GOEXPERIMENT=framepointer
}
Judging from its fields, we can guess that gobuf saves the stack pointer, the program counter, and a few other register values of a goroutine, in other words the execution context needed to resume it later. Concretely:
memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
newg.sched.sp = sp
newg.stktopsp = sp
newg.sched.pc = funcPC(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
newg.sched.g = guintptr(unsafe.Pointer(newg))
gostartcallfn(&newg.sched, fn)
funcPC returns the entry PC of the function f. sys.PCQuantum is added so that the previous instruction is in the same function. Before gostartcallfn is called, the stack looks like this:
              +--------+
              |        | ---  ---      newg.stack.hi
              +--------+  |    |
              |        |  |    |
              +--------+  |    |
              |        |  |    | siz
              +--------+  |    |
              |        |  |    |
              +--------+  |    |
              |        |  |   ---
              +--------+  |
              |        |  |
              +--------+  | totalSize = 4*sys.PtrSize + siz
              |        |  |
              +--------+  |
              |        |  |
              +--------+  |
              |        | ---
              +--------+   high address
   SP         |        |                         imaginary caller stack frame
              +--------+ ---------------------------------------------
              |        |                         fn stack frame
              +--------+
              |        |   low address
                 ....

              +--------+
   PC         | goexit |
              +--------+
gostartcallfn is then executed. It and gostartcall are defined as follows:
func gostartcallfn(gobuf *gobuf, fv *funcval) {
	var fn unsafe.Pointer
	if fv != nil {
		fn = unsafe.Pointer(fv.fn)
	} else {
		fn = unsafe.Pointer(funcPC(nilfunc))
	}
	gostartcall(gobuf, fn, unsafe.Pointer(fv))
}

func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
	if buf.lr != 0 {
		throw("invalid use of gostartcall")
	}
	buf.lr = buf.pc
	buf.pc = uintptr(fn)
	buf.ctxt = ctxt
}
The saved stack looks like this:
              +--------+
              |        | ---  ---      newg.stack.hi
              +--------+  |    |
              |        |  |    |
              +--------+  |    |
              |        |  |    | siz
              +--------+  |    |
              |        |  |    |
              +--------+  |    |
              |        |  |   ---
              +--------+  |
              |        |  |
              +--------+  | totalSize = 4*sys.PtrSize + siz
              |        |  |
              +--------+  |
              |        |  |
              +--------+  |
              |        | ---
              +--------+   high address
              | goexit |                         imaginary caller stack frame
              +--------+ ---------------------------------------------
   SP         |        |                         fn stack frame
              +--------+
              |        |   low address
                 ....

              +--------+
   PC         |   fn   |
              +--------+
As you can see, the saved context now records the address of goexit as the return address of fn. On x86 it sits in the stack slot at sched.sp, as in the diagram; in the gostartcall variant shown above, which is used on architectures with a link register, it is kept in the lr field. When the scheduler later restores this context, it jumps (JMP) to the saved pc and starts executing fn. When fn finishes and returns, the address of its (imaginary) caller goexit is loaded into the PC, so goexit runs. The cleanup done by goexit includes resetting the state of the G, unbinding M and G, and putting the G on the gFree list, where it waits to be reused by a later go statement.
Then the status of the new g is changed from _Gdead to _Grunnable:
casgstatus(newg, _Gdead, _Grunnable)
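The status changes seen so far (_Gidle to _Gdead to _Grunnable) are all compare-and-swap operations. Here is a minimal sketch of that idea using sync/atomic. The fakeG type, the casStatus helper and the numeric values of the constants are invented for this example; the runtime's casgstatus does more work, such as handling the scan bit and throwing on invalid transitions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative status values; the names follow the article, the numbers
// are chosen for this sketch only.
const (
	gIdle uint32 = iota
	gDead
	gRunnable
	gRunning
)

type fakeG struct {
	status uint32
}

// casStatus atomically moves gp from old to next and reports whether the
// transition happened. It fails if gp was not in the old state.
func casStatus(gp *fakeG, old, next uint32) bool {
	return atomic.CompareAndSwapUint32(&gp.status, old, next)
}

func main() {
	gp := &fakeG{} // starts as gIdle
	fmt.Println("idle -> dead:", casStatus(gp, gIdle, gDead))
	fmt.Println("idle -> runnable (stale old state):", casStatus(gp, gIdle, gRunnable))
	fmt.Println("dead -> runnable:", casStatus(gp, gDead, gRunnable))
	fmt.Println("runnable -> running:", casStatus(gp, gRunnable, gRunning))
}
```

Naming the expected old state in every transition is what lets the runtime detect a g that is not in the state the caller believes it is in.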
The runnable g is put into the local runnable queue,
runqput(_p_, newg, true)
The body of the function looks like this:
// runqput tries to put g on the local runnable queue.
// If next is false, runqput adds g to the tail of the runnable queue.
// If next is true, runqput puts g in the _p_.runnext slot.
// If the run queue is full, runqput puts g on the global queue.
// Executed only by the owner P.
func runqput(_p_ *p, gp *g, next bool) {
	if randomizeScheduler && next && fastrand()%2 == 0 {
		next = false
	}

	if next {
	retryNext:
		oldnext := _p_.runnext
		if !_p_.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) {
			goto retryNext
		}
		if oldnext == 0 {
			return
		}
		// Kick the old runnext out to the regular run queue.
		gp = oldnext.ptr()
	}

retry:
	h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with consumers
	t := _p_.runqtail
	if t-h < uint32(len(_p_.runq)) {
		_p_.runq[t%uint32(len(_p_.runq))].set(gp)
		atomic.StoreRel(&_p_.runqtail, t+1) // store-release, makes the item available for consumption
		return
	}
	if runqputslow(_p_, gp, h, t) {
		return
	}
	// the queue is not full, now the put above must succeed
	goto retry
}
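The shape of runqput (a one-element runnext slot in front of a fixed-size ring, with overflow to the global queue) can be sketched in a few lines. This is a simplified, single-threaded model: fakeG, localQueue and the 4-slot ring are assumptions made for the example, the atomics are left out, and the overflow branch only imitates what runqputslow is described as doing (moving half of the local queue plus the new g to the global queue).

```go
package main

import "fmt"

type fakeG struct{ id int }

// localQueue is a single-threaded sketch of a P's run queue.
type localQueue struct {
	runnext    *fakeG
	runq       [4]*fakeG // the runtime uses 256 slots
	head, tail uint32
}

// put follows the shape of runqput; global stands in for the global queue.
func (q *localQueue) put(gp *fakeG, next bool, global *[]*fakeG) {
	if next {
		old := q.runnext
		q.runnext = gp
		if old == nil {
			return
		}
		gp = old // kick the old runnext out to the regular queue
	}
	n := uint32(len(q.runq))
	if q.tail-q.head < n {
		q.runq[q.tail%n] = gp
		q.tail++
		return
	}
	// Queue full: move half of it, plus gp, to the global queue.
	for i := uint32(0); i < n/2; i++ {
		*global = append(*global, q.runq[q.head%n])
		q.head++
	}
	*global = append(*global, gp)
}

// get returns runnext first, then the head of the ring, or nil if empty.
func (q *localQueue) get() *fakeG {
	if gp := q.runnext; gp != nil {
		q.runnext = nil
		return gp
	}
	if q.head == q.tail {
		return nil
	}
	gp := q.runq[q.head%uint32(len(q.runq))]
	q.head++
	return gp
}

func main() {
	var q localQueue
	var global []*fakeG
	for i := 1; i <= 7; i++ {
		q.put(&fakeG{id: i}, true, &global)
	}
	fmt.Println("runnext:", q.runnext.id, "local:", q.tail-q.head, "global:", len(global))
}
```

Because head and tail only ever grow and are reduced modulo the ring size on access, the difference tail-head is always the number of queued items, even after the counters wrap.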
We now have a general idea of what a g is. When a g is created and put on a local queue, a structure called p is involved. What is it? Here is its definition:
type p struct {
	lock mutex

	id          int32
	status      uint32 // one of pidle/prunning/...
	link        puintptr
	schedtick   uint32     // incremented on every scheduler call
	syscalltick uint32     // incremented on every system call
	sysmontick  sysmontick // last tick observed by sysmon
	m           muintptr   // back-link to associated m (nil if idle)
	mcache      *mcache
	racectx     uintptr

	deferpool    [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
	deferpoolbuf [5][32]*_defer

	// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
	goidcache    uint64
	goidcacheend uint64

	// Queue of runnable goroutines. Accessed without lock.
	runqhead uint32
	runqtail uint32
	runq     [256]guintptr
	// runnext, if non-nil, is a runnable G that was ready'd by
	// the current G and should be run next instead of what's in
	// runq if there's time remaining in the running G's time
	// slice. It will inherit the time left in the current time
	// slice. If a set of goroutines is locked in a
	// communicate-and-wait pattern, this schedules that set as a
	// unit and eliminates the (potentially large) scheduling
	// latency that otherwise arises from adding the ready'd
	// goroutines to the end of the run queue.
	runnext guintptr

	// Available G's (status == Gdead)
	gFree struct {
		gList
		n int32
	}

	sudogcache []*sudog
	sudogbuf   [128]*sudog

	tracebuf traceBufPtr

	// traceSweep indicates the sweep events should be traced.
	// This is used to defer the sweep start event until a span
	// has actually been swept.
	traceSweep bool
	// traceSwept and traceReclaimed track the number of bytes
	// swept and reclaimed by sweeping in the current sweep loop.
	traceSwept, traceReclaimed uintptr

	palloc persistentAlloc // per-P to avoid mutex

	// Per-P GC state
	gcAssistTime         int64 // Nanoseconds in assistAlloc
	gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker
	gcBgMarkWorker       guintptr
	gcMarkWorkerMode     gcMarkWorkerMode

	// gcMarkWorkerStartTime is the nanotime() at which this mark
	// worker started.
	gcMarkWorkerStartTime int64

	// gcw is this P's GC work buffer cache. The work buffer is
	// filled by write barriers, drained by mutator assists, and
	// disposed on certain GC state transitions.
	gcw gcWork

	// wbBuf is this P's GC write barrier buffer.
	//
	// TODO: Consider caching this in the running G.
	wbBuf wbBuf

	runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point

	pad cpu.CacheLinePad
}
In newproc1, the p structure is obtained through the m field of the current g (_g_.m.p.ptr()). That field is a pointer to an m structure, which is defined as follows:
type m struct {
	g0      *g     // goroutine with scheduling stack
	morebuf gobuf  // gobuf arg to morestack
	divmod  uint32 // div/mod denominator for arm - known to liblink

	// Fields not known to debuggers.
	procid        uint64       // for debuggers, but offset not hard-coded
	gsignal       *g           // signal-handling g
	goSigStack    gsignalStack // Go-allocated signal handling stack
	sigmask       sigset       // storage for saved signal mask
	tls           [6]uintptr   // thread-local storage
	mstartfn      func()
	curg          *g       // current running g
	caughtsig     guintptr // goroutine running during fatal signal
	p             puintptr // p held when executing Go code (nil if not executing Go code)
	nextp         puintptr
	oldp          puintptr // the p that was attached before executing a syscall
	id            int64
	mallocing     int32
	throwing      int32
	preemptoff    string // if != "", keep curg running on this m
	locks         int32
	dying         int32
	profilehz     int32
	spinning      bool // m is out of work and is actively looking for work
	blocked       bool // m is blocked on a note
	inwb          bool // m is executing a write barrier
	newSigstack   bool // minit on C thread called sigaltstack
	printlock     int8
	incgo         bool   // m is executing a cgo call
	freeWait      uint32 // if == 0, safe to free g0 and delete m (atomic)
	fastrand      [2]uint32
	needextram    bool
	traceback     uint8
	ncgocall      uint64      // number of cgo calls in total
	ncgo          int32       // number of cgo calls currently in progress
	cgoCallersUse uint32      // if non-zero, cgoCallers in use temporarily
	cgoCallers    *cgoCallers // cgo traceback if crashing in cgo call
	park          note
	alllink       *m // on allm
	schedlink     muintptr
	mcache        *mcache
	lockedg       guintptr
	createstack   [32]uintptr    // stack that created this thread.
	lockedExt     uint32         // tracking for external LockOSThread
	lockedInt     uint32         // tracking for internal lockOSThread
	nextwaitm     muintptr       // next m waiting for lock
	waitunlockf   unsafe.Pointer // todo go func(*g, unsafe.pointer) bool
	waitlock      unsafe.Pointer
	waittraceev   byte
	waittraceskip int
	startingtrace bool
	syscalltick   uint32
	thread        uintptr // thread handle
	freelink      *m      // on sched.freem

	// these are here because they are too large to be on the stack
	// of low-level NOSPLIT functions.
	libcall   libcall
	libcallpc uintptr // for cpu profiler
	libcallsp uintptr
	libcallg  guintptr
	syscall   libcall // stores syscall parameters on windows

	vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
	vdsoPC uintptr // PC for traceback while in VDSO call

	mOS
}
Looking at these structures alone does not tell us much. What are they for? When newproc creates a G, it puts it on the local runnable queue of a P. To understand what G, M, and P really are, we have to start from how they come into being, so let's find the entry point of the program with GDB:
➜  goroutinetest gdb main
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin16.7.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main...
(No debugging symbols found in main)
Loading Go Runtime support.
(gdb) info files
Symbols from "goroutinetest/main".
Local exec file:
	`goroutinetest/main', file type mach-o-x86-64.
	Entry point: 0x1052770
	0x0000000001001000 - 0x0000000001093194 is .text
	0x00000000010931a0 - 0x00000000010e1ace is __TEXT.__rodata
	0x00000000010e1ae0 - 0x00000000010e1be2 is __TEXT.__symbol_stub1
	0x00000000010e1c00 - 0x00000000010e2864 is __TEXT.__typelink
	0x00000000010e2868 - 0x00000000010e28d0 is __TEXT.__itablink
	0x00000000010e28d0 - 0x00000000010e28d0 is __TEXT.__gosymtab
	0x00000000010e28e0 - 0x000000000115c108 is __TEXT.__gopclntab
	0x000000000115d000 - 0x000000000115d158 is __DATA.__nl_symbol_ptr
	0x000000000115d160 - 0x0000000001169c9c is __DATA.__noptrdata
	0x0000000001169ca0 - 0x0000000001170610 is .data
	0x0000000001170620 - 0x000000000118be50 is .bss
	0x000000000118be60 - 0x000000000118e418 is __DATA.__noptrbss
(gdb) b *0x1052770
Breakpoint 1 at 0x1052770
(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000001052770 <_rt0_amd64_darwin>
(gdb)
What is _rt0_amd64_darwin?
#include "textflag.h"

TEXT _rt0_amd64_darwin(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)

// When linking with -shared, this symbol is called when the shared library
// is loaded.
TEXT _rt0_amd64_darwin_lib(SB),NOSPLIT,$0
	JMP	_rt0_amd64_lib(SB)
_rt0_amd64 is the common startup code for most amd64 systems when internal linking is used. It is the entry point that the kernel jumps to for an ordinary -buildmode=exe program. At that point the stack holds the number of arguments and the C-style argv.
TEXT _rt0_amd64(SB),NOSPLIT,$-8
	MOVQ	0(SP), DI	// argc
	LEAQ	8(SP), SI	// argv
	JMP	runtime·rt0_go(SB)
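Finally, runtime·rt0_go is called. Note that the listing below uses arm64 registers and instructions (R0, RSP, MOVD, BL), so it is the arm64 variant of the function; the amd64 variant performs the same steps: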
TEXT runtime·rt0_go(SB),NOSPLIT,$0
	// SP = stack; R0 = argc; R1 = argv

	SUB	$32, RSP
	MOVW	R0, 8(RSP)	// argc
	MOVD	R1, 16(RSP)	// argv

	// create istack out of the given (operating system) stack.
	// _cgo_init may update stackguard.
	MOVD	$runtime·g0(SB), g
	MOVD	RSP, R7
	MOVD	$(-64*1024)(R7), R0
	MOVD	R0, g_stackguard0(g)
	MOVD	R0, g_stackguard1(g)
	MOVD	R0, (g_stack+stack_lo)(g)
	MOVD	R7, (g_stack+stack_hi)(g)

	// if there is a _cgo_init, call it using the gcc ABI.
	MOVD	_cgo_init(SB), R12
	CMP	$0, R12
	BEQ	nocgo

	MRS_TPIDR_R0			// load TLS base pointer
	MOVD	R0, R3			// arg 3: TLS base pointer
#ifdef TLSG_IS_VARIABLE
	MOVD	$runtime·tls_g(SB), R2	// arg 2: &tls_g
#else
	MOVD	$0, R2			// arg 2: not used when using platform's TLS
#endif
	MOVD	$setg_gcc<>(SB), R1	// arg 1: setg
	MOVD	g, R0			// arg 0: G
	SUB	$16, RSP		// reserve 16 bytes for sp-8 where fp may be saved.
	BL	(R12)
	ADD	$16, RSP

nocgo:
	BL	runtime·save_g(SB)
	// update stackguard after _cgo_init
	MOVD	(g_stack+stack_lo)(g), R0
	ADD	$const__StackGuard, R0
	MOVD	R0, g_stackguard0(g)
	MOVD	R0, g_stackguard1(g)

	// set the per-goroutine and per-mach "registers"
	MOVD	$runtime·m0(SB), R0

	// save m->g0 = g0
	MOVD	g, m_g0(R0)
	// save m0 to g0->m
	MOVD	R0, g_m(g)

	BL	runtime·check(SB)

	MOVW	8(RSP), R0	// copy argc
	MOVW	R0, -8(RSP)
	MOVD	16(RSP), R0	// copy argv
	MOVD	R0, 0(RSP)
	BL	runtime·args(SB)
	BL	runtime·osinit(SB)
	BL	runtime·schedinit(SB)

	// create a new goroutine to start program
	MOVD	$runtime·mainPC(SB), R0	// entry
	MOVD	RSP, R7
	MOVD.W	$0, -8(R7)
	MOVD.W	R0, -8(R7)
	MOVD.W	$0, -8(R7)
	MOVD.W	$0, -8(R7)
	MOVD	R7, RSP
	BL	runtime·newproc(SB)
	ADD	$32, RSP

	// start this M
	BL	runtime·mstart(SB)

	MOVD	$0, R0
	MOVD	R0, (R0)	// boom
	UNDEF
g0 and m0 are initialized first, together with the checks and setup for thread-local storage. Then the scheduler is initialized, a new goroutine is created to run runtime·main, and finally our M is started.
// The bootstrap sequence is:
//
//	call osinit
//	call schedinit
//	make & queue new G
//	call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
	// raceinit must be the first call to race detector.
	// In particular, it must be done before mallocinit below calls racemapshadow.
	_g_ := getg()
	if raceenabled {
		_g_.racectx, raceprocctx0 = raceinit()
	}

	// At most 10000 operating system threads may be started.
	sched.maxmcount = 10000

	tracebackinit()
	moduledataverify()
	stackinit()
	mallocinit()
	mcommoninit(_g_.m) // initialize m0; g0->m = &m0 was set in rt0_go
	cpuinit()          // must run before alginit
	alginit()          // maps must not be used before this call
	modulesinit()      // provides activeModules
	typelinksinit()    // uses maps, activeModules
	itabsinit()        // uses activeModules

	msigsave(_g_.m)
	initSigmask = _g_.m.sigmask

	goargs()
	goenvs()
	parsedebugvars()
	gcinit()

	sched.lastpoll = uint64(nanotime())
	// By default, create as many P's as the system has cores.
	procs := ncpu
	if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
		// If GOMAXPROCS is set, create the specified number of P's.
		procs = n
	}
	// Create and initialize the global variable allp.
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}

	// For cgocheck > 1, we turn on the write barrier at all times
	// and check all pointer writes. We can't do this until after
	// procresize because the write barrier needs a P.
	if debug.cgocheck > 1 {
		writeBarrier.cgo = true
		writeBarrier.enabled = true
		for _, p := range allp {
			p.wbBuf.reset()
		}
	}

	if buildVersion == "" {
		// Condition should never trigger. This code just serves
		// to ensure runtime·buildVersion is kept in the resulting binary.
		buildVersion = "unknown"
	}
}
Let's look at how m0 is initialized:
func mcommoninit(mp *m) {
	_g_ := getg()

	// g0 stack won't make sense for user (and is not necessary unwindable).
	if _g_ != _g_.m.g0 {
		callers(1, mp.createstack[:])
	}

	lock(&sched.lock)
	if sched.mnext+1 < sched.mnext {
		throw("runtime: thread ID overflow")
	}
	// The mnext field of the schedt structure holds the next available thread ID.
	mp.id = sched.mnext
	sched.mnext++
	checkmcount()

	mp.fastrand[0] = 1597334677 * uint32(mp.id)
	mp.fastrand[1] = uint32(cputicks())
	if mp.fastrand[0]|mp.fastrand[1] == 0 {
		mp.fastrand[1] = 1
	}

	mpreinit(mp)
	if mp.gsignal != nil {
		mp.gsignal.stackguard1 = mp.gsignal.stack.lo + _StackGuard
	}

	// Add to allm so garbage collector doesn't free g->m
	// when it is just in a register or thread-local storage.
	mp.alllink = allm

	// NumCgoCall() iterates over allm w/o schedlock,
	// so we need to publish it safely.
	atomicstorep(unsafe.Pointer(&allm), unsafe.Pointer(mp))
	unlock(&sched.lock)

	// Allocate memory to hold a cgo traceback if the cgo call crashes.
	if iscgo || GOOS == "solaris" || GOOS == "windows" {
		mp.cgoCallers = new(cgoCallers)
	}
}
The last part of the scheduler initialization is the initialization of P, which schedinit performs through procresize(procs).
With all the initialization done, it’s time to start the run-time scheduler. We already know that the last boot call to be executed when all the preparations are complete is runtime.mstart.
func mstart() {
	_g_ := getg()

	// Determine whether the stack was allocated by the operating system.
	osStack := _g_.stack.lo == 0
	if osStack {
		// Initialize stack bounds from system stack.
		// Cgo may have left stack size in stack.hi.
		// minit may update the stack bounds.
		size := _g_.stack.hi
		if size == 0 {
			size = 8192 * sys.StackGuardMultiplier
		}
		_g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
		_g_.stack.lo = _g_.stack.hi - size + 1024
	}
	// Initialize stack guards so that we can start calling
	// both Go and C functions with stack growth prologues.
	_g_.stackguard0 = _g_.stack.lo + _StackGuard
	_g_.stackguard1 = _g_.stackguard0
	// Start the M.
	mstart1()

	// Exit this thread.
	if GOOS == "windows" || GOOS == "solaris" || GOOS == "plan9" || GOOS == "darwin" || GOOS == "aix" {
		// Window, Solaris, Darwin, AIX and Plan 9 always system-allocate
		// the stack, but put it in _g_.stack before mstart,
		// so the logic above hasn't set osStack yet.
		osStack = true
	}
	mexit(osStack)
}

func mstart1() {
	_g_ := getg()

	if _g_ != _g_.m.g0 {
		throw("bad runtime·mstart")
	}

	// Record the caller for use as the top of stack in mcall and
	// for terminating the thread.
	// We're never coming back to mstart1 after we call schedule,
	// so other calls can reuse the current frame.
	save(getcallerpc(), getcallersp())
	asminit()
	minit()

	// Install signal handlers; after minit so that minit can
	// prepare the thread to be able to handle the signals.
	if _g_.m == &m0 {
		mstartm0()
	}

	// If a start function was set for this M, run it.
	if fn := _g_.m.mstartfn; fn != nil {
		fn()
	}

	// If this M is not m0, bind a P to it.
	if _g_.m != &m0 {
		acquirep(_g_.m.nextp.ptr())
		_g_.m.nextp = 0
	}
	schedule()
}
At the end of mstart1, the schedule function is called. It performs one round of scheduling: it finds a runnable goroutine and executes it. It never returns.
func schedule() {
	_g_ := getg()

	if _g_.m.locks != 0 {
		throw("schedule: holding locks")
	}

	if _g_.m.lockedg != 0 {
		stoplockedm()
		execute(_g_.m.lockedg.ptr(), false) // Never returns.
	}

	// We should not schedule away from a g that is executing a cgo call,
	// since the cgo call is using the m's g0 stack.
	if _g_.m.incgo {
		throw("schedule: in cgo")
	}

top:
	if sched.gcwaiting != 0 {
		gcstopm()
		goto top
	}
	if _g_.m.p.ptr().runSafePointFn != 0 {
		runSafePointFn()
	}

	var gp *g
	var inheritTime bool
	if trace.enabled || trace.shutdown {
		gp = traceReader()
		if gp != nil {
			casgstatus(gp, _Gwaiting, _Grunnable)
			traceGoUnpark(gp, 0)
		}
	}
	if gp == nil && gcBlackenEnabled != 0 {
		gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
	}
	if gp == nil {
		// No GC work to do.
		// Every 61 scheduling rounds, check the global queue once to
		// ensure fairness; otherwise goroutines on the local queue
		// could starve the global queue.
		if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
			lock(&sched.lock)
			// Take one g from the global queue.
			gp = globrunqget(_g_.m.p.ptr(), 1)
			unlock(&sched.lock)
		}
	}
	if gp == nil {
		gp, inheritTime = runqget(_g_.m.p.ptr())
		if gp != nil && _g_.m.spinning {
			throw("schedule: spinning with local work")
		}
	}
	if gp == nil {
		gp, inheritTime = findrunnable() // blocks until work is available
	}

	// This thread is going to run a goroutine and is no longer spinning,
	// so if it was marked as spinning we need to reset that now and
	// possibly start a new spinning M.
	if _g_.m.spinning {
		resetspinning()
	}

	if sched.disable.user && !schedEnabled(gp) {
		// Scheduling of this goroutine is disabled. Put it on
		// the list of pending runnable goroutines for when we
		// re-enable user scheduling and look again.
		lock(&sched.lock)
		if schedEnabled(gp) {
			// Something re-enabled scheduling while we
			// were acquiring the lock.
			unlock(&sched.lock)
		} else {
			sched.disable.runnable.pushBack(gp)
			sched.disable.n++
			unlock(&sched.lock)
			goto top
		}
	}

	if gp.lockedm != 0 {
		// Hands off own p to the locked m,
		// then blocks waiting for a new p.
		startlockedm(gp)
		goto top
	}

	// If inheritTime is true, gp inherits the remaining time slice.
	execute(gp, inheritTime)
}
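The order in which schedule looks for work (global queue on every 61st tick, then the local queue, then everything else) can be modelled in a few lines. This is a hypothetical, single-threaded sketch: the picker type and its fields are invented for illustration, goroutines are represented by plain ints, and the fallback only covers the global queue, not the work stealing that findrunnable also performs. The tick starts at 1 here purely so that the first 60 picks come from the local queue.

```go
package main

import "fmt"

// picker is a hypothetical model of one P plus the global queue.
type picker struct {
	tick   uint32 // plays the role of p.schedtick
	local  []int  // goroutine ids on the local queue
	global []int  // goroutine ids on the global queue
}

// pop removes and returns the first id of q, if any.
func pop(q *[]int) (int, bool) {
	if len(*q) == 0 {
		return 0, false
	}
	id := (*q)[0]
	*q = (*q)[1:]
	return id, true
}

// next mirrors the order used in schedule: on every 61st tick look at the
// global queue first, otherwise prefer the local queue, then fall back to
// the global queue.
func (p *picker) next() (int, bool) {
	id, ok := 0, false
	if p.tick%61 == 0 {
		id, ok = pop(&p.global)
	}
	if !ok {
		id, ok = pop(&p.local)
	}
	if !ok {
		id, ok = pop(&p.global)
	}
	if ok {
		p.tick++
	}
	return id, ok
}

// order returns the sequence in which 100 local ids and one global id run.
func order() []int {
	p := &picker{tick: 1, global: []int{1000}}
	for i := 0; i < 100; i++ {
		p.local = append(p.local, i)
	}
	var out []int
	for {
		id, ok := p.next()
		if !ok {
			return out
		}
		out = append(out, id)
	}
}

func main() {
	o := order()
	fmt.Println("position of the global goroutine:", indexOf(o, 1000))
}

// indexOf returns the index of v in s, or -1 if it is absent.
func indexOf(s []int, v int) int {
	for i, x := range s {
		if x == v {
			return i
		}
	}
	return -1
}
```

Without the periodic check, the goroutine on the global queue would have to wait until the local queue was completely drained.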
If M is spinning, resetspinning is called:
func resetspinning() {
	_g_ := getg()
	if !_g_.m.spinning {
		throw("resetspinning: not a spinning m")
	}
	_g_.m.spinning = false
	nmspinning := atomic.Xadd(&sched.nmspinning, -1)
	if int32(nmspinning) < 0 {
		throw("findrunnable: negative nmspinning")
	}
	// M's wake-up policy is deliberately somewhat conservative, so check
	// here whether another P needs to be woken up.
	// See the "worker thread parking/unparking" comment at the top of the file for details.
	if nmspinning == 0 && atomic.Load(&sched.npidle) > 0 {
		wakep()
	}
}
wakep() tries to add one more P to execute Gs. It is called when a G becomes runnable (newproc, ready). It calls startm(nil, true). startm schedules some M to run the given P, creating an M if necessary. If p == nil, it tries to get an idle P, and if there is none it does nothing. It may run with m.p == nil, so write barriers are not allowed. If spinning is set, the caller has already incremented nmspinning, and startm will either decrement nmspinning or set m.spinning in the newly started M.
The last thing the core function schedule does is call execute:
// Schedules gp to run on the current M.
// If inheritTime is true, gp inherits the remaining time slice;
// otherwise it starts a new time slice.
// Never returns.
//
// Write barriers are allowed here.
//
//go:yeswritebarrierrec
func execute(gp *g, inheritTime bool) {
	_g_ := getg()

	casgstatus(gp, _Grunnable, _Grunning)
	gp.waitsince = 0
	gp.preempt = false
	gp.stackguard0 = gp.stack.lo + _StackGuard
	if !inheritTime {
		_g_.m.p.ptr().schedtick++
	}
	_g_.m.curg = gp
	gp.m = _g_.m

	// Check whether the profiler needs to be turned on or off.
	hz := sched.profilehz
	if _g_.m.profilehz != hz {
		setThreadCPUProfiler(hz)
	}

	if trace.enabled {
		// GoSysExit has to happen when we have a P, but before GoStart.
		// So we emit it here.
		if gp.syscallsp != 0 && gp.sysblocktraced {
			traceGoSysExit(gp.sysexitticks)
		}
		traceGoStart()
	}

	// Switch to gp's saved context.
	gogo(&gp.sched)
}
gogo on amd64:
// func gogo(buf *gobuf)
// restore state from Gobuf; longjmp
TEXT runtime·gogo(SB), NOSPLIT, $16-8
	MOVQ	buf+0(FP), BX		// gobuf
	MOVQ	gobuf_g(BX), DX
	MOVQ	0(DX), CX		// make sure g != nil
	get_tls(CX)
	MOVQ	DX, g(CX)
	MOVQ	gobuf_sp(BX), SP	// restore SP
	MOVQ	gobuf_ret(BX), AX
	MOVQ	gobuf_ctxt(BX), DX
	MOVQ	gobuf_bp(BX), BP
	MOVQ	$0, gobuf_sp(BX)	// clear to help garbage collector
	MOVQ	$0, gobuf_ret(BX)
	MOVQ	$0, gobuf_ctxt(BX)
	MOVQ	$0, gobuf_bp(BX)
	MOVQ	gobuf_pc(BX), BX
	JMP	BX
In the earlier explanation of how a g is created, we learned that the stack slot at sched.sp holds the address of goexit, as if goexit had called fn. So the JMP transfers control to the saved PC and starts executing fn. When fn returns, the address of its (imaginary) caller goexit is restored to the PC, and goexit runs. goexit in turn calls schedule again, and the whole process starts over. If a goroutine has locked itself to an OS thread and exits without unlocking, the M exits and is not put back into the thread pool. Instead, gogo is called again to switch to the g0 context, which is currently the only way for an M to exit.
In schedule, work is taken from the global runnable queue (globrunqget) and from the local queue (runqget), and findrunnable looks for work that can be stolen, as follows:
// Take a batch of Gs from the global runnable queue.
// sched.lock must be held.
func globrunqget(_p_ *p, max int32) *g {
	if sched.runqsize == 0 {
		return nil
	}

	n := sched.runqsize/gomaxprocs + 1
	if n > sched.runqsize {
		n = sched.runqsize
	}
	if max > 0 && n > max {
		n = max
	}
	if n > int32(len(_p_.runq))/2 {
		n = int32(len(_p_.runq)) / 2
	}

	sched.runqsize -= n

	gp := sched.runq.pop()
	n--
	for ; n > 0; n-- {
		gp1 := sched.runq.pop()
		runqput(_p_, gp1, false)
	}
	return gp
}
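The batch size that globrunqget computes is easier to follow with concrete numbers. The sketch below pulls that arithmetic into a standalone function; batchSize and its parameter names are hypothetical, and localCap stands for len(_p_.runq), which is 256 in the runtime.

```go
package main

import "fmt"

// batchSize mirrors the arithmetic in globrunqget: take a fair share of
// the global queue (size/procs + 1), never more than the queue holds,
// never more than max (when max > 0), and never more than half of the
// local queue's capacity.
func batchSize(runqsize, gomaxprocs, max, localCap int32) int32 {
	if runqsize == 0 {
		return 0
	}
	n := runqsize/gomaxprocs + 1
	if n > runqsize {
		n = runqsize
	}
	if max > 0 && n > max {
		n = max
	}
	if n > localCap/2 {
		n = localCap / 2
	}
	return n
}

func main() {
	fmt.Println(batchSize(10, 4, 0, 256))   // fair share: 10/4 + 1
	fmt.Println(batchSize(10, 4, 1, 256))   // schedule passes max = 1
	fmt.Println(batchSize(1000, 2, 0, 256)) // capped at half the local queue
}
```

With max = 1, as in the fairness check in schedule, only a single g is taken. With max = 0, as in findrunnable, the P takes its share of the global queue, runs the first g and puts the rest on its local queue.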
// Get a G from the local runnable queue.
// If inheritTime is true, the G inherits the remaining time slice;
// otherwise it starts a new time slice.
// Executed only by the owner P.
func runqget(_p_ *p) (gp *g, inheritTime bool) {
	// If there's a runnext, it's the next G to run.
	for {
		next := _p_.runnext
		if next == 0 {
			break
		}
		if _p_.runnext.cas(next, 0) {
			return next.ptr(), true
		}
	}

	for {
		h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with other consumers
		t := _p_.runqtail
		if t == h {
			return nil, false
		}
		gp := _p_.runq[h%uint32(len(_p_.runq))].ptr()
		if atomic.CasRel(&_p_.runqhead, h, h+1) { // cas-release, commits consume
			return gp, false
		}
	}
}
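The second loop in runqget is a ring buffer consumed through a compare-and-swap on the head index. The following is a simplified sketch of that pattern, not the runtime's implementation. The ring type and its put and get methods are hypothetical, the queue holds ints instead of g pointers, and it uses the public sync/atomic package, whose operations are stronger than the runtime-internal LoadAcq and CasRel.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const ringSize = 8

// ring is a fixed-size queue indexed by ever-increasing head and tail
// counters; a slot is addressed as counter % ringSize.
type ring struct {
	head uint32 // next slot to consume; advanced with CAS
	tail uint32 // next slot to fill; advanced only by the owner
	buf  [ringSize]int
}

// put appends v at the tail. It reports false when the ring is full.
func (r *ring) put(v int) bool {
	h := atomic.LoadUint32(&r.head)
	t := atomic.LoadUint32(&r.tail)
	if t-h >= ringSize {
		return false
	}
	r.buf[t%ringSize] = v
	atomic.StoreUint32(&r.tail, t+1) // publish the new element
	return true
}

// get removes the oldest element. The CAS on head commits the consume;
// if another consumer advanced head first, the loop retries.
func (r *ring) get() (int, bool) {
	for {
		h := atomic.LoadUint32(&r.head)
		t := atomic.LoadUint32(&r.tail)
		if t == h {
			return 0, false // empty
		}
		v := r.buf[h%ringSize]
		if atomic.CompareAndSwapUint32(&r.head, h, h+1) {
			return v, true
		}
	}
}

func main() {
	r := &ring{}
	for i := 1; i <= 3; i++ {
		r.put(i * 10)
	}
	for {
		v, ok := r.get()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```

The demo runs on a single goroutine. In the runtime, the owner P is the only producer, and the CAS on the head is what lets other Ps steal from the same queue without a lock.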
// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from global queue, poll network.
func findrunnable() (gp *g, inheritTime bool) {
_g_ := getg()
// The conditions here and in handoffp must agree: if
// findrunnable would return a G to run, handoffp must start
// an M.
top:
_p_ := _g_.m.p.ptr()
if sched.gcwaiting != 0 {
gcstopm()
goto top
}
if _p_.runSafePointFn != 0 {
runSafePointFn()
}
if fingwait && fingwake {
if gp := wakefing(); gp != nil {
ready(gp, 0, true)
}
}
if *cgo_yield != nil {
asmcgocall(*cgo_yield, nil)
}
// local runq
if gp, inheritTime := runqget(_p_); gp != nil {
return gp, inheritTime
}
// global runq
if sched.runqsize != 0 {
lock(&sched.lock)
gp := globrunqget(_p_, 0)
unlock(&sched.lock)
if gp != nil {
return gp, false
}
}
// Poll network.
// This netpoll is only an optimization before we resort to stealing.
// We can safely skip it if there are no waiters or a thread is blocked
// in netpoll already. If there is any kind of logical race with that
// blocked thread (e.g. it has already returned from netpoll, but does
// not set lastpoll yet), this thread will do blocking netpoll below
// anyway.
if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
if list := netpoll(false); !list.empty() { // non-blocking
gp := list.pop()
injectglist(&list)
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
}
// Steal work from other P's.
procs := uint32(gomaxprocs)
if atomic.Load(&sched.npidle) == procs-1 {
// Either GOMAXPROCS=1 or everybody, except for us, is idle already.
// New work can appear from returning syscall/cgocall, network or timers.
// Neither of that submits to local run queues, so no point in stealing.
goto stop
}
// If number of spinning M's >= number of busy P's, block.
// This is necessary to prevent excessive CPU consumption
// when GOMAXPROCS>>1 but the program parallelism is low.
if !_g_.m.spinning && 2*atomic.Load(&sched.nmspinning) >= procs-atomic.Load(&sched.npidle) {
goto stop
}
if !_g_.m.spinning {
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
}
for i := 0; i < 4; i++ {
for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
if sched.gcwaiting != 0 {
goto top
}
stealRunNextG := i > 2 // first look for ready queues with more than 1 g
if gp := runqsteal(_p_, allp[enum.position()], stealRunNextG); gp != nil {
return gp, false
}
}
}
stop:
// We have nothing to do. If we're in the GC mark phase, can
// safely scan and blacken objects, and have work to do, run
// idle-time marking rather than give up the P.
if gcBlackenEnabled != 0 && _p_.gcBgMarkWorker != 0 && gcMarkWorkAvailable(_p_) {
_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
gp := _p_.gcBgMarkWorker.ptr()
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
// wasm only:
// If a callback returned and no other goroutine is awake,
// then pause execution until a callback was triggered.
if beforeIdle() {
// At least one goroutine got woken.
goto top
}
// Before we drop our P, make a snapshot of the allp slice,
// which can change underfoot once we no longer block
// safe-points. We don't need to snapshot the contents because
// everything up to cap(allp) is immutable.
allpSnapshot := allp
// return P and block
lock(&sched.lock)
if sched.gcwaiting != 0 || _p_.runSafePointFn != 0 {
unlock(&sched.lock)
goto top
}
if sched.runqsize != 0 {
gp := globrunqget(_p_, 0)
unlock(&sched.lock)
return gp, false
}
if releasep() != _p_ {
throw("findrunnable: wrong p")
}
pidleput(_p_)
unlock(&sched.lock)
// Delicate dance: thread transitions from spinning to non-spinning state,
// potentially concurrently with submission of new goroutines. We must
// drop nmspinning first and then check all per-P queues again (with
// #StoreLoad memory barrier in between). If we do it the other way around,
// another thread can submit a goroutine after we've checked all run queues
// but before we drop nmspinning; as the result nobody will unpark a thread
// to run the goroutine.
// If we discover new work below, we need to restore m.spinning as a signal
// for resetspinning to unpark a new worker thread (because there can be more
// than one starving goroutine). However, if after discovering new work
// we also observe no idle Ps, it is OK to just park the current thread:
// the system is fully loaded so no spinning threads are required.
// Also see "Worker thread parking/unparking" comment at the top of the file.
wasSpinning := _g_.m.spinning
if _g_.m.spinning {
_g_.m.spinning = false
if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
throw("findrunnable: negative nmspinning")
}
}
// check all runqueues once again
for _, _p_ := range allpSnapshot {
if !runqempty(_p_) {
lock(&sched.lock)
_p_ = pidleget()
unlock(&sched.lock)
if _p_ != nil {
acquirep(_p_)
if wasSpinning {
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
}
goto top
}
break
}
}
// Check for idle-priority GC work again.
if gcBlackenEnabled != 0 && gcMarkWorkAvailable(nil) {
lock(&sched.lock)
_p_ = pidleget()
if _p_ != nil && _p_.gcBgMarkWorker == 0 {
pidleput(_p_)
_p_ = nil
}
unlock(&sched.lock)
if _p_ != nil {
acquirep(_p_)
if wasSpinning {
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
}
// Go back to idle GC check.
goto stop
}
}
// poll network
if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
if _g_.m.p != 0 {
throw("findrunnable: netpoll with p")
}
if _g_.m.spinning {
throw("findrunnable: netpoll with spinning")
}
list := netpoll(true) // block until new work is available
atomic.Store64(&sched.lastpoll, uint64(nanotime()))
if !list.empty() {
lock(&sched.lock)
_p_ = pidleget()
unlock(&sched.lock)
if _p_ != nil {
acquirep(_p_)
gp := list.pop()
injectglist(&list)
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
injectglist(&list)
}
}
stopm()
goto top
}
In findrunnable, the steps are:
- First check whether a GC is waiting to stop the world; if so, park the current M (gcstopm) and retry from the top once it is woken.
- Try to take a g from the local queue and return it if found; otherwise look in the global queue and return if found.
- Do a non-blocking poll of the network; if a g is ready, return it directly.
- If no g has been found at this point, steal from the local queues of other Ps. Up to four rounds are made: the first three steal only from the victims' run queues, and the last round may also steal a victim's runnext. If a g is found, return it.
- All options are now exhausted, so a few more checks are made before parking the M. First, if the GC mark phase is active and mark work is available, return the g of the P's background mark worker.
- If not, take a snapshot of allp and lock the scheduler.
- With the scheduler locked, check again whether a GC has started in the meantime. If so, go back to the first step.
- With the scheduler locked, if the global queue has a g, return it directly.
- If there is still no work, release the current P, put it on the idle list, and unlock the scheduler.
- Now that M and P are unbound, switch M from spinning to non-spinning and decrement nmspinning.
- Recheck all run queues. If some P's queue is not empty, immediately try to get an idle P. If one is obtained, go back to the first step and search again; if not, the system is fully loaded and there is no need to keep looking.
- Check again for idle-priority GC mark work. If there is any, acquire an idle P that has a mark worker and go back to the idle GC mark check.
- Similarly, do a blocking poll of the network if there are waiters. If it yields a g and an idle P can be acquired, return that g.
- Finally, if nothing was found, park the current M (stopm) and start over from the top when it is woken.
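The core search order described above (local queue, then global queue, then stealing from another P) can be sketched as a toy model. This is an illustration under simplifying assumptions, not the runtime's algorithm: toyP and findWork are hypothetical names, goroutines are strings, victims are visited in index order rather than the random order the runtime uses, and the GC, netpoll and spinning logic is left out. The stolen amount (half of the victim's queue, rounded up) is likewise an assumption of this sketch.

```go
package main

import "fmt"

// toyP is a hypothetical P that only has a local run queue.
type toyP struct{ runq []string }

// findWork looks for work for ps[self]: its own queue first, then the
// global queue, then half of another P's queue. The second result names
// where the work came from, or "idle" if nothing was found.
func findWork(self int, ps []*toyP, global *[]string) (string, string) {
	me := ps[self]
	if len(me.runq) > 0 {
		g := me.runq[0]
		me.runq = me.runq[1:]
		return g, "local"
	}
	if len(*global) > 0 {
		g := (*global)[0]
		*global = (*global)[1:]
		return g, "global"
	}
	for i, victim := range ps {
		if i == self || len(victim.runq) == 0 {
			continue
		}
		n := len(victim.runq) - len(victim.runq)/2 // half, rounded up
		stolen := victim.runq[:n]
		victim.runq = victim.runq[n:]
		me.runq = append(me.runq, stolen[1:]...) // keep the rest for later
		return stolen[0], "stolen"
	}
	return "", "idle"
}

func main() {
	ps := []*toyP{{}, {runq: []string{"a", "b", "c"}}}
	global := []string{"g"}
	for {
		g, from := findWork(0, ps, &global)
		if from == "idle" {
			break
		}
		fmt.Println(g, from)
	}
}
```

In this run P0 first drains the global queue, then steals "a" and "b" from P1 and runs "a", finds "b" in its own queue on the next call, and finally steals "c".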
Articles in this series:
- I probably wouldn’t use Golang slice
- I probably wouldn’t use Golang’s map
- I probably wouldn’t use Golang’s channel
- I probably wouldn’t use Golang’s Goroutines
If you have any questions, please leave a comment