Recently, while studying performance optimization, I came across the document HACKING.md in Go's runtime package and found it interesting. Reading it improved my understanding of the runtime, so I decided to translate it.

This chapter goes into some depth and assumes a reader with some background; it cannot cover every detail.

This document is intended for runtime developers, so much of its content concerns things that ordinary Go code never touches.

This document is edited frequently, and its current content may become outdated over time. It is intended to show how writing runtime code differs from writing normal Go code, so it focuses on general concepts rather than detailed implementations.

Scheduler structure

The scheduler manages three types of resources that pervade the runtime: Gs, Ms, and Ps. Even if you do not work on the scheduler itself, you should understand these concepts.

G, M and P

A G is a goroutine, represented in the runtime by the type g. When a goroutine exits, its g object is returned to a pool of free g objects and can later be reused for another goroutine.

An M is an OS thread that can be executing user Go code, runtime code, or a system call, or be idle. It is represented in the runtime by the type m. There can be any number of Ms at a time, since any number of Ms can be blocked in system calls. When an M executes a blocking system call, the M and its P are unbound and a new M is created to run the other Gs on that P.

Finally, a P represents the resources required to execute user Go code, such as scheduler state and memory allocator state. It is represented in the runtime by the type p. There are exactly GOMAXPROCS Ps. A P can be thought of as a CPU in the operating system scheduler, and the contents of the p type as per-CPU state. It is also a good place to keep state that needs to be sharded for efficiency but does not need to be per-thread (per M) or per-goroutine (per G).

The scheduler's job is to match up a G (the code to execute), an M (where to execute it), and a P (the rights and resources to execute it). When an M stops executing user Go code (for example, by entering a blocking system call), it returns its P to the idle P pool. To resume executing user Go code (for example, on return from the system call), it must acquire a P from the idle P pool.

All g, m, and p objects are allocated on the heap and are never freed, so their memory remains type stable. As a result, the runtime can avoid write barriers in the depths of the scheduler.
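
The hand-off of a P described above can be sketched as a toy model. This is only an illustration under stated assumptions: the types p, m, and sched and the functions newSched, acquire, and release are hypothetical names, not the runtime's real data structures. The only real API used is runtime.GOMAXPROCS(0), which reports the current setting without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

// p and m are toy stand-ins for the runtime's p and m types (hypothetical).
type p struct{ id int }

type m struct {
	id int
	p  *p // nil while the M holds no P, e.g. inside a blocking system call
}

// sched holds the idle P pool of this toy model.
type sched struct{ idle []*p }

// newSched creates a scheduler with nprocs idle Ps.
func newSched(nprocs int) *sched {
	s := &sched{}
	for i := 0; i < nprocs; i++ {
		s.idle = append(s.idle, &p{id: i})
	}
	return s
}

// acquire binds an idle P to mp and reports whether one was available.
func (s *sched) acquire(mp *m) bool {
	n := len(s.idle)
	if n == 0 {
		return false
	}
	mp.p = s.idle[n-1]
	s.idle = s.idle[:n-1]
	return true
}

// release returns mp's P to the idle pool, as an M does when it stops
// running user Go code.
func (s *sched) release(mp *m) {
	if mp.p == nil {
		return
	}
	s.idle = append(s.idle, mp.p)
	mp.p = nil
}

func main() {
	s := newSched(runtime.GOMAXPROCS(0)) // one P per GOMAXPROCS
	m0 := &m{id: 0}
	fmt.Println("m0 acquired a P:", s.acquire(m0))
	s.release(m0) // e.g. m0 enters a blocking system call
	fmt.Println("idle Ps after release:", len(s.idle))
}
```

With a single P, a second M cannot run user code until the first M releases its P, which is the situation the paragraph above describes.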

User stack and system stack

Each live (non-dead) G has an associated user stack, which is where user Go code executes. The user stack starts out small (say, 2K) and grows or shrinks dynamically.

Each M has an associated system stack (also known as the g0 stack, because it is implemented as a g). On Unix platforms, each M also has a signal stack (also known as the gsignal stack). The system and signal stacks cannot grow, but they are large enough to run any runtime and cgo code (8K in a pure Go binary; allocated by the system in a cgo binary).

Runtime code often switches temporarily to the system stack by calling systemstack, mcall, or asmcgocall, in order to perform tasks that must not be preempted, that must not grow the user stack, or that switch user goroutines. Code running on the system stack is implicitly non-preemptible, and the garbage collector does not scan system stacks. While an M is running on the system stack, the current user stack is not used for execution.
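
Stack growth can be observed from ordinary Go code. The following is a minimal sketch: the function name depth and the recursion count are arbitrary choices for illustration. The recursion needs far more stack than the small initial allocation, so it only completes because the runtime grows the goroutine's user stack on demand.

```go
package main

import "fmt"

// depth recurses n times and returns n. Go does not eliminate the
// recursive call, so every level occupies a stack frame.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

func main() {
	done := make(chan int)
	// Run on a fresh goroutine, which starts with a small user stack.
	go func() { done <- depth(1000000) }()
	fmt.Println("recursion depth reached:", <-done)
}
```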

getg() and getg().m.curg

To get the current user g, use getg().m.curg.

getg() alone returns the current g, but when executing on the system stack or signal stack it returns the current M's g0 or gsignal, respectively, which is probably not what you want.

To determine whether you are currently executing on the user stack or the system stack, use getg() == getg().m.curg.
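
Because getg is only available inside the runtime, the relationship can only be modeled from user code. The sketch below is a toy model, not the runtime's definitions: the struct fields mirror the names used in the text (g0, gsignal, curg), while newM and onUserStack are hypothetical helpers. The parameter cur stands in for the value getg() would return.

```go
package main

import "fmt"

// g and m are simplified stand-ins for the runtime types.
type g struct {
	name string
	m    *m
}

type m struct {
	g0      *g // goroutine that owns the system stack
	gsignal *g // goroutine that owns the signal stack
	curg    *g // current user goroutine
}

// newM builds an M together with its g0 and gsignal.
func newM() *m {
	mp := &m{}
	mp.g0 = &g{name: "g0", m: mp}
	mp.gsignal = &g{name: "gsignal", m: mp}
	return mp
}

// onUserStack mirrors the check getg() == getg().m.curg.
func onUserStack(cur *g) bool { return cur == cur.m.curg }

func main() {
	mp := newM()
	user := &g{name: "user", m: mp}
	mp.curg = user

	fmt.Println(onUserStack(user))  // true: executing as the user g
	fmt.Println(onUserStack(mp.g0)) // false: executing on the system stack
}
```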

Error handling and reporting

Errors that user code can reasonably recover from should use panic as usual. In some situations, however, panic causes an immediate fatal error, such as when it is called on the system stack or during mallocgc.

Most runtime errors are unrecoverable. For these, use throw, which prints a traceback and immediately terminates the process. throw should be passed a string constant so that it does not have to allocate memory for the string in a situation that is already dangerous. By convention, additional details are printed with print or println before the throw, and those messages are prefixed with "runtime:".
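
The print-then-throw convention can be sketched in user space. Everything here is an assumption for illustration: throw below is a stand-in that only prints and exits (the real one also prints a traceback and lives inside the runtime), checkPages is a hypothetical invariant check, and exit is a variable so the sketch can be exercised without terminating the process.

```go
package main

import "os"

// exit is replaceable so the sketch can be demonstrated safely.
var exit = os.Exit

// throw is a user-space stand-in for the runtime's throw. It takes a
// constant string so that no allocation is needed for the message.
func throw(msg string) {
	print("fatal error: ", msg, "\n")
	exit(2)
}

// checkPages is a hypothetical invariant check. The details are printed
// first, prefixed with "runtime:", and then throw is called.
func checkPages(npages int) {
	if npages <= 0 {
		print("runtime: bad span npages=", npages, "\n")
		throw("bad span size")
	}
}

func main() {
	checkPages(4) // invariant holds, nothing is printed

	// For the demonstration, report the exit instead of performing it.
	exit = func(code int) { print("would exit with status ", code, "\n") }
	checkPages(0)
}
```

The built-in print is used instead of the fmt package because it does not allocate, which is the same reason the runtime relies on it.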

A useful way to debug runtime errors is to set GOTRACEBACK=system or GOTRACEBACK=crash.

Synchronization

The runtime has multiple synchronization mechanisms. They differ not only in semantics but also in how they interact with the Go scheduler and the operating system scheduler.

The simplest is mutex, which is manipulated with lock and unlock. It should be used to protect shared structures for short periods only. Blocking on a mutex blocks the M directly, without interacting with the Go scheduler. This means it is safe to use from the lowest levels of the runtime, but it also prevents any associated G and P from being rescheduled. rwmutex is similar.

For one-shot notifications, use note, which provides notesleep and notewakeup. Unlike traditional UNIX sleep/wakeup, notes are race-free: if the notewakeup has already happened, notesleep returns immediately. A note can be reset after use with noteclear, which must not race with a notesleep or notewakeup. Like mutex, blocking on a note blocks the M. However, there are different ways to sleep on a note: notesleep also prevents the associated G and P from being rescheduled, while notetsleepg behaves like a blocking system call and allows the P to be reused to run another G. This is still less efficient than blocking the G directly, since it consumes an M.

To interact directly with the Go scheduler, use gopark and goready. gopark parks the current goroutine (putting it in the waiting state and removing it from the scheduler's run queue) and schedules another goroutine on the current M and P. goready puts a parked goroutine back in the runnable state and adds it to the run queue.
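
The race-free, one-shot behavior of a note can be illustrated by analogy with a channel that is closed once. This is only an analogy and not how the runtime implements notes: the note type and its methods below are hypothetical, and waiting on a channel blocks only the goroutine, whereas a real note blocks the M.

```go
package main

import (
	"fmt"
	"sync"
)

// note is a user-space analogy of a one-shot notification.
type note struct {
	once sync.Once
	ch   chan struct{}
}

func newNote() *note { return &note{ch: make(chan struct{})} }

// wakeup signals the note. The sync.Once makes repeated calls harmless
// in this sketch.
func (n *note) wakeup() { n.once.Do(func() { close(n.ch) }) }

// sleep blocks until wakeup has been called. If wakeup already
// happened, it returns immediately, so no wakeup can be lost.
func (n *note) sleep() { <-n.ch }

func main() {
	n := newNote()
	n.wakeup() // the wakeup arrives before anyone sleeps
	n.sleep()  // returns at once instead of blocking forever
	fmt.Println("sleep returned without blocking")
}
```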
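
The state changes performed by parking and readying can be sketched as a toy run queue. The types and the methods park and ready below are hypothetical and far simpler than the runtime's gopark and goready; the sketch only shows the waiting and runnable transitions and the run queue bookkeeping described above.

```go
package main

import "fmt"

type status int

const (
	runnable status = iota
	running
	waiting
)

// g is a toy goroutine descriptor.
type g struct {
	id int
	st status
}

// sched is a toy scheduler for a single M and P.
type sched struct {
	runq []*g // runnable goroutines
	cur  *g   // goroutine currently running
}

// park puts the current goroutine in the waiting state and switches to
// the next runnable goroutine, if any. It assumes s.cur is not nil.
func (s *sched) park() *g {
	parked := s.cur
	parked.st = waiting
	s.cur = nil
	if len(s.runq) > 0 {
		s.cur = s.runq[0]
		s.runq = s.runq[1:]
		s.cur.st = running
	}
	return parked
}

// ready marks a parked goroutine runnable and queues it.
func (s *sched) ready(gp *g) {
	gp.st = runnable
	s.runq = append(s.runq, gp)
}

func main() {
	g1 := &g{id: 1, st: running}
	g2 := &g{id: 2, st: runnable}
	s := &sched{cur: g1, runq: []*g{g2}}

	parked := s.park()
	fmt.Println("now running g", s.cur.id, "; g1 waiting:", parked.st == waiting)
	s.ready(parked)
	fmt.Println("run queue length after ready:", len(s.runq))
}
```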

It can be summarized as follows:

              Blocks
Interface     G    M    P
(rw)mutex     Y    Y    Y
note          Y    Y    Y/N
park          Y    N    N

Atomics

The runtime uses its own atomics package, runtime/internal/atomic. It corresponds to sync/atomic, except that function names differ slightly for historical reasons and there are a few additional functions that the runtime needs.

In general, the runtime is very careful about its use of atomics and tries to avoid unnecessary atomic operations. If access to a variable is already protected by another synchronization mechanism, then the already-protected access generally does not need to be atomic. There are several reasons for this:

  1. Using non-atomic and atomic accesses appropriately makes the code more self-documenting: an atomic access to a variable implies that there may be concurrent access to the same variable somewhere else.
  2. Non-atomic access allows automatic race detection. The runtime itself does not currently have a race detector, but it may in the future. Atomic access defeats the race detector, while non-atomic access allows the race detector to check your assumptions.
  3. Non-atomic operations can improve performance.

Of course, any non-atomic access to a shared variable should be documented to explain how that access is protected.

Some common patterns that mix atomic and non-atomic access are:

  • Read-mostly variables whose updates are protected by a lock. Within the locked region, reads do not need to be atomic, but writes do. Reads outside the locked region must be atomic.
  • Reads that happen only during STW, when no writes can happen. These reads do not need to be atomic.
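
The first pattern can be sketched in ordinary Go with sync and sync/atomic (the runtime itself would use its internal atomics package and its own mutex). The counter type and its methods are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter is a read-mostly value.
//
// n is written only while holding mu, and always atomically. It may be
// read non-atomically while holding mu, and must be read atomically
// otherwise.
type counter struct {
	mu sync.Mutex
	n  int64
}

// add updates the counter under the lock.
func (c *counter) add(d int64) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	v := c.n + d               // plain read: mu excludes all other writers
	atomic.StoreInt64(&c.n, v) // atomic write: lock-free readers may be running
	return v
}

// load reads the counter without taking the lock.
func (c *counter) load() int64 { return atomic.LoadInt64(&c.n) }

func main() {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.add(1)
		}()
	}
	wg.Wait()
	fmt.Println(c.load()) // prints 100
}
```

The comment on the field n is the kind of documentation the text asks for: it states how each kind of access is protected.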

That being said, the advice given by the Go Memory Model still holds true. Runtime performance is important, but robustness is even more important.

Unmanaged memory

In general, the runtime tries to allocate memory in the normal way (on the GC-managed heap), but in some cases it must allocate unmanaged memory outside the heap. This is necessary when the memory is part of the memory manager itself, or when the caller may not have a P.

There are three ways to allocate unmanaged memory:

  • sysAlloc obtains memory directly from the operating system. The size must be a multiple of the system page size. The memory can be released with sysFree.
  • persistentalloc combines several small allocations into one larger sysAlloc to avoid fragmentation. However, as the name suggests, memory allocated through persistentalloc cannot be freed.
  • fixalloc is a SLAB-style allocator that allocates objects of a fixed size. Objects allocated through fixalloc can be freed, but the memory can only be reused by the same fixalloc pool, so fixalloc is suitable only for objects of the same type.

In general, types that are allocated using any of these three methods should be marked //go:notinheap (see below).

Objects allocated in unmanaged memory must not contain heap pointers unless the following rules are also obeyed:

  1. Any pointer from unmanaged memory to the heap must be a garbage collection root. That is, every such pointer must either be reachable through a global variable or be added as an explicit root in runtime.markroot.
  2. If the memory is reused, the heap pointers must be zero-initialized (see "Zero-initialization versus zeroing" below) before they become visible as GC roots. Otherwise, the GC may observe stale heap pointers.
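
The free-list idea behind a fixed-size allocator can be sketched as follows. This is a toy under stated assumptions: the generic fixalloc type, the chunkLen constant, and the span type are hypothetical, and the chunks here come from the ordinary GC heap via make, whereas the runtime's allocator obtains unmanaged memory. Zeroing on reuse is a choice made by this sketch.

```go
package main

import "fmt"

// chunkLen is the number of objects carved from each chunk (arbitrary).
const chunkLen = 64

// fixalloc hands out objects of one fixed type T.
type fixalloc[T any] struct {
	chunk []T  // unused tail of the current chunk
	free  []*T // objects returned by release, ready for reuse
}

// alloc returns a zeroed object, preferring previously freed ones.
func (f *fixalloc[T]) alloc() *T {
	if n := len(f.free); n > 0 {
		obj := f.free[n-1]
		f.free = f.free[:n-1]
		var zero T
		*obj = zero // clear stale contents before reuse
		return obj
	}
	if len(f.chunk) == 0 {
		f.chunk = make([]T, chunkLen) // start a new chunk
	}
	obj := &f.chunk[0]
	f.chunk = f.chunk[1:]
	return obj
}

// release makes obj available to later alloc calls on the same pool.
func (f *fixalloc[T]) release(obj *T) { f.free = append(f.free, obj) }

// span is a hypothetical fixed-size object type.
type span struct{ start, npages uintptr }

func main() {
	var pool fixalloc[span]
	a := pool.alloc()
	a.npages = 4
	pool.release(a)
	b := pool.alloc()
	fmt.Println(a == b, b.npages) // prints: true 0
}
```

Freed memory never leaves the pool, which is why such an allocator only suits objects of a single type.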

Zero-initialization versus zeroing

There are two types of zeroing in the runtime, depending on whether the memory is already initialized to a type-safe state.

If memory is not in a type-safe state, meaning that it may contain garbage because it was just allocated and is being initialized for first use, it must be zero-initialized using memclrNoHeapPointers or non-pointer writes. This does not perform write barriers.

If memory is already in a type-safe state and is simply being set to the zero value, this must be done using regular writes, typedmemclr, or memclrHasPointers. This performs write barriers.

Runtime-only compiler directives

In addition to the "//go:" directives documented in "go doc compile", the compiler supports additional directives only in the runtime package.

go:systemstack

go:systemstack indicates that a function must run on the system stack. This is checked dynamically by a special function prologue.

go:nowritebarrier

go:nowritebarrier directs the compiler to emit an error if the following function contains any write barriers. (It does not suppress the generation of write barriers; it is purely an assertion.)

In general you should use go:nowritebarrierrec. go:nowritebarrier is mainly useful where it is "nice" not to have write barriers but their absence is not required for correctness.

go:nowritebarrierrec and go:yeswritebarrierrec

go:nowritebarrierrec directs the compiler to emit an error if the following function, or any function it calls recursively up to a go:yeswritebarrierrec function, contains a write barrier.

Logically, the compiler floods the call graph starting from each go:nowritebarrierrec function and produces an error if it reaches a function that contains a write barrier. The flood stops at go:yeswritebarrierrec functions.

go:nowritebarrierrec is used in the implementation of the write barrier itself to prevent infinite loops.

Both directives are used in the scheduler. The write barrier requires an active P (getg().m.p != nil), but scheduler code often runs without an active P. In that case, go:nowritebarrierrec is used on functions that release the P or may run without a P, and go:yeswritebarrierrec is used where the code reacquires an active P. Because these are function-level annotations, code that releases or acquires a P may need to be split across two functions.
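
The flood over the call graph can be sketched as a breadth-first search. This is an illustration of the idea only and not the compiler's implementation: the fn type, the violations function, and the function names in the example graph are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// fn describes one function in a toy call graph.
type fn struct {
	noRec  bool     // marked go:nowritebarrierrec
	yesRec bool     // marked go:yeswritebarrierrec
	hasWB  bool     // body contains a write barrier
	calls  []string // direct callees
}

// violations floods the graph from every go:nowritebarrierrec function,
// stops at go:yeswritebarrierrec functions, and returns the sorted names
// of reached functions that contain a write barrier.
func violations(graph map[string]fn) []string {
	var queue []string
	for name, f := range graph {
		if f.noRec {
			queue = append(queue, name)
		}
	}
	seen := map[string]bool{}
	var bad []string
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		f, ok := graph[name]
		if !ok || seen[name] || f.yesRec {
			continue // unknown, already visited, or the flood stops here
		}
		seen[name] = true
		if f.hasWB {
			bad = append(bad, name)
		}
		queue = append(queue, f.calls...)
	}
	sort.Strings(bad)
	return bad
}

func main() {
	graph := map[string]fn{
		"dropP":    {noRec: true, calls: []string{"logState", "takeP"}},
		"logState": {hasWB: true},
		"takeP":    {yesRec: true, hasWB: true, calls: []string{"resume"}},
		"resume":   {hasWB: true},
	}
	fmt.Println(violations(graph)) // prints: [logState]
}
```

In the example, resume is reachable only through takeP, so the flood never reaches it and its write barrier is not reported.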

go:notinheap

go:notinheap applies to type declarations. It indicates that a type must never be allocated from the GC'd heap. In particular, pointers to this type must always fail the runtime.inheap check. The type may be used for global variables, for stack variables, or for objects in unmanaged memory (for example, memory allocated with sysAlloc, persistentalloc, or fixalloc, or taken from a manually managed span). Specifically:

  1. new(T), make([]T), append([]T, ...), and implicit heap allocation of T are disallowed (although implicit allocations are never allowed in the runtime anyway).
  2. A pointer to a regular type (other than unsafe.Pointer) cannot be converted to a pointer to a go:notinheap type, even if the two have the same underlying type.
  3. Any type that contains a go:notinheap type is itself go:notinheap. Structs and arrays are go:notinheap if their elements are. Maps and channels of go:notinheap types are not allowed. To keep things explicit, any type declaration where go:notinheap is implied must also be marked go:notinheap explicitly.
  4. Write barriers on pointers to go:notinheap types can be omitted.

The last point is the real benefit of go:notinheap. The runtime uses it for low-level internal structures to avoid memory barriers in the scheduler and the memory allocator, where they are either illegal or simply inefficient. This mechanism is reasonably safe and does not compromise the readability of the runtime.

Author: Pure White. Original link: www.purewhite.io/2020/10/14/… Copyright notice: All articles in this blog are licensed under BY-NC-SA unless otherwise stated. Please credit the source when reprinting.