preface

Before looking at Go’s goroutines, let’s review the difference between concurrency and parallelism:

  • Concurrency: multiple tasks make progress within the same period of time, taking turns on the CPU (not necessarily at the same instant).
  • Parallelism: multiple tasks literally execute at the same instant, each on its own core.

On a single-core processor, multiple threads share the CPU through time slicing and execute serially — this is concurrency without parallelism. Parallelism additionally requires physical resources such as a multi-core processor, which lets multiple tasks run at the same instant (concurrent and parallel).


GPM basic process

GPM meaning

  • G: a goroutine, a task to be handed out (a brick);
  • P: a queue full of Gs, which maintains a set of runnable tasks (a cart);
  • M: an operator that moves a G onto a thread for execution (a worker);

Go scheduler basic scheduling process

  1. Create a G object (goroutine);
  2. Save the G into a P’s queue;
  3. The P wakes up (notifies) an M, then continues its own work (accepting the next G);
  4. The M looks for an idle P and reads the Gs that P has to hand out;
  5. The M then runs a scheduling loop: call the G → execute it → clean up → go find the next G to execute.

Briefly state their respective tasks:

  • G carries the task;
  • P dispatches tasks;
  • M finds and executes tasks;

Specific process of GMP scheduling

Important attributes of G, M and P (information)

  • G

    • A pointer to the function to be executed
    • Goroutine context information (used to save the G’s state — variables and related information — when the goroutine is switched out)
    • Context save and restore (so the G can be suspended and later resumed, e.g. while waiting in the global queue)
    • The function stack it owns
    • The M currently executing it
    • The time at which it was blocked
  • P

    A P and an M must be bound together to form an execution unit. The number of Ps determines how many tasks can run concurrently, and GOMAXPROCS limits the number of operating system threads that can execute user-level Go code simultaneously.

    • Status (idle, running, …)
    • The associated M
    • A queue of runnable goroutines
    • The next G to run
  • M

    Every M has a thread stack. If one is not provided, the operating system allocates it for the thread (different operating systems provide thread stacks of different sizes).

    • Its own scheduling stack (the stack of g0)
    • The G currently running
    • The associated P
    • Status

Having listed the important attributes of each of the three structures, let’s now look at the detailed running process.

Background knowledge

The stack

Ordinary stack: the stack of a goroutine that needs to be scheduled. It is a growable stack, since a goroutine’s stack needs can keep increasing as it runs.

Thread stack: the stack an M uses when putting goroutines onto the thread. It is actually the stack of a special goroutine (g0) that the runtime creates for each M, and its size is fixed (set when the M is created). All scheduling-related code switches to this stack before executing. In other words, the thread stack is implemented with a G, not supplied by the OS.

The queue

Global queue: the Gs stored in this queue are shared by all Ms. To avoid data races, access requires a lock.

Local queue: each P has its own local queue of Gs, and the M bound to that P consumes the tasks in it. Since only that M accesses the queue, there is no data race and no locking is required, so it is much faster to work with than the global queue.

Context switch

It can be understood simply as the environment at a point in time: the state of the program and of its variables at that moment.

For a value in code, its context is the local (or global) scope in which it lives. For a process, by contrast, the context is the environment in which it executes — its variables and data, including register contents, the process’s open files, memory (stack) information, and so on.

Threads to clean up

Since each P must bind an M to execute tasks, cleaning up a thread mostly means releasing (unbinding) the P, leaving the M with no task. A P is released mainly in two cases:

  • Active release: the typical example is a system call made while a G is executing — the M blocks in the system call. The scheduler sets a timeout and releases the P when it expires.
  • Passive release: when a system call occurs, a dedicated monitor scans the currently blocked P/M combinations. Once the timeout is exceeded, the P is stolen away automatically so that other G tasks in the queue can run.

Blocking means a running thread temporarily gives up the CPU before its work is finished.

Preemptive scheduling

In runtime.main, an extra M is created to run sysmon, which is where preemption is implemented.

Sysmon enters an infinite loop: it sleeps 20 µs on the first iteration, doubles the sleep time on each subsequent iteration, and eventually settles at 10 ms per iteration. Sysmon handles netpoll (fetching fd events), retake (preemption), forcegc (forcing GC on schedule), scavenge (returning free heap memory to reduce memory usage), and so on.

Preempt the conditions

  1. If a P has been in a system call for longer than one sysmon tick, preempt it: handoffp is called to disassociate the M from the P.
  2. If a P has been running the same G for longer than one sysmon tick (exceeding the set blocking threshold), preempt it: a flag is set indicating that the function can be interrupted. When the call stack recognizes this flag, it knows preemption has been triggered and yields at the next check.

Detailed process

  • The basic process is the same as above. When a G is created, it is placed into a P’s local queue for storage; when that queue reaches its limit, the G is added to the global queue in the waiting state.

  • If a G’s execution blocks, its context is saved (a context switch), and when the blocking operation returns the resource, the G resumes execution.

  • When a G blocks on an M for a long time, the runtime creates a new M, and the P on which the G blocked moves its other Gs onto the new M. The old M is reclaimed when the old G finishes blocking or is considered dead (preemptive scheduling).

  • Each P schedules its own goroutine queue (for example, suspending a goroutine that has held the CPU too long and running the ones behind it). When its own queue is exhausted, it fetches from the global queue; if the global queue is also empty, it steals tasks from another P’s queue (which is why the next G to run is stored separately, rather than fetched only from the queue).

conclusion

  • Compared with most concurrency models, the comparative advantage of Go’s design is the introduction of P — the context concept. If there were only a G-to-M mapping, then whenever a G blocked on I/O its M would sit idle, wasting resources; and without P, all Gs would live in a single global list, making the critical section too large and severely hurting multi-core scheduling.

  • Goroutines work both for compute-intensive multi-core workloads and for high-concurrency I/O applications. When writing I/O code, it feels like the most programmer-friendly style — synchronous blocking. In fact, thanks to runtime scheduling, the underlying layer operates in a non-blocking manner (i.e., I/O multiplexing).

So the core ideas — preemptive scheduling, saving a G’s context, and handing a blocked G’s work over to another M — are what make goroutines what they are.

  • In terms of thread scheduling, Go has an advantage over other languages: OS threads are scheduled by the OS kernel, while goroutines are scheduled by the Go runtime’s own scheduler. This scheduler uses a technique called M:N scheduling (multiplexing/scheduling M goroutines onto N OS threads).

  • One of the most important features is that goroutine scheduling is completed in user mode and does not involve frequent switches between kernel mode and user mode. Even memory allocation and freeing go through a large memory pool maintained in user mode, without directly calling the system’s malloc (unless the pool itself needs to grow). The cost is much lower than scheduling OS threads.

  • On the other hand, it makes full use of multi-core hardware resources by distributing goroutines across physical threads, and the extreme light weight of goroutines themselves guarantees the performance of Go’s scheduling.

More technical articles can be found on the [Go keyboard man] WeChat official account — welcome to follow!