Consider how to set GOMAXPROCS inside a container. What is a reasonable value, and what exactly does it limit: the number of CPU cores, the number of system threads, or the number of coroutines?

Background

Go is built for concurrency. The language supports concurrency natively: lightweight coroutines (goroutines) let a program run concurrent tasks, and the go keyword is powerful and concise in a way few other languages match. Let's explore how Go's goroutines and its coroutine scheduler are designed.

Go coroutines

Concept

Process: the smallest unit to which the operating system allocates system resources (CPU time slices, memory, etc.)

Thread: a lightweight process; the smallest unit of operating system scheduling

Coroutine: a lightweight thread whose scheduling is controlled by the program

How to understand

How do we understand process and thread each being a "smallest unit"?

In early process-oriented computer architectures, the process was both the smallest unit of system resource allocation and the smallest unit of operating system scheduling.

In modern architectures, the process has become a container for threads: multiple threads share system resources within a process, and the object the CPU executes (schedules) is the thread.

How to understand lightweight processes and lightweight threads

Lightweight process: each process has its own virtual memory space, containing the stack, code, data, heap, and so on. A thread has its own stack but shares all the other resources of its process. In the Linux implementation, processes and threads use the same underlying data structure; threads simply share resources within the same process.

Lightweight thread: a thread's stack is fixed at 8 MB, while a coroutine's stack starts at 2 KB and grows dynamically. Multiplexing many coroutines onto a set of threads reduces the number of threads needed.

What are the advantages of coroutines over threads

  • Lightweight (MB for a thread stack vs. KB for a coroutine stack)
  • Low switching cost: scheduling is controlled by the program and does not require entering kernel space, and the context that must be saved is generally smaller
  • Low switching frequency: coroutines are scheduled cooperatively, while thread scheduling is controlled by the operating system, which must ensure fairness; when there are many threads, switching is relatively frequent
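As a rough illustration of how cheap goroutines are, the sketch below launches a large number of goroutines and waits for them all; spawnAndCount is a name chosen here for illustration, not a standard API.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawnAndCount launches n goroutines and waits for them all to finish.
// Each goroutine starts on a ~2 KB stack, so even 100,000 of them cost
// far less memory than 100,000 OS threads at ~8 MB of stack apiece.
func spawnAndCount(n int) int64 {
	var done int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(spawnAndCount(100000), "goroutines ran")
}
```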

Coroutine scheduler

In Go, goroutine scheduling is handled by the runtime, so programmers do not need to think about coroutine scheduling while writing code.

The GM model

Goroutine scheduling is essentially a producer-consumer model.

Producer: the program, which creates goroutines (G)

Consumer: system threads (M), which consume (execute) goroutines

Naturally, a queue is needed between producers and consumers to hold goroutines that have not yet been consumed. This model was used before Go 1.1.
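The producer-consumer relationship can be sketched in Go itself, with a buffered channel standing in for the single global run queue; runGM and its parameters are invented for illustration, and in the real GM scheduler every access to that queue went through one global lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runGM is a toy analogy of the GM model: one shared queue of runnable
// "G"s and several worker "M"s consuming from it.
func runGM(tasks, workers int) int64 {
	queue := make(chan int, tasks) // the "global queue" of G's
	for g := 0; g < tasks; g++ {   // producer side: goroutine creation
		queue <- g
	}
	close(queue)

	var executed int64
	var wg sync.WaitGroup
	wg.Add(workers)
	for m := 0; m < workers; m++ { // consumer side: the M's
		go func() {
			defer wg.Done()
			for range queue {
				atomic.AddInt64(&executed, 1)
			}
		}()
	}
	wg.Wait()
	return executed
}

func main() {
	fmt.Println(runGM(100, 4), "tasks executed")
}
```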

GM model problem

  • The M's run concurrently, and every access to the global queue requires a global lock, so lock contention is severe
  • Relationships between G's are ignored. For example, while M1 executes G1, G1 creates G2; G2 may well be executed by M2, even though G1 and G2 are related and their data is likely to be close in cache, so locality is lost and performance degrades
  • Each M allocates its own memory cache (MCache), which is wasteful

GOMAXPROCS: in this version it represents the maximum number of threads actively running Go code at the same time.

GMP model

  • G: Go coroutine
  • M: System thread
  • P: logical processor, responsible for providing the execution context: memory cache (MCache) management, the goroutine run queue, etc.

The GM model above has serious performance problems, so, as shown in the figure below, Go splits the single global queue into multiple local queues; the structure that manages a local queue is called P.

P structure characteristics

  • When M obtains G through its P, concurrent access is greatly reduced, and the local queue does not require a global lock.

  • Each P's local G queue is limited to 256 entries, while the number of goroutines is unbounded, so Go also keeps a global queue of unlimited length.

  • The local queue's data structure is an array; the global queue's is a linked list.

  • In addition to the local queue, each P has a runnext slot that prioritizes executing the most recently created goroutine.

  • MCache moved from M to P.

  • The number of P's is controlled by GOMAXPROCS.
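The P count can be inspected and changed at runtime with runtime.GOMAXPROCS; the setPCount helper below is defined here purely for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// runtime.GOMAXPROCS both reads and writes the number of P's:
// GOMAXPROCS(0) queries the current value without changing it, and
// GOMAXPROCS(n) for n > 0 sets it and returns the previous value.
func setPCount(n int) (prev, cur int) {
	prev = runtime.GOMAXPROCS(n)
	cur = runtime.GOMAXPROCS(0)
	return prev, cur
}

func main() {
	fmt.Println("default P count:", runtime.GOMAXPROCS(0)) // defaults to the CPU core count
	prev, cur := setPCount(2)                              // cap Go code at 2 simultaneously running threads
	fmt.Println("was:", prev, "now:", cur)
}
```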

Consumption logic of M (get G)

  1. M first fetches a G from its bound P's local queue (runnext has priority)
  2. M periodically fetches from the global queue: every 61st scheduling tick it checks the global queue first (for fairness), and in the process distributes some of the global queue's G's to the P's
  3. If the global queue has no G either, M randomly picks another P and steals half of its local queue. If there is nothing to steal, the thread sleeps. Work stealing means local queues are accessed concurrently, so operations on a local queue must be synchronized with atomic operations
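The fetch order above can be modeled with a small, purely illustrative simulation. The type p and the function fetchG are invented here; plain slices stand in for the runtime's lock-free arrays, and the every-61-ticks global check is omitted for brevity.

```go
package main

import "fmt"

// A simplified model of the order in which an M looks for work.
type p struct {
	runnext *int  // slot holding the most recently created G, if any
	local   []int // local run queue (capacity 256 in the real runtime)
}

// fetchG follows the scheduler's priority order: runnext, then the local
// queue, then the global queue, then stealing half of another P's queue.
func fetchG(self *p, global *[]int, victim *p) (int, bool) {
	if self.runnext != nil { // 1. runnext first
		g := *self.runnext
		self.runnext = nil
		return g, true
	}
	if len(self.local) > 0 { // 1. then the bound P's local queue
		g := self.local[0]
		self.local = self.local[1:]
		return g, true
	}
	if len(*global) > 0 { // 2. then the global queue
		g := (*global)[0]
		*global = (*global)[1:]
		return g, true
	}
	if n := len(victim.local); n > 0 { // 3. finally, steal half from a victim P
		half := (n + 1) / 2
		stolen := victim.local[:half]
		victim.local = victim.local[half:]
		self.local = append(self.local, stolen[1:]...)
		return stolen[0], true
	}
	return 0, false // nothing runnable: the thread would go to sleep
}

func main() {
	rn := 1
	self := &p{runnext: &rn, local: []int{2}}
	global := []int{3}
	victim := &p{local: []int{4, 5}}
	for {
		g, ok := fetchG(self, &global, victim)
		if !ok {
			break
		}
		fmt.Println("run G", g)
	}
}
```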

Production logic of G

  1. go func produces a G structure, which is preferentially placed in the current P's runnext slot
  2. If runnext is already occupied, the goroutine currently in runnext is kicked out to the tail of the local queue, and the new goroutine takes runnext
  3. If the local queue is also full, half of the local queue's G's, together with the G kicked out of runnext, are moved to the global queue
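The production steps can be sketched the same way; sched, putG, and the tiny capacity of 4 (the real local queue holds 256) are all illustrative choices, not runtime APIs.

```go
package main

import "fmt"

const localCap = 4 // shrunk from the runtime's 256 so overflow is easy to see

// A simplified model of where a newly created G is placed.
type sched struct {
	runnext *int  // the freshly created G runs next
	local   []int // this P's local queue
	global  []int // the shared global queue
}

// putG: prefer runnext; spill the displaced G to the local queue; on
// overflow, move half of the local queue plus the displaced G to global.
func (s *sched) putG(g int) {
	if s.runnext == nil { // 1. empty runnext takes the new G
		s.runnext = &g
		return
	}
	old := *s.runnext // 2. kick the previous occupant out of runnext
	s.runnext = &g
	if len(s.local) < localCap {
		s.local = append(s.local, old)
		return
	}
	half := len(s.local) / 2 // 3. local queue full: spill to global
	s.global = append(s.global, s.local[:half]...)
	s.global = append(s.global, old)
	s.local = append([]int{}, s.local[half:]...)
}

func main() {
	s := &sched{}
	for g := 1; g <= 6; g++ {
		s.putG(g)
	}
	fmt.Println("runnext:", *s.runnext, "local:", s.local, "global:", s.global)
}
```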

Problems solved by introducing P

  • Global lock contention: goroutines within a P are consumed serially, so the local queue needs no global lock
  • Data locality: a newly created G is placed in its creator P's local queue first, and when an M looks for an available P it prefers the P it was previously bound to
  • Memory consumption: MCache moved from M to P, so its count is bounded by the number of P's and multiple threads share the caches

Returning to the opening question of setting GOMAXPROCS in a container: GOMAXPROCS represents the number of P's, that is, the maximum number of threads simultaneously running Go code. It defaults to the number of CPU cores, which helps keep the thread count low and reduces thread switching. In a container, however, the default reflects the physical machine's core count rather than the container's CPU limit, so it should be set to match the container's limit.
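A minimal sketch of deriving a suitable value from a cgroup v1 CFS quota, assuming the quota and period have already been read from cpu.cfs_quota_us and cpu.cfs_period_us; the function name and the floor of 1 are choices made here, not part of the runtime API.

```go
package main

import (
	"fmt"
	"runtime"
)

// gomaxprocsForQuota derives a P count from a cgroup v1 CFS quota
// (the values of cpu.cfs_quota_us and cpu.cfs_period_us).
func gomaxprocsForQuota(quotaUs, periodUs int64) int {
	if quotaUs <= 0 || periodUs <= 0 {
		return runtime.NumCPU() // no limit configured: fall back to host cores
	}
	n := int(quotaUs / periodUs)
	if n < 1 {
		n = 1 // always allow at least one P
	}
	return n
}

func main() {
	// e.g. a container limited to 2 CPUs: quota of 200000us per 100000us period
	runtime.GOMAXPROCS(gomaxprocsForQuota(200000, 100000))
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```

In practice, a library such as go.uber.org/automaxprocs, imported for its side effect, applies the same adjustment automatically at startup.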
