Memory model in Golang

The Go memory model specifies the conditions under which reads of a variable in one goroutine can be guaranteed to observe values produced by writes to the same variable in a different goroutine.

  • In other words, the Go memory model defines the conditions under which a write to a variable in one goroutine is guaranteed to be observable by a read of the same variable in another goroutine.

Happens Before

Within a single goroutine, reads and writes must behave as if they executed in the order specified by the program. The compiler and processor may reorder reads and writes only when the reordering does not change the behavior observable within that goroutine.

However, because of this reordering, different goroutines may observe different execution orders.

For example, if one goroutine executes a = 1; b = 2;, another goroutine may observe the update to b before the update to a.
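
A minimal sketch of this (the variable names are mine, and the unsynchronized reads are a deliberate data race, shown for illustration only):

var a, b int

func main() {
	go func() {
		a = 1
		b = 2
	}()
	// Unsynchronized reads: this goroutine may observe b == 2
	// while a is still 0, or any other combination.
	println(a, b)
}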

To specify the requirements on reads and writes, Go introduces the happens-before relation to describe the conditions laid out in the memory model.

Happens Before

  • If event e1 happens before event e2, then we say that e2 happens after e1. If e1 neither happens before e2 nor happens after e2, then we say that e1 and e2 happen concurrently.

What conditions are specified by the memory model

Go's happens-before relation defines two sets of conditions:

The first set of conditions:

A read operation r of a variable v is allowed to observe a write operation w to v when both of the following conditions hold:

  • r does not happen before w
  • There is no other write w′ to v that happens after w but before r

The second set of conditions:

To guarantee that a read operation r of a variable v observes a particular write operation w to v, you must ensure that w is the only write r is allowed to observe. That is, r is guaranteed to observe w when the following conditions hold:

  • w happens before r
  • Any other write to the shared variable v happens either before w or after r

Within a single goroutine, the two sets of conditions are equivalent. In a multi-goroutine environment, however, the second set is more stringent, because it requires that no other write happen concurrently with w or r.

Looking at the first set of conditions, why is it not as strict as the second? Because “r does not happen before w” does not imply that r happens after w: the two may happen concurrently.

Therefore, note that the first set of conditions means that for two concurrent goroutines, it is uncertain whether one goroutine will observe the data written by the other. It may or may not.

Under the second set of conditions, since w happens before r and any other write to the shared variable v happens either before w or after r, there is no other write w′ between w and r, and no write happens concurrently with w or r. So we can say that the value read by r must be the value written by w.

The following timeline diagrams, adapted from an article on the Go language, illustrate these relations:

Single goroutine:

-- w0 ---- r1 -- w1 ---- w2 ---- r2 ---- r3 ---->

Here we have not just a partial order but a total order: every read and write is comparable with every other.

Two goroutines:

-- w0 -- r1 -- r2 ---- w3 ---- w4 ---- r5 ------->
-- w1 ---- w2 -- r3 ---- r4 ---- w5 ------->

Events within a single goroutine have an order; across two goroutines the situation is different. Even if r1 occurs before w2 in real time, the execution of each goroutine stretches like a rubber band, so the two have no logical order: they are concurrent. For concurrent reads and writes, r3 may read the earlier w2, the w3 above it, or even the value of w4; r5 may read the value of w4, w1, w2, or w5, but it cannot read the value of w3.

Two goroutines with synchronization points:

-- r0 -- r1 --|-- r2 --------------|-- w5 ------->
-- w1 -- w2 --|-- r3 -- r4 -- w4 --|------------->

Two synchronization points, marked |, have now been added. In this case, r3 happens after r1 and before w5. r2 happens after w2 but is concurrent with w4, so the value r2 reads is uncertain: it may be that of w2 or of w4. w2 happens before r4 and no write is concurrent with r4, so r4 is guaranteed to read the value written by w2.

Personal Understanding:

From the explanation above, we can see that with multiple goroutines and no synchronization control, all goroutines run in parallel, as if in separate dimensions: there is no way to determine which read or write operation comes first across those dimensions. So synchronization control has to be introduced. Once it is, the two goroutines share a “common node”, and on either side of that node there is a defined execution order. But the stretch between two such nodes can still scale freely, with no order inside it.

What are the synchronization mechanisms in Golang?

1. Initialization

Program initialization runs in a single goroutine, but that goroutine may create other goroutines, which run concurrently.

If a package p imports package q, the completion of q’s init functions happens before the start of any of p’s.

The start of the function main.main happens after all init functions have finished.

  • Program initialization runs in a single goroutine, but that goroutine may create other goroutines, which run concurrently
  • If package p imports package q, q's init functions complete before any of p's init functions start
  • The function main.main starts only after all init functions have finished (the full ordering is sketched below)
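
A single-package sketch of this ordering (compressed into one file, since a runnable cross-package example would need two packages; the names are mine):

var a = initA() // package-level variable initialization runs first

func initA() string {
	println("1: package-level variables")
	return "a"
}

func init() {
	println("2: init, after package-level variables, before main")
}

func main() {
	println("3: main.main, after all init functions have finished")
}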

2. Goroutine creation

The go statement that starts a new goroutine happens before the goroutine’s execution begins.

  • The go statement that creates a goroutine happens before the goroutine's execution begins
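
The Go memory model document illustrates this rule with roughly the following example: calling hello is guaranteed to print "hello, world" at some point in the future (possibly after hello has returned), because the write to a happens before the go statement, which happens before f begins executing:

var a string

func f() {
	print(a)
}

func hello() {
	a = "hello, world"
	go f()
}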

3. Destruction of goroutine

The exit of a goroutine is not guaranteed to happen before any event in the program.

  • Go does not guarantee that a goroutine's exit happens before any event in the program
var a string

func hello() {
	go func() { a = "hello" }()
	print(a)
}

There is no synchronization event after the assignment to a, so the assignment is not guaranteed to be observed by any other goroutine. In fact, an aggressive compiler might delete the entire go statement.

If the effects of one goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.
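
A minimal fix for the hello example above, sketched with a channel (the done channel is my addition):

var a string
var done = make(chan bool)

func hello() {
	go func() {
		a = "hello"
		done <- true // this send happens before the receive below completes
	}()
	<-done
	print(a) // now guaranteed to print "hello"
}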

4. Channels

Channel communication is the main method of synchronization between goroutines. Each send on a particular channel is matched to a corresponding receive from that channel, usually in a different goroutine.

  • Channel communication is the primary method of synchronization between goroutines. Each send on a particular channel is matched by a corresponding receive, which usually occurs in a different goroutine

Channels provide four synchronization rules for ensuring that an event A happens before an event B.

The first:

A send on a channel happens before the corresponding receive from that channel completes.

That is, a send on a channel happens before the corresponding receive from that channel completes.

For example:

var c = make(chan int, 10)
var a string

func f() {
	a = "hello, world"
	c <- 0
}

func main() {
	go f()
	<-c
	print(a)
}

This is guaranteed to print "hello, world": the write to a happens before the send on c, the send on c happens before the receive on c completes, and the receive on c happens before the print.

The second:

The closing of a channel happens before a receive that returns a zero value because the channel is closed.

That is, the close of a channel happens before a receive that returns a zero value because the channel is closed.

For example, in the code above, c <- 0 can be changed to close(c); the guarantee is the same.
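
Spelled out, the variant looks like this (a sketch; only the send is replaced):

var c = make(chan int, 10)
var a string

func f() {
	a = "hello, world"
	close(c) // the close happens before the receive below returns a zero value
}

func main() {
	go f()
	<-c // yields 0 because c is closed
	print(a) // still guaranteed to print "hello, world"
}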

The third:

A receive from an unbuffered channel happens before the send on that channel completes.

That is, a receive from an unbuffered channel happens before the send on that channel completes.

For example:

var c = make(chan int)
var a string

func f() {
	a = "hello, world"
	<-c
}

func main() {
	go f()
	c <- 0
	print(a)
}

This, too, is guaranteed to print "hello, world": the write to a happens before the receive on c, the receive on c happens before the send on c completes, and the completion of the send happens before the print.

The fourth:

The kth receive on a channel with capacity C happens before the k+Cth send from that channel completes.

This rule generalizes the previous rule to buffered channels. It allows a counting semaphore to be modeled by a buffered channel: the number of items in the channel corresponds to the number of active uses, the capacity of the channel corresponds to the maximum number of simultaneous uses, sending an item acquires the semaphore, and receiving an item releases the semaphore. This is a common idiom for limiting concurrency.

That is, the kth receive on a channel with capacity C happens before the k+Cth send on that channel completes. This generalizes the previous rule to buffered channels, and lets a buffered channel model a counting semaphore: the number of items in the channel is the number of active uses, the capacity is the maximum number of simultaneous uses, sending an item acquires the semaphore, and receiving an item releases it. This is a common idiom for limiting concurrency.

This approach can often be used to control the number of concurrent requests.

For example:

var limit = make(chan int, 3)

// work is assumed to be a slice of tasks defined elsewhere:
var work []func()

func main() {
	for _, w := range work {
		go func(w func()) {
			limit <- 1 // acquire a slot; blocks while three are in flight
			w()
			<-limit // release the slot
		}(w)
	}
	select {} // block forever
}

This limits the number of work functions running concurrently to three at a time.
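
A runnable variant of the sketch (the WaitGroup and the dummy workload are my additions, so the program terminates instead of blocking forever on select {}):

package main

import (
	"fmt"
	"sync"
	"time"
)

var limit = make(chan int, 3) // capacity 3: at most three workers at once

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			limit <- 1 // acquire a slot
			fmt.Println("working:", i)
			time.Sleep(100 * time.Millisecond) // dummy workload
			<-limit // release the slot
		}(i)
	}
	wg.Wait()
}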

5. Locks

The sync package implements two lock types: sync.Mutex and sync.RWMutex.

For any variable l of type sync.Mutex or sync.RWMutex, and n < m, the nth call to l.Unlock() happens before the mth call to l.Lock() returns.

This program:

var l sync.Mutex
var a string

func f() {
	a = "hello, world"
	l.Unlock()
}

func main() {
	l.Lock()
	go f()
	l.Lock()
	print(a)
}

This is guaranteed to print "hello, world": the first call to l.Unlock() (in f) happens before the second call to l.Lock() (in main) returns, which happens before the print.

For any call to l.RLock on a variable l of type sync.RWMutex, there is an n such that l.RLock happens (returns) after the nth call to l.Unlock, and the matching l.RUnlock happens before the n+1th call to l.Lock.
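
A minimal sketch of how this rule reads in practice (the function names are mine):

var l sync.RWMutex
var a string

func writer() {
	l.Lock()
	a = "hello, world"
	l.Unlock() // some nth call to l.Unlock
}

func reader() {
	l.RLock() // happens (returns) after that nth call to l.Unlock
	print(a)
	l.RUnlock() // happens before the n+1th call to l.Lock returns
}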

6. The Once type

The sync package also provides a safe mechanism for initialization across multiple goroutines through the Once type. Multiple goroutines can execute once.Do(f) for a particular f, but only one of them will run f(); the other calls block until f() has returned.

The single call to f() made by once.Do(f) happens before any call of once.Do(f) returns.

In this program:

var a string
var once sync.Once

func setup() {
	a = "hello, world"
}

func doprint() {
	once.Do(setup)
	print(a)
}

func twoprint() {
	go doprint()
	go doprint()
}

Calling twoprint will print "hello, world" twice. The first call to doprint runs setup once.

What problem does the Go memory model solve?

So what is the point of defining a memory model? In the Go memory model, the main problem to solve is visibility: in a multi-goroutine environment, how a write to a variable in goroutine A can be made visible to a read in goroutine B.

Why do visibility problems occur in multithreaded environments?

Because of memory reordering. There are two kinds of memory reordering: software-level reordering and hardware-level reordering.

  • Software-level reordering mainly refers to compiler optimization, that is, compiler reordering: the compiler rearranges the compiled instructions to improve efficiency without changing the original semantics of the program.

  • Hardware-level reordering mainly refers to CPU reordering. To improve execution efficiency and squeeze out maximum performance, CPU designers employ all kinds of techniques, such as branch prediction and pipelining, including reordering the CPU's operations on memory to improve read/write efficiency (see the litmus test sketched after this list).
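
A classic store/load litmus test, sketched in Go (the variable names are mine, and the unsynchronized accesses are a deliberate data race, for illustration only): on some runs both r1 and r2 may end up 0, an outcome impossible under any simple interleaving of the two goroutines.

var x, y, r1, r2 int

func main() {
	done := make(chan bool, 2)
	go func() { x = 1; r1 = y; done <- true }()
	go func() { y = 1; r2 = x; done <- true }()
	<-done
	<-done
	println(r1, r2) // "0 0" is possible due to store/load reordering
}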

How does rearrangement cause visibility problems?

To improve the efficiency of CPU operations on memory, CPU designers have adopted various strategies, such as the three-level cache hierarchy we often hear about.

Each CPU core has its own store buffer, in addition to the L1/L2/L3 cache hierarchy. To improve performance, a core does not wait for a write to propagate all the way to memory before executing other operations; this is, in effect, a reordering of read and write instructions. As a result, other cores are not guaranteed to read the most recent value at any given moment.

In a single-threaded environment this is fine, but in a multi-threaded environment it causes data-visibility problems. Therefore, CPUs provide so-called “lock” support, known as barriers, or fences.

A barrier instruction forces all memory operations before it to complete before any memory operation after it can begin.

In other words, a barrier requires that all memory operations before it “propagate” to memory before any operation after it may proceed. Such instructions are known as memory barriers.

Memory barriers are classified as Store barriers, Load barriers, and Full barriers. They serve two functions:

  1. Prevents reordering between instructions
  2. Ensure visibility of data

On the first point: setting specific architectures aside, load and store operations can produce four kinds of out-of-order results: load-load, load-store, store-load, and store-store. The three barriers mentioned above are the mechanisms that restrict these kinds of reordering.

On the second point: a store barrier blocks until the store buffer has been flushed to the cache, and a load barrier blocks until the entries in the invalidate queue have been processed. This keeps the data seen by different cores consistent at every level.

This refers to data consistency across L1, L2, L3, and main memory. Solving cache coherence at each level, however, is not the programmer's concern; it is handled by the CPU implementation, for example the MESI protocol used by Intel CPUs.

Therefore, note that memory barriers address only sequential consistency, not cache coherence (which is the responsibility of the cache coherence protocol and is not the programmer's concern). Components such as the store buffer and load buffer are part of the pipeline and have nothing to do with the cache. The cache coherence protocol guarantees only cache coherence, not sequential consistency: for example, if one processor writes a variable a only a short time before another processor reads a, there is no guarantee that the read will return the newly written value.

Barrier instructions cost hundreds of CPU cycles and are error-prone to use directly. Instead, we can use higher-level atomic operations such as compare-and-swap, or simply use higher-level locks, which the standard library usually provides. Go's sync/atomic package, for example, provides atomic operations.
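
A minimal sketch with sync/atomic (the counter example is mine): one hundred goroutines increment a shared counter without a data race, and no explicit barrier or lock appears in the code.

package main

import (
	"sync"
	"sync/atomic"
)

func main() {
	var n int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&n, 1) // atomic read-modify-write
		}()
	}
	wg.Wait()
	println(atomic.LoadInt64(&n)) // always prints 100
}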

In closing

In fact, CPU reordering also involves the problem of data consistency on multi-core CPUs, which brings in cache coherence protocols such as MESI, used by Intel CPUs.

Later, I plan to study CPU cache lines, store buffers, invalidate queues, and so on, and then write an article about my understanding.

References

coolshell.cn/articles/20…

www.huaweicloud.com/articles/0f…

zhuanlan.zhihu.com/p/37749443

go-zh.org/ref/mem

www.jianshu.com/p/5e44168f4…

blog.csdn.net/wll1228/art…

zhuanlan.zhihu.com/p/59055457

github.com/cch123/gola…

blog.csdn.net/qcrao/artic…