Lock in Ready to go-go

Go no go Wow? Roar wow! It has been two weeks, and I looked at the code of the memory model in GO, and found that the lock in the go code is really everywhere. So, I’d like to share with you golang locks (synchronization mechanism).

00. What is a lock?

Before we begin, what is a lock?

Let me give you an interesting example.

Q: Why do toilet stalls have doors?

Answer: toilet pit only one ah, don’t close the door don’t two people go up at the same time? = =! ?

Ask: that have no lock also ok, A is top, B saw don’t go in line yao?

Answer: that is not certain, in case that B quality is quite poor, hold back to panic, it is possible to be on the toilet of A drag out, seize pit!

Q:… , think of feel terrible, or add a lock, always can’t go up to half called out, and then continue for a while ah.

Well, what if “A” is in the manger? Isn’t there a lot of anger out there? (Teasing: Kind of like spin)

A:… We can talk about this later. I’m going to buy an orange. You wait here for me.

To sum up, locks can protect shared resources (pits) from being accessed by only one task (person), ensuring that each task can correctly access resources.

Lock lock

In
computer science, a
lock or
mutex (from
mutual exclusion) is a
synchronization mechanism for enforcing limits on access to a resource in an environment where there are many
threads of execution. A lock is designed to enforce a
mutual exclusion
concurrency control policy.

A lock is a synchronization mechanism used to restrict access to resources in a multitasking environment to meet the need for mutual exclusion.

The definition of a lock, then, is obvious. It is a mechanism for synchronizing access to resources in a multitasking environment.

01. Locks in computers

In a computer, tasks can be regarded as threads or coroutines, and shared resources can be regarded as critical resources. A code fragment that accesses a critical resource is called a critical section.

So to avoid race conditions, we just need to ensure that only one thread or coroutine is in the critical region.

Instead of looking at how the system is going to implement it for us, let’s think about, what if we were to design the lock ourselves?

For example, in multi-threaded programs, in order to ensure that shared resources can only be accessed by one thread, we can set a global variable flag in the shared memory space. When flag = 0, resources can be accessed; when flag = 1, resources cannot be accessed, and the thread occupying resources needs to set Flag = 0 to release possession.

There is nothing wrong with thinking about it, but have you found that Flag is also a shared resource that everyone can access? If A is occupying resources and A malicious third party appears, if the flag is set to 0, won’t you be able to access shared resources?

At this point, we need a mechanism to ensure that the flag is not tampered with.

And that mechanism is atomic manipulation. As the name implies, an atomic operation is a sequence of operations that cannot be interrupted. Personally, I think that atomic operation is relative, such as a thread for multiple threads, on a series of operations and mutex mutex can become atomic operation (with the other thread mutex), but in the kernel, it is not necessarily the atoms, or even a assembly instruction is not necessarily the atoms, through some mechanism is needed to guarantee (such as the bus lock, It will say later).

When we usually write code, the atomic operation is software level, it is realized on the basis of hardware atomic operation. So let’s start with a general understanding of atomic hardware operations

Hardware atomic operation

Hardware atomic operations can be divided into single-core and multi-core (SMP) operations, because an instruction in a single-core can be considered atomic, but in a multi-core system, if memory access is involved, it is not necessarily atomic (other cores will have threads accessing the memory at the same time).

Single core: an instruction is atomic and interrupts occur only between instructions. For example, some CPU instruction architectures provide test-and-set instructions to set attribute values atomically, making critical resources mutually exclusive.

Multicore: Multiple cores run at the same time. When one core instruction executes a single instruction to access memory, it may also be affected by other cores. Therefore, the CPU has a mechanism to lock the bus during instruction execution, and this locking is at the hardware level. Here’s a quote from someone else’s explanation, as follows:

The CPU chip has a lead #HLOCK PIN. If an assembly language program prefixes an instruction with “LOCK”, the assembled machine code causes the CPU to lower the potential of #HLOCK pin when executing the instruction until it is released at the end of the instruction, thus locking the bus. In this way, other cpus on the same bus can not access the memory through the bus temporarily, which ensures the atomicity of this instruction in the multi-processor environment.

Add: However, the bus lock locks the communication between CPU and memory, which prevents other cpus from operating on the data of other memory addresses. This means that the granularity of the lock is relatively large, resulting in a high cost. So, one way to optimize this is through CPU cache locks instead of bus locks.

Software atomic operation

Software atomic operations are based on hardware atomic operations. The Linux kernel, for example, provides two atomic manipulation interfaces, one for integers (usually used for counting operations) and one for bits. We won’t go into details here. Sync/Atomic in Golang, for example, implements a set of atomic operation interfaces in assembly language for different system architectures.

In summary, what is the significance of atomic operations?

Atomic operations can implement mutex.
When we know that an operation has so many tasks to handle in its execution environment, we can eliminate the overhead of locking.
Conversely, if we implement mutex, we can expand the scope of atomic operations. (A critical section can have multiple instructions executed)

With the above basic introduction, we will enter the theme of the go lock, is how to design ~

02. Golang lock summary

Golang’s lock is in the source sync package, which contains various synchronization methods for multitasking.

From the directory structure above, it can be seen that go is used for synchronization in many ways, the common ones are:

Atomic operation
The mutex
Read-write lock
waitgroup

Mutex, read/write locks, waitGroup in Golang all rely on atomic operations & semaphores. The semaphores in GO are implemented in the Runtime package. Let’s take a look at them one by one.

03. Runtime/Sema

Sema. go in Runtime implements semaphore processing in the Sync package as well as in the internal/poll package. Here we focus on the semaphore implementation of the SYNC package, including semaphore acquisition & semaphore release.

//go:linkname sync_runtime_Semacquire sync.runtime_Semacquire
func sync_runtime_Semacquire(addr *uint32) {
	semacquire1(addr, false, semaBlockProfile)
}
//go:linkname sync_runtime_Semrelease sync.runtime_Semrelease
func sync_runtime_Semrelease(addr *uint32, handoff bool) {
	semrelease1(addr, handoff)
}
Copy the code

The semaphore in Golang provides the blocking and wake up mechanism for goroutine, because the task in GO is the Goroutine, and the synchronization function of the semaphore is applied to the Goroutine. Look again at this comment in sema.go:

// That is, don't think of these as semaphores.
// Think of them as a way to implement sleep and wakeup
Copy the code

Its main data structure is as follows:

type semaRoot struct { lock mutex head *sudog tail *sudog nwait uint32 // Number of waiters. Read w/o the lock. } // The main structure is a linked list of sudogs and an Nwait. // Lists are used to store waiting goroutines. Nwait indicates the number of goroutines waiting on the semaphore. // lock a mutex that protects linked lists in multithreaded environments. (But this mutex is not the mutex in sync, it is a private version of sema.go // for internal use)Copy the code

In addition, Golang sets the maximum number of operable semaphores to 251, whose corresponding semaRoot structures are stored in an array of size 251, semtable, whose subscripts are modulated by the operations of incoming ADDR addresses.

Semaphore acquisition

semacquire1(addr *uint32, lifo bool, profile semaProfileFlags)

Semacquire1: if the value of semaphore addr is unsigned, let the semaphore -1 end. If = 0, the current goroutine is added to the corresponding Goroutine WaitingList, and gopark is called to block the current goroutine.

Execution flow chart:

Semaphore release

semrelease1(addr *uint32, handoff bool)

Semrelease1 sets the semaphore to +1 and then looks to see if the semaphore’s associated Goroutine WaitingList has a Goroutine. If not, end. If so, call goReady to wake up one of the Goroutines.

Execution flow chart:

Sync /mutex and sync/rwmutex will be used in sync/rwmutex.

04. Sync/atomic

Atomic operations ensure that multiple cpus (coroutines) operate atomically on the same memory area.

Specific implementation for each CPU architecture on the implementation of the assembler, roughly as follows:

Let’s pick the x86 64-bit architecture of ASM_amd64. s, because the code is more, let’s take an example.

In sync/mutex.go, we use the atomic operation code snippet as follows:

Func (m *Mutex) Lock() {// Fast path: grab unlocked mutex. if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) { if race.Enabled { race.Acquire(unsafe.Pointer(m)) } return } ......Copy the code

Here called atomic.Com pareAndSwapInt32 (addr * int32, old and new int32), it is a CAS operations, to ensure atomicity, golang is implemented through the assembly instructions. The corresponding assembly code is as follows:

// Golang assembler is based on the Plan9 assembler, which looks like a mess (not AT&T or Intel assembler). The SB pseudo-register can be thought of as an address of memory, so the symbol foo(SB) is a memory address represented by the name foo. This form is commonly used to name global functions and data. // FP: Frame Pointer On 64-bit machines, 0(FP) represents the first argument to the function; 8(FP) represents the second parameter, etc.; // NOSPLIT: the function returns a value in the EAX register (AX in Plan9) Have NOSPLIT, TEXT ·CompareAndSwapInt32(SB),NOSPLIT,$0-17 JMP ·CompareAndSwapUint32(SB) TEXT ·CompareAndSwapUint32(SB),NOSPLIT,$0-17 MOVQ addr+0(FP), BP MOVL old+8(FP), AX // Insert AX MOVL new+12(FP), CX // insert CX LOCK // insert CX LOCK, CMPXCHGL CX, 0(BP) // CMPXCHGL, which compares the contents of AX to the contents of the second operand, if equal, If (*addr == old); if(*addr == old); then *addr = new SETEQ swapped+16(FP) RETCopy the code

This is the basic set of atomic operation assembler implementations in Golang. Other instructions are not repeated and can be divided into the following categories of operations:

The CAS operation. It’s more optimistic than mutexes, compare-and-swap, it might not work.
Swap operations. So instead of doing this, you just swap.
Add and subtract operations. Atomic addition or subtraction of a number
Read and write operations. To prevent reading a variable while it is being written, our read and write operations also need to be atomic.

05. Sync/mutex

The purpose of mutex is not covered. The code for golang mutex is in sync/mutex.go.

Mutex data structures:

// A Mutex must not be copied after first use. type Mutex struct {state int32 an integer, Sema uint32 // IOTA, a special constant, can be considered as a constant that can be modified by the compiler. // Each occurrence of the const keyword is reset to 0, and each occurrence of the ioTA is automatically incremented by 1 until the next occurrence of the const. Const (mutexLocked = 1 << ioTA // value 1, indicating that the lock is available,0 is available, 1 is not available mutexWoken // value 2, indicating that the lock is second in state, meaning: MutexWaiterShift = iota // The offset in state to count the number of goroutine blocks on the mutex.Copy the code

Is it confusing to see state? Above:

State Specifies the function of each bit

Here is the code flow:

Func (m *Mutex) Lock(

In simple terms, if the current Goroutine can be locked, then the atomic operation is called to make the flag in mutex set to occupied and mutually exclusive. If the current goroutine finds that the lock is occupied, a conditional loop will attempt to acquire the lock. Instead of using the semaphore to perform sleep and wake operations on the goroutine (as much as possible to avoid overhead), if the loop fails, an atomic race will be called. Otherwise, runtime_Semacquire must be called to determine whether to block goroutine or to continue contention for the lock.

Func (m *Mutex) Unlock() — Unlock

Simply put, set the lock flag in state to 0, release the lock, and then release the blocked Goroutine to scramble for the lock as appropriate.

06. Sync/RWMutex (read/write lock)

RWMutex is a read-write lock that can be applied to multiple read locks or a write lock. It is often used in situations where the number of reads is much greater than the number of writes (Mutex makes no difference if the reads and writes are equal). However, when a read lock is added, the read lock can only be continued. When there is a write lock, no other lock can be loaded. That is, only read-reads are shared; the others are mutually exclusive.

The RWMutex data structure is as follows:

type RWMutex struct { w Mutex // held if there are pending writers writerSem uint32 // semaphore for writers to wait for  completing readers readerSem uint32 // semaphore for readers to wait for completing writers readerCount int32 // number  of pending readers readerWait int32 // number of departing readers }Copy the code

Read/write locks, it seems, are based on mutex and semaphore.

Read and write lock code, after removing the race detection code, relatively simple, directly on (temporarily uncommented, test children)

Rlock — Read lock

Func (rw *RWMutex) RLock() {// readerCount is less than 0. Looking down, writer's Lock() subtracts readerCount (atomic operation) to indicate that writer is present. ReaderSem > 0, -1, no operation; = 0, Goroutine if atomy.addint32 (&rw.readerCount, 1) < 0 {// A writer is pending, wait for it. runtime_Semacquire(&rw.readerSem) } }Copy the code

RUnlock — Read release

func (rw *RWMutex) RUnlock() {
	if r := atomic.AddInt32(&rw.readerCount, -1); r < 0 {
		if r+1 == 0 || r+1 == -rwmutexMaxReaders {
			race.Enable()
			throw("sync: RUnlock of unlocked RWMutex")
		}
		// A writer is pending.
		if atomic.AddInt32(&rw.readerWait, -1) == 0 {
			// The last reader unblocks the writer.
			runtime_Semrelease(&rw.writerSem, false)
		}
	}
}
Copy the code

Lock — Write Lock

func (rw *RWMutex) Lock() { // First, resolve competition with other writers. rw.w.Lock() // Announce to readers there is a pending writer. r := atomic.AddInt32(&rw.readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders // Wait for active readers. if r ! = 0 && atomic.AddInt32(&rw.readerWait, r) ! = 0 { runtime_Semacquire(&rw.writerSem) } }Copy the code

Unlock — Write and release

func (rw *RWMutex) Unlock() {
	// Announce to readers there is no active writer.
	r := atomic.AddInt32(&rw.readerCount, rwmutexMaxReaders)
	if r >= rwmutexMaxReaders {
		race.Enable()
		throw("sync: Unlock of unlocked RWMutex")
	}
	// Unblock blocked readers, if any.
	for i := 0; i < int(r); i++ {
		runtime_Semrelease(&rw.readerSem, false)
	}
	// Allow other writers to proceed.
	rw.w.Unlock()
}
Copy the code

7. Afterword.

By learning the implementation of locking in Golang from the bottom up, I was able to have a deeper understanding of locking and multitasking synchronization. If there are any shortcomings, please correct them. I will add new understanding to this article later

As an aside, in the previous picture, this is my hard learning environment…

directory

00. What is a lock?

01. Locks in computers

02. Golang lock summary

03. Runtime/Sema

04. Sync/atomic

05. Sync/mutex

06. Sync/RWMutex (read/write lock)

7. Afterword.

Related Posts

Springboot + WebSocket, one is enough

Linux command learning magic! The command does not understand directly to explain to you!

Kubernetes architecture Overview | Pure White