Go is built around concurrency, which inevitably leads to contention for shared resources. To guarantee mutually exclusive access to critical resources, Go provides sync.Mutex, a mutual-exclusion lock.

Since this lock is used so often, a look at its internal implementation will give you a sense of which scenarios it suits and what its characteristics are.

Preliminaries

When reading the sync.Mutex code, keep in mind that multiple goroutines may be requesting the lock at the same time, so the lock state can change at any moment.

The nature of the lock

Conclusion first: sync.Mutex is a fair lock.

In the source code, there is a comment:

// Mutex fairness.
//
// Mutex can be in 2 modes of operations: normal and starvation.
// In normal mode waiters are queued in FIFO order, but a woken up waiter
// does not own the mutex and competes with new arriving goroutines over
// the ownership. New arriving goroutines have an advantage -- they are
// already running on CPU and there can be lots of them, so a woken up
// waiter has good chances of losing. In such case it is queued at front
// of the wait queue. If a waiter fails to acquire the mutex for more than 1ms,
// it switches mutex to the starvation mode.
//
// In starvation mode ownership of the mutex is directly handed off from
// the unlocking goroutine to the waiter at the front of the queue.
// New arriving goroutines don't try to acquire the mutex even if it appears
// to be unlocked, and don't try to spin. Instead they queue themselves at
// the tail of the wait queue.
//
// If a waiter receives ownership of the mutex and sees that either
// (1) it is the last waiter in the queue, or (2) it waited for less than 1 ms,
// it switches mutex back to normal operation mode.
//
// Normal mode has considerably better performance as a goroutine can acquire
// a mutex several times in a row even if there are blocked waiters.
// Starvation mode is important to prevent pathological cases of tail latency.

Understanding this comment goes a long way toward understanding the mutex: it spells out the lock's design. A rough translation:

// Mutex fairness.
//
// The lock has two modes: normal mode and starvation mode.
// In normal mode, goroutines waiting for the lock are queued first-in,
// first-out, but a woken waiter does not own the lock: it still has to
// compete with newly arriving goroutines. That competition is unfair,
// because a new arrival has an advantage -- it is already running on a CPU,
// and there may be many of them -- so a woken goroutine has only a small
// chance of getting the lock. In that case it is put back at the head of
// the queue. If a waiting goroutine fails to acquire the lock for more
// than 1ms (hard-coded), the lock switches to starvation mode.
//
// In starvation mode, ownership of the lock is handed directly from the
// unlocking goroutine to the goroutine at the head of the queue.
// Newly arriving goroutines do not try to acquire the lock even if it
// appears free, and do not spin; they simply join the tail of the queue.
//
// When a goroutine that has just acquired the lock sees that either:
// 1. it was the last goroutine in the queue, or
// 2. it waited for less than 1ms,
// it switches the lock back to normal mode.
//
// Normal mode performs considerably better, because a goroutine can
// acquire the lock several times in a row even while other goroutines are
// blocked waiting. Starvation mode matters because it prevents
// pathological cases of tail latency.

Before we actually look at the source, one thing must be clear: at the moment a goroutine tries to acquire the lock, there may be no competitors, or there may be many. So we have to reason, from each goroutine's perspective, about the difference between the lock state that goroutine last observed, the lock's actual current state, and the state the goroutine wants to install.

Field definition

sync.Mutex contains only two fields:

// A Mutex is a mutual exclusion lock.
// The zero value for a Mutex is an unlocked mutex.
//
// A Mutex must not be copied after first use.
type Mutex struct {
	state int32
	sema  uint32
}

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken
	mutexStarving
	mutexWaiterShift = iota

	starvationThresholdNs = 1e6
)

state is a composite field. Bit 0 (value 1) indicates that the lock is held by a goroutine; bit 1 (value 2) indicates that a goroutine has been woken up to try to acquire the lock; bit 2 (value 4) marks whether the lock is in starvation mode. The remaining bits (bit 3 and up) store the number of goroutines waiting for the lock.

The sema field is the semaphore used to block and wake goroutines.
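To make the bit layout concrete, here is a small sketch of mine (not runtime code) that re-declares the same constants and decodes a sample state value:

```go
package main

import "fmt"

// The same constants as in sync/mutex.go, re-declared here for illustration.
const (
	mutexLocked      = 1 << iota // bit 0: the mutex is held
	mutexWoken                   // bit 1: a goroutine has been woken
	mutexStarving                // bit 2: the mutex is in starvation mode
	mutexWaiterShift = iota      // the waiter count lives in bits 3 and up
)

// decode pretty-prints the flags and waiter count packed into state.
func decode(state int32) string {
	return fmt.Sprintf("locked=%v woken=%v starving=%v waiters=%d",
		state&mutexLocked != 0,
		state&mutexWoken != 0,
		state&mutexStarving != 0,
		state>>mutexWaiterShift)
}

func main() {
	// A lock that is held, with two goroutines queued behind it:
	state := int32(mutexLocked | 2<<mutexWaiterShift)
	fmt.Println(decode(state)) // locked=true woken=false starving=false waiters=2
}
```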

Lock

Before we look at the code, we also need to keep in mind that each goroutine has its own state, held in local variables (that is, on the function's stack): a goroutine can be newly arrived, woken, normal, or starving.

atomic.CAS

Let's start with the most basic line of locking code, the CAS operation:

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
	// Fast path: grab unlocked mutex.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		if race.Enabled {
			race.Acquire(unsafe.Pointer(m))
		}
		return
	}
	...
}

This first piece of code calls the CompareAndSwapInt32 function from the atomic package to try to acquire the lock quickly. Its signature is as follows:

// CompareAndSwapInt32 executes the compare-and-swap operation for an int32 value.
func CompareAndSwapInt32(addr *int32, old, new int32) (swapped bool)

If the value stored at addr equals old, write new to addr and return true; otherwise do nothing and return false. Since it belongs to the atomic package, the whole compare-and-swap is guaranteed to be atomic.

Let's look at the concrete implementation of Cas (src/runtime/internal/atomic/asm_amd64.s):

// bool Cas(int32 *val, int32 old, int32 new)
// Atomically:
//	if(*val == old){
//		*val = new;
//		return 1;
//	} else
//		return 0;
// The arguments plus the return value add up to 17 bytes: on amd64 a
// pointer is 8 bytes, each int32 is 4 bytes, and the bool return value
// takes 1 byte.
TEXT runtime/internal/atomic·Cas(SB),NOSPLIT,$0-17
	MOVQ	ptr+0(FP), BX
	// AX has a special role here: CMPXCHGL below implicitly compares
	// against AX
	MOVL	old+8(FP), AX
	MOVL	new+12(FP), CX
	LOCK
	// Compare AX with the second operand, 0(BX), i.e. the value at the
	// address held in the BX register; if they are equal, store CX there
	// and set the zero flag
	CMPXCHGL	CX, 0(BX)
	// If the zero flag is 1, set the operand to 1, otherwise to 0; that is,
	// return true if the comparison was equal, false otherwise.
	// ret+16(FP) is the address of the return value.
	SETEQ	ret+16(FP)
	RET

It doesn't matter if you can't follow every line of the assembly, as long as you know what this function does and that it is atomic.

So the fast path simply checks whether the lock is free, and if so, atomically flips state to held. How neat (even if the assembly isn't…)!

Main process

Now let's look at the code for the main flow. It contains some bit operations that can be a little dizzying, so I'll annotate it with pseudocode where it helps.

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
	// Fast path: grab unlocked mutex.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		if race.Enabled {
			race.Acquire(unsafe.Pointer(m))
		}
		return
	}

	// When the current goroutine started waiting
	var waitStartTime int64
	// Whether the current goroutine is starving
	starving := false
	// Whether the current goroutine has been woken up
	awoke := false
	// How many times the current goroutine has spun
	// (imagine a goroutine spinning 2147483648 times...)
	iter := 0
	// Snapshot of the current lock state
	old := m.state
	// Spin loop
	for {
		// Don't spin if the lock is starving: the lock is handed directly
		// to the goroutine at the head of the queue, so we couldn't get it
		// anyway. Spin only if the lock is held and the spin conditions are
		// met (see canSpin below).
		// Pseudocode: if isLocked() and isNotStarving() and canSpin()
		if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
			// Try to set the woken flag, so that Unlock does not wake
			// any other blocked goroutine.
			if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
				atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
				awoke = true
			}
			// Spin (analyzed later in this article)
			runtime_doSpin()
			iter++
			// Refresh the lock state (other goroutines may have changed it
			// while we were spinning)
			old = m.state
			continue
		}

		// Reaching this point, the lock may be in one of four states:
		// 1. held + starving
		// 2. held + normal
		// 3. free + starving
		// 4. free + normal
		// And the current goroutine may or may not have been woken up.

		// Copy the current state, so that we can build the desired state
		// from it and store it in new.
		new := old
		// If the lock is not starving, set the desired state to held
		// (i.e. try to take the lock).
		// Pseudocode: if isNotStarving() { newState |= locked }
		if old&mutexStarving == 0 {
			new |= mutexLocked
		}
		// If the lock is held or starving, add one to the wait queue in the
		// desired state (effectively new += 8).
		// (Could there really be 300 million goroutines waiting for one lock...)
		if old&(mutexLocked|mutexStarving) != 0 {
			new += 1 << mutexWaiterShift
		}
		// If the current goroutine is starving and the lock is held by
		// another goroutine, mark the desired state as starving.
		// If the lock is free, don't switch: Unlock expects a starving
		// lock to have waiters, not just one goroutine.
		if starving && old&mutexLocked != 0 {
			new |= mutexStarving
		}
		// If the current goroutine was woken up, we need to reset the flag
		// either way: this goroutine will either take the lock or go back
		// to sleep.
		if awoke {
			// If the woken flag isn't set, something is wrong.
			// (The wake-up logic is in Unlock below.)
			if new&mutexWoken == 0 {
				throw("sync: inconsistent mutex state")
			}
			// Clear the woken flag
			new &^= mutexWoken
		}
		// Atomically swap in the desired state. Note that the CAS may only
		// be setting the starving flag and the waiter count, not taking
		// the lock itself.
		if atomic.CompareAndSwapInt32(&m.state, old, new) {
			// If the old state was neither starving nor held, the current
			// goroutine has successfully acquired the lock via CAS
			// (the state went from free to held).
			if old&(mutexLocked|mutexStarving) == 0 {
				break // locked the mutex with CAS
			}
			// If we had already been waiting before, queue at the front.
			queueLifo := waitStartTime != 0
			if waitStartTime == 0 {
				waitStartTime = runtime_nanotime()
			}
			// Block the current goroutine on the semaphore:
			// a new goroutine goes to the tail of the queue,
			// a woken goroutine that was already waiting goes to the head.
			runtime_SemacquireMutex(&m.sema, queueLifo)
			// Woken up here. If the current goroutine was already starving,
			// or it has now waited more than 1ms (the constant defined
			// above), mark it as starving.
			starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
			old = m.state
			// If the lock is in starvation mode, the current goroutine was
			// woken by the semaphore handoff: the lock was given to us
			// directly.
			if old&mutexStarving != 0 {
				// If the lock is marked woken or held, or the wait queue is
				// empty, the state is impossible: we were in the queue, the
				// lock must have been released, and nothing else woken.
				if old&(mutexLocked|mutexWoken) != 0 || old>>mutexWaiterShift == 0 {
					throw("sync: inconsistent mutex state")
				}
				// The current goroutine takes the lock and leaves the queue.
				delta := int32(mutexLocked - 1<<mutexWaiterShift)
				// If this goroutine is not starving, or it is the last
				// goroutine in the queue, exit starvation mode and return
				// the lock to normal mode.
				if !starving || old>>mutexWaiterShift == 1 {
					// Exit starvation mode.
					// Critical to do it here and consider wait time.
					// Starvation mode is so inefficient, that two goroutines
					// can go lock-step infinitely once they switch mutex
					// to starvation mode.
					delta -= mutexStarving
				}
				// Apply the change atomically
				atomic.AddInt32(&m.state, delta)
				break
			}
			// The lock is not starving: mark the current goroutine as woken
			// and reset the spin counter.
			awoke = true
			iter = 0
		} else {
			// The CAS did not succeed, i.e. we did not get the lock:
			// refresh the state and loop again.
			old = m.state
		}
	}

	if race.Enabled {
		race.Acquire(unsafe.Pointer(m))
	}
}

Why can the CAS above acquire the lock? Because CAS atomically checks whether old still matches the lock's current state, and there is always some goroutine for which that check succeeds.
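The handoff arithmetic in the starvation branch above is worth checking by hand. A sketch of mine that re-declares the constants and replays the atomic.AddInt32 step:

```go
package main

import "fmt"

// The same constants as in sync/mutex.go, re-declared for illustration.
const (
	mutexLocked      = 1 << iota // bit 0: the mutex is held
	mutexWoken                   // bit 1: a goroutine has been woken
	mutexStarving                // bit 2: starvation mode
	mutexWaiterShift = iota      // waiter count starts at bit 3
)

func main() {
	// Handoff state: starving, locked bit clear, 2 waiters
	// (the woken goroutine is still counted as one of them).
	state := int32(mutexStarving | 2<<mutexWaiterShift)

	// The woken waiter sets mutexLocked and removes itself from the
	// waiter count in a single atomic add, exactly as Lock does.
	delta := int32(mutexLocked - 1<<mutexWaiterShift)
	state += delta

	fmt.Printf("locked=%v starving=%v waiters=%d\n",
		state&mutexLocked != 0, state&mutexStarving != 0, state>>mutexWaiterShift)
	// locked=true starving=true waiters=1
}
```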

canSpin

Let’s take a look at the canSpin condition above:

// Active spinning for sync.Mutex.
//go:linkname sync_runtime_canSpin sync.runtime_canSpin
//go:nosplit
func sync_runtime_canSpin(i int) bool {
	// active_spin is a constant equal to 4, so spinning (and the CPU it
	// burns) is kept short. Spin only if all of the following hold:
	// 1. the goroutine has spun fewer than active_spin (4) times;
	// 2. we are on a multicore machine;
	// 3. GOMAXPROCS > 1 and there is at least one other running P;
	// 4. the current P has no other G waiting to run.
	if i >= active_spin || ncpu <= 1 || gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1 {
		return false
	}
	if p := getg().m.p.ptr(); !runqempty(p) {
		return false
	}
	return true
}

So a goroutine does not spin indefinitely: after four spins, or as soon as any other condition fails, it falls back to blocking on the semaphore.

doSpin

Next, let's take a look at the implementation of doSpin (there isn't much to it):

//go:linkname sync_runtime_doSpin sync.runtime_doSpin
//go:nosplit
func sync_runtime_doSpin() {
	procyield(active_spin_cnt)
}

procyield is implemented in assembly; here is a brief look at the amd64 version:

TEXT runtime·procyield(SB),NOSPLIT,$0-0
	MOVL	cycles+0(FP), AX
again:
	PAUSE
	SUBL	$1, AX
	JNZ	again
	RET

There isn't much to see: it simply executes the PAUSE instruction cycles times, so we can move on.

Unlock

Now let’s look at the implementation of Unlock. There are two key features of Unlock:

  1. If the lock is not in locked state, Unlock causes panic.
  2. There is no ownership relationship between a lock and a goroutine, so you can Lock in goroutine 1 and Unlock in goroutine 2. (Although this is not recommended…)
func (m *Mutex) Unlock() {
	if race.Enabled {
		_ = m.state
		race.Release(unsafe.Pointer(m))
	}

	// Fast path: drop lock bit.
	// Subtract the locked bit from the state (i.e. unlock) and call the
	// result new, the expected state.
	new := atomic.AddInt32(&m.state, -mutexLocked)
	// If adding the locked bit back does not give a locked state, the lock
	// was never held: panic.
	// (A question for the reader: why go to the trouble of subtracting and
	// then adding back, instead of just checking whether the original
	// state was locked?)
	if (new+mutexLocked)&mutexLocked == 0 {
		throw("sync: unlock of unlocked mutex")
	}
	// Normal mode
	if new&mutexStarving == 0 {
		old := new
		for {
			// If there is no goroutine waiting for the lock,
			// or the lock was re-acquired (another goroutine took it during
			// the loop), or a goroutine has already been woken (no need to
			// wake another), or the lock is starving (it is handed directly
			// to the goroutine at the head of the queue),
			// then return: there is nothing to do.
			if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken|mutexStarving) != 0 {
				return
			}
			// At this point the lock is still free, no goroutine has been
			// woken, and there are goroutines waiting: grab the right to
			// wake one up. New state: woken flag set, waiter count - 1.
			new = (old - 1<<mutexWaiterShift) | mutexWoken
			// The familiar CAS
			if atomic.CompareAndSwapInt32(&m.state, old, new) {
				// State set successfully: wake a goroutine via the semaphore
				runtime_Semrelease(&m.sema, false)
				return
			}
			// The CAS failed: refresh the state, because other goroutines
			// may have changed it during this iteration.
			old = m.state
		}
	} else {
		// Starvation mode: hand the lock directly to the goroutine at the
		// head of the queue (handoff = true).
		// Note: the locked bit is not set here; the woken goroutine sets it
		// itself after it wakes up. But the lock still counts as held while
		// mutexStarving is set (which Lock checks), so newly arriving
		// goroutines won't try to acquire it.
		runtime_Semrelease(&m.sema, true)
	}
}

Conclusion

As the code above shows, sync.Mutex is a cheap lock when the critical section is short, such as an assignment to a key variable. But if operations on the critical resource take a long time (especially if a single operation exceeds 1ms), the lock can keep falling into starvation mode and performance suffers; this is where the "lock is always starving" problem we often see comes from. In that case, we may need to find another solution.
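One common mitigation is to hold the lock only long enough to copy or swap the data, and do the slow work outside the critical section. A sketch of mine under that assumption (the store type and names are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type store struct {
	mu   sync.Mutex
	data map[string]int
}

// slowProcessSnapshot copies the map under the lock (cheap), then does the
// expensive work on the copy with the lock released, so other goroutines
// are never blocked for longer than the copy takes.
func (s *store) slowProcessSnapshot() int {
	s.mu.Lock()
	snapshot := make(map[string]int, len(s.data))
	for k, v := range s.data {
		snapshot[k] = v
	}
	s.mu.Unlock() // lock held only for the copy, well under 1ms

	time.Sleep(5 * time.Millisecond) // the ">1ms" work happens lock-free
	sum := 0
	for _, v := range snapshot {
		sum += v
	}
	return sum
}

func main() {
	s := &store{data: map[string]int{"a": 1, "b": 2}}
	fmt.Println(s.slowProcessSnapshot()) // 3
}
```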

Well, that's the end of the sync.Mutex analysis. Although it's only about 200 lines of code (including roughly 150 lines of comments, so perhaps 50 lines of actual code), the algorithm, the design ideas, and the programming philosophy behind it are all worth pondering.