Previously on

In the last article we introduced the Mutex lock and walked through its source code. When the mutex is in normal mode, a goroutine whose lock request loses the race to a spinning goroutine is placed in the wait queue — at the head if it was just awakened, at the tail if it is newly arriving — to wait to be woken. When the mutex is in starvation mode, a goroutine that finds no lock resource available is placed directly at the tail of the queue.

// Lock (slow path)
func (m *Mutex) lockSlow() {
	// ... if the CAS race fails, use the runtime_SemacquireMutex semaphore
	// to guarantee that no two goroutines acquire the lock at once.
	// Since the lock is not available, block this goroutine with the sleep primitive.
	// queueLifo = false: join the tail of the waiting queue
	// queueLifo = true:  join the head of the waiting queue
	runtime_SemacquireMutex(&m.sema, queueLifo, 1)
	// ...
}

Second, understanding the underlying source code

This is equivalent to the operating system's P operation: it decrements the semaphore, blocking until a unit of the resource is available.

// SemacquireMutex is like Semacquire, but for profiling contended Mutexes.
// If lifo is true, queue waiter at the head of wait queue.
// skipframes is the number of frames to omit during tracing, counting from
// runtime_SemacquireMutex's caller.
func runtime_SemacquireMutex(s *uint32, lifo bool, skipframes int)
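To make the P/V analogy concrete, here is a toy counting semaphore built on a buffered channel. All names here are illustrative; the runtime's real implementation parks goroutines in a treap rather than using a channel:

```go
package main

import "fmt"

// Sema is a toy counting semaphore: each buffered element is one
// free unit of the resource.
type Sema chan struct{}

// NewSema creates a semaphore with n free units.
func NewSema(n int) Sema {
	s := make(Sema, n)
	for i := 0; i < n; i++ {
		s <- struct{}{}
	}
	return s
}

// P (acquire): blocks until a unit of the resource is available.
func (s Sema) P() { <-s }

// V (release): returns a unit, waking one blocked waiter if any.
func (s Sema) V() { s <- struct{}{} }

func main() {
	s := NewSema(1)
	s.P()
	fmt.Println("acquired")
	s.V()
	fmt.Println("released")
}
```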

The underlying call is to semacquire1, which is the main processing routine. It does the following: obtain a sudog and a semaRoot. The sudog is the wrapper object for a g placed in the wait queue; it carries the g plus some bookkeeping fields. The semaRoot is the queue structure, internally a balanced binary tree (a treap). semacquire1 puts the sudog associated with the current g into the semaRoot, changes the state of g to waiting, and calls the scheduler to run another g, at which point the current g stops executing. When the scheduler later reschedules it, the sudog is released first and then the remaining logic runs.

//go:linkname sync_runtime_SemacquireMutex sync.runtime_SemacquireMutex
func sync_runtime_SemacquireMutex(addr *uint32, lifo bool, skipframes int) {
	semacquire1(addr, lifo, semaBlockProfile|semaMutexProfile, skipframes)
}
1. sudog

sudog is the runtime's high-level abstraction for a blocked goroutine, and one of the main mechanisms used to implement user-mode semaphores. For example, when a goroutine blocks waiting on a channel, the sudog records both the goroutine and the location it uses to wait for data. Blocked sudogs are organized into a treap (tree + heap).

type sudog struct {
	// The goroutine this sudog wraps
	g *g

	// isSelect indicates g is participating in a select
	isSelect bool
	// Next sudog in the wait list
	next     *sudog
	// Previous sudog in the wait list
	prev     *sudog
	// Data element
	elem     unsafe.Pointer // data element (may point to stack)

	// For channels, waitlink is only accessed by g.
	// For semaphores, all fields (including the ones above)
	// are only accessed when holding a semaRoot lock.
	acquiretime int64
	releasetime int64
	ticket      uint32 // random priority used to keep the treap balanced
	parent      *sudog // semaRoot binary tree
	waitlink    *sudog // g.waiting list or semaRoot
	waittail    *sudog // semaRoot
	c           *hchan // channel
}
2. semaRoot

A semaRoot holds a tree heap of sudogs at different addresses. Each sudog may in turn point to a list of sudogs waiting at the same address

type semaRoot struct {
	lock  mutex
	treap *sudog // root of the balanced treap
	nwait uint32 // number of waiters; read without the lock
}
3. Treap

A treap is a balanced tree. Compared to an ordinary binary search tree, each node gains an extra attribute: a priority (yes, similar to the priority in a priority queue). Each node in a treap satisfies the binary-search-tree property on its key and the heap property on its priority: a node's priority is less than the priorities of all its children, so the root carries the minimum priority. In other words, a treap is a binary search tree with respect to keys and a heap with respect to priorities; hence Treap = Tree + Heap. An ordinary BST can become unbalanced because ordered input degrades the search path into a chain, whereas random input degrades only with tiny probability. By generating each node's priority randomly, a treap's structure tends to stay balanced.

(Figure: the black numbers are the priorities; after rotation the keys 3, 7, 9, 11, 16 still satisfy the original search-tree ordering, while rotating by priority yields a min-heap.)

In sudog, the ticket field serves as the priority, and rotations are performed according to ticket.

s.ticket = fastrand() | 1 // randomly generated; |1 guarantees the ticket is nonzero
4. semacquire1: the specific process
  • Get a sudog and fill in the goroutine's basic information.
  • Get the semaRoot (the root of the treap) for this address, insert the sudog into the tree, and rotate to restore balance.
  • Set the current goroutine to the waiting state, unbind it from the current M, and let M execute other goroutines.
func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int) {
	gp := getg()
	if gp != gp.m.curg {
		throw("semacquire not on the G stack")
	}

	// Easy case: try to CAS the semaphore directly; on success, just return.
	if cansemacquire(addr) {
		return
	}

	// The semaphore was not captured.
	// Harder case:
	//	increment waiter count
	//	try cansemacquire one more time, return if succeeded
	//	enqueue itself as a waiter
	//	sleep
	//	(waiter descriptor is dequeued by signaler)
	// Get a sudog
	s := acquireSudog()
	// Get a semaRoot
	root := semroot(addr)
	t0 := int64(0)
	s.releasetime = 0
	s.acquiretime = 0
	s.ticket = 0
	if profile&semaBlockProfile != 0 && blockprofilerate > 0 {
		t0 = cputicks()
		s.releasetime = -1
	}
	if profile&semaMutexProfile != 0 && mutexprofilerate > 0 {
		if t0 == 0 {
			t0 = cputicks()
		}
		s.acquiretime = t0
	}
	// Block
	for {
		lock(&root.lock)
		// nwait+1
		atomic.Xadd(&root.nwait, 1)
		// Check cansemacquire to avoid missed wakeup.
		if cansemacquire(addr) {
			atomic.Xadd(&root.nwait, -1)
			unlock(&root.lock)
			break
		}
		// Any semrelease after the cansemacquire knows we're waiting
		// (we set nwait above), so go to sleep.
		// Add to the semaRoot treap
		root.queue(addr, s, lifo)
		// Unlock, change the current g's state to waiting, then let the
		// current m run other goroutines; g parks here until woken.
		goparkunlock(&root.lock, waitReasonSemacquire, traceEvGoBlockSync, 4+skipframes)
		if s.ticket != 0 || cansemacquire(addr) {
			break
		}
	}
	if s.releasetime > 0 {
		blockevent(s.releasetime-t0, 3+skipframes)
	}
	releaseSudog(s)
}

Let’s look at some of these interfaces.

5. How to get a sudog

Two pools are involved: a global cache pool and a per-P cache pool. Allocation comes from the per-P pool first, and sudogs are returned to it after use. The strategy is:

  1. Take from the per-P cache first. If the per-P cache is empty, refill it to half capacity from the global pool.
  2. Return to the per-P cache first; if the per-P cache is full, move half of it back to the global pool.
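The two-level caching strategy above can be sketched as follows. The `pools` type and its fields are hypothetical stand-ins for `pp.sudogcache` (per-P slice) and `sched.sudogcache` (global intrusive linked list); locking is omitted for brevity:

```go
package main

import "fmt"

// item stands in for a sudog.
type item struct{ next *item }

// pools models the per-P cache plus the global free list.
type pools struct {
	local  []*item // per-P cache (capacity-bounded slice)
	global *item   // global cache (intrusive linked list)
}

const localCap = 4 // illustrative per-P capacity

// acquire takes from the local cache, refilling it to half capacity
// from the global list when empty, and allocates fresh only if both
// caches are empty — mirroring acquireSudog.
func (p *pools) acquire() *item {
	if len(p.local) == 0 {
		for len(p.local) < localCap/2 && p.global != nil {
			it := p.global
			p.global = it.next
			it.next = nil
			p.local = append(p.local, it)
		}
		if len(p.local) == 0 {
			p.local = append(p.local, new(item))
		}
	}
	n := len(p.local)
	it := p.local[n-1]
	p.local = p.local[:n-1]
	return it
}

// release returns to the local cache first, spilling half of it to
// the global list when the local cache is full.
func (p *pools) release(it *item) {
	if len(p.local) == localCap {
		for len(p.local) > localCap/2 {
			n := len(p.local)
			spill := p.local[n-1]
			p.local = p.local[:n-1]
			spill.next = p.global
			p.global = spill
		}
	}
	p.local = append(p.local, it)
}

func main() {
	var p pools
	a := p.acquire() // both caches empty: allocates a fresh item
	p.release(a)     // goes back into the local cache
	fmt.Println(len(p.local)) // 1
}
```

The point of the design is that the common path touches only the per-P slice (no lock contention), while the global list amortizes imbalance between Ps in half-capacity batches.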
func acquireSudog() *sudog {
	mp := acquirem() // get the m for the current g
	pp := mp.p.ptr() // get the p for the current g
	if len(pp.sudogcache) == 0 { // the per-p sudogcache pool has no reusable sudog
		lock(&sched.sudoglock)
		// Refill the per-p cache to half capacity from the global pool
		for len(pp.sudogcache) < cap(pp.sudogcache)/2 && sched.sudogcache != nil {
			s := sched.sudogcache // head of the global pool
			sched.sudogcache = s.next
			s.next = nil
			pp.sudogcache = append(pp.sudogcache, s) // add the sudog
		}
		unlock(&sched.sudoglock)
		// If the global pool was empty too, allocate a new one
		if len(pp.sudogcache) == 0 {
			pp.sudogcache = append(pp.sudogcache, new(sudog))
		}
	}
	n := len(pp.sudogcache)
	s := pp.sudogcache[n-1] // take one from the tail
	pp.sudogcache[n-1] = nil
	pp.sudogcache = pp.sudogcache[:n-1]
	if s.elem != nil {
		throw("acquireSudog: found s.elem != nil in cache")
	}
	releasem(mp) // m.locks--
	return s
}
6. Adding a sudog node to the semaRoot
func (root *semaRoot) queue(addr *uint32, s *sudog, lifo bool) {
	s.g = getg() // the sudog points to the current goroutine
	s.elem = unsafe.Pointer(addr)
	s.next = nil
	s.prev = nil

	var last *sudog
	pt := &root.treap
	for t := *pt; t != nil; t = *pt {
		if t.elem == unsafe.Pointer(addr) {
			// lifo is true: place s at the head of this address's wait list
			if lifo {
				// Substitute s in t's place in treap.
				*pt = s
				s.ticket = t.ticket
				s.acquiretime = t.acquiretime
				s.parent = t.parent
				s.prev = t.prev
				s.next = t.next
				if s.prev != nil {
					s.prev.parent = s
				}
				if s.next != nil {
					s.next.parent = s
				}
				// Add t first in s's wait list.
				s.waitlink = t
				s.waittail = t.waittail
				if s.waittail == nil {
					s.waittail = t
				}
				t.parent = nil
				t.prev = nil
				t.next = nil
				t.waittail = nil
			} else { // place s at the tail of the wait list
				// Add s to end of t's wait list.
				if t.waittail == nil {
					t.waitlink = s
				} else {
					t.waittail.waitlink = s
				}
				t.waittail = s
				s.waitlink = nil
			}
			return
		}
		last = t
		if uintptr(unsafe.Pointer(addr)) < uintptr(t.elem) {
			pt = &t.prev
		} else {
			pt = &t.next
		}
	}

	// New addr: insert s as a leaf of last.
	s.ticket = fastrand() | 1
	s.parent = last
	*pt = s

	// Rotate up using the random ticket value as the priority
	for s.parent != nil && s.parent.ticket > s.ticket {
		if s.parent.prev == s {
			root.rotateRight(s.parent)
		} else {
			if s.parent.next != s {
				panic("semaRoot queue")
			}
			root.rotateLeft(s.parent)
		}
	}
}

At this point the process of blocking to acquire a lock resource is complete: the current goroutine has been unbound from its M and parked in the semaRoot, effectively sitting in a pool of waiters until the resource is released.

Third, summary

We can see that the operations above are equivalent to the operating system's sleep-on-semaphore, but without falling into system calls and wasting CPU and other resources. Instead, in user mode, the runtime builds semaphores on top of the goroutine abstraction, using cache pools and hiding the internal scheduler. For example, when user code contends on a mutex, the goroutine running that code can be put to sleep, stored centrally, then woken and rescheduled when the lock becomes available.

Is this part of why Go performs so well?

Next: The process of releasing a resource

Four, references

  • Go Sema source code analysis

  • Synchronization primitives

  • Diagram and implementation of Treap