This article introduces the Go GC through source code analysis (based on Go 1.14)

I. Overview of GC

GC scans the locations where pointers are stored, and variables are allocated either on the stack or on the heap. As we learned in the previous section on Go memory management, the heap's bitmap uses 2 bits per pointer-size word to describe what is stored there and whether it contains a pointer. In addition, mcentral is divided into spans whose objects contain pointers (scan) and spans whose objects do not (noscan); this classification reduces marking work and improves marking efficiency. For stacks, pointer information is stored alongside the function, using 1 bit per pointer-size word of memory (located in stackmap.bytedata).
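
For reference, the scan/noscan split is encoded in the span class itself: the lowest bit of spanClass records whether objects in that span contain pointers. Lightly condensed from runtime/mheap.go (Go 1.14):

type spanClass uint8

func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	return spanClass(sizeclass<<1) | spanClass(bool2int(noscan))
}

// sizeclass recovers the size class from the high 7 bits
func (sc spanClass) sizeclass() int8 {
	return int8(sc >> 1)
}

// noscan reports whether objects in this span contain no pointers
func (sc spanClass) noscan() bool {
	return sc&1 != 0
}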

Go’s GC is a concurrent GC, meaning that most of the GC’s work runs at the same time as normal Go code, which is what complicates the Go GC process. The GC has four phases:

  • Sweep Termination: sweep any remaining unswept spans. A new GC cycle can start only after the previous cycle's sweeping has completed
  • Mark: scan all root objects and every object reachable from them, marking them so they will not be collected
  • Mark Termination: complete marking and rescan some of the root objects (requires STW)
  • Sweep: sweep the spans according to the mark results
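
In the code, these phases are tracked in the global gcphase variable; note that Sweep is not a separate gcphase value, since sweeping proceeds in the background while gcphase is _GCoff. From runtime/mgc.go:

const (
	_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
	_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
	_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
)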

GC is triggered when certain conditions are met, including the following:

  • gcTriggerHeap: triggered when the currently allocated heap reaches the trigger size computed by the controller (automatic)
  • gcTriggerTime: triggered when no GC has been executed for a certain amount of time (automatic)
  • gcTriggerCycle: starts a new round of GC unless one is already in progress, in which case it is skipped; triggering GC manually with runtime.GC() uses this condition (manual)
type gcTriggerKind int
const (
	// gcTriggerHeap indicates that a cycle should be started when
	// the heap size reaches the trigger heap size computed by the
	// controller.
	gcTriggerHeap gcTriggerKind = iota

	// gcTriggerTime indicates that a cycle should be started when
	// it's been more than forcegcperiod nanoseconds since the
	// previous GC cycle.
    // Temporarily set to 2 minutes
	gcTriggerTime

	// gcTriggerCycle indicates that a cycle should be started if
	// we have not yet started cycle number gcTrigger.n (relative
	// to work.cycles).
	gcTriggerCycle
)
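
Whether a given trigger actually fires is decided by gcTrigger.test, lightly condensed here from runtime/mgc.go (Go 1.14):

// test reports whether the trigger condition is satisfied, meaning
// that the _GCoff phase should transition to _GCmark.
func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		// Compare the live heap against the trigger size computed by the pacer
		return memstats.heap_live >= memstats.gc_trigger
	case gcTriggerTime:
		if gcpercent < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		return lastgc != 0 && t.now-lastgc > forcegcperiod
	case gcTriggerCycle:
		// t.n > work.cycles, but accounting for wraparound.
		return int32(t.n-work.cycles) > 0
	}
	return true
}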

II. Source Code Analysis

For the full walkthrough, see the article GC implementation principle (linked in the references below), which I believe is the most complete treatment available; you can follow it side by side with the source code.

On this basis, I would like to add some points:

1. What does STW do

At present, the whole GC process performs STW (Stop The World) twice: the first time at the beginning of the Mark phase, and the second at the Mark Termination phase. The first STW prepares the scan of the root objects and enables the write barrier and mutator assist. The second STW rescans some of the root objects and disables the write barrier and mutator assist. Note that not all root-object scans require STW; for example, scanning the stack of a goroutine only requires stopping that G. Since the hybrid write barrier was adopted (Go 1.8), the second STW has become significantly shorter.
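
The hybrid write barrier (proposal 17503) combines a Yuasa-style deletion barrier with a Dijkstra-style insertion barrier. The sketch below is the proposal's pseudocode written in Go style; shade and currentStackIsGrey are conceptual helpers, not runtime functions (the real barrier lives in the gcWriteBarrier assembly stub and its buffer-flush path, and in practice shades both the old and new values unconditionally):

// Conceptual sketch only, not the actual runtime implementation
func writePointer(slot *unsafe.Pointer, ptr unsafe.Pointer) {
	shade(*slot) // deletion barrier: grey the old target so it cannot be lost
	if currentStackIsGrey() {
		shade(ptr) // insertion barrier: grey the new target
	}
	*slot = ptr
}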

// Execute by g0
func stopTheWorldWithSema() {
	_g_ := getg()

	// If we hold a lock, then we won't be able to stop another M
	// that is blocked trying to acquire the lock.
	if _g_.m.locks > 0 {
		throw("stopTheWorld: holding locks")
	}

	lock(&sched.lock)
    // The number of P to stop
	sched.stopwait = gomaxprocs
    // Set the gcwaiting flag; scheduling loops will park when they see it
	atomic.Store(&sched.gcwaiting, 1)
	preemptall()
	// stop current P
	_g_.m.p.ptr().status = _Pgcstop 
    // Reduce the number of P stops (current P counts as one)
	sched.stopwait--
	// Preempt all Ps in _Psyscall state to prevent them from re-entering scheduling
	for _, p := range allp {
		s := p.status
		if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
			if trace.enabled {
				traceGoSysBlock(p)
				traceProcStop(p)
			}
			p.syscalltick++
			sched.stopwait--
		}
	}
	// Prevent all idle Ps from re-entering scheduling
	for {
		p := pidleget()
		if p == nil {
			break
		}
		p.status = _Pgcstop
		sched.stopwait--
	}
	wait := sched.stopwait > 0
	unlock(&sched.lock)

	// If there are still P's that need to stop, wait for them to stop
	if wait {
		for {
			// loop wait + preempt all running G
			if notetsleep(&sched.stopnote, 100*1000) {
				noteclear(&sched.stopnote)
				break
			}
			preemptall()
		}
	}

	// Check for logical correctness
	bad := ""
	if sched.stopwait != 0 {
		bad = "stopTheWorld: not stopped (stopwait != 0)"
	} else {
		for _, p := range allp {
			if p.status != _Pgcstop {
				bad = "stopTheWorld: not stopped (status != _Pgcstop)"
			}
		}
	}
	if atomic.Load(&freezing) != 0 {
		// Some other thread is panicking. This can cause the
		// sanity checks above to fail if the panic happens in
		// the signal handler on a stopped thread. Either way,
		// we should halt this thread.
		lock(&deadlock)
		lock(&deadlock)
	}
	if bad != "" {
		throw(bad)
	}
    // At this point all running G's become pending, and all P's cannot be retrieved by M
	// This means that all go code (except the current one) stops running and no new GO code can be run
}

2. The role of mutator assist (auxiliary GC)

Mutator assist exists to prevent the heap from growing too fast: if a concurrently running G allocates memory while the GC is executing, that G is required to assist the GC with part of its work. Goroutines that run concurrently with the GC are called "mutators", and "mutator assist" is the mechanism by which they help the GC do part of its work. There are two kinds of assist work: Mark and Sweep.
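
Mark assist is wired into the allocation path: mallocgc charges every allocation against the G's assist budget, and a G that falls into debt must repay it by doing marking work before it may allocate. Condensed from runtime/malloc.go (Go 1.14); sweep assist works analogously via deductSweepCredit when new spans are obtained:

var assistG *g
if gcBlackenEnabled != 0 {
	// Charge the current user G for this allocation
	assistG = getg()
	if assistG.m.curg != nil {
		assistG = assistG.m.curg
	}
	assistG.gcAssistBytes -= int64(size)

	if assistG.gcAssistBytes < 0 {
		// This G is in debt: assist the GC before allocating
		gcAssistAlloc(assistG)
	}
}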

3. Root object

The first things to be marked in the GC's mark phase are the "root objects"; every object reachable from a root is considered live. The roots include global variables, the variables on each G's stack, and so on. The GC scans the roots first, then all objects reachable from them.

func gcMarkRootPrepare() {
	work.nFlushCacheRoots = 0

	// Compute how many data and BSS root blocks there are.
	nBlocks := func(bytes uintptr) int {
		return int((bytes + rootBlockBytes - 1) / rootBlockBytes)
	}

	work.nDataRoots = 0
	work.nBSSRoots = 0

	// Scan globals.
	for _, datap := range activeModules() {
        // Scan the initialized global variables (data sections)
		nDataRoots := nBlocks(datap.edata - datap.data)
		if nDataRoots > work.nDataRoots {
			work.nDataRoots = nDataRoots
		}
	}

	for _, datap := range activeModules() {
        // Scan the uninitialized global variables (BSS sections)
		nBSSRoots := nBlocks(datap.ebss - datap.bss)
		if nBSSRoots > work.nBSSRoots {
			work.nBSSRoots = nBSSRoots
		}
	}

	// We're only interested in scanning the in-use spans,
	// which will all be swept at this point. More spans
	// may be added to this list during concurrent GC, but
	// we only care about spans that were allocated before
	// this mark phase.
    // Scan for special objects in each span
	work.nSpanRoots = mheap_.sweepSpans[mheap_.sweepgen/2%2].numBlocks()

	// Scan stacks.
	//
	// Gs may be created after this point, but it's okay that we
	// ignore them because they begin life without any roots, so
	// there's nothing to scan, and any roots they create during
	// the concurrent phase will be scanned during mark
	// termination.
    // Scan the stack of each G
	work.nStackRoots = int(atomic.Loaduintptr(&allglen))

	work.markrootNext = 0
	work.markrootJobs = uint32(fixedRootCount + work.nFlushCacheRoots + work.nDataRoots + work.nBSSRoots + work.nSpanRoots + work.nStackRoots)
}
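
The root jobs prepared above are consumed by the mark workers: each worker atomically claims the next job index until markrootJobs is exhausted, then moves on to draining grey objects from the work queues. Condensed from runtime.gcDrain (preemption checks omitted):

// Drain root marking jobs
if work.markrootNext < work.markrootJobs {
	for {
		job := atomic.Xadd(&work.markrootNext, +1) - 1
		if job >= work.markrootJobs {
			break
		}
		markroot(gcw, job)
	}
}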

4. How tri-color marking is reflected in the code

We have seen that since Go 1.8 the GC has used tri-color marking plus the hybrid write barrier.

Inside Go, objects do not store a color attribute; the three colors are just descriptions of their state:

  • White: the object's bit in the gcmarkBits of its span is 0
  • Grey: the object's bit in gcmarkBits is 1, and the object is on a mark (work) queue
  • Black: the object's bit in gcmarkBits is 1, and the object has been taken off the mark queue and scanned

After GC completes, gcmarkBits is moved to allocBits and gcmarkBits is replaced with a freshly allocated all-zero bitmap, so every black object becomes white again.
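
A condensed sketch of runtime.greyobject (signature simplified, debug checks omitted) shows these transitions in code: setting the mark bit turns a white object grey and queues it on a work buffer, and it becomes black once a worker dequeues and scans it; noscan objects have nothing to scan, so they are effectively black as soon as they are marked:

func greyobject(obj uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
	mbits := span.markBitsForIndex(objIndex)
	if mbits.isMarked() {
		return // already grey or black
	}
	mbits.setMarked() // white -> grey
	if span.spanclass.noscan() {
		return // no pointer fields: effectively black immediately
	}
	if !gcw.putFast(obj) {
		gcw.put(obj) // queue for scanning; black once scanned
	}
}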

5. What does sweeping do

// Sweep a single span
func (s *mspan) sweep(preserve bool) bool {
	_g_ := getg()
	if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
		throw("MSpan_Sweep: m is not locked")
	}
	sweepgen := mheap_.sweepgen
	if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
		print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
		throw("MSpan_Sweep: bad span state")
	}
	if trace.enabled {
		traceGCSweepSpan(s.npages * _PageSize)
	}

	// Count the number of pages cleared
	atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))

	spc := s.spanclass
	size := s.elemsize
	res := false

	c := _g_.m.mcache
	freeToHeap := false

	// Process the special records (finalizers etc.): if an object with a
	// finalizer is no longer alive, mark it live for now and queue the finalizer
	specialp := &s.specials
	special := *specialp
	for special != nil {

		objIndex := uintptr(special.offset) / size
		p := s.base() + objIndex*size
		mbits := s.markBitsForIndex(objIndex)
		if !mbits.isMarked() {
			// This object is not marked and has at least one special record.
			// Pass 1: see if it has at least one finalizer.
			hasFin := false
			endOffset := p - s.base() + size
			for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
				if tmp.kind == _KindSpecialFinalizer {
					// Stop freeing of object if it has a finalizer.
					mbits.setMarkedNonAtomic()
					hasFin = true
					break
				}
			}
			// Pass 2: queue all finalizers _or_ handle profile record.
			for special != nil && uintptr(special.offset) < endOffset {
				// Find the exact byte for which the special was setup
				// (as opposed to object beginning).
				p := s.base() + uintptr(special.offset)
				if special.kind == _KindSpecialFinalizer || !hasFin {
					// Splice out special record.
					y := special
					special = special.next
					*specialp = special
					freespecial(y, unsafe.Pointer(p), size)
				} else {
					// This is profile record, but the object has finalizers (so kept alive).
					// Keep special record.
					specialp = &special.next
					special = *specialp
				}
			}
		} else {
			// object is still live: keep special record
			specialp = &special.next
			special = *specialp
		}
	}

	// For debugging only
	if debug.allocfreetrace != 0 || raceenabled || msanenabled {
		// Find all newly freed objects. This doesn't have to be
		// efficient; allocfreetrace has massive overhead.
		mbits := s.markBitsForBase()
		abits := s.allocBitsForIndex(0)
		for i := uintptr(0); i < s.nelems; i++ {
			if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
				x := s.base() + i*s.elemsize
				if debug.allocfreetrace != 0 {
					tracefree(unsafe.Pointer(x), size)
				}
				if raceenabled {
					racefree(unsafe.Pointer(x), size)
				}
				if msanenabled {
					msanfree(unsafe.Pointer(x), size)
				}
			}
			mbits.advance()
			abits.advance()
		}
	}

	// Count how many objects are still allocated (marked live)
	nalloc := uint16(s.countAlloc())
	if spc.sizeclass() == 0 && nalloc == 0 {
		// If the span is sizeclass 0 (large object) and its object is no longer alive, release it to the heap
		s.needzero = 1
		freeToHeap = true
	}
	nfreed := s.allocCount - nalloc
	if nalloc > s.allocCount {
		print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
		throw("sweep increased allocation count")}// Set a new allocCount
	s.allocCount = nalloc

	// Determine whether span has no unallocated objects
	wasempty := s.nextFreeIndex() == s.nelems

	// Reset freeIndex to search from 0 next time
	s.freeindex = 0 // reset allocation index to start of span.
	if trace.enabled {
		getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
	}

	// gcmarkBits becomes the new allocBits
	// Then reallocate a block of gcmarkBits that are all zeros
	// The next time an object is allocated, allocBits can tell which elements are unallocated
	// gcmarkBits becomes the allocBits.
	// get a fresh cleared gcmarkBits in preparation for next GC
	s.allocBits = s.gcmarkBits
	s.gcmarkBits = newMarkBits(s.nelems)

	// Update allocCache from freeIndex
	// Initialize alloc bits cache.
	s.refillAllocCache(0)

	// Update sweepgen to the latest if there are no living objects in span
	// before the span is returned to mcentral or mheap
	// We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
	// because of the potential for a concurrent free/SetFinalizer.
	// But we need to set it before we make the span available for allocation
	// (return it to heap or mcentral), because allocation code assumes that a
	// span is already swept if available for allocation.
	if freeToHeap || nfreed == 0 {
		// The span must be in our exclusive ownership until we update sweepgen,
		// check for potential races.
		if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
			print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
			throw("MSpan_Sweep: bad span state after sweep")
		}
		// Serialization point.
		// At this point the mark bits are cleared and allocation ready
		// to go so release the span.
		atomic.Store(&s.sweepgen, sweepgen)
	}

	if nfreed > 0 && spc.sizeclass() != 0 {
		// Return the span to mcentral
		c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
		res = mheap_.central[spc].mcentral.freeSpan(s, preserve, wasempty)
		// freeSpan will update Sweepgen
		// MCentral_FreeSpan updates sweepgen
	} else if freeToHeap {
		// Release span to mheap
		// Free large span to heap

		if debug.efence > 0 {
			s.limit = 0 // prevent mlookup from finding this span
			sysFault(unsafe.Pointer(s.base()), size)
		} else {
			mheap_.freeSpan(s, 1)
		}
		c.local_nlargefree++
		c.local_largefree += size
		res = true
	}
	
	// If the span was not returned to mcentral or released to the mheap, it is still in use
	if !res {
		// Add the still-in-use span to the swept in-use queue of sweepSpans
		mheap_.sweepSpans[sweepgen/2%2].push(s)
	}
	return res
}
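
For completeness, mspan.sweep is driven in two ways: lazily from the allocation path, and by a low-priority background goroutine that sweeps one span at a time, yielding between spans. Condensed from runtime.bgsweep in runtime/mgcsweep.go (Go 1.14, workbuf freeing and trace hooks omitted):

func bgsweep(c chan int) {
	sweep.g = getg()

	lock(&sweep.lock)
	sweep.parked = true
	c <- 1
	goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)

	for {
		// sweepone finds an unswept in-use span and calls mspan.sweep on it;
		// it returns ^uintptr(0) once there is nothing left to sweep
		for sweepone() != ^uintptr(0) {
			sweep.nbgsweep++
			Gosched() // yield so background sweeping stays cheap
		}
		lock(&sweep.lock)
		if !isSweepDone() {
			// A new GC cycle may have started in the meantime
			unlock(&sweep.lock)
			continue
		}
		// Park until the next sweep phase begins
		sweep.parked = true
		goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
	}
}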

References:

Golang garbage collection anatomy

GC implementation principle