Go's GC has been criticized since its early days, but with the introduction of tri-color marking in v1.5 and the hybrid write barrier in v1.8, typical GC pauses have dropped to around 10µs, which is quite good. Let's explore how Go's GC works.

Tri-color marking principle

Let's start by looking at a diagram that gives a general idea of tri-color marking:

Principle (a short runnable sketch in Go follows the steps):

  1. At the start, all objects are in the white set
  2. Traverse the objects reachable from the roots and move each white object reached into the grey set
  3. Iterate over the grey set: move the white objects referenced by a grey object into the grey set, then move that grey object into the black set
  4. Repeat step 3 until the grey set is empty
  5. After step 4, every object left in the white set is unreachable, i.e. garbage, and is collected
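To make these steps concrete, here is a minimal, self-contained toy sketch of the algorithm above. The object type and the explicit color field are illustrative constructs only; the runtime does not represent objects this way (it uses mark bits and work buffers instead).

package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

type object struct {
	name string
	refs []*object
	col  color // zero value is white
}

// mark runs the tri-color algorithm starting from the given roots.
func mark(roots []*object) {
	var greySet []*object
	// Step 2: objects reachable directly from the roots are greyed.
	for _, r := range roots {
		if r.col == white {
			r.col = grey
			greySet = append(greySet, r)
		}
	}
	// Steps 3 and 4: process grey objects until the grey set is empty.
	for len(greySet) > 0 {
		obj := greySet[len(greySet)-1]
		greySet = greySet[:len(greySet)-1]
		for _, ref := range obj.refs {
			if ref.col == white {
				ref.col = grey
				greySet = append(greySet, ref)
			}
		}
		obj.col = black // fully scanned
	}
	// Step 5: anything still white is unreachable, i.e. garbage.
}

func main() {
	a := &object{name: "A"}
	b := &object{name: "B", refs: []*object{a}}
	c := &object{name: "C"} // nothing references C
	mark([]*object{b})
	for _, o := range []*object{a, b, c} {
		fmt.Printf("%s is garbage: %v\n", o.name, o.col == white)
	}
}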

Write barriers

Go does not stop the world for the whole of tri-color marking, which means objects can still be modified while marking is in progress.

So let's consider the following situation.

Suppose the marker is scanning the grey set: it has reached object A and marked the objects A references, and is about to scan object D's references. At that moment another goroutine moves the reference to E from D onto A (which has already been scanned), as in the figure below.

Wouldn't that leave E unscanned, so it is mistaken for a white object, i.e. garbage?

The write barrier is designed to solve exactly this problem. With the write barrier enabled, E is considered alive after the steps above; even if A later drops the reference, E is simply reclaimed in the next GC cycle instead of being wrongly reclaimed in this one.
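To make the race concrete, here is a toy sketch (hypothetical variables, not runtime code) of the mutator operations that would hide E from the marker if there were no write barrier:

// A has already been scanned (black); D has not been scanned yet.
// Concurrently with the marker, another goroutine does:
A.ref = D.e // store the pointer to E into the already-black object A
D.e = nil   // remove the only remaining path the marker would have followed

// Without a write barrier, the store into A is invisible to the GC: A is not
// rescanned, E is never greyed, stays white, and would be wrongly collected.
// With the write barrier, the assignment "A.ref = D.e" shades E, so E survives.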

Go has enabled the hybrid write barrier since v1.8. The pseudocode is as follows:

writePointer(slot, ptr):
    shade(*slot)
    if any stack is grey:
        shade(ptr)
    *slot = ptr

The hybrid write barrier shades both the "old pointer" (the value being overwritten, *slot) and the "new pointer" (the value being written, ptr).

The reason for shading the original pointer is that another running thread may have already copied that pointer into a register or a local variable on its stack. Those copies do not go through the write barrier, so without shading the original value the object it points to might never be marked.

[go] b = obj
[go] oldx = nil
[gc] scan oldx...
[go] oldx = b.x  // copy b.x into a local variable, no write barrier
[go] b.x = ptr   // the write barrier should shade the original value of b.x
[gc] scan b...
If the write barrier does not shade the original value, oldx will never be scanned.

The reason for marking the new pointer is that other running threads might move the pointer. Consider the following:

[go] a = ptr
[go] b = obj
[gc] scan b...
[go] b.x = a  // the write barrier should shade the new value of b.x
[go] a = nil
[gc] scan a...
If the write barrier does not shade the new value, ptr will never be scanned.

With the hybrid write barrier, the GC no longer needs to rescan the stack of every G after concurrent marking finishes, which greatly reduces STW time in the Mark Termination phase.

Besides the write barrier, all objects newly allocated during GC are immediately marked black, as can be seen in the mallocgc function discussed earlier.
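For reference, the relevant logic in mallocgc looks roughly like this in the runtime version discussed here (a simplified excerpt; x, size and scanSize are the freshly allocated object and its sizes):

	// Simplified: if a mark phase is in progress, blacken the new object
	// immediately so this cycle's sweeper will not free it.
	if gcphase != _GCoff {
		gcmarknewobject(uintptr(x), size, scanSize)
	}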

The collection process

Go's GC is a concurrent GC: most of the GC work runs concurrently with normal Go code, which complicates the GC process. GC has four phases (a sketch of the corresponding phase switch follows the list):

  • Sweep Termination: sweep any spans left unswept. A new GC cycle can start only after the previous cycle's sweep has finished
  • Mark: scan all root objects and every object reachable from them, marking them so they will not be collected
  • Mark Termination: complete marking and rescan part of the root objects (requires STW)
  • Sweep: sweep spans according to the mark results
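These phases map onto the runtime's gcphase values, and the write barrier is needed exactly while marking is in progress. A rough sketch of setGCPhase as it looks in this era of the runtime:

func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	// The write barrier must be on during _GCmark and _GCmarktermination,
	// and can be turned off once the phase goes back to _GCoff.
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}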

Here is a more complete GC flow, with the four phases distinguished by color:

There are two kinds of background tasks (G's) used by the GC: one kind for marking and one for sweeping. Background mark tasks are started when needed, and the number that can run at the same time is about 25% of the number of P's, which is where the claim that Go spends 25% of the CPU on GC comes from. The background sweep task is started when the program starts and is woken up when GC enters the sweep phase.
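The 25% figure comes from the pacer: with a background utilization target of 0.25, whole P's get dedicated mark workers and any remainder is covered by a part-time (fractional) worker. The following is only an illustrative sketch of that split, not the exact runtime calculation:

const gcBackgroundUtilization = 0.25 // fraction of CPU reserved for background marking

// markWorkerBudget is an illustrative helper: it splits the 25% CPU budget
// into a number of dedicated workers plus a fractional remainder.
func markWorkerBudget(gomaxprocs int) (dedicated int, fractional float64) {
	target := float64(gomaxprocs) * gcBackgroundUtilization // e.g. GOMAXPROCS=8 -> 2.0
	dedicated = int(target)                                 // whole P's run dedicated workers
	fractional = target - float64(dedicated)                // e.g. GOMAXPROCS=6 -> 1 dedicated + 0.5 fractional
	return
}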

At present, the whole GC process stops the world (STW) twice: first at the beginning of the Mark phase, and second during the Mark Termination phase. The first STW prepares the root-object scan and enables the write barrier and mutator assists. The second STW rescans part of the root objects and disables the write barrier and mutator assists. Note that not every root-object scan requires STW; scanning a goroutine's stack, for example, only requires stopping that G. Because the write barrier is the hybrid write barrier, the second STW is significantly shorter.

Source code analysis

gcStart

func gcStart(mode gcMode, trigger gcTrigger) {
	// Since this is called from malloc and malloc is called in
	// the guts of a number of libraries that might be holding
	// locks, don't attempt to start GC in non-preemptible or
	// potentially unstable situations.
	// Check preemptibility: do not trigger GC from a non-preemptible or unstable context
	mp := acquirem()
	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
		releasem(mp)
		return
	}
	releasem(mp)
	mp = nil

	// Pick up the remaining unswept/not being swept spans concurrently
	//
	// This shouldn't happen if we're being invoked in background
	// mode since proportional sweep should have just finished
	// sweeping everything, but rounding errors, etc, may leave a
	// few spans unswept. In forced mode, this is necessary since
	// GC can be forced at any point in the sweeping cycle.
	//
	// We check the transition condition continuously here in case
	// this G gets delayed in to the next GC cycle.
	// Clean up the remaining uncleaned garbage
	for trigger.test() && gosweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}

	// Perform GC initialization and the sweep termination
	// transition.
	semacquire(&work.startSema)
	// Re-check transition condition under transition lock.
	// Check whether the gcTrriger condition is true
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}

	// For stats, check if this GC was forced by the user
	// runtime.GC() can be called by the user to force a GC
	work.userForced = trigger.kind == gcTriggerAlways || trigger.kind == gcTriggerCycle

	// In gcstoptheworld debug mode, upgrade the mode accordingly.
	// We do this after re-checking the transition condition so
	// that multiple goroutines that detect the heap trigger don't
	// start multiple STW GCs.
	// Set the mode for gc
	if mode == gcBackgroundMode {
		if debug.gcstoptheworld == 1 {
			mode = gcForceMode
		} else if debug.gcstoptheworld == 2 {
			mode = gcForceBlockMode
		}
	}

	// Ok, we're doing it! Stop everybody else
	semacquire(&worldsema)

	if trace.enabled {
		traceGCStart()
	}
	// Start the background mark workers
	if mode == gcBackgroundMode {
		gcBgMarkStartWorkers()
	}
	// Resets the state associated with gc flags
	gcResetMarkState()

	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		// This is used to compute CPU time of the STW phases,
		// so it can't be more than ncpu, even if GOMAXPROCS is.
		work.stwprocs = ncpu
	}
	work.heap0 = atomic.Load64(&memstats.heap_live)
	work.pauseNS = 0
	work.mode = mode

	now := nanotime()
	work.tSweepTerm = now
	work.pauseStart = now
	if trace.enabled {
		traceGCSTWStart(1)
	}
	// STW: stop the world
	systemstack(stopTheWorldWithSema)
	// Finish sweep before we start concurrent scan.
	// Clean up the garbage from the previous round
	systemstack(func() {
		finishsweep_m()
	})
	// clearpools before we start the GC. If we wait they memory will not be
	// reclaimed until the next GC cycle.
	// Clear sync.Pool, sched.sudogcache and sched.deferpool
	clearpools()

	// Increment the GC count
	work.cycles++
	if mode == gcBackgroundMode { // Do as much work concurrently as possible
		gcController.startCycle()
		work.heapGoal = memstats.next_gc

		// Enter concurrent mark phase and enable
		// write barriers.
		//
		// Because the world is stopped, all Ps will
		// observe that write barriers are enabled by
		// the time we start the world and begin
		// scanning.
		//
		// Write barriers must be enabled before assists are
		// enabled because they must be enabled before
		// any non-leaf heap objects are marked. Since
		// allocations are blocked until assists can
		// happen, we want enable assists as early as
		// possible.
		// Set the GC state to gcMark
		setGCPhase(_GCmark)

		// Update the bgMark status
		gcBgMarkPrepare() // Must happen before assist enable.
		// Calculate and queue root scan tasks and initialize the status of related scan tasks
		gcMarkRootPrepare()

		// Mark all active tinyalloc blocks. Since we're
		// allocating from these, they need to be black like
		// other allocations. The alternative is to blacken
		// the tiny block on every allocation from it, which
		// would slow down the tiny allocator.
		// Mark tiny objects
		gcMarkTinyAllocs()

		// At this point all Ps have enabled the write
		// barrier, thus maintaining the no white to
		// black invariant. Enable mutator assists to
		// put back-pressure on fast allocating
		// mutators.
		// Set gcBlackenEnabled to 1 so mark workers and mutator assists can blacken objects
		atomic.Store(&gcBlackenEnabled, 1)

		// Assists and workers can start the moment we start
		// the world.
		gcController.markStartTime = now

		// Concurrent mark.
		systemstack(func() {
			now = startTheWorldWithSema(trace.enabled)
		})
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	} else {
		// Non-parallel mode
		// Record the start time of the completion marking phase
		if trace.enabled {
			// Switch to mark termination STW.
			traceGCSTWDone()
			traceGCSTWStart(0)
		}
		t := nanotime()
		work.tMark, work.tMarkTerm = t, t
		work.heapGoal = work.heap0

		// Perform mark termination. This will restart the world.
		// STW, mark, clean, and start the world
		gcMarkTermination(memstats.triggerRatio)
	}

	semrelease(&work.startSema)
}

gcBgMarkStartWorkers

This function starts the goroutines that perform background mark work, but they do not start working immediately; they wait until the GC phase is set to _GCmark (the setGCPhase(_GCmark) call in gcStart above).

func gcBgMarkStartWorkers(a) {
	// Background marking is performed by per-P G's. Ensure that
	// each P has a background GC G.
	for _, p := range allp {
		if p.gcBgMarkWorker == 0 {
			go gcBgMarkWorker(p)
			// Wait for the bgMarkReady signal from gcBgMarkWorker goroutine to continue
			notetsleepg(&work.bgMarkReady, -1)
			noteclear(&work.bgMarkReady)
		}
	}
}

gcBgMarkWorker

The function that performs mark work in the background

func gcBgMarkWorker(_p_ *p) {
	gp := getg()
	// Used to retrieve p and m after hibernation
	type parkInfo struct {
		m      muintptr // Release this m on park.
		attach puintptr // If non-nil, attach to this p on park.
	}
	// We pass park to a gopark unlock function, so it can't be on
	// the stack (see gopark). Prevent deadlock from recursively
	// starting GC by disabling preemption.
	gp.m.preemptoff = "GC worker init"
	park := new(parkInfo)
	gp.m.preemptoff = ""
	// Record m and p in park; they are used later in gopark so gcController.findRunnable can find and wake this worker
	park.m.set(acquirem())
	park.attach.set(_p_)
	// Inform gcBgMarkStartWorkers that this worker is ready.
	// After this point, the background mark worker is scheduled
	// cooperatively by gcController.findRunnable. Hence, it must
	// never be preempted, as this would put it into _Grunnable
	// and put it on a run queue. Instead, when the preempt flag
	// is set, this puts itself into _Gwaiting to be woken up by
	// gcController.findRunnable at the appropriate time.
	// Make gcBgMarkStartWorkers notetsleepg stop waiting and continue and exit
	notewakeup(&work.bgMarkReady)

	for {
		// Go to sleep until woken by gcController.findRunnable.
		// We can't releasem yet since even the call to gopark
		// may be preempted.
		// Let g go to sleep
		gopark(func(g *g, parkp unsafe.Pointer) bool {
			park := (*parkInfo)(parkp)

			// The worker G is no longer running, so it's
			// now safe to allow preemption.
			// Release the currently preempted m
			releasem(park.m.ptr())

			// If the worker isn't attached to its P,
			// attach now. During initialization and after
			// a phase change, the worker may have been
			// running on a different P. As soon as we
			// attach, the owner P may schedule the
			// worker, so this must be done after the G is
			// stopped.
			// set the association p, which is already set above
			if park.attach != 0 {
				p := park.attach.ptr()
				park.attach.set(nil)
				// cas the worker because we may be
				// racing with a new worker starting
				// on this P.
				if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
					// The P got a new worker.
					// Exit this worker.
					return false
				}
			}
			return true
		}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)

		// Loop until the P dies and disassociates this
		// worker (the P may later be reused, in which case
		// it will get a new worker) or we failed to associate.
		// Check whether the gcBgMarkWorker of P is the same as the current gcBgMarkWorker of G. If not, end the current task
		if _p_.gcBgMarkWorker.ptr() != gp {
			break
		}

		// Disable preemption so we can use the gcw. If the
		// scheduler wants to preempt us, we'll stop draining,
		// dispose the gcw, and then preempt.
		// Gopark releases m in the first function, and preempts back
		park.m.set(acquirem())

		if gcBlackenEnabled == 0 {
			throw("gcBgMarkWorker: blackening not enabled")
		}

		startTime := nanotime()
		// Set the start time of gcmark
		_p_.gcMarkWorkerStartTime = startTime

		decnwait := atomic.Xadd(&work.nwait, -1)
		if decnwait == work.nproc {
			println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
			throw("work.nwait was > work.nproc")}// Switch to g0
		systemstack(func(a) {
			// Mark our goroutine preemptible so its stack
			// can be scanned. This lets two mark workers
			// scan each other (otherwise, they would
			// deadlock). We must not modify anything on
			// the G stack. However, stack shrinking is
			// disabled for mark workers, so it is safe to
			// read from the G stack.
			// Set G to be in a waiting state so that another G can scan its stack.
			casgstatus(gp, _Grunning, _Gwaiting)
			switch _p_.gcMarkWorkerMode {
			default:
				throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
			case gcMarkWorkerDedicatedMode:
				// Focus on executing the pattern of markup work
				gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
				if gp.preempt {
					// all G's in the local runqueue are placed in the global runqueue
					// We were preempted. This is
					// a useful signal to kick
					// everything out of the run
					// queue so it can run
					// somewhere else.
					lock(&sched.lock)
					for {
						gp, _ := runqget(_p_)
						if gp == nil {
							break
						}
						globrunqput(gp)
					}
					unlock(&sched.lock)
				}
				// Go back to draining, this time
				// without preemption.
				// Continue with the marking
				gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
			case gcMarkWorkerFractionalMode:
				// Perform marking until preempted
				gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			case gcMarkWorkerIdleMode:
				// Perform marking when free
				gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			}
			// change G from _Gwaiting back to _Grunning
			casgstatus(gp, _Gwaiting, _Grunning)
		})

		// If we are nearing the end of mark, dispose
		// of the cache promptly. We must do this
		// before signaling that we're no longer
		// working so that other workers can't observe
		// no workers and no work while we have this
		// cached, and before we compute done.
		// Process the local cache in time and submit it to the global queue
		if gcBlackenPromptly {
			_p_.gcw.dispose()
		}

		// Account for time.
		// Cumulative time
		duration := nanotime() - startTime
		switch _p_.gcMarkWorkerMode {
		case gcMarkWorkerDedicatedMode:
			atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
			atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
		case gcMarkWorkerFractionalMode:
			atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
			atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
		case gcMarkWorkerIdleMode:
			atomic.Xaddint64(&gcController.idleMarkTime, duration)
		}

		// Was this the last worker and did we run out
		// of work?
		incnwait := atomic.Xadd(&work.nwait, +1)
		if incnwait > work.nproc {
			println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
				"work.nwait=", incnwait, "work.nproc=", work.nproc)
			throw("work.nwait > work.nproc")}// If this worker reached a background mark completion
		// point, signal the main GC goroutine.
		if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
			// Make this G preemptible and disassociate it
			// as the worker for this P so
			// findRunnableGCWorker doesn't try to
			// schedule it.
			// remove the association between p and m
			_p_.gcBgMarkWorker.set(nil)
			releasem(park.m.ptr())

			gcMarkDone()

			// Disable preemption and prepare to reattach
			// to the P.
			//
			// We may be running on a different P at this
			// point, so we can't reattach until this G is
			// parked.
			park.m.set(acquirem())
			park.attach.set(_p_)
		}
	}
}
gcDrain

The main implementation of tricolor marking

gcDrain scans the roots and the objects in the grey queue, blackening grey objects, until all roots and reachable objects have been marked

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
	if !writeBarrier.needed {
		throw("gcDrain phase incorrect")
	}

	gp := getg().m.curg
	// Whether to return when preemption is requested
	preemptible := flags&gcDrainUntilPreempt != 0
	// Whether to wait for tasks when there are no tasks
	blocking := flags&(gcDrainUntilPreempt|gcDrainIdle|gcDrainFractional|gcDrainNoBlock) == 0
	// Whether to flush scan credit to the global account, to reduce mutator assists and wake up waiting Gs
	flushBgCredit := flags&gcDrainFlushBgCredit != 0
	// Whether marking is only performed when idle
	idle := flags&gcDrainIdle != 0
	// Record the initial scan tasks that have been performed
	initScanWork := gcw.scanWork

	// checkWork is the scan work before performing the next
	// self-preempt check.
	// Set the work check function for the corresponding mode
	checkWork := int64(1<<63 - 1)
	var check func(a) bool
	if flags&(gcDrainIdle|gcDrainFractional) != 0 {
		checkWork = initScanWork + drainCheckThreshold
		if idle {
			check = pollWork
		} else if flags&gcDrainFractional != 0 {
			check = pollFractionalWorkerExit
		}
	}

	// Drain root marking jobs.
	// If the root object is not completely scanned, it will be scanned
	if work.markrootNext < work.markrootJobs {
		for !(preemptible && gp.preempt) {
			job := atomic.Xadd(&work.markrootNext, +1) - 1
			if job >= work.markrootJobs {
				break
			}
			// Perform root scanning
			markroot(gcw, job)
			if check != nil && check() {
				goto done
			}
		}
	}

	// Drain heap marking jobs.
	// loop until preempted
	for !(preemptible && gp.preempt) {
		// Try to keep work available on the global queue. We used to
		// check if there were waiting workers, but it's better to
		// just keep work available than to make workers wait. In the
		// worst case, we'll do O(log(_WorkbufSize)) unnecessary
		// balances.
		if work.full == 0 {
			// If the global tag queue is empty, part of the work is done in the global queue
			gcw.balance()
		}

		var b uintptr
		if blocking {
			b = gcw.get()
		} else {
			b = gcw.tryGetFast()
			if b == 0 {
				b = gcw.tryGet()
			}
		}
		// Failed to get the task
		if b == 0 {
			// work barrier reached or tryGet failed.
			break
		}
		// Scan the object
		scanobject(b, gcw)

		// Flush background scan work credit to the global
		// account if we've accumulated enough locally so
		// mutator assists can draw on it.
		// If the current number of scanned objects exceeds gcCreditSlack, add the number of scanned objects to the global number and batch update
		if gcw.scanWork >= gcCreditSlack {
			atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
			if flushBgCredit {
				gcFlushBgCredit(gcw.scanWork - initScanWork)
				initScanWork = 0
			}
			checkWork -= gcw.scanWork
			gcw.scanWork = 0
			// If the number of objects scanned has reached the target number of checkWork for the next preemption, the function of the corresponding mode is called
			// (idle mode uses pollWork, fractional mode uses pollFractionalWorkerExit, as set up above)
			if checkWork <= 0 {
				checkWork += drainCheckThreshold
				if check != nil && check() {
					break
				}
			}
		}
	}

	// In blocking mode, write barriers are not allowed after this
	// point because we must preserve the condition that the work
	// buffers are empty.

done:
	// Flush remaining scan work credit.
	if gcw.scanWork > 0 {
		// Add the number of scanned objects to the global
		atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
		if flushBgCredit {
			gcFlushBgCredit(gcw.scanWork - initScanWork)
		}
		gcw.scanWork = 0
	}
}
markroot

This is used for root object scanning

func markroot(gcw *gcWork, i uint32) {
	// TODO(austin): This is a bit ridiculous. Compute and store
	// the bases in gcMarkRootPrepare instead of the counts.
	baseFlushCache := uint32(fixedRootCount)
	baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
	baseBSS := baseData + uint32(work.nDataRoots)
	baseSpans := baseBSS + uint32(work.nBSSRoots)
	baseStacks := baseSpans + uint32(work.nSpanRoots)
	end := baseStacks + uint32(work.nStackRoots)

	// Note: if you add a case here, please also update heapdump.go:dumproots.
	switch {
	// Flush spans in the mcache
	case baseFlushCache <= i && i < baseData:
		flushmcache(int(i - baseFlushCache))
	// Scan for global variables that can be read and written
	case baseData <= i && i < baseBSS:
		for _, datap := range activeModules() {
			markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
		}
	// Scan the uninitialized globals (the BSS section)
	case baseBSS <= i && i < baseSpans:
		for _, datap := range activeModules() {
			markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
		}
	// Scan the Finalizer queue
	case i == fixedRootFinalizers:
		// Only do this once per GC cycle since we don't call
		// queuefinalizer during marking.
		if work.markrootDone {
			break
		}
		for fb := allfin; fb != nil; fb = fb.alllink {
			cnt := uintptr(atomic.Load(&fb.cnt))
			scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw)
		}
	// Release the terminated stack
	case i == fixedRootFreeGStacks:
		// Only do this once per GC cycle; preferably
		// concurrently.
		if !work.markrootDone {
			// Switch to the system stack so we can call
			// stackfree.
			systemstack(markrootFreeGStacks)
		}
	// Scan mspan.specials
	case baseSpans <= i && i < baseStacks:
		// mark MSpan.specials
		markrootSpans(gcw, int(i-baseSpans))

	default:
		// the rest is scanning goroutine stacks
		// Get the g to scan
		var gp *g
		if baseStacks <= i && i < end {
			gp = allgs[i-baseStacks]
		} else {
			throw("markroot: bad index")}// remember when we've first observed the G blocked
		// needed only to output in traceback
		status := readgstatus(gp) // We are not in a scan state
		if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
			gp.waitsince = work.tstart
		}

		// scang must be done on the system stack in case
		// we're trying to scan our own stack.
		// forward to G0 for scanning
		systemstack(func() {
			// If this is a self-scan, put the user G in
			// _Gwaiting to prevent self-deadlock. It may
			// already be in _Gwaiting if this is a mark
			// worker or we're in mark termination.
			userG := getg().m.curg
			selfScan := gp == userG && readgstatus(userG) == _Grunning
			// If it is a scan of its own, then switch the state of its own g
			if selfScan {
				casgstatus(userG, _Grunning, _Gwaiting)
				userG.waitreason = waitReasonGarbageCollectionScan
			}

			// TODO: scang blocks until gp's stack has
			// been scanned, which may take a while for
			// running goroutines. Consider doing this in
			// two phases where the first is non-blocking:
			// we scan the stacks we can and ask running
			// goroutines to scan themselves; and the
			// second blocks.
			// Scan g's stack
			scang(gp, gcw)

			if selfScan {
				casgstatus(userG, _Gwaiting, _Grunning)
			}
		})
	}
}
markrootBlock

Scan the [b0, b0+n0) region according to ptrmask0

func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
	if rootBlockBytes%(8*sys.PtrSize) != 0 {
		// This is necessary to pick byte offsets in ptrmask0.
		throw("rootBlockBytes must be a multiple of 8*ptrSize")
	}

	b := b0 + uintptr(shard)*rootBlockBytes
	// If the block area to be scanned exceeds b0+n0, return directly
	if b >= b0+n0 {
		return
	}
	ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*sys.PtrSize))))
	n := uintptr(rootBlockBytes)
	if b+n > b0+n0 {
		n = b0 + n0 - b
	}

	// Scan this shard.
	// Scans the shard of the given block
	scanblock(b, n, ptrmask, gcw)
}
scanblock
func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork) {
	// Use local copies of original parameters, so that a stack trace
	// due to one of the throws below shows the original block
	// base and extent.
	b := b0
	n := n0

	for i := uintptr(0); i < n; {
		// Find bits for the next word.
		// Find the corresponding bits in the bitmap
		bits := uint32(*addb(ptrmask, i/(sys.PtrSize*8)))
		if bits == 0 {
			i += sys.PtrSize * 8
			continue
		}
		for j := 0; j < 8 && i < n; j++ {
			if bits&1 != 0 {
				// If the address contains a pointer
				// Same work as in scanobject; see comments there.
				obj := *(*uintptr)(unsafe.Pointer(b + i))
				if obj != 0 {
					// If the corresponding object is found under this address, mark it with grey
					if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
						greyobject(obj, b, i, span, gcw, objIndex)
					}
				}
			}
			bits >>= 1
			i += sys.PtrSize
		}
	}
}
greyobject

Greying an object simply means finding its bit in the mark bitmap, setting it (marking the object alive), and putting the object into the work queue
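Conceptually the mark bitmap has one bit per object slot in a span; a toy sketch of how a bit is located (not the runtime's markBitsForIndex, which also deals with spans and arenas):

// markBit returns the byte and mask for object objIndex in a toy bitmap.
func markBit(bits []uint8, objIndex uintptr) (bytep *uint8, mask uint8) {
	return &bits[objIndex/8], uint8(1) << (objIndex % 8)
}

// greyobject then sets that bit atomically, which is what
// atomic.Or8(mbits.bytep, mbits.mask) does below.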

func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
	// obj should be start of allocation, and so must be at least pointer-aligned.
	if obj&(sys.PtrSize-1) != 0 {
		throw("greyobject: obj not pointer-aligned")
	}
	mbits := span.markBitsForIndex(objIndex)

	if useCheckmark {
		// This is used for debugging to ensure that all objects are correctly identified
		if !mbits.isMarked() {
			// This object is not marked
			printlock()
			print("runtime:greyobject: checkmarks finds unexpected unmarked object obj=", hex(obj), "\n")
			print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")

			// Dump the source (base) object
			gcDumpObject("base", base, off)

			// Dump the object
			gcDumpObject("obj", obj, ^uintptr(0))

			getg().m.traceback = 2
			throw("checkmark found unmarked object")
		}
		hbits := heapBitsForAddr(obj)
		if hbits.isCheckmarked(span.elemsize) {
			return
		}
		hbits.setCheckmarked(span.elemsize)
		if !hbits.isCheckmarked(span.elemsize) {
			throw("setCheckmarked and isCheckmarked disagree")
		}
	} else {
		if debug.gccheckmark > 0 && span.isFree(objIndex) {
			print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
			gcDumpObject("base", base, off)
			gcDumpObject("obj", obj, ^uintptr(0))
			getg().m.traceback = 2
			throw("marking free object")}// If marked we have nothing to do.
		// The object is marked correctly, no further action is required
		if mbits.isMarked() {
			return
		}
		// mbits.setMarked() // Avoid extra call overhead with manual inlining.
		// Mark the object
		atomic.Or8(mbits.bytep, mbits.mask)
		// If this is a noscan object, fast-track it to black
		// instead of greying it.
		// If the object contains no pointers (noscan), it only needs to be marked; it does not need to be queued
		if span.spanclass.noscan() {
			gcw.bytesMarked += uint64(span.elemsize)
			return
		}
	}

	// Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
	// seems like a nice optimization that can be added back in.
	// There needs to be time between the PREFETCH and the use.
	// Previously we put the obj in an 8 element buffer that is drained at a rate
	// to give the PREFETCH time to do its work.
	// Use of PREFETCHNTA might be more appropriate than PREFETCH
	// Put the object into the work queue: try the fast path first, fall back to the slow path
	if !gcw.putFast(obj) {
		gcw.put(obj)
	}
}
gcWork.putFast

gcWork has two queues, wbuf1 and wbuf2, for storing grey objects. Grey objects are first added to wbuf1; when wbuf1 is full, wbuf1 and wbuf2 swap roles, so wbuf2 is promoted to wbuf1
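As a quick summary of the flow implemented by putFast, put and get below (a sketch, not runtime code):

// put:  wbuf1 has room  -> append to wbuf1
//       wbuf1 full      -> swap wbuf1/wbuf2 and append to the (new) wbuf1
//       both full       -> push wbuf1 to the global full list, take an empty buffer
//
// get:  wbuf1 non-empty -> pop from wbuf1
//       wbuf1 empty     -> swap wbuf1/wbuf2 and pop from the (new) wbuf1
//       both empty      -> return an empty buffer to the global empty list and
//                          take a full buffer from the global full list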

putFast tries to put the object into the wbuf1 queue

func (w *gcWork) putFast(obj uintptr) bool {
	wbuf := w.wbuf1
	if wbuf == nil {
		// No cache queue is requested, return false
		return false
	} else if wbuf.nobj == len(wbuf.obj) {
		// wbuf1 queue is full, return false
		return false
	}

	// Add an object to the unfilled WBUF1 queue
	wbuf.obj[wbuf.nobj] = obj
	wbuf.nobj++
	return true
}
gcWork.put

put first tries wbuf1; when wbuf1 is full it swaps the roles of wbuf1 and wbuf2, and if both are full it pushes the full buffer to the global queue and takes an empty one

func (w *gcWork) put(obj uintptr) {
	flushed := false
	wbuf := w.wbuf1
	if wbuf == nil {
		// If wbuf1 does not exist, initialize wbuf1 and wbuf2
		w.init()
		wbuf = w.wbuf1
		// wbuf is empty at this point.
	} else if wbuf.nobj == len(wbuf.obj) {
		// Wbuf1 is full, change the role of wbuf1 wbuf2
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		wbuf = w.wbuf1
		if wbuf.nobj == len(wbuf.obj) {
			// WBUF1 is also full after the role change, indicating that both queues are full
			// submit wbuf1 globally and get an empty queue
			putfull(wbuf)
			wbuf = getempty()
			w.wbuf1 = wbuf
			// Set the flag bit to be handed by the queue
			flushed = true
		}
	}

	wbuf.obj[wbuf.nobj] = obj
	wbuf.nobj++

	// If we put a buffer on full, let the GC controller know so
	// it can encourage more workers to run. We delay this until
	// the end of put so that w is in a consistent state, since
	// enlistWorker may itself manipulate w.
	// The GC controller selects to schedule more work
	if flushed && gcphase == _GCmark {
		gcController.enlistWorker()
	}
}

From here, let's continue with the functions called inside gcDrain and trace how the objects we greyed get turned black

gcw.balance()

Moving on to the gcw.balance() call in gcDrain (invoked when the global queue is empty): what does balance do?

func (w *gcWork) balance() {
	if w.wbuf1 == nil {
		// wbuf1/wbuf2 have not been initialized yet
		return
	}
	// If wbuf2 is not empty, push it to the global queue and replace it with an empty buffer
	if wbuf := w.wbuf2; wbuf.nobj != 0 {
		putfull(wbuf)
		w.wbuf2 = getempty()
	} else if wbuf := w.wbuf1; wbuf.nobj > 4 {
		// Divide the unfilled WBUF1 into two parts and submit half of them to the global queue
		w.wbuf1 = handoff(wbuf)
	} else {
		return
	}
	// We flushed a buffer to the full list, so wake a worker.
	// A full buffer was pushed to the global queue, so another worker can pick it up
	if gcphase == _GCmark {
		gcController.enlistWorker()
	}
}
gcw.get()

Continuing with gcw.get() in gcDrain: it first tries to take an object from the local wbuf1 queue; if wbuf1 is empty it swaps in wbuf2, and if both are empty it tries to take a full buffer from the global queue and pops an object from that

func (w *gcWork) get() uintptr {
	wbuf := w.wbuf1
	if wbuf == nil {
		w.init()
		wbuf = w.wbuf1
		// wbuf is empty at this point.
	}
	if wbuf.nobj == 0 {
		// Wbuf1 is empty, change the wbuf1 wbuf2 role
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		wbuf = w.wbuf1
		// Wbuf2 is also empty, try to get a full queue from the global queue
		if wbuf.nobj == 0 {
			owbuf := wbuf
			wbuf = getfull()
			// If no, return
			if wbuf == nil {
				return 0
			}
			// Upload the empty queue to the global empty queue and use the full queue as its own wbuf1
			putempty(owbuf)
			w.wbuf1 = wbuf
		}
	}

	// TODO: This might be a good place to add prefetch code

	wbuf.nobj--
	return wbuf.obj[wbuf.nobj]
}

The non-blocking variants gcw.tryGet() and gcw.tryGetFast() used in gcDrain work similarly.

scanobject

Back in gcDrain, once an object b has been taken from the queue, scanobject consumes it:

func scanobject(b uintptr, gcw *gcWork) {
	// Find the bits for b and the size of the object at b.
	//
	// b is either the beginning of an object, in which case this
	// is the size of the object to scan, or it points to an
	// oblet, in which case we compute the size to scan below.
	// Get the bits corresponding to b
	hbits := heapBitsForAddr(b)
	// Get the span of b
	s := spanOfUnchecked(b)
	n := s.elemsize
	if n == 0 {
		throw("scanobject n == 0")}// If the object is too large, cut and scan again. MaxObletBytes is 128K
	if n > maxObletBytes {
		// Large object. Break into oblets for better
		// parallelism and lower latency.
		if b == s.base() {
			// It's possible this is a noscan object (not
			// from greyobject, but from other code
			// paths), in which case we must *not* enqueue
			// oblets since their bitmaps will be
			// uninitialized.
			// If it is not a pointer, it is marked black
			if s.spanclass.noscan() {
				// Bypass the whole scan.
				gcw.bytesMarked += uint64(n)
				return
			}

			// Enqueue the other oblets to scan later.
			// Some oblets may be in b's scalar tail, but
			// these will be marked as "no more pointers",
			// so we'll drop out immediately when we go to
			// scan those.
			// Split at maxObletBytes boundaries and enqueue the oblets
			for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
				if !gcw.putFast(oblet) {
					gcw.put(oblet)
				}
			}
		}

		// Compute the size of the oblet. Since this object
		// must be a large object, s.base() is the beginning
		// of the object.
		n = s.base() + s.elemsize - b
		if n > maxObletBytes {
			n = maxObletBytes
		}
	}

	var i uintptr
	for i = 0; i < n; i += sys.PtrSize {
		// Find bits for this word.
		// Get the corresponding bits
		if i != 0 {
			// Avoid needless hbits.next() on last iteration.
			hbits = hbits.next()
		}
		// Load bits once. See CL 22712 and issue 16973 for discussion.
		bits := hbits.bits()
		// During checkmarking, 1-word objects store the checkmark
		// in the type bit for the one word. The only one-word objects
		// are pointers, or else they'd be merged with other non-pointer
		// data into larger allocations.
		if i != 1*sys.PtrSize && bits&bitScan == 0 {
			break // no more pointers in this object
		}
		// Not a pointer, continue
		if bits&bitPointer == 0 {
			continue // not a pointer
		}

		// Work here is duplicated in scanblock and above.
		// If you make changes here, make changes there too.
		obj := *(*uintptr)(unsafe.Pointer(b + i))

		// At this point we have extracted the next potential pointer.
		// Quickly filter out nil and pointers back to the current object.
		if obj != 0 && obj-b >= n {
			// Test if obj points into the Go heap and, if so,
			// mark the object.
			//
			// Note that it's possible for findObject to
			// fail if obj points to a just-allocated heap
			// object because of a race with growing the
			// heap. In this case, we know the object was
			// just allocated and hence will be marked by
			// allocation itself.
			// Find the object corresponding to the pointer and gray it
			if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
				greyobject(obj, b, i, span, gcw, objIndex)
			}
		}
	}
	gcw.bytesMarked += uint64(n)
	gcw.scanWork += int64(i)
}

In summary: greying an object means marking it and putting it in the work queue, while black means marked and scanned. So once a grey object has been taken out of the queue and scanned, it can be considered black.

At this point, the gcDrain marker work analysis is complete, and we continue back to the gcBgMarkWorker analysis

gcMarkDone

gcMarkDone transitions the GC from the mark 1 phase to the mark 2 phase, and from mark 2 to mark termination

Mark 1 phase: includes root marking; both the global queue and the per-P local cache queues are in use

Mark 2 phase: the per-P local cache queues are disabled

func gcMarkDone() {
top:
	semacquire(&work.markDoneSema)

	// Re-check transition condition under transition lock.
	if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
		semrelease(&work.markDoneSema)
		return
	}

	// Disallow starting new workers so that any remaining workers
	// in the current mark phase will drain out.
	//
	// TODO(austin): Should dedicated workers keep an eye on this
	// and exit gcDrain promptly?
	// Disable new marking tasks
	atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, -0xffffffff)
	prevFractionalGoal := gcController.fractionalUtilizationGoal
	gcController.fractionalUtilizationGoal = 0

	// If gcBlackenPromptly is not yet set, this is the transition from mark 1 to mark 2:
	// all per-P local cache queues must be flushed to the global queue and local caching is disabled
	if !gcBlackenPromptly {
		// Transition from mark 1 to mark 2.
		//
		// The global work list is empty, but there can still be work
		// sitting in the per-P work caches.
		// Flush and disable work caches.

		// Disallow caching workbufs and indicate that we're in mark 2.
		// Disable the local cache queue and enter the mark2 phase
		gcBlackenPromptly = true

		// Prevent completion of mark 2 until we've flushed
		// cached workbufs.
		atomic.Xadd(&work.nwait, -1)

		// GC is set up for mark 2. Let Gs blocked on the
		// transition lock go while we flush caches.
		semrelease(&work.markDoneSema)
		// Switch to g0, local cache upload to global operation
		systemstack(func() {
			// Flush all currently cached workbufs and
			// ensure all Ps see gcBlackenPromptly. This
			// also blocks until any remaining mark 1
			// workers have exited their loop so we can
			// start new mark 2 workers.
			forEachP(func(_p_ *p) {
				wbBufFlush1(_p_)
				_p_.gcw.dispose()
			})
		})

		// Check that roots are marked. We should be able to
		// do this before the forEachP, but based on issue
		// #16083 there may be a (harmless) race where we can
		// enter mark 2 while some workers are still scanning
		// stacks. The forEachP ensures these scans are done.
		//
		// TODO(austin): Figure out the race and fix this
		// properly.
		// Check that all root files are marked
		gcMarkRootCheck()

		// Now we can start up mark 2 workers.
		atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 0xffffffff)
		gcController.fractionalUtilizationGoal = prevFractionalGoal

		incnwait := atomic.Xadd(&work.nwait, +1)
		// If there is no more work, loop back to the top; that second pass transitions from mark 2 to mark termination
		if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
			// This loop will make progress because
			// gcBlackenPromptly is now true, so it won't
			// take this same "if" branch.
			goto top
		}
	} else {
		// Transition to mark termination.
		now := nanotime()
		work.tMarkTerm = now
		work.pauseStart = now
		getg().m.preemptoff = "gcing"
		if trace.enabled {
			traceGCSTWStart(0)
		}
		systemstack(stopTheWorldWithSema)
		// The gcphase is _GCmark, it will transition to _GCmarktermination
		// below. The important thing is that the wb remains active until
		// all marking is complete. This includes writes made by the GC.

		// Record that one root marking pass has completed.
		work.markrootDone = true

		// Disable assists and background workers. We must do
		// this before waking blocked assists.
		atomic.Store(&gcBlackenEnabled, 0)

		// Wake all blocked assists. These will run when we
		// start the world again.
		// Wake up all auxiliary GC
		gcWakeAllAssists()

		// Likewise, release the transition lock. Blocked
		// workers and assists will run when we start the
		// world again.
		semrelease(&work.markDoneSema)

		// endCycle depends on all gcWork cache stats being
		// flushed. This is ensured by mark 2.
		// Calculate the trigger ratio for the next GC cycle
		nextTriggerRatio := gcController.endCycle()

		// Perform mark termination. This will restart the world.
		// Start the world and enter the completion phase
		gcMarkTermination(nextTriggerRatio)
	}
}
gcMarkTermination

Finish marking and clean up

func gcMarkTermination(nextTriggerRatio float64) {
	// World is stopped.
	// Start marktermination which includes enabling the write barrier.
	atomic.Store(&gcBlackenEnabled, 0)
	gcBlackenPromptly = false
	// Set the GC phase identifier
	setGCPhase(_GCmarktermination)

	work.heap1 = memstats.heap_live
	startTime := nanotime()

	mp := acquirem()
	mp.preemptoff = "gcing"
	_g_ := getg()
	_g_.m.traceback = 2
	gp := _g_.m.curg
	// Set the state of g to waiting
	casgstatus(gp, _Grunning, _Gwaiting)
	gp.waitreason = waitReasonGarbageCollection

	// Run gc on the g0 stack. We do this so that the g stack
	// we're currently running on will no longer change. Cuts
	// the root set down a bit (g0 stacks are not scanned, and
	// we don't need to scan gc's internal state). We also
	// need to switch to g0 so we can shrink the stack.
	systemstack(func() {
		// Scan the current g stack through g0
		gcMark(startTime)
		// Must return immediately.
		// The outer function's stack may have moved
		// during gcMark (it shrinks stacks, including the
		// outer function's stack), so we must not refer
		// to any of its variables. Return back to the
		// non-system stack to pick up the new addresses
		// before continuing.
	})

	systemstack(func() {
		work.heap2 = work.bytesMarked
		if debug.gccheckmark > 0 {
			// Run a full stop-the-world mark using checkmark bits,
			// to check that we didn't forget to mark anything during
			// the concurrent mark process.
			// If gccheckmark is enabled, all reachable objects are marked
			gcResetMarkState()
			initCheckmarks()
			gcMark(startTime)
			clearCheckmarks()
		}

		// marking is complete so we can turn the write barrier off
		// Set the gc phase identifier, GCoff will turn off the write barrier
		setGCPhase(_GCoff)
		// Start cleaning
		gcSweep(work.mode)

		if debug.gctrace > 1 {
			startTime = nanotime()
			// The g stacks have been scanned so
			// they have gcscanvalid==true and gcworkdone==true.
			// Reset these so that all stacks will be rescanned.
			gcResetMarkState()
			finishsweep_m()

			// Still in STW but gcphase is _GCoff, reset to _GCmarktermination
			// At this point all objects will be found during the gcMark which
			// does a complete STW mark and object scan.
			setGCPhase(_GCmarktermination)
			gcMark(startTime)
			setGCPhase(_GCoff) // marking is done, turn off wb.
			gcSweep(work.mode)
		}
	})

	_g_.m.traceback = 0
	casgstatus(gp, _Gwaiting, _Grunning)

	if trace.enabled {
		traceGCDone()
	}

	// all done
	mp.preemptoff = ""

	if gcphase != _GCoff {
		throw("gc done but gcphase != _GCoff")
	}

	// Update GC trigger and pacing for the next cycle.
	// Update the GC trigger ratio for the next cycle
	gcSetTriggerRatio(nextTriggerRatio)

	// Update timing memstats
	// Update time
	now := nanotime()
	sec, nsec, _ := time_now()
	unixNow := sec*1e9 + int64(nsec)
	work.pauseNS += now - work.pauseStart
	work.tEnd = now
	atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
	atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
	memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
	memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
	memstats.pause_total_ns += uint64(work.pauseNS)

	// Update work.totaltime.
	sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
	// We report idle marking time below, but omit it from the
	// overall utilization here since it's "free".
	markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
	markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
	cycleCpu := sweepTermCpu + markCpu + markTermCpu
	work.totaltime += cycleCpu

	// Compute overall GC CPU utilization.
	totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
	memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)

	// Reset sweep state.
	// Reset the state of the sweep
	sweep.nbgsweep = 0
	sweep.npausesweep = 0

	// If this GC was forced by the user, increment the forced-GC counter
	if work.userForced {
		memstats.numforcedgc++
	}

	// Bump GC cycle count and wake goroutines waiting on sweep.
	// Count the number of times GC is performed and wake up G waiting to be cleaned
	lock(&work.sweepWaiters.lock)
	memstats.numgc++
	injectglist(work.sweepWaiters.head.ptr())
	work.sweepWaiters.head = 0
	unlock(&work.sweepWaiters.lock)

	// Finish the current heap profiling cycle and start a new
	// heap profiling cycle. We do this before starting the world
	// so events don't leak into the wrong cycle.
	mProf_NextCycle()
	// start the world
	systemstack(func() { startTheWorldWithSema(true) })

	// Flush the heap profile so we can start a new cycle next GC.
	// This is relatively expensive, so we don't do it with the
	// world stopped.
	mProf_Flush()

	// Prepare workbufs for freeing by the sweeper. We do this
	// asynchronously because it can take non-trivial time.
	prepareFreeWorkbufs()

	// Free stack spans. This must be done between GC cycles.
	systemstack(freeStackSpans)

	// Print gctrace before dropping worldsema. As soon as we drop
	// worldsema another cycle could start and smash the stats
	// we're trying to print.
	if debug.gctrace > 0 {
		util := int(memstats.gc_cpu_fraction * 100)

		var sbuf [24]byte
		printlock()
		print("gc ", memstats.numgc,
			"@".string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6.3)), "s ",
			util, "%.")
		prev := work.tSweepTerm
		for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
			if i != 0 {
				print("+")
			}
			print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
			prev = ns
		}
		print(" ms clock, ")
		for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
			if i == 2 || i == 3 {
				// Separate mark time components with /.
				print("/")}else ifi ! =0 {
				print("+")}print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
		}
		print(" ms cpu, ",
			work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
			work.heapGoal>>20, " MB goal, ",
			work.maxprocs, " P")
		if work.userForced {
			print(" (forced)")}print("\n")
		printunlock()
	}

	semrelease(&worldsema)
	// Careful: another GC cycle may start now.

	releasem(mp)
	mp = nil

	// now that gc is done, kick off finalizer thread if needed
	// If it is not parallel GC, let the current M start scheduling
	if !concurrentSweep {
		// give the queued finalizers, if any, a chance to run
		Gosched()
	}
}
gcSweep

Cleaning tasks

func gcSweep(mode gcMode) {
	if gcphase != _GCoff {
		throw("gcSweep being done but phase is not GCoff")
	}

	lock(&mheap_.lock)
	// Sweepgen grows by 2 after every GC, and sweepSpans change roles after every GC
	mheap_.sweepgen += 2
	mheap_.sweepdone = 0
	if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
		// We should have drained this list during the last
		// sweep phase. We certainly need to start this phase
		// with an empty swept list.
		throw("non-empty swept list")
	}
	mheap_.pagesSwept = 0
	unlock(&mheap_.lock)
	// If this is not a concurrent sweep, or a forced blocking GC
	if !_ConcurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		// Record that no proportional sweeping has to happen.
		lock(&mheap_.lock)
		mheap_.sweepPagesPerByte = 0
		unlock(&mheap_.lock)
		// Sweep all spans eagerly.
		// Clear all span
		for sweepone() != ^uintptr(0) {
			sweep.npausesweep++
		}
		// Free workbufs eagerly.
		// Release all workbufs
		prepareFreeWorkbufs()
		for freeSomeWbufs(false) {
		}

		// All "free" events for this mark/sweep cycle have
		// now happened, so we can make this profile cycle
		// available immediately.
		mProf_NextCycle()
		mProf_Flush()
		return
	}

	// Background sweep.
	lock(&sweep.lock)
	// Wake up the background sweep, also known as the bgsweep function, similar to the non-parallel sweep above
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
}
sweepone

Let’s take a look at the sweepone process

func sweepone(a) uintptr {
	_g_ := getg()
	sweepRatio := mheap_.sweepPagesPerByte // For debugging

	// increment locks to ensure that the goroutine is not preempted
	// in the middle of sweep thus leaving the span in an inconsistent state for next GC
	_g_.m.locks++
	// Check to see if cleaning is complete
	if atomic.Load(&mheap_.sweepdone) != 0 {
		_g_.m.locks--
		return ^uintptr(0)
	}
	// Increment the number of active sweepers
	atomic.Xadd(&mheap_.sweepers, +1)

	npages := ^uintptr(0)
	sg := mheap_.sweepgen
	for {
		// Loop to get the span to clean
		s := mheap_.sweepSpans[1-sg/2%2].pop()
		if s == nil {
			atomic.Store(&mheap_.sweepdone, 1)
			break
		}
		if s.state != mSpanInUse {
			// This can happen if direct sweeping already
			// swept this span, but in that case the sweep
			// generation should always be up-to-date.
			if s.sweepgen != sg {
				print("runtime: bad span s.state=", s.state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
				throw("non in-use span in unswept list")
			}
			continue
		}
		// sweepgen == h->sweepgen - 2, indicating that the span needs to be cleaned
		// sweepgen == h->sweepgen - 1, which indicates that the span is being cleaned
		// This is to determine the state of the span and try to convert the state of the span
		if s.sweepgen != sg-2 || !atomic.Cas(&s.sweepgen, sg-2, sg-1) {
			continue
		}
		npages = s.npages
		// Sweep a single span
		if !s.sweep(false) {
			// Span is still in-use, so this returned no
			// pages to the heap and the span needs to
			// move to the swept in-use list.
			npages = 0
		}
		break
	}

	// Decrement the number of active sweepers and if this is the
	// last one print trace information.
	// The number of sweepers is updated when the current worker's cleaning task is complete
	if atomic.Xadd(&mheap_.sweepers, -1) == 0 && atomic.Load(&mheap_.sweepdone) != 0 {
		if debug.gcpacertrace > 0 {
			print("pacer: sweep done at heap size ", memstats.heap_live>>20."MB; allocated ", (memstats.heap_live-mheap_.sweepHeapLiveBasis)>>20."MB during sweep; swept ", mheap_.pagesSwept, " pages at ", sweepRatio, " pages/byte\n")
		}
	}
	_g_.m.locks--
	return npages
}
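The sweepgen checks in sweepone follow the state machine documented in the runtime's mheap.go; restated here as a sketch:

// Relative to the global mheap_.sweepgen (h.sweepgen), which is incremented
// by 2 at the start of every sweep cycle:
//
//   s.sweepgen == h.sweepgen - 2  // the span needs sweeping
//   s.sweepgen == h.sweepgen - 1  // the span is currently being swept
//   s.sweepgen == h.sweepgen      // the span has been swept and is ready to use
//
// sweepone claims a span by CASing it from h.sweepgen-2 to h.sweepgen-1,
// and mspan.sweep stores h.sweepgen once the span is fully swept.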
mspan.sweep
func (s *mspan) sweep(preserve bool) bool {
	// It's critical that we enter this function with preemption disabled,
	// GC must not start while we are in the middle of this function.
	_g_ := getg()
	if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
		throw("MSpan_Sweep: m is not locked")
	}
	sweepgen := mheap_.sweepgen
	// Only the span in the clearing state can be executed properly
	if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
		print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
		throw("MSpan_Sweep: bad span state")}if trace.enabled {
		traceGCSweepSpan(s.npages * _PageSize)
	}
	// Update the number of pages cleaned first
	atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))

	spc := s.spanclass
	size := s.elemsize
	res := false

	c := _g_.m.mcache
	freeToHeap := false

	// The allocBits indicate which unmarked objects don't need to be
	// processed since they were free at the end of the last GC cycle
	// and were not allocated since then.
	// If the allocBits index is >= s.freeindex and the bit
	// is not marked then the object remains unallocated
	// since the last GC.
	// This situation is analogous to being on a freelist.

	// Unlink & free special records for any objects we're about to free.
	// Two complications here:
	// 1. An object can have both finalizer and profile special records.
	// In such case we need to queue finalizer for execution,
	// mark the object as live and preserve the profile special.
	// 2. A tiny object can have several finalizers setup for different offsets.
	// If such object is not marked, we need to queue all finalizers at once.
	// Both 1 and 2 are possible at the same time.
	specialp := &s.specials
	special := *specialp
	// Determine whether the objects in special exist and have at least one Finalizer. Release the objects that do not have Finalizer and queue the objects that have Finalizer
	for special != nil {
		// A finalizer can be set for an inner byte of an object, find object beginning.
		objIndex := uintptr(special.offset) / size
		p := s.base() + objIndex*size
		mbits := s.markBitsForIndex(objIndex)
		if !mbits.isMarked() {
			// This object is not marked and has at least one special record.
			// Pass 1: see if it has at least one finalizer.
			hasFin := false
			endOffset := p - s.base() + size
			for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
				if tmp.kind == _KindSpecialFinalizer {
					// Stop freeing of object if it has a finalizer.
					mbits.setMarkedNonAtomic()
					hasFin = true
					break
				}
			}
			// Pass 2: queue all finalizers _or_ handle profile record.
			for special != nil && uintptr(special.offset) < endOffset {
				// Find the exact byte for which the special was setup
				// (as opposed to object beginning).
				p := s.base() + uintptr(special.offset)
				if special.kind == _KindSpecialFinalizer || !hasFin {
					// Splice out special record.
					y := special
					special = special.next
					*specialp = special
					freespecial(y, unsafe.Pointer(p), size)
				} else {
					// This is profile record, but the object has finalizers (so kept alive).
					// Keep special record.
					specialp = &special.next
					special = *specialp
				}
			}
		} else {
			// object is still live: keep special record
			specialp = &special.next
			special = *specialp
		}
	}

	if debug.allocfreetrace != 0 || raceenabled || msanenabled {
		// Find all newly freed objects. This doesn't have to
		// efficient; allocfreetrace has massive overhead.
		mbits := s.markBitsForBase()
		abits := s.allocBitsForIndex(0)
		for i := uintptr(0); i < s.nelems; i++ {
			if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
				x := s.base() + i*s.elemsize
				if debug.allocfreetrace != 0 {
					tracefree(unsafe.Pointer(x), size)
				}
				if raceenabled {
					racefree(unsafe.Pointer(x), size)
				}
				if msanenabled {
					msanfree(unsafe.Pointer(x), size)
				}
			}
			mbits.advance()
			abits.advance()
		}
	}

	// Count the number of free objects in this span.
	// Get the total number of alloc objects to be released
	nalloc := uint16(s.countAlloc())
	// If sizeclass is 0 and the total amount allocated is 0, it is released to the Mheap
	if spc.sizeclass() == 0 && nalloc == 0 {
		s.needzero = 1
		freeToHeap = true
	}
	nfreed := s.allocCount - nalloc
	if nalloc > s.allocCount {
		print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
		throw("sweep increased allocation count")
	}

	s.allocCount = nalloc
	// Check whether the span is empty
	wasempty := s.nextFreeIndex() == s.nelems
	// Reset freeindex
	s.freeindex = 0 // reset allocation index to start of span.
	if trace.enabled {
		getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
	}

	// gcmarkBits becomes the allocBits.
	// get a fresh cleared gcmarkBits in preparation for next GC
	// Reset allocBits to gcMarkBits
	s.allocBits = s.gcmarkBits
	// Reset gcmarkBits
	s.gcmarkBits = newMarkBits(s.nelems)

	// Initialize alloc bits cache.
	// Update the allocCache
	s.refillAllocCache(0)

	// We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
	// because of the potential for a concurrent free/SetFinalizer.
	// But we need to set it before we make the span available for allocation
	// (return it to heap or mcentral), because allocation code assumes that a
	// span is already swept if available for allocation.
	if freeToHeap || nfreed == 0 {
		// The span must be in our exclusive ownership until we update sweepgen,
		// check for potential races.
		if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
			print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
			throw("MSpan_Sweep: bad span state after sweep")}// Serialization point.
		// At this point the mark bits are cleared and allocation ready
		// to go so release the span.
		atomic.Store(&s.sweepgen, sweepgen)
	}

	if nfreed > 0 && spc.sizeclass() != 0 {
		c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
		// Return the span to mcentral
		res = mheap_.central[spc].mcentral.freeSpan(s, preserve, wasempty)
		// MCentral_FreeSpan updates sweepgen
	} else if freeToHeap {
		// Here the span of a large object is released, matching the freeToHeap case above
		// Free large span to heap

		// NOTE(rsc,dvyukov): The original implementation of efence
		// in CL 22060046 used SysFree instead of SysFault, so that
		// the operating system would eventually give the memory
		// back to us again, so that an efence program could run
		// longer without running out of memory. Unfortunately,
		// calling SysFree here without any kind of adjustment of the
		// heap data structures means that when the memory does
		// come back to us, we have the wrong metadata for it, either in
		// the MSpan structures or in the garbage collection bitmap.
		// Using SysFault here means that the program will run out of
		// memory fairly quickly in efence mode, but at least it won't
		// have mysterious crashes due to confused memory reuse.
		// It should be possible to switch back to SysFree if we also
		// implement and then call some kind of MHeap_DeleteSpan.
		if debug.efence > 0 {
			s.limit = 0 // prevent mlookup from finding this span
			sysFault(unsafe.Pointer(s.base()), size)
		} else {
			// Release the span back to the mheap
			mheap_.freeSpan(s, 1)
		}
		c.local_nlargefree++
		c.local_largefree += size
		res = true
	}
	if !res {
		// The span has been swept and is still in-use, so put
		// it on the swept in-use list.
		// If the span was not released to mcentral or the mheap, it is still in use
		mheap_.sweepSpans[sweepgen/2%2].push(s)
	}
	return res
}

Ok, that completes the walk-through of Go's GC flow; it should be easier to understand alongside the diagram at the top.

Reference documentation

  1. Golang source code exploration (three) GC implementation principle
  2. Notes on Language Learning in Go
  3. Learn about tricolor notation in one picture