Go's GC has been criticized since the language's early days, but with the tri-color concurrent marking introduced in Go 1.5 and the hybrid write barrier introduced in Go 1.8, a normal GC pause has been reduced to around 10µs, which is pretty good. Let's explore how Go's GC works.
Tri-color marking principle
Let's start by looking at a picture that gives a general idea of tri-color marking:
Principle (a small simulation sketch follows this list):
- Initially, put all objects in the white set
- Traverse from the root objects; every white object reached is moved from the white set into the gray set
- Iterate over the gray set: move the white objects referenced by each gray object into the gray set, and move the gray object that has just been scanned into the black set
- Repeat step 3 until the gray set is empty
- After step 4, the objects remaining in the white set are unreachable, i.e. garbage, and can be collected
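To make these steps concrete, here is a tiny, self-contained simulation of tri-color marking. It is illustrative only: the node type and the mark function are invented for this sketch and have nothing to do with the runtime's actual implementation.

package main

import "fmt"

// node is a toy heap object; children are the pointers it holds.
type node struct {
	name     string
	children []*node
}

// mark runs the tri-color algorithm: every object starts white,
// the roots are greyed, and grey objects are scanned until none remain.
// An object is "grey" while it sits in the grey worklist and becomes
// "black" once it has been scanned.
func mark(roots []*node) map[*node]bool {
	black := make(map[*node]bool)          // scanned, known alive
	grey := append([]*node(nil), roots...) // waiting to be scanned

	for len(grey) > 0 {
		obj := grey[len(grey)-1]
		grey = grey[:len(grey)-1]
		if black[obj] {
			continue
		}
		black[obj] = true // blacken after scanning
		for _, child := range obj.children {
			if !black[child] {
				grey = append(grey, child) // white -> grey
			}
		}
	}
	return black // anything not in this set is still white: garbage
}

func main() {
	c := &node{name: "C"}
	b := &node{name: "B", children: []*node{c}}
	a := &node{name: "A", children: []*node{b}}
	orphan := &node{name: "orphan"} // reachable from no root

	live := mark([]*node{a})
	for _, n := range []*node{a, b, c, orphan} {
		fmt.Println(n.name, "reachable:", live[n])
	}
}

Running this prints that A, B and C are reachable while "orphan" stays white, i.e. it would be collected.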
Write barriers
Go does not stop the world for the whole of the tri-color mark phase, which means the object graph can still be modified while it is being scanned.
So consider the following situation:
We are scanning the gray set and have reached object A, marking all of A's references; we are now about to scan the references of object D. At this point, another goroutine changes the D->E reference so that the graph looks like the figure below.
Would this cause E to never be scanned, leaving it white and therefore mistakenly collected as garbage?
The write barrier is designed to solve exactly this problem. With the write barrier in place, E is treated as alive after the steps above; even if A later drops its reference to E, E will simply be reclaimed in the next GC cycle rather than being freed incorrectly in this one.
The hybrid write barrier was introduced in Go 1.8. Its pseudocode is as follows:
writePointer(slot, ptr):
    shade(*slot)
    if any stack is grey:
        shade(ptr)
    *slot = ptr
The hybrid write barrier shades both the “old pointer” being overwritten and the “new pointer” being written.
The old pointer is shaded because another running thread may simultaneously copy the pointer's value into a register or a stack-local variable; such copies do not go through the write barrier, so the pointee might otherwise never be marked. For example:
[go] b = obj
[go] oldx = nil
[gc] scan oldx...
[go] oldx = b.x  // copy b.x into a local variable, without a write barrier
[go] b.x = ptr   // the write barrier should shade the old value of b.x
[gc] scan b...
If the write barrier does not shade the old value, oldx will never be scanned.
The new pointer is shaded because another running thread may move the pointer elsewhere. Consider the following:
[go] a = ptr
[go] b = obj
[gc] scan b...
[go] b.x = a     // the write barrier should shade the new value of b.x
[go] a = nil
[gc] scan a...
If the write barrier does not shade the new value, ptr will never be scanned.
The hybrid write barrier lets the GC avoid rescanning the stack of every G after the concurrent mark finishes, which greatly reduces the STW time of Mark Termination.
In addition to the write barrier, every newly allocated object is immediately marked black during GC, as can be seen in the mallocgc function discussed earlier.
The collection process
Go's GC is a concurrent GC: most of its work runs at the same time as normal Go code, which makes the whole process more complex. The GC has four phases:
- Sweep Termination: sweep any unswept spans; a new GC cycle can start only after the previous cycle's sweeping is complete
- Mark: scan all root objects and every object reachable from them, marking them so they are not collected
- Mark Termination: finish marking and rescan some of the root objects (requires STW)
- Sweep: sweep the spans according to the mark results
Here is a more complete picture of the GC flow, with the four phases color-coded:
There are two kinds of background tasks (Gs) during GC: one for marking and one for sweeping. Mark workers are started when needed, and the number that can work at the same time is about 25% of the number of Ps, which is where the claim that Go spends 25% of the CPU on GC comes from. The sweep worker is started when the program starts and is woken up when the sweep phase begins.
Currently the GC stops the world (STW) twice per cycle: once at the beginning of the Mark phase and once at the Mark Termination phase. The first STW prepares the scan of the root objects and enables the write barrier and mutator assists. The second STW rescans some of the root objects and disables the write barrier and mutator assists. Note that not all root-object scanning requires STW; scanning the objects on a goroutine's stack, for example, only requires stopping that G. Because the write barrier used is the hybrid write barrier, the second STW is much shorter.
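To get a feel for these phases and pause times in practice, you can run any program with GODEBUG=gctrace=1 (the trace line is printed by gcMarkTermination, whose source appears later in this article), or read the GC statistics programmatically. Below is a minimal sketch using only the standard runtime package; the allocation loop is just an arbitrary way to give the collector some work.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Allocate some garbage so the collector has something to do.
	for i := 0; i < 100; i++ {
		_ = make([]byte, 1<<20)
	}

	// Force one full cycle: sweep termination -> mark -> mark termination -> sweep.
	runtime.GC()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println("completed GC cycles:", m.NumGC)
	fmt.Println("most recent pause (ns):", m.PauseNs[(m.NumGC+255)%256])
	fmt.Println("total pause (ns):", m.PauseTotalNs)
	fmt.Printf("GC CPU fraction: %.4f\n", m.GCCPUFraction)
}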
Source code analysis
gcStart
func gcStart(mode gcMode, trigger gcTrigger) {
// Since this is called from malloc and malloc is called in
// the guts of a number of libraries that might be holding
// locks, don't attempt to start GC in non-preemptible or
// potentially unstable situations.
// Check whether g is preempt. GC is not triggered if g is not preempt
mp := acquirem()
if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
releasem(mp)
return
}
releasem(mp)
mp = nil
// Pick up the remaining unswept/not being swept spans concurrently
//
// This shouldn't happen if we're being invoked in background
// mode since proportional sweep should have just finished
// sweeping everything, but rounding errors, etc, may leave a
// few spans unswept. In forced mode, this is necessary since
// GC can be forced at any point in the sweeping cycle.
//
// We check the transition condition continuously here in case
// this G gets delayed in to the next GC cycle.
// Clean up the remaining uncleaned garbage
for trigger.test() && gosweepone() != ^uintptr(0) {
sweep.nbgsweep++
}
// Perform GC initialization and the sweep termination
// transition.
semacquire(&work.startSema)
// Re-check transition condition under transition lock.
// Check whether the gcTrigger condition is true
if !trigger.test() {
semrelease(&work.startSema)
return
}
// For stats, check if this GC was forced by the user
// runtime.GC() can be called by the user to force a GC
work.userForced = trigger.kind == gcTriggerAlways || trigger.kind == gcTriggerCycle
// In gcstoptheworld debug mode, upgrade the mode accordingly.
// We do this after re-checking the transition condition so
// that multiple goroutines that detect the heap trigger don't
// start multiple STW GCs.
// Set the mode for gc
if mode == gcBackgroundMode {
if debug.gcstoptheworld == 1 {
mode = gcForceMode
} else if debug.gcstoptheworld == 2 {
mode = gcForceBlockMode
}
}
// Ok, we're doing it! Stop everybody else
semacquire(&worldsema)
if trace.enabled {
traceGCStart()
}
// Start the background mark workers
if mode == gcBackgroundMode {
gcBgMarkStartWorkers()
}
// Resets the state associated with gc flags
gcResetMarkState()
work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
if work.stwprocs > ncpu {
// This is used to compute CPU time of the STW phases,
// so it can't be more than ncpu, even if GOMAXPROCS is.
work.stwprocs = ncpu
}
work.heap0 = atomic.Load64(&memstats.heap_live)
work.pauseNS = 0
work.mode = mode
now := nanotime()
work.tSweepTerm = now
work.pauseStart = now
if trace.enabled {
traceGCSTWStart(1)
}
// STW: stop the world
systemstack(stopTheWorldWithSema)
// Finish sweep before we start concurrent scan.
// Clean up the garbage from the previous round
systemstack(func() {
finishsweep_m()
})
// clearpools before we start the GC. If we wait they memory will not be
// reclaimed until the next GC cycle.
// Clear sync.Pool, sched.sudogcache and sched.deferpool
clearpools()
// Increment the GC count
work.cycles++
if mode == gcBackgroundMode { // Do as much work concurrently as possible
gcController.startCycle()
work.heapGoal = memstats.next_gc
// Enter concurrent mark phase and enable
// write barriers.
//
// Because the world is stopped, all Ps will
// observe that write barriers are enabled by
// the time we start the world and begin
// scanning.
//
// Write barriers must be enabled before assists are
// enabled because they must be enabled before
// any non-leaf heap objects are marked. Since
// allocations are blocked until assists can
// happen, we want enable assists as early as
// possible.
// Set the GC state to gcMark
setGCPhase(_GCmark)
// Update the bgMark status
gcBgMarkPrepare() // Must happen before assist enable.
// Calculate and queue root scan tasks and initialize the status of related scan tasks
gcMarkRootPrepare()
// Mark all active tinyalloc blocks. Since we're
// allocating from these, they need to be black like
// other allocations. The alternative is to blacken
// the tiny block on every allocation from it, which
// would slow down the tiny allocator.
// Mark tiny objects
gcMarkTinyAllocs()
// At this point all Ps have enabled the write
// barrier, thus maintaining the no white to
// black invariant. Enable mutator assists to
// put back-pressure on fast allocating
// mutators.
// Set gcBlackenEnabled to 1 to enable mutator assists and background mark workers
atomic.Store(&gcBlackenEnabled, 1)
// Assists and workers can start the moment we start
// the world.
gcController.markStartTime = now
// Concurrent mark.
systemstack(func() {
now = startTheWorldWithSema(trace.enabled)
})
work.pauseNS += now - work.pauseStart
work.tMark = now
} else {
// Non-parallel mode
// Record the start time of the completion marking phase
if trace.enabled {
// Switch to mark termination STW.
traceGCSTWDone()
traceGCSTWStart(0)
}
t := nanotime()
work.tMark, work.tMarkTerm = t, t
work.heapGoal = work.heap0
// Perform mark termination. This will restart the world.
// STW, mark, clean, and start the world
gcMarkTermination(memstats.triggerRatio)
}
semrelease(&work.startSema)
}
gcBgMarkStartWorkers
This function prepares the goroutines that perform the background mark work. They do not start working immediately; they wait until the GC phase is switched to _GCmark (the setGCPhase(_GCmark) call in gcStart above).
func gcBgMarkStartWorkers(a) {
// Background marking is performed by per-P G's. Ensure that
// each P has a background GC G.
for _, p := range allp {
if p.gcBgMarkWorker == 0 {
go gcBgMarkWorker(p)
// Wait for the bgMarkReady signal from gcBgMarkWorker goroutine to continue
notetsleepg(&work.bgMarkReady, - 1)
noteclear(&work.bgMarkReady)
}
}
}
gcBgMarkWorker
The function that performs the background marking work.
func gcBgMarkWorker(_p_ *p) {
gp := getg()
// Used to retrieve p and m after hibernation
type parkInfo struct {
m muintptr // Release this m on park.
attach puintptr // If non-nil, attach to this p on park.
}
// We pass park to a gopark unlock function, so it can't be on
// the stack (see gopark). Prevent deadlock from recursively
// starting GC by disabling preemption.
gp.m.preemptoff = "GC worker init"
park := new(parkInfo)
gp.m.preemptoff = ""
// Record m and p in park so that, after gopark, gcController.findRunnableGCWorker can find and reschedule this worker
park.m.set(acquirem())
park.attach.set(_p_)
// Inform gcBgMarkStartWorkers that this worker is ready.
// After this point, the background mark worker is scheduled
// cooperatively by gcController.findRunnable. Hence, it must
// never be preempted, as this would put it into _Grunnable
// and put it on a run queue. Instead, when the preempt flag
// is set, this puts itself into _Gwaiting to be woken up by
// gcController.findRunnable at the appropriate time.
// Make gcBgMarkStartWorkers notetsleepg stop waiting and continue and exit
notewakeup(&work.bgMarkReady)
for {
// Go to sleep until woken by gcController.findRunnable.
// We can't releasem yet since even the call to gopark
// may be preempted.
// Let g go to sleep
gopark(func(g *g, parkp unsafe.Pointer) bool {
park := (*parkInfo)(parkp)
// The worker G is no longer running, so it's
// now safe to allow preemption.
// Release the currently preempted m
releasem(park.m.ptr())
// If the worker isn't attached to its P,
// attach now. During initialization and after
// a phase change, the worker may have been
// running on a different P. As soon as we
// attach, the owner P may schedule the
// worker, so this must be done after the G is
// stopped.
// set the association p, which is already set above
if park.attach != 0 {
p := park.attach.ptr()
park.attach.set(nil)
// cas the worker because we may be
// racing with a new worker starting
// on this P.
if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
// The P got a new worker.
// Exit this worker.
return false
}
}
return true
}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
// Loop until the P dies and disassociates this
// worker (the P may later be reused, in which case
// it will get a new worker) or we failed to associate.
// Check whether the gcBgMarkWorker of P is the same as the current gcBgMarkWorker of G. If not, end the current task
if _p_.gcBgMarkWorker.ptr() != gp {
break
}
// Disable preemption so we can use the gcw. If the
// scheduler wants to preempt us, we'll stop draining,
// dispose the gcw, and then preempt.
// Gopark releases m in the first function, and preempts back
park.m.set(acquirem())
if gcBlackenEnabled == 0 {
throw("gcBgMarkWorker: blackening not enabled")
}
startTime := nanotime()
// Set the start time of gcmark
_p_.gcMarkWorkerStartTime = startTime
decnwait := atomic.Xadd(&work.nwait, - 1)
if decnwait == work.nproc {
println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
throw("work.nwait was > work.nproc")}// Switch to g0
systemstack(func(a) {
// Mark our goroutine preemptible so its stack
// can be scanned. This lets two mark workers
// scan each other (otherwise, they would
// deadlock). We must not modify anything on
// the G stack. However, stack shrinking is
// disabled for mark workers, so it is safe to
// read from the G stack.
// Set G to be in a waiting state so that another G can scan its stack.
casgstatus(gp, _Grunning, _Gwaiting)
switch _p_.gcMarkWorkerMode {
default:
throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
case gcMarkWorkerDedicatedMode:
// Focus on executing the pattern of markup work
gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
if gp.preempt {
// all G's in the local runqueue are placed in the global runqueue
// We were preempted. This is
// a useful signal to kick
// everything out of the run
// queue so it can run
// somewhere else.
lock(&sched.lock)
for {
gp, _ := runqget(_p_)
if gp == nil {
break
}
globrunqput(gp)
}
unlock(&sched.lock)
}
// Go back to draining, this time
// without preemption.
// Continue with the marking
gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
case gcMarkWorkerFractionalMode:
// Perform marking until preempted
gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
case gcMarkWorkerIdleMode:
// Perform marking when free
gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
}
// Change G from waiting back to running
casgstatus(gp, _Gwaiting, _Grunning)
})
// If we are nearing the end of mark, dispose
// of the cache promptly. We must do this
// before signaling that we're no longer
// working so that other workers can't observe
// no workers and no work while we have this
// cached, and before we compute done.
// Process the local cache in time and submit it to the global queue
if gcBlackenPromptly {
_p_.gcw.dispose()
}
// Account for time.
// Cumulative time
duration := nanotime() - startTime
switch _p_.gcMarkWorkerMode {
case gcMarkWorkerDedicatedMode:
atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
case gcMarkWorkerFractionalMode:
atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
case gcMarkWorkerIdleMode:
atomic.Xaddint64(&gcController.idleMarkTime, duration)
}
// Was this the last worker and did we run out
// of work?
incnwait := atomic.Xadd(&work.nwait, +1)
if incnwait > work.nproc {
println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
"work.nwait=", incnwait, "work.nproc=", work.nproc)
throw("work.nwait > work.nproc")}// If this worker reached a background mark completion
// point, signal the main GC goroutine.
ifincnwait == work.nproc && ! gcMarkWorkAvailable(nil) {
// Make this G preemptible and disassociate it
// as the worker for this P so
// findRunnableGCWorker doesn't try to
// schedule it.
// remove the association between p and m
_p_.gcBgMarkWorker.set(nil)
releasem(park.m.ptr())
gcMarkDone()
// Disable preemption and prepare to reattach
// to the P.
//
// We may be running on a different P at this
// point, so we can't reattach until this G is
// parked.
park.m.set(acquirem())
park.attach.set(_p_)
}
}
}
gcDrain
gcDrain is the main implementation of tri-color marking.
It scans roots and objects, blackening gray objects, until all roots and objects have been marked (or it is asked to stop).
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
if !writeBarrier.needed {
throw("gcDrain phase incorrect")
}
gp := getg().m.curg
// Whether to stop and return when the preemption flag is set
preemptible := flags&gcDrainUntilPreempt != 0
// Whether to block waiting for work when there is none
blocking := flags&(gcDrainUntilPreempt|gcDrainIdle|gcDrainFractional|gcDrainNoBlock) == 0
// Whether to flush scan credit to the global account, to reduce assist GC and wake waiting Gs
flushBgCredit := flags&gcDrainFlushBgCredit != 0
// Whether this is an idle-time marking task
idle := flags&gcDrainIdle != 0
// Record the initial scan tasks that have been performed
initScanWork := gcw.scanWork
// checkWork is the scan work before performing the next
// self-preempt check.
// Set the work check function for the corresponding mode
checkWork := int64(1<<63 - 1)
var check func(a) bool
if flags&(gcDrainIdle|gcDrainFractional) != 0 {
checkWork = initScanWork + drainCheckThreshold
if idle {
check = pollWork
} else if flags&gcDrainFractional != 0 {
check = pollFractionalWorkerExit
}
}
// Drain root marking jobs.
// If the root object is not completely scanned, it will be scanned
if work.markrootNext < work.markrootJobs {
for !(preemptible && gp.preempt) {
job := atomic.Xadd(&work.markrootNext, +1) - 1
if job >= work.markrootJobs {
break
}
// Perform root scanning
markroot(gcw, job)
if check != nil && check() {
goto done
}
}
}
// Drain heap marking jobs.
// loop until preempted
for !(preemptible && gp.preempt) {
// Try to keep work available on the global queue. We used to
// check if there were waiting workers, but it's better to
// just keep work available than to make workers wait. In the
// worst case, we'll do O(log(_WorkbufSize)) unnecessary
// balances.
if work.full == 0 {
// If the global tag queue is empty, part of the work is done in the global queue
gcw.balance()
}
var b uintptr
if blocking {
b = gcw.get()
} else {
b = gcw.tryGetFast()
if b == 0 {
b = gcw.tryGet()
}
}
// Failed to get the task
if b == 0 {
// work barrier reached or tryGet failed.
break
}
// Scan the object
scanobject(b, gcw)
// Flush background scan work credit to the global
// account if we've accumulated enough locally so
// mutator assists can draw on it.
// If the current number of scanned objects exceeds gcCreditSlack, add the number of scanned objects to the global number and batch update
if gcw.scanWork >= gcCreditSlack {
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
gcFlushBgCredit(gcw.scanWork - initScanWork)
initScanWork = 0
}
checkWork -= gcw.scanWork
gcw.scanWork = 0
// If the number of objects scanned has reached the target number of checkWork for the next preemption, the function of the corresponding mode is called
// In idle mode this is pollWork, and in fractional mode it is pollFractionalWorkerExit (both set above)
if checkWork <= 0 {
checkWork += drainCheckThreshold
if check != nil && check() {
break
}
}
}
}
// In blocking mode, write barriers are not allowed after this
// point because we must preserve the condition that the work
// buffers are empty.
done:
// Flush remaining scan work credit.
if gcw.scanWork > 0 {
// Add the number of scanned objects to the global
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
gcFlushBgCredit(gcw.scanWork - initScanWork)
}
gcw.scanWork = 0
}
}
markroot
This is used for root object scanning
func markroot(gcw *gcWork, i uint32) {
// TODO(austin): This is a bit ridiculous. Compute and store
// the bases in gcMarkRootPrepare instead of the counts.
baseFlushCache := uint32(fixedRootCount)
baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
baseBSS := baseData + uint32(work.nDataRoots)
baseSpans := baseBSS + uint32(work.nBSSRoots)
baseStacks := baseSpans + uint32(work.nSpanRoots)
end := baseStacks + uint32(work.nStackRoots)
// Note: if you add a case here, please also update heapdump.go:dumproots.
switch {
// Flush the mcache (release its spans)
case baseFlushCache <= i && i < baseData:
flushmcache(int(i - baseFlushCache))
// Scan for global variables that can be read and written
case baseData <= i && i < baseBSS:
for _, datap := range activeModules() {
markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
}
// Scan the uninitialized (BSS) global variables
case baseBSS <= i && i < baseSpans:
for _, datap := range activeModules() {
markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
}
// Scan the Finalizer queue
case i == fixedRootFinalizers:
// Only do this once per GC cycle since we don't call
// queuefinalizer during marking.
if work.markrootDone {
break
}
for fb := allfin; fb != nil; fb = fb.alllink {
cnt := uintptr(atomic.Load(&fb.cnt))
scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw)
}
// Release the terminated stack
case i == fixedRootFreeGStacks:
// Only do this once per GC cycle; preferably
// concurrently.
if !work.markrootDone {
// Switch to the system stack so we can call
// stackfree.
systemstack(markrootFreeGStacks)
}
// Scan mspan.specials
case baseSpans <= i && i < baseStacks:
// mark MSpan.specials
markrootSpans(gcw, int(i-baseSpans))
default:
// the rest is scanning goroutine stacks
// Get the g to scan
var gp *g
if baseStacks <= i && i < end {
gp = allgs[i-baseStacks]
} else {
throw("markroot: bad index")}// remember when we've first observed the G blocked
// needed only to output in traceback
status := readgstatus(gp) // We are not in a scan state
if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
gp.waitsince = work.tstart
}
// scang must be done on the system stack in case
// we're trying to scan our own stack.
// forward to G0 for scanning
systemstack(func() {
// If this is a self-scan, put the user G in
// _Gwaiting to prevent self-deadlock. It may
// already be in _Gwaiting if this is a mark
// worker or we're in mark termination.
userG := getg().m.curg
selfScan := gp == userG && readgstatus(userG) == _Grunning
// If it is a scan of its own, then switch the state of its own g
if selfScan {
casgstatus(userG, _Grunning, _Gwaiting)
userG.waitreason = waitReasonGarbageCollectionScan
}
// TODO: scang blocks until gp's stack has
// been scanned, which may take a while for
// running goroutines. Consider doing this in
// two phases where the first is non-blocking:
// we scan the stacks we can and ask running
// goroutines to scan themselves; and the
// second blocks.
// Scan g's stack
scang(gp, gcw)
if selfScan {
casgstatus(userG, _Gwaiting, _Grunning)
}
})
}
}
markRootBlock
Scan the [b0, b0+n0) region according to ptrmask0
func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
if rootBlockBytes%(8*sys.PtrSize) != 0 {
// This is necessary to pick byte offsets in ptrmask0.
throw("rootBlockBytes must be a multiple of 8*ptrSize")
}
b := b0 + uintptr(shard)*rootBlockBytes
// If the block area to be scanned exceeds b0+n0, return directly
if b >= b0+n0 {
return
}
ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*sys.PtrSize))))
n := uintptr(rootBlockBytes)
if b+n > b0+n0 {
n = b0 + n0 - b
}
// Scan this shard.
// Scans the shard of the given block
scanblock(b, n, ptrmask, gcw)
}
scanblock
func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork) {
// Use local copies of original parameters, so that a stack trace
// due to one of the throws below shows the original block
// base and extent.
b := b0
n := n0
for i := uintptr(0); i < n; {
// Find bits for the next word.
// Find the corresponding bits in the bitmap
bits := uint32(*addb(ptrmask, i/(sys.PtrSize*8)))
if bits == 0 {
i += sys.PtrSize * 8
continue
}
for j := 0; j < 8 && i < n; j++ {
if bits&1 != 0 {
// If the address contains a pointer
// Same work as in scanobject; see comments there.
obj := *(*uintptr)(unsafe.Pointer(b + i))
if obj != 0 {
// If the corresponding object is found under this address, mark it with grey
if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
greyobject(obj, b, i, span, gcw, objIndex)
}
}
}
bits >>= 1
i += sys.PtrSize
}
}
}
greyobject
Greying an object simply means finding its mark bit in the bitmap, marking it alive, and putting it on the work queue
func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
// obj should be start of allocation, and so must be at least pointer-aligned.
if obj&(sys.PtrSize-1) != 0 {
throw("greyobject: obj not pointer-aligned")
}
mbits := span.markBitsForIndex(objIndex)
if useCheckmark {
// This is used for debugging to ensure that all objects are correctly identified
if !mbits.isMarked() {
// This object is not marked
printlock()
print("runtime:greyobject: checkmarks finds unexpected unmarked object obj=", hex(obj), "\n")
print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")
// Dump the source (base) object
gcDumpObject("base", base, off)
// Dump the object
gcDumpObject("obj", obj, ^uintptr(0))
getg().m.traceback = 2
throw("checkmark found unmarked object")
}
hbits := heapBitsForAddr(obj)
if hbits.isCheckmarked(span.elemsize) {
return
}
hbits.setCheckmarked(span.elemsize)
if !hbits.isCheckmarked(span.elemsize) {
throw("setCheckmarked and isCheckmarked disagree")
}
} else {
if debug.gccheckmark > 0 && span.isFree(objIndex) {
print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
gcDumpObject("base", base, off)
gcDumpObject("obj", obj, ^uintptr(0))
getg().m.traceback = 2
throw("marking free object")}// If marked we have nothing to do.
// The object is marked correctly, no further action is required
if mbits.isMarked() {
return
}
// mbits.setMarked() // Avoid extra call overhead with manual inlining.
// Mark the object
atomic.Or8(mbits.bytep, mbits.mask)
// If this is a noscan object, fast-track it to black
// instead of greying it.
// If the object contains no pointers, it only needs to be marked; it does not need to be queued
if span.spanclass.noscan() {
gcw.bytesMarked += uint64(span.elemsize)
return
}
}
// Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
// seems like a nice optimization that can be added back in.
// There needs to be time between the PREFETCH and the use.
// Previously we put the obj in an 8 element buffer that is drained at a rate
// to give the PREFETCH time to do its work.
// Use of PREFETCHNTA might be more appropriate than PREFETCH
// Check whether the object is placed in the queue, if not, the gray step is complete
if !gcw.putFast(obj) {
gcw.put(obj)
}
}
gcWork.putFast
gcWork has two buffers, wbuf1 and wbuf2, for storing gray objects. Gray objects are first added to the wbuf1 buffer; when wbuf1 is full, wbuf1 and wbuf2 are swapped so that wbuf2 takes over the role of wbuf1.
putFast simply tries to put the object into the wbuf1 buffer.
func (w *gcWork) putFast(obj uintptr) bool {
wbuf := w.wbuf1
if wbuf == nil {
// No buffer has been allocated yet; return false
return false
} else if wbuf.nobj == len(wbuf.obj) {
// wbuf1 queue is full, return false
return false
}
// Add the object to the not-yet-full wbuf1 buffer
wbuf.obj[wbuf.nobj] = obj
wbuf.nobj++
return true
}
gcWork.put
put: when wbuf1 is full it swaps the roles of wbuf1 and wbuf2; if both buffers are full, it hands the full buffer over to the global queue and fetches an empty one from it.
func (w *gcWork) put(obj uintptr) {
flushed := false
wbuf := w.wbuf1
if wbuf == nil {
// If wbuf1 does not exist, initialize wbuf1 and wbuf2
w.init()
wbuf = w.wbuf1
// wbuf is empty at this point.
} else if wbuf.nobj == len(wbuf.obj) {
// Wbuf1 is full, change the role of wbuf1 wbuf2
w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
wbuf = w.wbuf1
if wbuf.nobj == len(wbuf.obj) {
// wbuf1 is also full after the swap, meaning both buffers are full
// submit wbuf1 globally and get an empty queue
putfull(wbuf)
wbuf = getempty()
w.wbuf1 = wbuf
// Record that a full buffer was flushed to the global queue
flushed = true
}
}
wbuf.obj[wbuf.nobj] = obj
wbuf.nobj++
// If we put a buffer on full, let the GC controller know so
// it can encourage more workers to run. We delay this until
// the end of put so that w is in a consistent state, since
// enlistWorker may itself manipulate w.
// Let the GC controller schedule more mark workers
if flushed && gcphase == _GCmark {
gcController.enlistWorker()
}
}
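To make the wbuf1/wbuf2 double-buffering easier to picture, here is a simplified, self-contained sketch of the same idea. It is illustrative only: the types and the mutex-guarded global list below are invented for this sketch and are far simpler than the runtime's actual gcWork machinery.

package main

import (
	"fmt"
	"sync"
)

const bufSize = 4 // the runtime uses much larger buffers

type workBuf struct {
	obj [bufSize]uintptr
	n   int
}

var (
	globalMu   sync.Mutex
	globalFull []*workBuf // full buffers available to any worker
)

// localWork mimics the two-buffer scheme: fill wbuf1, swap with wbuf2
// when it is full, and hand a buffer to the global list when both are full.
type localWork struct {
	wbuf1, wbuf2 *workBuf
}

func (w *localWork) put(obj uintptr) {
	if w.wbuf1.n == bufSize {
		// wbuf1 is full: swap roles with wbuf2.
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if w.wbuf1.n == bufSize {
			// Both are full: hand one to the global list and start a fresh one.
			globalMu.Lock()
			globalFull = append(globalFull, w.wbuf1)
			globalMu.Unlock()
			w.wbuf1 = &workBuf{}
		}
	}
	w.wbuf1.obj[w.wbuf1.n] = obj
	w.wbuf1.n++
}

func main() {
	w := &localWork{wbuf1: &workBuf{}, wbuf2: &workBuf{}}
	for i := uintptr(1); i <= 10; i++ {
		w.put(i)
	}
	fmt.Println("local wbuf1:", w.wbuf1.n, "objects, wbuf2:", w.wbuf2.n, "objects")
	fmt.Println("buffers flushed to the global list:", len(globalFull))
}

The point of the design is that the common case (wbuf1 not full) touches only per-P state and needs no synchronization, which is why the runtime separates the cheap putFast path from the slower put path.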
From here, let's go back to the helper functions used inside gcDrain and trace how the objects we greyed eventually get marked black.
gcw.balance()
Moving on to the gcw.balance() call inside gcDrain: what does balancing the work mean?
func (w *gcWork) balance(a) {
if w.wbuf1 == nil {
// The wBUF1 wBUf2 queue is not initialized yet
return
}
// If wbuf2 is not empty, hand it to the global queue and fetch an empty buffer for wbuf2
if wbuf := w.wbuf2; wbuf.nobj != 0 {
putfull(wbuf)
w.wbuf2 = getempty()
} else if wbuf := w.wbuf1; wbuf.nobj > 4 {
// Divide the unfilled WBUF1 into two parts and submit half of them to the global queue
w.wbuf1 = handoff(wbuf)
} else {
return
}
// We flushed a buffer to the full list, so wake a worker.
// A full buffer was flushed to the global list, so other workers can pick up work
if gcphase == _GCmark {
gcController.enlistWorker()
}
}
gcw.get()
Continuing with the gcw.get() call in gcDrain: it first tries to take an object from the local wbuf1 buffer; if wbuf1 is empty it swaps in wbuf2; and if both are empty it tries to fetch a full buffer from the global queue and take an object from that.
func (w *gcWork) get(a) uintptr {
wbuf := w.wbuf1
if wbuf == nil {
w.init()
wbuf = w.wbuf1
// wbuf is empty at this point.
}
if wbuf.nobj == 0 {
// Wbuf1 is empty, change the wbuf1 wbuf2 role
w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
wbuf = w.wbuf1
// Wbuf2 is also empty, try to get a full queue from the global queue
if wbuf.nobj == 0 {
owbuf := wbuf
wbuf = getfull()
// If no, return
if wbuf == nil {
return 0
}
// Upload the empty queue to the global empty queue and use the full queue as its own wbuf1
putempty(owbuf)
w.wbuf1 = wbuf
}
}
// TODO: This might be a good place to add prefetch code
wbuf.nobj--
return wbuf.obj[wbuf.nobj]
}
gcWork.tryGet() and gcWork.tryGetFast() are similar and are not listed here.
scanobject
We continue with the scanobject call in gcDrain: at this point an object b has been taken from the queue and we start consuming it.
func scanobject(b uintptr, gcw *gcWork) {
// Find the bits for b and the size of the object at b.
//
// b is either the beginning of an object, in which case this
// is the size of the object to scan, or it points to an
// oblet, in which case we compute the size to scan below.
// Get the bits corresponding to b
hbits := heapBitsForAddr(b)
// Get the span of b
s := spanOfUnchecked(b)
n := s.elemsize
if n == 0 {
throw("scanobject n == 0")}// If the object is too large, cut and scan again. MaxObletBytes is 128K
if n > maxObletBytes {
// Large object. Break into oblets for better
// parallelism and lower latency.
if b == s.base() {
// It's possible this is a noscan object (not
// from greyobject, but from other code
// paths), in which case we must *not* enqueue
// oblets since their bitmaps will be
// uninitialized.
// If the object contains no pointers, just mark it black and skip scanning
if s.spanclass.noscan() {
// Bypass the whole scan.
gcw.bytesMarked += uint64(n)
return
}
// Enqueue the other oblets to scan later.
// Some oblets may be in b's scalar tail, but
// these will be marked as "no more pointers",
// so we'll drop out immediately when we go to
// scan those.
// Split into maxObletBytes chunks and enqueue them
for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
if !gcw.putFast(oblet) {
gcw.put(oblet)
}
}
}
// Compute the size of the oblet. Since this object
// must be a large object, s.base() is the beginning
// of the object.
n = s.base() + s.elemsize - b
if n > maxObletBytes {
n = maxObletBytes
}
}
var i uintptr
for i = 0; i < n; i += sys.PtrSize {
// Find bits for this word.
// Get the corresponding bits
if i != 0 {
// Avoid needless hbits.next() on last iteration.
hbits = hbits.next()
}
// Load bits once. See CL 22712 and issue 16973 for discussion.
bits := hbits.bits()
// During checkmarking, 1-word objects store the checkmark
// in the type bit for the one word. The only one-word objects
// are pointers, or else they'd be merged with other non-pointer
// data into larger allocations.
if i != 1*sys.PtrSize && bits&bitScan == 0 {
break // no more pointers in this object
}
// Not a pointer, continue
if bits&bitPointer == 0 {
continue // not a pointer
}
// Work here is duplicated in scanblock and above.
// If you make changes here, make changes there too.
obj := *(*uintptr)(unsafe.Pointer(b + i))
// At this point we have extracted the next potential pointer.
// Quickly filter out nil and pointers back to the current object.
if obj != 0 && obj-b >= n {
// Test if obj points into the Go heap and, if so,
// mark the object.
//
// Note that it's possible for findObject to
// fail if obj points to a just-allocated heap
// object because of a race with growing the
// heap. In this case, we know the object was
// just allocated and hence will be marked by
// allocation itself.
// Find the object corresponding to the pointer and gray it
if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
greyobject(obj, b, i, span, gcw, objIndex)
}
}
}
gcw.bytesMarked += uint64(n)
gcw.scanWork += int64(i)
}
To summarize: greying an object means marking it and putting it on the work queue, while blackening just means marking it; so once a marked object has been taken off the queue and scanned, it can be considered black.
At this point the analysis of gcDrain's marking work is complete; let's return to the gcBgMarkWorker analysis.
gcMarkDone
gcMarkDone transitions the GC from the mark 1 phase to the mark 2 phase, and from the mark 2 phase to the Mark Termination phase.
- Mark 1 phase: includes root marking, the global cache queue and the per-P local cache queues
- Mark 2 phase: the local cache queues are disabled
func gcMarkDone(a) {
top:
semacquire(&work.markDoneSema)
// Re-check transition condition under transition lock.
if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
semrelease(&work.markDoneSema)
return
}
// Disallow starting new workers so that any remaining workers
// in the current mark phase will drain out.
//
// TODO(austin): Should dedicated workers keep an eye on this
// and exit gcDrain promptly?
// Disable new marking tasks
atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, -0xffffffff)
prevFractionalGoal := gcController.fractionalUtilizationGoal
gcController.fractionalUtilizationGoal = 0
// If gcBlackenPromptly is set, all local cache queues must be flushed to the global queue immediately and local caching is disabled
if !gcBlackenPromptly {
// Transition from mark 1 to mark 2.
//
// The global work list is empty, but there can still be work
// sitting in the per-P work caches.
// Flush and disable work caches.
// Disallow caching workbufs and indicate that we're in mark 2.
// Disable the local cache queue and enter the mark2 phase
gcBlackenPromptly = true
// Prevent completion of mark 2 until we've flushed
// cached workbufs.
atomic.Xadd(&work.nwait, - 1)
// GC is set up for mark 2. Let Gs blocked on the
// transition lock go while we flush caches.
semrelease(&work.markDoneSema)
// Switch to g0 and flush the local caches to the global queue
systemstack(func() {
// Flush all currently cached workbufs and
// ensure all Ps see gcBlackenPromptly. This
// also blocks until any remaining mark 1
// workers have exited their loop so we can
// start new mark 2 workers.
forEachP(func(_p_ *p) {
wbBufFlush1(_p_)
_p_.gcw.dispose()
})
})
// Check that roots are marked. We should be able to
// do this before the forEachP, but based on issue
// #16083 there may be a (harmless) race where we can
// enter mark 2 while some workers are still scanning
// stacks. The forEachP ensures these scans are done.
//
// TODO(austin): Figure out the race and fix this
// properly.
// Check that all roots are marked
gcMarkRootCheck()
// Now we can start up mark 2 workers.
atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 0xffffffff)
gcController.fractionalUtilizationGoal = prevFractionalGoal
incnwait := atomic.Xadd(&work.nwait, +1)
// If there are no more tasks, a second call is made to transition from the Mark2 phase to the Mark Termination phase
if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
// This loop will make progress because
// gcBlackenPromptly is now true, so it won't
// take this same "if" branch.
goto top
}
} else {
// Transition to mark termination.
now := nanotime()
work.tMarkTerm = now
work.pauseStart = now
getg().m.preemptoff = "gcing"
if trace.enabled {
traceGCSTWStart(0)
}
systemstack(stopTheWorldWithSema)
// The gcphase is _GCmark, it will transition to _GCmarktermination
// below. The important thing is that the wb remains active until
// all marking is complete. This includes writes made by the GC.
// Record that one root marking pass has completed.
work.markrootDone = true
// Disable assists and background workers. We must do
// this before waking blocked assists.
atomic.Store(&gcBlackenEnabled, 0)
// Wake all blocked assists. These will run when we
// start the world again.
// Wake up all auxiliary GC
gcWakeAllAssists()
// Likewise, release the transition lock. Blocked
// workers and assists will run when we start the
// world again.
semrelease(&work.markDoneSema)
// endCycle depends on all gcWork cache stats being
// flushed. This is ensured by mark 2.
// Compute the trigger ratio for the next GC
nextTriggerRatio := gcController.endCycle()
// Perform mark termination. This will restart the world.
// Start the world and enter the completion phase
gcMarkTermination(nextTriggerRatio)
}
}
gcMarkTermination
Finish marking and clean up
func gcMarkTermination(nextTriggerRatio float64) {
// World is stopped.
// Start marktermination which includes enabling the write barrier.
atomic.Store(&gcBlackenEnabled, 0)
gcBlackenPromptly = false
// Set the GC phase identifier
setGCPhase(_GCmarktermination)
work.heap1 = memstats.heap_live
startTime := nanotime()
mp := acquirem()
mp.preemptoff = "gcing"
_g_ := getg()
_g_.m.traceback = 2
gp := _g_.m.curg
// Set the state of g to waiting
casgstatus(gp, _Grunning, _Gwaiting)
gp.waitreason = waitReasonGarbageCollection
// Run gc on the g0 stack. We do this so that the g stack
// we're currently running on will no longer change. Cuts
// the root set down a bit (g0 stacks are not scanned, and
// we don't need to scan gc's internal state). We also
// need to switch to g0 so we can shrink the stack.
systemstack(func() {
// Scan the current g stack through g0
gcMark(startTime)
// Must return immediately.
// The outer function's stack may have moved
// during gcMark (it shrinks stacks, including the
// outer function's stack), so we must not refer
// to any of its variables. Return back to the
// non-system stack to pick up the new addresses
// before continuing.
})
systemstack(func() {
work.heap2 = work.bytesMarked
if debug.gccheckmark > 0 {
// Run a full stop-the-world mark using checkmark bits,
// to check that we didn't forget to mark anything during
// the concurrent mark process.
// If gccheckmark is enabled, all reachable objects are marked
gcResetMarkState()
initCheckmarks()
gcMark(startTime)
clearCheckmarks()
}
// marking is complete so we can turn the write barrier off
// Set the gc phase identifier, GCoff will turn off the write barrier
setGCPhase(_GCoff)
// Start cleaning
gcSweep(work.mode)
if debug.gctrace > 1 {
startTime = nanotime()
// The g stacks have been scanned so
// they have gcscanvalid==true and gcworkdone==true.
// Reset these so that all stacks will be rescanned.
gcResetMarkState()
finishsweep_m()
// Still in STW but gcphase is _GCoff, reset to _GCmarktermination
// At this point all objects will be found during the gcMark which
// does a complete STW mark and object scan.
setGCPhase(_GCmarktermination)
gcMark(startTime)
setGCPhase(_GCoff) // marking is done, turn off wb.
gcSweep(work.mode)
}
})
_g_.m.traceback = 0
casgstatus(gp, _Gwaiting, _Grunning)
if trace.enabled {
traceGCDone()
}
// all done
mp.preemptoff = ""
ifgcphase ! = _GCoff { throw("gc done but gcphase ! = _GCoff")}// Update GC trigger and pacing for the next cycle.
// Update the gc growth ratio for the next departure
gcSetTriggerRatio(nextTriggerRatio)
// Update timing memstats
// Update time
now := nanotime()
sec, nsec, _ := time_now()
unixNow := sec*1e9 + int64(nsec)
work.pauseNS += now - work.pauseStart
work.tEnd = now
atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
memstats.pause_total_ns += uint64(work.pauseNS)
// Update work.totaltime.
sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
// We report idle marking time below, but omit it from the
// overall utilization here since it's "free".
markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
cycleCpu := sweepTermCpu + markCpu + markTermCpu
work.totaltime += cycleCpu
// Compute overall GC CPU utilization.
totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)
// Reset sweep state.
// Reset the state of the sweep
sweep.nbgsweep = 0
sweep.npausesweep = 0
// If this GC was forced by the user, increment the forced-GC counter
if work.userForced {
memstats.numforcedgc++
}
// Bump GC cycle count and wake goroutines waiting on sweep.
// Increment the GC count and wake goroutines waiting for the sweep
lock(&work.sweepWaiters.lock)
memstats.numgc++
injectglist(work.sweepWaiters.head.ptr())
work.sweepWaiters.head = 0
unlock(&work.sweepWaiters.lock)
// Finish the current heap profiling cycle and start a new
// heap profiling cycle. We do this before starting the world
// so events don't leak into the wrong cycle.
mProf_NextCycle()
// start the world
systemstack(func() { startTheWorldWithSema(true) })
// Flush the heap profile so we can start a new cycle next GC.
// This is relatively expensive, so we don't do it with the
// world stopped.
mProf_Flush()
// Prepare workbufs for freeing by the sweeper. We do this
// asynchronously because it can take non-trivial time.
prepareFreeWorkbufs()
// Free stack spans. This must be done between GC cycles.
systemstack(freeStackSpans)
// Print gctrace before dropping worldsema. As soon as we drop
// worldsema another cycle could start and smash the stats
// we're trying to print.
if debug.gctrace > 0 {
util := int(memstats.gc_cpu_fraction * 100)
var sbuf [24]byte
printlock()
print("gc ", memstats.numgc,
"@".string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6.3)), "s ",
util, "%.")
prev := work.tSweepTerm
for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
if i != 0 {
print("+")
}
print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
prev = ns
}
print(" ms clock, ")
for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
if i == 2 || i == 3 {
// Separate mark time components with /.
print("/")}else ifi ! =0 {
print("+")}print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
}
print(" ms cpu, ",
work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
work.heapGoal>>20, " MB goal, ",
work.maxprocs, " P")
if work.userForced {
print(" (forced)")}print("\n")
printunlock()
}
semrelease(&worldsema)
// Careful: another GC cycle may start now.
releasem(mp)
mp = nil
// now that gc is done, kick off finalizer thread if needed
// If concurrent sweep is disabled, yield the current M
if !concurrentSweep {
// give the queued finalizers, if any, a chance to run
Gosched()
}
}
gcSweep
Cleaning tasks
func gcSweep(mode gcMode) {
ifgcphase ! = _GCoff { throw("gcSweep being done but phase is not GCoff")
}
lock(&mheap_.lock)
// Sweepgen grows by 2 after every GC, and sweepSpans change roles after every GC
mheap_.sweepgen += 2
mheap_.sweepdone = 0
if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
// We should have drained this list during the last
// sweep phase. We certainly need to start this phase
// with an empty swept list.
throw("non-empty swept list")
}
mheap_.pagesSwept = 0
unlock(&mheap_.lock)
// If concurrent sweep is disabled, or this is a forced blocking GC
if !_ConcurrentSweep || mode == gcForceBlockMode {
// Special case synchronous sweep.
// Record that no proportional sweeping has to happen.
lock(&mheap_.lock)
mheap_.sweepPagesPerByte = 0
unlock(&mheap_.lock)
// Sweep all spans eagerly.
// Sweep all spans
for sweepone() != ^uintptr(0) {
sweep.npausesweep++
}
// Free workbufs eagerly.
// Release all workbufs
prepareFreeWorkbufs()
for freeSomeWbufs(false) {
}
// All "free" events for this mark/sweep cycle have
// now happened, so we can make this profile cycle
// available immediately.
mProf_NextCycle()
mProf_Flush()
return
}
// Background sweep.
lock(&sweep.lock)
// Wake up the background sweep, also known as the bgsweep function, similar to the non-parallel sweep above
if sweep.parked {
sweep.parked = false
ready(sweep.g, 0, true)
}
unlock(&sweep.lock)
}
sweepone
Let’s take a look at the sweepone process
func sweepone(a) uintptr {
_g_ := getg()
sweepRatio := mheap_.sweepPagesPerByte // For debugging
// increment locks to ensure that the goroutine is not preempted
// in the middle of sweep thus leaving the span in an inconsistent state for next GC
_g_.m.locks++
// Check to see if cleaning is complete
if atomic.Load(&mheap_.sweepdone) != 0 {
_g_.m.locks--
return ^uintptr(0)
}
// Increment the number of active sweepers
atomic.Xadd(&mheap_.sweepers, +1)
npages := ^uintptr(0)
sg := mheap_.sweepgen
for {
// Loop to get the span to clean
s := mheap_.sweepSpans[1-sg/2%2].pop()
if s == nil {
atomic.Store(&mheap_.sweepdone, 1)
break
}
if s.state != mSpanInUse {
// This can happen if direct sweeping already
// swept this span, but in that case the sweep
// generation should always be up-to-date.
if s.sweepgen != sg {
print("runtime: bad span s.state=", s.state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
throw("non in-use span in unswept list")
}
continue
}
// sweepgen == h->sweepgen - 2, indicating that the span needs to be cleaned
// sweepgen == h->sweepgen - 1, which indicates that the span is being cleaned
// This is to determine the state of the span and try to convert the state of the span
if s.sweepgen != sg-2 || !atomic.Cas(&s.sweepgen, sg-2, sg-1) {
continue
}
npages = s.npages
// Sweep a single span
if !s.sweep(false) {
// Span is still in-use, so this returned no
// pages to the heap and the span needs to
// move to the swept in-use list.
npages = 0
}
break
}
// Decrement the number of active sweepers and if this is the
// last one print trace information.
// The number of sweepers is updated when the current worker's cleaning task is complete
if atomic.Xadd(&mheap_.sweepers, -1) == 0 && atomic.Load(&mheap_.sweepdone) != 0 {
if debug.gcpacertrace > 0 {
print("pacer: sweep done at heap size ", memstats.heap_live>>20."MB; allocated ", (memstats.heap_live-mheap_.sweepHeapLiveBasis)>>20."MB during sweep; swept ", mheap_.pagesSwept, " pages at ", sweepRatio, " pages/byte\n")
}
}
_g_.m.locks--
return npages
}
mspan.sweep
func (s *mspan) sweep(preserve bool) bool {
// It's critical that we enter this function with preemption disabled,
// GC must not start while we are in the middle of this function.
_g_ := getg()
if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
throw("MSpan_Sweep: m is not locked")
}
sweepgen := mheap_.sweepgen
// Only the span in the clearing state can be executed properly
if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
throw("MSpan_Sweep: bad span state")}if trace.enabled {
traceGCSweepSpan(s.npages * _PageSize)
}
// Update the number of pages cleaned first
atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))
spc := s.spanclass
size := s.elemsize
res := false
c := _g_.m.mcache
freeToHeap := false
// The allocBits indicate which unmarked objects don't need to be
// processed since they were free at the end of the last GC cycle
// and were not allocated since then.
// If the allocBits index is >= s.freeindex and the bit
// is not marked then the object remains unallocated
// since the last GC.
// This situation is analogous to being on a freelist.
// Unlink & free special records for any objects we're about to free.
// Two complications here:
// 1. An object can have both finalizer and profile special records.
// In such case we need to queue finalizer for execution,
// mark the object as live and preserve the profile special.
// 2. A tiny object can have several finalizers setup for different offsets.
// If such object is not marked, we need to queue all finalizers at once.
// Both 1 and 2 are possible at the same time.
specialp := &s.specials
special := *specialp
// Determine whether the objects in special exist and have at least one Finalizer. Release the objects that do not have Finalizer and queue the objects that have Finalizer
for special != nil {
// A finalizer can be set for an inner byte of an object, find object beginning.
objIndex := uintptr(special.offset) / size
p := s.base() + objIndex*size
mbits := s.markBitsForIndex(objIndex)
if !mbits.isMarked() {
// This object is not marked and has at least one special record.
// Pass 1: see if it has at least one finalizer.
hasFin := false
endOffset := p - s.base() + size
for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
if tmp.kind == _KindSpecialFinalizer {
// Stop freeing of object if it has a finalizer.
mbits.setMarkedNonAtomic()
hasFin = true
break
}
}
// Pass 2: queue all finalizers _or_ handle profile record.
for special != nil && uintptr(special.offset) < endOffset {
// Find the exact byte for which the special was setup
// (as opposed to object beginning).
p := s.base() + uintptr(special.offset)
if special.kind == _KindSpecialFinalizer || !hasFin {
// Splice out special record.
y := special
special = special.next
*specialp = special
freespecial(y, unsafe.Pointer(p), size)
} else {
// This is profile record, but the object has finalizers (so kept alive).
// Keep special record.
specialp = &special.next
special = *specialp
}
}
} else {
// object is still live: keep special record
specialp = &special.next
special = *specialp
}
}
if debug.allocfreetrace != 0 || raceenabled || msanenabled {
// Find all newly freed objects. This doesn't have to
// efficient; allocfreetrace has massive overhead.
mbits := s.markBitsForBase()
abits := s.allocBitsForIndex(0)
for i := uintptr(0); i < s.nelems; i++ {
if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
x := s.base() + i*s.elemsize
if debug.allocfreetrace != 0 {
tracefree(unsafe.Pointer(x), size)
}
if raceenabled {
racefree(unsafe.Pointer(x), size)
}
if msanenabled {
msanfree(unsafe.Pointer(x), size)
}
}
mbits.advance()
abits.advance()
}
}
// Count the number of free objects in this span.
// Count the number of objects that are still marked (allocated)
nalloc := uint16(s.countAlloc())
// If sizeclass is 0 and the total amount allocated is 0, it is released to the Mheap
if spc.sizeclass() == 0 && nalloc == 0 {
s.needzero = 1
freeToHeap = true
}
nfreed := s.allocCount - nalloc
if nalloc > s.allocCount {
print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
throw("sweep increased allocation count")
}
s.allocCount = nalloc
// Check whether the span is empty
wasempty := s.nextFreeIndex() == s.nelems
// Reset freeindex
s.freeindex = 0 // reset allocation index to start of span.
if trace.enabled {
getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
}
// gcmarkBits becomes the allocBits.
// get a fresh cleared gcmarkBits in preparation for next GC
// Reset allocBits to gcMarkBits
s.allocBits = s.gcmarkBits
// Reset gcMarkBits
s.gcmarkBits = newMarkBits(s.nelems)
// Initialize alloc bits cache.
// Update the allocCache
s.refillAllocCache(0)
// We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
// because of the potential for a concurrent free/SetFinalizer.
// But we need to set it before we make the span available for allocation
// (return it to heap or mcentral), because allocation code assumes that a
// span is already swept if available for allocation.
if freeToHeap || nfreed == 0 {
// The span must be in our exclusive ownership until we update sweepgen,
// check for potential races.
if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
throw("MSpan_Sweep: bad span state after sweep")
}
// Serialization point.
// At this point the mark bits are cleared and allocation ready
// to go so release the span.
atomic.Store(&s.sweepgen, sweepgen)
}
if nfreed > 0 && spc.sizeclass() != 0 {
c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
// Release the span to mcentral
res = mheap_.central[spc].mcentral.freeSpan(s, preserve, wasempty)
// MCentral_FreeSpan updates sweepgen
} else if freeToHeap {
// This is the large-object case: the span is released back to the heap
// Free large span to heap
// NOTE(rsc,dvyukov): The original implementation of efence
// in CL 22060046 used SysFree instead of SysFault, so that
// the operating system would eventually give the memory
// back to us again, so that an efence program could run
// longer without running out of memory. Unfortunately,
// calling SysFree here without any kind of adjustment of the
// heap data structures means that when the memory does
// come back to us, we have the wrong metadata for it, either in
// the MSpan structures or in the garbage collection bitmap.
// Using SysFault here means that the program will run out of
// memory fairly quickly in efence mode, but at least it won't
// have mysterious crashes due to confused memory reuse.
// It should be possible to switch back to SysFree if we also
// implement and then call some kind of MHeap_DeleteSpan.
if debug.efence > 0 {
s.limit = 0 // prevent mlookup from finding this span
sysFault(unsafe.Pointer(s.base()), size)
} else {
// Release the span back to the mheap
mheap_.freeSpan(s, 1)
}
c.local_nlargefree++
c.local_largefree += size
res = true
}
if !res {
// The span has been swept and is still in-use, so put
// it on the swept in-use list.
// The span was not released to mcentral or mheap, so it is still in use
mheap_.sweepSpans[sweepgen/2%2].push(s)
}
return res
}
OK, the GC flow of Go has now been analyzed; with the diagram at the top it should be a little easier to understand.
Reference documentation
- Golang source code exploration (three) GC implementation principle
- Notes on Language Learning in Go
- Learn about tricolor notation in one picture