Analysis of Go memory source code

After studying the last two articles on Go memory management principles (TCMalloc principles: juejin.cn/post/691900… and Go memory management principles: juejin.cn/post/692048…), following the source code flow becomes much easier.

Let's start with mcache, mcentral, and mheap, looking only at the most commonly used fields. Note: all of the analysis below is based on the Go 1.14 source.

// mcache
type mcache struct {
	tiny             uintptr // Start address of the current tiny block
	tinyoffset       uintptr // Offset of the next free byte within the tiny block
	local_tinyallocs uintptr // Number of tiny allocations
	alloc [numSpanClasses]*mspan // 134 span classes (67 size classes x scan/noscan)
	// Stack cache free lists
	stackcache [_NumStackOrders]stackfreelist

	// Local allocator stats, flushed during GC.
	local_largefree  uintptr  
	local_nlargefree uintptr      
	local_nsmallfree [_NumSizeClasses]uintptr
	flushGen uint32
}
// mcentral
type mcentral struct {
	lock      mutex     // mutex
	spanclass spanClass // span class (67 size classes x 2 = 134)
	nonempty  mSpanList // list of spans with at least one free object
	empty     mSpanList // list of spans with no free objects (or cached in an mcache)
	nmalloc uint64 // Number of allocated objects
}

// mheap
type mheap struct {
	lock      mutex     // mutex
	pages     pageAlloc // page allocation data structure
	sweepgen  uint32    // GC sweep generation
	sweepdone uint32    // GC: all spans have been swept
	sweepers  uint32    // GC: number of active sweepers

	allspans []*mspan // all spans ever created

	sweepSpans [2]gcSweepBuf // GC: swept/unswept span sets
	pagesInUse         uint64  // pages of spans in stats mSpanInUse; updated atomically

	arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

	// arenaHints is a list of addresses at which to attempt to
	// add more heap arenas. This is initially populated with a
	// set of general hint addresses, and grown with the bounds of
	// actual heap arena ranges.
	arenaHints *arenaHint

	// arena is a pre-reserved space for allocating heap arenas
	// (the actual arenas). This is only used on 32-bit.
	arena linearAlloc

	// allArenas is the arenaIndex of every mapped arena. This can
	// be used to iterate through the address space.
	//
	// Access is protected by mheap_.lock. However, since this is
	// append-only and old backing arrays are never freed, it is
	// safe to acquire mheap_.lock, copy the slice header, and
	// then release mheap_.lock.
	allArenas []arenaIdx

	// sweepArenas is a snapshot of allArenas taken at the
	// beginning of the sweep cycle. This can be read safely by
	// simply blocking GC (by disabling preemption).
	sweepArenas []arenaIdx // GC

	// curArena is the arena that the heap is currently growing
	// into. This should always be physPageSize-aligned.
	curArena struct {
		base, end uintptr
	}

	// When an mcache does not have enough memory to allocate, it requests spans from mcentral;
	// mcentral itself is managed by mheap
	central [numSpanClasses]struct {
		mcentral mcentral
		pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
	}
}

Next, we walk through the memory allocation process by category: tiny objects, small objects, and large objects.

0. Objects allocate memory through the newobject() interface. Memory management includes both allocation and reclamation; this article focuses on the allocation source code.

func newobject(typ *_type) unsafe.Pointer {
	return mallocgc(typ.size, typ, true)
}

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	// Check whether the type contains pointers
	...
	// size <= 32KB
	if size <= maxSmallSize {
		// Tiny object memory allocation (< 16B)
		if noscan && size < maxTinySize {
			// Memory allocation for tiny objects
		} else {
			// Small object allocation
		}
	} else {
		// Large object allocation
	}
	...
	// GC-related work
	return x
}
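To make the two thresholds in this skeleton concrete, here is a minimal standalone sketch (an illustration, not runtime code) of the classification decision. The constant values assume Go 1.14's maxTinySize = 16 and maxSmallSize = 32768 (32KB).

package main

import "fmt"

const (
	maxTinySize  = 16    // tiny objects: noscan and smaller than 16B
	maxSmallSize = 32768 // small objects: up to 32KB
)

// classify mirrors the branch structure of the mallocgc skeleton above:
// it only decides which allocation path a request would take.
func classify(size uintptr, noscan bool) string {
	if size <= maxSmallSize {
		if noscan && size < maxTinySize {
			return "tiny"
		}
		return "small"
	}
	return "large"
}

func main() {
	fmt.Println(classify(8, true))     // tiny
	fmt.Println(classify(8, false))    // small (contains pointers)
	fmt.Println(classify(700, true))   // small
	fmt.Println(classify(40960, true)) // large
}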

1. Tiny object memory allocation (< 16B)

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	// The tiny path applies when the object is noscan and smaller than 16B
	if noscan && size < maxTinySize {
		// Tiny allocator.
		off := c.tinyoffset
		// Align the offset according to the requested size
		if size&7 == 0 { // multiples of 8 are aligned to 8
			off = alignUp(off, 8)
		} else if size&3 == 0 {
			off = alignUp(off, 4)
		} else if size&1 == 0 {
			off = alignUp(off, 2)
		}
		// If offset+size still fits within the 16B block, allocate from the tiny block
		if off+size <= maxTinySize && c.tiny != 0 {
			// The object fits into existing tiny block.
			x = unsafe.Pointer(c.tiny + off)
			c.tinyoffset = off + size
			c.local_tinyallocs++ // number of tiny allocations +1
			mp.mallocing = 0
			releasem(mp)
			return x
		}
		// The current tiny block does not have enough space; get the tiny span from mcache
		// tinySpanClass = 5 (sizeclass 2, noscan)
		span := c.alloc[tinySpanClass]
		// Try to get memory from the current span; returns 0 if none is available
		v := nextFreeFast(span)
		if v == 0 {
			// Nothing available in allocCache; nextFree gets a new span from mcentral,
			// replaces the exhausted one, and allocates from it
			v, _, shouldhelpgc = c.nextFree(tinySpanClass) // This will be explained below
		}
		x = unsafe.Pointer(v)
		(*[2]uint64)(x)[0] = 0
		(*[2]uint64)(x)[1] = 0
		// See if we need to replace the existing tiny block with the new one
		// based on amount of remaining free space.
		if size < c.tinyoffset || c.tiny == 0 {
			c.tiny = uintptr(x)
			c.tinyoffset = size
		}
		size = maxTinySize
	}
	...
}

If the local tiny block is insufficient, more memory is requested from mcache through nextFree:

// Get the next free object slot from mcache
func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
	s = c.alloc[spc]
	shouldhelpgc = false
	freeIndex := s.nextFreeIndex()
	if freeIndex == s.nelems {
		// The span is full.
		if uintptr(s.allocCount) != s.nelems {
			println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
			throw("s.allocCount != s.nelems && freeIndex == s.nelems")
		}
		c.refill(spc) // here mcache requests a new span from mcentral
		shouldhelpgc = true
		s = c.alloc[spc]
		// After mcache has refilled from mcentral, search the new span again
		freeIndex = s.nextFreeIndex()
	}

	if freeIndex >= s.nelems {
		throw("freeIndex is not valid")
	}

	v = gclinkptr(freeIndex*s.elemsize + s.base())
	s.allocCount++
	if uintptr(s.allocCount) > s.nelems {
		println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
		throw("s.allocCount > s.nelems")}return
}

To summarize: a tiny object request (less than 16B) is satisfied from the tiny allocator's local block if it has enough space; otherwise the span with spanClass = 5 is requested from mcache, and if that cached span has no free slots, a new span is requested from mcentral.
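As an illustration of the offset-alignment logic above, here is a minimal self-contained sketch of the tiny-allocator idea, assuming a single 16-byte block; tinyAllocator and its alloc method are made-up names for the example, not runtime identifiers.

package main

import "fmt"

const maxTinySize = 16

type tinyAllocator struct {
	block  [maxTinySize]byte // the current 16-byte tiny block
	offset uintptr           // next free offset within the block
}

// alignUp rounds n up to a multiple of a (a must be a power of two).
func alignUp(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

// alloc returns an offset inside the current block, or false when the
// block is exhausted and a new 16B block would have to be requested
// from the span (the nextFreeFast/nextFree path above).
func (t *tinyAllocator) alloc(size uintptr) (uintptr, bool) {
	off := t.offset
	switch {
	case size&7 == 0:
		off = alignUp(off, 8)
	case size&3 == 0:
		off = alignUp(off, 4)
	case size&1 == 0:
		off = alignUp(off, 2)
	}
	if off+size <= maxTinySize {
		t.offset = off + size
		return off, true
	}
	return 0, false
}

func main() {
	var t tinyAllocator
	for _, sz := range []uintptr{1, 3, 8, 7} {
		off, ok := t.alloc(sz)
		fmt.Printf("size=%d off=%d fits=%v\n", sz, off, ok)
	}
	// The 8-byte request lands at offset 8 (aligned), and the final
	// 7-byte request no longer fits in the 16-byte block.
}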

2. Small object memory allocation (< 32KB)

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	// Memory allocation for small objects (< 32KB)
	{
		var sizeclass uint8
		// The size classes are looked up in two tables
		// 1) size <= 1024-8
		if size <= smallSizeMax-8 {
			// eg: size = 700, (size+smallSizeDiv-1)/smallSizeDiv = (700+8-1)/8 = 88,
			// which gives sizeclass = 28, i.e. the object is rounded up to 704 bytes
			// Get the sizeclass
			sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
		} else { // 2) size > 1024-8
			// Get the sizeclass
			sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
		}
		}
		// class_to_size maps the sizeclass to the actual slot size within a span
		size = uintptr(class_to_size[sizeclass])
		spc := makeSpanClass(sizeclass, noscan)
		// find the corresponding span
		span := c.alloc[spc]
		// Try to fetch memory from allocCache
		v := nextFreeFast(span) // This will be explained below
		if v == 0 {
			v, span, shouldhelpgc = c.nextFree(spc)
		}
		x = unsafe.Pointer(v)
		if needzero && span.needzero != 0 {
			memclrNoHeapPointers(unsafe.Pointer(v), size)
		}
	}
}
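To make the lookup-table arithmetic in the comments above concrete, here is a small worked sketch. It assumes the Go 1.14 values smallSizeDiv = 8, smallSizeMax = 1024, largeSizeDiv = 128 and that sizeclass 28 maps to 704 bytes; only the index computation itself is taken directly from the code above.

package main

import "fmt"

const (
	smallSizeDiv = 8
	smallSizeMax = 1024
	largeSizeDiv = 128
)

func main() {
	size := uintptr(700)
	// For size <= 1024-8 the runtime indexes size_to_class8 with this value:
	idx := (size + smallSizeDiv - 1) / smallSizeDiv
	fmt.Println("index into size_to_class8:", idx) // 88
	// In the Go 1.14 size class table that index maps to sizeclass 28, whose
	// class_to_size entry is 704, so a 700-byte object occupies a 704-byte slot.

	// For sizes above 1024-8 the second, coarser table is used instead:
	big := uintptr(2000)
	idx2 := (big - smallSizeMax + largeSizeDiv - 1) / largeSizeDiv
	fmt.Println("index into size_to_class128:", idx2) // 8
}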

The nextFreeFast(span) and nextFree(spc) calls here are the same ones used in the tiny object path: nextFreeFast tries the span's allocCache, and nextFree falls back to mcentral when the cached span is full.

That fallback goes through the refill interface, which nextFree calls when the cached span has no free slots:

func (c *mcache) refill(spc spanClass) {
	// Return the current cached span to the central lists.
	s := c.alloc[spc]

	if uintptr(s.allocCount) != s.nelems {
		throw("refill of span with free space remaining")
	}
	if s != &emptymspan {
		// Mark this span as no longer cached.
		if s.sweepgen != mheap_.sweepgen+3 {
			throw("bad sweepgen in refill")
		}
		atomic.Store(&s.sweepgen, mheap_.sweepgen)
	}

	// Request a new span from mcentral
	s = mheap_.central[spc].mcentral.cacheSpan() // This will be explained below
	if s == nil {
		throw("out of memory")
	}
	if uintptr(s.allocCount) == s.nelems {
		throw("span has no free space")
	}

	// Indicate that this span is cached and prevent asynchronous
	// sweeping in the next sweep phase.
	s.sweepgen = mheap_.sweepgen + 3

	c.alloc[spc] = s
}

cacheSpan is the core interface for getting a span from mcentral:

func (c *mcentral) cacheSpan() *mspan {
	...
	// Lock mcentral
	lock(&c.lock)
	traceDone := false
	if trace.enabled {
		traceGCSweepStart()
	}
	sg := mheap_.sweepgen
retry:
	var s *mspan
	// Look for a usable span in the nonempty list
	for s = c.nonempty.first; s != nil; s = s.next {
		if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) { // span needs sweeping; claim it
			c.nonempty.remove(s)  // remove the span from the nonempty list
			c.empty.insertBack(s) // add the span to the empty list
			unlock(&c.lock)
			s.sweep(true)
			goto havespan
		}
		if s.sweepgen == sg-1 {
			// the span is being swept by background sweeper, skip
			continue
		}
		// we have a nonempty span that does not require sweeping, allocate from it
		c.nonempty.remove(s)
		c.empty.insertBack(s)
		unlock(&c.lock)
		goto havespan
	}
	// Also check the empty list: some spans there may have had objects freed by GC
	// but have not been swept yet, so sweeping may uncover free space
	for s = c.empty.first; s != nil; s = s.next {
		if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
			// we have an empty span that requires sweeping,
			// sweep it and see if we can free some space in it
			c.empty.remove(s)
			// swept spans are at the end of the list
			c.empty.insertBack(s)
			unlock(&c.lock)
			s.sweep(true)
			freeIndex := s.nextFreeIndex()
			if freeIndex != s.nelems {
				s.freeindex = freeIndex
				goto havespan
			}
			lock(&c.lock)
			// the span is still empty after sweep
			// it is already in the empty list, so just retry
			goto retry
		}
		if s.sweepgen == sg-1 {
			// the span is being swept by background sweeper, skip
			continue
		}
		// already swept empty span,
		// all subsequent ones must also be either swept or in process of sweeping
		break
	}
	if trace.enabled {
		traceGCSweepDone()
		traceDone = true
	}
	unlock(&c.lock)
    ...
}

When mcache has no span available, it requests one from mcentral: mcentral takes its lock, finds a usable span, removes it from the nonempty list, inserts it into the empty list, unlocks, and returns the span to the worker thread. If mcentral itself does not have enough memory, it in turn requests more from mheap.

3. Large object memory allocation (> 32KB)

Large objects are allocated directly from mheap:

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
		var s *mspan
		shouldhelpgc = true
		systemstack(func() {
			// Allocate directly from mheap
			s = largeAlloc(size, needzero, noscan) // This will be explained below
		})
		s.freeindex = 1
		s.allocCount = 1
		x = unsafe.Pointer(s.base())
		size = s.elemsize
}

The request uses sizeclass = 0:

// Large object request
func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
	// Check for overflow
	if size+_PageSize < size {
		throw("out of memory")
	}
	// Calculate the number of pages needed; large allocations (> 32KB) use whole pages
	npages := size >> _PageShift
	if size&_PageMask != 0 {
		npages++
	}

	deductSweepCredit(npages*_PageSize, npages)
	// The actual allocation; large objects use a span with sizeclass = 0
	s := mheap_.alloc(npages, makeSpanClass(0, noscan), needzero)  // This will be explained below
	if s == nil {
		throw("out of memory")
	}

	s.limit = s.base() + size
	// bitmap records the allocated span
	heapBitsForAddr(s.base()).initSpan(s)
	return s
}

Requesting a span from mheap:

func (h *mheap) alloc(npages uintptr, spanclass spanClass, needzero bool) *mspan {
	var s *mspan
	systemstack(func() {
		// To prevent excessive heap growth, before allocating n pages
		// we need to sweep and reclaim at least n pages.
		if h.sweepdone == 0 {
			// To limit memory usage and heap growth, call mheap.reclaim before allocating
			h.reclaim(npages)
		}
		// Allocate an mspan from the mheap
		s = h.allocSpan(npages, false, spanclass, &memstats.heap_inuse)  // This will be explained below
	})

	if s != nil {
		if needzero && s.needzero != 0 {
			memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
		}
		s.needzero = 0
	}
	return s
}
func (h *mheap) allocSpan(npages uintptr, manual bool, spanclass spanClass, sysStat *uint64) (s *mspan) {
	// Function-global state.
	gp := getg() // Get the current goroutine
	base, scav := uintptr(0), uintptr(0) // base address and amount of scavenged memory

	// If the allocation is small enough, try the page cache!
	pp := gp.m.p.ptr()
	// Allocate from P's pageCache when fewer than pageCachePages/4 = 16 pages (128KB) are needed
	// Note that the pages covered by a pageCache are contiguous
	if pp != nil && npages < pageCachePages/4 {
		c := &pp.pcache

		// If the cache is empty, refill it.
		// if P's pageCache is empty, refill it from the page allocator (pageCache is explained below)
		if c.empty() {
			lock(&h.lock)
            // Fill the local pageCache
			*c = h.pages.allocToCache()
			unlock(&h.lock)
		}

		// Try to allocate from the cache.
		// get the base address and size of the memory region from the page cache of p
		base, scav = c.alloc(npages)  // This will be explained below
		// The allocation succeeded
		if base != 0 {
			// Get an mspan object from P's span cache
			s = h.tryAllocMSpan()

			if s != nil && gcBlackenEnabled == 0 && (manual || spanclass.sizeclass() != 0) {
				goto HaveSpan
			}
		}
	}

	// For one reason or another, we couldn't get the
	// whole job done without the heap lock.
	lock(&h.lock)
	// P's page cache did not have enough memory, so allocate from the page heap
	if base == 0 {
		// Try to acquire a base address.
		// mheap global heap area
		base, scav = h.pages.alloc(npages)
		if base == 0 {
			if !h.grow(npages) { // grow the heap: request more memory from the OS
				unlock(&h.lock)
				return nil
			}
			base, scav = h.pages.alloc(npages) // Retrieve from mheap again
			if base == 0 {
				throw("grew heap, but no adequate free space found")}}}.... }Copy the code

An allocation from the page cache must come from a run of consecutive free pages; if no such run exists, it returns 0 and the caller falls back to the page heap.

// Allocate from the pageCache
func (c *pageCache) alloc(npages uintptr) (uintptr, uintptr) {
	if c.cache == 0 {
		return 0, 0
	}
	// Fast path for a single page
	if npages == 1 {
		i := uintptr(sys.TrailingZeros64(c.cache))
		scav := (c.scav >> i) & 1
		c.cache &^= 1 << i // Set bit to mark in-use
		c.scav &^= 1 << i  // clear bit to mark unscavenged
		return c.base + i*pageSize, uintptr(scav) * pageSize
	}
	// Allocate n consecutive pages
	return c.allocN(npages) // This will be explained below
}
func (c *pageCache) allocN(npages uintptr) (uintptr, uintptr) {
	i := findBitRange64(c.cache, uint(npages))
	if i >= 64 {
		return 0, 0
	}
	mask := ((uint64(1) << npages) - 1) << i
	scav := sys.OnesCount64(c.scav & mask)
	c.cache &^= mask // clear the bits to mark the pages as in use, e.g. 1110 &^ 0010 = 1100
	c.scav &^= mask  // clear scavenged bits
	return c.base + uintptr(i*pageSize), uintptr(scav) * pageSize
}

Here is how the pageCache works.

Go 1.14 uses bitmaps to manage memory pages and keeps a pageCache in each P. The pageCache is a 64-bit bitmap in which 1 means the page is free and 0 means it is allocated.

type pageCache struct {
	base  uintptr // Base address of the chunk
	cache uint64  // 64-bit bitmap of free pages (1 = free)
	scav  uint64  // 64-bit bitmap of scavenged (reclaimed) pages
}

Each bit represents one page (8KB), so a pageCache covers at most 8KB * 64 = 512KB. If a request needs fewer than 64/4 = 16 pages (128KB), the allocator first tries to satisfy it from this cache.
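Below is a minimal sketch of the bit tricks this involves. findBitRange64 here is a simplified stand-in for the runtime helper of the same name, and allocN mirrors the pageCache.allocN logic shown earlier; treat it as an illustration rather than the runtime implementation.

package main

import (
	"fmt"
	"math/bits"
)

const pageSize = 8192

// findBitRange64 returns the lowest index of a run of n consecutive 1 bits
// in c, or a value >= 64 when no such run exists (simplified stand-in).
func findBitRange64(c uint64, n uint) uint {
	for i := uint(0); i+n <= 64; i++ {
		mask := ((uint64(1) << n) - 1) << i
		if c&mask == mask {
			return i
		}
	}
	return 64
}

// allocN takes npages consecutive free pages out of the bitmap and
// returns their start address, or 0 when no contiguous run exists.
func allocN(base uintptr, cache *uint64, npages uint) uintptr {
	i := findBitRange64(*cache, npages)
	if i >= 64 {
		return 0 // no contiguous run: fall back to the page heap
	}
	mask := ((uint64(1) << npages) - 1) << uint64(i)
	*cache &^= mask // clear the bits: these pages are now in use
	return base + uintptr(i)*pageSize
}

func main() {
	cache := uint64(0b11110110) // pages 1,2 and 4..7 are free
	fmt.Println("free pages before:", bits.OnesCount64(cache))
	addr := allocN(0x100000, &cache, 3) // needs 3 contiguous free pages
	fmt.Printf("addr=%#x cache=%08b\n", addr, cache)
	// The run of three pages starts at index 4, so addr = 0x100000 + 4*8192.
}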

If the page cache does not have enough pages, pages are requested from the heap's page allocator (and ultimately, if the heap must grow, from the operating system's virtual memory):

// Allocate pages from the heap's page allocator
func (s *pageAlloc) alloc(npages uintptr) (addr uintptr, scav uintptr) {
	// If the searchAddr refers to a region which has a higher address than
	// any known chunk, then we know we're out of memory.
	if chunkIndex(s.searchAddr) >= s.end {
		return 0, 0
	}

	// If npages has a chance of fitting in the chunk where the searchAddr is,
	// search it directly.
	searchAddr := uintptr(0)
	// If the chunk containing searchAddr still has at least npages pages after the search index, search that chunk directly
	if pallocChunkPages-chunkPageIndex(s.searchAddr) >= uint(npages) {
		// npages is guaranteed to be no greater than pallocChunkPages here.
		i := chunkIndex(s.searchAddr)
		if max := s.summary[len(s.summary)-1][i].max(); max >= uint(npages) {
			j, searchIdx := s.chunkOf(i).find(npages, chunkPageIndex(s.searchAddr))
			if j < 0 {
				print("runtime: max = ", max, ", npages = ", npages, "\n")
				print("runtime: searchIdx = ", chunkPageIndex(s.searchAddr), ", s.searchAddr = ", hex(s.searchAddr), "\n")
				throw("bad summary data")
			}
			addr = chunkBase(i) + uintptr(j)*pageSize
			searchAddr = chunkBase(i) + uintptr(searchIdx)*pageSize
			goto Found
		}
	}
	// We failed to use a searchAddr for one reason or another, so try
	// the slow path.
	addr, searchAddr = s.find(npages)
	if addr == 0 {
		if npages == 1 {
			// We failed to find a single free page, the smallest unit
			// of allocation. This means we know the heap is completely
			// exhausted. Otherwise, the heap still might have free
			// space in it, just not enough contiguous space to
			// accommodate npages.
			s.searchAddr = maxSearchAddr
		}
		return 0, 0
	}
	...
	return addr, scav
}

At this point, memory allocation for large objects is complete. Large objects are allocated directly from mheap; if mheap does not have enough memory, it grows the heap by requesting more pages of virtual memory from the operating system. As you can see, the further an allocation has to travel down this path, the more expensive it becomes.

From this source code analysis, we can learn the following:

1. Earlier Go versions used a tree structure to manage free memory blocks; Go 1.14 uses bitmaps instead, which is much more efficient and easier to manage. The source is full of bit operations that achieve the same effect elegantly; if your bit-manipulation foundations are weak, it is worth working through them yourself.
2. Trading space for time to gain efficiency shows up throughout the implementation.

Next, we will learn what GC is
