A series of reading

This series is based on 64-bit platform, 1Page=8KBCopy the code

Today we kick off the second chapter of the Go Easy Progression series, “Memory and Garbage Collection.”

The chapter on memory and garbage Collection is divided into three parts:

  • Knowledge preparation: Do some knowledge preparation for the following content, knowledge preparation includes
    • Pointer size Click here to view this article
    • Tcmalloc Memory Allocation Principle
  • Go memory design and implementation
  • Go garbage collection principle

This article introduction

Our main goal is to understand how the Go language allocates memory. However, the Go language’s memory allocation is mainly based on the Tcmalloc memory allocator. Therefore, before we want to understand the memory allocation principle of Go language, we must first understand the Tcmalloc memory allocator so that we can better understand the memory allocation principle of Go language.

The contents of this article are as follows:

  • Knowledge before reading
    • Linear allocation of memory
    • What is theFreeList?
    • Virtual memory
    • What is theTCMalloc?
  • TCMallocFive basic concepts
    • PageThe concept of
    • SpanThe concept of
      • SpanListThe concept of
    • ObjectThe concept of
      • SizeClassThe concept of
  • decryptionTcmallocThe basic structure of
  • PageHeap,CentralFreeList,ThreadCacheDetailed composition of
    • decryptionPageHeap
    • decryptionCentralFreeListandTransferCacheManagerThe composition of the
      • decryptionCentralFreeList
      • decryptionTransferCacheManager
    • decryptionThreadCache
  • decryptionTCMallocThe dependencies of the base structure
    • Simple version of
    • A detailed version

Knowledge before reading

This section reads as follows:

  • Linear allocation of memory
  • What is theFreeList?
  • Virtual memory
  • What is theTCMalloc?

Objective: To help us better understand the principle of memory allocation.

Linear allocation of memory

Linear allocation is roughly the allocation of how much needs to be used, as shown in the following figure:

Linear allocation has a problem: “Allocated memory has been freed, how can we reallocate?” . You might think of LinkedList, yes, but in memory management it’s a FreeList.

What is a FreeList?

FreeList is also essentially a LinkedList, the difference between a LinkedList and a LinkedList:

  • FreeListThere is noNextProperty, so not usingNextProperty to hold the value of the pointer to the next node.
  • FreeList“It’s as good as usingValueThe first 8 bytes of the entire memory block hold the pointer to the next node.
  • The entire memory space of the allocated node can be overwritten (the pointer value can be overwritten).

As shown below:

Conclusion: A node in a FreeList should have a minimum of 8 bytes

Note: Because we want to store pointer, the size of pointer is 8 bytes, why? See the previous article "Why is the size of the pointer itself 8 bytes on a 64-bit platform?" (http://tigerb.cn/2021/01/23/go-base/memory-pointer/)Copy the code

Virtual memory

Our process runs on virtual memory, as shown below:

  • For our process, the available memory is continuous
  • Security, preventing processes from operating directly on physical memory (if processes can operate directly on physical memory, there is a possibility that one process may tamper with other processes’ data)
  • Virtual Memory and physical Memory are mapped by MMU(Memory Manage Unit).
  • Etc. (Research if you’re interested)

Therefore, we refer to virtual memory in the following articles.

What is TCMalloc?

TCMalloc, short for Thread Cache Alloc, is an open source memory allocator from Google. It is implemented based on FreeList data structure and introduces thread-level Cache with better performance.

The five basic concepts in TCMalloc

This section reads as follows:

  • PageThe concept of
  • SpanThe concept of
    • SpanListThe concept of
  • ObjectThe concept of
    • SizeClassThe concept of

Purpose: The main parts of TCMalloc are based on these basic concepts.

The concept of the Page

The operating system manages memory by Page. In this article, 1Page is 8KB, as shown in the following figure:

Note: Why does the operating system manage memory by 'Page'? Outside the scope of this article.Copy the code

Span and SpanList

A Span is composed of N pages, and:

  • The range of N is zero1 ~ + up
  • Constitute theSpanThe NPageMust be contiguous in memory space

As shown below:

As can be seen from the figure, there are:

  • 1PageMade up of 8KBSpan
  • Two consecutivePageMade up of 16KBSpan
  • Three consecutivePageMade up of 24KBSpan

In addition, a bidirectional linked list can be formed between the Span and the Span, which is called the SpanList. In memory management, the Span with the same number of pages is usually formed into a bidirectional linked list, as shown in the figure below (the SpanList formed by N spans with 1Page) :

Let’s look at the Span code again, as follows:

class Span : public SpanList::Elem {
 public:

  / / a little...

  // Split span into object
  // See below for the concept of object
  void BuildFreelist(size_t size, size_t count);

  
  / / a little...

  union {
    // Freelist composed of object
    // See below for the concept of object
    ObjIdx cache_[kCacheSize];
    
    / / a little...
  };

  PageId first_page_;  // Which page does the current span start from
  Length num_pages_;   // The number of pages currently held

  / / a little...
};

Copy the code

Object and SizeClass concepts

A Span will be split into N Objects of a certain size, and these N Objects form a FreeList.

Take a Span with 1Page as an example. The relationship between Span, Page, and Object is shown as follows:

After looking at the picture above, the question arises:

How do we know that Span is split into 24-byte objects?

Answer: A list of mappings maintained by code. We used the Google open source TCMalloc source code (commit:9D274df) to look at the mapping list HTTPS://github.com/google/tcmalloc/tree/master/tcmallocCode location: tcmalloc/tcmalloc/size_classes.ccconst SizeClassInfo SizeMap::kSizeClasses[SizeMap::kSizeClassesCount] = {
    // Each row here is called SizeClass
    // 
      
       , 
       
        , 
         
         
        
       
      
    // Object size column, number of pages applied at a time, number of objects moved at a time (memory applied or reclaimed)
    {        0.0.0},  // +Inf%
    // So you know why the minimum is 8 bytes?
    // Object will form a FreeList
    // A pointer to a FreeList
    // The pointer is 8 bytes
    {        8.1.32},  / / 0.59%
    {       16.1.32},  / / 0.59%
    {       24.1.32},  / / 0.68%
    {       32.1.32},  / / 0.59%
    {       40.1.32},  / / 0.98%
    {       48.1.32},  / / 0.98%
    / /... A little...
    {    98304.12.2},  / / 0.05%
    {   114688.14.2},  / / 0.04%
    {   131072.16.2},  / / 0.04%
    {   147456.18.2},  / / 0.03%
    {   163840.20.2},  / / 0.03%
    {   180224.22.2},  / / 0.03%
    {   204800.25.2},  / / 0.02%
    {   229376.28.2},  / / 0.02%
    {   262144.32.2},  / / 0.02%}; The process of obtaining the split rule (finding the row first and then the value of the first column of the row) :1.First find the corresponding row (how to find this row? If anyone is confused, if you want to know the answer you need to know the structure of 'CentralFreeList', which we'll talk about later.)2.Find the first column. This number is the size of objectCopy the code

SizeMap::kSizeClasses each row is called SizeClass.

What exactly do these five basic concepts do?

Answer: support the realization of the basic structure of 'Tcmalloc'.Copy the code

Decrypt the basic structure of Tcmalloc

Tcmalloc is mainly composed of three parts:

  • PageHeap
  • CentralFreeList
  • ThreadCache

The illustration is as follows:

However, CentralFreeList is actually managed by TransferCacheManager, so the basic structure of Tcmalloc should actually look like this:

Next, ThreadCache is actually held by threads. Why?

Answer: Reduce contention between threads and reduce the locking process when allocating memory. This is why it is called Thread Cache Alloc.Copy the code

Further obtain a simple structure diagram:

Decrypt the detailed composition of PageHeap, CentralFreeList, ThreadCache

This section reads as follows:

  • decryptionPageHeap
  • decryptionCentralFreeListandTransferCacheManagerThe composition of the
    • decryptionCentralFreeList
    • decryptionTransferCacheManager
  • decryptionThreadCache

Objective: To understand the realization of each component of TCMalloc in detail.

Decryption PageHeap

PageHeap is mainly responsible for managing spans of different sizes, and spans of the same size constitute a SpanList(recall the concept of SpanList above).

What is a Span of the same size?

A: Span with the same number of pages.Copy the code

The PageHeap object maintains a property called free_, which is an array. The type of the array element is SpanList, and the free_ element has the following properties:

  • Corresponding to the index value of 1SpanListtheSpanListtheSpanAll have 1Pages;
  • Corresponding to the index value of 2SpanListtheSpanListtheSpanAll have 2Pages;
  • And so on,free_The index value is corresponding to MaxNumberSpanListtheSpanListtheSpanMaxNumber Pages;
  • The value of MaxNumber bykMaxPagesdecision
Array index The number of pages held by a single Span in the SpanList
1 1Pages
2 2Pages
3 3Pages
4 4Pages
5 5Pages
. .
kMaxPages kMaxPages Pages

The illustration is as follows:

But you actually know this from the code: the actual type of the array element is SpanListPair, which looks like this

class PageHeap final : public PageAllocatorInterface {
 public:

  / /... slightly

 private:
  // Hold a bidirectional list of two spans
  struct SpanListPair {
    // Span is normal
    SpanList normal; 
    // Span: physical memory has been reclaimed but virtual memory is still held
    SpanList returned;
  };

  / /... slightly

  // kMaxPages. Raw_num () is an array of spanlistpairs
  SpanListPair free_[kMaxPages.raw_num()] ABSL_GUARDED_BY(pageheap_lock);

  / /... slightly
};

Copy the code

Conclusion:

  • free_The array element is of typeSpanListPair
  • SpanListPairTwo are maintained in theSpanList

Based on this conclusion, we modify the PageHeap structure diagram as follows:

Since memory allocation for Pages larger than kMaxPages is from large_, the code looks like this:

class PageHeap final : public PageAllocatorInterface {
 public:

  / /... slightly

  // The memory of the large object is allocated here (length >= kMaxPages)
  SpanListPair large_ ABSL_GUARDED_BY(pageheap_lock);

  / /... slightly
};

Copy the code

So we add the large_ attribute of the large object allocation, and get the structure diagram of PageHeap as follows:

Meanwhile, the code snippet of the PageHeap core is as follows:

class PageHeap final : public PageAllocatorInterface {
 public:

  / /... slightly

 private:
  // Hold a bidirectional list of two spans
  struct SpanListPair {
    // Span is a bidirectional list
    SpanList normal; 
    // Span is a bidirectional list
    SpanList returned;
  };

  // The memory of the large object is allocated here (length >= kMaxPages)
  SpanListPair large_ ABSL_GUARDED_BY(pageheap_lock);

  // kMaxPages. Raw_num () is an array of spanlistpairs
  SpanListPair free_[kMaxPages.raw_num()] ABSL_GUARDED_BY(pageheap_lock);

  / /... slightly
};

Copy the code

Decrypt the composition of CentralFreeList and TransferCacheManager

Decryption CentralFreeList

We can call this the central cache, the central cache is shared by threads, and fetching the cache from the central cache, CentralFreeList, requires a lock.

CentralFreeList has a property called size_class_, which is the SizeClass value, which is the index of the SizeMap array. A Span in a CentralFreeList will do one thing, split a Span into multiple objects according to the size_class_ rule, and those objects form a FreeList.

Meanwhile, each SizeClass in SizeMap will correspond to one CentralFreeList, so there will be at most N CentralFreelists with the value of N being kNumClasses. The key codes are as follows:

class CentralFreeList {

  / /... slightly

 private:

  / / lock
  // Thread fetching memory from this point requires a lock to ensure concurrency safety
  absl::base_internal::SpinLock lock_;

  // It corresponds to an index of the mapping table SizeClassInfo mentioned above
  // Objective To find rules such as the size of object when splitting Span into object
  size_t size_class_;  
  // The total number of objects
  size_t object_size_;
  // The number of objects held by a Span
  size_t objects_per_span_;
  // The number of pages held by a Span
  Length pages_per_span_;

  / /... slightly
Copy the code

CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList

Decryption TransferCacheManager

Because there are kNumClasses CentralFreeList, where are these centralFreelists maintained?

Answer: the 'freelist_' property in the 'TransferCacheManager' structure.Copy the code

The code is as follows:

class TransferCacheManager {
  
  / /... slightly

 private:
  // Freelist_ is an array
  // The element's type is CentralFreeList above
  // The number of elements corresponds to mapping table SizeClassInfo
  CentralFreeList freelist_[kNumClasses];
} ABSL_CACHELINE_ALIGNED;
Copy the code

Decrypt the composition of ThreadCache

We can call this the thread cache, the heart of the TCMalloc memory allocator. ThreadCache is held by each thread and allocates memory without locking.

ThreadCache maintains an attribute of type list_, which is an array and whose element is of type FreeList:

class ThreadCache {
  / /... slightly

  // list_ is an array
  // The element is of type FreeList
  // The number of elements corresponds to mapping table SizeClassInfo
  FreeList list_[kNumClasses]; 

  / /... slightly
};
Copy the code

The element in a FreeList also has the following properties:

  • Corresponding to the index value of 1FreeListtheFreeListtheObjectThe size is 8 Bytes;
  • Corresponding to the index value of 2FreeListtheFreeListtheObjectSize: 16 Bytes;
  • And so on,free_The index value is corresponding to MaxNumberFreeListtheFreeListtheObjectSize: MaxNumber Bytes;
  • The value of MaxNumber bykNumClassesdecision

Where does this rule come from? TCMalloc (commit:9d274df)

https://github.com/google/tcmalloc/tree/master/tcmallocCode location: tcmalloc/tcmalloc/size_classes.ccconst SizeClassInfo SizeMap::kSizeClasses[SizeMap::kSizeClassesCount] = {
    // Each row here is called SizeClass
    // 
      
       , 
       
        , 
         
         
        
       
      
    // Object size column, number of pages applied at a time, number of objects moved at a time (memory applied or reclaimed)
    {        0.0.0},  // +Inf%
    {        8.1.32},  / / 0.59%
    {       16.1.32},  / / 0.59%
    {       24.1.32},  / / 0.68%
    {       32.1.32},  / / 0.59%
    {       40.1.32},  / / 0.98%
    {       48.1.32},  / / 0.98%
    / /... A little...
};
Copy the code

We can get:

Array index The size of a single Object in a FreeList
1 8 Bytes
2 16 Bytes
3 24 Bytes
4 32 Bytes
5 40 Bytes
. .
kNumClasses kNumClasses Bytes

The structure diagram of ThreadCache is shown as follows:

Note: The Span tail of a FreeList with index 3 in the diagram wastes 8 bytes.Copy the code

Decrypt Tcmalloc memory allocation process

This section reads as follows:

  • Simple version of
  • A detailed version

Objective: To understand the general process of MEMORY allocation in Tcmalloc.

Simple version of

We divide the objects allocated in Tcmalloc into two categories:

  • Small objects
  • A small object

The size range of small objects comes from the mapping table maintained by SizeMap, that is, the size range of a single Object. Using the following code snippet as an example, we can see that the size range of a single Osbject is:

8 Byte ~ 262144 Byte == 8 Byte ~ 256 KB

const SizeClassInfo SizeMap::kSizeClasses[SizeMap::kSizeClassesCount] = {
    // Each row here is called SizeClass
    // 
      
       , 
       
        , 
         
         
        
       
      
    // Object size column, number of pages applied at a time, number of objects moved at a time (memory applied or reclaimed)
    / /... A little...
    {        8.1.32},  / / 0.59%
    / /... A little...
    {   262144.32.2},  / / 0.02%
};
Copy the code

So:

type The size of the source
Small objects <= 256 KB ThreadCache,CentralFreeList
A small object > 256 KB PageHeap.free_andPageHeap.large_

When allocating memory to small objects: When ThreadCache runs low, it fetches memory from the CentralFreeList corresponding to SizeClass, or if not, it fetches memory from PageHeap.

When allocating memory to non-small objects: pageheap. free_ and pageheap. large_.

A detailed version

Finally, let’s take a look at the detailed memory allocation process using the example of getting a small 6-byte object (SizeClass with a value of 1).

conclusion

To summarize briefly, we can get knowledge points from this article:

  • To understand theFreeList
  • Know theTCMallocMainly by theThreadCache,CentralFreeList,PageHeapThree parts
    • ObjectIn the composition ofFreeList,ThreadCachemaintenance
    • SpanConstitute theSpanList,CentralFreeListMaintenance, at the same timeSpanWill be disassembled intoObject
    • whenThreadCacheIn theObjectIf no, from the correspondingSizeClasstheCentralFreeListget
    • Small objects come fromThreadCache,CentralFreeList
    • Non-small objects fromPageHeap
  • The thread fromThreadCacheAccess to memory does not require a lock

With that in mind, it should be easy to Go back to the memory allocation of the Go language. Next time, we’ll look at the Go memory design and implementation.

Reference: 1. Tcmalloc source (commit: 9 d274df) 2 https://github.com/google/tcmalloc/tree/master/tcmalloc. Available space table List (Free) 3 at https://songlee24.github.io/2015/04/08/free-list/. Graphic TCMalloc https://zhuanlan.zhihu.com/p/29216091 4. TCMalloc decryption https://wallenwang.com/2018/11/tcmalloc/ 5. TCMalloc: Thread-Caching Malloc https://github.com/google/tcmalloc/blob/master/docs/design.md 6. TCMalloc : Thread - Caching Malloc https://gperftools.github.io/gperftools/tcmalloc.html 7. Tcmalloc principle analysis (based on gperftools - 2.1) http://gao-xiao-long.github.io/2017/11/25/tcmalloc/Copy the code

“Go language easy to advance” series links tigerb.cn/go/#/kernal…