A series of reading
This series is based on 64-bit platform, 1Page=8KBCopy the code
Today we kick off the second chapter of the Go Easy Progression series, “Memory and Garbage Collection.”
The chapter on memory and garbage Collection is divided into three parts:
- Knowledge preparation: Do some knowledge preparation for the following content, knowledge preparation includes
- Pointer size Click here to view this article
- Tcmalloc Memory Allocation Principle
- Go memory design and implementation
- Go garbage collection principle
This article introduction
Our main goal is to understand how the Go language allocates memory. However, the Go language’s memory allocation is mainly based on the Tcmalloc memory allocator. Therefore, before we want to understand the memory allocation principle of Go language, we must first understand the Tcmalloc memory allocator so that we can better understand the memory allocation principle of Go language.
The contents of this article are as follows:
- Knowledge before reading
- Linear allocation of memory
- What is the
FreeList
? - Virtual memory
- What is the
TCMalloc
?
TCMalloc
Five basic conceptsPage
The concept ofSpan
The concept ofSpanList
The concept of
Object
The concept ofSizeClass
The concept of
- decryption
Tcmalloc
The basic structure of PageHeap
,CentralFreeList
,ThreadCache
Detailed composition of- decryption
PageHeap
- decryption
CentralFreeList
andTransferCacheManager
The composition of the- decryption
CentralFreeList
- decryption
TransferCacheManager
- decryption
- decryption
ThreadCache
- decryption
- decryption
TCMalloc
The dependencies of the base structure- Simple version of
- A detailed version
Knowledge before reading
This section reads as follows:
- Linear allocation of memory
- What is the
FreeList
? - Virtual memory
- What is the
TCMalloc
?
Objective: To help us better understand the principle of memory allocation.
Linear allocation of memory
Linear allocation is roughly the allocation of how much needs to be used, as shown in the following figure:
Linear allocation has a problem: “Allocated memory has been freed, how can we reallocate?” . You might think of LinkedList, yes, but in memory management it’s a FreeList.
What is a FreeList?
FreeList is also essentially a LinkedList, the difference between a LinkedList and a LinkedList:
FreeList
There is noNext
Property, so not usingNext
Property to hold the value of the pointer to the next node.FreeList
“It’s as good as usingValue
The first 8 bytes of the entire memory block hold the pointer to the next node.- The entire memory space of the allocated node can be overwritten (the pointer value can be overwritten).
As shown below:
Conclusion: A node in a FreeList should have a minimum of 8 bytes
Note: Because we want to store pointer, the size of pointer is 8 bytes, why? See the previous article "Why is the size of the pointer itself 8 bytes on a 64-bit platform?" (http://tigerb.cn/2021/01/23/go-base/memory-pointer/)Copy the code
Virtual memory
Our process runs on virtual memory, as shown below:
- For our process, the available memory is continuous
- Security, preventing processes from operating directly on physical memory (if processes can operate directly on physical memory, there is a possibility that one process may tamper with other processes’ data)
- Virtual Memory and physical Memory are mapped by MMU(Memory Manage Unit).
- Etc. (Research if you’re interested)
Therefore, we refer to virtual memory in the following articles.
What is TCMalloc?
TCMalloc, short for Thread Cache Alloc, is an open source memory allocator from Google. It is implemented based on FreeList data structure and introduces thread-level Cache with better performance.
The five basic concepts in TCMalloc
This section reads as follows:
Page
The concept ofSpan
The concept ofSpanList
The concept of
Object
The concept ofSizeClass
The concept of
Purpose: The main parts of TCMalloc are based on these basic concepts.
The concept of the Page
The operating system manages memory by Page. In this article, 1Page is 8KB, as shown in the following figure:
Note: Why does the operating system manage memory by 'Page'? Outside the scope of this article.Copy the code
Span and SpanList
A Span is composed of N pages, and:
- The range of N is zero
1 ~ + up
- Constitute the
Span
The NPage
Must be contiguous in memory space
As shown below:
As can be seen from the figure, there are:
- 1
Page
Made up of 8KBSpan
- Two consecutive
Page
Made up of 16KBSpan
- Three consecutive
Page
Made up of 24KBSpan
In addition, a bidirectional linked list can be formed between the Span and the Span, which is called the SpanList. In memory management, the Span with the same number of pages is usually formed into a bidirectional linked list, as shown in the figure below (the SpanList formed by N spans with 1Page) :
Let’s look at the Span code again, as follows:
class Span : public SpanList::Elem {
public:
/ / a little...
// Split span into object
// See below for the concept of object
void BuildFreelist(size_t size, size_t count);
/ / a little...
union {
// Freelist composed of object
// See below for the concept of object
ObjIdx cache_[kCacheSize];
/ / a little...
};
PageId first_page_; // Which page does the current span start from
Length num_pages_; // The number of pages currently held
/ / a little...
};
Copy the code
Object and SizeClass concepts
A Span will be split into N Objects of a certain size, and these N Objects form a FreeList.
Take a Span with 1Page as an example. The relationship between Span, Page, and Object is shown as follows:
After looking at the picture above, the question arises:
How do we know that Span is split into 24-byte objects?
Answer: A list of mappings maintained by code. We used the Google open source TCMalloc source code (commit:9D274df) to look at the mapping list HTTPS://github.com/google/tcmalloc/tree/master/tcmallocCode location: tcmalloc/tcmalloc/size_classes.ccconst SizeClassInfo SizeMap::kSizeClasses[SizeMap::kSizeClassesCount] = {
// Each row here is called SizeClass
//
,
,
// Object size column, number of pages applied at a time, number of objects moved at a time (memory applied or reclaimed)
{ 0.0.0}, // +Inf%
// So you know why the minimum is 8 bytes?
// Object will form a FreeList
// A pointer to a FreeList
// The pointer is 8 bytes
{ 8.1.32}, / / 0.59%
{ 16.1.32}, / / 0.59%
{ 24.1.32}, / / 0.68%
{ 32.1.32}, / / 0.59%
{ 40.1.32}, / / 0.98%
{ 48.1.32}, / / 0.98%
/ /... A little...
{ 98304.12.2}, / / 0.05%
{ 114688.14.2}, / / 0.04%
{ 131072.16.2}, / / 0.04%
{ 147456.18.2}, / / 0.03%
{ 163840.20.2}, / / 0.03%
{ 180224.22.2}, / / 0.03%
{ 204800.25.2}, / / 0.02%
{ 229376.28.2}, / / 0.02%
{ 262144.32.2}, / / 0.02%}; The process of obtaining the split rule (finding the row first and then the value of the first column of the row) :1.First find the corresponding row (how to find this row? If anyone is confused, if you want to know the answer you need to know the structure of 'CentralFreeList', which we'll talk about later.)2.Find the first column. This number is the size of objectCopy the code
SizeMap::kSizeClasses each row is called SizeClass.
What exactly do these five basic concepts do?
Answer: support the realization of the basic structure of 'Tcmalloc'.Copy the code
Decrypt the basic structure of Tcmalloc
Tcmalloc is mainly composed of three parts:
PageHeap
CentralFreeList
ThreadCache
The illustration is as follows:
However, CentralFreeList is actually managed by TransferCacheManager, so the basic structure of Tcmalloc should actually look like this:
Next, ThreadCache is actually held by threads. Why?
Answer: Reduce contention between threads and reduce the locking process when allocating memory. This is why it is called Thread Cache Alloc.Copy the code
Further obtain a simple structure diagram:
Decrypt the detailed composition of PageHeap, CentralFreeList, ThreadCache
This section reads as follows:
- decryption
PageHeap
- decryption
CentralFreeList
andTransferCacheManager
The composition of the- decryption
CentralFreeList
- decryption
TransferCacheManager
- decryption
- decryption
ThreadCache
Objective: To understand the realization of each component of TCMalloc in detail.
Decryption PageHeap
PageHeap is mainly responsible for managing spans of different sizes, and spans of the same size constitute a SpanList(recall the concept of SpanList above).
What is a Span of the same size?
A: Span with the same number of pages.Copy the code
The PageHeap object maintains a property called free_, which is an array. The type of the array element is SpanList, and the free_ element has the following properties:
- Corresponding to the index value of 1
SpanList
theSpanList
theSpan
All have 1Pages; - Corresponding to the index value of 2
SpanList
theSpanList
theSpan
All have 2Pages; - And so on,
free_
The index value is corresponding to MaxNumberSpanList
theSpanList
theSpan
MaxNumber Pages; - The value of MaxNumber by
kMaxPages
decision
Array index | The number of pages held by a single Span in the SpanList |
---|---|
1 | 1Pages |
2 | 2Pages |
3 | 3Pages |
4 | 4Pages |
5 | 5Pages |
. | . |
kMaxPages | kMaxPages Pages |
The illustration is as follows:
But you actually know this from the code: the actual type of the array element is SpanListPair, which looks like this
class PageHeap final : public PageAllocatorInterface {
public:
/ /... slightly
private:
// Hold a bidirectional list of two spans
struct SpanListPair {
// Span is normal
SpanList normal;
// Span: physical memory has been reclaimed but virtual memory is still held
SpanList returned;
};
/ /... slightly
// kMaxPages. Raw_num () is an array of spanlistpairs
SpanListPair free_[kMaxPages.raw_num()] ABSL_GUARDED_BY(pageheap_lock);
/ /... slightly
};
Copy the code
Conclusion:
free_
The array element is of typeSpanListPair
SpanListPair
Two are maintained in theSpanList
Based on this conclusion, we modify the PageHeap structure diagram as follows:
Since memory allocation for Pages larger than kMaxPages is from large_, the code looks like this:
class PageHeap final : public PageAllocatorInterface {
public:
/ /... slightly
// The memory of the large object is allocated here (length >= kMaxPages)
SpanListPair large_ ABSL_GUARDED_BY(pageheap_lock);
/ /... slightly
};
Copy the code
So we add the large_ attribute of the large object allocation, and get the structure diagram of PageHeap as follows:
Meanwhile, the code snippet of the PageHeap core is as follows:
class PageHeap final : public PageAllocatorInterface {
public:
/ /... slightly
private:
// Hold a bidirectional list of two spans
struct SpanListPair {
// Span is a bidirectional list
SpanList normal;
// Span is a bidirectional list
SpanList returned;
};
// The memory of the large object is allocated here (length >= kMaxPages)
SpanListPair large_ ABSL_GUARDED_BY(pageheap_lock);
// kMaxPages. Raw_num () is an array of spanlistpairs
SpanListPair free_[kMaxPages.raw_num()] ABSL_GUARDED_BY(pageheap_lock);
/ /... slightly
};
Copy the code
Decrypt the composition of CentralFreeList and TransferCacheManager
Decryption CentralFreeList
We can call this the central cache, the central cache is shared by threads, and fetching the cache from the central cache, CentralFreeList, requires a lock.
CentralFreeList has a property called size_class_, which is the SizeClass value, which is the index of the SizeMap array. A Span in a CentralFreeList will do one thing, split a Span into multiple objects according to the size_class_ rule, and those objects form a FreeList.
Meanwhile, each SizeClass in SizeMap will correspond to one CentralFreeList, so there will be at most N CentralFreelists with the value of N being kNumClasses. The key codes are as follows:
class CentralFreeList {
/ /... slightly
private:
/ / lock
// Thread fetching memory from this point requires a lock to ensure concurrency safety
absl::base_internal::SpinLock lock_;
// It corresponds to an index of the mapping table SizeClassInfo mentioned above
// Objective To find rules such as the size of object when splitting Span into object
size_t size_class_;
// The total number of objects
size_t object_size_;
// The number of objects held by a Span
size_t objects_per_span_;
// The number of pages held by a Span
Length pages_per_span_;
/ /... slightly
Copy the code
CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList CentralFreeList
Decryption TransferCacheManager
Because there are kNumClasses CentralFreeList, where are these centralFreelists maintained?
Answer: the 'freelist_' property in the 'TransferCacheManager' structure.Copy the code
The code is as follows:
class TransferCacheManager {
/ /... slightly
private:
// Freelist_ is an array
// The element's type is CentralFreeList above
// The number of elements corresponds to mapping table SizeClassInfo
CentralFreeList freelist_[kNumClasses];
} ABSL_CACHELINE_ALIGNED;
Copy the code
Decrypt the composition of ThreadCache
We can call this the thread cache, the heart of the TCMalloc memory allocator. ThreadCache is held by each thread and allocates memory without locking.
ThreadCache maintains an attribute of type list_, which is an array and whose element is of type FreeList:
class ThreadCache {
/ /... slightly
// list_ is an array
// The element is of type FreeList
// The number of elements corresponds to mapping table SizeClassInfo
FreeList list_[kNumClasses];
/ /... slightly
};
Copy the code
The element in a FreeList also has the following properties:
- Corresponding to the index value of 1
FreeList
theFreeList
theObject
The size is 8 Bytes; - Corresponding to the index value of 2
FreeList
theFreeList
theObject
Size: 16 Bytes; - And so on,
free_
The index value is corresponding to MaxNumberFreeList
theFreeList
theObject
Size: MaxNumber Bytes; - The value of MaxNumber by
kNumClasses
decision
Where does this rule come from? TCMalloc (commit:9d274df)
https://github.com/google/tcmalloc/tree/master/tcmallocCode location: tcmalloc/tcmalloc/size_classes.ccconst SizeClassInfo SizeMap::kSizeClasses[SizeMap::kSizeClassesCount] = {
// Each row here is called SizeClass
//
,
,
// Object size column, number of pages applied at a time, number of objects moved at a time (memory applied or reclaimed)
{ 0.0.0}, // +Inf%
{ 8.1.32}, / / 0.59%
{ 16.1.32}, / / 0.59%
{ 24.1.32}, / / 0.68%
{ 32.1.32}, / / 0.59%
{ 40.1.32}, / / 0.98%
{ 48.1.32}, / / 0.98%
/ /... A little...
};
Copy the code
We can get:
Array index | The size of a single Object in a FreeList |
---|---|
1 | 8 Bytes |
2 | 16 Bytes |
3 | 24 Bytes |
4 | 32 Bytes |
5 | 40 Bytes |
. | . |
kNumClasses | kNumClasses Bytes |
The structure diagram of ThreadCache is shown as follows:
Note: The Span tail of a FreeList with index 3 in the diagram wastes 8 bytes.Copy the code
Decrypt Tcmalloc memory allocation process
This section reads as follows:
- Simple version of
- A detailed version
Objective: To understand the general process of MEMORY allocation in Tcmalloc.
Simple version of
We divide the objects allocated in Tcmalloc into two categories:
- Small objects
- A small object
The size range of small objects comes from the mapping table maintained by SizeMap, that is, the size range of a single Object. Using the following code snippet as an example, we can see that the size range of a single Osbject is:
8 Byte ~ 262144 Byte == 8 Byte ~ 256 KB
const SizeClassInfo SizeMap::kSizeClasses[SizeMap::kSizeClassesCount] = {
// Each row here is called SizeClass
//
,
,
// Object size column, number of pages applied at a time, number of objects moved at a time (memory applied or reclaimed)
/ /... A little...
{ 8.1.32}, / / 0.59%
/ /... A little...
{ 262144.32.2}, / / 0.02%
};
Copy the code
So:
type | The size of the | source |
---|---|---|
Small objects | <= 256 KB | ThreadCache ,CentralFreeList |
A small object | > 256 KB | PageHeap.free_ andPageHeap.large_ |
When allocating memory to small objects: When ThreadCache runs low, it fetches memory from the CentralFreeList corresponding to SizeClass, or if not, it fetches memory from PageHeap.
When allocating memory to non-small objects: pageheap. free_ and pageheap. large_.
A detailed version
Finally, let’s take a look at the detailed memory allocation process using the example of getting a small 6-byte object (SizeClass with a value of 1).
conclusion
To summarize briefly, we can get knowledge points from this article:
- To understand the
FreeList
- Know the
TCMalloc
Mainly by theThreadCache
,CentralFreeList
,PageHeap
Three partsObject
In the composition ofFreeList
,ThreadCache
maintenanceSpan
Constitute theSpanList
,CentralFreeList
Maintenance, at the same timeSpan
Will be disassembled intoObject
- when
ThreadCache
In theObject
If no, from the correspondingSizeClass
theCentralFreeList
get - Small objects come from
ThreadCache
,CentralFreeList
- Non-small objects from
PageHeap
- The thread from
ThreadCache
Access to memory does not require a lock
With that in mind, it should be easy to Go back to the memory allocation of the Go language. Next time, we’ll look at the Go memory design and implementation.
Reference: 1. Tcmalloc source (commit: 9 d274df) 2 https://github.com/google/tcmalloc/tree/master/tcmalloc. Available space table List (Free) 3 at https://songlee24.github.io/2015/04/08/free-list/. Graphic TCMalloc https://zhuanlan.zhihu.com/p/29216091 4. TCMalloc decryption https://wallenwang.com/2018/11/tcmalloc/ 5. TCMalloc: Thread-Caching Malloc https://github.com/google/tcmalloc/blob/master/docs/design.md 6. TCMalloc : Thread - Caching Malloc https://gperftools.github.io/gperftools/tcmalloc.html 7. Tcmalloc principle analysis (based on gperftools - 2.1) http://gao-xiao-long.github.io/2017/11/25/tcmalloc/Copy the code
“Go language easy to advance” series links tigerb.cn/go/#/kernal…