CPython memory manager

The layout of the CPython source package

This article is based on Python 3.9. In the quoted source, some assert statements and unnecessary macro branches are removed to keep the core logic clear, and comments are added to make it easier for myself and others to understand. Each snippet notes in the code which source file it comes from, to make reading along with the full source easier.

Directory   Description
Demo        A demo application using Python
Doc         The documentation
Grammar     Python grammar files
Include     Header files referenced when compiling Python
Lib         The standard library
Mac         Mac tools, etc.
Misc        A collection of miscellaneous files (gdbinit, vimrc, etc.)
Modules     C extension modules for Python
Objects     C code for Python objects
PC          Code that depends on the OS and other environments
PCbuild     Build files for Win32 and x64
Parser      The parser for Python
Python      The core of the interpreter

Python’s memory management architecture

Python is a dynamic language in which everything is an object, so running a program generates a great number of small memory requests. To speed up memory operations and reduce memory fragmentation, Python ships its own memory manager, called pymalloc.

# Objects/obmalloc.c code comments
/* An object allocator for Python.

   Here is an introduction to the layers of the Python memory architecture,
   showing where the object allocator is actually used (layer +2), It is
   called for every object allocation and deallocation (PyObject_New/Del),
   unless the object-specific allocators implement a proprietary allocation
   scheme (ex.: ints use a simple free list). This is also the place where
   the cyclic garbage collector operates selectively on container objects.


    Object-specific allocators
    _____   ______   ______       ________
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |   # object-specific memory allocators
    _______________________________       |                           |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |   # Python object allocator
    ______________________________________________________________    |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |   # Python low-level memory allocator
    __________________________________________________________________
   [   Underlying general-purpose allocator (ex: C library malloc)    ]
 0 | <------ Virtual memory allocated for the Python process -------> |   # general-purpose allocator (e.g. glibc malloc)

   ========================================================================
    ________________________________________________________________________
   [                OS-specific Virtual Memory Manager (VMM)               ]
-1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |   # OS-specific virtual memory manager
    __________________________________   __________________________________
   [                                  ] [                                  ]
-2 | <-- Physical memory: ROM/RAM --> | | <-- Secondary storage (swap) --> |   # physical memory and swap device (e.g. HDD)
*/
PyDict_New()            // layer +3
 PyObject_GC_New()      // layer +2
  PyObject_Malloc()     // layer +2
   new_arena()          // layer +1
    malloc()            // layer 0
// Layers -1 and -2 belong to the operating system and are outside the scope of this article.

Figure 1

General-purpose base allocator (layer 0)

512 bytes is CPython's small-object threshold

//Objects/obmalloc.c
#define SMALL_REQUEST_THRESHOLD 512
#define NB_SMALL_SIZE_CLASSES   (SMALL_REQUEST_THRESHOLD / ALIGNMENT)

/* Largest positive value of type Py_ssize_t. */
#define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))

static void *
_PyObject_Malloc(void *ctx, size_t nbytes)
{
    // Try the pymalloc allocator first; it only serves requests in (0, 512]
    void* ptr = pymalloc_alloc(ctx, nbytes);
    if (LIKELY(ptr != NULL)) {
        return ptr;
    }

    // pymalloc declined (size 0 or above the threshold):
    // fall back to the raw allocator, which wraps C malloc
    ptr = PyMem_RawMalloc(nbytes);
    if (ptr != NULL) {
        raw_allocated_blocks++;
    }
    return ptr;
}
  • 0 bytes: pymalloc declines, so the raw allocator calls malloc directly
  • 1 to 512 bytes: allocated from Python's memory pool, which is subdivided by request size
  • Above 512 bytes: the raw allocator calls malloc directly

All functions prefixed with PyMem_ in the source are thin wrappers that adapt C library functions, with malloc and friends at their core, to Python's conventions.

In general, Python does not place any restrictions on the size of the memory pool for small chunks of memory

When Python is compiled with the WITH_MEMORY_LIMITS symbol enabled, another symbol inside Python, SMALL_MEMORY_LIMIT, takes effect. It limits the total size of the memory pool and, in turn, the number of arenas that can be created.

By default, this compilation symbol is not turned on on either Win32 or Unix platforms, so Python generally does not place any restrictions on the size of the memory pool for small chunks of memory.

[obmalloc.c]
#ifdef WITH_MEMORY_LIMITS
#ifndef SMALL_MEMORY_LIMIT
#define SMALL_MEMORY_LIMIT      (64 * 1024 * 1024)      /* 64 MB -- more? */
#endif
#endif

#ifdef WITH_MEMORY_LIMITS
#define MAX_ARENAS              (SMALL_MEMORY_LIMIT / ARENA_SIZE)
#endif

CPython lets us just provide the type and quantity

With the following macro definitions, we only need to provide a type and a count when writing code, instead of calculating by hand how much space to request

//Include/pymem.h
#define PyMem_New(type, n) \
  ( ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL :      \
        ( (type *) PyMem_Malloc((n) * sizeof(type)) ) )
#define PyMem_NEW(type, n) \
  ( ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL :      \
        ( (type *) PyMem_MALLOC((n) * sizeof(type)) ) )
#define PyMem_Resize(p, type, n) \
  ( (p) = ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL :        \
        (type *) PyMem_Realloc((p), (n) * sizeof(type)) )
#define PyMem_RESIZE(p, type, n) \
  ( (p) = ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL :        \
        (type *) PyMem_REALLOC((p), (n) * sizeof(type)) )
#define PyMem_Del               PyMem_Free
#define PyMem_DEL               PyMem_FREE
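As a usage sketch, here is how a hypothetical C extension function might lean on these macros; the function name process_values and its logic are illustrative assumptions, not CPython code.

#include <Python.h>

static PyObject *
process_values(PyObject *self, PyObject *args)
{
    /* Ask for 100 ints: the macro computes 100 * sizeof(int) for us and
     * yields NULL if that multiplication would overflow Py_ssize_t. */
    int *values = PyMem_New(int, 100);
    if (values == NULL) {
        return PyErr_NoMemory();
    }
    long total = 0;
    for (int i = 0; i < 100; i++) {
        values[i] = i * i;
        total += values[i];
    }
    PyMem_Del(values);              /* PyMem_Del is an alias of PyMem_Free */
    return PyLong_FromLong(total);
}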

Memory fragmentation problem

A memory request is rarely an exact fit for an existing free region, so each allocation carves a piece off a larger chunk. Over time this leaves many small fragments scattered in between that may never be usable individually, even though together they would add up to a large usable block. Worse, finding a suitable fragment requires traversing the whole heap. So, to reduce fragmentation and allocate memory quickly, we need memory management.

Figure 2

Partition of Python memory management

Memory requests smaller than 512 bytes are taken over by Python's low-level allocator, which hands out raw, untyped memory and is organized into three levels: blocks, pools, and arenas

  • A block is Python's smallest unit of memory management. Its size is determined by the szidx field of the owning pool_header, and allocation uses a best-fit strategy

    • Best-fit allocation policy: return the smallest block whose size is greater than or equal to the requested size
  • A pool manages blocks of a single size class. It is a memory-management abstraction with a notion of size, recorded in the szidx field of its pool_header. (It also has state management, described later.)

  • An arena can manage multiple pools, and each of those pools can hold a different size class. (It also has its own state management, described later.)

Figure 3

Figure 4: Pool and arena headers are connected to their bodies differently

Python low-level memory allocator (Tier 1)

Now we come to Python memory management proper. What does it actually do?

  • Reduce memory fragmentation
    • The block concept above mitigates fragmentation effectively, although it cannot eliminate it entirely
  • Avoid traversing the heap on every allocation
    • arena_object and pool_header therefore maintain several linked lists, reducing lookup overhead from O(N) to O(1)
  • The Python allocator mainly deals with small memory (<512 bytes), where frequent allocation and freeing would otherwise be wasteful
    • Most of Python's built-in types additionally introduce a cache-pool (free list) mechanism to manage the allocation and release of small chunks of memory; the allocator itself provides the three interfaces pymalloc_alloc, pymalloc_realloc, and pymalloc_free (a sketch of the free-list idea follows this list)
      • For example, dictionaries keep an 80-slot array as a cache pool
      • Lists likewise keep an 80-slot array as a cache pool
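To make the cache-pool (free list) idea concrete, here is a minimal standalone sketch in plain C, modeled on the pattern CPython applies to floats and lists; the node type, the function names, and the 80-slot cap are illustrative assumptions, not CPython code.

#include <stdlib.h>

#define MAXFREE 80                     /* cap mirrors the 80-slot arrays above */

typedef struct node { struct node *next; double payload; } node;

static node *free_list = NULL;         /* singly linked cache of dead objects */
static int numfree = 0;

static node *node_new(double v) {
    node *n;
    if (free_list != NULL) {           /* reuse a cached object: no malloc call */
        n = free_list;
        free_list = n->next;
        --numfree;
    }
    else {
        n = malloc(sizeof(node));
        if (n == NULL) return NULL;
    }
    n->payload = v;
    return n;
}

static void node_del(node *n) {
    if (numfree < MAXFREE) {           /* cache it instead of freeing */
        n->next = free_list;
        free_list = n;
        ++numfree;
    }
    else {
        free(n);
    }
}

Allocation and release of hot, small objects then mostly shuffle pointers instead of entering the allocator at all.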

arena

The core of the first layer is to create an arena

The size of the arena

An arena defaults to 256 KiB

#define ARENA_BITS              18                    /* 256 KiB */
#define ARENA_SIZE              (1 << ARENA_BITS)

Arena head structure

// Objects/obmalloc.c
struct arena_object {
    // Address of the arena's 256 KiB memory region (0 if unassigned)
    uintptr_t address;

    // Pool-aligned pointer: where the next pool will be carved out
    block* pool_address;

    // The number of pools still available in the arena
    uint nfreepools;

    // The total number of pools in the arena
    uint ntotalpools;

    // Freed pools are kept on a singly linked list
    struct pool_header* freepools;

    // Doubly-linked-list pointers
    struct arena_object* nextarena;
    struct arena_object* prevarena;
};

Why does arena_object need both the address and pool_address fields?

The memory-management partition above noted that an arena_object and its arena memory are not contiguous; see Figure 4

When a pool_header is allocated, the memory for the collection of blocks it manages is allocated along with it, so a pool is one contiguous piece of space

When an arena_object is allocated, however, the memory for the pool set it manages is not allocated with it; the arena has to be attached to it through a pointer

So address records where the arena's raw memory starts, while pool_address records the (aligned) start of the real pool data; hence the two distinct fields

new_arena

type

uintptr_t comes from stdint.h, introduced in C99, and its job is to convert C pointers to integers safely. It bridges platform differences: depending on the environment it is 4 or 8 bytes wide, so a pointer always fits in it without overflow.

// uchar, uint and ulong are abbreviations for the corresponding unsigned types.
#undef uchar
#define uchar unsigned char   /* assuming == 8 bits */

#undef uint
#define uint unsigned int     /* assuming >= 16 bits */

#undef ulong
#define ulong unsigned long   /* assuming >= 32 bits */

#undef uptr
#define uptr Py_uintptr_t

typedef uchar block;
//[obmalloc.c]
// arenas: the array holding every arena_object
static struct arena_object* arenas = NULL;
// Number of arena_objects currently managed in arenas
static uint maxarenas = 0;
// Singly linked list of "unused" arena_objects
static struct arena_object* unused_arena_objects = NULL;
// Doubly linked list of "usable" arena_objects
static struct arena_object* usable_arenas = NULL;
// Number of arena_objects to request at initialization
#define INITIAL_ARENA_OBJECTS 16
//[obmalloc.c]
static struct arena_object*
new_arena(void)
{
    struct arena_object* arenaobj;
    uint excess;  /* number of bytes above pool alignment */

    // unused_arena_objects starts out NULL, so the first call (and any
    // call after the supply runs out) grows the arena_object array
    if (unused_arena_objects == NULL) {
      uint i;
      uint numarenas;
      size_t nbytes;

      // Determine how many arena_objects to request:
      // 16 initially, then double the capacity each time
      numarenas = maxarenas ? maxarenas << 1 : INITIAL_ARENA_OBJECTS;

      // Overflow checks
      if (numarenas <= maxarenas)
        return NULL;
      nbytes = numarenas * sizeof(*arenas);
      if (nbytes / sizeof(*arenas) != numarenas)
        return NULL;

      // Use the layer-0 allocator to get raw memory for the
      // arena_object headers; arenas stays behind as a static global
      arenaobj = (struct arena_object *)realloc(arenas, nbytes);
      if (arenaobj == NULL)
        return NULL;
      arenas = arenaobj;

      // Thread the raw memory allocated above into unused_arena_objects
      for (i = maxarenas; i < numarenas; ++i) {
        // Arena address; 0 means no arena memory assigned yet
        arenas[i].address = 0;
        // The last entry points to NULL, the rest to the next entry;
        // the initial allocation forms a contiguous singly linked list
        arenas[i].nextarena = i < numarenas - 1 ? &arenas[i+1] : NULL;
      }
      /* Publish the result in the global variables */
      unused_arena_objects = &arenas[maxarenas];
      maxarenas = numarenas;
    }

    // ---------- The code above completes arenas initialization, as shown in the figure below ----------

    // Take an "unused" arena_object off the unused_arena_objects list
    arenaobj = unused_arena_objects;
    unused_arena_objects = arenaobj->nextarena;
    assert(arenaobj->address == 0);

    // Allocate one 256 KiB chunk of arena memory;
    // address now holds a concrete address
    arenaobj->address = (uptr)malloc(ARENA_SIZE);
    ++narenas_currently_allocated;

    if (arenaobj->address == 0) {
        // Allocation failed: push the header we removed back
        // onto the unused_arena_objects list
        arenaobj->nextarena = unused_arena_objects;
        unused_arena_objects = arenaobj;
        return NULL;
    }

    // ---------- The code above links the allocated arena space to its arena_object ----------

    // Split the space within the arena into pools
    arenaobj->freepools = NULL;
    /* pool_address is where the next pool will be carved out */
    arenaobj->pool_address = (block*)arenaobj->address;
    arenaobj->nfreepools = ARENA_SIZE / POOL_SIZE;

    // Memory alignment: round the first pool up to a POOL_SIZE boundary
    excess = (uint)(arenaobj->address & POOL_SIZE_MASK);
    if (excess != 0) {
      --arenaobj->nfreepools;
      arenaobj->pool_address += POOL_SIZE - excess;
    }
    arenaobj->ntotalpools = arenaobj->nfreepools;

    return arenaobj;
}
// ---------- The code above divides the arena into pools ----------

1. Initialize 16 arena_objects

Figure 5

2. Take an "unused" arena_object off the head of unused_arena_objects

Figure 6.

3. Allocate the arena space and connect the arena header to its real data

Figure 7.

4. excess: aligning pools within the arena

The pool_address member of the arena_object structure holds the address of the first pool, aligned to 4 KiB.

Here we use POOL_SIZE_MASK to mask the address of the arena reserved with malloc() and calculate the excess.

If excess is 0, the program returns the allocated arena_object as is, because the arena address is already a multiple of 4 KiB (2^12). The arena is then packed with pools edge to edge, so the number of pools is simply the arena size divided by the pool size.

If excess is non-zero, the program adds (POOL_SIZE - excess) to the arena's address, stores the result in pool_address, and decrements nfreepools by one, because the misaligned fragment at the start cannot hold a pool.
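A tiny standalone sketch of this arithmetic, using a hypothetical misaligned arena address; the constants mirror the ones above.

#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE      4096
#define POOL_SIZE_MASK (POOL_SIZE - 1)
#define ARENA_SIZE     (256 * 1024)

int main(void) {
    /* Hypothetical arena address returned by malloc(): not 4 KiB aligned */
    uintptr_t address = 0x1000A38;
    uintptr_t excess  = address & POOL_SIZE_MASK;     /* 0xA38 */
    uintptr_t pool_address = address;
    unsigned nfreepools = ARENA_SIZE / POOL_SIZE;     /* 64 */
    if (excess != 0) {
        --nfreepools;                        /* the leading fragment is unusable */
        pool_address += POOL_SIZE - excess;  /* rounds up to 0x1001000 */
    }
    printf("first pool at %#lx, %u usable pools\n",
           (unsigned long)pool_address, nfreepools);
    return 0;
}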

Figure 8.

Two states of arena

Whether an arena_object currently has arena memory (and pools) attached determines which state it is in

unused_arena_objects (the "unused" state)

  • An arena_object is stored on this list only when its address member is 0

    • arena_objects freshly created by new_arena() have no arena memory attached yet
    • When an arena becomes completely empty during PyObject_Free(), its arena_object is head-inserted into this list
  • Maintained as a singly linked list

usable_arenas (the "usable" state)

  • The arena still has at least one available pool, i.e. nfreepools > 0
    • Pools in the used state are governed by usedpools. An arena whose free pools are exhausted (usable_arenas->nfreepools == 0) is removed from usable_arenas even if its pools still contain allocatable blocks, because allocation looks in usedpools first
  • Maintained as a doubly linked list
    • The list is kept sorted in ascending order of nfreepools, so the fullest arenas are used first (giving emptier arenas a chance to become completely free)

Figure 9.

Python object allocator (layer 2)

The layer-2 allocator manages the blocks inside pools. This is the layer that actually hands the start address of a block to the requester, frees blocks, and so on.

block

A pool is divided into blocks. When objects are created in Python, they are ultimately stored in one or more blocks. A block is the smallest unit of Python memory allocation

Memory alignment

Block sizes grow in 8-byte steps, so every block is memory aligned (word aligned). This:

1. Improves CPU read/write speed

2. Keeps fragment size (the unavoidable per-allocation waste) small

// The macro below maps a size class index back to a block size
// Index 0 is 1 << 3, which is 8
// Index 1 is 2 << 3, which is 16
#define INDEX2SIZE(I) (((uint)(I) + 1) << ALIGNMENT_SHIFT)

 * Request in bytes     Size of allocated block      Size class idx
 * ----------------------------------------------------------------
 *        1-8                     8                        0
 *        9-16                   16                        1
 *       17-24                   24                        2
 *       25-32                   32                        3
 *       33-40                   40                        4
 *       41-48                   48                        5
 *       49-56                   56                        6
 *       57-64                   64                        7
 *       65-72                   72                        8
 *        ...                   ...                      ...
 *      497-504                 504                       62
 *      505-512                 512                       63

For example, when we request 44 bytes of memory, PyObject_Malloc allocates a 48-byte block from the memory pool.
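A quick standalone check of that size-class arithmetic; the two macros are re-declared here rather than pulled from CPython headers.

#include <stdio.h>

#define ALIGNMENT_SHIFT 3
#define INDEX2SIZE(I) (((unsigned)(I) + 1) << ALIGNMENT_SHIFT)

int main(void) {
    unsigned nbytes = 44;
    unsigned idx = (nbytes - 1) >> ALIGNMENT_SHIFT;   /* (44-1) >> 3 == 5 */
    printf("request %u -> size class %u -> %u-byte block\n",
           nbytes, idx, INDEX2SIZE(idx));             /* 48-byte block */
    return 0;
}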

//Objects/obmalloc.c
#define ALIGNMENT               8               /* must be 2^N */
#define ALIGNMENT_SHIFT         3

Figure 10.

As Figure 8 showed, excess aligns pools to their 4 KiB size inside the arena, and within a pool every block has the size fixed by szidx in pool_header (a multiple of 8), so blocks naturally sit on 8-byte boundaries.

Utilize memory alignment hacks

A CPU, in principle, fetches data from aligned addresses; accordingly, the addresses handed out by malloc() are aligned to suit the CPU.

A famous hack that takes advantage of this is to use the lower three bits of an address as a flag.

Let’s say we store some pointer in the structure. If the address returned from malloc() is 8-byte aligned, the lower three bits of its pointer must be “0”. So we came up with the idea of putting bits here and using them as flags. When we actually want to access the pointer, we set the lower three bits to zero, ignoring the flag.

This is a very bold hack, but glibc malloc actually implements it.
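Here is a minimal illustrative sketch of the trick, assuming malloc() returns at least 8-byte-aligned memory; it models the idea only and is not glibc's actual code.

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK ((uintptr_t)0x7)     /* the three low bits freed up by alignment */

static void *set_tag(void *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0);           /* relies on 8-byte alignment */
    return (void *)((uintptr_t)p | (tag & TAG_MASK));
}

static unsigned get_tag(void *p) { return (unsigned)((uintptr_t)p & TAG_MASK); }

static void *strip_tag(void *p) { return (void *)((uintptr_t)p & ~TAG_MASK); }

int main(void) {
    long *data = malloc(sizeof(long));
    if (data == NULL) return 1;
    void *tagged = set_tag(data, 5);   /* smuggle a 3-bit flag into the pointer */
    long *real = strip_tag(tagged);    /* zero the low bits before dereferencing */
    *real = 42;
    unsigned ok = (get_tag(tagged) == 5) && (*real == 42);
    free(strip_tag(tagged));
    return ok ? 0 : 1;
}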

The state of the block

A block is managed in one of three states

  • Allocated: currently in use
  • Freed: a block that was used and then released
    • Released blocks are threaded onto the freeblock singly linked list
    • freeblock points to the first reusable freed block, or NULL if no block has ever been freed. When freed blocks exist, they are handed out first, via freeblock; only afterwards are untouched blocks carved out through the nextoffset offset
  • Unused: a block that has never been used is not on any list, so pool_header's nextoffset records the offset of the first such block

Figure 11.

pool

The size of the pool

A pool is the same size as a system page, 4 KB. One pool manages blocks of a single size class, identified by the szidx field; a pool is thus a collection of blocks with a notion of size

//Objects/obmalloc.c
#define SYSTEM_PAGE_SIZE        (4 * 1024)
#define SYSTEM_PAGE_SIZE_MASK   (SYSTEM_PAGE_SIZE - 1)
#define POOL_SIZE               SYSTEM_PAGE_SIZE        /* must be 2^N */
#define POOL_SIZE_MASK          SYSTEM_PAGE_SIZE_MASK

Pool memory alignment

Part 4 of arena initialization already covered excess and pool memory alignment (see Figure 8), so it is not repeated here

Pool header structure

A pool header occupies 48 bytes, and all pools are linked into doubly linked lists

//Objects/obmalloc.c

/* When you say memory, my mind reasons in terms of (pointers to) blocks */
typedef uint8_t block;

/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks in the pool */
    block *freeblock;                   /* points to the first block in the free block list */
    struct pool_header *nextpool;       /* nextpool/prevpool link pools inside usedpools; */
    struct pool_header *prevpool;       /* only these two pointers are needed there, keeping the table small */
    uint arenaindex;                    /* index into arenas of the owning arena */
    uint szidx;                         /* size class index: every block in the pool has this size */
    uint nextoffset;                    /* memory offset of the next never-used block */
    uint maxnextoffset;                 /* offset of the last block from the start of the pool */
};

typedef struct pool_header *poolp;

Figure 12

The state of the pool

  • The empty state: All blocks in the pool are not used
    • Pool_size specifies the size of the pool that has been used up
  • Informs the state: At least one block in the pool has been used and at least one block has not been used. Maintained by the USedPools array
  • Full state: All blocks in the pool have been used and removed from the USedPools list.

Figure 13

usedpools

usedpools manages all pools in the used state

// poolp is an alias for "pointer to pool_header"; usedpools is an array of such pointers.
typedef struct pool_header *poolp;

Macro NB_SMALL_SIZE_CLASSES

#define ALIGNMENT               8   /* must be 2^N */
#define SMALL_REQUEST_THRESHOLD 512
// How many size classes exist under the current configuration
#define NB_SMALL_SIZE_CLASSES   (SMALL_REQUEST_THRESHOLD / ALIGNMENT)

The initial state of usedpools

// PTA(x) points 2*sizeof(block *) bytes before usedpools[2*x], so that when
// the result is treated as a pool_header, its nextpool/prevpool members land
// exactly on usedpools[2*x] and usedpools[2*x + 1].
#define PTA(x)  ((poolp )((uint8_t *)&(usedpools[2*(x)]) - 2*sizeof(block *)))

// PT(x) expands PTA(x) twice, filling one nextpool/prevpool pair.
#define PT(x)   PTA(x), PTA(x)

// The usedpools array has 128 entries: 2 per size class
static poolp usedpools[2 * ((NB_SMALL_SIZE_CLASSES + 7) / 8) * 8] = {
    PT(0), PT(1), PT(2), PT(3), PT(4), PT(5), PT(6), PT(7)
#if NB_SMALL_SIZE_CLASSES > 8
    , PT(8), PT(9), PT(10), PT(11), PT(12), PT(13), PT(14), PT(15)
#if NB_SMALL_SIZE_CLASSES > 16
    , PT(16), PT(17), PT(18), PT(19), PT(20), PT(21), PT(22), PT(23)
#if NB_SMALL_SIZE_CLASSES > 24
    , PT(24), PT(25), PT(26), PT(27), PT(28), PT(29), PT(30), PT(31)
#if NB_SMALL_SIZE_CLASSES > 32
    , PT(32), PT(33), PT(34), PT(35), PT(36), PT(37), PT(38), PT(39)
#if NB_SMALL_SIZE_CLASSES > 40
    , PT(40), PT(41), PT(42), PT(43), PT(44), PT(45), PT(46), PT(47)
#if NB_SMALL_SIZE_CLASSES > 48
    , PT(48), PT(49), PT(50), PT(51), PT(52), PT(53), PT(54), PT(55)
#if NB_SMALL_SIZE_CLASSES > 56
    , PT(56), PT(57), PT(58), PT(59), PT(60), PT(61), PT(62), PT(63)
#if NB_SMALL_SIZE_CLASSES > 64
#error "NB_SMALL_SIZE_CLASSES should be less than 64"
#endif /* NB_SMALL_SIZE_CLASSES > 64 */
#endif /* NB_SMALL_SIZE_CLASSES > 56 */
#endif /* NB_SMALL_SIZE_CLASSES > 48 */
#endif /* NB_SMALL_SIZE_CLASSES > 40 */
#endif /* NB_SMALL_SIZE_CLASSES > 32 */
#endif /* NB_SMALL_SIZE_CLASSES > 24 */
#endif /* NB_SMALL_SIZE_CLASSES > 16 */
#endif /* NB_SMALL_SIZE_CLASSES > 8 */
};

Figure 14: the same structures, now viewed from the perspective of usedpools

How usedpools stays fast: hash-table-like indexing

"Used" pools, those with at least one block allocated but not all, are gathered into usedpools. The scheme resembles the chained-address method of a hash table: the size class index gives O(1) access to usedpools[index], which heads a linked list of pools of that size. The empty->used and used->full transitions then reduce to cheap list insertions and deletions.

A case in point

1. When 20 bytes of memory are requested, Python first computes the size class index: size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT, where ALIGNMENT_SHIFT is 3 (a right shift of 3 bits, i.e. 8-byte alignment), giving (20 - 1) >> 3 = 2

2. The candidate pool is then taken from usedpools[i + i]->nextpool; the index is doubled because the entries come in pairs

size = 20                     /* number of requested bytes */
size = (20 - 1) >> 3          /* align: size class index 2 */
pool = usedpools[size + size] /* index doubled because entries come in pairs: index 4 */   // O(1)
/* For an empty list head, the following relations hold. */
pool == pool->nextpool
pool == pool->prevpool
pool->nextpool == pool->prevpool         // O(1)

usedpools also needs to be as small as possible

Since the table is consulted on every small allocation, it should hold as little data as possible: only the two internal pointer members of pool_header, nextpool and prevpool, are actually needed.

If whole pool_header structs were stored instead, usedpools would become too large for the CPU cache. Because the array is referenced so frequently, keeping it small reduces cache pressure.
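A small standalone check of why those two pointers suffice: stepping back two pointer widths from usedpools[2*i], as the PTA() macro above does, lines the fake header's nextpool/prevpool members up with the array slots. The struct below re-declares the pool_header layout shown earlier; the expected offsets assume a typical LP64 build.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint8_t block;

struct pool_header {                       /* same layout as shown earlier */
    union { block *_padding; unsigned count; } ref;
    block *freeblock;
    struct pool_header *nextpool;
    struct pool_header *prevpool;
    unsigned arenaindex;
    unsigned szidx;
    unsigned nextoffset;
    unsigned maxnextoffset;
};

int main(void) {
    /* nextpool must sit exactly 2 pointer widths in, prevpool 3 */
    printf("nextpool offset %zu (expect %zu)\n",
           offsetof(struct pool_header, nextpool), 2 * sizeof(block *));
    printf("prevpool offset %zu (expect %zu)\n",
           offsetof(struct pool_header, prevpool), 3 * sizeof(block *));
    return 0;
}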

Release policies for arena and pool

By steering allocations away from memory spaces that still have plenty of room, those spaces get a better chance of becoming completely empty; once a space is completely empty, it can be freed.

  • usable_arenas is sorted in ascending order of nfreepools, so the fullest arenas are used first
  • On a full->used transition the pool is head-inserted into usedpools, so pools that just freed a block are reused first

Why usedpools needs twice as many entries

When pymalloc_free releases a block from a full pool, the pool transitions from full back to used and is linked in through usedpools[size + size]; the paired entries act as a fake pool_header whose nextpool/prevpool fields occupy the two array slots, which is why two entries per size class are needed

    // Code in pymalloc_free:
    if (UNLIKELY(lastfree == NULL)) {
        uint size = pool->szidx;
        poolp next = usedpools[size + size];   // tail of the circular doubly linked list
        poolp prev = next->prevpool;

        pool->nextpool = next;
        pool->prevpool = prev;
        next->prevpool = pool;
        prev->nextpool = pool;
        return 1;
    }

Allocation takes pools straight from usedpools, so partially used pools are drained first

    // Code in pymalloc_alloc: use a used pool directly
    poolp pool = usedpools[size + size];
    block *bp;

    if (LIKELY(pool != pool->nextpool)) {
        // One more block in use
        ++pool->ref.count;
        bp = pool->freeblock;   // freeblock points at the first free block; use it directly
        assert(bp != NULL);

Allocation execution process

pymalloc_alloc

This function is used to allocate blocks, pools, and arenas when the requested memory is less than 512 bytes

// Maps a size class index back to its block size
#define INDEX2SIZE(I) (((uint)(I) + 1) << ALIGNMENT_SHIFT)
// Rounds the pool_header size up to an ALIGNMENT boundary
#define POOL_OVERHEAD   _Py_SIZE_ROUND_UP(sizeof(struct pool_header), ALIGNMENT)

#define DUMMY_SIZE_IDX          0xffff  /* size class of newly cached pools */
//Objects/obmalloc.c
static void*
pymalloc_alloc(void *ctx, size_t nbytes)
{
    // Requests of 0 bytes are not served here: they fall through to the
    // raw allocator (which is itself built on C's malloc)
    if (UNLIKELY(nbytes == 0)) {
        return NULL;
    }
    // Likewise for anything above the 512-byte threshold
    if (UNLIKELY(nbytes > SMALL_REQUEST_THRESHOLD)) {
        return NULL;
    }

    // Use the size class index to locate the entry in usedpools
    uint size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
    poolp pool = usedpools[size + size];
    block *bp;

    // If usedpools has a pool for this size class, allocate from it
    if (LIKELY(pool != pool->nextpool)) {
        // One more block in use
        ++pool->ref.count;
        bp = pool->freeblock;      // freeblock points at the first free block; use it directly
        assert(bp != NULL);

        if (UNLIKELY((pool->freeblock = *(block **)bp) == NULL)) {
            // The free list is exhausted; carve out the next never-used
            // block via the offset, if one remains
            if (UNLIKELY(pool->nextoffset <= pool->maxnextoffset)) {
                pool->freeblock = (block*)pool + pool->nextoffset;
                // INDEX2SIZE restores the block size from the index
                pool->nextoffset += INDEX2SIZE(size);
                *(block **)(pool->freeblock) = NULL;
            }
            else {
                /* No blocks left at all: the pool is now full,
                   so unlink it from usedpools */
                poolp next = pool->nextpool;
                poolp prev = pool->prevpool;
                next->prevpool = prev;
                prev->nextpool = next;
            }
        }
    }
    // No usable pool of this size class: get one from a (possibly new) arena
    else {
        bp = allocate_from_new_pool(size);
    }

    // Return the block from the pool
    return (void *)bp;
}

allocate_from_new_pool

#define ROUNDUP(x) (((x) + ALIGNMENT_MASK) & ~ALIGNMENT_MASK)
#define POOL_OVERHEAD ROUNDUP(sizeof(struct pool_header))
// A dummy, impossibly large size class used to initialize a fresh pool,
// so it can never accidentally match a requested size class
#define DUMMY_SIZE_IDX 0xffff
static void*
allocate_from_new_pool(uint size)
{
    // If no usable arena exists, create one with new_arena().
    // The fresh arena_object becomes the head of usable_arenas; its
    // doubly-linked-list pointers are NULL because it is the only entry
    if (usable_arenas == NULL) {
        usable_arenas = new_arena();
        if (usable_arenas == NULL) {
            return NULL;
        }
        usable_arenas->nextarena = usable_arenas->prevarena = NULL;
    }

    poolp pool = usable_arenas->freepools;

    // Case 1: a previously freed pool exists; its szidx may even still
    // match the requested size class.
    // Take it off freepools so it can be placed into usedpools
    if (pool != NULL) {
        usable_arenas->freepools = pool->nextpool;
        --usable_arenas->nfreepools;
        // If this arena's free pools are now exhausted, move on to the
        // next usable arena
        if (UNLIKELY(usable_arenas->nfreepools == 0)) {
            usable_arenas = usable_arenas->nextarena;
            if (usable_arenas != NULL) {
                usable_arenas->prevarena = NULL;
            }
        }
    }
    // Case 2: freepools is empty; carve a brand-new pool out of the
    // arena's untouched space
    else {
        pool = (poolp)usable_arenas->pool_address;
        pool->arenaindex = (uint)(usable_arenas - arenas);
        // The dummy value marks the pool as uninitialized, so the szidx
        // fast path below can never match by accident
        pool->szidx = DUMMY_SIZE_IDX;
        usable_arenas->pool_address += POOL_SIZE;
        --usable_arenas->nfreepools;

        // If the arena has no pools left, drop it from usable_arenas
        if (usable_arenas->nfreepools == 0) {
            usable_arenas = usable_arenas->nextarena;
        }
    }

    // In either case 1 or case 2, the pool is inserted as the first pool
    // of this size class in the usedpools doubly linked list
    block *bp;
    poolp next = usedpools[size + size]; /* == prev */
    pool->nextpool = next;
    pool->prevpool = next;
    next->nextpool = pool;
    next->prevpool = pool;
    pool->ref.count = 1;

    // Case 1 fast path: the recycled pool already holds blocks of the
    // requested size class, so its free list can be used directly
    if (pool->szidx == size) {
        bp = pool->freeblock;
        assert(bp != NULL);
        pool->freeblock = *(block **)bp;
        return bp;
    }

    // Otherwise (case 2, or a recycled pool of a different size class):
    // initialize the pool header
    pool->szidx = size;
    // INDEX2SIZE converts the size class index into a block size,
    // e.g. 0->8, 1->16, 63->512
    size = INDEX2SIZE(size);
    // Skip past the pool_header, rounded up for alignment
    bp = (block *)pool + POOL_OVERHEAD;
    pool->nextoffset = POOL_OVERHEAD + (size << 1);
    pool->maxnextoffset = POOL_SIZE - size;
    pool->freeblock = bp + size;
    *(block **)(pool->freeblock) = NULL;   // the free list ends here
    return bp;
}


Release execution flow

The release path does three jobs: releasing a block, releasing a pool, and releasing an arena.

pymalloc_free

Tips for searching pools from blocks

#define SYSTEM_PAGE_SIZE (4 * 1024)
#define SYSTEM_PAGE_SIZE_MASK (SYSTEM_PAGE_SIZE - 1)
#define POOL_SIZE_MASK SYSTEM_PAGE_SIZE_MASK


// Obtain the start address of the pool containing address P
#define POOL_ADDR(P) ((poolp)_Py_ALIGN_DOWN((P), POOL_SIZE))
// On a 32-bit build this is equivalent to:
#define POOL_ADDR(P) ((poolp)((uintptr_t)(P) & 0xfffff000))

Pools are aligned to 4 KiB. That is, taking the address of any block inside a pool and masking it with 0xfffff000 is guaranteed to yield the start of that pool.

The mask's last three hex digits are zero, and 16^3 = 4096, so the result is always a multiple of 4 KiB.
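A standalone illustration with a hypothetical block address:

#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE_MASK ((uintptr_t)4096 - 1)

int main(void) {
    uintptr_t p = 0x7f3a2c401a40;             /* hypothetical block address */
    uintptr_t pool = p & ~POOL_SIZE_MASK;     /* strip the low 12 bits */
    printf("block %#lx lives in the pool at %#lx\n",
           (unsigned long)p, (unsigned long)pool);   /* ...401000 */
    return 0;
}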

//Objects/obmalloc.c

static inline int
pymalloc_free(void *ctx, void *p)
{
    poolp pool = POOL_ADDR(p);
    // Check that the pool computed by the macro POOL_ADDR() really is
    // one of ours; if not, p was not allocated by pymalloc
    if (UNLIKELY(!address_in_range(p, pool))) {
        return 0;
    }
    // Head-insert the freed block p into the freeblock list
    block *lastfree = pool->freeblock;
    *(block **)p = lastfree;
    pool->freeblock = (block *)p;
    pool->ref.count--;

    // The pool was full and now becomes used: head-insert it into usedpools
    if (UNLIKELY(lastfree == NULL)) {
        uint size = pool->szidx;
        poolp next = usedpools[size + size];
        poolp prev = next->prevpool;

        pool->nextpool = next;
        pool->prevpool = prev;
        next->prevpool = pool;
        prev->nextpool = pool;
        return 1;
    }

    // The pool still has allocated blocks
    if (LIKELY(pool->ref.count != 0)) {
        /* pool isn't empty: leave it in usedpools */
        return 1;
    }

    // That was the last allocated block: the pool goes from used to empty
    // and joins the arena's freepools list (the most involved case,
    // handled by insert_to_freepool)
    insert_to_freepool(pool);
    return 1;
}

insert_to_freepool

Up to Python 2.4 this was effectively a memory leak: arenas did not distinguish the "unused" and "usable" states, so even when all of its pools were freed, an arena never released the pool memory it maintained.

From Python 2.5 on, freeing handles the arena in four scenarios

  • If every pool in the arena is now empty, free the memory occupied by the pool set

    • Besides returning the pool memory maintained by the arena to the system, Python also adjusts the usable_arenas and unused_arena_objects lists, moves the arena to the "unused" state, and does some other bookkeeping.
  • If the arena previously had no empty pool, it cannot currently be on the usable_arenas list. Since it now has one free pool, it is linked in at the head of usable_arenas.

  • If the arena already had some number n of free pools, walk usable_arenas to find where the arena now belongs and reinsert it there. This works because usable_arenas is an ordered list: walking from the head, each arena has at least as many empty pools (nfreepools) as the one before it. The order matters because block allocation searches usable_arenas from the head, which guarantees that arenas with more empty pools are less likely to be used, and therefore more likely to eventually free their whole pool set and return the excess memory to the system.

  • In all other cases, nothing is done to the arena.

static void
insert_to_freepool(poolp pool)
{
    // Unlink the pool from usedpools
    poolp next = pool->nextpool;
    poolp prev = pool->prevpool;
    next->prevpool = prev;
    prev->nextpool = next;

    // Head-insert the pool into its arena's freepools list
    struct arena_object *ao = &arenas[pool->arenaindex];
    pool->nextpool = ao->freepools;
    ao->freepools = pool;
    uint nf = ao->nfreepools;
    // nfp2lasta[n] caches the last arena in usable_arenas with n free pools
    struct arena_object* lastnf = nfp2lasta[nf];

    if (lastnf == ao) {  /* it is the rightmost */
        struct arena_object* p = ao->prevarena;
        nfp2lasta[nf] = (p != NULL && p->nfreepools == nf) ? p : NULL;
    }
    ao->nfreepools = ++nf;

    if (nf == ao->ntotalpools && ao->nextarena != NULL) {
        /* Scenario 1: the last block of the last pool was freed, so the
         * whole arena is empty and its memory can be returned */
        // Unlink the arena_object from usable_arenas
        if (ao->prevarena == NULL) {
            usable_arenas = ao->nextarena;
        }
        else {
            ao->prevarena->nextarena =
                ao->nextarena;
        }
        if (ao->nextarena != NULL) {
            assert(ao->nextarena->prevarena == ao);
            ao->nextarena->prevarena =
                ao->prevarena;
        }

        // Head-insert the header into the unused_arena_objects list
        ao->nextarena = unused_arena_objects;
        unused_arena_objects = ao;

        // Free the arena memory
        _PyObject_Arena.free(_PyObject_Arena.ctx,
                             (void *)ao->address, ARENA_SIZE);

        // address == 0 flags "no arena memory assigned"
        ao->address = 0;
        --narenas_currently_allocated;
        return;
    }

    /* Scenario 2: the arena previously had no free pool (so it was not in
     * usable_arenas); freeing this block gave it exactly one, so it is
     * head-inserted into the usable_arenas list */
    if (nf == 1) {
        ao->nextarena = usable_arenas;
        ao->prevarena = NULL;
        if (usable_arenas)
            usable_arenas->prevarena = ao;
        usable_arenas = ao;
        assert(usable_arenas->address != 0);
        if (nfp2lasta[1] == NULL) {
            nfp2lasta[1] = ao;
        }
        return;
    }

    /* If this arena is now out of order, we need to keep the list sorted.
     * The list is kept sorted so that the "most full" arenas are used
     * first, which allows the nearly empty arenas to be completely freed.
     * In a few un-scientific tests, it seems like this approach allowed a
     * lot more memory to be freed. */

    /* If this is the only arena with nf, record that. */
    if (nfp2lasta[nf] == NULL) {
        nfp2lasta[nf] = ao;
    }

    /* Scenario 4: the arena is still in the right position; nothing to do. */
    if (ao == lastnf) {
        return;
    }

    /* Scenario 3: usable_arenas is an ordered list, so unlink ao and
     * reinsert it at the correct position */
    if (ao->prevarena != NULL) {
        /* ao isn't at the head of the list */
        ao->prevarena->nextarena = ao->nextarena;
    }
    else {
        /* ao is at the head of the list */
        usable_arenas = ao->nextarena;
    }
    ao->nextarena->prevarena = ao->prevarena;
    /* And insert after lastnf. */
    ao->prevarena = lastnf;
    ao->nextarena = lastnf->nextarena;
    if (ao->nextarena != NULL) {
        ao->nextarena->prevarena = ao;
    }
    lastnf->nextarena = ao;
    /* Verify that the swaps worked. */
    assert(ao->nextarena == NULL || nf <= ao->nextarena->nfreepools);
    assert(ao->prevarena == NULL || nf > ao->prevarena->nfreepools);
    assert(ao->nextarena == NULL || ao->nextarena->prevarena == ao);
    assert((usable_arenas == ao && ao->prevarena == NULL)
           || ao->prevarena->nextarena == ao);
}

Case 3

Object-specific allocator (layer 3)

Objects come in many types, such as lists and tuples, and each is generated by its own allocator. See my other articles analyzing Python's underlying data structures. The next article will cover Python's GC.

WeChat official account: always flying legless bird

References

Garbage Collection: Algorithms and Implementation

Parsing Python Source Code

Memory Management in Python