1. Memory management

Everything is an object in Python, and objects can be divided into mutable and immutable objects. If the address remains unchanged after modification, it is a mutable object; otherwise, it is an immutable object. The address information can be viewed by id().

>>> a = 10
>>> id(a)
4339392960
>>> a =11
>>> id(a)
4339392992
>>> a = [1, 2]
>>> id(a)
4342877192
>>> a.append(3)
>>> a
[1, 2, 3]
>>> id(a)
4342877192
Copy the code

Python has a memory pool mechanism, Pymalloc mechanism, for memory allocation and release management. Let’s take a look at why there are memory pools:

Frequent calls to new/malloc in C can result in a large amount of memory fragmentation and reduce efficiency when creating a large number of objects that consume little memory.

The concept of a memory pool is to apply for a certain number of memory blocks of equal size in advance and reserve them for later use. When new memory requirements are met, the system allocates memory from the memory pool to the new memory requirements. The most obvious advantage of doing this is to reduce memory fragmentation and improve efficiency.

Pymalloc for small objects, Pymalloc allocates space in the memory pool, usually less than 256kb. For large objects, Pymalloc calls new/malloc.

Once memory is created, it needs to be collected. The garbage collection mechanism is also a must in Python interview.

2. Garbage collection mechanism

Garbage collection. Python uses garbage collection (GC) as an automatic memory management mechanism. The GC does two things: one is to find garbage objects in memory, and the other is to clean up garbage objects, freeing up memory for other objects to use.

Python uses reference counting as its primary strategy, flag clearing and generational collection as its secondary strategy.

2.1 Reference Counting

Look at the source code, each object, in the source code is a structure representation, will have a count field.

typedef struct_object {
 int ob_refcnt;
 struct_typeobject *ob_type;
} PyObject;
Copy the code

PyObject is mandatory for every object, where ob_RefCNt is used as a reference count. When an object has a new reference, its OB_refCNt increases, and when the object referencing it is deleted, its OB_refcnt decreases. Once the reference count of an object reaches zero, the object is immediately reclaimed and the memory occupied by the object is freed.

The advantages and disadvantages of this algorithm are very obvious:

Advantages:

  • simple
  • Real-time: Once there is no reference, memory is freed directly. Don’t wait for a specific moment like other mechanics.

Disadvantages:

  • Extra space is required to maintain reference counts.
  • Cannot resolve circular references to objects. (Major disadvantages)

Here’s what a circular reference is:

A and B refer to each other without external references to either A or B. That is, objects are applied to each other, causing the chain of references to form a ring.

>>>>>>a = { } Object A has A reference count of 1
>>>b = { } Object B has a reference count of 1
>>>a['b'] = b  The reference count of #B is increased by 1
>>>b['a'] = a  The reference count of #A increases by 1
>>>del a The reference to #A is subtracted by 1, and finally the reference to the object A is 1
>>>del b The reference to #B is subtracted by 1, and finally the reference to B object is 1
Copy the code

After del, objects A and B no longer have any references to each other, but each contains A reference to the other, even though the last two objects can’t be referred to by any other variable, which is two inactive or garbage objects to GC. In theory it needs to be recycled. According to the reference counting principle above, the count must be 0 before the collection, but their reference count is not reduced to zero. So if you use reference counting to manage these two objects, they will not be reclaimed, they will remain in memory, causing a memory leak.

To solve the problem of cyclic references to objects, Python introduces two GC mechanisms, tag scavenging and generational collection.

2.2 Mark Clearing

The main problem with tag clearing is the circular reference problem.

The tag clearing algorithm is a garbage collection algorithm based on tracing GC. It is divided into two phases: the first phase is the marking phase, in which GC marks all live objects, and the second phase is to reclaim those unmarked objects that are not live. So how does GC determine which objects are live and which are not?

Objects are connected by references (Pointers) to form a digraph, objects form nodes of the digraph, and reference relations form edges of the digraph. Starting from the root object, the object is traversed along the directed edge. The reachable objects are marked as active objects. The unreachable objects are the inactive objects to be removed. The root object is the global variable, call stack, register.

In the diagram above, we put the little black circles as a global variable, or use it as the root object, from the black circle, 1 can direct object, then it will be marked, object 2, 3 can be indirectly reach will be marked, and 4, and 5 inaccessible, then 1, 2, 3, is active objects, 4 and 5 are active objects by GC.

The tag-clearing algorithm is Python’s assistant garbage collection technique that works with container objects. Iterators, such as list, dict, tuple, etc., are not possible for string or numeric objects to cause circular references. Python uses a bidirectional linked list to organize these container objects.

Python’s crude markup removal algorithm also has an obvious drawback: it must scan the entire heap sequentially before removing inactive objects, even if only a small number of live objects are left.

2.3 Generational recycling

Generational recycling is a space – for – time operation.

Python memory based on survival time of the object is divided into different sets, each set is called a generation, Python memory can be divided into three “generation”, respectively, in the younger generation (0), s (1), (2) the old s, they are the corresponding three list, their frequency of garbage collection and object decreases with the increasing of survival time. New objects are allocated to the young generation. When the total number of lists in the young generation reaches a limit, Python garbage collection is triggered to recycle objects that can be collected, and objects that can’t be collected are moved to the middle age, and so on. Objects in the old age are the ones that live the longest. Or even for the entire life of the system. At the same time, the generation recycling is based on the marker removal technology.

Generational collection also works with container objects as an auxiliary garbage collection technique in Python.

You can check out the Python interview details at gitbook.cn/gitchat/act…

More exciting articles please pay attention to the public account: “Tiancheng Technology Miscellaneous talk”