Garbage Collection is probably one of the most common interview questions, but what about Python’s Garbage Collection? I remember every introduction to Python book saying don’t worry about memory leaks in Python. What’s behind this?

GC algorithms in Python

There are three points: reference counting/mark-sweep/generational recycling

· Reference count (main)

When you first start learning Python, you’ll always be told that everything is an object. At the heart of every object in Python is a PyObject structure with an internal reference counter (OB_refCNt).

 typedef struct_object {
 int ob_refcnt;
 struct_typeobject *ob_type;
} PyObject;
Copy the code
a=10
Copy the code

The reference count means that when an object is hit by a New method, its reference count is 1 because it is referenced by the New method. If it is referenced (i.e., based on the previous example: B = a, was thrown into the function list, and so on are referenced in the reference count on the 1), if the reference when its object is deleted (on the basis of the previous DEL b) then its reference count will reduce a until when its reference count becomes zero, the garbage collection mechanism will seek to do it (recycling), imagine this: open the door check water meter.

Advantages/disadvantages:

Since reference counting is the primary GC method, take a look at the pros and cons.

Best:

Simple, real-time (once zero, no more BB with you, do it)

Missing:

· High maintainability (simple and real-time, but it takes up an extra part of resources. Although logic is simple, it is troublesome. Like when you eat strawberries and wash your hands once, not after.)

· Unsolvable case: –> Circular reference (below) :

A =[1,2] b=[2,3] a. apend (b) b. apend (a) DEL a DEL bCopy the code

If a reference to b is 1, and a reference to b is subtracted by 1, then the reference count is still 1. If it’s always a 1, you get a free pass. This situation cannot be solved by reference counting alone. This also introduces the following theme tag – clear

· Mark-clear: Mark-clear is used to solve the problem of circular references. Reference loops occur only on container objects, such as lists, dictionaries, classes, and tuples. First, to keep track of container objects, we need to maintain two additional Pointers for each container object to form a linked list of container objects, one pointing to the next and the other to facilitate insertion and deletion. Imagine that there are two situations:

A:

A =[1,3] b=[2,4] a. apend (b) b. apend (a) del a del bCopy the code

B:

A =[1,3] b=[2,4] a. apend (b) b. apend (a) del aCopy the code

Okay, let’s get to the point. In the mark-purge algorithm, there are two concentration camps, one is root object and the other is unreachable list.

· For scenario A, the reference counts of A and B are both 2 (init+append=2) before the DEL statement is executed, but after the DEL statement is executed, the reference counts of A and B are reduced by 1. A and B are trapped in a loop of references, and the mark-clear algorithm goes out and does its work. It finds one end, A, and starts undoing the reference ring of A and B (we start with A, because it has a reference to B, and then we subtract b’s reference count by one; Then follow the reference to B, since B has A reference to A, and also subtract A reference to A by 1, thus completing the loop removal between reference objects. , it was found that the circular reference of A and B changed to 0, so a and B were processed in the unreachable list and deleted directly.

· For scenario B, the reference count is still 1 after B takes the ring, but it is 0 after A takes the ring. At this time, A has entered the unreachable list and has been sentenced to death, but at this time, B is in the root list. What kind of justice is there in the world if A is taken away… In the root list, b will be referenced. If a is omitted, then B will be referenced. Cool cool, the first trial completed, the second trial a is not guilty, so was pulled to the root list.

QA: Why these two lists

The reason for splitting into two linked lists is based on the following consideration: Currently, unreachable objects may exist in the root list, which are directly or indirectly referenced. These objects cannot be reclaimed. Once such objects are found in the tagging process, they are moved from the unreachable list to the root list. When the tag is complete, all remaining objects in the Unreachable list are truly garbage objects, and subsequent garbage collection is restricted to the unreachable list.

Generation recycling:

To understand the sorting collection, we must first understand the GC threshold, the so-called threshold is a critical point value. As your program runs, the Python interpreter keeps track of newly created objects, as well as objects that are released because the reference count is zero. In theory, create == release quantity should look like this. However, if there is a cyclic reference, it must be create > free number. When the difference between create and free number reaches the specified threshold, the generation recycle mechanism will come into play.

Garbage collection = garbage detection + release.

The idea of generation recycling divides objects into three generations (generation 0,1,2), with 0 representing young objects,1 representing young objects, and 2 representing old objects. According to the weak generation hypothesis (younger objects are more likely to die, and older objects generally live longer). The new object is put into generation 0, and if it survives a gc garbage collection in generation 0, it is put into generation 1 (it is upgraded). If an object in generation 1 survives a gc garbage collection in generation 1, it is placed in generation 2. Gc.set_threshold (threshold0[,threshold1[,threshold2]]) Sets the threshold for each generation of GC garbage collection. Since the last generation 0 GC, if the number of allocated objects minus the number of freed objects is greater than Threshold0, then objects in generation 0 are checked for GC garbage collection. Objects in generation 1 are checked for GC garbage collection if the number of gc garbage collections in generation 0 since the last generation 1 gc is greater than Threshold1. Similarly, objects in generation 2 are checked for GC garbage collection if the number of gc garbage collections past Generation 1 is greater than Threshold2 since the last generation 2 GC.

This is a simple GC for Python.