Life is too short to talk about romance and recycling.
Those of us who have written C or C++ know that those languages have no garbage collection. Allocating and releasing memory by hand is the programmer's job, and both memory leaks and dangling (wild) pointers are a major headache for developers. That is why memory management is one of the most discussed topics in C development. Other high-level languages, such as Java, C#, and Python, provide garbage collection. This removes much of the complexity of memory management and lets developers focus on the core business logic.
For us Python developers, this is something we rarely have to think about. We don't need to worry about how the garbage generated by a running program is recycled. But this is part of the inner workings of the language. Do we really want to spend our whole careers as nothing more than API callers?
1. What is garbage?
When the Python interpreter executes a statement that defines a variable, it allocates memory to store the variable's value. Memory capacity is limited, so the memory occupied by values that are no longer needed has to be reclaimed.
When an object or variable is no longer useful, it is considered “garbage”. So what kind of variables are useless?
a = 10000
When the interpreter executes the line above, it allocates a block of memory to store the value 10000. At this point, 10000 is referenced by the variable a.
a = 30000
When we change the value of this variable, the interpreter allocates another block of memory to hold 30000, and the variable a now references 30000.
At this point, nothing references our 10000 any more, so we can call it garbage, yet it still occupies the memory that was allocated for it. The interpreter therefore has to reclaim that memory.
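The rebinding described above can be observed with a small sketch. It assumes CPython, where an object is reclaimed as soon as its last reference disappears. The `Value` class is a hypothetical wrapper introduced only for this illustration, because plain integers cannot be weakly referenced.

```python
import weakref


class Value:
    """Hypothetical wrapper so the object can be weakly referenced (plain ints cannot)."""

    def __init__(self, number):
        self.number = number


a = Value(10000)
watcher = weakref.ref(a)              # observes the object without keeping it alive
alive_before = watcher() is not None

a = Value(30000)                      # rebinding a drops the last reference to Value(10000)
alive_after = watcher() is not None

print(alive_before, alive_after)      # on CPython: True False
```

A weak reference is used here because it lets us check whether the object still exists without adding a reference of our own.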
2. Memory leaks and running out of memory
Now we know what "garbage" is while a program runs. What happens if garbage is produced and never dealt with? Imagine a house where nobody ever takes out the trash, so it just keeps piling up.
- Your house is filled with garbage, and there’s a beautiful woman who wants to be your date, but there’s no room for her.
- You can still live there, but garbage takes up a lot of space and wastes a lot of space, and eventually your house will be full of garbage
In a computer, the result is two problems that scare every programmer: running out of memory and memory leaks. At best they slow a program down; at worst they crash it.
Out of memory: when a program requests memory, there is not enough free memory to satisfy the request.
Memory leak: after allocating memory, the program fails to release it once it is no longer needed. The damage from a single leak may be negligible, but leaks accumulate, and eventually all available memory is used up.
3. Reference count
We mentioned earlier that an object becomes garbage when no variable references it any more. So how does the interpreter know whether an object is still referenced?
The answer is reference counting. Python uses an internal reference counting mechanism to count the number of times an object is referenced. When this number becomes zero, the object is not referenced. At this point it becomes “garbage”.
Where is this reference count? Let’s look at the code
text = "hello,world"
What does the above line of code do?
- Create a string object whose value is hello,world.
- Allocate memory: when the object is instantiated, the interpreter allocates memory for it and stores the object's structure in that memory.
Let’s look at the structure of this object
typedef struct _object {
    int ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
If you're familiar with C or C++, this will look very familiar: it is a struct. The official Python interpreter, CPython, is written in C, so objects are represented by C structures under the hood. The definition shown here is simplified; recent CPython versions declare the counter with a wider integer type. Don't worry if you are not familiar with C.
Here, we only need to focus on one field: ob_refcnt
What makes this field special is that it records how many references point to the object. Conceptually, the reference count of the hello,world object above is 1, because only the text variable refers to it.
① Variable initialization assignment:
text = "hello,world"
② Variable reference transfer:
new_text = text
③ Delete the first variable:
del text
④ Delete the second variable:
del new_text
At this point, the "hello,world" object has a reference count of 0 and is treated as garbage. The next step is for it to be reclaimed by the garbage collector.
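The four steps can be traced with a short sketch, assuming CPython's immediate reclamation. `Text` is a hypothetical stand-in class used only for this illustration, since `str` objects do not support weak references and string literals may be shared by the interpreter.

```python
import weakref


class Text:
    """Hypothetical stand-in for the string object (str does not support weak references)."""


events = []

text = Text()                                    # step 1: one reference
weakref.finalize(text, events.append, "freed")   # callback runs when the object is reclaimed

new_text = text                                  # step 2: two references
del text                                         # step 3: one reference left
freed_after_first_del = list(events)

del new_text                                     # step 4: no references left
freed_after_second_del = list(events)

print(freed_after_first_del, freed_after_second_del)   # on CPython: [] ['freed']
```

Deleting the first name does nothing to the object, because `new_text` still refers to it. Only the second `del` removes the last reference.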
4. How does reference counting change
Above we saw what reference counting is. So when does this parameter change?
4.1 When the reference count increases by one
- The object is created
a = "hello,world"
- The object is referenced by another variable (assigned to a variable)
b = a
- The object is placed in a container as an element (e.g. in a list)
items = []
items.append(a)
- The object is passed to a function as an argument
func(a)
4.2 When the reference count decreases by one
- A variable referencing the object is explicitly destroyed
del a
- A variable referencing the object is reassigned to another object
a = "hello, Python"  # a previously referenced "hello,world"
- The object is removed from a container, or the container is destroyed (e.g. the object is removed from the list, or the list is destroyed)
items.remove(a)
del items
- A reference goes out of scope
def func():
    a = "hello,world"
    return

func()  # the local variable a goes out of scope when func returns
4.3 Viewing the Reference Count of an Object
To view the reference count of an object, you can use the getrefcount function provided by the standard library's sys module.
import sys
a = "hello,world"
print(sys.getrefcount(a))
Note: When an object is passed as an argument to getrefcount(), the argument itself creates a temporary reference. Therefore, getrefcount() usually returns one more than you might expect.
5. Garbage collection mechanism
We have already touched on Python's garbage collection mechanism above.
Python implements garbage collection primarily through reference counting: when an object's reference count reaches zero, the object is reclaimed. But reference counting alone has a problem, so Python also uses two further mechanisms, mark-sweep and generational collection.
Python's strategy is reference counting as the primary mechanism, with mark-sweep and generational collection as supplements.
Now that we know about reference counting, what about mark-sweep and generational collection?
5.1 Disadvantages of the reference counting mechanism
Python's default garbage collection mechanism is "reference counting," which was first proposed by George E. Collins in 1960 and is still used by many programming languages more than 50 years later.
Reference counting method: each object maintains an ob_refcnt field, which records how many references currently point to the object. ob_refcnt increases by 1 each time a new reference to the object is created, and decreases by 1 each time a reference to the object goes away. Once the count reaches zero, the object is reclaimed immediately and the memory it occupied is freed.
Disadvantages:
- Extra space is required to maintain reference counts
- Unable to resolve the circular reference problem
What is a circular reference problem? Take a look at the following example
a = {"key": "a"}  # dictionary object a, reference count: 1
b = {"key": "b"}  # dictionary object b, reference count: 1
a["b"] = b        # dictionary object b, reference count: 2
b["a"] = a        # dictionary object a, reference count: 2
del a             # dictionary object a, reference count: 1
del b             # dictionary object b, reference count: 1
Look at the example above: neither object is freed even though both variables have been deleted. The reason is that neither reference count has dropped to zero, and reference counting only releases an object when its count is zero. With reference counting alone this problem cannot be solved: the two objects would never be destroyed, resulting in a memory leak.
So how do you solve this problem? This is where mark-sweep comes in handy. Mark-sweep can handle this case of circular references.
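A minimal sketch of this behaviour, assuming CPython. The `Node` class is hypothetical and stands in for the dictionaries above, because dictionaries cannot be weakly referenced. The automatic collector is switched off for the duration of the demo so that only reference counting is at work until `gc.collect()` is called.

```python
import gc
import weakref


class Node:
    """Hypothetical container object; instances can hold references to each other."""

    def __init__(self, name):
        self.name = name
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                      # keep the automatic collector out of the demo
gc.collect()                      # start from a clean state

a = Node("a")
b = Node("b")
a.other = b                       # a references b
b.other = a                       # b references a, forming a cycle
watcher = weakref.ref(a)

del a
del b
alive_after_del = watcher() is not None      # reference counting alone cannot free the pair

found = gc.collect()                         # the cycle collector finds the unreachable objects
alive_after_collect = watcher() is not None

if was_enabled:
    gc.enable()

print(alive_after_del, found >= 2, alive_after_collect)   # on CPython: True True False
```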
5.2 Mark-sweep strategy
Python uses a mark-sweep strategy to solve the circular reference problem that container objects can produce.
This strategy performs garbage collection in two phases:
- In the marking phase, all objects are traversed, and every object that can be reached is marked as reachable.
- In the sweep phase, the objects are traversed again, and any object not marked as reachable is reclaimed.
Here’s a quick overview of the mark-sweep strategy flow
Reachable (active) object: an object node that can be reached from a root node by following a chain of references
Unreachable (inactive) object: an object node that cannot be reached from any root node by following a chain of references
Process:
- First, starting from the root collection node, all object nodes are traversed along the directed edge
- Mark each object individually as reachable or unreachable
- All nodes are iterated again, and all objects marked unreachable are garbage collected and destroyed.
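The process above can be sketched as a toy model. This is not CPython's actual implementation; the `heap` dictionary, the node names, and the `mark` and `sweep` functions are all hypothetical and exist only to illustrate the two phases on a small object graph.

```python
def mark(roots, references):
    """Mark phase: collect every node reachable from the roots."""
    reachable = set()
    pending = list(roots)
    while pending:
        node = pending.pop()
        if node in reachable:
            continue                              # already visited
        reachable.add(node)
        pending.extend(references.get(node, ()))  # follow outgoing references
    return reachable


def sweep(references, reachable):
    """Sweep phase: drop every node that was not marked reachable."""
    garbage = {node for node in references if node not in reachable}
    for node in garbage:
        del references[node]
    return garbage


# Toy object graph: each key references the nodes in its list.
heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "c": ["d"],   # c and d reference each other, but nothing reachable points to them
    "d": ["c"],
}

reachable = mark(["root"], heap)
garbage = sweep(heap, reachable)

print(sorted(reachable), sorted(garbage))   # ['a', 'b', 'root'] ['c', 'd']
```

Note that `c` and `d` each still have one incoming reference, so reference counting would keep them alive. Reachability from the root is what exposes them as garbage.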
Mark-sweep does not run continuously. It runs periodically, much like a scheduled task that scans from time to time; in CPython the trigger is a set of allocation thresholds, described in section 5.3.
In addition, mark-sweep suspends the entire application while it runs and resumes it once the pass is complete.
5.3 Generational collection strategy
Generational collection builds on mark-sweep, because a mark-sweep pass blocks the program. To reduce application pauses, Python uses a "generational collection" strategy, which improves garbage collection efficiency by trading space for time.
Generational garbage collection technology is a garbage collection mechanism developed in the early 1980s.
To put it simply: the longer an object has survived, the less likely it is to be garbage, and the less often it needs to be examined.
Python divides objects into different sets based on how long they have survived. Each set is called a generation. There are three "generations": young (generation 0), middle (generation 1), and old (generation 2).
So when is a generational collection triggered?
import gc
print(gc.get_threshold())  # (700, 10, 10) by default
# gc.set_threshold(500, 5, 5)  # the thresholds can be changed if needed
- 700: a generation 0 collection is triggered when the number of object allocations minus deallocations exceeds 700
- 10: a generation 1 collection is triggered after 10 generation 0 collections have been performed
- 10: a generation 2 collection is triggered after 10 generation 1 collections have been performed
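One way to watch these thresholds in action is to register a callback in `gc.callbacks`, which the collector calls at the start and end of every collection. The sketch below is only an illustration: the `record` function and the `events` list are hypothetical names, and which generations show up, and how often, depends on the interpreter version and its thresholds.

```python
import gc

events = []


def record(phase, info):
    # The collector calls this with phase "start" or "stop" and an info dictionary.
    if phase == "start":
        events.append(info["generation"])


was_enabled = gc.isenabled()
gc.callbacks.append(record)
gc.enable()
try:
    # Allocate many container objects and keep them alive, so that allocations
    # outnumber deallocations and the generation 0 threshold is crossed repeatedly.
    keep = [[] for _ in range(100_000)]
finally:
    gc.callbacks.remove(record)
    if not was_enabled:
        gc.disable()

print(len(events) > 0)        # at least one automatic collection was observed
print(sorted(set(events)))    # the generations that were collected
```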
5.4 The gc module
- gc.get_count(): gets the current automatic garbage collection counters, returning a tuple of length 3
- gc.get_threshold(): gets the thresholds that control how often automatic garbage collection runs. The default is (700, 10, 10).
- gc.set_threshold(threshold0[, threshold1[, threshold2]]): sets the thresholds that control how often automatic garbage collection runs
- gc.disable(): Python 3 enables GC by default. You can use this function to disable it manually
- gc.collect(): manually runs a garbage collection
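A short sketch that exercises the functions listed above. It saves the current settings and restores them at the end; the threshold values are arbitrary examples, and `gc.isenabled()` and `gc.enable()` are additional functions from the same module used here only to restore the previous state.

```python
import gc

original = gc.get_threshold()      # remember the current thresholds
was_enabled = gc.isenabled()

gc.set_threshold(500, 5, 5)        # lower thresholds mean more frequent collections
changed = gc.get_threshold()

counts = gc.get_count()            # current counters, one per generation

gc.disable()                       # turn automatic collection off
disabled_state = gc.isenabled()

unreachable = gc.collect()         # a manual collection still works while disabled

# Restore the previous settings.
gc.set_threshold(*original)
if was_enabled:
    gc.enable()

print(changed[0], len(counts), disabled_state, unreachable >= 0)   # 500 3 False True
```

Disabling the collector only stops the cycle detector from running automatically. Reference counting keeps working, so objects without cycles are still reclaimed as usual.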
In fact, since we have chosen Python, raw performance is usually not our top priority. I'm sure many Python engineers have not yet run into serious performance problems, because current hardware can compensate. For memory management and garbage collection, Python takes the work off our hands so that we can focus on the business layer, which fits the idea that life is short. Why use Python if I had to be as careful with memory management as in C++? We chose it for its convenience, after all. So go ahead and use it. The sooner you leave work, the better!