Memory problems can occur if there are a large number of active objects in memory during program execution, especially if the total amount of available memory is limited. In this article, we’ll discuss ways to shrink objects, drastically reducing the memory required by Python.
For simplicity, let’s take a Python structure that represents points, including x, Y, and Z coordinate values, which can be accessed by name. Dict In small programs, especially in scripts, it’s easy and convenient to use Python’s Dict to represent structural information:
>>> ob = {'x':1.'y':2.'z':3}
>>> x = ob['x']
>>> ob['y'] = y
Copy the code
Dict is more compact and popular because of its implementation in Python 3.6 with an ordered set of keys. But let’s look at how much space dict takes up in content:
>>> print(sys.getsizeof(ob))
240
Copy the code
As shown above, dict takes up a lot of memory, especially if there is a sudden virtual need to create a large number of instances:
Number of instances | Object size |
---|---|
1 000 000 | 240 Mb |
10, 000, 000 | 2.40 Gb |
100 000 000 | 24 Gb |
Some people want to encapsulate everything in a class, preferring to define structures as classes that can be accessed by attribute names:
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
class Point:
#
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z
>>> ob = Point(1.2.3)
>>> x = ob.x
>>> ob.y = y
Copy the code
The structure of class instances is interesting:
field | Size (bits) |
---|---|
PyGC_Head | 24 |
PyObject_HEAD | 16 |
__ weakref__ | 8 |
__ dict__ | 8 |
Total: | 56 |
In the above table, __ Weakref__ is a reference to the list, which is called weak reference to the object. The field __ dict__ is a reference to the instance dictionary of the class, which contains the value of the instance attribute (note that it takes up 8 bytes in the 64-bit reference platform). As of Python 3.3, the keys of dictionaries for all class instances are stored in shared space. This reduces the size of instances in memory:
>>> print(sys.getsizeof(ob), sys.getsizeof(ob.__dict__))
56 112
Copy the code
As a result, a large number of class instances take up less memory than regular dictionaries:
Number of instances | The size of the |
---|---|
1 000 000 | 168 Mb |
10, 000, 000 | 1.68 Gb |
100 000 000 | 16.8 Gb |
As you can see, the instance still takes up a lot of memory because of its large dictionary.
Class instance with __ slots__
To significantly reduce the size of class instances in memory, consider killing __dict__ and __weakref__. To do this, we can use __ slots__ :
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
class Point:
__slots__ = 'x'.'y'.'z'
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z
>>> ob = Point(1.2.3)
>>> print(sys.getsizeof(ob))
64
Copy the code
In this way, objects in memory become significantly smaller:
field | Size (bits) |
---|---|
PyGC_Head | 24 |
PyObject_HEAD | 16 |
x | 8 |
y | 8 |
z | 8 |
A total of: | 64 |
With the use of slots in the class definition, the memory occupied by a large number of instances is significantly reduced:
Number of instances | The size of the |
---|---|
1 000 000 | 64 Mb |
10, 000, 000 | 640 Mb |
100 000 000 | 6.4 Gb |
Currently, this is the main way to reduce the memory footprint of class instances. In memory, the object’s title is followed by references to the object (that is, property values), which can be accessed using special descriptors in the class dictionary:
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
>>> pprint(Point.__dict__)
mappingproxy(
....................................
'x': <member 'x' of 'Point' objects>,
'y': <member 'y' of 'Point' objects>,
'z': <member 'z' of 'Point' objects>})
Copy the code
To automate the process of creating classes with __ slots__, you can use the library namedList (pypi.org/project/nam… The slots__ function creates a class with slots__ :
>>> Point = namedlist('Point', ('x'.'y'.'z'))
Copy the code
There is also a package attrs (pypi.org/project/att… Slots__ can use this package to automatically create classes.
tuples
Python also has a built-in tuple type, which represents unmodifiable data structures. A tuple is a fixed structure or record, but it does not contain field names. You can use the field index to access the fields of a tuple. When a tuple instance is created, the tuple’s fields are associated with the value object at once:
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
>>> ob = (1.2.3)
>>> x = ob[0]
>>> ob[1] = y # ERROR
Copy the code
Tuple instances are very compact:
>>> print(sys.getsizeof(ob))
72
Copy the code
Since the memory tuple also contains the number of fields, it takes up 8 bytes of memory, more than the class with slots:
field | Size (bytes) |
---|---|
PyGC_Head | 24 |
PyObject_HEAD | 16 |
ob_size | 8 |
[0] | 8 |
[1] | 8 |
[2] | 8 |
A total of: | 72 |
Naming a tuple
Because tuples are so widely used, one day you’ll need to access them by name. To meet this requirement, you can use the module collections.namedtuple. The namedTuple function generates this class automatically:
>>> Point = namedtuple('Point', ('x'.'y'.'z'))
Copy the code
The code above creates a subclass of tuples that also defines descriptors that access fields by name. For the example above, access is as follows:
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
class Point(tuple):
#
@property
def _get_x(self):
return self[0]
@property
def _get_y(self):
return self[1]
@property
def _get_z(self):
return self[2]
#
def __new__(cls, x, y, z):
return tuple.__new__(cls, (x, y, z))
Copy the code
All instances of this class take up exactly the same amount of memory as tuples. However, a large number of instances can take up slightly more memory:
Number of instances | The size of the |
---|---|
1 000 000 | 72 Mb |
10, 000, 000 | 720 Mb |
100 000 000 | 7.2 Gb |
Record class: A mutable named tuple without a cyclic GC
Because tuples and their corresponding named tuple classes can produce immutable objects, object values like ob.x cannot be assigned to other values, so modifiable named tuples are sometimes required. Since Python has no built-in type equivalent to a tuple that supports assignment, many ideas have been tried. Here we discuss the recordclass (recordclass, pypi.org/project/rec… StackoverFlow acclaimed (stackoverflow.com/questions/2…
In addition, it can reduce the amount of memory used by objects to similar levels as tuple objects.
The recordClass package introduces the type recordClass.mutableTuple, which is almost equivalent to a tuple, but it supports assignment. It creates subclasses that are almost identical to NamedTuple, but allows assigning new values to attributes (without creating new instances). The recordClass function, like the namedTuple function, creates these classes automatically:
>>> Point = recordclass('Point', ('x'.'y'.'z'))
>>> ob = Point(1.2.3)
Copy the code
The structure of a class instance is also similar to a tuple, but without PyGC_Head:
field | Size (bytes) |
---|---|
PyObject_HEAD | 16 |
ob_size | 8 |
x | 8 |
y | 8 |
z | 8 |
A total of: | 48 |
By default, the recordClass function creates a class that does not participate in the garbage collection mechanism. In general, both NamedTuple and RecordClass can generate classes that represent records or simple data structures (that is, non-recursive structures). Using both correctly in Python does not create circular references. Thus, recordClass generated class instances do not by default contain the PyGC_Head fragment (which is a required field to support the circular garbage collection mechanism, or more precisely, in the PyTypeObject structure that creates the class, The FLAGS field is not set to Py_TPFLAGS_HAVE_GC by default.
A large number of instances take up less memory than a class instance with __ slots__ :
Number of instances | The size of the |
---|---|
1 000 000 | 48 Mb |
10, 000, 000 | 480 Mb |
100 000 000 | 4.8 Gb |
dataobject
The other basic idea behind the recordClass library is that the memory structure uses the same structure as the class instance with slots, but does not participate in the circular garbage collection mechanism. This class can be generated by the recordclass.make_dataclass function:
>>> Point = make_dataclass('Point', ('x'.'y'.'z'))
Copy the code
Classes created this way generate modifiable instances by default. Another way to do this is to inherit from recordClass.dataObject:
class Point(dataobject):
x:int
y:int
z:int
Copy the code
This method creates class instances that do not participate in the circular garbage collection mechanism. The in-memory instance has the same structure as the class with slots, but no PyGC_Head:
field | Size (bytes) |
---|---|
PyObject_HEAD | 16 |
ob_size | 8 |
x | 8 |
y | 8 |
z | 8 |
A total of: | 48 |
>>> ob = Point(1.2.3)
>>> print(sys.getsizeof(ob))
40
Copy the code
If you want to access a field, you need to use a special descriptor to represent the offset from the beginning of the object in the class dictionary:
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
mappingproxy({'__new__': <staticmethod at 0x7f203c4e6be0>,...'x': <recordclass.dataobject.dataslotgetset at 0x7f203c55c690>,
'y': <recordclass.dataobject.dataslotgetset at 0x7f203c55c670>,
'z': <recordclass.dataobject.dataslotgetset at 0x7f203c55c410>})
Copy the code
The amount of memory consumed by a large number of instances is minimal in a CPython implementation:
Number of instances | The size of the |
---|---|
1 000 000 | 40 Mb |
10, 000, 000 | 400 Mb |
100 000 000 | 4.0 Gb |
Cython
There is also a Cython (cython.org/) based scheme. The advantages of the scheme… C language atomic type. Descriptors for access fields can be created in pure Python. Such as:
"Have a problem and no one to answer it? We have created a Python learning QQ group: 857662006 to find like-minded friends and help each other. There are also good video tutorials and PDF e-books in the group. ' ' '
cdef class Python:
cdef public int x, y, z
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z
Copy the code
In this example, the instance takes up less memory:
>>> ob = Point(1.2.3)
>>> print(sys.getsizeof(ob))
32
Copy the code
The memory structure is as follows:
field | Size (bytes) |
---|---|
PyObject_HEAD | 16 |
x | 4 |
y | 4 |
z | 4 |
nycto | 4 |
A total of: | 32 |
The amount of memory consumed by a large number of copies is also small:
Number of instances | The size of the |
---|---|
1 000 000 | 32 Mb |
10, 000, 000 | 320 Mb |
100 000 000 | 3.2 Gb |
However, remember that when accessing from Python code, each access raises a conversion between an int and a Python object.
Numpy
Using multi-dimensional arrays or record arrays with a lot of data can take up a lot of memory. However, in order to effectively manipulate data with pure Python, you should use the functions provided by the Numpy package.
>>> Point = numpy.dtype(('x', numpy.int32), ('y', numpy.int32), ('z', numpy.int32)])
Copy the code
An array of N elements initialized to zero can be created using the following function:
>>> points = numpy.zeros(N, dtype=Point)
Copy the code
Memory footprint is minimal:
Number of instances | The size of the |
---|---|
1 000 000 | 12 Mb |
10, 000, 000 | 120 Mb |
100 000 000 | 1.2 Gb |
In general, accessing array elements and rows causes conversions between Python objects and C int values. If you get a row of results from the generated array that contains one element, the memory is less compact:
>>> sys.getsizeof(points[0])
68
Copy the code
Therefore, as mentioned above, you need to use the functions provided by the Numpy package to handle arrays in Python code.
In conclusion, we used a straightforward example to demonstrate that developers and users of the Python language (CPython) community can actually reduce the amount of memory that objects consume.