Python is inefficient!
Python takes up memory!
Python sucks!
As one of the most popular programming languages of recent years, Python has attracted as much controversy as praise.
Python developers and non-Python developers alike have grown used to blaming Python for their problems, as if that were an objective verdict.
“Why is this project taking so long?”
“Python.”
I would say that Python is taking the blame for a lot of developers.
Admittedly, Python cannot match compiled languages such as C/C++ and Java in memory usage or execution speed. But it is not as bad as most people make it out to be.
Some readers may be wondering: is it really worth this much effort to save a little memory?
It is!
Think about everyday spending. You may feel that 5 or 10 yuan at a time is insignificant, but when you add it up at the end of the month, you find that you have spent thousands of yuan.
The same goes for development. Maybe only 50 bytes are saved per instance, but what about 100,000,000 instances? That saves about 5 gigabytes of memory!
Here’s a step-by-step guide to saving memory in Python that will take you from bronze to king in just one article!
The dictionary
Dictionaries are a double-edged sword in Python.
They play a much bigger role in Python than they do in Java or C++, and they are popular because they are easy to create, delete, modify, and read.
Judging by my colleagues’ code and the open source code I have read on GitHub, dictionaries are used very heavily in Python.
However, many people forget that dictionaries have an obvious disadvantage: they consume a lot of memory.
Even many programmers who have worked with Python for years are unaware of the problem and have never looked for alternatives.
Let’s look at an example:
>>> import sys
>>> exm = {"x": 1, "y": 2, "z": 3}
>>> print(sys.getsizeof(exm))
240
Used directly, the dictionary occupies 240 bytes of memory. (Exact figures vary with the Python version and platform.)
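A rough sketch of where those bytes go: sys.getsizeof reports only the dictionary’s own hash table, not the keys and values it refers to, and that table is allocated in steps rather than one key at a time. The loop below is illustrative only (the key names k0, k1, ... are made up for the example), and the exact numbers depend on the Python version.

```python
import sys

# Shallow size of dictionaries holding 0 to 8 keys.
# getsizeof counts the dict's own table, not the objects stored in it.
sizes = {n: sys.getsizeof({f"k{i}": i for i in range(n)}) for n in range(9)}

for n, size in sizes.items():
    print(f"{n} keys -> {size} bytes")
```

The size typically stays flat for a few keys and then jumps when the table is resized, so even a three-key dictionary pays for space it does not use.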
class
Many developers are so used to Python dictionaries that they have never thought about the 240 bytes mentioned above.
So, let’s store the data in a class instead. This approach is common in open source Java and C++ code.
Let’s use a class to implement a structure equivalent to the dictionary above:
class Shape:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

exm = Shape(1, 2, 3)
In this case, exm.x plays the same role as exm["x"] in the dictionary and gives access to the corresponding data.
Now let’s see how much memory it takes up:
>>> print(sys.getsizeof(exm) + sys.getsizeof(exm.__dict__))
168
With the class implementation, one instance saves 72 bytes of memory compared with the dictionary!
The dictionary approach takes up about 1.4 times as much memory as the class approach.
The footprint is still heavy because a Python class stores its instance attributes in __dict__. Of the 168 bytes, 56 come from the exm instance and 112 from its __dict__.
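The two-part measurement above can be wrapped in a small helper. This is a minimal sketch, not a complete deep-size function: instance_size is a name made up for this example, and it adds only the per-instance __dict__, not the objects the attributes point to.

```python
import sys


def instance_size(obj):
    """Shallow size of obj plus its per-instance __dict__, if it has one."""
    size = sys.getsizeof(obj)
    attrs = getattr(obj, "__dict__", None)
    if attrs is not None:
        size += sys.getsizeof(attrs)
    return size


class Shape:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


exm = Shape(1, 2, 3)
print(instance_size(exm))        # instance plus its __dict__
print(instance_size((1, 2, 3)))  # a tuple has no __dict__, so only its own size
```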
namedtuple
Besides dictionaries, tuples are another common data structure in Python.
Tuples can store data and are accessed by index instead of by key, but they carry no key information. I think dictionaries are used so often largely because key-value pairs make the data easier to access.
In fact, Python’s standard collections module provides namedtuple, which can stand in for a dictionary.
from collections import namedtuple
Shape = namedtuple('Shape', ['x', 'y', 'z'])
exm = Shape(1, 2, 3)
In this case, we can access the data with exm.x. Let’s look at its memory usage.
>>> print(sys.getsizeof(exm))
72
namedtuple saves 96 bytes compared with the class implementation, and 168 bytes compared with the dictionary.
The dictionary approach takes up about 3.3 times as much memory as namedtuple.
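One difference is worth knowing before swapping a dictionary for a namedtuple: the instances are immutable. The short sketch below, using only the standard library, shows access by name and by index, what happens on assignment, and the _replace and _asdict helpers. The variable name moved is made up for the example.

```python
from collections import namedtuple

Shape = namedtuple("Shape", ["x", "y", "z"])
exm = Shape(1, 2, 3)

# Fields can be read by name or by index.
print(exm.x, exm[0])

# Instances are immutable: assigning to a field raises AttributeError.
try:
    exm.x = 10
except AttributeError as err:
    print("cannot assign:", err)

# _replace builds a new instance with some fields changed.
moved = exm._replace(x=10)
print(moved)

# _asdict converts back to a mapping when key-value access is needed.
print(exm._asdict())
```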
slots
__slots__ is a relatively common memory optimization in Python. Compared with the plain class implementation, __slots__ explicitly declares which attributes an instance may have, so the instance no longer needs a __dict__ to store their values. As a result, it uses less memory than the plain class implementation.
class Shape:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

exm = Shape(1, 2, 3)
Now let’s look at its memory footprint,
>>> print(sys.getsizeof(exm))
64
Compared with the dictionary, it saves a whopping 176 bytes.
The dictionary approach has about 3.8 times the memory footprint of the __slots__ class.
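The trade-off for that saving is flexibility. A small sketch of the behavior: declared attributes can still be reassigned, but the instance has no __dict__, so an attribute that is not listed in __slots__ is rejected. The color attribute is hypothetical and exists only to trigger the error.

```python
class Shape:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


exm = Shape(1, 2, 3)
exm.x = 10  # declared attributes can still be reassigned

# There is no per-instance __dict__ any more.
print(hasattr(exm, "__dict__"))

try:
    exm.color = "red"  # hypothetical attribute, not listed in __slots__
except AttributeError as err:
    print("rejected:", err)
```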
recordclass
The previous methods all rely on Python built-ins or standard modules. Here is a way to optimize memory with a third-party package.
The recordclass package introduces the recordclass.mutabletuple type, which is almost identical to tuple but supports assignment. On top of it, the package creates subclasses that are almost identical to namedtuples but allow new values to be assigned to fields (without creating new instances).
from recordclass import recordclass
Shape = recordclass('Shape', ('x', 'y', 'z'))
exm = Shape(1, 2, 3)
print(exm.x)
Let’s take a look at its memory footprint,
>>> print(sys.getsizeof(exm))
48
Compared with the dictionary, it saves 192 bytes.
The dictionary approach has five times the memory footprint of recordclass.
Dataclass
recordclass also provides another, even more memory-efficient solution.
It uses the same in-memory storage layout as class instances with __slots__, but its instances do not participate in cyclic garbage collection.
from recordclass import make_dataclass
Shape = make_dataclass('Shape', ('x', 'y', 'z'))
exm = Shape(1, 2, 3)
print(exm.x)
So let’s look at the memory footprint,
>>> print(sys.getsizeof(exm))
40
Compared with the dictionary, it saves 200 bytes.
The dictionary approach takes up 6 times as much memory as the make_dataclass class.
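The garbage collection remark hints at where the extra saving may come from. For objects managed by the cyclic garbage collector, sys.getsizeof adds the collector’s per-object overhead on top of what __sizeof__ returns. The sketch below uses only the standard library and an ordinary __slots__ class (not recordclass) to make that overhead visible. Attributing the gap between 64 and 40 bytes to this overhead is an assumption, not something measured here.

```python
import gc
import sys


class Shape:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


exm = Shape(1, 2, 3)

own_size = exm.__sizeof__()    # the object itself
reported = sys.getsizeof(exm)  # adds GC overhead for GC-managed objects

print("tracked by the GC:", gc.is_tracked(exm))
print("object:", own_size, "bytes; reported:", reported, "bytes")
print("GC overhead:", reported - own_size, "bytes")
```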
Cython
Finally, there is Cython.
Cython is Python with C data types: arguments and variables can be declared as C types, which reduces memory usage even further.
However, the Cython approach is a bit more complicated than the previous ones. First you write the core logic in a .pyx file, then you compile it into a .so or .pyd file. Finally, you import the compiled module from another Python file.
Let’s look at an example.
# shape.pyx
cdef class Shape:
    # Attributes are stored as C ints; "public" makes them accessible from Python
    cdef public int x, y, z

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
Then write the build file:
# setup.py
# setuptools is used here because distutils was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("shape.pyx")
)
Execute setup.py from the command line to compile the file,
$ python setup.py build_ext --inplace
A shape.pyd file is generated on Windows, and a shape.so file on Linux and macOS (the actual file name usually also carries a platform tag). This is only a brief introduction; the details of the build are not the point here.
Then use the compiled module.
from shape import Shape

exm = Shape(1, 2, 3)
print(exm.x)
Let’s take a look at its memory footprint,
>>> print(sys.getsizeof(exm))
32
Compared with the dictionary, it saves 208 bytes.
The dictionary approach has 7.5 times the memory footprint of the Cython class.
A journey of a thousand miles is made of small steps.
The same is true in software development. We tend to think that 208 bytes are tiny and irrelevant given the memory of today’s computers, and in some companies or departments with deep pockets it is easy to provide developers with a whole cluster. But memory always has a ceiling and cannot be used without restraint. Many a little makes a mickle: in a large project with a large amount of data, optimizations like these add up to very considerable savings.
In this example, many Python developers are used to dictionaries as data structures and do not think about memory footprint. Yet by using Cython, the memory of one instance can be reduced from 240 bytes to 32 bytes, cutting the footprint by 86.7%!
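To see how the per-instance differences add up, here is a rough sketch that keeps many objects alive and measures them with the standard tracemalloc module. It covers only the standard-library options; recordclass and Cython are left out so the script runs without extra packages. The names PlainShape, SlotsShape, TupleShape and bytes_per_instance are made up for this example, and the results include a few bytes of list overhead per object, so they will not match the sys.getsizeof figures exactly.

```python
import tracemalloc
from collections import namedtuple


class PlainShape:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class SlotsShape:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


TupleShape = namedtuple("TupleShape", ["x", "y", "z"])


def bytes_per_instance(factory, count=50_000):
    """Average traced memory per object while `count` objects are kept alive."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    items = [factory() for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del items
    return (after - before) / count


results = {
    "dict": bytes_per_instance(lambda: {"x": 1, "y": 2, "z": 3}),
    "class": bytes_per_instance(lambda: PlainShape(1, 2, 3)),
    "namedtuple": bytes_per_instance(lambda: TupleShape(1, 2, 3)),
    "__slots__": bytes_per_instance(lambda: SlotsShape(1, 2, 3)),
}

for name, size in results.items():
    print(f"{name:>10}: {size:.1f} bytes per instance")
```

Multiplying the printed differences by the number of objects a real project holds gives a quick estimate of what switching data structures would save.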