Why is Python so slow?

Python is hot right now: it is used in DevOps, data science, Web development, and security. Speed, however, is not one of its strengths.

How fast is Java compared to C, C++, C#, or Python? The answer depends a lot on what kind of application you need to run. There is no perfect performance test, but the Computer Language Benchmarks Game is a good one.

I have been following these language benchmarks for about a decade. Compared with Java, C#, Go, JavaScript, C++, and others, Python is one of the slowest languages. That comparison includes languages with JIT (just-in-time) compilers (C#, Java), languages with AOT (ahead-of-time) compilers (C, C++), and interpreted languages such as JavaScript.

Note: In this article, “Python” refers to the reference implementation of the language, CPython. Other runtimes will also be mentioned in this article.

I would like to answer the following question: if Python takes two to ten times as long to do the same task as other languages, why is it slower and can it be faster?

Here are a few common reasons:

“Because of the GIL (global interpreter lock).”

“Because it’s an interpreted language, not a compiled language.”

“Because it’s dynamically typed.”

Which cause has the biggest impact on performance?

01

“Because of the GIL”

Modern CPUs have multiple cores, and some machines even have multiple processors. To take advantage of all that computing power, the operating system defines a low-level structure called a thread, and a single process (such as Chrome) can spawn multiple threads to execute instructions. Thus, if a process is particularly CPU-intensive, its load can be shared among the cores, which lets most applications complete tasks more quickly.

As I write this article, my Chrome browser has 44 threads open. Note that the threading structure and APIs of POSIX-based operating systems, such as macOS and Linux, differ from those of Windows. The operating system is also responsible for scheduling threads.

If you have never written a multithreaded program, one concept you will need to learn quickly is locking. Unlike in a single-threaded process, in multithreaded programming you have to make sure that when you change a variable in memory, multiple threads are not trying to modify or access the same memory address at the same time.
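As a minimal sketch of the locking idea, the example below has several threads increment one shared counter while holding a `threading.Lock`. The names `counter_lock`, `increment`, and `run` are hypothetical and chosen only for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # hypothetical name; guards `counter`


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread at a time may run this block
            counter += 1


def run(num_threads=4, times=10_000):
    """Reset the counter, run `num_threads` workers, and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run()
print(total)
```

With the lock in place the final count always equals the number of threads multiplied by the number of increments per thread. Without it, the read-modify-write in `counter += 1` is not guaranteed to be atomic across threads.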

CPython allocates memory when a variable is created, and then uses a counter to track how many references to that variable exist. This concept is called reference counting. When the number of references drops to zero, the memory can be released. This way, creating “temporary” variables (for example, inside a for loop) does not bloat the application’s memory use.
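Reference counting can be observed from Python itself. The sketch below assumes CPython and uses `sys.getrefcount` and `weakref.ref`; the class `Temp` and both function names are hypothetical. Absolute counts vary between CPython versions, so the example only looks at how the count changes.

```python
import sys
import weakref


class Temp:
    """Hypothetical throwaway class; instances can be weakly referenced."""


def refcount_demo():
    """Return the reference count before, during, and after a second name exists."""
    obj = Temp()
    before = sys.getrefcount(obj)
    alias = obj                      # a second reference to the same object
    after = sys.getrefcount(obj)
    del alias                        # drop the second reference again
    restored = sys.getrefcount(obj)
    return before, after, restored


def freed_when_unreferenced():
    """Return True if the object is gone as soon as its last reference is deleted."""
    obj = Temp()
    probe = weakref.ref(obj)         # a weak reference does not keep obj alive
    del obj                          # count reaches zero; CPython frees the object
    return probe() is None


before, after, restored = refcount_demo()
print(after - before, restored - before, freed_when_unreferenced())
```

The immediate release in `freed_when_unreferenced` is CPython behavior. Runtimes with a different garbage collector may free the object later.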

The catch is that CPython needs to lock the reference counter when a variable is shared between multiple threads. To do so it uses a “global interpreter lock” that carefully controls thread execution: no matter how many threads there are, the interpreter executes only one operation at a time.

How does this affect the performance of Python applications?

If the application is single-threaded and runs in a single interpreter, the GIL does not affect speed at all. Removing the GIL would not make such code any faster.

But if you want to achieve concurrency with threads inside a single interpreter (one Python process), and the threads are IO-intensive (that is, lots of network or disk input/output), then you get GIL contention like this:



Figure from David Beazley’s “Python GIL visualized” post: dabeaz.blogspot.com/2010/01/pyt…
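The effect of the GIL is easiest to see with CPU-bound work. The sketch below is not a rigorous benchmark, and it assumes a standard CPython build with the GIL enabled. It times a pure-Python countdown run twice in sequence and then in two threads; the function names and the value of `N` are hypothetical.

```python
import threading
import time


def countdown(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL while it executes."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run the countdown twice, one after the other, and return elapsed seconds."""
    start = time.perf_counter()
    countdown(n)
    countdown(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Run the countdown in two threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=countdown, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


N = 200_000  # small enough to finish quickly; raise it for clearer timings
sequential = time_sequential(N)
threaded = time_threaded(N)
print(f"sequential: {sequential:.4f}s  threaded: {threaded:.4f}s")
```

On a GIL build the threaded run is typically no faster than the sequential one, because only one thread can execute Python bytecode at a time. Exact numbers depend on the machine and the Python version.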

If a Web application (such as Django) uses WSGI, each request to the Web application is executed by a separate Python interpreter, so there is only one lock per request. Because the Python interpreter starts slowly, some WSGI implementations support “daemon mode” to keep Python processes running for a long time.

What about other Python runtimes?

PyPy has a GIL, and it is typically more than three times faster than CPython.

Jython has no GIL because Python threads in Jython are represented by Java threads and thus enjoy the benefits of a JVM memory management system.

How does JavaScript handle this problem?

First, all JavaScript engines use mark-and-sweep garbage collection. As mentioned earlier, the need for a GIL comes primarily from CPython’s memory management algorithm.

JavaScript doesn’t have a GIL, but it is also single-threaded, so it doesn’t need one. JavaScript’s event loop and Promise/callback patterns give you asynchronous programming in place of thread-based concurrency. Python can implement a similar pattern through asyncio’s event loop.
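Here is a minimal sketch of the event-loop pattern in Python using `asyncio`. The coroutine `fetch` is hypothetical: it stands in for a network or disk operation and simply sleeps.

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical IO-bound task; `asyncio.sleep` stands in for a network call."""
    await asyncio.sleep(delay)  # yields control to the event loop while "waiting"
    return f"{name} done"


async def main():
    # Both coroutines wait concurrently on a single thread.
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```

Everything here runs on one thread, so the waits overlap without any lock contention: while one coroutine waits, the event loop runs another.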

02

“Because it’s an interpreted language.”

I’ve heard this argument a lot, and I find it oversimplifies how CPython actually works. When you type python myscript.py in a terminal, CPython kicks off a long sequence of operations: reading, lexing, parsing, compiling, interpreting, and executing.

The key point is that the compile step produces a .pyc file: the bytecode is cached in a file under __pycache__/ (Python 3) or in the same directory as the source code (Python 2). This is true not only of the scripts you write, but of all the code you import, including third-party modules.

So most of the time (unless you write code that runs only once), Python is interpreting bytecode and executing it locally. Compare this with Java and C#/.NET:
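The compile-then-interpret pipeline can be inspected from inside Python. The sketch below assumes CPython and uses the built-in `compile`, the `dis` module, and `importlib.util.cache_from_source`. The source string and the file name `myscript.py` are only illustrative.

```python
import dis
import importlib.util

source = "total = sum(range(10))"

# Compile source text to a code object holding bytecode (no file is written here).
code = compile(source, "<example>", "exec")

# Execute the bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)

# List the bytecode instruction names the interpreter loop will execute.
instructions = [ins.opname for ins in dis.get_instructions(code)]

# Where Python 3 would cache the bytecode for a module file of this name.
cache_path = importlib.util.cache_from_source("myscript.py")

print(namespace["total"])
print(instructions)
print(cache_path)
```

The instruction names differ between Python versions, which is one reason cached .pyc file names include a version-specific tag.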

Java compiles source code into an “intermediate language” (bytecode), which the Java virtual machine reads and just-in-time compiles into machine code. .NET’s CIL works the same way: the .NET common language runtime (CLR) uses just-in-time compilation to turn the intermediate code into machine code.

So, given that they all use virtual machines and some form of bytecode, why is Python so much slower in performance tests than Java and C#? The first reason is that .NET and Java are just-in-time (JIT) compiled.

Just-in-time compilation, or JIT, requires an intermediate language so that the code can be split into smaller chunks, or frames. Ahead-of-time (AOT) compilation is when the compiler translates the source code into code the CPU can understand before the program is executed.

JIT by itself does not make execution faster, because it still executes the same bytecode sequences. However, a JIT can make optimizations at run time. A good JIT optimizer finds the parts of your application that execute the most, called “hot spots.” Those bytecodes are then optimized and replaced with more efficient code.

That said, if your application does the same thing over and over again, it can be much faster. Also, don’t forget that Java and C# are statically typed languages, so the optimizer can make more assumptions about the code.

As mentioned earlier, PyPy has a JIT, so it is much faster than CPython.

So why doesn’t CPython use a JIT?

A JIT also has disadvantages. The first is startup time. CPython is already slow to start, and PyPy is two to three times slower to start than CPython. Java virtual machines are also notoriously slow to start. The .NET CLR gets around this by starting at system startup, but that is possible only because the CLR and the operating system come from the same developers.

If you have a Python process that runs for a long time, and the code contains “hot spots” that can be optimized, then a JIT makes a lot of sense.

However, CPython is a general-purpose implementation. If you were developing a command-line program in Python, waiting for a JIT to start every time the CLI was invoked would be painfully slow.

CPython has to serve as many use cases as possible. There was a project to add a JIT to CPython, but it has been stalled for a long time.

If you want the benefits of a JIT and your workload is suited to one, use PyPy.

03

“Because it’s dynamically typed.”

“Statically typed” languages, such as C, C++, Java, C#, and Go, require that the type of a variable be specified when it is defined.

In dynamically typed languages, although there is a concept of type, the type of a variable is dynamic.

a = 1
a = "foo"

In this example, Python creates a second variable with the same name, of type str, and frees the memory occupied by the first instance of a.
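A small sketch makes the rebinding visible: the type belongs to the object, and the name `a` is just a label that can point at any object. The names `first_type` and `second_type` are hypothetical.

```python
a = 1
first_type = type(a).__name__   # the name `a` currently refers to an int object

a = "foo"
second_type = type(a).__name__  # the same name now refers to a str object

print(first_type, second_type)
```

The interpreter cannot know ahead of time what kind of object `a` will hold, so it has to look the type up whenever the name is used.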

Statically typed languages are not designed that way to torture people; they are designed that way because that is how CPUs work. If every operation must eventually be reduced to a simple binary operation, both objects and types have to be converted to low-level data structures.

Python does all this for you; you never see it, and you never need to care.

Not having to define types is not the reason Python is slow. Python is designed so that you can make everything dynamic. You can replace object methods at run time, and you can patch the underlying system calls at run time. Almost anything is possible.
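To make “replace object methods at run time” concrete, here is a minimal sketch. The class `Greeter` and the function `shout` are hypothetical names used only for illustration.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
original = g.greet()


def shout(self):
    """Replacement implementation, installed after instances already exist."""
    return "HELLO"


# Rebind the method on the class at run time; existing instances see the change.
Greeter.greet = shout
patched = g.greet()

print(original, patched)
```

Because any attribute can be swapped like this at any moment, the interpreter cannot safely assume that a later call to `g.greet()` reaches the same function as an earlier one.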

This design makes Python optimization difficult.

To illustrate this point, I used a system call tracing tool for macOS called DTrace. Release builds of CPython do not include DTrace support, so CPython needs to be recompiled. Python 3.6.6 was used in the demo:

unzip v3.6.6.zip
cd v3.6.6
./configure --with-dtrace
make

The resulting python.exe now contains DTrace tracepoints throughout the code. Paul Ross has a great talk on DTrace. With it you can measure function calls, execution time, CPU time, system calls, and so on.

sudo dtrace -s toolkit/<tracer>.d -c '../cpython/python.exe script.py'

The py_callflow tracer displays all of the application’s function calls.

So, does Python’s dynamic typing make Python slower?

Comparing and converting types can be costly. The type is checked every time a variable is read, written, or referenced.

Dynamically typed languages are difficult to optimize. The reason many alternatives to Python are so much faster is that they give up some flexibility in exchange for performance.

Cython, for example, combines C static types with Python to optimize code where the types are known, which can yield an 84-fold performance improvement.
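The sketch below illustrates why the type check cannot be skipped. The hypothetical function `add` compiles to the same bytecode whatever it is called with, so the interpreter has to work out at run time what `+` means for the operands it receives.

```python
import dis


def add(a, b):
    return a + b


# One function, four different meanings of "+", each resolved at run time.
results = [add(1, 2), add(1.5, 2), add("foo", "bar"), add([1], [2])]

# The bytecode holds a single generic binary instruction for "+";
# its exact name depends on the Python version.
opnames = [ins.opname for ins in dis.get_instructions(add)]

print(results)
print(opnames)
```

A statically typed compiler would emit a specific machine instruction for integer addition. Here the decision is deferred until the operands are inspected on each call.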

04

Conclusion

The main reason Python is slow is its dynamic nature and versatility. It can be used to solve a wide variety of problems, but for most of them there are better-optimized and faster alternatives.

But there are many ways to optimize Python applications, such as using async, understanding profiling tools, and using multiple interpreters.
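As a starting point for measuring before optimizing, here is a minimal sketch using the standard library’s `timeit` module to compare two ways of building the same list. Both function names are hypothetical, and the timings will differ from machine to machine.

```python
import timeit


def build_with_loop(n):
    """Build a list of doubled values with an explicit loop."""
    out = []
    for i in range(n):
        out.append(i * 2)
    return out


def build_with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * 2 for i in range(n)]


# Time 200 calls of each variant; timeit returns the total time in seconds.
loop_time = timeit.timeit(lambda: build_with_loop(1_000), number=200)
comp_time = timeit.timeit(lambda: build_with_comprehension(1_000), number=200)

print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

For whole-program analysis, a profiler such as the standard library’s `cProfile` shows where the time goes per function; `timeit` is better suited to comparing small snippets like these.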

Consider using PyPy for applications where startup time is not important and where the code might benefit from JIT.

For performance-critical parts of your code, consider using Cython if the variables are mostly statically typed.

This article is from CDA Data Analyst, a partner of the cloud community.