Selected from Python Files

Compiled by Heart of the Machine

Contributors: Panda

The Python Files blog has published a series of articles entitled “Hunting Performance in Python Code” that explore ways to improve the performance of Python code. In these articles, the author describes several tools and profilers available for Python code, and how they can help you find bottlenecks in the front end (Python scripts) and/or the back end (the Python interpreter). Heart of the Machine has compiled and edited this series into one long, in-depth article. The code for this article has been posted on GitHub.

Code address: github.com/apatrascu/h…


Part one covered everything from environment setup to memory profiling. This second part of the series focuses on Python scripts and the Python interpreter. We’ll start by looking at how to track the CPU usage of Python scripts, focusing on cProfile, line_profiler, pprofile, and vprof. The latter half covers tools and methods for analyzing the interpreter’s performance while it runs Python scripts, focusing on CPython and PyPy, among others.


CPU analysis — Python scripts

In this section, I’ll introduce some tools that help us analyze CPU usage in Python.

CPU performance profiling means analyzing how our code performs by observing how the CPU executes it. That means finding the hot spots in our code and seeing what we can do about them.

Next we’ll look at how you can track the CPU usage of your Python scripts. We will focus on the following profilers:

  • cProfile
  • line_profiler
  • pprofile
  • vprof


Measuring CPU usage

In this section, I will use the same script as in the previous section; you can also view it on GitHub: gist.github.com/apatrascu/8…

Also, remember that on PyPy2 you need to use a version of pip that supports it:
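The original command block did not survive compilation. One common way to bootstrap a compatible pip on PyPy2 (an assumption, not the article’s exact command) is through the bundled ensurepip module:

```shell
# PyPy ships an ensurepip module that installs a pip matching the interpreter
pypy -m ensurepip
pypy -m pip --version   # verify pip is now usable
```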

Other things can be installed with the following command:
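The install command itself is missing here; judging from the tools covered below, it would be something like (package list assumed):

```shell
# Install the third-party profilers discussed in this section; cProfile is built in
pip install line_profiler pprofile vprof gprof2dot
```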

cProfile

One of the most commonly used tools for CPU profiling is cProfile, mainly because it is built into CPython2 and PyPy2. It is a deterministic profiler, which means it collects a set of statistics while running our workload, such as how many times each part of the code executes and how long it takes. In addition, cProfile adds less overhead than the other built-in profilers (profile or hotshot).

When using CPython2, its usage is fairly simple:
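The invocation itself was lost in compilation; it would look something like this (03.primes-v1.py is the script name used later in this article):

```shell
# Run the script under cProfile; -s cumulative sorts the report
# by cumulative time spent in each function
python -m cProfile -s cumulative 03.primes-v1.py
```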

If you’re using PyPy2:
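The same module is invoked through the PyPy2 binary (a sketch; the original command block is missing):

```shell
# Same cProfile flags, but run under the PyPy2 interpreter
pypy -m cProfile -s cumulative 03.primes-v1.py
```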

Its output is as follows:

Even with this text output, we can see directly that most of the time our script is calling the list.append method.

If we use gprof2dot, we can view cProfile’s output graphically. To use this tool, we first have to install Graphviz. On Ubuntu, you can use the following commands:
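The original install commands are not preserved; on Ubuntu they would typically be (package names assumed):

```shell
# Graphviz provides the dot renderer; gprof2dot converts profiler output to dot graphs
sudo apt-get install graphviz
pip install gprof2dot
```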

Run our script again:
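A sketch of the usual gprof2dot pipeline (the intermediate file name output.pstats is illustrative; output.png matches the file mentioned below):

```shell
# Dump binary profiling stats to a file, then turn them into a call-graph PNG
python -m cProfile -o output.pstats 03.primes-v1.py
gprof2dot -f pstats output.pstats | dot -Tpng -o output.png
```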

We should then get the following output.png file:

That makes things much easier to read. Let’s take a closer look at the output. You can see the call graph of the script. In each box, you can read it line by line:

  • Line 1: Python filename, line number, and method name
  • Line 2: the percentage of global time spent in this box
  • Line 3: in parentheses, the percentage of global time spent in the method’s own code
  • Line 4: the number of calls

For example, in the third red box from the top, the primes method accounts for 98.28% of the total time, spends 65.44% of the time in its own code, and is called 40 times. The rest of the time is spent in Python’s list.append (22.33%) and range (11.51%) methods.

This is a simple script, so we just need to rewrite it so that it doesn’t rely on so many append calls, resulting in the following:
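The article’s gists are truncated, so the listing below is a hypothetical reconstruction of the idea rather than the author’s exact code: v1 grows its result with repeated list.append calls, while v2 builds a sieve with slice assignment and a single comprehension.

```python
def primes_v1(n):
    """Naive version: the two append-heavy loops dominate the cProfile output."""
    result = []
    for candidate in range(2, n + 1):
        is_prime = True
        for divisor in range(2, int(candidate ** 0.5) + 1):
            if candidate % divisor == 0:
                is_prime = False
                break
        if is_prime:
            result.append(candidate)
    return result


def primes_v2(n):
    """Optimized version: a sieve built without per-element appends."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Mark every multiple of i in one slice assignment
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, flag in enumerate(sieve) if flag]


if __name__ == "__main__":
    assert primes_v1(100) == primes_v2(100)
```

Both functions return the same list; the difference is purely in how many interpreter-level list operations they perform.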

The following tests the runtime of the script before and after the change, on CPython2:
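The timing commands did not survive compilation; a simple way to compare the two versions is (03.primes-v2.py is an assumed name for the optimized script):

```shell
time python 03.primes-v1.py
time python 03.primes-v2.py
```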

Measurement with PyPy2:
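And the corresponding commands on PyPy2 (same assumed file names):

```shell
time pypy 03.primes-v1.py
time pypy 03.primes-v2.py
```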

We got a 2.4-fold improvement on CPython2 and a 3.1-fold improvement on PyPy2. Very good. Here is the cProfile call graph of the optimized version:

You can also use cProfile programmatically:
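The article’s snippet is not preserved here, but a minimal sketch of programmatic use looks like this (Python 3 syntax; the primes workload is illustrative, not the article’s exact script):

```python
import cProfile
import io
import pstats


def primes(n):
    """Toy workload standing in for the article's script."""
    result = []
    for candidate in range(2, n + 1):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            result.append(candidate)
    return result


profiler = cProfile.Profile()
profiler.enable()
primes(10000)
profiler.disable()

# Collect the stats, sort by cumulative time, and print the five hottest entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Because you control exactly when the profiler is enabled and disabled, this style is convenient for measuring one worker inside a larger (for example, multi-process) program.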

This is useful in scenarios such as multi-process performance measurement. For more details, see: docs.python.org/2/library/p…


line_profiler

This profiler provides line-by-line load information. It is implemented in C via Cython in order to keep its computational overhead down.

The source code is available on GitHub: github.com/rkern/line_… , and its PyPI page is: pypi.python.org/pypi/line_p… . Compared to cProfile, it still has considerable overhead: getting a profile takes 12 times longer.

To use this tool, you first need to install it via pip: pip install Cython ipython==5.4.1 line_profiler (CPython2). A major disadvantage of this profiler is that it does not support PyPy.

Just as with memory_profiler, you need to add a decorator to the function you want to profile. In our example, you need to add @profile before the definition of the primes function in 03.primes-v1.py. Then call it like this:
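The invocation is missing from the compiled text; with line_profiler installed it would look like this:

```shell
# kernprof injects the @profile decorator into builtins, runs the script with
# line-by-line tracing (-l), and prints the timings immediately (-v)
kernprof -l -v 03.primes-v1.py
```

If you also want the decorated script to keep working when run directly (outside kernprof), a common trick is to install a no-op profile stub into builtins when the name is absent.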

You get an output that looks like this:

We can see that the two loops that repeatedly call list.append take up most of the script’s time.


pprofile

Address: github.com/vpelletier/…

According to the author, pprofile is a “line-granularity, thread-aware deterministic and statistic pure-python profiler.”

It’s inspired by line_profiler and addresses many of its shortcomings, and because it’s written entirely in Python, it can also be used with PyPy. Compared with cProfile, profiling takes 28 times longer on CPython and 10 times longer on PyPy, but the level of detail is much more granular.

In addition, it supports profiling individual threads, which is useful in many situations.

To use this tool, you first need to install it via pip: pip install pprofile (CPython2) / pypy -m pip install pprofile (PyPy), and then call it like this:
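A sketch of the invocation (the original command block is missing; pprofile’s default mode is the deterministic one):

```shell
# Deterministic line-by-line profiling (high overhead, exact per-line counts)
pprofile 03.primes-v1.py
# The same thing on PyPy
pypy -m pprofile 03.primes-v1.py
```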

The output differs from that of the previous tool as follows:

We can now see finer details. Let’s examine the output a bit. This is the full output for the script; for each line you can see the number of calls, the time it took to run in seconds, the time per call, and the percentage of global time. In addition, pprofile adds extra lines to the output (such as lines 44 and 50, prefixed with (call)), which are cumulative metrics.

Also, we can see that there are two loops that call list.append repeatedly, taking up most of the script’s time.


vprof

Address: github.com/nvdv/vprof

Vprof is a Python profiler that provides rich interactive visualizations of various Python program features, such as runtime and memory usage. This is a graphical tool, based on Node.js, that displays results on a web page.

With this tool, you can view one or more of the following for the relevant Python script:

  • CPU flame graph
  • Code profiling
  • Memory graph
  • Code heat map

To use this tool, you first need to install it via pip: pip install vprof (CPython2) / pypy -m pip install vprof (PyPy), and then call it like this:

On CPython2, to display the code heat map (called on the first line below) and code profiling (called on the second line below):
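The commands themselves are missing; using vprof’s mode flags (h for heat map, p for profiler; flag letters assumed from vprof’s documentation):

```shell
vprof -c h 03.primes-v1.py   # code heat map
vprof -c p 03.primes-v1.py   # code profiling
```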

On PyPy, to display the code heat map (called on the first line below) and code profiling (called on the second line below):
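The same two modes through the PyPy binary (a sketch; flag letters assumed as above):

```shell
pypy -m vprof -c h 03.primes-v1.py   # code heat map
pypy -m vprof -c p 03.primes-v1.py   # code profiling
```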

In both examples above, you will see the following code heat map:

And the following code analysis:

The results are presented graphically, and you can hover over or click on each line to see more information. Also, we can see that there are two loops that repeatedly call the list.append method, taking up most of the script’s time.


CPU analysis – Python interpreter

In this section, I’ll look at some of the tools and methods you can use to analyze the interpreter’s performance while running Python scripts.

As mentioned in the previous sections, CPU profiling means the same thing here, but now we are not targeting the Python script. Instead, we want to know how the Python interpreter works and where it spends the most time while running our Python script.

Next we’ll see how you can track CPU usage and find hot spots in the interpreter.


Measure CPU usage

The script used in this section is basically the same as the one used in the memory and CPU usage analysis above; you can also refer to the code here: gist.github.com/apatrascu/4…

For the optimized version, see below or visit: gist.github.com/apatrascu/e…


CPython

CPython is written entirely in C, which makes it much easier to measure and/or profile its performance. The CPython source is hosted on GitHub: github.com/python/cpyt… . By default, you’ll see the latest branch, version 3.7+ at the time of this writing, but branches all the way back to version 2.7 are also available.

In this article, we focus on CPython 2, but the same steps apply equally well to the latest version 3.


1. Code Coverage Tool

The easiest way to see what part of C code is running is to use the code coverage tool.

First let’s clone this codebase:
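The clone commands are missing from the compiled text; checking out the 2.7 branch matches the version this article targets:

```shell
git clone https://github.com/python/cpython.git
cd cpython
git checkout 2.7
```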

Copy the script into that directory and run the following commands:
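A sketch of the three commands described below, assuming the Makefile coverage targets available in CPython checkouts (and that ./configure has been run):

```shell
make coverage               # compile the interpreter with GCOV instrumentation
./python 03.primes-v1.py    # run the workload; counters are written to .gcda files
make coverage-lcov          # parse the data and emit HTML under lcov-report/
```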

The first command compiles the interpreter with GCOV support (gcc.gnu.org/onlinedocs/…), the second runs the workload and collects the profiling data in .gcda files, and the third parses the files containing the profiling data and creates some HTML files in a folder called lcov-report.

If we open index.html in a browser, we can see which parts of the interpreter’s source code were executed in order to run our Python script. You’ll see something like the following:

At the top level, we can see each directory that makes up the source code and how much of its code was covered. For example, let’s open the listobject.c coverage page from the Objects directory. Although we won’t read the whole file, we will analyze parts of it. Look at the section below.

How do you read this report? In the yellow column, you can see the line numbers of the C file. The next column shows the number of times a particular line was executed. The right-hand column is the actual C source code.

In this example, the listiter_next method is called 60 million times.

How did we find this function? If we take a closer look at our Python script, we can see that it makes heavy use of list iteration and append. (This is also another place where we could start optimizing the script.)

Let’s move on to some other specialized tools. On Linux systems, if we want more information, we can use perf. The official documentation is at: perf.wiki.kernel.org/index.php/M…

We rebuilt the CPython interpreter using the following commands. You should download the Python script into the same directory. Also, make sure your system has perf installed.
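The build commands are not preserved; a plain build is enough, and keeping frame pointers (an optional tweak, not necessarily the author’s flags) makes perf’s call stacks easier to unwind:

```shell
./configure CFLAGS="-fno-omit-frame-pointer"
make -j "$(nproc)"
```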

Run perf as follows. For more ways to use perf, see this page by Brendan Gregg: www.brendangregg.com/perf.html
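A sketch of the recording step (the original command is missing):

```shell
# -g records call-graph (stack) information along with the CPU samples
sudo perf record -g ./python 03.primes-v1.py
```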

After running the script, you should see the following:

To see the results, run sudo perf report to get the metrics.

Only the most relevant calls are retained. In the screenshot above, we can see that the function that takes the most time is PyEval_EvalFrameEx. This is the main interpreter loop, which we don’t care about in this case. We are interested in the next most time-consuming function, listiter_next, which takes 10.70% of the time.

After running the optimized version, we can see the following results:

After our optimization, the share of time spent in the listiter_next function dropped to 2.11%. Readers can also explore further optimizations to the interpreter.


2. Valgrind/Callgrind

Another tool that can be used to find bottlenecks is Valgrind, via its plug-in Callgrind. For more details, see: valgrind.org/docs/manual…

We rebuilt the CPython interpreter using the following commands. You should download the Python script into the same directory. Also, make sure Valgrind is installed on your system.
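The build commands are missing; rebuilding without pymalloc is the usual preparation for Valgrind (a sketch, not necessarily the author’s exact flags):

```shell
# --without-pymalloc disables CPython's small-object allocator so that
# Valgrind can observe every allocation individually
./configure --without-pymalloc
make -j "$(nproc)"
```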

Run Valgrind as follows:
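A sketch of the invocation (the original command block is missing):

```shell
# Results are written to a callgrind.out.<pid> file in the current directory
valgrind --tool=callgrind ./python 03.primes-v1.py
```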

The results are as follows:

We can use KCachegrind to visualize the results: kcachegrind.sourceforge.net/html/Home.h…

PyPy

On PyPy, the number of profilers that can be used successfully is very limited. The PyPy developers maintain a tool for this purpose, vmprof: vmprof.readthedocs.io/en/latest/

First, you need to download PyPy: pypy.org/download.ht… After that, enable pip support for it.

Installing vmprof is as simple as running the following:
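The install one-liner is missing; through PyPy’s pip it would be:

```shell
pypy -m pip install vmprof
```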

Run the workload as follows:
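A sketch of the run command, consistent with the vmprof.com link mentioned below:

```shell
# --web uploads the profile and prints a vmprof.com link to the console
pypy -m vmprof --web 03.primes-v1.py
```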

Then open the link that appears in the console in your browser (the link that begins with vmprof.com/#/).


Original links:

Pythonfiles.wordpress.com/2017/06/01/…

Pythonfiles.wordpress.com/2017/08/24/…