GitHub address of the blog: Github.com/liuyan731/b…


We recently launched two machine learning Python services, but their online performance was poor and the CPU load was very high. It turned out that the installed TensorFlow build did not use the AVX instruction set. This post shares how I debugged the problem and how it was solved.

Warning: the AVX instruction set is not used

We noticed that the online TensorFlow Python service consumed a lot of CPU and performed poorly. To find the cause, I used the Timeline tool on the test server to check how long each operation in a prediction takes, as shown in the figure below:

The whole computation took far too long (30 ms). On my local machine, which has GPU acceleration (the server only serves requests and has no GPU), the same computation took about 1 ms, as shown below.

The op names in the two timelines also differ: they appear as _MklConv2DWithBias and _MklRelu on the test server, and as Conv2D and Relu on the local machine.
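A comparison like this can also be done without the trace viewer. The sketch below sums the time per op name from a trace file written by the Timeline tool. It assumes the file follows the Chrome trace event layout (a top-level `traceEvents` list whose complete events have `ph == "X"`, a `name`, and a `dur` in microseconds); the function name `op_time_ms` is hypothetical.

```python
import json
from collections import defaultdict


def op_time_ms(trace_json):
    """Sum the duration of complete events per event name, in milliseconds.

    Assumes the Chrome trace event layout: {"traceEvents": [...]} where
    complete events carry ph == "X" and a "dur" field in microseconds.
    """
    events = json.loads(trace_json)["traceEvents"]
    totals = defaultdict(float)
    for event in events:
        # Skip metadata events (process names, thread names, and so on).
        if event.get("ph") != "X":
            continue
        totals[event["name"]] += event.get("dur", 0) / 1000.0
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python op_time.py timeline.json
    with open(sys.argv[1]) as f:
        per_op = op_time_ms(f.read())
    # Print the slowest ops first.
    for name, ms in sorted(per_op.items(), key=lambda kv: -kv[1]):
        print(name, round(ms, 3), "ms")
```

Running it on the traces from both machines would show which op names appear on each side and where the time goes.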

When TensorFlow runs, it prints warnings that Intel AVX and SSE instructions are not being used:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.

The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.

The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
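To confirm which of these instruction sets the server CPU really offers, one option on Linux is to read the `flags` line of `/proc/cpuinfo`. The following is a minimal sketch under that assumption; the helper name `simd_flags` and the list of flags checked are my own choices, not something TensorFlow provides.

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux.
WANTED = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def simd_flags(cpuinfo_text, wanted=WANTED):
    """Return {flag: bool} based on the first 'flags' line in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags\t\t: fpu vme ... avx avx2 ...".
            flags = set(line.split(":", 1)[1].split())
            return {name: name in flags for name in wanted}
    # No flags line found (for example, a non-x86 machine).
    return {name: False for name in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(simd_flags(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check is Linux only")
```

If a flag is reported as available but TensorFlow still warns about it, the installed binary was built without that instruction set.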

Because these are only warnings and do not stop the program from running, I had not paid attention to them. When I searched online for a solution, many of the "suggestions" were simply to ignore the warning or turn it off.

Since the AVX/SSE instruction sets might improve performance, I considered recompiling and reinstalling TensorFlow. For the build steps, see: stackoverflow.com/questions/4… The process is somewhat complicated and requires bazel.

However, I never actually compiled and installed it (orz), because we found that the TensorFlow 1.6.0 release is already precompiled with the AVX instruction set, as shown below:

So why was there still a warning?

Does TensorFlow installed by pip differ from TensorFlow installed by conda?

After talking with a teammate, I learned that the TensorFlow 1.6.0 he used did not show the AVX warning. On further questioning, it turned out that he had installed it with pip (Aliyun mirror), while I had installed it with conda (Tsinghua mirror).

Suspecting that the conda and pip packages might differ, I uninstalled the conda-installed TensorFlow and reinstalled it with pip. The AVX warning disappeared.

I re-ran the service and printed the timeline, as shown below:

The overall time dropped to 2 ms: performance improved greatly and CPU load fell sharply! At this point the performance is at a reasonable level.

So, at least for TensorFlow 1.6.0, the package installed by pip is not the same build as the one installed by conda.
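When several people share an environment, it helps to check which tool installed a given package. The sketch below reads the `INSTALLER` record from the package metadata using `importlib.metadata` (Python 3.8 or newer, so newer than the interpreter typically paired with TensorFlow 1.6.0). It assumes the installer wrote that record: pip does, many conda packages record `conda`, but this is not guaranteed. The helper name `installer_of` is hypothetical.

```python
from importlib import metadata


def installer_of(package):
    """Return the recorded installer of a package, or None if unknown."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return None
    # read_text returns None when the INSTALLER file is missing.
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


if __name__ == "__main__":
    print("tensorflow installed by:", installer_of("tensorflow"))
```

A result of `None` only means that no record was found, not that the package is broken.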

P.S.

  • Error: ImportError: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.7' not found
  • AVX: Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD, proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later on by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.
  • SSE: In computing, Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series of processors shortly after the appearance of AMD's 3DNow!. SSE contains 70 new instructions, most of which work on single precision floating point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

Supplement

TF Timeline module

Reference: Tensorflow Timeline introduction and simple use
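For readers who have not used the module, here is a minimal sketch of collecting a timeline with the TensorFlow 1.x session API. The helper name `run_with_timeline` and its parameters are hypothetical, and this is not necessarily how the service in this post was instrumented. It assumes a TensorFlow 1.x installation, where `tf.RunOptions`, `tf.RunMetadata`, and `tensorflow.python.client.timeline` are available.

```python
def run_with_timeline(sess, fetches, feed_dict=None, trace_path="timeline.json"):
    """Run one step with full tracing and write a Chrome trace file.

    Assumes TensorFlow 1.x; `sess` is a tf.Session created by the caller.
    """
    # Imported lazily so this module loads even without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.python.client import timeline

    # Ask the runtime to record timing for every op in this step.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    result = sess.run(fetches, feed_dict=feed_dict,
                      options=run_options, run_metadata=run_metadata)

    # Convert the collected step stats into the Chrome trace format.
    trace = timeline.Timeline(run_metadata.step_stats)
    with open(trace_path, "w") as f:
        f.write(trace.generate_chrome_trace_format())
    return result
```

The resulting file can be opened in Chrome at chrome://tracing to see the per-op timing.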

Using a domestic (China) mirror for pip

mkdir -p ~/.pip
vi ~/.pip/pip.conf
 
[list]
format=columns
 
[global]
index-url = http://mirrors.aliyun.com/pypi/simple/
 
[install]
trusted-host=mirrors.aliyun.com

Reflection

Warnings in a program deserve attention too; there may be a pitfall hiding behind them…

2018/4/22 done

This article is also published on my personal GitHub blog.