Articles about environment configuration are a dime a dozen; a casual online search turns up plenty of them. Still, I felt it was worth writing up my most recent deep learning environment setup, mainly because software updates come quickly and the corresponding installation and configuration steps keep changing.

This deep learning environment configuration has two keywords: one is the Docker virtual machine, the other is GPU acceleration.

Before we start

Docker virtual machine

Let’s start with the Docker virtual machine. Why do you need a virtual machine at all? Have you ever found an interesting open source project on GitHub, downloaded the code, compiled and run it according to the project’s instructions, and found that it simply would not work?

Or the other way around: you develop a great project, put it on GitHub, and write out the build steps in as much detail as possible, and yet a stream of developers file issues saying the code won’t compile or run. You feel wronged too: it works perfectly fine on my machine, so why does everything go wrong on everyone else’s?

Why does this happen? The main reason is that the software industry values fast iteration and moving forward quickly, so software is constantly updated. Take TensorFlow as an example: there have been many updates since its release. While as software developers we do our best to maintain backward compatibility, perfect compatibility is very hard to achieve in practice. To address this compatibility problem, virtual machines come in handy, and many open source projects now provide a virtual machine image that contains all the packages and the environment the project requires.

GPU acceleration

Now let’s talk about GPU acceleration. Using a Docker virtual machine solves the development environment problem, but it introduces another one: virtual machines usually can’t use the GPU. As we all know, deep learning is a computation-intensive workload, especially during the training stage, where training a model can take hours or even days, and performance with and without GPU acceleration often differs by a factor of dozens. For a serious deep learning developer, it makes sense to use a high-performance machine with a GPU and to turn GPU support on.

So the question is: how do you enjoy the environment isolation that a Docker virtual machine brings while also getting the performance boost of GPU acceleration?

Where there is a problem, someone will step up with a solution. Nvidia provides one for its own graphics cards: nvidia-docker. The rest of this article walks through the configuration.

Disclaimer

A few notes before you begin:

  • This article covers configuration for Nvidia graphics cards. If you are using an ATI or other brand of graphics card, please turn to Google.

  • This article covers configuration on Ubuntu. That does not mean other operating systems cannot be configured; if you are using another operating system, please search for the corresponding instructions.

  • The environment used in this article is a 64-bit Ubuntu 16.04 system with a GTX 960 graphics card. Other versions of Ubuntu or other models of Nvidia graphics cards should also work in theory, but this is not 100% guaranteed, and some steps may need to be adjusted.

Install CUDA on the host

Compute Unified Device Architecture (CUDA) is a general-purpose parallel computing architecture introduced by NVIDIA that enables GPUs to solve complex computing problems.

Does your graphics card support CUDA?

On Linux, you can run the lspci command:

lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)

As you can see, my graphics card is a GeForce GTX 960. If you check Nvidia’s CUDA GPUs page, you will find that almost all Nvidia cards support CUDA, and naturally my GeForce GTX 960 does too.

Install the latest CUDA

CUDA versions are updated constantly; as of this writing, the latest version is 9.2. Of course, installing an older version is fine too, but as a rule I always go with the latest.

Follow the Nvidia installation instructions to perform the following operations:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
sudo dpkg --install cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

However, the following error is displayed during installation:

gpgkeys: protocol `https' not supported

The fix is also very simple: install the missing package:

sudo apt install gnupg-curl
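
After installing gnupg-curl, you will most likely need to re-run the key fetch that failed, using the same command as before:

# re-fetch the NVIDIA repository key now that the https protocol is supported
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub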

Next, CUDA can be installed like a normal Ubuntu package:

sudo apt-get update
sudo apt install cuda

You can pour a cup of coffee and enjoy it. This step may take a little time, as there are approximately 3GB of software packages to download.

Updating environment variables

To avoid setting the environment variables every time, add the following settings to your ~/.bashrc (or ~/.profile) file:

# for Nvidia CUDA
export PATH="/usr/local/cuda-9.2/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH"

For the environment variables to take effect immediately, log out and log back in, or run the following command:

source ~/.bashrc
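
As a quick sanity check (assuming CUDA 9.2 was installed under /usr/local/cuda-9.2 as above, and that the toolkit includes the nvcc compiler), you can confirm that the toolchain is now on your PATH:

# both commands should point at the CUDA 9.2 toolkit
which nvcc
nvcc --version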

NVIDIA persistence daemon

This step does something I don’t fully understand, but roughly speaking, the persistence daemon keeps the GPU initialized and preserves CUDA task state even when no client is connected to the GPU. The documentation calls for it, so we do it.

First create the /usr/lib/systemd/system directory:

sudo mkdir /usr/lib/systemd/system

Then create the file /usr/lib/systemd/system/nvidia-persistenced.service with the following content:

[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target
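
If you are unsure how to create a root-owned file there, one simple option (the nano editor is just an example, any editor works) is to open it with sudo and paste in the contents above:

# create the unit file with root privileges, then paste the contents shown above
sudo nano /usr/lib/systemd/system/nvidia-persistenced.service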

Finally enable the service:

sudo systemctl enable nvidia-persistenced
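
Optionally, after the next reboot (or after starting the service manually), you can check that the daemon is actually running; this assumes a systemd-based system such as Ubuntu 16.04:

# should report the service as active (running)
sudo systemctl status nvidia-persistenced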

Disable certain udev rules

A certain udev rule (udev is the interface between physical devices and the system) prevents the NVIDIA driver from working properly. To work around this, edit /lib/udev/rules.d/40-vm-hotadd.rules and comment out the memory subsystem rule:

# SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"
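
If you prefer not to edit the file by hand, a one-liner like the following should also work (this sed invocation is just one possible approach, and it keeps a backup of the original file):

# comment out the memory hot-add rule in place; a .bak copy of the original is kept
sudo sed -i.bak '/SUBSYSTEM=="memory", ACTION=="add"/s/^/# /' /lib/udev/rules.d/40-vm-hotadd.rules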

Verify that CUDA works

Restart the machine and try compiling CUDA examples to verify that CUDA is installed properly. Example CUDA code can be installed using the following command:

cuda-install-samples-9.2.sh ~

Here ~ means the code is installed into the home directory; you can install it anywhere else.

Next, compile the sample code:

cd ~/NVIDIA_CUDA-9.2_Samples/
make

You can have another cup of coffee; this step can take dozens of minutes, depending on your computer’s CPU.

Once compiled, run one of the sample programs:

./bin/x86_64/linux/release/deviceQuery | tail -n 1

If Result = PASS is printed, CUDA is working properly.
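
deviceQuery is not the only sanity check; bandwidthTest is another sample built into the same directory and should also finish with Result = PASS:

# measures host-to-device and device-to-device bandwidth; look for Result = PASS near the end of the output
./bin/x86_64/linux/release/bandwidthTest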

Install NVIDIA Docker

First, add the nvidia-docker package repository:

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

Next, make sure the latest docker-ce is installed on your machine. This means that if you have docker-engine or docker.io installed, you need to uninstall them first. Don’t worry, these are all members of the Docker family, just under different names at different times, and the latest docker-ce supersedes those versions:

# remove all previous Docker versions
sudo apt-get remove docker docker-engine docker.io

# add Docker official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add Docker repository (for Ubuntu Xenial)
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"

sudo apt-get update
sudo apt install docker-ce
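
Before moving on, it doesn’t hurt to confirm that plain Docker works on its own; the classic hello-world image is enough for that:

# prints the Docker version, then runs a trivial throwaway container
sudo docker --version
sudo docker run --rm hello-world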

With the latest Docker in place, we can finally install nvidia-docker:

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

Verify nvidia-docker

nvidia-docker is now installed, but how do you verify that it was installed correctly?

We can launch a Docker image provided by Nvidia and run nvidia-smi, a utility that monitors (and manages) the GPU:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

If you see the familiar nvidia-smi status table, the GPU is enabled inside the Docker container.

You can also run a quick test to see how large the gap between the CPU and the GPU is. Here is a benchmark script from LearningTensorflow.com:

import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

device_name = sys.argv[1]  # Choose device from cmd line. Options: gpu or cpu
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

with tf.device(device_name):
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
        result = session.run(sum_operation)
        print(result)

# It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", str(datetime.now() - startTime))Copy the code

Save the script above as benchmark.py in your current directory, then launch the GPU-enabled TensorFlow Docker image and run the program:

docker run \
    --runtime=nvidia \
    --rm \
    -ti \
    -v "${PWD}:/app" \
    tensorflow/tensorflow:latest-gpu \
    python /app/benchmark.py cpu 10000

The command above runs the CPU version. After it finishes, change the cpu argument to gpu and run it again.
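
For reference, the GPU run differs only in the last argument:

docker run \
    --runtime=nvidia \
    --rm \
    -ti \
    -v "${PWD}:/app" \
    tensorflow/tensorflow:latest-gpu \
    python /app/benchmark.py gpu 10000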

On my machine, the results are:

CPU: ('Time taken:', '0:00:15.342611')
GPU: ('Time taken:', '0:00:02.957479')

You might think a difference of a dozen seconds is nothing? Keep in mind that this is roughly a five-fold difference. If training in your deep learning project takes 24 hours with a GPU, it would take several days without one, which is a huge difference.

References

  1. Using NVIDIA GPU within Docker Containers

  2. CUDA Quick Start Guide

  3. NVIDIA Container Runtime for Docker

  4. Docker for Ubuntu