This article provides a minimalist step-by-step installation of CUDA, Python, and PyTorch on Ubuntu 18.04, Ubuntu 16.04, and Windows 10.

Preparation

Example environment:

Ubuntu 18.04

The following tools need to be installed:

  1. Nvidia driver (connects the GPU to the host)
  2. CUDA Toolkit (GPU acceleration dependency)
  3. Miniconda (installs Python and manages Python environments)
  4. GPU versions of PyTorch, TensorFlow, and MXNet

 

Procedure

 

Install the Nvidia driver

Ubuntu 18.04: run in Bash

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
sudo apt-get install --no-install-recommends nvidia-driver-450
# Reboot. Check that GPUs are visible using the command: nvidia-smi

Ubuntu 16.04: run in Bash

sudo apt-get install gnupg-curl
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt-get update
sudo apt-get install --no-install-recommends nvidia-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi

Then restart the system and run nvidia-smi to check that the expected interface is displayed. The screenshot below is only an example; the driver version reported by nvidia-smi should match the one installed above (nvidia-driver-450).

 

Install the CUDA Toolkit

Take CUDA 11.0 on Ubuntu 18.04 as an example.

1. Search Baidu for: CUDA 11.0

Click the first result to open the download page (if it does not load, you may need a proxy/VPN): CUDA Toolkit 11.0 Download | NVIDIA Developer

Ubuntu 16.04 -> x86_64 -> deb (local):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Ubuntu 18.04 -> x86_64 -> deb (local); copy the Installation Instructions and run them in Bash:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Xiao Song's note: the main difficulty in this sequence is the third command, which downloads the complete installation package (about 2 GB). Anyone familiar with Bash will recognize that it simply downloads the file at http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.2-450.51.05-1_amd64.deb. You can paste that link into a download tool such as Thunder (Xunlei) to speed things up, copy the downloaded file into the directory where Bash is running (in that case the third command does not need to be executed), and then continue with the fourth command to install.

 

Install Miniconda

It is recommended to use the Tsinghua mirror: mirrors.tuna.tsinghua.edu.cn/anaconda/mi…

At the end of the page, download the latest version for your system (be sure to pick the x86_64 variant): Miniconda3-py38_4.9.2-Linux-x86_64.sh

Run the following command in Bash:

bash Miniconda3-py38_4.9.2-Linux-x86_64.sh

Answer yes where prompted; the remaining defaults are fine. After the installation completes, open a new terminal and conda will take effect.

 

Install the GPU versions of PyTorch, TensorFlow, and MXNet

See: Replacing pip and conda sources on Windows and Ubuntu

1. First, switch conda and pip to domestic mirrors to speed up downloads; run in Bash:

#pip
pip install pip -U
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

#conda
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/fastai/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
 

conda config --set show_channel_urls yes

If you encounter CondaHTTPError: HTTP 000 CONNECTION FAILED, see: Solving the CondaHTTPError: HTTP 000 CONNECTION FAILED problem on Windows and Ubuntu

2. Create a deep learning Python environment using conda.

conda create -n dl_py37 python=3.7

Activate the environment; after running the command below, the prompt should change from (base) root@b9fc5be9c7f1:~# to (dl_py37) root@b9fc5be9c7f1:~#:

conda activate dl_py37

Install PyTorch 1.7 by running the following in Bash (cudatoolkit=10.1 is recommended so that TensorFlow 2.3 is also supported):

conda install pytorch torchvision torchaudio cudatoolkit=10.1

Install TensorFlow 2.3 (note the double equals sign ==; TensorFlow 2.3 supports the GPU by default, so no separate GPU package needs to be specified):

pip install tensorflow==2.3

Install the MXNet GPU build that matches cudatoolkit=10.1 (hence the cu101 suffix):

pip install mxnet-cu101==1.7
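Before running the GPU tests in the next step, it can be useful to confirm the installed versions from Python. This is only a minimal sketch; it assumes all three frameworks were installed into the currently activated dl_py37 environment:

import torch
import tensorflow as tf
import mxnet as mx

print(torch.__version__)    # e.g. 1.7.x
print(torch.version.cuda)   # CUDA version PyTorch was built against, e.g. 10.1
print(tf.__version__)       # e.g. 2.3.x
print(mx.__version__)       # e.g. 1.7.x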

6. Test PyTorch, TensorFlow, and MXNet

See: "AI Practices": how to test whether the GPU versions of deep learning frameworks are installed correctly: TensorFlow, PyTorch, MXNet, PaddlePaddle

1) TensorFlow

The test method is the same for TensorFlow 1.x and TensorFlow 2.x; the code is as follows:

import tensorflow as tf

print(tf.test.is_gpu_available())

Save the above code as a .py file and run it in the test environment. The output is shown below: most of it is log information, and the key is the final True, which indicates the test succeeded.

2020-09-28 15:43:03.197710: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-09-28 15:43:03.204525: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 with Max-Q Design major: 7 minor: 5 memoryClockRate(GHz): 1.125
pciBusID: 0000:01:00.0
2020-09-28 15:43:03.235352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-09-28 15:43:03.242823: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-09-28 15:43:03.261932: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-09-28 15:43:03.268757: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-09-28 15:43:03.297478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-09-28 15:43:03.315410: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-09-28 15:43:03.330562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-09-28 15:43:03.332846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-28 15:43:05.198465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-28 15:43:05.200423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-09-28 15:43:05.201540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-09-28 15:43:05.203863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 6306 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
True

The final True indicates the test succeeded. The log also contains a lot of useful GPU information:

GPU model: name: GeForce RTX 2070 with Max-Q Design

CUDA version: Successfully opened dynamic library cudart64_100.dll (CUDA 10.0)

cuDNN version: Successfully opened dynamic library cudnn64_7.dll (cuDNN 7.x)

Number of GPUs: Adding visible gpu devices: 0 (one GPU, with index 0)

GPU memory: /device:GPU:0 with 6306 MB memory (an 8 GB card)
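As a side note, tf.test.is_gpu_available() is deprecated in TensorFlow 2.x in favor of listing the physical devices. A minimal alternative check (same idea, just a different API) is:

import tensorflow as tf

# Lists the visible GPUs, e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
print(len(gpus) > 0)  # True if at least one GPU is usable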

 

2) PyTorch

PyTorch and TensorFlow are similar in that they both have a GPU test interface. PyTorch’s GPU test code is as follows:

import torch

print(torch.cuda.is_available())

Save the above code as a .py file and run it in the test environment. An output of True indicates the test succeeded:

True

You can see that the PyTorch output is much cleaner. The log output of TensorFlow is also controllable.
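If a bare True is not informative enough, PyTorch can also report the device name and run a small computation on the GPU. A minimal sketch, assuming at least one CUDA device is visible:

import torch

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "GeForce RTX 2070 with Max-Q Design"
    print(torch.cuda.get_device_name(0))
    # Allocate a small tensor on the GPU and do a matrix multiply there
    x = torch.rand(3, 3, device="cuda")
    print((x @ x).sum().item())
else:
    print("CUDA is not available")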

 

3) MXNet

MXNet's test differs from PyTorch's and TensorFlow's because MXNet has no dedicated GPU test interface (or at least I could not find one). The MXNet GPU test therefore uses try/except to catch the exception raised when GPU allocation fails; the code is as follows:

import mxnet as mx

mxgpu_ok = False

try:
    # Try to create a small ndarray on GPU 0; this raises an exception
    # if the GPU build, driver, or CUDA runtime is not working
    _ = mx.nd.array([1], ctx=mx.gpu(0))
    mxgpu_ok = True
except Exception:
    mxgpu_ok = False

print(mxgpu_ok)

Save the above code as a .py file and run it in the test environment. An output of True indicates the test succeeded.
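For reference, recent MXNet releases also expose a GPU count query, which avoids the try/except. A minimal sketch (assuming a MXNet version where mx.context.num_gpus() is available, which should include 1.7):

import mxnet as mx

# Number of GPUs MXNet can see; 0 means the GPU build or driver is not working
print(mx.context.num_gpus())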

 

Appendix: How to Uninstall the Nvidia Driver and CUDA Toolkit

To remove NVIDIA Drivers:

sudo apt-get --purge remove "*nvidia*"
sudo apt autoremove

To remove CUDA Toolkit:

sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt autoremove

 

Welcome to follow Xiao Song's public account "Minimalist AI", where he teaches you deep learning:

It focuses on sharing deep learning theory and application development techniques, with frequent practical posts. If you have questions while learning or applying deep learning, you are also welcome to get in touch there.

 

From CSDN blog expert & Zhihu deep learning columnist @Xiaosong yes