nvidia-smi
The NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on the NVIDIA Management Library (NVML), for managing and monitoring NVIDIA GPU devices.
Viewing GPU Parameters
View the GPU running status
nvidia-smi
Sun Mar 28 02:40:38 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:02:00.0 Off |                  N/A |
| 23%   29C    P8     9W / 250W |    611MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:03:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  On   | 00000000:82:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  On   | 00000000:83:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     33777      C   /usr/bin/python                              601MiB |
+-----------------------------------------------------------------------------+
This is the status output of a server with four GeForce GTX 1080 Ti GPUs.
- The first line shows the command-line tool version, the GPU driver version, and the CUDA version
- The first column is GPU (GPU index, 0 to 3 here) and Fan (fan speed, 0 to 100%)
- The second column is Name (graphics card name) and Temp (temperature in degrees Celsius)
- The third column is Perf (performance state, P0 to P12; P0 is the highest performance, P12 the lowest)
- The fourth column is Persistence-M (persistence mode; off by default to save power; when on, the GPU uses more power but new GPU applications start faster) and Pwr:Usage/Cap (current power draw / power limit)
- The fifth column is Bus-Id (PCI bus address of the GPU, in the form domain:bus:device.function)
- The sixth column is Disp.A (Display Active: whether a display is initialized on the GPU) and Memory-Usage (graphics memory used / total)
- The seventh column is GPU-Util (GPU utilization)
- The last column is Volatile Uncorr. ECC (uncorrected ECC, Error Correcting Code, errors; N/A on these cards) and Compute M. (compute mode)
- The table at the bottom lists the GPU memory used by each process
Note: Graphics memory usage and GPU utilization are two different things. A graphics card consists of a GPU and graphics memory, and their relationship is roughly analogous to that between a CPU and system memory.
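The difference between the two figures can be checked per card. The following is a minimal sketch and not part of the original article. It assumes CSV input as produced by nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu --format=csv,noheader,nounits, and the function name summarize_gpu_usage is made up for illustration.

```shell
# summarize_gpu_usage (hypothetical helper): reads CSV lines on stdin with the
# columns index, memory.used (MiB), memory.total (MiB), utilization.gpu (%),
# and prints memory occupancy and GPU utilization as two separate figures.
# Lines whose total memory is not a positive number are skipped.
summarize_gpu_usage() {
    awk -F', *' '$3 + 0 > 0 {
        printf "GPU %s: memory %d%% (%s/%s MiB), utilization %s%%\n",
               $1, ($2 * 100) / $3, $2, $3, $4
    }'
}

# On a machine with the NVIDIA driver installed (assumed invocation):
#   nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu \
#              --format=csv,noheader,nounits | summarize_gpu_usage
```

For GPU 0 in the output above (611MiB of 11178MiB used, 0% utilization), a process holds memory while the GPU itself is idle, which is exactly the case the note describes.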
Obtain GPU ID information
nvidia-smi -L
Each line shows, from left to right, the GPU index, the GPU model, and the GPU UUID:
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-5da6e67e-fd5a-88fb-7a0e-109c3284f7bf)
GPU 1: GeForce GTX 1080 Ti (UUID: GPU-ce9189e4-2e58-3a19-4332-cb5c7fac1aa6)
GPU 2: GeForce GTX 1080 Ti (UUID: GPU-242b3020-8e5c-813a-42d9-475766d52f9d)
GPU 3: GeForce GTX 1080 Ti (UUID: GPU-8f3d825f-7246-3daf-eaa1-37845b03aa03)
To extract only the GPU index (cutting at the colon rather than taking one character, so that two-digit indexes also work):
nvidia-smi -L | cut -d ' ' -f 2 | cut -d ':' -f 1
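When the model or UUID is needed as well as the index, the whole line can be split in one pass. This is a sketch under the assumption that every line has the "GPU N: name (UUID: ...)" shape shown in the listing. The helper name list_gpu_fields and the "|" separator are choices made for this example, not anything nvidia-smi provides.

```shell
# list_gpu_fields (hypothetical helper): reads `nvidia-smi -L` style lines on
# stdin and prints "index|name|uuid" for each line that matches the expected
# "GPU N: name (UUID: ...)" shape. Non-matching lines are dropped.
list_gpu_fields() {
    sed -n 's/^GPU \([0-9][0-9]*\): \(.*\) (UUID: \(.*\))$/\1|\2|\3/p'
}

# On a machine with the NVIDIA driver installed (assumed invocation):
#   nvidia-smi -L | list_gpu_fields
#   nvidia-smi -L | list_gpu_fields | cut -d '|' -f 3   # UUIDs only
```

Matching on the digits before the colon means the index is captured whole, however many digits it has.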
GPU Common Settings
Boot Mode Setting
This addresses slow GPU initialization when a new GPU application starts.
Set the GPU persistence mode (Persistence-M) to on:
sudo nvidia-smi -pm 1
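To see which cards still have persistence mode off before changing anything, the setting can be queried in CSV form. The sketch below is illustrative only. It assumes input from nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader, where the mode is reported as Enabled or Disabled, and the helper name gpus_without_persistence is hypothetical.

```shell
# gpus_without_persistence (hypothetical helper): reads CSV lines of the form
# "index, persistence_mode" on stdin and prints the index of every GPU whose
# persistence mode is reported as Disabled.
gpus_without_persistence() {
    awk -F', *' '$2 == "Disabled" { print $1 }'
}

# On a machine with the NVIDIA driver installed (assumed invocation), enable
# persistence mode only on the cards that need it:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
#       | gpus_without_persistence \
#       | while read -r idx; do sudo nvidia-smi -i "$idx" -pm 1; done
```

A setting made with -pm does not survive a reboot, so a check like this is a reasonable candidate for a startup script.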
Card Allocation
To avoid uneven performance across cards: on a four-card machine, if only two cards are needed, prefer cards 0 and 3, because the outermost card slots dissipate heat better.
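The "first and last card" choice can be scripted. This is a rough sketch resting on a loud assumption: that GPU index order matches physical slot order, which depends on the motherboard and should be verified against the Bus-Id values. The helper name outer_gpu_pair and the program name train.py are placeholders.

```shell
# outer_gpu_pair (hypothetical helper): reads one GPU index per line on stdin
# and prints the first and last index joined by a comma ("0,3" for four cards).
# With a single input line it prints that index alone.
# ASSUMPTION: index order matches physical slot order, so first and last are
# the outermost cards.
outer_gpu_pair() {
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END {
             if (NR == 1) { print first }
             else if (NR > 1) { print first "," last }
         }'
}

# Assumed usage; train.py stands in for any GPU program:
#   export CUDA_DEVICE_ORDER=PCI_BUS_ID   # make CUDA number cards by bus ID
#   export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index \
#       --format=csv,noheader | outer_gpu_pair)
#   python train.py
```

Setting CUDA_DEVICE_ORDER=PCI_BUS_ID matters here because CUDA's default device ordering is not guaranteed to match the indexes nvidia-smi prints.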
Appendix
- Developer.nvidia.com/nvidia-syst…