tf.config, a commonly used TensorFlow module, allocates GPU resources to a program and sets its GPU memory usage policy.

Specifying the GPUs used by the current program

In many labs and companies, a research group of students or researchers shares a single multi-GPU workstation. Because TensorFlow by default grabs every GPU it can see, the graphics cards have to be allocated explicitly.

First, tf.config.experimental.list_physical_devices returns the list of devices of a given type (such as GPU or CPU) on the current host. For example, run the following code on a workstation with four GPUs and one CPU:

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)

Output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'),
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'),
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'),
PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
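
In newer TensorFlow 2.x releases, the same listing is also exposed outside the experimental namespace; to the best of my knowledge, the two spellings are equivalent aliases:

gpus = tf.config.list_physical_devices(device_type='GPU')
cpus = tf.config.list_physical_devices(device_type='CPU')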

The workstation has four GPUs (GPU:0, GPU:1, GPU:2 and GPU:3) and one CPU (CPU:0). Next, tf.config.experimental.set_visible_devices restricts which devices are visible to the current program (the program can use only its visible devices; invisible devices are ignored). For example, to limit the program to the two cards with indices 0 and 1 (GPU:0 and GPU:1) on the 4-card machine above:

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_visible_devices(devices=gpus[0:2], device_type='GPU')
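
To confirm that the restriction took effect, you can list the logical devices afterwards (a minimal sanity check; note that visible devices must be configured before the GPUs are first initialized, otherwise TensorFlow raises a RuntimeError):

logical_gpus = tf.config.experimental.list_logical_devices(device_type='GPU')
print(len(logical_gpus))  # expected: 2 -- only GPU:0 and GPU:1 remain visible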

The environment variable CUDA_VISIBLE_DEVICES also controls which GPUs a program uses. Suppose cards 0 and 1 are busy while cards 2 and 3 are idle; in a Linux terminal, run:

export CUDA_VISIBLE_DEVICES=2,3

Or set it in your code:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = "2,3"

Either way, the program will run only on cards 2 and 3. Note that the environment variable must be set before TensorFlow initializes its GPU devices.
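
As a minimal end-to-end sketch of the in-code variant, the assignment is easiest to guarantee by placing it before the TensorFlow import:

import os

# Must be set before TensorFlow initializes CUDA, hence before the import.
os.environ['CUDA_VISIBLE_DEVICES'] = "2,3"

import tensorflow as tf

# The visible cards are renumbered: physical GPUs 2 and 3 now appear
# as GPU:0 and GPU:1 inside this program.
print(tf.config.experimental.list_physical_devices('GPU'))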

Setting the GPU memory usage policy

By default, TensorFlow reserves almost all available GPU memory up front to avoid the performance penalty of memory fragmentation. However, it offers two memory usage strategies that give you finer control over how a program consumes GPU memory:

request memory only when needed (the program starts with a very small footprint and grows its allocation dynamically as it runs);

limit consumption to a fixed amount of memory (the program never exceeds the specified amount and raises an error if it needs more).

tf.config.experimental.set_memory_growth switches a GPU to the "request memory only when needed" strategy. The following code applies this setting to all GPUs:

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(device=gpu, enable=True)
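
The setting can be read back with get_memory_growth to verify it took effect (a small sanity check):

for gpu in gpus:
    # Should print True for every GPU configured above.
    print(gpu.name, tf.config.experimental.get_memory_growth(gpu))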

Alternatively, tf.config.experimental.set_virtual_device_configuration, together with a tf.config.experimental.VirtualDeviceConfiguration instance, enforces a fixed memory limit. For example, the following code restricts TensorFlow to 1 GB of memory on GPU:0 (memory_limit is given in MB):

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
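
In more recent TensorFlow 2.x releases the same functionality is also exposed under non-experimental names; to the best of my knowledge the following is equivalent:

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])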

Under TensorFlow 1.x's graph/session API, the GPU memory policy is set by instantiating a tf.compat.v1.ConfigProto, setting its options, and passing it as the config parameter when creating a tf.compat.v1.Session. The following code enables on-demand memory allocation via the allow_growth option:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
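
For completeness, here is a runnable sketch of the session-style setup when executed under TensorFlow 2.x, where eager execution has to be disabled before Sessions can be used (the toy computation is purely illustrative):

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # required under TF 2.x to use Sessions

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True

with tf.compat.v1.Session(config=config) as sess:
    a = tf.compat.v1.placeholder(tf.float32, shape=())
    print(sess.run(a * 2.0, feed_dict={a: 3.0}))  # 6.0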

The following code caps TensorFlow at 40% of GPU memory via the per_process_gpu_memory_fraction option:

config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.compat.v1.Session(config=config)

Simulating a multi-GPU environment with a single GPU

When the local development environment has only one GPU but we need to write multi-GPU programs for training on a workstation, TensorFlow provides a convenient way to create several simulated GPUs locally, which makes debugging multi-GPU programs much easier. The following code creates two virtual GPUs, each with 2 GB of memory, on top of the physical device GPU:0:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048),
     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
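
Listing the logical devices afterwards confirms that the single physical card now shows up as two GPUs (a minimal sanity check):

logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(logical_gpus))  # 2 -- one physical card, two logical GPUs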

Adding the above code before single-machine, multi-GPU training (tf.wiki/en/ Appendix…) lets code originally written for multiple GPUs run in a single-GPU environment. When the program prints the number of devices, it outputs:

Number of devices: 2
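
The "Number of devices" line is typically printed from a tf.distribute.MirroredStrategy setup such as the sketch below; the one-layer model is a hypothetical placeholder, not part of the original tutorial:

import tensorflow as tf

# MirroredStrategy automatically picks up both virtual GPUs created above.
strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

with strategy.scope():
    # Variables created inside the scope are mirrored across the two
    # (virtual) GPUs.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
    model.compile(optimizer='sgd', loss='mse')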