Original: www.reddit.com/r/MachineLe…
A few helpful PyTorch tips (examples included)
Translation by KBSC13
Contact information:
GitHub: github.com/ccc013/AI_a…
WeChat official account: AI algorithm notes
Preface
This article comes from Reddit's Machine Learning subreddit, where someone summarized 7 useful PyTorch tips, along with a colab notebook of code examples and a video. The code and video links are as follows:
Code: colab.research.google.com/drive/15vGz…
Video: youtu.be/BoC8SGaT3GE
The video has also been uploaded to my Bilibili account at the following link:
www.bilibili.com/video/BV1YK…
The code and video can also be obtained by replying "12" to the WeChat official account.
1. Create tensors directly on the target device
The first technique is to create a tensor directly on the target device using the device parameter.
The first approach creates tensors on the CPU and then moves them to the GPU with .cuda():
import time
import torch

start_time = time.time()
for _ in range(100):
    # Creating on the CPU, then transferring to the GPU
    cpu_tensor = torch.ones((1000, 64, 64))
    gpu_tensor = cpu_tensor.cuda()
print('Total time: {:.3f}s'.format(time.time() - start_time))
The second approach creates the tensors directly on the target device:
start_time = time.time()
for _ in range(100):
    # Creating on the GPU directly
    gpu_tensor = torch.ones((1000, 64, 64), device='cuda')
print('Total time: {:.3f}s'.format(time.time() - start_time))
The running times of the two methods are as follows:
You can see that creating tensors directly on the target device is much faster.
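As a small extension (not from the original post), the same device argument works for other factory functions such as torch.zeros and torch.randn, and a common pattern is to pick the device once and reuse it:

import torch

# Choose the device once and pass it to every factory function
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

ones = torch.ones((1000, 64, 64), device=device)
noise = torch.randn((1000, 64, 64), device=device)
print(ones.device, noise.device)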
2. Use Sequential layers whenever possible
The second trick is to use Sequential layers to make the code look more concise.
First, here is the model written without Sequential:
import torch.nn as nn

class ExampleModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16

        self.input_layer = nn.Linear(input_size, hidden_size)
        self.input_activation = nn.ReLU()
        self.mid_layer = nn.Linear(hidden_size, hidden_size)
        self.mid_activation = nn.ReLU()
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        z = self.input_layer(x)
        z = self.input_activation(z)
        z = self.mid_layer(z)
        z = self.mid_activation(z)
        out = self.output_layer(z)
        return out
Running it gives the following result:
The same model written with nn.Sequential looks like this:
class ExampleSequentialModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16

        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size))

    def forward(self, x):
        out = self.layers(x)
        return out
Running it gives the following result:
You can see that the model built with nn.Sequential is much more concise.
3. Do not use a plain list to store network layers
The third tip: it is not recommended to keep the created layers in a plain Python list, because the nn.Module class cannot register them as submodules. Instead, the list should be unpacked into nn.Sequential.
First, an example of the wrong way to do it:
class BadListModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16

        self.input_layer = nn.Linear(input_size, hidden_size)
        self.input_activation = nn.ReLU()

        # Fairly common when using residual layers
        self.mid_layers = []
        for _ in range(5):
            self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
            self.mid_layers.append(nn.ReLU())

        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        z = self.input_layer(x)
        z = self.input_activation(z)
        for layer in self.mid_layers:
            z = layer(z)
        out = self.output_layer(z)
        return out

bad_list_model = BadListModel()
print('Output shape:', bad_list_model(torch.ones([100, 2])).shape)
gpu_input = torch.ones([100, 2], device='cuda')
gpu_bad_list_model = bad_list_model.cuda()
print('Output shape:', bad_list_model(gpu_input).shape)
The second print statement raises an error, because .cuda() does not move the layers stored in the plain list onto the GPU:
The correct way to write it:
class CorrectListModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16

        self.input_layer = nn.Linear(input_size, hidden_size)
        self.input_activation = nn.ReLU()

        # Fairly common when using residual layers
        self.mid_layers = []
        for _ in range(5):
            self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
            self.mid_layers.append(nn.ReLU())
        self.mid_layers = nn.Sequential(*self.mid_layers)

        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        z = self.input_layer(x)
        z = self.input_activation(z)
        z = self.mid_layers(z)
        out = self.output_layer(z)
        return out

correct_list_model = CorrectListModel()
gpu_input = torch.ones([100, 2], device='cuda')
gpu_correct_list_model = correct_list_model.cuda()
print('Output shape:', correct_list_model(gpu_input).shape)
The printed result:
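A quick way to see the difference (a small check based on the two classes above, not part of the original post) is to count the registered parameters:

# Layers kept only in a plain Python list are invisible to nn.Module,
# so .parameters() (and therefore .cuda() and the optimizer) miss them.
print(len(list(BadListModel().parameters())))      # 4  - only the input/output layers
print(len(list(CorrectListModel().parameters())))  # 14 - the mid layers are registered too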
4. Make good use of torch.distributions
The fourth tip concerns PyTorch's torch.distributions package: it provides some nice distribution objects and methods, but it is not widely used. The documentation is here:
Pytorch.org/docs/stable…
Here’s an example of how to use it:
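The original colab example is not reproduced here; below is a minimal sketch of sampling from and scoring a couple of common distributions:

import torch
from torch import distributions

# A categorical distribution over 4 classes (e.g. a policy over actions)
probs = torch.tensor([0.1, 0.2, 0.3, 0.4])
cat = distributions.Categorical(probs=probs)
sample = cat.sample()                    # draw one sample
print(sample, cat.log_prob(sample), cat.entropy())

# A diagonal Gaussian; rsample() uses the reparameterization trick,
# so gradients can flow through the sampling step
normal = distributions.Normal(loc=torch.zeros(3), scale=torch.ones(3))
x = normal.rsample()
print(x, normal.log_prob(x))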
5. Use detach() on long-lived metrics
The fifth tip: if you need to store tensor metrics (such as the loss) from epoch to epoch, call .detach() on them to prevent a memory leak.
Let’s use a code example to illustrate this, starting with the initial configuration:
# Setup
example_model = ExampleModel()
data_batches = [torch.rand((10, 2)) for _ in range(5)]
criterion = nn.MSELoss(reduction='mean')
An example of the incorrect code:
losses = []

# Training loop
for batch in data_batches:
    output = example_model(batch)
    target = torch.rand((10, 3))
    loss = criterion(output, target)
    losses.append(loss)
    # Optimization happens here

print(losses)
The print result is as follows:
The correct way to write it:
losses = []

# Training loop
for batch in data_batches:
    output = example_model(batch)
    target = torch.rand((10, 3))
    loss = criterion(output, target)
    losses.append(loss.item())  # Or `loss.detach()`
    # Optimization happens here

print(losses)
The print result is as follows:
Calling loss.item() (or loss.detach()) here stores only the loss value for each epoch, instead of the whole tensor and the computation graph attached to it.
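As a small illustration (using the setup above, not from the original post), the raw loss still carries its autograd graph, while detach() and item() do not:

raw_loss = criterion(example_model(data_batches[0]), torch.rand((10, 3)))
print(raw_loss.requires_grad)            # True  - the graph is still attached
print(raw_loss.detach().requires_grad)   # False - same value, graph released
print(raw_loss.item())                   # a plain Python float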
6. A trick for deleting models from the GPU
The sixth tip: you can clean up the GPU cache with torch.cuda.empty_cache(). This is useful when working in a notebook, especially if you want to delete and recreate a large model.
The following is an example:
import gc
import torch

example_model = ExampleModel().cuda()

del example_model
gc.collect()
# The model will normally stay on the cache until something takes its place
torch.cuda.empty_cache()
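If you want to see the effect, one way (an assumption, not shown in the original post) is to watch PyTorch's memory counters before and after:

print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes held by PyTorch's caching allocator
torch.cuda.empty_cache()              # releases unused cached blocks back to the GPU
print(torch.cuda.memory_reserved())   # should drop if cached blocks were freed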
7. Call eval() before testing
Finally, don’t forget to call model.eval() before you start testing. This is simple but easy to forget. The call switches the behavior of network layers that work differently during the training and evaluation phases. The affected modules include:
- Dropout
- Batch Normalization
- RNNs
- Lazy Variants
For reference: stackoverflow.com/questions/6…
The following is an example:
example_model = ExampleModel()
# Do training

example_model.eval()
# Do testing

example_model.train()
# Do training again