This is day 16 of my participation in the November Gwen Challenge. For event details, see: The Last Gwen Challenge 2021.
```python
import torch
from torch import nn
from d2l import torch as d2l
```
```python
# Small training set, large test set, high-dimensional inputs: a setting prone to overfitting
n_train, n_test, num_inputs, batch_size = 20, 100, 200, 5
true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
train_data = d2l.synthetic_data(true_w, true_b, n_train)
train_iter = d2l.load_array(train_data, batch_size)
test_data = d2l.synthetic_data(true_w, true_b, n_test)
test_iter = d2l.load_array(test_data, batch_size, is_train=False)
```
The first step is generating the synthetic dataset. It is the same setup as in the manual implementation from the previous post: Deep Learning 4.5 Regularization / Weight Decay, Manual Implementation – Nuggets (juejin.cn).
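For reference, d2l.synthetic_data draws standard-normal features and builds the labels from a linear model plus a little Gaussian noise; roughly, it behaves like the sketch below (the actual d2l helper may differ in minor details):

```python
import torch

def synthetic_data_sketch(w, b, num_examples):
    """Generate y = Xw + b + noise; a sketch of what d2l.synthetic_data does."""
    X = torch.normal(0, 1, (num_examples, len(w)))  # standard-normal features
    y = torch.matmul(X, w) + b                      # linear ground-truth model
    y += torch.normal(0, 0.01, y.shape)             # small Gaussian label noise
    return X, y.reshape((-1, 1))
```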
```python
def init_params():
    w = torch.normal(0, 1, size=(num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]
```
Random initialization.
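For comparison, the manual implementation linked above does not rely on the optimizer at all: it adds an explicit L2 penalty to the loss. A minimal sketch of that penalty term (following the d2l manual version; lambd is the hyperparameter that plays the role of wd):

```python
import torch

def l2_penalty(w):
    """Explicit L2 penalty used in the manual implementation: ||w||^2 / 2."""
    return torch.sum(w.pow(2)) / 2

# In the manual training loop the objective is roughly:
#   l = loss(net(X), y) + lambd * l2_penalty(w)
```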
```python
def train_concise(wd):
    net = nn.Sequential(nn.Linear(num_inputs, 1))
    for param in net.parameters():
        param.data.normal_()
    loss = nn.MSELoss()
    num_epochs, lr = 100, 0.003
    # The bias parameter is not decayed; weight_decay is set only for the weights
    trainer = torch.optim.SGD([
        {"params": net[0].weight, "weight_decay": wd},
        {"params": net[0].bias}], lr=lr)
    # Visualization code; can be ignored
    animator = d2l.Animator(xlabel='epochs', ylabel='loss', yscale='log',
                            xlim=[5, num_epochs], legend=['train', 'test'])
    for epoch in range(num_epochs):
        for X, y in train_iter:
            with torch.enable_grad():
                trainer.zero_grad()
                l = loss(net(X), y)
            l.backward()
            trainer.step()
        # Visualization code; can be ignored
        if (epoch + 1) % 5 == 0:
            animator.add(epoch + 1, (d2l.evaluate_loss(net, train_iter, loss),
                                     d2l.evaluate_loss(net, test_iter, loss)))
    print('L2 norm of w:', net[0].weight.norm().item())
```
The weight decay hyperparameter is specified directly via weight_decay. By default, PyTorch decays both the weights and the biases; here weight_decay is set only for the weight parameter group, so the bias parameter b is not decayed. The code is not much shorter than the manually implemented weight decay, but it runs faster, is easier to implement, and the advantage only grows for more complex problems.
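If decaying the bias as well is acceptable, the parameter groups are unnecessary: passing weight_decay once when constructing the optimizer applies it to every parameter. A minimal sketch reusing the variables defined above:

```python
import torch
from torch import nn

# weight_decay set on the whole optimizer decays all parameters, bias included
net = nn.Sequential(nn.Linear(num_inputs, 1))
trainer = torch.optim.SGD(net.parameters(), lr=0.003, weight_decay=3)
```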
```python
train_concise(0)
train_concise(3)
```
Train the model and take a look at the results:
train_concise(0): [plot of training and test loss]
train_concise(3): [plot of training and test loss]
After regularization, the overfitting is alleviated: the gap between training and test loss shrinks.
You can read more of my Dive into Deep Learning notes here: Hands-on Deep Learning – LolitaAnn’s Column – Nuggets (juejin.cn)
Notes are still being updated…