This is the 16th day of my participation in the November Gwen Challenge. See the event details: The Last Gwen Challenge 2021.

import torch
from torch import nn
from d2l import torch as d2l
n_train, n_test, num_inputs, batch_size = 20, 100, 200, 5
true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
train_data = d2l.synthetic_data(true_w, true_b, n_train)
train_iter = d2l.load_array(train_data, batch_size)
test_data = d2l.synthetic_data(true_w, true_b, n_test)
test_iter = d2l.load_array(test_data, batch_size, is_train=False)

The first step is to generate a synthetic dataset:


$$y = 0.05 + \sum_{i=1}^{d} 0.01 x_i + \epsilon \quad \text{where } \epsilon \sim \mathcal{N}(0, 0.01^2)$$
  • Set the training set size, test set size, number of features, and batch size.
    • The training set has only 20 examples, but there are 200 features. The less training data there is relative to the model's complexity, the more prone the model is to overfitting. Here we effectively have 20 samples that must be fit with a function of 200 features.
    • The test set is set to 100, but its exact size does not matter much here; a bit more test data simply makes the overfitting easier to observe.
    • batch_size is the size of each mini-batch.
  • Set the true w and b.
  • synthetic_data was implemented earlier in Hands-on Deep Learning 3.2 – Linear regression from scratch. It is used here to generate the training and test sets.
    def synthetic_data(w, b, num_examples):
        """Generate y = Xw + b + noise."""
        # torch.normal(mean, std, size) generates a tensor of random numbers
        # drawn from a normal distribution with the given mean and standard deviation
        X = torch.normal(0, 1, (num_examples, len(w)))
        y = torch.matmul(X, w) + b
        # Add Gaussian noise
        y += torch.normal(0, 0.01, y.shape)
        # Reshape y from a 1-D tensor of length num_examples into a column
        # vector of shape (num_examples, 1)
        return X, y.reshape((-1, 1))
  • load_array was implemented earlier in Hands-on Deep Learning 3.3 – Concise implementation of linear regression. It is used to load the datasets (a quick sanity check of the loaded batches follows after this list).
    from torch.utils import data

    def load_array(data_arrays, batch_size, is_train=True):
        """Construct a PyTorch data iterator."""
        # Wrap the tensors in a TensorDataset; shuffle only during training
        dataset = data.TensorDataset(*data_arrays)
        return data.DataLoader(dataset, batch_size, shuffle=is_train)
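As a quick sanity check (a minimal sketch that only assumes the definitions above have already been run), we can pull one mini-batch from train_iter and confirm that the shapes match batch_size and num_inputs:

# Fetch a single mini-batch and inspect its shape
X, y = next(iter(train_iter))
print(X.shape, y.shape)  # torch.Size([5, 200]) torch.Size([5, 1])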
def init_params():
    # Randomly initialize w and zero-initialize b; both require gradients
    w = torch.normal(0, 1, size=(num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]

Random initialization.
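For illustration (a minimal sketch, not required by the book's code), the returned shapes and gradient flags can be checked directly:

w, b = init_params()
print(w.shape, b.shape)                  # torch.Size([200, 1]) torch.Size([1])
print(w.requires_grad, b.requires_grad)  # True True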

def l2_penalty(w):
    # Half of the squared L2 norm: sum(w_i ** 2) / 2
    return torch.sum(w.pow(2)) / 2

The $L_2$ norm penalty.
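The quantity being computed is $\frac{1}{2}\lVert\mathbf{w}\rVert^2 = \frac{1}{2}\sum_i w_i^2$; dividing by 2 just makes the derivative cleaner. A tiny numeric check (the tensor values in w_demo are made up purely for illustration):

w_demo = torch.tensor([3.0, 4.0])
print(l2_penalty(w_demo))  # tensor(12.5000), since (3^2 + 4^2) / 2 = 12.5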

def train(lambd):
    w, b = init_params()
    net, loss = lambda X: d2l.linreg(X, w, b), d2l.squared_loss
    num_epochs, lr = 100, 0.003

    # For visualization only, can be ignored
    animator = d2l.Animator(xlabel='epochs', ylabel='loss', yscale='log',
                            xlim=[5, num_epochs], legend=['train', 'test'])

    for epoch in range(num_epochs):
        for X, y in train_iter:
            with torch.enable_grad():
                # Add the L2 norm penalty; broadcasting expands the scalar
                # l2_penalty(w) to match the loss vector of length batch_size
                l = loss(net(X), y) + lambd * l2_penalty(w)
            l.sum().backward()
            d2l.sgd([w, b], lr, batch_size)

        if (epoch + 1) % 5 == 0:  # For visualization only, can be ignored
            animator.add(epoch + 1, (d2l.evaluate_loss(net, train_iter, loss),
                                     d2l.evaluate_loss(net, test_iter, loss)))

    print('The L2 norm of w is:', torch.norm(w).item())
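The helpers d2l.linreg, d2l.squared_loss and d2l.sgd were all defined back in chapter 3.2 of the book. For reference, the mini-batch SGD step used here looks roughly like the following (a sketch from memory of that chapter; consult the d2l source for the exact version):

def sgd(params, lr, batch_size):
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            # Average the accumulated gradient over the mini-batch and step
            param -= lr * param.grad / batch_size
            param.grad.zero_()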

net, loss = lambda X: d2l.linreg(X, w, b), d2l.squared_loss means that calling net(X) is equivalent to calling d2l.linreg(X, w, b), and loss is the squared loss.

For details on lambda anonymous inline functions, see Python lambda anonymous inline functions – Nuggets (juejin.cn).
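A one-line illustration of the idea (the name square is made up for this example):

square = lambda x: x ** 2  # equivalent to: def square(x): return x ** 2
print(square(3))           # 9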

train(lambd=0)

Train! Setting lambd=0 disables weight decay. After running this code, notice that the training error decreases while the test error does not; this is a sign of severe overfitting.

train(lambd=3)

Now run the code with weight decay enabled. Notice that the training error increases, but the test error decreases; this is exactly the effect we expect from regularization.
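To see why this is called weight decay (a standard derivation, not something specific to this code), write the penalized objective and the resulting gradient step, where $\eta$ is the learning rate and $\lambda$ the penalty strength:

$$L_{\text{reg}}(\mathbf{w}, b) = L(\mathbf{w}, b) + \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2$$

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla_{\mathbf{w}} L_{\text{reg}} = (1 - \eta\lambda)\,\mathbf{w} - \eta\,\nabla_{\mathbf{w}} L(\mathbf{w}, b)$$

Each update first shrinks ("decays") the weights by the factor $(1 - \eta\lambda)$ before applying the usual gradient step.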


You can read more about Hands-on Deep Learning here: Hands-on Deep Learning – LolitaAnn's Column – Nuggets (juejin.cn)

Notes are still being updated …………