This is day 16 of my participation in the November Gwen Challenge. For event details, see: The Last Gwen Challenge 2021.

```python
import torch
from torch import nn
from d2l import torch as d2l
```
```python
dropout1, dropout2 = 0.2, 0.5

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the first fully connected layer
                    nn.Dropout(dropout1),
                    nn.Linear(256, 256),
                    nn.ReLU(),
                    # Add a dropout layer after the second fully connected layer
                    nn.Dropout(dropout2),
                    nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)
```

The model looks like this:

```
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Dropout(p=0.2, inplace=False)
  (4): Linear(in_features=256, out_features=256, bias=True)
  (5): ReLU()
  (6): Dropout(p=0.5, inplace=False)
  (7): Linear(in_features=256, out_features=10, bias=True)
)
```
  • (0): Flatten reshapes each 28×28 input image into a 784-dimensional vector
  • (1): The first fully connected layer maps the 784 inputs to hidden layer 1, which has 256 units
  • (2): The ReLU() activation for hidden layer 1
  • (3): Dropout applied to hidden layer 1, with p=0.2
  • (4): The second fully connected layer maps hidden layer 1 to hidden layer 2, which also has 256 units
  • (5): The ReLU() activation for hidden layer 2
  • (6): Dropout applied to hidden layer 2, with p=0.5
  • (7): The output layer maps the 256 hidden units to the 10 categories (see the sketch below for how these indices can be used)
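
Since nn.Sequential stores its submodules by index, the numbers above can be used to inspect individual layers directly (a small sketch using the net defined earlier):

```python
print(net[3])  # Dropout(p=0.2, inplace=False)
print(net[6])  # Dropout(p=0.5, inplace=False)
```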

For a hidden unit dropped out with probability p, dropout leaves the expected activation unchanged:


$$E\left[x_{i}'\right] = p \cdot 0 + (1-p)\,\frac{x_{i}}{1-p} = x_{i}$$

For input layer to hidden layer 1:


$$E\left(h_{11}'\right) = 0.2 \times 0 + (1-0.2)\,\frac{h_{11}}{1-0.2} = h_{11}$$

For hidden layer 1 to hidden layer 2:


$$E\left(h_{21}'\right) = 0.5 \times 0 + (1-0.5)\,\frac{h_{21}}{1-0.5} = h_{21}$$

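A quick numeric sanity check of this property (a minimal sketch; in training mode, nn.Dropout zeroes each activation with probability p and rescales the survivors by 1/(1-p)):

```python
import torch
from torch import nn

# Zero each element with probability p = 0.5; the survivors are
# scaled by 1/(1-p), so the mean of a large sample stays ~1.
dropout = nn.Dropout(p=0.5)
dropout.train()  # dropout is only active in training mode
x = torch.ones(1_000_000)
print(dropout(x).mean())  # ≈ 1.0, i.e. E[x_i'] = x_i
```
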
```python
num_epochs, lr, batch_size = 10, 0.5, 256
loss = nn.CrossEntropyLoss()
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
```
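
One detail to keep in mind: dropout only fires during training. d2l's evaluation helpers switch the network to evaluation mode, where the Dropout layers become identity functions; if you run predictions yourself, do the same (a minimal sketch):

```python
net.eval()  # Dropout layers now pass activations through unchanged
with torch.no_grad():
    X, y = next(iter(test_iter))
    preds = net(X).argmax(dim=1)
```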

This training code is exactly the same as in the manual implementation from Hands-on Deep Learning 4.6 (Dropout). It does the following:

  • Set the number of training epochs, the learning rate, and the batch size
  • Use cross-entropy as the loss function
  • Load the Fashion-MNIST dataset
  • Train the model

Again, the concise implementation may not look very different from the manual one, but when dealing with massive amounts of data, the framework's built-in, parallelized operators are faster and more efficient than hand-written code.

The manual implementation is for understanding the principle; the concise implementation is for efficiency.
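
For comparison, the from-scratch dropout layer in section 4.6 looks roughly like this (a sketch of the idea, not the library implementation):

```python
def dropout_layer(X, dropout):
    """Zero elements of X with probability `dropout`, rescaling the rest."""
    assert 0 <= dropout <= 1
    if dropout == 1:
        return torch.zeros_like(X)
    if dropout == 0:
        return X
    # Keep each element with probability 1 - dropout, then divide by
    # 1 - dropout so the expectation of the output matches the input.
    mask = (torch.rand(X.shape) > dropout).float()
    return mask * X / (1.0 - dropout)
```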


You can read more of my Hands-on Deep Learning notes here: Hands-on Deep Learning – LolitaAnn's Column – Nuggets (juejin.cn)

Notes are still being updated …………