ML 2021 Spring (ntu.edu.tw)

Libraries

# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For data preprocess
import numpy as np
import csv
import os

# For plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

myseed = 42069  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

Data processing

1. Data download

Kaggle: ML2021Spring-hw1 | Kaggle

2. Review the data

The data set is divided into training data and testing data.

Data overview:

3. Preprocessing

Three data sets:

  • train: the training set
  • dev: the validation set
  • test: the test set (no target)

Preprocessing steps:

  • Read the CSV file
  • Extract features
  • Split covid.train.csv into a training set and a validation (dev) set
  • Normalize the data

Read the data

Let’s demonstrate this with a simplified data set.

path = 'DataExample.csv'

with open(path, 'r') as fp:
    data = list(csv.reader(fp))
    data = np.array(data[1:])[:, 1:].astype(float)


DataExample

id AL AK AZ cli ili hh_cmnty_cli tested_positive
0 1 0 0 0.81461 0.7713562 25.6489069 19.586492
1 1 0 0 0.8389952 0.8077665 25.6791006 20.1518381
2 1 0 0 0.8978015 0.8878931 26.0605436 20.7049346
3 1 0 0 0.9728421 0.9654959 25.7540871 21.2929114
4 1 0 0 0.9553056 0.9630788 25.9470152 21.1666563

Convert the data into a list:

data = list(csv.reader(fp))
print(data)

out:

[['id', 'AL', 'AK', 'AZ', 'cli', 'ili', 'hh_cmnty_cli', 'tested_positive'], 
 ['0', '1', '0', '0', '0.81461', '0.7713562', '25.6489069', '19.586492'], 
 ['1', '1', '0', '0', '0.8389952', '0.8077665', '25.6791006', '20.1518381'], 
 ['2', '1', '0', '0', '0.8978015', '0.8878931', '26.0605436', '20.7049346'], 
 ['3', '1', '0', '0', '0.9728421', '0.9654959', '25.7540871', '21.2929114'], 
 ['4', '1', '0', '0', '0.9553056', '0.9630788', '25.9470152', '21.1666563']]

But we don't need the first row (the header) or the first column (the id).

data = np.array(data[1:])  # delete the first line (the header)
print(data)

out:

[['0' '1' '0' '0' '0.81461' '0.7713562' '25.6489069' '19.586492']
 ['1' '1' '0' '0' '0.8389952' '0.8077665' '25.6791006' '20.1518381']
 ['2' '1' '0' '0' '0.8978015' '0.8878931' '26.0605436' '20.7049346']
 ['3' '1' '0' '0' '0.9728421' '0.9654959' '25.7540871' '21.2929114']
 ['4' '1' '0' '0' '0.9553056' '0.9630788' '25.9470152' '21.1666563']]

data = data[:, 1:].astype(float)  # delete the first column and change the data type to float
print(data)

out:

[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492 ]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563]]

Splitting the data set

DataExample

1 0 0 0.81461 0.7713562 25.6489069 19.586492 0.8389952 0.8077665 25.6791006 20.1518381 0.8978015 0.8878931 26.0605436 20.7049346
1 0 0 0.8389952 0.8077665 25.6791006 20.1518381 0.8978015 0.8878931 26.0605436 20.7049346 0.9728421 0.9654959 25.7540871 21.2929114
1 0 0 0.8978015 0.8878931 26.0605436 20.7049346 0.9728421 0.9654959 25.7540871 21.2929114 0.9553056 0.9630788 25.9470152 21.1666563
1 0 0 0.9728421 0.9654959 25.7540871 21.2929114 0.9553056 0.9630788 25.9470152 21.1666563 0.9475134 0.9687637 26.3505008 19.8966066
1 0 0 0.9553056 0.9630788 25.9470152 21.1666563 0.9475134 0.9687637 26.3505008 19.8966066 0.8838331 0.8930201 26.4806235 20.1784284

For the training data, separate the features and the target:

feats = list(range(14)) # 14 = 3 + 4 + 4 + 3

target = data[:, -1]
data = data[:, feats]

print(target)
print(data)

out:

[20.7049346 21.2929114 21.1666563 19.8966066 20.1784284] # targets

[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492   0.8389952  0.8077665 25.6791006 20.1518381  0.8978015
   0.8878931 26.0605436]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381  0.8978015  0.8878931 26.0605436 20.7049346  0.9728421
   0.9654959 25.7540871]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346  0.9728421  0.9654959 25.7540871 21.2929114  0.9553056
   0.9630788 25.9470152]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114  0.9553056  0.9630788 25.9470152 21.1666563  0.9475134
   0.9687637 26.3505008]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563  0.9475134  0.9687637 26.3505008 19.8966066  0.8838331
   0.8930201 26.4806235]]

Now we have 5 samples in total. Next, we split them into a training set and a validation (dev) set.

# for train set
indices = [i for i in range(len(data)) if i % 3 != 0]
print(indices)

out:

[1, 2, 4]

That is, the indices of the training-set samples are 1, 2, and 4.

The remaining samples go to the validation (dev) set.

# for dev set
indices_2 = [i for i in range(len(data)) if i % 3 == 0]
print(indices_2)

out:

[0, 3]

Then we convert data and target into tensors:

data = torch.FloatTensor(data[indices])
target = torch.FloatTensor(target[indices])

print(data)
print(target)

out:

tensor([[ 1.0000,  0.0000,  0.0000,  0.8390,  0.8078, 25.6791, 20.1518,  0.8978,
          0.8879, 26.0605, 20.7049,  0.9728,  0.9655, 25.7541],
        [ 1.0000,  0.0000,  0.0000,  0.8978,  0.8879, 26.0605, 20.7049,  0.9728,
          0.9655, 25.7541, 21.2929,  0.9553,  0.9631, 25.9470],
        [ 1.0000,  0.0000,  0.0000,  0.9553,  0.9631, 25.9470, 21.1667,  0.9475,
          0.9688, 26.3505, 19.8966,  0.8838,  0.8930, 26.4806]])

tensor([21.2929, 21.1667, 20.1784])

Data normalization

As you can see, the scales of the different features differ greatly. To balance their influence on the model, the data needs to be normalized. It is common to rescale all features into [-1, 1] or [0, 1]; two standard methods are described below.

Min-max scaling (linear normalization):

For a set of data with minimum value m and maximum value M, any value X is normalized as:

X_{norm} = \frac{X - m}{M - m}

Note: this method rescales the original data proportionally into [0, 1].
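
For reference, here is a minimal sketch of min-max scaling applied to the real-valued columns of the small data tensor built above (this snippet is my own illustration, not part of the sample code):

col_min = data[:, 3:].min(dim=0, keepdim=True).values
col_max = data[:, 3:].max(dim=0, keepdim=True).values
data_minmax = data.clone()
data_minmax[:, 3:] = (data[:, 3:] - col_min) / (col_max - col_min)  # each feature column now lies in [0, 1]
print(data_minmax)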

Zero-mean normalization (Z-score standardization):

This method transforms the original data into a distribution with mean 0 and variance 1:

z = \frac{x - \mu}{\sigma}

Note: this method assumes the original data is roughly Gaussian; otherwise the standardization may work poorly.

So here we’re using the zero mean normalization.

data[:, 3:] = (data[:, 3:] - data[:, 3:].mean(dim=0, keepdim=True)) \
              / data[:, 3:].std(dim=0, keepdim=True)  # std = standard deviation
print(data)

out:

tensor([[ 1.0000,  0.0000,  0.0000, -1.0037, -1.0104, -1.1051, -1.0286, -1.0893,
         -1.1540,  0.0184,  0.1048,  0.7532,  0.6065, -0.8144],
        [ 1.0000,  0.0000,  0.0000,  0.0075,  0.0212,  0.8424,  0.0599,  0.8764,
          0.5413, -1.0091,  0.9435,  0.3813,  0.5477, -0.3017],
        [ 1.0000,  0.0000,  0.0000,  0.9962,  0.9892,  0.2628,  0.9687,  0.2129,
          0.6127,  0.9907, -1.0483, -1.1346, -1.1542,  1.1161]])

In this homework I tried both methods; min-max scaling converged noticeably more slowly than zero-mean normalization and gave lower final accuracy.

Load the data

A DataLoader loads data from a given Dataset into batches.

Look at the relationship between a DataLoader and a DataSet and learn what Batch is.

Note: shuffle must be set to False for validation and testing. Otherwise the sample order changes on every pass, and the predictions would not line up with the expected order.
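
To see what a batch actually looks like, here is a tiny sketch with toy tensors (hypothetical data, not the homework files):

from torch.utils.data import TensorDataset, DataLoader
import torch

x = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # 10 samples, 1 feature each
y = torch.arange(10, dtype=torch.float32)               # 10 targets
loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=False)

for xb, yb in loader:
    print(xb.shape, yb.shape)  # batches of size 4, 4, 2 (drop_last=False keeps the last partial batch)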

4. Complete code

class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID19 dataset '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read data into numpy arrays
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)
        
        if not target_only:
            feats = list(range(93)) # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            pass

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]
            
            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
            
            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen)
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:  # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)
    
# DataLoader

def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    ''' Generates a dataset, then is put into a dataloader. '''
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)  # Construct dataset
    dataloader = DataLoader(
        dataset, batch_size,
        shuffle=(mode == 'train'), drop_last=False,
        num_workers=n_jobs, pin_memory=True)                            # Construct dataloader
    return dataloader

Network building

NeuralNet is an nn.Module designed for regression. The DNN consists of 2 fully-connected layers with ReLU activation. The module also includes a function cal_loss for calculating the loss.

1. ReLU

Nonlinear activation function

Without an activation function, the output of each layer is a linear function of the previous layer's input, so no matter how many layers the network has, the output is just a linear combination of the inputs, which is equivalent to having no hidden layers at all; this is the original perceptron.

Therefore we introduce nonlinear activation functions so that deep networks become meaningful: the output is no longer a linear combination of the inputs and can approximate arbitrary functions. The earliest choices were sigmoid and tanh, whose outputs are bounded and thus easy to feed into the next layer.

Why ReLU?

  • Sigmoid-like activations are expensive to compute (exponentials), and backpropagating through them involves division, which adds further cost. Using ReLU saves a lot of computation overall.

  • For deep networks, sigmoid easily causes vanishing gradients during backpropagation (near the saturation regions the function changes slowly and its derivative tends to 0, losing information), which makes deep networks hard to train.

  • ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence of parameters, and alleviates overfitting.
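
As a quick sketch, ReLU simply clamps negative activations to zero:

import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])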

2. nn.MSELoss()


loss(\hat{y}, y) = \frac{1}{n}\displaystyle\sum_i(\hat{y}_i - y_i)^2

PyTorch's nn.MSELoss() originally took two Boolean arguments, which have since been replaced by the reduction argument:

parameter      role
size_average   whether to average the summed loss (deprecated)
reduce         whether to reduce the output to a scalar (deprecated)
reduction      string that replaces both: 'none' | 'mean' | 'sum'

An example:

input = torch.randn(2, 3, requires_grad=True)  # prediction
target = torch.ones(2, 3)                      # ground truth
print(f'input: {input}\ntarget: {target}')

Out:

input: tensor([[-0.0733, -2.2085, -0.6919],
        [-1.1417, -1.1327, -1.5466]], requires_grad=True)
target: tensor([[1., 1., 1.],
        [1., 1., 1.]])

default

The defaults are size_average=True and reduce=True, so a scalar (the mean) is returned.

loss_1 = nn.MSELoss()
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: 1.8783622980117798

size_average=False

The sum of squared errors is returned without dividing by n.

loss_1 = nn.MSELoss(size_average=False)
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: 11.819371223449707

reduce=False

Returns a per-element loss tensor.

loss_1 = nn.MSELoss(reduce=False)
output_1 = loss_1(input,target)

print(f'loss_1: {output_1}')

Out:

loss_1: tensor([[0.0039, 0.2338, 3.5550],
        [0.1358, 2.1851, 0.1533]], grad_fn=<MseLossBackward0>)

For reduction, it is a combination of size_average and reduce!

This is part of the legacy_get_string function:

    if size_average is None:
        size_average = True
    if reduce is None:
        reduce = True

    if size_average and reduce:
        ret = 'mean'
    elif reduce:
        ret = 'sum'
    else:
        ret = 'none'
    if emit_warning:
        warnings.warn(warning.format(ret))
    return ret
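
In current PyTorch versions only the reduction string is used; as a small sketch, the equivalences (reusing the input and target tensors from above) are:

loss_mean = nn.MSELoss(reduction='mean')(input, target)  # default; same as size_average=True, reduce=True
loss_sum  = nn.MSELoss(reduction='sum')(input, target)   # same as size_average=False, reduce=True
loss_none = nn.MSELoss(reduction='none')(input, target)  # same as reduce=False (per-element tensor)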

3. Regularization

Original linear model:


f(\mathbf{x}) = b + \mathbf{w}^\mathrm{T}\mathbf{x}

That is:


f(\mathbf{x}) = b + w_1x_1 + w_2x_2 + \dots + w_nx_n

But how do we determine n? In other words, how many features of x should we use? If the dimensionality of x is too high, the model easily overfits; if it is too low, it underfits. To weaken the influence of some dimensions and make the function smoother, we can add regularization to the loss.


L(\mathbf{w}, b) = \displaystyle\sum_i\left(y_i - (b + \mathbf{w}^\mathrm{T}\mathbf{x}_i)\right)^2 + \lambda\displaystyle\sum_i w_i^2

Note that overfitting and underfitting are observed on the test set. On the training set, the higher the dimensionality, the better the fit, as shown in the figure:

The higher the dimensionality, the larger the space of candidate functions, so of course it can cover a function that fits the training set better.
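
As a sketch of one possible way to fill in the cal_loss TODO in the code below (my own variant with a hypothetical l2_lambda, not the official solution), an L2 penalty can be added to the MSE loss by summing the squared weights:

    def cal_loss(self, pred, target):
        ''' Calculate loss with a manual L2 penalty (sketch) '''
        mse = self.criterion(pred, target)
        l2_lambda = 0.001  # hypothetical regularization strength
        l2_penalty = sum(p.pow(2).sum() for p in self.parameters())
        return mse + l2_lambda * l2_penalty

Alternatively, PyTorch optimizers support L2 regularization directly through the weight_decay argument, which is what TODO2 later in this post uses.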

4. Complete code

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L1/L2 regularization here
        return self.criterion(pred, target)

Training

1. Basic functions

getattr()

This is equivalent to the “.” operation. The parameters are as follows:

  • Object: indicates the instance of an object
  • Name: string, the name of the object’s member function or its member variable
  • Default: the default value returned if the attribute does not exist on the object
  • Exception: if the attribute does not exist and no default is given, an "AttributeError" is raised

getattr(object, 'name') is equivalent to object.name
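
For instance, this is a small sketch of how the training code below builds the optimizer from a config string (the tiny Linear model here is just a placeholder):

import torch

opt_name = 'SGD'                             # e.g. taken from config['optimizer']
model = torch.nn.Linear(4, 1)                # hypothetical tiny model
optimizer = getattr(torch.optim, opt_name)(  # same as calling torch.optim.SGD(...)
    model.parameters(), lr=0.001, momentum=0.9)
print(type(optimizer))                       # <class 'torch.optim.sgd.SGD'>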

model.train() vs. model.eval()

model.train()

Enable Batch Normalization and Dropout.

If the model contains Batch Normalization or Dropout layers, call model.train() during training. For BN, model.train() ensures the layer uses the mean and variance of each batch of data; for Dropout, it randomly drops a fraction of the connections while training and updating the parameters.

model.eval()

Switches Batch Normalization and Dropout to evaluation behaviour.

If the model contains Batch Normalization or Dropout layers, call model.eval() during testing. For BN, model.eval() makes the layer use the mean and variance accumulated over the whole training data, so they stay fixed during testing; for Dropout, model.eval() uses all network connections and does not randomly drop neurons.

After training, the resulting model is used on the test samples. Call model.eval() before model(test); otherwise, if the model has BN or Dropout layers, feeding input data would still update the BN running statistics.

detach().cpu()

detach()

  • Function: Block back propagation
  • Return value: tensor, but the variable is still on the GPU

cpu()

  • Action: Moves data to the CPU
  • Return value: tensor

item()

  • What it does: Get the value of a tensor.

numpy()

  • Translate the tensor into numpy array
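
A small sketch chaining these calls on a scalar loss tensor (toy values):

import torch

loss = (torch.tensor([2.0, 4.0], requires_grad=True) ** 2).mean()  # a scalar loss inside the autograd graph
val = loss.detach().cpu().item()    # detach from the graph, move to CPU, extract the Python float
print(val)                          # 10.0
arr = loss.detach().cpu().numpy()   # or convert to a numpy array instead
print(arr)                          # 10.0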

2. A basic routine

When iterating over epochs during training, we typically call optimizer.zero_grad(), loss.backward(), and optimizer.step() in sequence.

Such as:

while epoch < n_epochs:
    model.train()                           # set model to training mode
    for x, y in tr_set:                     # iterate through the dataloader
        optimizer.zero_grad()               # set gradient to zero
        x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
        pred = model(x)                     # forward pass (compute output)
        mse_loss = model.cal_loss(pred, y)  # compute loss
        mse_loss.backward()                 # compute gradient (backpropagation)
        optimizer.step()                    # update model with optimizer
        loss_record['train'].append(mse_loss.detach().cpu().item())
    ...

In general, their functions are as follows:

  • optimizer.zero_grad(): zero the gradients
    • Training usually uses mini-batches; if the gradients are not cleared, they accumulate with those of previous batches, so this call should come before backpropagation and the optimizer step.
  • loss.backward(): backpropagation computes the gradient of every parameter
    • Without calling loss.backward(), the gradients stay None, so it must come before optimizer.step().
  • optimizer.step(): gradient descent updates the parameters
    • step() performs a single optimization step, updating the parameter values. Since the update is based on gradients, loss.backward() must be executed to compute them before optimizer.step() is called.

Commonly used optimizer attributes:

  • param_groups: when instantiated, the Optimizer class builds a param_groups list in its constructor, containing one dictionary per parameter group (how many groups there are depends on how many groups of parameters were passed in when the optimizer was defined). Each dictionary holds six key-value pairs: 'params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'.

  • param_group['params']: the list of model parameters passed in when the Optimizer class was instantiated; if there is only one group, this is the whole model's model.parameters(). Each element is a torch.nn.parameter.Parameter object. An example follows below.
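
A quick sketch that inspects param_groups on a toy model (illustrative only):

import torch

model = torch.nn.Linear(4, 1)  # hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

group = optimizer.param_groups[0]          # a single group, since one parameter list was passed in
print(group['lr'], group['momentum'])      # 0.001 0.9
print([p.shape for p in group['params']])  # [torch.Size([1, 4]), torch.Size([1])]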

3. Complete code

def train(tr_set, dv_set, model, config, device):
    ''' DNN training '''

    n_epochs = config['n_epochs']  # Maximum number of epochs

    # Setup optimizer
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_mse = 1000.
    loss_record = {'train': [], 'dev': []}      # for recording training loss
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, test your model on the validation (development) set.
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save model if your model improved
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])  # Save model to specified path
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Stop training if your model stops improving for "config['early_stop']" epochs.
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record

Validation

dev() is very similar to train(), but note that the model is switched to eval() mode, i.e. BN and Dropout run in evaluation behaviour.

The complete code

def dev(dv_set, model, device):
    model.eval()                                # set model to evaluation mode
    total_loss = 0
    for x, y in dv_set:                         # iterate through the dataloader
        x, y = x.to(device), y.to(device)       # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
        total_loss += mse_loss.detach().cpu().item() * len(x)  # accumulate loss
    total_loss = total_loss / len(dv_set.dataset)              # compute averaged loss

    return total_loss

Test

1. torch.cat()

torch.cat() concatenates a sequence of tensors into one tensor.

Parameters:

  • inputs: the sequence of tensors to be concatenated; it can be any sequence of tensors of the same type
  • dim: the dimension along which to concatenate; it must be a valid dimension of the input tensors, and the sequence of tensors is joined along this dimension

Notes:

  • The input must be a sequence of tensors of the same type, whose shapes match except in the concatenation dimension
  • dim must not exceed the number of dimensions of the input tensors

Such as:

t1 = torch.Tensor([1, 2, 3])
t2 = torch.Tensor([4, 5, 6])
t3 = torch.Tensor([7, 8, 9])
list = [t1, t2, t3]
t = torch.cat(list, dim=0)
print(t)

Out:

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])

Modify dim:

t1 = torch.Tensor([[1, 2, 3], [3, 2, 1]])
t2 = torch.Tensor([[4, 5, 6], [6, 5, 4]])
t3 = torch.Tensor([[7, 8, 9], [9, 8, 7]])
list = [t1, t2, t3]
t = torch.cat(list, dim=1)
print(t)

Out:

tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
        [3., 2., 1., 6., 5., 4., 9., 8., 7.]])

2. Complete code

def test(tt_set, model, device):
    model.eval()                                # set model to evaluation mode
    preds = []
    for x in tt_set:                            # iterate through the dataloader
        x = x.to(device)                        # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            preds.append(pred.detach().cpu())   # collect prediction
    preds = torch.cat(preds, dim=0).numpy()     # concatenate all predictions and convert to a numpy array
    return preds

Setting hyperparameters

1. Complete code

device = get_device()                 # get the current available device ('cpu' or 'cuda')
os.makedirs('models', exist_ok=True)  # The trained model will be saved to ./models/
target_only = False                   # TODO: Using 40 states & 2 tested_positive features

# TODO: How to tune these hyper-parameters to improve your model's performance?
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9              # momentum for SGD
    },
    'early_stop': 200,               # early stopping epochs (the number of epochs since your model's last improvement)
    'save_path': 'models/model.pth'  # your model will be saved here
}

Load the data and model

1. Complete code

tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)

Start training!

model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)

Data visualization

1. Training

Let's look at how the MSE loss changes as training progresses.

plot_learning_curve(model_loss_record, title='deep model')
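
plot_learning_curve comes from the sample code and is not reproduced in these notes; a minimal sketch of what it does (plotting the recorded per-step training loss and per-epoch dev loss) might look like this:

import matplotlib.pyplot as plt

def plot_learning_curve(loss_record, title=''):
    ''' Sketch: plot the training loss per step and the dev loss per epoch '''
    steps = range(len(loss_record['train']))
    # dev loss is recorded once per epoch, so stretch its x-axis to the same scale
    x_dev = [i * len(loss_record['train']) // len(loss_record['dev'])
             for i in range(len(loss_record['dev']))]
    plt.plot(steps, loss_record['train'], label='train')
    plt.plot(x_dev, loss_record['dev'], label='dev')
    plt.xlabel('Training steps')
    plt.ylabel('MSE loss')
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()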

Then look at the prediction quality on the validation set.

del model
model = NeuralNet(tr_set.dataset.dim).to(device) 
ckpt = torch.load(config['save_path'], map_location='cpu')  # Load your best model
model.load_state_dict(ckpt)
plot_pred(dv_set, model, device)  # Show prediction on the validation set

Points lying on the blue line are cases where the predicted value equals the ground truth.

2. Testing

def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

preds = test(tt_set, model, device)  # predict COVID-19 cases with your model
save_pred(preds, 'pred.csv')         # save prediction file to pred.csv

Prediction results:

Improvements

We need to modify the sample code to make the model better!

Public leaderboard

  • Simple baseline: 2.04826
  • Medium baseline: 1.36937
  • Strong baseline: 0.89266

Hints

  • Feature selection (what other features are useful?)
  • DNN architecture (layers? dimension? activation function?)
  • Training (mini-batch? optimizer? learning rate?)
  • L2 regularization
  • There are some mistakes in the sample code, can you find them?

TODO1: Modify the features used for training

In COVID19Dataset, we can change which features are extracted.

        if not target_only:
            feats = list(range(93)) # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            # Use only 42 features
            feats = list(range(40))
            feats.append(57)
            feats.append(75)

Now let's see the final result!

Convergence speed greatly improved!

Score:

TODO2: Add regularization

When setting hyperparameters, add regularization.

config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9,             # momentum for SGD
        'weight_decay': 0.1          # L2 regularization
    },
    ...

The result is an improvement, but not a significant one, and the rate of convergence is slower (reasonable)…

TODO3: Modify neural network structure

Add more layers

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        ...

Just add a linear layer and a ReLU layer…

Some effect, but not very significant…

PReLU

Change the ReLU layers to PReLU.
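
A sketch of the swap (the class name here is my own; forward and cal_loss stay the same as before):

import torch.nn as nn

class NeuralNetPReLU(nn.Module):
    ''' Sketch: same architecture as above, with PReLU instead of ReLU '''
    def __init__(self, input_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.PReLU(),              # learnable negative slope instead of a hard zero
            nn.Linear(128, 64),
            nn.PReLU(),
            nn.Linear(64, 1)
        )
        self.criterion = nn.MSELoss(reduction='mean')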

TODO4: optimizer

Adam

To make a long story short: Adam performed slightly worse than SGD here.

And the MSE loss seems to fluctuate a bit more...
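
For reference, switching optimizers only needs a config change; a sketch (note that Adam does not take a momentum argument):

config['optimizer'] = 'Adam'
config['optim_hparas'] = {
    'lr': 0.001,             # learning rate (Adam's default is also 0.001)
    'weight_decay': 0.1      # optional L2 regularization, as in TODO2
}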

TODO5: Fix errors

When standardizing the validation set, the mean and standard deviation of the training set should be used, because a small split may not have statistics that reflect the underlying distribution.

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
            self.data[:, 40:] = \
                (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
                / self.data[:, 40:].std(dim=0, keepdim=True)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]
            indices_train = [i for i in range(len(data)) if i % 10 != 0]
            tr_data = self.data = torch.FloatTensor(data[indices_train])
            tr_mean = tr_data[:, 40:].mean(dim=0, keepdim=True)
            tr_std = tr_data[:, 40:].std(dim=0, keepdim=True)
            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = indices_train
                self.data = tr_data
                self.target = torch.FloatTensor(target[indices])
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
                self.data = torch.FloatTensor(data[indices])
                self.target = torch.FloatTensor(target[indices])
            # Normalize train/dev features with the training-set statistics
            self.data[:, 40:] = \
                (self.data[:, 40:] - tr_mean) \
                / tr_std

But the result was strange, the score was much worse…

Final thoughts

Tuning took two afternoons, and the result still wasn't as good as the first attempt... I'm clearly not familiar enough with the model and have little tuning experience. I'll stop this homework here for now; with more background knowledge I might be able to cross the strong baseline...

Some useful links

Colab: ML2021Spring – HW1.ipynb – Colaboratory (google.com)

PyTorch Tutorial P1: ML2021 Pytorch tutorial part 1 – YouTube

Kaggle: ML2021Spring-hw1 | Kaggle

1: Introduction, Colab & PyTorch Tutorials, HW1_Gods silent Blog -CSDN Blog

Regularization: PyTorch methods for L2 and L1 regularization

Activation Function: activation functions – CSDN blog

HWExample: 2021 Machine Learning course assignment 1 – Jianshu.com

Cross Baseline: ML2021Spring hw1-liZHI334

Reference

ReLU: What ReLU does _KAMITA’s blog -CSDN Blog

train() and eval(): Pytorch: model.train() and model.eval(), and model.eval() vs. torch.no_grad()

optimizer.step(): Understand the functions and principles of optimizer.zero_grad(), loss.backward(), optimizer.step()

Source: Heng-Jui Chang @ NTUEE (github.com/ga642381/ML…)