ML 2021 Spring (ntu.edu.tw)
Libraries
```python
# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For data preprocessing
import numpy as np
import csv
import os

# For plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

myseed = 42069  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)
```
Data processing
1. Data download
Kaggle: ML2021Spring-hw1 | Kaggle
2. Inspect the data
The data set is divided into training data and testing data.
Data overview:
3. Preprocessing
Three data sets:
- train: the training set
- dev: the validation set
- test: the test set (no target)
Preprocessing steps:
- Read the CSV file
- Extract features
- Split covid.train.csv into a training set and a validation (dev) set
- Normalize the data
Read the data
Let’s demonstrate this with a simplified data set.
```python
path = 'DataExample.csv'
with open(path, 'r') as fp:
    data = list(csv.reader(fp))
    data = np.array(data[1:])[:, 1:].astype(float)  # drop the header row and the id column
```
DataExample
id | AL | AK | AZ | cli | ili | hh_cmnty_cli | tested_positive |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0.81461 | 0.7713562 | 25.6489069 | 19.586492 |
1 | 1 | 0 | 0 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 |
2 | 1 | 0 | 0 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 |
3 | 1 | 0 | 0 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 |
4 | 1 | 0 | 0 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 |
First, read the rows into a list:
```python
with open(path, 'r') as fp:
    data = list(csv.reader(fp))
print(data)
```
out:
```
[['id', 'AL', 'AK', 'AZ', 'cli', 'ili', 'hh_cmnty_cli', 'tested_positive'],
 ['0', '1', '0', '0', '0.81461', '0.7713562', '25.6489069', '19.586492'],
 ['1', '1', '0', '0', '0.8389952', '0.8077665', '25.6791006', '20.1518381'],
 ['2', '1', '0', '0', '0.8978015', '0.8878931', '26.0605436', '20.7049346'],
 ['3', '1', '0', '0', '0.9728421', '0.9654959', '25.7540871', '21.2929114'],
 ['4', '1', '0', '0', '0.9553056', '0.9630788', '25.9470152', '21.1666563']]
```
We don't need the first row (the header) or the first column (the id):
```python
data = np.array(data[1:])  # drop the header row
print(data)
```
out:
```
[['0' '1' '0' '0' '0.81461' '0.7713562' '25.6489069' '19.586492']
 ['1' '1' '0' '0' '0.8389952' '0.8077665' '25.6791006' '20.1518381']
 ['2' '1' '0' '0' '0.8978015' '0.8878931' '26.0605436' '20.7049346']
 ['3' '1' '0' '0' '0.9728421' '0.9654959' '25.7540871' '21.2929114']
 ['4' '1' '0' '0' '0.9553056' '0.9630788' '25.9470152' '21.1666563']]
```
```python
data = data[:, 1:].astype(float)  # drop the id column and convert the strings to float
print(data)
```
out:
```
[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492 ]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563]]
```
Reorganize the data by day (as in the real data set, each row covers three consecutive days)
DataExample
AL | AK | AZ | cli (day 1) | ili (day 1) | hh_cmnty_cli (day 1) | tested_positive (day 1) | cli (day 2) | ili (day 2) | hh_cmnty_cli (day 2) | tested_positive (day 2) | cli (day 3) | ili (day 3) | hh_cmnty_cli (day 3) | tested_positive (day 3) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0.81461 | 0.7713562 | 25.6489069 | 19.586492 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 |
1 | 0 | 0 | 0.8389952 | 0.8077665 | 25.6791006 | 20.1518381 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 |
1 | 0 | 0 | 0.8978015 | 0.8878931 | 26.0605436 | 20.7049346 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 |
1 | 0 | 0 | 0.9728421 | 0.9654959 | 25.7540871 | 21.2929114 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 | 0.9475134 | 0.9687637 | 26.3505008 | 19.8966066 |
1 | 0 | 0 | 0.9553056 | 0.9630788 | 25.9470152 | 21.1666563 | 0.9475134 | 0.9687637 | 26.3505008 | 19.8966066 | 0.8838331 | 0.8930201 | 26.4806235 | 20.1784284 |
For training data
```python
feats = list(range(14))  # 14 = 3 states + day 1 (4) + day 2 (4) + day 3 (3, without tested_positive)
target = data[:, -1]     # the last column (tested_positive on day 3) is the target
data = data[:, feats]
print(target)
print(data)
```
out:
```
[20.7049346 21.2929114 21.1666563 19.8966066 20.1784284]  # targets
[[ 1.         0.         0.         0.81461    0.7713562 25.6489069
  19.586492   0.8389952  0.8077665 25.6791006 20.1518381  0.8978015
   0.8878931 26.0605436]
 [ 1.         0.         0.         0.8389952  0.8077665 25.6791006
  20.1518381  0.8978015  0.8878931 26.0605436 20.7049346  0.9728421
   0.9654959 25.7540871]
 [ 1.         0.         0.         0.8978015  0.8878931 26.0605436
  20.7049346  0.9728421  0.9654959 25.7540871 21.2929114  0.9553056
   0.9630788 25.9470152]
 [ 1.         0.         0.         0.9728421  0.9654959 25.7540871
  21.2929114  0.9553056  0.9630788 25.9470152 21.1666563  0.9475134
   0.9687637 26.3505008]
 [ 1.         0.         0.         0.9553056  0.9630788 25.9470152
  21.1666563  0.9475134  0.9687637 26.3505008 19.8966066  0.8838331
   0.8930201 26.4806235]]
```
Now we have 5 samples in total. Next, we split them into a training set and a validation (dev) set. For the training set:
```python
# for train set
indices = [i for i in range(len(data)) if i % 3 != 0]
print(indices)
```
out:
```
[1, 2, 4]
```
That is, the training set consists of the rows with indices 1, 2 and 4.
The remaining rows form the dev set:
```python
# for dev set
indices_2 = [i for i in range(len(data)) if i % 3 == 0]
print(indices_2)
```
out:
```
[0, 3]
```
Then we convert the training rows of data and target into tensors:
```python
data = torch.FloatTensor(data[indices])
target = torch.FloatTensor(target[indices])
print(data)
print(target)
```
out:
```
tensor([[ 1.0000,  0.0000,  0.0000,  0.8390,  0.8078, 25.6791, 20.1518,  0.8978,  0.8879, 26.0605, 20.7049,  0.9728,  0.9655, 25.7541],
        [ 1.0000,  0.0000,  0.0000,  0.8978,  0.8879, 26.0605, 20.7049,  0.9728,  0.9655, 25.7541, 21.2929,  0.9553,  0.9631, 25.9470],
        [ 1.0000,  0.0000,  0.0000,  0.9553,  0.9631, 25.9470, 21.1667,  0.9475,  0.9688, 26.3505, 19.8966,  0.8838,  0.8930, 26.4806]])
tensor([21.2929, 21.1667, 20.1784])
```
Data normalization
The features clearly differ greatly in scale. To balance their influence on the model, the data needs to be normalized. Two common methods follow.
A common goal is to rescale all data into [-1, 1] or [0, 1]; the min-max method below uses [0, 1].
Linear function normalization (min-max scaling):
For a group of data with minimum value m and maximum value M, any value x is normalized as:
$$x' = \frac{x - m}{M - m}$$
Note: this method rescales the original data linearly, so the relative spacing between values is preserved.
Zero-mean normalization (z-score standardization):
This method transforms the original data into a data set with mean 0 and variance 1, using the mean μ and standard deviation σ of the data:
$$x' = \frac{x - \mu}{\sigma}$$
Note: this method expects the original data to be roughly Gaussian; otherwise the normalization can work poorly!
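Both formulas are one-liners in NumPy. Here is a minimal sketch that applies them column by column (one column per feature) to a few values taken from the DataExample table. The function names are made up for illustration, and `ddof=1` is an assumption chosen so the standard deviation matches the unbiased estimate that torch's `std()` uses by default.

```python
import numpy as np

def min_max(x):
    """Scale each column linearly into [0, 1]: (x - m) / (M - m)."""
    m, M = x.min(axis=0), x.max(axis=0)
    return (x - m) / (M - m)

def z_score(x):
    """Shift and scale each column to mean 0 and (sample) std 1: (x - mean) / std."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Two columns on very different scales (hh_cmnty_cli and ili values from DataExample)
x = np.array([[25.6489069, 0.7713562],
              [25.6791006, 0.8077665],
              [26.0605436, 0.8878931]])

print(min_max(x))  # every column now spans exactly [0, 1]
print(z_score(x))  # every column now has mean 0 and std 1
```

After either transform the two columns are on comparable scales, which is the point of normalizing before training.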
So here we use zero-mean normalization, applied to every column from index 3 onward (the non-state features):
```python
data[:, 3:] = (data[:, 3:] - data[:, 3:].mean(dim=0, keepdim=True)) \
    / data[:, 3:].std(dim=0, keepdim=True)  # std = standard deviation
print(data)
```
out:
```
tensor([[ 1.0000,  0.0000,  0.0000, -1.0037, -1.0104, -1.1051, -1.0286, -1.0893,
         -1.1540,  0.0184,  0.1048,  0.7532,  0.6065, -0.8144],
        [ 1.0000,  0.0000,  0.0000,  0.0075,  0.0212,  0.8424,  0.0599,  0.8764,  0.5413, -1.0091,  0.9435,  0.3813,  0.5477, -0.3017],
        [ 1.0000,  0.0000,  0.0000,  0.9962,  0.9892,  0.2628,  0.9687,  0.2129,  0.6127,  0.9907, -1.0483, -1.1346, -1.1542,  1.1161]])
```
In the homework I tried both methods and found that min-max scaling converged noticeably more slowly than zero-mean normalization and reached a worse final result.
Load the data
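A DataLoader loads data from a given Dataset in batches.
It helps to understand how a DataLoader relates to a Dataset and what a batch is: the Dataset returns one sample at a time, and the DataLoader groups samples (in order, or randomly when shuffle=True) into batches of batch_size samples.
Note: shuffle must be False for the test set. Otherwise the predictions would come out in a different order from the test samples, and the ids written to the prediction file (0, 1, 2, ...) would no longer match them.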
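To make the Dataset/DataLoader relationship concrete, here is a minimal sketch with a hypothetical toy dataset (`ToyDataset` is invented for illustration and is not part of the homework code). The Dataset only answers "how many samples?" and "give me sample i"; the DataLoader stacks samples into batches, and with `shuffle=False` the batches keep the original order.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: sample i is the feature row [i, 2*i] with target 10*i."""
    def __init__(self, n):
        self.x = torch.tensor([[float(i), 2.0 * i] for i in range(n)])
        self.y = torch.tensor([10.0 * i for i in range(n)])

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

dataset = ToyDataset(5)
loader = DataLoader(dataset, batch_size=2, shuffle=False, drop_last=False)

batches = list(loader)
for xb, yb in batches:
    print(xb.shape, yb.tolist())
# 5 samples with batch_size=2 give batches of sizes 2, 2 and 1
# (the short last batch is kept because drop_last=False)
```

Because nothing is shuffled, concatenating the batch targets gives back exactly 0, 10, 20, 30, 40, which is the property the test set relies on.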
4. Complete code
```python
class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID19 dataset '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read data into numpy arrays
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)

        if not target_only:
            feats = list(range(93))  # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            pass

        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]
            data = data[:, feats]

            # Splitting training data into train & dev sets
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]

            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen)
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:
            # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)


# DataLoader
def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    ''' Generates a dataset, then is put into a dataloader. '''
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)  # Construct dataset
    dataloader = DataLoader(
        dataset, batch_size,
        shuffle=(mode == 'train'), drop_last=False,
        num_workers=n_jobs, pin_memory=True)  # Construct dataloader
    return dataloader
```
Network building
NeuralNet is an nn.Module designed for regression. The DNN consists of 2 fully-connected layers with a ReLU activation between them. The module also includes a function cal_loss for calculating the loss.
1. ReLU
Nonlinear activation function
Without an activation function, each layer's output is a linear function of the previous layer's output. No matter how many layers the network has, the output is then still a linear function of the input, which is equivalent to having no hidden layer at all: this is the original perceptron.
Therefore, nonlinear functions are introduced as activation functions so that deep networks make sense. The output is no longer a linear combination of the inputs and can approximate essentially any function. The earliest choices were sigmoid and tanh, whose outputs are bounded, which makes them easy to feed into the next layer.
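A quick numerical check of the "no activation means no extra power" argument: two stacked `nn.Linear` layers without an activation can be folded into one linear map with weight W2·W1 and bias W2·b1 + b2. The layer sizes and the seed are arbitrary choices for this sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
stack = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 1))  # two layers, no activation in between
l1, l2 = stack[0], stack[1]

# Fold both layers into a single equivalent linear layer:
# l2(l1(x)) = (x W1^T + b1) W2^T + b2 = x (W2 W1)^T + (W2 b1 + b2)
W = l2.weight @ l1.weight            # shape (1, 4)
b = l2.weight @ l1.bias + l2.bias    # shape (1,)

x = torch.randn(16, 4)
with torch.no_grad():
    out_stack = stack(x)
    out_single = x @ W.T + b
print(torch.allclose(out_stack, out_single, atol=1e-5))  # True
```

Inserting `nn.ReLU()` between the two layers breaks this folding, which is exactly why the activation is needed.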
Why ReLU?
- Sigmoid-like activations require exponentials in the forward pass, and computing the error gradient during backpropagation involves division, so both steps are expensive. ReLU saves much of this computation.
- In deep networks, the sigmoid easily causes vanishing gradients during backpropagation: near its saturation regions it changes very slowly and its derivative approaches 0, so information is lost and the deep network cannot be trained properly.
- ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence of parameters, and helps alleviate overfitting.
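A small pure-Python sketch of the vanishing-gradient point: the sigmoid's derivative never exceeds 0.25 and is nearly 0 in its saturated region, so a product of such factors across layers shrinks quickly, while ReLU's derivative for positive inputs is exactly 1. The sketch deliberately ignores the weight matrices, which also enter the real backpropagated product, and the depth of 10 is an arbitrary choice.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # peaks at 0.25 when x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0   # exactly 1 for any positive input

print(sigmoid_grad(0.0))    # 0.25
print(sigmoid_grad(10.0))   # about 4.5e-05: saturated region

# Backpropagation multiplies one local derivative per layer; even in the best case
# (x = 0 at every layer), 10 sigmoid layers shrink the gradient by 0.25 ** 10
depth = 10
print(sigmoid_grad(0.0) ** depth)   # about 9.5e-07
print(relu_grad(3.0) ** depth)      # 1.0
```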
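2. nn.MSELoss()
PyTorch's MSELoss() has two legacy Boolean arguments, size_average and reduce (both deprecated), plus the string argument reduction that replaces them:
parameter | role |
---|---|
size_average | (deprecated) whether to average the losses (True) or sum them (False) |
reduce | (deprecated) whether to reduce the output to a scalar (True) or return one loss per element (False) |
reduction | 'mean' (default), 'sum' or 'none'; combines the two arguments above |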
An example:
```python
input = torch.randn(2, 3, requires_grad=True)  # prediction
target = torch.ones(2, 3)                      # ground truth
print(f'input: {input}\n target: {target}')
```
Out:
```
input: tensor([[-0.0733, -2.2085, -0.6919], [...1.1417, -1.1327, -1.5466]], requires_grad=True)
target: tensor([[1., 1., 1.],
        [1., 1., 1.]])
```
Default
By default size_average=True and reduce=True (i.e. reduction='mean'), so a scalar, the mean of the squared errors, is returned.
```python
loss_1 = nn.MSELoss()
output_1 = loss_1(input, target)
print(f'loss_1: {output_1}')
```
Out:
```
loss_1: 1.8783622980117798
```
size_average=False
The squared errors are summed instead of being divided by n (equivalent to reduction='sum').
```python
loss_1 = nn.MSELoss(size_average=False)  # deprecated argument; PyTorch emits a warning
output_1 = loss_1(input, target)
print(f'loss_1: {output_1}')
```
Out:
```
loss_1: 11.819371223449707
```
reduce=False
Returns the tensor of per-element losses instead of a scalar (equivalent to reduction='none').
```python
loss_1 = nn.MSELoss(reduce=False)  # deprecated argument; PyTorch emits a warning
output_1 = loss_1(input, target)
print(f'loss_1: {output_1}')
```
Out:
```
loss_1: tensor([[0.0039, 0.2338, 3.5550],
        [0.1358, 2.1851, 0.1533]], grad_fn=<MseLossBackward0>)
```
The reduction argument combines size_average and reduce into a single setting. Here is part of PyTorch's legacy_get_string function (in torch/nn/_reduction.py), which maps the old arguments to a reduction string:
```python
if size_average is None:
    size_average = True
if reduce is None:
    reduce = True

if size_average and reduce:
    ret = 'mean'
elif reduce:
    ret = 'sum'
else:
    ret = 'none'
if emit_warning:
    warnings.warn(warning.format(ret))
return ret
```
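With the non-deprecated reduction argument, the three behaviors can be reproduced without any warning. This sketch uses fixed numbers (chosen here so the results are easy to check by hand) instead of random inputs.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.0, 2.0, 1.0],
                     [3.0, 1.0, -1.0]])
target = torch.ones(2, 3)

per_elem = nn.MSELoss(reduction='none')(pred, target)  # like reduce=False
total = nn.MSELoss(reduction='sum')(pred, target)      # like size_average=False
mean = nn.MSELoss(reduction='mean')(pred, target)      # the default

print(per_elem)  # per-element squared errors: [[1, 1, 0], [4, 0, 4]]
print(total)     # tensor(10.)
print(mean)      # tensor(1.6667), i.e. 10 / 6 elements
```

'sum' and 'mean' are just the sum and the mean of the 'none' output, which is the same relationship the legacy flags encode.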
3. Regularization
The original linear model is:
$$y = b + \sum_{i=1}^{n} w_i x_i$$
But how do we choose n, that is, how many features should x have? If the dimension of x is too high, the model easily overfits; if it is too low, it underfits. To weaken the influence of some dimensions and make the function smoother, regularization can be applied.
Note that overfitting and underfitting show up on the test set. On the training set, the higher the dimension, the better the fit, as shown in the figure:
The higher the dimension, the larger the set of candidate functions, so it can of course contain the best function for the training set.
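One common form is L2 regularization: add λ times the sum of squared weights to the loss, $$L' = L_{\mathrm{MSE}} + \lambda \sum_i w_i^2,$$ which penalizes large weights and pushes the model toward smoother functions. Below is a sketch of how the `# TODO: you may implement L1/L2 regularization here` hook in cal_loss could do this. `RegularizedNet` and the strength `l2_lambda = 1e-4` are hypothetical, and penalizing only weight matrices (not biases) is one common choice, not the course's prescribed method.

```python
import torch
import torch.nn as nn

class RegularizedNet(nn.Module):
    """Hypothetical variant of NeuralNet whose loss includes an L2 penalty."""
    def __init__(self, input_dim, l2_lambda=1e-4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.criterion = nn.MSELoss(reduction='mean')
        self.l2_lambda = l2_lambda  # regularization strength (assumed value)

    def forward(self, x):
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        mse = self.criterion(pred, target)
        # Sum of squared weights; biases are left unpenalized here
        l2 = sum((p ** 2).sum() for name, p in self.named_parameters() if 'weight' in name)
        return mse + self.l2_lambda * l2

torch.manual_seed(0)
model = RegularizedNet(input_dim=14)
x, y = torch.randn(8, 14), torch.randn(8)
loss = model.cal_loss(model(x), y)
loss.backward()  # the penalty's gradient flows into the weights as well
```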
4. Complete code
```python
class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L1/L2 regularization here
        return self.criterion(pred, target)
```
Training
1. Basic functions
getattr()
getattr() is equivalent to the "." operator. Its parameters are:
- object: an object instance
- name: a string, the name of one of the object's methods or attributes
- default: the value returned if the attribute does not exist on the object
- Exception: if the attribute does not exist and no default is given, an AttributeError is raised
In short, getattr(object, name) is equivalent to object.name.
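A small standard-library sketch of the three cases above. `Config` is a hypothetical class used only for the demonstration. The last lines show the same "look something up by name" pattern that train() relies on with `getattr(torch.optim, config['optimizer'])`, here applied to the `math` module instead.

```python
import math

class Config:
    """Hypothetical class used only to demonstrate getattr."""
    optimizer = 'SGD'

cfg = Config()
print(getattr(cfg, 'optimizer'))        # SGD, the same as cfg.optimizer
print(getattr(cfg, 'batch_size', 270))  # 270: attribute missing, so the default is returned

missing_raised = False
try:
    getattr(cfg, 'batch_size')          # missing and no default given
except AttributeError:
    missing_raised = True               # AttributeError, as described above

# Looking up a function by its name, as train() does for the optimizer class
sqrt = getattr(math, 'sqrt')
print(sqrt(16.0))                       # 4.0
```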
model.train() vs. model.eval()
model.train()
Puts Batch Normalization and Dropout layers into training mode.
If the model contains Batch Normalization or Dropout layers, call model.train() before training. In training mode, BN layers normalize with the mean and variance of each batch (and update their running estimates), and Dropout randomly drops a fraction of the connections while the parameters are trained and updated.
model.eval()
Puts Batch Normalization and Dropout layers into evaluation mode.
If the model contains Batch Normalization or Dropout layers, call model.eval() before testing. In evaluation mode, BN layers use the running mean and variance accumulated over the training data instead of per-batch statistics, so these statistics stay fixed during testing; Dropout uses all connections and drops no neurons.
After training, the resulting model is used on the test samples. Call model.eval() before model(test); otherwise every batch passed through the model would still update the BN running statistics (and Dropout would stay active), changing the results. This happens because of the BN and Dropout layers in the model.
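The two modes can be observed directly on standalone layers. In this sketch, the dropout probability, the batch sizes and the +5 / +100 input shifts are arbitrary choices: Dropout zeroes elements and rescales the survivors by 1 / (1 - p) only in training mode, and BatchNorm updates its running mean only in training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
bn = nn.BatchNorm1d(3)
x = torch.ones(4, 3)

drop.train()
y_train = drop(x)   # each element is 0 (dropped) or 1 / (1 - 0.5) = 2 (kept and rescaled)
drop.eval()
y_eval = drop(x)    # identity: nothing is dropped

bn.train()
bn(torch.randn(8, 3) + 5.0)                # training mode updates the running statistics
mean_after_train = bn.running_mean.clone()
bn.eval()
bn(torch.randn(8, 3) + 100.0)              # eval mode uses them but does not update them

print(y_train)
print(torch.equal(y_eval, x))                           # True
print(torch.equal(bn.running_mean, mean_after_train))   # True
```

Calling `model.train()` / `model.eval()` on a whole model simply switches every such submodule at once.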
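detach().cpu()
detach()
- Function: cuts the tensor off from the computation graph, so gradients are no longer propagated back through it
- Return value: a tensor, still on the same device (e.g. still on the GPU)
cpu()
- Function: copies the data to CPU memory
- Return value: a tensor
item()
- Function: returns the value of a one-element tensor as a Python number
numpy()
- Function: converts a CPU tensor into a NumPy array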
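The chain used in the training loop, `loss.detach().cpu().item()`, can be tried on a tiny tensor. On a CPU-only machine `.cpu()` simply returns the same tensor, so this sketch runs anywhere. It also shows why `detach()` comes before `numpy()`: NumPy conversion is refused for tensors that require gradients.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()                  # tensor(5., grad_fn=<SumBackward0>)

value = loss.detach().cpu().item()     # a plain Python float, outside the autograd graph
print(value, type(value))              # 5.0 <class 'float'>

arr = w.detach().cpu().numpy()         # w.numpy() alone would raise, since w requires grad
print(arr)                             # [1. 2.]
```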
2. A basic routine
When iterating over epochs during training, we call optimizer.zero_grad(), loss.backward() and optimizer.step() in sequence, for example:
```python
while epoch < n_epochs:
    model.train()                           # set model to training mode
    for x, y in tr_set:                     # iterate through the dataloader
        optimizer.zero_grad()               # set gradient to zero
        x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
        pred = model(x)                     # forward pass (compute output)
        mse_loss = model.cal_loss(pred, y)  # compute loss
        mse_loss.backward()                 # compute gradient (backpropagation)
        optimizer.step()                    # update model with optimizer
        loss_record['train'].append(mse_loss.detach().cpu().item())
    ...
```
In general, their roles are:
- optimizer.zero_grad(): resets the gradients to zero
  - Training usually uses mini-batches, and PyTorch accumulates gradients across backward() calls. If they are not cleared, the gradient would also contain contributions from the previous batch, so this call comes before backpropagation and the parameter update.
- loss.backward(): backpropagation computes the gradient of each parameter
  - Without backward(), the parameters' gradients stay None, so loss.backward() must come before optimizer.step().
- optimizer.step(): gradient descent updates the parameters
  - step() performs one optimization step, updating the parameter values from their gradients. Because the update is gradient-based, loss.backward() must run before optimizer.step() to compute those gradients.
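The accumulation behavior that makes zero_grad() necessary is easy to see on a single scalar parameter. This sketch calls `backward()` twice without resetting and then resets the gradient by hand; `optimizer.zero_grad()` clears the gradient of every parameter it manages in the same spirit.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
g1 = w.grad.item()   # 2.0: d(2w)/dw
(2 * w).backward()   # no reset in between, so the new gradient is added
g2 = w.grad.item()   # 4.0: accumulated over two backward() calls

w.grad.zero_()       # manual reset of this one parameter's gradient
(2 * w).backward()
g3 = w.grad.item()   # 2.0 again
print(g1, g2, g3)    # 2.0 4.0 2.0
```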
Commonly used attributes of the optimizer:
- param_groups: when the Optimizer is instantiated, its constructor creates a list param_groups with one dictionary per parameter group (num_groups depends on how many parameter groups were passed in when the optimizer was defined). For SGD, each param_group contains keys such as 'params', 'lr', 'momentum', 'dampening', 'weight_decay' and 'nesterov'; newer PyTorch versions add a few more.
- param_group['params']: the list of parameters passed in when the Optimizer was instantiated. If no groups were specified, this is the whole model's model.parameters(), and each entry is a torch.nn.parameter.Parameter object.
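Inspecting param_groups on a tiny optimizer makes the structure above concrete. The `nn.Linear(3, 1)` model is a stand-in chosen for this sketch (one weight matrix plus one bias), and the exact set of keys depends on the PyTorch version. The final loop shows a common use: changing a hyperparameter such as the learning rate during training.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)   # stand-in model: one weight matrix and one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print(len(optimizer.param_groups))      # 1: all parameters form a single group
group = optimizer.param_groups[0]
print(sorted(group.keys()))             # exact keys vary with the PyTorch version
print(len(group['params']))             # 2: weight and bias
print(group['lr'], group['momentum'])   # 0.001 0.9

# Hyperparameters can be changed on the fly, e.g. halving the learning rate
for g in optimizer.param_groups:
    g['lr'] *= 0.5
```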
3. Complete code
```python
def train(tr_set, dv_set, model, config, device):
    ''' DNN training '''

    n_epochs = config['n_epochs']  # Maximum number of epochs

    # Setup optimizer
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_mse = 1000.
    loss_record = {'train': [], 'dev': []}  # for recording training loss
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, test your model on the validation (development) set.
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save model if your model improved
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                  .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])  # Save model to specified path
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Stop training if your model stops improving for "config['early_stop']" epochs.
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record
```
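Validation
dev() is very similar to train(), but note that the model is switched to eval() mode, so BN uses its running statistics and Dropout is disabled; gradient computation is also turned off with torch.no_grad().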
The complete code
```python
def dev(dv_set, model, device):
    model.eval()                                # set model to evaluation mode
    total_loss = 0
    for x, y in dv_set:                         # iterate through the dataloader
        x, y = x.to(device), y.to(device)       # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
        total_loss += mse_loss.detach().cpu().item() * len(x)  # accumulate loss
    total_loss = total_loss / len(dv_set.dataset)              # compute averaged loss

    return total_loss
```
Testing
1. torch.cat()
Cat () can splice together multiple tensor sequences.
Parameters:
- Inputs: Sequence of tensors to be connected. It can be any sequence of the same tensor type
- Dim: Select the dimension extension that must be in
0
tolen(inputs[0])
The sequence of tensors is connected along this dimension
Pay attention to
- The input has to be a sequence, and the sequence has to be the same tensor of any of the same shapes
- The dimension may not exceed the dimension of any tensor of the input data
For example:
```python
t1 = torch.Tensor([1, 2, 3])
t2 = torch.Tensor([4, 5, 6])
t3 = torch.Tensor([7, 8, 9])
tensors = [t1, t2, t3]  # avoid shadowing the built-in name list
t = torch.cat(tensors, dim=0)
print(t)
```
Out:
```
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
```
Changing dim:
```python
t1 = torch.Tensor([[1, 2, 3], [3, 2, 1]])
t2 = torch.Tensor([[4, 5, 6], [6, 5, 4]])
t3 = torch.Tensor([[7, 8, 9], [9, 8, 7]])
tensors = [t1, t2, t3]
t = torch.cat(tensors, dim=1)
print(t)
```
Out:
```
tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
        [3., 2., 1., 6., 5., 4., 9., 8., 7.]])
```
2. Complete code
```python
def test(tt_set, model, device):
    model.eval()                                # set model to evaluation mode
    preds = []
    for x in tt_set:                            # iterate through the dataloader
        x = x.to(device)                        # move data to device (cpu/cuda)
        with torch.no_grad():                   # disable gradient calculation
            pred = model(x)                     # forward pass (compute output)
            preds.append(pred.detach().cpu())   # collect prediction
    preds = torch.cat(preds, dim=0).numpy()     # concatenate all predictions and convert to a numpy array
    return preds
```
Setting hyperparameters
1. Complete code
```python
def get_device():
    ''' Minimal helper so the call below works: use the GPU if available '''
    return 'cuda' if torch.cuda.is_available() else 'cpu'

device = get_device()                 # get the current available device ('cpu' or 'cuda')
os.makedirs('models', exist_ok=True)  # The trained model will be saved to ./models/
target_only = False                   # TODO: Using 40 states & 2 tested_positive features

# TODO: How to tune these hyper-parameters to improve your model's performance?
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9              # momentum for SGD
    },
    'early_stop': 200,               # early stopping epochs (the number epochs since your model's last improvement)
    'save_path': 'models/model.pth'  # your model will be saved here
}
```
Load the data and model
1. Complete code
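```python
tr_path = 'covid.train.csv'  # training data from Kaggle (adjust to where you saved it)
tt_path = 'covid.test.csv'   # testing data from Kaggle (adjust to where you saved it)

tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)

model = NeuralNet(tr_set.dataset.dim).to(device)  # construct the model and move it to the device
```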
Start training!
```python
model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)
```
Data visualization
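1. Training
Let's look at how the MSE loss changes as training progresses. plot_learning_curve (and plot_pred below) are plotting helpers from the course sample code and are not shown here.
```python
plot_learning_curve(model_loss_record, title='deep model')
```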
Next, look at the predictions on the validation set.
```python
del model
model = NeuralNet(tr_set.dataset.dim).to(device)
ckpt = torch.load(config['save_path'], map_location='cpu')  # Load your best model
model.load_state_dict(ckpt)
plot_pred(dv_set, model, device)  # Show prediction on the validation set
```
Points on the blue line are those whose predicted value equals the actual value.
2. Testing
```python
def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w', newline='') as fp:  # newline='' avoids blank rows on Windows
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

preds = test(tt_set, model, device)  # predict COVID-19 cases with your model
save_pred(preds, 'pred.csv')         # save prediction file to pred.csv
```
Prediction results:
Improvements
We need to modify the sample code to make the model better!
Public leaderboard
- Simple baseline: 2.04826
- Medium baseline: 1.36937
- Strong baseline: 0.89266
Hints
- Feature selection (what other features are useful?)
- DNN architecture (layers? dimension? activation function?)
- Training (mini-batch? optimizer? learning rate?)
- L2 regularization
- There are some mistakes in the sample code, can you find them?
TODO1: Modify the features used for training
In COVID19Dataset, we can change which features are extracted:
```python
if not target_only:
    feats = list(range(93))  # 93 = 40 states + day 1 (18) + day 2 (18) + day 3 (17)
else:
    # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
    # Use only 42 features: the 40 states plus tested_positive on day 1 (57) and day 2 (75)
    feats = list(range(40))
    feats.append(57)
    feats.append(75)
```
Let's see the final result!
Convergence is much faster!
Score:
TODO2: Add regularization
When setting the hyperparameters, add L2 regularization through the optimizer's weight_decay:
```python
config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 270,               # mini-batch size for dataloader
    'optimizer': 'SGD',              # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                 # learning rate of SGD
        'momentum': 0.9,             # momentum for SGD
        'weight_decay': 0.1          # L2 regularization strength
    },
    # ... remaining entries unchanged
}
```
The result improved, but not significantly, and convergence became slower (which is reasonable)…
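Why weight_decay counts as L2 regularization: in `torch.optim.SGD`, weight_decay adds `weight_decay * w` to each parameter's gradient before the update, which is the gradient of the penalty (weight_decay / 2)·‖w‖². This single-parameter sketch isolates that effect with arbitrary numbers and no momentum, so the loss contributes zero gradient and only the decay acts.

```python
import torch

# One parameter and a loss whose gradient is zero, so only weight decay acts
w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

loss = (0.0 * w).sum()
loss.backward()      # w.grad = 0
opt.step()           # w <- w - lr * (grad + weight_decay * w) = 1 - 0.1 * 0.5

print(w.item())      # about 0.95: the weight shrank toward 0
```

Each step shrinks the weights a little toward zero, which matches the slower but smoother convergence seen above.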
TODO3: Modify the neural network structure
Add more layers
```python
class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        ...  # rest of the class unchanged
```
This simply adds one more linear layer and one more ReLU layer…
It helped somewhat, but not very significantly…
PReLU
Replace the ReLU layers with PReLU.
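A sketch of the TODO3 network with each `nn.ReLU()` swapped for `nn.PReLU()`. The layer sizes are copied from the TODO3 code; `input_dim = 93` is an assumed example value (all features; it would be 42 with target_only). PReLU behaves like ReLU for positive inputs but multiplies negative inputs by a learnable slope, initialized to 0.25 by default.

```python
import torch
import torch.nn as nn

input_dim = 93  # assumed example value: all features (42 with target_only)
net = nn.Sequential(
    nn.Linear(input_dim, 128),
    nn.PReLU(),            # like ReLU, but negative inputs are scaled by a learnable slope
    nn.Linear(128, 64),
    nn.PReLU(),
    nn.Linear(64, 1),
)

prelu = nn.PReLU()          # the slope starts at 0.25 by default
print(prelu(torch.tensor([-2.0, 3.0])))       # negative side scaled by 0.25: [-0.5, 3.0]
print(net(torch.randn(5, input_dim)).shape)   # torch.Size([5, 1])
```

Because the slope is an ordinary parameter, it shows up in `model.parameters()` and is trained by the optimizer along with the weights.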
TODO4: optimizer
Adam
Hard to sum up: it did slightly worse than SGD.
And the MSE loss seems to fluctuate a bit…
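Switching to Adam only requires changing the config, because train() looks the optimizer up by name with getattr. One detail to watch, shown in this sketch: Adam has no momentum argument (it uses betas, (0.9, 0.999) by default), so the SGD momentum entry must be dropped from optim_hparas. The `nn.Linear(93, 1)` model is a stand-in for NeuralNet, and the lr value is just an example.

```python
import torch
import torch.nn as nn

model = nn.Linear(93, 1)  # stand-in for NeuralNet in this sketch

config = {
    'optimizer': 'Adam',
    'optim_hparas': {'lr': 0.001},  # no 'momentum': Adam uses betas instead
}
optimizer = getattr(torch.optim, config['optimizer'])(
    model.parameters(), **config['optim_hparas'])
print(type(optimizer).__name__)  # Adam

# Keeping the SGD-only key fails as soon as the optimizer is constructed
momentum_rejected = False
try:
    torch.optim.Adam(model.parameters(), lr=0.001, momentum=0.9)
except TypeError:
    momentum_rejected = True
print(momentum_rejected)  # True
```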
TODO5: Fix errors
When normalizing the dev and test sets, the mean and standard deviation of the training set should be used: the test set may be small, so its own mean and variance may not reflect the characteristics of the full data.
```python
if mode == 'test':
    # Testing data
    # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
    data = data[:, feats]
    self.data = torch.FloatTensor(data)
    # Note: this branch still normalizes with the test set's own statistics,
    # not with the training statistics described above
    self.data[:, 40:] = \
        (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
        / self.data[:, 40:].std(dim=0, keepdim=True)
else:
    # Training data (train/dev sets)
    # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
    target = data[:, -1]
    data = data[:, feats]
    indices_train = [i for i in range(len(data)) if i % 10 != 0]
    tr_data = torch.FloatTensor(data[indices_train])
    # Statistics computed on the training split only
    tr_mean = tr_data[:, 40:].mean(dim=0, keepdim=True)
    tr_std = tr_data[:, 40:].std(dim=0, keepdim=True)
    # Splitting training data into train & dev sets
    if mode == 'train':
        indices = indices_train
        self.data = tr_data
        self.target = torch.FloatTensor(target[indices])
    elif mode == 'dev':
        indices = [i for i in range(len(data)) if i % 10 == 0]
        self.data = torch.FloatTensor(data[indices])
        self.target = torch.FloatTensor(target[indices])
    # Normalize the train/dev features with the training statistics
    self.data[:, 40:] = \
        (self.data[:, 40:] - tr_mean) \
        / tr_std
```
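The same idea as a framework-free sketch: fit the statistics once on the training split and reuse them for every other split. In the homework this would mean computing the mean and std from covid.train.csv and handing them to the test dataset (for example through extra constructor arguments, a hypothetical change not made in the code above). The function names and the synthetic data are made up for illustration.

```python
import numpy as np

def fit_normalizer(train_x):
    """Compute per-column mean and (sample) std on the training split only."""
    mean = train_x.mean(axis=0, keepdims=True)
    std = train_x.std(axis=0, ddof=1, keepdims=True)
    return mean, std

def apply_normalizer(x, mean, std):
    """Normalize any split (train, dev or test) with the training statistics."""
    return (x - mean) / std

rng = np.random.default_rng(0)
train_x = rng.normal(loc=10.0, scale=2.0, size=(90, 3))  # synthetic training features
test_x = rng.normal(loc=10.0, scale=2.0, size=(5, 3))    # small split: its own stats are noisy

mean, std = fit_normalizer(train_x)
train_n = apply_normalizer(train_x, mean, std)
test_n = apply_normalizer(test_x, mean, std)

# The training split is exactly standardized; the test split is only approximately
print(train_n.mean(axis=0).round(6), train_n.std(axis=0, ddof=1).round(6))
```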
But the result was strange: the score was much worse…
Reflections
Tuning the parameters took two afternoons, and the result was still not as good as my first attempt… I'm not familiar enough with the model and have no tuning experience yet. I'll stop this homework here for now; with more knowledge I may be able to beat the strong baseline later...
Some useful links
Colab: ML2021Spring – HW1.ipynb – Colaboratory (google.com)
PyTorch Tutorial P1: ML2021 Pytorch tutorial part 1 (YouTube)
Kaggle: ML2021Spring-hw1 | Kaggle
HW1: 1: Introduction, Colab & PyTorch Tutorials, HW1 (Gods silent blog, CSDN)
Regularization: PyTorch methods for L1 and L2 regularization
Activation function: Activation Function (CSDN blog)
HWExample: 2021 Machine Learning course assignment 1 (Jianshu.com)
Cross Baseline: ML2021Spring hw1 (liZHI334)
Reference
ReLU: What ReLU does (KAMITA's blog, CSDN)
train() and eval(): Pytorch: model.train() and model.eval(), and model.eval() and torch.no_grad(
optimizer.step(): Understand the functions and principles of optimizer.zero_grad(), loss.backward(), optimizer.step()
Source: Heng-Jui Chang @ NTUEE (github.com/ga642381/ML…)