Preface
Today I'd like to discuss a relatively relaxed topic: the influence of dataset size on deep learning training. You may have seen this article: Don't use deep learning your data isn't that big.
Yes, there is evidence for this limitation of deep learning: when you have little data, deep learning is no better than other traditional methods.
The author of that claim ran a test on the MNIST dataset, checking whether the model could correctly distinguish 0s and 1s, using two methods:
- A 5-layer deep neural network whose activation function is the hyperbolic tangent.
- The "Leekasso", a Lasso-style variable selection method: pick the 10 pixels with the smallest marginal p-values and run a regression on just those values (a sketch of the idea follows below).
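As a rough illustration of that idea (this is not the original author's code; it assumes a flattened MNIST 0/1 subset and ordinary SciPy / scikit-learn calls):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

def leekasso_fit(X_train, y_train, k=10):
    """Pick the k pixels with the smallest marginal p-values (two-sample t-test
    between class 0 and class 1), then fit a logistic regression on them."""
    p_values = np.array([
        stats.ttest_ind(X_train[y_train == 0, j], X_train[y_train == 1, j]).pvalue
        for j in range(X_train.shape[1])
    ])
    top_k = np.argsort(p_values)[:k]   # indices of the k most "significant" pixels
    clf = LogisticRegression().fit(X_train[:, top_k], y_train)
    return clf, top_k
```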
And the conclusion was:
The Leekasso outperforms the neural network. What?
Main text
So back to the topic: is the claim above true? When the data is relatively small, can we still train a deep model properly and reach satisfactory results?
As we all know, a neural network is essentially a universal function approximator: we feed in x, run it through a lot of complicated computation, and get y. In theory, every problem that a traditional algorithm can solve can also be solved by deep learning. However, a deep network, powerful as it is, easily overfits when the data is insufficient, and then it fails to achieve the desired effect.
So can deep learning cope with a dataset that is this small? Let's test it.
A one-dimensional signal
Our test data is very simple: not the usual three-channel RGB image (3 x 256 x 256), but an ordinary single-channel one-dimensional signal (1 x 168).
The figure above shows our one-dimensional signal; 532nm and 1064nm correspond to two different signals, and we only need to process one of them. The signal is stored as a .mat file, i.e. a MATLAB file.
In that file the training set is 161 x 168. The first row holds the x-axis coordinates; we only need the y-axis data. Every 40 rows then form one class, i.e. rows 2-41, 42-81, 82-121 and 122-161 make up four categories. The test set is 81 x 168; its first row is the same x-coordinates, which we ignore, and every 20 rows belong to one class. So we have four types of signals to classify.
The label values are 0, 1, 2, and 3.
In total we have only 160 training samples and 80 test samples.
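If you want to sanity-check that layout before writing any training code, something like the following works with SciPy (the file name train_532nm.mat and the 'lineIntensity' key are the ones assumed by the reading code later on):

```python
import scipy.io
import numpy as np

mat = scipy.io.loadmat('datasets/train_532nm.mat')
intensity = mat['lineIntensity']        # expected shape: (161, 168)
print(intensity.shape)

x_axis = intensity[0]                   # first row: x coordinates, ignored for training
signals = intensity[1:]                 # 160 signals, 40 per class
labels = np.array([i // 40 for i in range(len(signals))])   # 0, 1, 2, 3
print(signals.shape, labels[:5], labels[-5:])
```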
Reading the data
The deep learning library we use is PyTorch, together with Python's SciPy, a scientific computing library; here we only use it to read the .mat files.
Create a file for reading the data (it is imported later as data_utils.py) and import the following modules:
```python
import torch
import torch.utils.data as data
import scipy.io
import os
import os.path as osp
```
Then we write the file-reading class:
```python
# Convert raw data to the tensor format required for training
def to_tensor(data):
    data = torch.from_numpy(data).type(torch.float32)
    data = data.unsqueeze(0)
    return data

# Dataset class that reads the .mat files
class LineData(data.Dataset):
    def __init__(self, root, name=532, train=True, transform=to_tensor):
        self.root = os.path.expanduser(root)
        self.name = name
        self.train = train
        self.transform = transform
        self.classes = [0, 1, 2, 3]

        if not osp.exists('datasets'):
            raise FileNotFoundError('Missing Datasets')

        if self.train:
            self.train_datas = []
            self.train_labels = []
            dataset_dir = osp.join(self.root, 'train_{}nm.mat'.format(self.name))
            train_data = scipy.io.loadmat(dataset_dir)['lineIntensity']
            data_length = len(train_data) - 1  # 161 - 1 = 160
            if self.transform:
                for i in range(data_length):  # 0-159
                    self.train_datas.append(transform(train_data[i + 1]))  # i+1 => rows 1-160
                    self.train_labels.append(self.classes[int(i / 40)])    # 40 samples per class
            else:
                raise ValueError('We need a transform function!')

        if not self.train:
            self.test_datas = []
            self.test_labels = []
            dataset_dir = osp.join(self.root, 'test_{}nm.mat'.format(self.name))
            test_data = scipy.io.loadmat(dataset_dir)['lineIntensity']
            data_length = len(test_data) - 1  # 81 - 1 = 80
            if self.transform:
                for i in range(data_length):  # 0-79
                    self.test_datas.append(transform(test_data[i + 1]))    # i+1 => rows 1-80
                    self.test_labels.append(self.classes[int(i / 20)])     # 20 samples per class
            else:
                raise ValueError('We need a transform function!')

    def __getitem__(self, index):
        """
        Args:
            index (int): Index
        Returns:
            tuple: (data, target) where target is the index of the target class.
        """
        if self.train:
            data, target = self.train_datas[index], self.train_labels[index]
        else:
            data, target = self.test_datas[index], self.test_labels[index]
        return data, target

    def __len__(self):
        if self.train:
            return len(self.train_datas)
        else:
            return len(self.test_datas)
```
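As a quick, hypothetical smoke test (not part of the original post), you can instantiate the class and look at one sample, assuming the .mat files sit under datasets/:

```python
if __name__ == '__main__':
    trainset = LineData('datasets', name=532, train=True)
    sample, label = trainset[0]
    print(len(trainset), sample.shape, label)   # expect: 160 torch.Size([1, 168]) 0
```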
Building the neural network
With the file-reading code written, we now design the neural network. Because the amount of data is very small, the number of layers should be kept small as well; otherwise it is easy to overfit.
We first design a five-layer network: two convolution layers, one pooling layer and two linear layers, with ReLU as the activation function:
```python
import torch.nn as nn
import torch.nn.functional as F

# The length of each sample is 168
# Model: two convolution layers and two linear layers
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv1d(1, 10, 3)   # (168 - 3)/1 + 1 = 166  (x 10 channels)
        self.pool = nn.MaxPool1d(2, 2)     # (166 - 2)/2 + 1 = 83   (x 10 channels)
        self.conv2 = nn.Conv1d(10, 20, 3)  # (83 - 3)/1 + 1 = 81    (x 20 channels)
        self.fc1 = nn.Linear(81 * 20, 100)
        self.fc2 = nn.Linear(100, 4)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = F.relu(self.conv2(x))
        x = x.view(-1, 81 * 20)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
```
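To double-check the shape arithmetic noted in the comments (168 -> 166 -> 83 -> 81), a dummy forward pass like the one below is enough; this is just an illustrative check, not part of the original code:

```python
import torch

net = Net()
dummy = torch.randn(4, 1, 168)   # batch of 4 one-channel signals of length 168
out = net(dummy)
print(out.shape)                 # expect torch.Size([4, 4]): one score per class
```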
Training and testing
After designing the neural network, let’s train!
First we write the training code. The optimizer is SGD (with momentum), and the learning rate defaults to 0.001. The loss function is CrossEntropyLoss, the classic criterion for classification tasks.
We train first and then measure accuracy, i.e. the proportion of test samples of the four signal classes that are predicted correctly.
```python
# Main training script
import torch
import torch.nn as nn
import torch.utils.data
import torch.optim as optim

from model import Net
from data_utils import LineData

root = 'datasets'      # directory where the data resides (relative path)
train_name = '532'     # or '1064'

# device = torch.device('cuda:0')

# Read the data with the dataset class
trainset = LineData(root, name=train_name)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True)
testset = LineData(root, name=train_name, train=False)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=True)

net = Net()
# net = net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Changing epoch_sum below to 2 is enough to reach 100% accuracy
epoch_sum = 1

# training
for epoch in range(epoch_sum):
    loss_sum = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        loss_sum += loss.item()
        if i % 10 == 9:
            print('[epoch:{} num:{}] loss:{}'.format(epoch, i, loss_sum / 20))
            loss_sum = 0.0

print('Finished Training')

# validation
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        inputs, labels = data
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 80 test images: %d %%' % (
    100 * correct / total))
```
Ok, let’s start training. Since there’s very little data, we can just train on the CPU.
For the first training, epoch was 1 and LR was 0.001:
```
[epoch:0 num:9] loss:1.693671927541118e+16
[epoch:0 num:19] loss:53694975.30745087
[epoch:0 num:29] loss:6.2672371854667905e+28
[epoch:0 num:39] loss:51403236.52956776
Finished Training
Accuracy of the network on the 80 test images: 25 %
```
It doesn't look good... The accuracy is 25%, which is basically random guessing, and the loss fluctuates wildly. Our learning rate is probably set too high, so let's lower it next.
For the second training, the epoch was 1 and lr was 0.0001:
```
[epoch:0 num:9] loss:133432.54784755706
[epoch:0 num:19] loss:67940.00796541572
[epoch:0 num:29] loss:109.18773172795773
[epoch:0 num:39] loss:1.1358043849468231
Finished Training
Accuracy of the network on the 80 test images: 25 %
```
The loss now declines very smoothly, but one epoch seems insufficient: the loss has not fully come down and the accuracy is still low. Let's increase the epoch count.
For the third training, epoch is 5 and LR is 0.0001:
```
[epoch:0 num:9] loss:3024598166.2773805
[epoch:0 num:19] loss:3117157163.829549
[epoch:0 num:29] loss:258.4028107881546
[epoch:0 num:39] loss:0.6990358293056488
[epoch:1 num:9] loss:0.6830220401287079
[epoch:1 num:19] loss:66.56461009383202
[epoch:1 num:29] loss:0.7117315053939819
[epoch:1 num:39] loss:0.6977931916713714
[epoch:2 num:9] loss:0.6974189281463623
[epoch:2 num:19] loss:0.6898959457874299
[epoch:2 num:29] loss:0.7101178288459777
[epoch:2 num:39] loss:0.6914324820041656
[epoch:3 num:9] loss:0.686737447977066
[epoch:3 num:19] loss:0.6972651600837707
[epoch:3 num:29] loss:0.7028001189231873
[epoch:3 num:39] loss:0.6998239696025849
[epoch:4 num:9] loss:0.6997098863124848
[epoch:4 num:19] loss:0.6969940900802613
[epoch:4 num:29] loss:0.696108078956604
[epoch:4 num:39] loss:0.6910847663879395
Finished Training
Accuracy of the network on the 80 test images: 25 %
```
The loss has come down to a certain level, but the accuracy is still puzzling. It may be that the learning rate is still too high, so the loss gets stuck in a range and cannot decrease completely. Let's lower the learning rate again.
For the fourth training, epoch was 2 and LR was 0.00001:
```
[epoch:0 num:9] loss:200.58453428081702
[epoch:0 num:19] loss:5.724525341391564
[epoch:0 num:29] loss:0.2976263818090047
[epoch:0 num:39] loss:0.05558242934057489
[epoch:1 num:9] loss:0.0004892532759185996
[epoch:1 num:19] loss:0.00012833428763769916
[epoch:1 num:29] loss:9.479262493137242e-05
[epoch:1 num:39] loss:3.948449189010717e-05
Finished Training
Accuracy of the network on the 80 test images: 100 %
```
Perfect, it seems we have found a suitable learning rate (0.00001). Over 10 test runs, the accuracy was: 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 98%.
With the epoch changed from 2 to 1, it was: 100%, 77%, 100%, 100%, 100%, 100%, 86%, 100%, 100%, 100%, 100%, 100%.
With the epoch changed from 1 to 3, it was: 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%, 100%.
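For reference, repeated runs like those above can be reproduced with a small outer loop over freshly initialized models. This is only a hypothetical sketch: train_and_evaluate is an assumed helper that wraps the training and validation code shown earlier and returns the test accuracy.

```python
# Hypothetical repeat-run loop; train_and_evaluate is an assumed helper,
# not a function from the original post.
accuracies = []
for run in range(10):
    net = Net()                                              # fresh weights each run
    optimizer = optim.SGD(net.parameters(), lr=0.00001, momentum=0.9)
    accuracies.append(train_and_evaluate(net, optimizer,
                                         trainloader, testloader, epochs=2))
print(accuracies)
```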
If we instead use a network of three fully connected layers, the results with lr = 0.00001 are poor: even after more than 10 epochs the accuracy does not reach 100%. However, if lr is reduced to 0.000001, the accuracy does reach 100%:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(168, 1000)
        self.fc2 = nn.Linear(1000, 100)
        self.fc3 = nn.Linear(100, 4)

    def forward(self, x):
        x = x.view(-1, 168)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```
Conclusion
From the tests above, it seems that even with fewer than 200 samples, good accuracy can be achieved as long as the network depth is reasonable and the learning rate suits the dataset.
In fact, overfitting usually happens because the network we design has too many layers for the amount of data available: the network "squeezes" the training data too hard and absorbs too much training-set-specific information, which lowers accuracy at test time. If the dataset is very small, reducing the number of layers is one way to reduce this error.
However, if the data carries a lot of information but the amount of data is small, merely adjusting the number of layers is not enough. We then need data augmentation techniques to expand the dataset and "feed" the neural network properly, so that it doesn't behave erratically. Of course, dataset expansion is aimed at information-rich data; for signals as simple as the one-dimensional ones used here, there is generally no need to expand them (a possible augmentation sketch follows anyway).
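If you ever did need to augment 1-D signals like these, simple perturbations are the usual starting point. The sketch below is not from the original post and its parameters are illustrative guesses; it adds Gaussian noise and a small random shift:

```python
import numpy as np

def augment_signal(signal, noise_std=0.01, max_shift=3):
    """Return a perturbed copy of a 1-D signal: a small circular shift
    plus additive Gaussian noise. Parameters are illustrative guesses."""
    shifted = np.roll(signal, np.random.randint(-max_shift, max_shift + 1))
    noisy = shifted + np.random.normal(0.0, noise_std * np.abs(signal).max(),
                                       size=signal.shape)
    return noisy.astype(np.float32)
```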
Contact me
- If you are like-minded, Lao Pan is happy to chat with you;
- If you like Lao Pan's content, welcome to follow and support him.
- If you like my article, please give it the triple: like 👍, bookmark 📁, comment 💬 ~
If you want to know how Lao Pan learned by stepping into all the pits, or want to talk with me about your own problems, follow the public account "Oldpan blog". Lao Pan will also share some of his private stash, hoping it helps you; click the mysterious portal to get it.