Preface
To keep the experiment comparable, we used the control-variable method to compare the effect of the learning rates LR = 0.1, LR = 0.01, LR = 0.001, and LR = 0.0001 on image classification. To keep things simple, we used the LeNet-5 network structure as the backbone, since everyone is familiar with it, and the data set was the same throughout. For a detailed introduction to the LeNet-5 structure you can search online or refer to: juejin.cn/post/707478… Since the results of the three convolution-kernel sizes trained in my last blog were unsatisfactory, the 3 x 3 convolution kernel was selected for this experiment, with the convolution structure otherwise unchanged (previous blog address: juejin.cn/post/708236…)
1. Before comparison
The explicit invariants are as follows:
1.1 The data set is unchanged (10 categories in total, a handwritten-digit data set, 500 images per category)
1.2 The training / validation split is unchanged (training : validation = 7 : 3)
1.3 The network structure, apart from the convolution kernel, is the same
1.4 The number of training epochs is the same
1.5 The loss function is the same
1.6 The validation frequency is the same
1.7 The hardware is the same
To rule out one-off anomalies, we train with each of the four learning rates three times and take the mean as the final statistic; a sketch of the data loading and split follows.
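As a minimal sketch of how the invariants above could be set up in MATLAB, assuming the handwritten-digit images live in a folder named `digitData` with one sub-folder per category (the folder name is hypothetical):

```matlab
% Load the 10-category handwritten-digit data set (assumed folder layout:
% one sub-folder per class, labels taken from the folder names).
imds = imageDatastore('digitData', ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% Fixed 7:3 split between training and validation (invariant 1.2).
[imdsTrain, imdsValidation] = splitEachLabel(imds, 0.7, 'randomized');
```

The training options shared across runs (shown here with LR = 0.01) are: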
```matlab
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'MaxEpochs', 20, ...
    'ValidationData', imdsValidation, ...
    'ValidationFrequency', 5, ...
    'Verbose', false, ...
    'Plots', 'training-progress');   % show the training-progress plot
```
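The network itself is not listed in the original post; the following is a hedged sketch of a LeNet-5-style layer array using the 3 x 3 kernels mentioned in the preface. The filter counts and the 28 x 28 x 1 input size are assumptions, and ReLU is used in place of the classic sigmoid/tanh activations:

```matlab
% LeNet-5-style network with 3 x 3 convolution kernels (a sketch; the
% exact layer configuration of the original experiment is not given).
layers = [
    imageInputLayer([28 28 1])                   % assumed grayscale 28 x 28 input
    convolution2dLayer(3, 6, 'Padding', 'same')
    reluLayer
    averagePooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 16)
    reluLayer
    averagePooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(120)
    fullyConnectedLayer(84)
    fullyConnectedLayer(10)                      % 10 digit classes
    softmaxLayer
    classificationLayer];

% Train with the options defined above.
net = trainNetwork(imdsTrain, layers, options);
```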
2. The learning rate is 0.1
The following information can be read from the training-progress plot (figure omitted):
2.1 Validation accuracy: 91.423%
2.2 Training duration: 39 s
2.3 The loss curve converges (normal, with neither over-fitting nor under-fitting)
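For reference, a validation accuracy like the 91.423% above can be computed from a trained network as follows (a sketch; `net` and `imdsValidation` are the variables from the setup above):

```matlab
% Classify the validation set and compare predictions against the labels.
YPred = classify(net, imdsValidation);
accuracy = mean(YPred == imdsValidation.Labels);   % fraction correct
fprintf('Validation accuracy: %.3f%%\n', 100 * accuracy);
```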
3. The learning rate is 0.01
The following information can be read from the training-progress plot (figure omitted):
3.1 Validation accuracy: 91.91%
3.2 Training duration: 39 s
3.3 The loss curve converges (normal, with neither over-fitting nor under-fitting)
4. The learning rate is 0.001
The following information can be read from the training-progress plot (figure omitted):
4.1 Validation accuracy: 89.176%
4.2 Training duration: 39 s
4.3 The loss curve has not fully converged and the loss still has room to fall (the number of training epochs should be increased here)
5. The learning rate is 0.0001
The following information can be read from the training-progress plot (figure omitted):
5.1 Validation accuracy: 79.823%
5.2 Training duration: 39 s
5.3 The loss curve clearly has not converged and the loss still has a long way to fall (the number of training epochs should be increased here; see the sketch below)
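When the loss curve has not finished converging, as in sections 4 and 5, the fix suggested above is simply to raise the epoch budget. A sketch (40 epochs is an assumed value, not one measured in this post):

```matlab
% Give the small learning rates more epochs to finish converging.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.0001, ...
    'MaxEpochs', 40, ...                  % raised from 20 (assumed value)
    'ValidationData', imdsValidation, ...
    'ValidationFrequency', 5, ...
    'Verbose', false, ...
    'Plots', 'training-progress');
```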
6. Summary
For models with the same network structure running on the same hardware, we find that as the learning rate decreases from 0.1 to 0.0001, convergence becomes slower and slower. If accuracy still needs to rise, or the loss still needs to fall, the number of training epochs should be increased accordingly. With a suitable learning rate and number of training epochs, the model can reach its optimum in the shortest time, saving training time and avoiding an unnecessary waste of compute.
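Putting the protocol of section 1 together, the whole comparison could be driven by a loop like the following sketch, which trains each of the four learning rates three times and averages the validation accuracy (variable names are assumptions; `layers`, `imdsTrain`, and `imdsValidation` come from the earlier sketches):

```matlab
learnRates = [0.1 0.01 0.001 0.0001];
numRuns = 3;                               % three runs per rate, then average
meanAcc = zeros(size(learnRates));

for i = 1:numel(learnRates)
    accs = zeros(1, numRuns);
    for r = 1:numRuns
        opts = trainingOptions('sgdm', ...
            'InitialLearnRate', learnRates(i), ...
            'MaxEpochs', 20, ...
            'ValidationData', imdsValidation, ...
            'ValidationFrequency', 5, ...
            'Verbose', false);
        net = trainNetwork(imdsTrain, layers, opts);
        YPred = classify(net, imdsValidation);
        accs(r) = mean(YPred == imdsValidation.Labels);
    end
    meanAcc(i) = mean(accs);
    fprintf('LR = %.4f: mean validation accuracy = %.3f%%\n', ...
        learnRates(i), 100 * meanAcc(i));
end
```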