Original link: tecdat.cn/?p=7954

Original source: Tuoduan Data Tribe official account

 

This example shows how to apply Bayesian optimization to deep learning: how to find the best network hyperparameters and training options for a convolutional neural network.

To train a deep neural network, you must specify the network architecture as well as the options of the training algorithm. Selecting and tuning these hyperparameters can be difficult and time-consuming. Bayesian optimization is well suited to optimizing the hyperparameters of classification and regression models.

 

Prepare the data

Download the CIFAR-10 dataset [1]. The dataset contains 60,000 images, each 32-by-32 pixels with three color channels (RGB). The entire dataset is 175 MB.

Load the CIFAR-10 dataset as training images and labels and test images and labels. To enable network validation, set aside 5,000 of the test images for validation.

[XTrain,YTrain,XTest,YTest] = loadCIFARData(datadir);

% Set aside 5,000 of the test images for validation
idx = randperm(numel(YTest),5000);
XValidation = XTest(:,:,:,idx);
XTest(:,:,:,idx) = [];
YValidation = YTest(idx);
YTest(idx) = [];

You can use the following code to display a sample of the training images.

figure;
idx = randperm(numel(YTrain),20);
for i = 1:numel(idx)
    subplot(4,5,i);
    imshow(XTrain(:,:,:,idx(i)));
end

Select the variables to optimize

Select the variables to optimize with Bayesian optimization and specify the ranges to search. Also specify whether each variable is an integer and whether to search the interval in logarithmic space. Optimize the following variables:

  • The network section depth. This parameter controls the depth of the network. The network has three sections, each with SectionDepth identical convolutional layers, so the total number of convolutional layers is 3*SectionDepth. The objective function later in the script makes the number of convolutional filters in each layer proportional to 1/sqrt(SectionDepth). As a result, the number of parameters and the amount of computation required per iteration are roughly the same for different section depths.
  • The initial learning rate. The best learning rate depends on your data as well as the network you are training.
  • Stochastic gradient descent momentum.
  • L2 regularization strength.
optimVars = [
    optimizableVariable('SectionDepth',[1 3],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
    optimizableVariable('Momentum',[0.8 0.98])
    optimizableVariable('L2Regularization',[1e-10 1e-2],'Transform','log')];

Perform Bayesian optimization

Create the objective function for the Bayesian optimizer, using the training and validation data as inputs. The objective function trains a convolutional neural network and returns the classification error on the validation set.

ObjFcn = makeObjFcn(XTrain,YTrain,XValidation,YValidation);

Perform Bayesian optimization by minimizing the classification error on the validation set. To take full advantage of the power of Bayesian optimization, you should perform at least 30 objective function evaluations.

After each network finishes training, bayesopt prints the results to the command window. The bayesopt function then returns the file names in BayesObject.UserDataTrace. The objective function saves the trained networks to disk and returns the file names to bayesopt.
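The bayesopt call itself is not reproduced in this excerpt; a minimal sketch, assuming the 14-hour time budget (14*60*60 = 50400 seconds) that the log below reports as MaxTime:

BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxTime',14*60*60, ...               % stop after 50400 s, as in the log below
    'IsObjectiveDeterministic',false, ... % network training is stochastic
    'UseParallel',false);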

|===========================================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | SectionDepth | InitialLearn- | Momentum | L2Regulariza- |
|      | result |           | runtime   | (observed) | (estim.)   |              | Rate          |          | tion          |
|===========================================================================================================================|
|    1 | Best   |      0.19 |      2201 |       0.19 |       0.19 |            3 |      0.012114 |   0.8354 |     0.0010624 |
|    2 | Accept |    0.3224 |    1734.1 |       0.19 |    0.19636 |            1 |      0.066481 |  0.88231 |     0.0026626 |
|    3 | Accept |    0.2076 |    1688.7 |       0.19 |    0.19374 |            2 |      0.022346 |  0.91149 |     8.242e-10 |
|    4 | Accept |    0.1908 |    2167.2 |       0.19 |     0.1904 |            3 |       0.97586 |  0.83613 |    4.5143e-08 |
|    5 | Accept |    0.1972 |    2157.4 |       0.19 |    0.19274 |            3 |       0.21193 |  0.97995 |    1.4691e-05 |
|    6 | Accept |    0.2594 |    2152.8 |       0.19 |       0.19 |            3 |       0.98723 |  0.97931 |    2.4847e-10 |
|    7 | Best   |    0.1882 |    2257.5 |     0.1882 |    0.18819 |            3 |        0.1722 |   0.8019 |    4.2149e-06 |
|    8 | Accept |    0.8116 |    1989.7 |     0.1882 |    0.18818 |            3 |       0.42085 |  0.95355 |     0.0092026 |
|    9 | Accept |    0.1986 |      1836 |     0.1882 |    0.18821 |            2 |      0.030291 |  0.94711 |    2.5062e-05 |
|   10 | Accept |    0.2146 |    1909.4 |     0.1882 |    0.18816 |            2 |      0.013379 |   0.8785 |    7.6354e-09 |
|   11 | Accept |    0.2194 |      1562 |     0.1882 |    0.18815 |            1 |       0.14682 |  0.86272 |    8.6242e-09 |
|   12 | Accept |    0.2246 |    1591.2 |     0.1882 |    0.18813 |            1 |       0.70438 |  0.82809 |    1.0102e-06 |
|   13 | Accept |    0.2648 |    1621.8 |     0.1882 |    0.18824 |            1 |      0.010109 |  0.89989 |    1.0481e-10 |
|   14 | Accept |    0.2222 |      1562 |     0.1882 |    0.18812 |            1 |       0.11058 |  0.97432 |    2.4101e-07 |
|   15 | Accept |    0.2364 |    1625.7 |     0.1882 |    0.18813 |            1 |      0.079381 |   0.8292 |    2.6722e-05 |
|   16 | Accept |      0.26 |    1706.2 |     0.1882 |    0.18815 |            1 |      0.010041 |  0.96229 |    1.1066e-05 |
|   17 | Accept |    0.1986 |    2188.3 |     0.1882 |    0.18635 |            3 |       0.35949 |  0.97824 |     3.153e-07 |
|   18 | Accept |    0.1938 |    2169.6 |     0.1882 |    0.18817 |            3 |      0.024365 |  0.88464 |    0.00024507 |
|   19 | Accept |    0.3588 |    1713.7 |     0.1882 |    0.18216 |            1 |      0.010177 |  0.89427 |     0.0090342 |
|   20 | Accept |    0.2224 |    1721.4 |     0.1882 |    0.18193 |            1 |       0.09804 |  0.97947 |    1.0727e-10 |
|   21 | Accept |    0.1904 |    2184.7 |     0.1882 |    0.18498 |            3 |      0.017697 |  0.95057 |    0.00022247 |
|   22 | Accept |    0.1928 |    2184.4 |     0.1882 |    0.18527 |            3 |       0.06813 |   0.9027 |    1.3521e-09 |
|   23 | Accept |    0.1934 |    2183.6 |     0.1882 |     0.1882 |            3 |      0.018269 |  0.90432 |     0.0003573 |
|   24 | Accept |     0.303 |    1707.9 |     0.1882 |    0.18809 |            1 |      0.010157 |  0.88226 |    0.00088737 |
|   25 | Accept |     0.194 |    2189.1 |     0.1882 |    0.18808 |            3 |      0.019354 |  0.94156 |    9.6197e-07 |
|   26 | Accept |    0.2192 |    1752.2 |     0.1882 |    0.18809 |            1 |       0.99324 |  0.91165 |    1.1521e-08 |
|   27 | Accept |    0.1918 |      2185 |     0.1882 |    0.18813 |            3 |       0.05292 |   0.8689 |    1.2449e-05 |
|===========================================================================================================================|

__________________________________________________________
Optimization completed.
MaxTime of 50400 seconds reached.
Total function evaluations: 27
Total objective function evaluation time: 51942.8833

Best observed point:
    SectionDepth    InitialLearnRate    Momentum    L2Regularization
    ____________    ________________    ________    ________________
         3               0.1722          0.8019        4.2149e-06

Observed objective function value = 0.1882
Estimated objective function value = 0.18813
Function evaluation time = 2257.4627

Best estimated feasible point (according to models):
    SectionDepth    InitialLearnRate    Momentum    L2Regularization
    ____________    ________________    ________    ________________
         3               0.1722          0.8019        4.2149e-06

Estimated objective function value = 0.18813
Estimated function evaluation time = 2166.2402

Evaluate the final network

Load the best network found in the optimization and its validation accuracy.
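The loading code is not reproduced in this excerpt; a minimal sketch, assuming the objective function saved each trained network under the file name recorded in BayesObject.UserDataTrace (as described above):

bestIdx = BayesObject.IndexOfMinimumTrace(end);   % iteration with the lowest observed error
fileName = BayesObject.UserDataTrace{bestIdx};
savedStruct = load(fileName);
valError = savedStruct.valError;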

valError = 0.1882

Predict the labels of the test set and calculate the test error. Treat the classification of each image in the test set as an independent event with a certain probability of success, so the number of misclassified images follows a binomial distribution. Use this to compute the standard error (testErrorSE) and an approximate 95% confidence interval (testError95CI) of the generalization error rate. This method is often called the Wald method.
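A sketch of this calculation under the binomial assumption; classify and the loaded network come from the steps above, and 1.96 is the normal quantile for a 95% interval:

[YPredicted,probs] = classify(savedStruct.trainedNet,XTest);
testError = 1 - mean(YPredicted == YTest);

NTest = numel(YTest);
% Binomial standard error of the error rate
testErrorSE = sqrt(testError*(1 - testError)/NTest);
% Wald approximation to the 95% confidence interval
testError95CI = [testError - 1.96*testErrorSE, testError + 1.96*testErrorSE];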

testError = 0.1864

testError95CI = 1×2

    0.1756    0.1972

Plot the confusion matrix for the test data. Display the precision and recall for each class by using column and row summaries.
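The plotting code was not carried over into this excerpt; a sketch using confusionchart with normalized column and row summaries:

figure('Units','normalized','Position',[0.2 0.2 0.4 0.4]);
cm = confusionchart(YTest,YPredicted);
cm.Title = 'Confusion Matrix for Test Data';
cm.ColumnSummary = 'column-normalized';  % per-class precision
cm.RowSummary = 'row-normalized';        % per-class recall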

 

You can use the following code to display some test images together with their predicted classes and the probabilities of those classes.
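The code itself is likewise missing from this excerpt; a sketch, assuming probs holds the class scores returned by classify above:

figure
idx = randperm(numel(YTest),9);
for i = 1:numel(idx)
    subplot(3,3,i)
    imshow(XTest(:,:,:,idx(i)));
    % Show the predicted class and its probability in the title
    prob = num2str(100*max(probs(idx(i),:)),3);
    predClass = char(YPredicted(idx(i)));
    title([predClass ', ' prob '%'])
end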

Optimization objective function

Define the objective function for optimization.
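The rest of this section walks through the pieces of that function. As a sketch of its overall shape, makeObjFcn returns a nested function handle that bayesopt can evaluate (the inner function name valErrorFun is an assumption):

function ObjFcn = makeObjFcn(XTrain,YTrain,XValidation,YValidation)
% Capture the data in a nested function handle for bayesopt to call
ObjFcn = @valErrorFun;
    function [valError,cons,fileName] = valErrorFun(optVars)
        % optVars is a table holding the hyperparameter values chosen by
        % bayesopt; the steps below fill in this body
    end
end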

Define the convolutional neural network architecture. A sketch of the resulting layer array follows the list below.

  • Add padding to the convolutional layers so that the spatial output size is always the same as the input size.
  • Each time the spatial dimensions are downsampled by a factor of two with a max pooling layer, double the number of filters. Doing so ensures that each convolutional layer requires roughly the same amount of computation.
  • Choose the number of filters proportional to 1/sqrt(SectionDepth), so that networks of different depths have roughly the same number of parameters and require roughly the same amount of computation per iteration. To increase the number of network parameters and the overall network flexibility, increase numF. To train even deeper networks, change the range of the SectionDepth variable.
  • Use convBlock(filterSize,numFilters,numConvLayers) to create a block of numConvLayers convolutional layers, each with the specified filterSize and numFilters filters, and each followed by a batch normalization layer and a ReLU layer. The convBlock function is defined at the end of this example.
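A sketch of the layer array along these lines; the base filter count of 16 is an assumption consistent with the 1/sqrt(SectionDepth) scaling rule above:

imageSize = [32 32 3];
numClasses = numel(unique(YTrain));
numF = round(16/sqrt(optVars.SectionDepth));  % filters scale with 1/sqrt(SectionDepth)
layers = [
    imageInputLayer(imageSize)

    % Three sections of SectionDepth identical convolutional layers,
    % doubling the filters after each 2x spatial downsampling
    convBlock(3,numF,optVars.SectionDepth)
    maxPooling2dLayer(3,'Stride',2,'Padding','same')
    convBlock(3,2*numF,optVars.SectionDepth)
    maxPooling2dLayer(3,'Stride',2,'Padding','same')
    convBlock(3,4*numF,optVars.SectionDepth)

    averagePooling2dLayer(8)  % global average pooling over the final 8x8 maps
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];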

Specify the validation data, and choose a 'ValidationFrequency' value so that trainNetwork validates the network once per epoch. Train for a fixed number of epochs and lower the learning rate by a factor of 10 during the last epochs. This reduces the noise of the parameter updates and lets the network parameters settle closer to a minimum of the loss function.
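A sketch of the training options; the mini-batch size of 256, the 60-epoch budget, and the drop after epoch 40 are assumed values in the spirit of the text:

miniBatchSize = 256;                                       % assumed
validationFrequency = floor(numel(YTrain)/miniBatchSize);  % once per epoch
options = trainingOptions('sgdm', ...
    'InitialLearnRate',optVars.InitialLearnRate, ...
    'Momentum',optVars.Momentum, ...
    'MaxEpochs',60, ...                % assumed fixed epoch budget
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',40, ...      % lower the rate for the last epochs
    'LearnRateDropFactor',0.1, ...     % reduce by a factor of 10
    'MiniBatchSize',miniBatchSize, ...
    'L2Regularization',optVars.L2Regularization, ...
    'Shuffle','every-epoch', ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',validationFrequency);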

Use data augmentation to randomly flip the training images along the vertical axis and randomly translate them horizontally and vertically by up to four pixels.
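A sketch using imageDataAugmenter and augmentedImageDatastore:

pixelRange = [-4 4];  % translate by up to four pixels
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...        % random flips along the vertical axis
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
datasource = augmentedImageDatastore(imageSize,XTrain,YTrain, ...
    'DataAugmentation',imageAugmenter);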

Train the network and plot the training progress during training.
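With the augmented datastore, layer array, and options above, training reduces to a single call; the 'Plots','training-progress' option produces the progress plot:

trainedNet = trainNetwork(datasource,layers,options);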

 

Evaluate the trained network on the validation set, predict the image labels, and calculate the error rate on the validation data.
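A sketch of this step:

YPredicted = classify(trainedNet,XValidation);
valError = 1 - mean(YPredicted == YValidation);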

Create a file name that contains the validation error, then save the network, validation error, and training options to disk. The objective function returns fileName as an output argument, and bayesopt returns all of the file names in BayesObject.UserDataTrace.
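A sketch of the saving step; bayesopt also expects a constraint output, which is empty here:

fileName = num2str(valError) + ".mat";   % e.g. "0.1882.mat"
save(fileName,'trainedNet','valError','options')
cons = [];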

The convBlock function creates a block of numConvLayers convolutional layers. Each convolutional layer has the specified filterSize and numFilters filters and is followed by a batch normalization layer and a ReLU layer.
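A sketch of convBlock, stacking the repeated unit with repmat:

function layers = convBlock(filterSize,numFilters,numConvLayers)
% One convolution + batch normalization + ReLU unit, repeated numConvLayers times
layers = [
    convolution2dLayer(filterSize,numFilters,'Padding','same')
    batchNormalizationLayer
    reluLayer];
layers = repmat(layers,numConvLayers,1);
end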

 

References

[1] Krizhevsky, Alex. "Learning Multiple Layers of Features from Tiny Images." (2009). www.cs.toronto.edu/~kriz/learn…

