Image classification is one of the core tasks in computer vision: higher-level tasks such as image recognition and object detection all involve classifying images. The convolutional neural network (CNN) is the most widely used and most successful network type for computer vision. Most deep learning researchers start with CNNs, and the first project is usually handwritten digit recognition on MNIST, which teaches the basic workflow of image classification. However, because that project is so mature, beginners often learn only what to do, not why. Faced with a different image classification task, they may not know where to start: how to select a pre-trained model, how to adapt an existing mature model, how many layers to use, how to improve accuracy, and so on. All of these questions must be considered when using a CNN for image classification.

When a CNN is chosen for an image classification task, three main metrics need to be optimized: accuracy, inference speed, and memory consumption. These metrics are closely tied to the model design, and different networks, such as VGG, Inception, and ResNet, trade them off differently. It is common practice to fine-tune these mature architectures for an image classification task by adding and removing layers, widening layers, and applying different training techniques.

This paper is a guide to the design and optimization of CNNs for image classification, intended to help readers quickly grasp the problems and practical lessons encountered when designing classification models. It is organized around the three metrics of accuracy, speed, and memory consumption: it introduces different CNN classification methods and discusses how they perform on these metrics, presents various modifications of the mature CNN architectures and how those modifications perform, and finally shows how to design an optimized CNN model for a specific image classification task.
Network type
Reduce running time and memory consumption with intelligent convolution design
Recent advances in the overall design of CNNs have produced some impressive alternatives that speed up CNN inference and reduce memory consumption without losing much accuracy. All of the following can be easily integrated into the mature CNN architectures above:
- MobileNets: uses depthwise separable convolutions to greatly reduce computation and memory consumption while sacrificing only 1%-5% of accuracy; the accuracy loss is roughly proportional to the savings in computation and memory (see the sketch after this list).
- XNOR-Net: uses binary convolutions, i.e. convolution kernels with only two possible values, -1 or 1. This design makes the network very sparse, so its parameters can be compressed easily without taking up much memory.
- ShuffleNet: uses pointwise group convolution and channel shuffle to greatly reduce computation cost, while achieving better accuracy than MobileNets.
- Network pruning: removes parts of a trained CNN to reduce inference time and memory consumption, at some cost in accuracy. To preserve accuracy, it is best to remove the parts of the structure that have little effect on the final result.
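To make the depthwise separable convolution behind MobileNets concrete, here is a minimal PyTorch sketch (the framework and the layer sizes are my own assumptions; the article does not prescribe any):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 convolution factored into a depthwise 3x3 convolution
    (one filter per input channel, via groups=in_channels) followed by
    a 1x1 pointwise convolution that mixes channels."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, 3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

# Same output shape as a regular 3x3 convolution, far fewer parameters:
x = torch.randn(1, 64, 32, 32)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```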
Network depth
For a CNN, the common way to increase accuracy is to add channels and layers, at the cost of inference speed and memory. Note, however, that the accuracy gained by adding layers diminishes: the more layers are added, the smaller the contribution of each subsequent layer, and eventually overfitting sets in. The sketch below illustrates how the cost side of this trade-off grows.
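A minimal PyTorch sketch (the framework, channel width, and depths are my own choices for illustration) showing that parameters, and hence memory and runtime, grow steadily with depth while accuracy gains do not:

```python
import torch.nn as nn

def make_cnn(num_blocks, channels=64):
    """Stack num_blocks 3x3 conv blocks. Each extra block adds
    parameters and inference time, while its accuracy contribution
    tends to shrink."""
    layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(num_blocks - 1):
        layers += [nn.Conv2d(channels, channels, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

for depth in (2, 4, 8):
    n_params = sum(p.numel() for p in make_cnn(depth).parameters())
    print(depth, n_params)  # cost grows linearly with depth; accuracy does not
```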
Activation function
An activation function is essential to a neural network model. Traditional activation functions such as Sigmoid and Tanh are no longer the default choice for CNN models; researchers have proposed newer ones, such as the ReLU activation function introduced by Hinton's group. ReLU usually gives good results out of the box, and it requires none of the parameter tweaking demanded by ELU, PReLU, or LeakyReLU. Once you have confirmed that ReLU gives good results, optimize the other parts of the network and tune their parameters before reaching for the alternatives in pursuit of better accuracy.
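A small sketch of this advice, assuming PyTorch (my choice; none of the identifiers below come from the article): default to ReLU, and keep the tunable alternatives behind a switch for later experiments.

```python
import torch.nn as nn

def make_activation(name="relu"):
    """Start with plain ReLU; only try the alternatives (each with a
    hyperparameter of its own) once the rest of the model is settled."""
    if name == "relu":
        return nn.ReLU(inplace=True)
    if name == "leaky_relu":
        return nn.LeakyReLU(negative_slope=0.01)  # slope must be tuned
    if name == "prelu":
        return nn.PReLU()                         # slope is learned instead
    if name == "elu":
        return nn.ELU(alpha=1.0)                  # alpha must be tuned
    raise ValueError(f"unknown activation: {name}")

block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.BatchNorm2d(16),
                      make_activation("relu"))
```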
Convolution kernel size
It is widely assumed that large convolution kernels (e.g., 5×5, 7×7) always yield the highest accuracy, but this is not the case. Larger kernels sharply increase computation and parameter count, and researchers have found that it is better to stack smaller 3×3 kernels, which cover the same receptive field at lower cost; ResNet and VGGNet demonstrate this well. In addition, 1×1 convolutions can be used to reduce the number of feature maps. A comparison sketch follows.
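A minimal PyTorch comparison (framework and channel counts are my assumptions) of stacked 3×3 kernels versus a single 5×5 kernel, plus a 1×1 channel-reduction layer:

```python
import torch.nn as nn

# Two stacked 3x3 convolutions cover the same 5x5 receptive field as a
# single 5x5 convolution, with fewer parameters and an extra
# non-linearity in between (the VGG design).
stacked_3x3 = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
single_5x5 = nn.Conv2d(64, 64, 5, padding=2)

# A 1x1 convolution used purely to shrink the number of feature maps.
reduce_channels = nn.Conv2d(256, 64, kernel_size=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked_3x3), count(single_5x5))  # 73856 vs 102464
```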
Dilated convolution
Dilated convolutions insert spacing between the kernel weights so that pixels farther from the center can be used. This allows the network to enlarge its receptive field without adding parameters, that is, without increasing memory consumption. Published results show that dilated convolutions can increase network accuracy, but they also increase inference time.
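A minimal PyTorch illustration (my assumption of framework and sizes) of how dilation enlarges the receptive field at the same parameter count:

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 samples pixels two apart, covering a 5x5
# receptive field with only 9 weights; padding=dilation keeps the
# spatial size unchanged.
dense   = nn.Conv2d(64, 64, kernel_size=3, dilation=1, padding=1)
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 64, 32, 32)
print(dense(x).shape, dilated(x).shape)  # both torch.Size([1, 64, 32, 32])
# Both layers have exactly the same parameter count; only the receptive
# field (and the compute pattern, hence runtime) differs.
```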
Data augmentation
Deep learning relies on big data, and using more data has been shown to further improve model performance. With data augmentation, more data can be obtained essentially for free. Which augmentations to use depends on the task. For example, in a self-driving-car task you will not encounter upside-down trees, cars, or buildings, so vertical flips are meaningless; but since weather and lighting vary across whole scenes, lighting changes and horizontal flips do make sense. There are excellent data augmentation libraries available.
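As a sketch, here is what such a task-specific augmentation pipeline might look like with torchvision (my choice of library; the specific transforms and parameters are illustrative):

```python
from torchvision import transforms

# Augmentations chosen for a driving-style task: horizontal flips and
# lighting changes make sense; vertical flips would not.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),  # lighting changes
    transforms.ToTensor(),
])
# Applied on the fly in the DataLoader, every epoch sees a slightly
# different version of each image -- extra data "for free".
```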
Training optimization
There are several optimization algorithms to choose from for training. The most commonly used is stochastic gradient descent (SGD), but it requires tuning the learning rate and other hyperparameters, which is somewhat tedious. Adaptive-learning-rate methods such as Adam, Adagrad, or Adadelta are easy to use, but may not reach the best accuracy a well-tuned SGD can achieve. The best approach mirrors the advice for activation functions: use a simple training method first to see whether the designed model works well, then tune and optimize in more sophisticated ways. My recommendation is to start with Adam: it is very easy to use; just set a low learning rate, typically 0.0001 by default, and you will generally get very good results. The model can then be fine-tuned with SGD, as sketched below.
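A minimal PyTorch sketch of this two-stage recipe (the framework, the stand-in model, and the SGD hyperparameters are my assumptions):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 2))  # stand-in for a real CNN

# Stage 1: Adam with a low learning rate gets good results with almost
# no tuning.
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# ... train until validation accuracy plateaus ...

# Stage 2: switch to SGD with a small, hand-tuned learning rate to
# squeeze out the last bit of accuracy.
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```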
Class balance
In many cases you will encounter imbalanced data. What does that mean? A simple example: suppose you are training a model to predict whether someone in a video is carrying a deadly weapon, but the training data contains only 50 videos with weapons versus 1,000 videos without. If trained on this data set, the model will strongly tend to predict that no weapon is present. Several things can be done about this:
- Weights in the loss function: give the classes with little data a higher weight in the loss, so that any misclassification of those classes produces a very large loss (see the sketch after this list).
- Oversampling: repeating training examples from under-represented classes can help improve model accuracy.
- Undersampling: sampling fewer examples from the classes with abundant data to reduce the imbalance between the two.
- Data augmentation: augmenting the classes with little data to generate more examples.
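A PyTorch sketch of the first two remedies using the weapon-detection numbers above (the framework and all identifiers are my assumptions):

```python
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

counts = {0: 1000, 1: 50}   # class -> number of videos; "weapon" (1) is rare

# Weighted loss: the rare "weapon" class gets 20x the weight of the
# common "no weapon" class, so misclassifying it costs far more.
weights = torch.tensor([sum(counts.values()) / (2 * counts[c]) for c in (0, 1)])
criterion = nn.CrossEntropyLoss(weight=weights)

# Oversampling: a sampler that draws under-represented examples more often.
labels = [0] * 1000 + [1] * 50                 # one label per training video
sample_weights = [1.0 / counts[l] for l in labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                replacement=True)
# Pass sampler=sampler to the DataLoader (instead of shuffle=True).
```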
Optimizing transfer learning
For most data sets, it is common practice to use transfer learning rather than train a model from scratch. Transfer learning starts from a mature pre-trained model, keeps part of its structure and parameters, and trains only some new components. The questions are which model to choose, which layers to keep, and which parts to retrain; the answers all depend on what your data set looks like. The more similar your data is to what the pre-trained network was trained on (typically the ImageNet data set), the fewer parts need retraining, and vice versa. For example, suppose you want to classify whether an image contains grapes, so the data set consists of images with grapes and images without. Such images are very similar to those in ImageNet, so you only need to retrain the last few layers of the chosen model, perhaps just the final fully connected layer. ImageNet has 1,000 classes while this task has only two (grapes or no grapes), so the final fully connected layer must be replaced in any case. If instead you are classifying whether an outer-space image contains a planet, that data is very different from ImageNet, so you will also need to retrain the later convolutional layers of the model. A minimal sketch follows.
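A minimal sketch of the grapes example, assuming torchvision (version >= 0.13 for the weights API) and ResNet-50 as the chosen pre-trained model, both my assumptions:

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze all pre-trained weights...
for param in model.parameters():
    param.requires_grad = False

# ...then replace the 1000-class ImageNet head with a 2-class one
# ("grapes" / "no grapes"); only this new layer will be trained.
model.fc = nn.Linear(model.fc.in_features, 2)

# For data very unlike ImageNet (e.g. the space images), also unfreeze
# the last convolutional stage so it can be retrained:
for param in model.layer4.parameters():
    param.requires_grad = True
```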
Conclusion
This paper is a guide to designing and optimizing CNNs for image classification tasks. It collects common optimization methods so that beginners can adjust and tune their network models according to the guidelines given.
Author information
George Seif, machine learning engineer. Personal homepage: www.linkedin.com/in/georgese… This article was translated by the Alibaba Yunqi Community and is an abridged translation of "A Comprehensive Design Guide for Image Classification CNNs". For more details, please refer to the original text.