Reading Computer Vision Papers: Inception-V4
Always on the road: if not running, then walking.
Preface
Paper: Inception-V4, Inception-ResNet and the Impact of Residual Connections on Learning
CNNs are very powerful, Inception being a prime example, and ResNet has also performed strongly in recent years. So what happens when the two join forces?
- Speed: residual connections significantly accelerate the convergence of Inception training;
- Accuracy: residual connections bring only a small improvement;
- New models are proposed: Inception-V4, Inception-ResNet-V1, and Inception-ResNet-V2;
- An activation-scaling technique is proposed to stabilize training of the widest residual variants;
- State-of-the-art (SOTA) results were achieved on the 2015 ILSVRC challenge, with the residual Inception networks performing on par with the latest-generation Inception-V3 at comparable computational cost;
- With an ensemble of three residual networks and one Inception-V4, a 3.08% top-5 error was achieved on the test set of the ImageNet classification (CLS) challenge.
A residual Inception network performs slightly better than a similarly expensive Inception network without residual connections.
Review of ResNet
The main idea is to combine the residual and Inception structures so that Inception gains the benefits of residual learning.
Paper Details
Because Inception networks tend to be very deep, the filter-concatenation stage of the Inception architecture is replaced with residual connections (a code sketch of this substitution follows the list below).
ResNet network highlights:
- Ultra-deep network structure (beyond 1,000 layers);
- Introduced the residual module;
- Uses Batch Normalization to accelerate training (dropping Dropout).
Thus Inception gains all the benefits of the residual approach while keeping its computational efficiency.
(Earlier posts in this series cover GoogLeNet, Inception-V2/BN-Inception, and GoogLeNet V3/Inception-V3.)
Inception-V3's design was constrained by the model's parameter budget and computational complexity. Inception-V4 has a more uniform, simplified architecture and more Inception modules than Inception-V3.
Inception-V4 and Inception-ResNet-V2 perform similarly, surpassing the previous state-of-the-art single-frame (single-crop, single-model) performance on the ImageNet validation set.
(Figures: the residual module and the ResNet residual connection.)
Inception-V4 makes a series of clean optimizations over V3.
Inception's structure is easy to adjust: changing a few filter counts will not ultimately affect the result. However, the authors had previously tuned the size of each layer carefully in order to balance computation across their partitioned, distributed training setup. Now that TensorFlow handles memory optimization, the authors believe it is no longer necessary to hand-tune each layer by experience as before, and the layer parameters can be set in a more uniform, standardized way. On this basis, Inception-V4 is proposed; its network structure is as follows:
The stem of the pure Inception-V4 and Inception-ResNet-V2 networks. This is the input part of those networks.
The specific blocks of Inception-V4 are shown below:
The schema for the 35×35 grid modules of the pure Inception-V4 network. This is the Inception-A block of the Inception-V4 architecture.
The schema for the 17×17 grid modules of the pure Inception-V4 network. This is the Inception-B block of the Inception-V4 architecture.
The schema for the 8×8 grid modules of the pure Inception-V4 network. This is the Inception-C block of the Inception-V4 architecture.
The schema for the 35×35 to 17×17 reduction module (Reduction-A).
The schema for the 17×17 to 8×8 grid-reduction module (Reduction-B) used by the pure Inception-V4 network.
The schema for the 35×35 grid (Inception-ResNet-A) module of the Inception-ResNet-V1 network.
The schema for the 17×17 grid (Inception-ResNet-B) module of the Inception-ResNet-V1 network.
The "Reduction-B" 17×17 to 8×8 grid-reduction module used by the smaller Inception-ResNet-V1 network.
The schema for the 8×8 grid (Inception-ResNet-C) module of the Inception-ResNet-V1 network.
The stem of the Inception-ResNet-V1 network.
The overall schema of the Inception-ResNet-V1 and Inception-ResNet-V2 networks. The schema is the same for both networks, but the underlying components differ.
The specific blocks of Inception-ResNet-V2 are shown below:
The schema for the 35×35 grid (Inception-ResNet-A) module of the Inception-ResNet-V2 network.
The schema for the 17×17 grid (Inception-ResNet-B) module of the Inception-ResNet-V2 network.
The schema for the 17×17 to 8×8 grid-reduction module: the Reduction-B module used by the wider Inception-ResNet-V2 network.
The schema for the 8×8 grid (Inception-ResNet-C) module of the Inception-ResNet-V2 network.
The overall structures of Inception-V4 and Inception-ResNet-V2 are similar: both consist of a stem followed by several repeated Inception (or Inception-ResNet) blocks, then a reduction module, with this block-then-reduction pattern repeated a few times (a schematic code sketch follows).
Table: the number of filters of the Reduction-A module for each of the three Inception variants.
Here k, l, m, and n denote filter counts in the Reduction-A branches: k for the 1×1 conv, l for the 3×3 conv that follows it, m for the stride-2 3×3 conv ending that branch, and n for the stand-alone stride-2 3×3 conv.
When the number of convolution filters exceeds 1000, the residual variants become unstable and "die" early in training: the layer just before the average pooling starts producing only zeros. According to the paper, this could not be prevented either by reducing the learning rate or by adding extra Batch Normalization to these layers.
Scaling the residuals down before adding them to the accumulated activations makes training much more stable. The scaling block simply multiplies the last linear activation of the residual branch by a suitable constant, usually around 0.1, and only then performs the addition: scale before summation for stable training. The scaling coefficient is typically chosen between 0.1 and 0.3.
A similar instability was handled in the original ResNet paper with a "warm-up" phase, where training starts at a very low learning rate that is later raised. However, when the number of filters is very high, even a very small learning rate (0.00001) cannot stabilize the training here.
Scaling is not strictly necessary for the final accuracy: it never seemed to hurt accuracy, but it helps stabilize training. Whether some application scenario exists in which scaling is truly necessary remains an open question (a minimal sketch of the scaling block follows).
Experiments
Top-1 error of Inception-V3 during training, compared with a residual Inception network of similar computational cost. The evaluation is on a single crop of non-blacklisted images from the ILSVRC-2012 validation set. The residual version trained much faster, but reached slightly worse final accuracy than the traditional Inception-V3.
Single-crop, single-model experimental results, reported on a non-blacklisted subset of the ILSVRC-2012 validation set.
It can be seen that there is not much difference between Inception-V4 and Inception-ResNet-V2, but both are considerably better than Inception-V3 and Inception-ResNet-V1.
Conclusion
- Inception-ResNet-V1: a hybrid Inception version with a computational cost similar to Inception-V3;
- Inception-ResNet-V2: a costlier hybrid Inception version with significantly improved recognition performance;
- Inception-V4: a pure Inception variant without residual connections, with roughly the same recognition performance as Inception-ResNet-V2.
This paper mainly studies how residual learning can speed up the training of Inception networks (true to its title, residual learning mainly accelerates training rather than improving accuracy). In addition, the authors' latest models (with and without residual connections) outperform all of their previous networks, largely because of the increase in model size.