This article reviews representative CNN models in deep learning and analyzes their characteristics and design ideas in detail. The series cannot cover every model, but it should convey the core ideas; we may supplement it later if necessary.
Author | Yan Yousan
Editor | Yan Yousan
01
From LeNet5 to VGG
LeNet5 is not the starting point of CNNs, but it is their "Hello World": it let us see the commercial prospects of convolutional neural networks.
AlexNet fired the first shot for large-scale commercial use of CNNs, winning the ImageNet 2012 classification championship and announcing the return of the king, neural networks. With its simple structure, VGG has become the most widely used baseline across computer vision fields over the past few years.
Both have simple, elegant structures, and both come from the same lineage. This section explains how increasing depth improves the performance of deep learning models. Detailed interpretation is as follows:
From LeNet to VGG: the network structure of the convolution + pooling series
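To make the "convolution + pooling in series" pattern concrete, here is a minimal PyTorch sketch of a VGG-style stage (not the original authors' code; the layer widths are illustrative). Stacked 3*3 convolutions deepen the network while 2*2 max pooling halves the spatial resolution between stages.

```python
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    """One VGG-style stage: repeated 3x3 conv + ReLU, then 2x2 max pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))  # halve H and W
    return nn.Sequential(*layers)

# Two stages: channels double while spatial resolution halves each time.
net = nn.Sequential(vgg_stage(3, 64, 2), vgg_stage(64, 128, 2))
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)  # torch.Size([1, 128, 56, 56])
```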
02
1*1 convolution
A 1*1 convolution is just a special case of the N*N convolution whose kernel size degenerates to 1. Because it adds nonlinear expressive power at very little computational cost, it is an excellent tool for growing a network structure both horizontally and vertically: it is often used for dimension expansion and reduction, and it is especially widespread in very deep networks and in networks with strict computational-efficiency requirements.
Detailed interpretation is as follows:
Do you understand the 1*1 convolution from Network in Network?
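As a quick illustration, here is a PyTorch sketch (channel counts are made up for the example) of the typical bottleneck use of 1*1 convolutions: squeeze the channels, apply a cheap nonlinearity, and expand back, all without touching the spatial dimensions.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)                            # a 256-channel feature map

reduce = nn.Sequential(nn.Conv2d(256, 64, 1), nn.ReLU())   # dimension reduction
expand = nn.Conv2d(64, 256, 1)                             # dimension expansion

y = expand(reduce(x))
print(y.shape)  # torch.Size([1, 256, 28, 28]) - spatial size unchanged

# Weight counts show why this is cheap:
# 1x1 pair: 256*64 + 64*256 = 32,768 vs. a single 3x3 conv: 256*256*9 = 589,824
```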
03
GoogLeNet
GoogLeNet won the ImageNet 2014 classification championship and is also known as Inception V1. Inception V1 is 22 layers deep with about 5M parameters. The contemporaneous VGGNet performs similarly to Inception V1 but has far more parameters. Inception owes its efficiency to the Inception module, described below:
The outputs of four parallel branches, 1*1 convolution, 3*3 convolution, 5*5 convolution, and 3*3 max pooling, are fused to extract information at different scales of the image. If VGG wins by depth, GoogLeNet wins by width, and of course the 1*1 convolution plays a big role here; it is also critical in SqueezeNet. Detailed interpretation is as follows:
Do you understand the Inception structure in GoogLeNet?
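Below is a minimal PyTorch sketch of such an Inception module, not the official implementation; the branch widths are those of the first Inception module (3a) in the GoogLeNet paper. Note how 1*1 convolutions reduce channels before the expensive 3*3 and 5*5 branches, and how the four outputs are fused by channel concatenation.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches fused by concatenation along the channel axis."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)                          # 1x1 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1))  # 1x1 -> 3x3
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2))  # 1x1 -> 5x5
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, cp, 1))           # pool -> 1x1

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

m = InceptionModule(192, 64, 96, 128, 16, 32, 32)  # widths of module 3a
print(m(torch.randn(1, 192, 28, 28)).shape)        # torch.Size([1, 256, 28, 28])
```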
04
MobileNets
MobileNets uses depthwise separable convolutions to build a lightweight 28-layer neural network, and it has become a high-performance baseline model for mobile devices.
A depthwise convolution handles the spatial information within each channel, and a pointwise (1*1) convolution fuses information across channels; working together, the two become powerful. A series of later models, such as ShuffleNet, follow this idea. Detailed interpretation is as follows:
A talk on MobileNets, the mobile baseline model
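A minimal PyTorch sketch of the MobileNets building block (the real model also varies strides and widths per layer; this only shows the mechanism): a depthwise 3*3 convolution that treats each channel separately, followed by a pointwise 1*1 convolution that fuses across channels.

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """Depthwise 3x3 (groups=in_ch: each channel convolved on its own),
    then pointwise 1x1 to mix information across channels."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch),
        nn.BatchNorm2d(in_ch), nn.ReLU(),
        nn.Conv2d(in_ch, out_ch, 1),
        nn.BatchNorm2d(out_ch), nn.ReLU(),
    )

block = depthwise_separable(32, 64)
print(block(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 112, 112])
# Weights: 32*9 + 32*64 = 2,336 vs. 32*64*9 = 18,432 for a standard 3x3 conv.
```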
05
Residual network
When problems such as vanishing gradients keep us from effectively training deeper networks, the residual network, derived from the highway network, arrives at just the right moment. Carrying the academic halo of MSRA and Kaiming He, it illustrates a simple truth: it works because it is simple, yet you might neither think of it nor pull it off.
Detailed interpretation is as follows:
【Model Interpretation】Residual connections in ResNet: are you sure you really understand them?
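The simple truth fits in a few lines. Here is a sketch of a ResNet-style basic block in PyTorch (not the official code; a real ResNet also needs a projection shortcut when shapes change): the block learns a residual F(x) and adds the input back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """ResNet-style basic block: output = relu(F(x) + x)."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + x)  # identity shortcut: gradients flow through "+"

block = BasicBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```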
06
Deformable convolution
Who says a convolution has to be a neat square? MSRA has always been a place for new ideas: building on the Spatial Transformer Network and active convolution, the deformable convolution network arrived as scheduled.
The paper itself is still very simple. This is a model dedicated to improving CNNs' ability to recognize objects under different geometric deformations; the key is the variable receptive field. Detailed interpretation is as follows:
【Model Interpretation】The "naughty" convolutional neural network
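To see the variable receptive field in code, here is a hedged sketch built on torchvision's deform_conv2d op (assuming torchvision >= 0.8; initialization choices are illustrative): a plain convolution predicts a (dy, dx) offset for every kernel element at every output position, and the deformable convolution samples the input at those shifted locations.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        # 2 offsets (dy, dx) per kernel element per output position
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k, padding=padding)
        nn.init.zeros_(self.offset_conv.weight)  # zero offsets = ordinary conv
        nn.init.zeros_(self.offset_conv.bias)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.padding = padding

    def forward(self, x):
        offset = self.offset_conv(x)  # offsets depend on the input content
        return deform_conv2d(x, offset, self.weight, padding=self.padding)

layer = DeformableConv2d(8, 16)
print(layer(torch.randn(1, 8, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```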
07
Densely connected network
Having come this far, DenseNet is essentially an upgraded version of the residual network: every layer is directly connected to the layers before it, pushing the residual idea to the extreme and improving feature reuse. And because each layer can then be designed to be very narrow, computational performance improves as well.
But again, even if you could think of it, you might not be able to pull it off. Detailed interpretation is as follows:
What's so good about densely connected convolutional networks?
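Still, the core mechanism is short. Here is a PyTorch sketch of a dense block (illustrative, not the official implementation): each layer takes the concatenation of all previous feature maps and contributes only growth_rate new channels, which is why the layers can be so narrow.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of ALL earlier feature maps."""
    def __init__(self, in_ch, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate), nn.ReLU(),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1))
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(64, growth_rate=32, num_layers=4)  # 64 + 4*32 output channels
print(block(torch.randn(1, 64, 32, 32)).shape)        # torch.Size([1, 192, 32, 32])
```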
08
Non-local neural networks
Convolutional neural networks succeed because of local connections and weight sharing, but their receptive fields are limited; enlarging them requires deeper networks, which brings three problems: (1) computational efficiency drops; (2) perceptual efficiency is low; (3) optimization becomes harder. This time, the team led by Kaiming He drew on non-local means, a traditional denoising algorithm.
It is not really mainstream, but it doesn't hurt to know it. Detailed interpretation is as follows:
From "locally connected" back to "fully connected" neural networks
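For the curious, here is a sketch of an embedded-Gaussian non-local block in PyTorch (a simplified illustration of the idea, not the paper's released code): every spatial position attends to every other position, giving a global receptive field in a single layer, wrapped by a residual connection.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Non-local (embedded Gaussian) block: global pairwise attention."""
    def __init__(self, ch):
        super().__init__()
        self.inter = ch // 2
        self.theta = nn.Conv2d(ch, self.inter, 1)   # query embedding
        self.phi = nn.Conv2d(ch, self.inter, 1)     # key embedding
        self.g = nn.Conv2d(ch, self.inter, 1)       # value embedding
        self.out = nn.Conv2d(self.inter, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c')
        k = self.phi(x).flatten(2)                    # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)           # (b, hw, hw): all pairs
        y = (attn @ v).transpose(1, 2).reshape(b, self.inter, h, w)
        return x + self.out(y)                        # residual connection

blk = NonLocalBlock(64)
print(blk(torch.randn(1, 64, 14, 14)).shape)  # torch.Size([1, 64, 14, 14])
```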
09
Multi-input networks
We are used to networks that take one image or video sequence as input and output classification, segmentation, or detection results. Have you ever thought of feeding two or more images into a network to complete a task? That is the multi-input network structure.
From search and matching to ranking and tracking, there is a lot it can do that you should know about. Detailed interpretation is as follows:
Can deep learning networks only have one input?
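A common instance is the siamese network: two inputs go through the same weight-shared backbone and their embeddings are compared. Here is a minimal PyTorch sketch (the architecture and sizes are illustrative) of the pattern used in matching and tracking:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    """Two inputs share ONE backbone; outputs are compared pairwise."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128))

    def forward(self, a, b):
        ea, eb = self.backbone(a), self.backbone(b)   # same weights twice
        return F.cosine_similarity(ea, eb)            # one score per pair

net = SiameseNet()
x1, x2 = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
print(net(x1, x2).shape)  # torch.Size([4])
```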
10
3D convolution
Tired of playing with 2D convolutions? Then it's time to jump to higher-dimensional convolutions, the most common being the 3D convolution.
Although 3D convolution inflates the computation, it is still a little exciting to think that it can be used for video classification and segmentation, and for 3D point clouds. Detailed interpretation is as follows:
What is the difference between 2D convolution and 3D convolution?
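The difference is easy to see in code. A 2D convolution slides over (H, W); a 3D convolution also slides over the temporal/depth axis, so the kernel is k*k*k and the cost grows accordingly (a minimal PyTorch illustration with made-up sizes):

```python
import torch
import torch.nn as nn

x2d = torch.randn(1, 3, 112, 112)        # (batch, channels, H, W)
x3d = torch.randn(1, 3, 16, 112, 112)    # (batch, channels, T, H, W): 16 frames

conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # 3x3 kernel
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)   # 3x3x3 kernel

print(conv2d(x2d).shape)  # torch.Size([1, 64, 112, 112])
print(conv3d(x3d).shape)  # torch.Size([1, 64, 16, 112, 112]) - time dim kept
# Weights: 3*64*3*3 = 1,728 vs. 3*64*3*3*3 = 5,184: the computation inflation.
```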
11
RNN and LSTM
Not all inputs are images; a lot of information, such as video and speech, has no fixed length or size. That is when RNNs and LSTMs take their turn.
Without saying much more, study hard:
An analysis from RNN to LSTM
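As a starting point, here is a minimal PyTorch LSTM example (the feature size of 40 stands in for, say, audio frames; all sizes are illustrative). The same weights are applied at every timestep, so sequences of different lengths need no architectural change.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True)

x = torch.randn(4, 10, 40)      # (batch, time, features): 10 timesteps
out, (h_n, c_n) = lstm(x)
print(out.shape)                # torch.Size([4, 10, 128]) - output per step
print(h_n.shape)                # torch.Size([2, 4, 128]) - final hidden states

# A longer sequence runs through the very same module:
print(lstm(torch.randn(4, 25, 40))[0].shape)  # torch.Size([4, 25, 128])
```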
12
GAN
In recent years, the biggest progress in unsupervised learning, and arguably in all of deep learning, has been the generative adversarial network (GAN), hailed as the next generation of deep learning. In research heat and number of papers, GANs have approached or even surpassed the traditional discriminative CNN architectures. Fueled by researchers' enthusiasm, GANs have evolved from a single generator and discriminator to multiple generators and discriminators.
Hop aboard, because it's getting late.
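Before we go, the generator/discriminator pair fits in a few lines. A toy PyTorch sketch (fully connected, 784-dim "images", all sizes illustrative; real GANs use convolutional G and D plus an adversarial training loop not shown here):

```python
import torch
import torch.nn as nn

latent_dim = 100

# Generator: maps random noise z to a fake sample.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())

# Discriminator: scores a sample as real (close to 1) or fake (close to 0).
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

z = torch.randn(16, latent_dim)
fake = G(z)
score = D(fake)   # training pushes D's score down here, while G pushes it up
print(fake.shape, score.shape)  # torch.Size([16, 784]) torch.Size([16, 1])
```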
Conclusion
I hope that after this series, readers will understand CNN structures better, moving from only using other people's models to designing and tuning their own.
For more, please follow the author's column "There are three AI Schools" on Zhihu.