DenseNet: part of a paper-by-paper walkthrough series on computer vision
Introduction
Densely Connected Convolutional Networks
code
This paper revisits the ideas of short paths and feature reuse and introduces dense connectivity.
DenseNet builds on ResNet (see the earlier post "ResNet Classics!!") and extends the connectivity pattern further: for any layer of the network, the feature maps of all preceding layers are used as its input, and its own feature maps are used as input to all subsequent layers.
Abstract:
- Shortcut (residual) connections make CNNs deeper, more accurate, and more efficient;
- This paper proposes DenseNet, in which each layer is connected to every subsequent layer;
- A traditional L-layer network has L connections, whereas DenseNet has L(L+1)/2;
- Each layer's feature maps serve as input to all subsequent layers;
- DenseNet alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and reduces the number of parameters;
- It was validated on four benchmark datasets, achieving state-of-the-art results.
If convolutional networks contain shorter connections between layers close to the input and layers close to the output, they can be substantially deeper, more accurate, and more efficient to train. Building on this observation, the paper proposes the Dense Convolutional Network (DenseNet), in which each layer is connected to every other layer in a feed-forward fashion. Whereas a traditional convolutional network with L layers has L connections (one between each layer and the next), an L-layer DenseNet has L(L+1)/2 direct connections.
The architecture was evaluated on four object recognition benchmarks (CIFAR-10, CIFAR-100, SVHN, and ImageNet), where DenseNet achieved better performance with less computation.
Paper details
Dense connectivity
The figure in the paper shows the layout of the resulting DenseNet: within each block, the output of every layer is directly connected to the input of all subsequent layers. To ensure maximum information flow between layers in the network, all layers (with matching feature map sizes) are connected directly to each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes its own feature maps on to all subsequent layers.
ResNets and Highway Networks bypass signal from one layer to the next via identity connections. Stochastic depth shortens ResNets by randomly dropping layers during training, allowing better information and gradient flow.
Most importantly, unlike ResNet, DenseNet never combines features by summation before passing them to a layer; instead, it combines features by concatenation.
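To illustrate this difference, here is a minimal PyTorch sketch (not the authors' reference implementation) contrasting ResNet-style summation with DenseNet-style concatenation; the composite function H (BN-ReLU-3×3 conv) follows the paper's description, while the channel counts and the helper name `make_h` are assumptions for the example.

```python
# Minimal sketch of summation vs. concatenation (assumed PyTorch code, not the official implementation).
import torch
import torch.nn as nn

def make_h(in_channels, out_channels):
    # Composite function H_l: BN -> ReLU -> 3x3 conv, as described in the paper.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
    )

x0 = torch.randn(1, 16, 32, 32)

# ResNet-style: combine by summation, x_l = H(x_{l-1}) + x_{l-1}
res_h = make_h(16, 16)
x1_res = res_h(x0) + x0                      # still 16 channels

# DenseNet-style: combine by concatenation, x_l = H([x_0, ..., x_{l-1}])
k = 12                                       # growth rate (assumed example value)
dense_h1 = make_h(16, k)
x1 = dense_h1(x0)                            # 12 new feature maps
dense_h2 = make_h(16 + k, k)
x2 = dense_h2(torch.cat([x0, x1], dim=1))    # input is the concatenation of all previous feature maps
print(x1_res.shape, torch.cat([x0, x1, x2], dim=1).shape)
# torch.Size([1, 16, 32, 32]) torch.Size([1, 40, 32, 32])
```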
- Dense connectivity obtains more feature maps with fewer parameters.
- A layer of a traditional network can be regarded as a state; the network modifies or retains that state.
- ResNet preserves information through identity mappings;
- Many feature maps in ResNet contribute little;
- The way ResNet passes its state (feature maps) resembles an RNN;
- DenseNet does not add features together; it preserves information by concatenation.
- DenseNet layers are narrow.
The success of both Highway Networks and ResNets benefits from bypassing paths.
- Besides making networks deeper, making them wider is another way to improve them;
- Examples include GoogLeNet and widened ResNets.
With sufficient depth, ResNet can improve its performance. FractalNet also achieves competitive results on several datasets using a wide network structure.
- DenseNet differs from these networks that increase width or depth;
- DenseNet exploits feature reuse to obtain networks that are easy to train and parameter-efficient.
DenseNet does not draw its representational power from extremely deep or wide architectures; instead it exploits feature reuse, yielding condensed models that are easy to train and parameter-efficient. Concatenating feature maps learned by different layers increases the variation in the input of subsequent layers and improves efficiency. This constitutes the major difference between DenseNet and ResNet. Compared to Inception networks, which also concatenate features from different layers, DenseNet is simpler and more efficient.
There are other noteworthy architectural innovations that have produced competitive results. The Network in Network (NIN) structure includes micro multilayer perceptrons in the filters of convolutional layers to extract more complicated features. In Deeply Supervised Networks (DSN), internal layers are directly supervised by auxiliary classifiers, which strengthens the gradients received by earlier layers. Ladder Networks introduce lateral connections into autoencoders, producing impressive accuracy on semi-supervised learning tasks. Deeply-Fused Nets (DFNs) were proposed to improve information flow by combining the intermediate layers of different base networks.
- Shortcut connections make optimization easier;
- Their disadvantage is that the summation can impede information flow;
- CNNs need to downsample, but the resolution inside a dense block does not change;
- Feature map resolution is therefore reduced between blocks;
- Transition layers handle this downsampling (see the sketch in the compression discussion below);
- Each transition layer consists of BN + 1×1 conv + 2×2 average pooling.
1. If each layer produces k feature maps, then the l-th layer has k0 + k×(l−1) input feature maps, so k cannot be too large;
2. Each layer of DenseNet is very narrow, e.g. k = 12;
3. k is a hyperparameter called the growth rate;
4. A relatively small k is already sufficient for good results;
5. DenseNet can also be explained from the perspective of network state.
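To make this channel arithmetic concrete, here is a tiny sketch (k0 = 16 and k = 12 are assumed example values, not prescribed by the text above):

```python
# Number of input feature maps seen by the l-th layer of one dense block: k0 + k * (l - 1).
k0, k = 16, 12  # assumed example values
for l in range(1, 7):
    print(f"layer {l}: {k0 + k * (l - 1)} input feature maps")
# layer 1: 16, layer 2: 28, layer 3: 40, layer 4: 52, layer 5: 64, layer 6: 76
```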
Another article gives a clear explanation of the bottleneck design.
Before each 3×3 convolution, a 1×1 convolution is introduced as a bottleneck to reduce the number of input feature maps and improve computational efficiency.
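A minimal sketch of such a bottleneck layer (the DenseNet-B design: BN-ReLU-1×1 conv producing 4k maps, then BN-ReLU-3×3 conv producing k maps), assuming PyTorch; the class name and demo values are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """DenseNet-B layer: BN-ReLU-1x1 conv (4k maps) followed by BN-ReLU-3x3 conv (k maps)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter_channels = 4 * growth_rate  # the 1x1 bottleneck reduces the inputs to 4k feature maps
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return self.layer(x)  # the caller concatenates these k new maps onto the existing ones

# Example: a layer that already sees 64 feature maps adds k = 12 new ones.
print(BottleneckLayer(64, growth_rate=12)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 12, 32, 32])
```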
To further improve the compactness of the model, the number of feature maps at the transition layers can be reduced. If a dense block contains m feature maps, the following transition layer generates ⌊θm⌋ output feature maps, where 0 < θ ≤ 1 is called the compression factor. When θ = 1, the number of feature maps across the transition layer remains unchanged. DenseNet with θ < 1 is called DenseNet-C, and θ = 0.5 is used in the experiments. When both the bottleneck and transition layers with θ < 1 are used, the model is called DenseNet-BC.
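Combining the transition layer (BN + 1×1 convolution + 2×2 average pooling) with the compression factor θ, a minimal PyTorch sketch might look as follows; the module layout and the ReLU between BN and the convolution follow common implementations and are assumptions, not the authors' reference code.

```python
import math
import torch
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Transition between dense blocks: BN -> 1x1 conv (to floor(theta*m) maps) -> 2x2 avg pool."""
    def __init__(self, in_channels, theta=0.5):
        super().__init__()
        out_channels = math.floor(theta * in_channels)  # theta = 1 keeps the number of maps unchanged
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2),  # halves the spatial resolution between blocks
        )

    def forward(self, x):
        return self.layer(x)

# Example: a dense block emitting m = 256 maps at 32x32 is compressed to 128 maps at 16x16.
print(TransitionLayer(256, theta=0.5)(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 128, 16, 16])
```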
On all datasets except ImageNet, the DenseNet used in the experiments has three dense blocks, each with the same number of layers. Before entering the first dense block, a convolution with 16 output channels (or twice the growth rate for DenseNet-BC) is applied to the input image. For convolutional layers with kernel size 3×3, each side of the input is zero-padded by one pixel to keep the feature map size fixed. A 1×1 convolution followed by 2×2 average pooling serves as the transition layer between two consecutive dense blocks. At the end of the last dense block, global average pooling is performed, followed by a softmax classifier. The feature map sizes in the three dense blocks are 32×32, 16×16, and 8×8, respectively. The basic DenseNet structure is evaluated with configurations {L=40, k=12}, {L=100, k=12}, and {L=100, k=24}; DenseNet-BC is evaluated with configurations {L=100, k=12}, {L=250, k=24}, and {L=190, k=40}.
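As a quick sanity check on these configurations (my own arithmetic, using the same counting convention as the DenseNet-121 breakdown in the next paragraph: one initial convolution, the layers inside the three dense blocks, two transition layers, and one classification layer), the number of layers per dense block works out as follows:

```python
# Layers per dense block for the CIFAR configurations, assuming depth
# L = 1 (initial conv) + 3 * layers_per_block * convs_per_layer + 2 (transitions) + 1 (classifier).
def layers_per_block(L, bottleneck):
    convs_per_layer = 2 if bottleneck else 1  # DenseNet-BC layers contain a 1x1 and a 3x3 convolution
    return (L - 4) // (3 * convs_per_layer)

for L, bottleneck in [(40, False), (100, False), (100, True), (250, True), (190, True)]:
    kind = "DenseNet-BC" if bottleneck else "DenseNet"
    print(f"{kind} L={L}: {layers_per_block(L, bottleneck)} layers per block")
# DenseNet L=40: 12, DenseNet L=100: 32, DenseNet-BC L=100: 16, DenseNet-BC L=250: 41, DenseNet-BC L=190: 31
```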
In the ImageNet experiments, a DenseNet-BC structure with four dense blocks is used on 224×224 input images. The initial convolutional layer comprises 2k convolutions of size 7×7 with stride 2; the number of feature maps in all other layers also follows from the growth rate k. The exact network configurations used on ImageNet are shown below. DenseNet-121 means the network has 121 layers: (6+12+24+16)×2 + 3 (transition layers) + 1 (7×7 conv) + 1 (classification layer) = 121.
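For reference, a ready-made DenseNet-121 can be instantiated from torchvision (assuming a recent torchvision release; the `weights` argument replaced the older `pretrained` flag):

```python
import torch
from torchvision import models

# DenseNet-121: block configuration (6, 12, 24, 16); torchvision uses growth rate k = 32,
# so the initial 7x7 convolution has 2k = 64 output channels, matching the description above.
model = models.densenet121(weights=None)  # pass weights="DEFAULT" for ImageNet-pretrained weights
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```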
The transition layer: a 1×1 convolution followed by 2×2 average pooling.
Experiments
L denotes the network depth and k the growth rate. Blue indicates the best result, and + indicates standard data augmentation of the original dataset. DenseNet achieves lower error rates with fewer parameters than ResNet.
Discussion of DenseNet's advantages
Conclusion: adding the bottleneck (B) and compression (C) variants saves parameters.
At the same accuracy, DenseNet-BC uses only about one third of the parameters.
At the same accuracy, the comparable ResNet requires more than ten times as many parameters.
As a direct consequence of concatenating inputs, the feature maps learned by any DenseNet layer can be accessed by all subsequent layers. This encourages feature reuse throughout the network and leads to more compact models. DenseNet-BC achieves accuracy comparable to ResNets while using only about one third of the parameters.
Heat maps for the three dense blocks are shown below: they visualize the average absolute filter weights of the convolutional layers in a trained DenseNet. The color of pixel (s, ℓ) encodes the average magnitude of the weights (normalized by the number of input feature maps) connecting convolutional layer s to layer ℓ within a dense block. The three columns highlighted by black rectangles correspond to the two transition layers and the classification layer. The first row encodes the weights connected to the input layer of each dense block.
The following conclusions can be drawn:
- Within the same block, all layers spread their weights over many inputs. This indicates that features extracted by very early layers are indeed used directly by deep layers throughout the same dense block;
- The weights of the transition layers also spread over all layers of the preceding dense block, meaning that information can flow from the first to the last layers of the DenseNet through relatively few indirections;
- The layers within the second and third dense blocks assign the least weight to the outputs of the transition layer, indicating that the transition layer outputs many redundant features. This is consistent with the strong results of DenseNet-BC, which compresses exactly these outputs;
- Although the final classifier also uses weights from across the whole dense block, it seems to concentrate on the final feature maps, suggesting that some higher-level features are produced late in the network.
Conclusion
The paper proposes a new convolutional network architecture, the Dense Convolutional Network (DenseNet), which introduces direct connections between any two layers with the same feature map size. DenseNet scales naturally to hundreds of layers while exhibiting no optimization difficulties.
As the number of parameters increases, DenseNets tend to keep improving in accuracy without any performance degradation or overfitting. Under multiple settings, they achieved state-of-the-art results on several highly competitive datasets. In addition, DenseNets require fewer parameters and less computation to reach state-of-the-art performance. Because the study reused hyperparameter settings optimized for residual networks, the accuracy of DenseNets could likely be improved further by tuning hyperparameters and learning rate schedules in more detail.
DenseNets naturally integrate identity mappings, deep supervision, and diversified depth while following a simple connectivity rule. They allow features to be reused throughout the network, so more compact and accurate models can be learned. Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for a variety of computer vision tasks that build on convolutional features. Studying such feature transfer with DenseNets is left to future work.