Introduction


In a traditional convolutional network, each layer is connected only to the layer that follows it in the forward pass. ResNet adds residual (skip) connections to increase the flow of information from one layer to the next. FractalNet repeatedly combines several parallel layer sequences with different numbers of convolution blocks, increasing the nominal depth while keeping short paths for forward propagation through the network. Stochastic Depth and Highway Networks follow similar ideas.

All of these models share a common feature: they shorten the paths between earlier and later layers, mainly to increase the flow of information between different layers. DenseNet proposes a new connectivity pattern called dense connections.

In DenseNet, a dense block with L layers has L(L+1)/2 direct connections. This brings several clear advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
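As a quick sanity check on that count (layer i receives i inputs: the block input plus the i−1 preceding layers), a few lines of Python:

```python
# Number of direct connections in an L-layer dense block:
# 1 + 2 + ... + L = L * (L + 1) / 2.
def num_connections(L):
    return L * (L + 1) // 2

print(num_connections(5))   # 15
print(num_connections(12))  # 78
```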

DenseNet outperforms most state-of-the-art models on CIFAR-10, CIFAR-100, SVHN, and ImageNet, achieving better results with less computation.

DenseNet network structure

DenseNet uses dense connections within dense blocks (three of them for CIFAR/SVHN), which are joined by convolution and pooling layers. The dense connections inside a dense block are realized through channel-wise concatenation (concat).

In ResNet, the residual connection is expressed as x_i = H_i(x_{i-1}) + x_{i-1}, where i is the layer index. In DenseNet, the corresponding expression is x_i = H_i([x_0, x_1, ..., x_{i-1}]), where [·] denotes concatenation. (Note: here H_i is the composite function that maps an input through BN, the activation function, and convolution to produce the layer's output.)
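To make the difference concrete, here is a minimal PyTorch sketch (not the official implementation; the class and helper names are illustrative) contrasting the additive skip of ResNet with the concatenation used inside a dense block:

```python
import torch
import torch.nn as nn

# H_i here is the basic DenseNet composite function: BN -> ReLU -> 3x3 Conv.
def make_h(in_channels, out_channels):
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
    )

class ResidualStyle(nn.Module):
    """x_i = H_i(x_{i-1}) + x_{i-1}: features are summed (channel count unchanged)."""
    def __init__(self, channels):
        super().__init__()
        self.h = make_h(channels, channels)

    def forward(self, x):
        return self.h(x) + x

class DenseStyle(nn.Module):
    """x_i = H_i([x_0, x_1, ..., x_{i-1}]): features are concatenated along channels."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            make_h(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```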

Because concatenation is used, a block with many layers and many channels per layer would become huge. Therefore the number of channels produced by each layer in a block is kept relatively small (in the paper this number is denoted k, with k = 12, 24, or 40); k is also called the growth rate. If each composite function H_i produces k feature maps, then layer i receives k_0 + k × (i − 1) input channels, where k_0 is the number of channels of the block's input.
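A short sketch of that channel arithmetic (k0 = 16 and k = 12 are illustrative example values):

```python
# Input width seen by layer i of a dense block, assuming the block input has
# k0 channels and each layer adds k (the growth rate) new feature maps.
def input_channels(i, k0=16, k=12):
    """Number of input channels for layer i (1-indexed): k0 + k * (i - 1)."""
    return k0 + k * (i - 1)

print([input_channels(i) for i in range(1, 7)])  # [16, 28, 40, 52, 64, 76]
```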

In plain DenseNet, each layer has the structure BN + ReLU + 3×3 Conv. In DenseNet-B, the layer structure is BN + ReLU + 1×1 Conv + BN + ReLU + 3×3 Conv, where the 1×1 convolution reduces the dimensionality to save parameters (the bottleneck). To make the model more compact, DenseNet-C introduces a compression hyperparameter θ: if a dense block outputs m feature maps, the following transition layer outputs θm feature maps, with θ = 0.5. (Note: plain DenseNet uses neither the bottleneck nor θ; a model that uses both is called DenseNet-BC.)
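A sketch of a DenseNet-B bottleneck layer in PyTorch (class name illustrative); the 1×1 convolution outputs 4k feature maps before the 3×3 convolution, as in the paper:

```python
import torch
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """DenseNet-B style layer: BN-ReLU-1x1 Conv-BN-ReLU-3x3 Conv (a sketch)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter_channels = 4 * growth_rate  # bottleneck width used in the paper
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # The k new feature maps are concatenated with the input along the channel axis.
        return torch.cat([x, self.block(x)], dim=1)
```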

The DenseNet structures used on ImageNet are shown above; they are DenseNet-BC networks with four dense blocks, so each layer inside a dense block uses the bottleneck structure.

Implementation details

Except on ImageNet, where four dense blocks are used, all configurations have three dense blocks. Before the first dense block, the input image is processed by a 3×3 convolution with 16 output channels (twice the growth rate for DenseNet-BC). Every 3×3 convolution uses 1-pixel zero padding so that the feature-map size stays unchanged. Between two dense blocks, a 1×1 convolution followed by 2×2 average pooling serves as the transition layer; after the last dense block, global average pooling and a softmax classifier are applied.
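The pieces described above can be sketched in PyTorch as follows (function names are illustrative; the ReLU placement in the transition follows common implementations, and θ = 0.5 applies the DenseNet-C compression from the previous section):

```python
import torch.nn as nn

def stem(out_channels=16):
    # 3x3 conv with 1-pixel padding keeps the spatial size unchanged.
    return nn.Conv2d(3, out_channels, kernel_size=3, padding=1, bias=False)

def transition(in_channels, theta=0.5):
    out_channels = int(in_channels * theta)  # compression (DenseNet-C / BC)
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),  # halves the feature-map size
    )

def head(in_channels, num_classes):
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),  # global average pooling
        nn.Flatten(),
        nn.Linear(in_channels, num_classes),  # softmax is applied by the loss
    )
```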

The feature-map sizes in the three dense blocks are 32×32, 16×16, and 8×8. For plain DenseNet, three configurations are used: {L=40, k=12}, {L=100, k=12}, and {L=100, k=24}. For DenseNet-BC, the configurations are {L=100, k=12}, {L=250, k=24}, and {L=190, k=40}. Here L refers to the total number of layers of the model, not the number of layers in a dense block. (Note: BN, pooling, and ReLU are not counted as layers.)
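A small helper illustrating the usual convention relating the total depth L to the number of layers per dense block (assuming three equal-sized blocks; the "−4" accounts for the initial convolution, the two transition convolutions, and the final classification layer):

```python
def layers_per_block(L, bottleneck=False, num_blocks=3):
    n = (L - 4) // num_blocks
    # In DenseNet-BC each layer contains two convolutions (1x1 and 3x3),
    # so the per-block layer count is halved.
    return n // 2 if bottleneck else n

print(layers_per_block(40))                    # 12 layers per block (L=40, basic)
print(layers_per_block(100, bottleneck=True))  # 16 layers per block (L=100, BC)
```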

On ImageNet, the input image size is 224×224.

DenseNet theory

Dense connections make the features of earlier layers directly available to later layers, so information is well preserved; they increase the flow of information and gradients between different layers, which makes the model easier to train. Using the features of all previous layers at each layer is called feature reuse, an idea that also appears in other papers.

Each layer can receive the gradient directly from the loss function as well as the original input signal (strictly speaking, this direct gradient path exists within a dense block, not across the whole DenseNet). This amounts to a form of implicit deep supervision, which also helps train deeper networks.

In addition, dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training sets.

Conclusion

Here, C10 stands for the CIFAR-10 dataset.

Comparing DenseNet with ResNet on ImageNet, DenseNet clearly achieves better results.

