A hot pot is enough to make you forget temporary unpleasantness.

gossip

These days I feel like there is something new to learn every day: learn this today, learn that tomorrow. I think this is very bad; only skimming the surface of things will not lead to a good result. I am not sure exactly how to solve this problem, so if you have suggestions, feel free to leave a comment 🤔.

preface

SENet:

Paper: arxiv.org/abs/1709.01…

Code: github.com/hujie-frank…

SKNet:

Paper: arxiv.org/abs/1903.06…

Code: github.com/implus/SKNe…

SENet series of articles

  • SENet
  • cSENet (essentially the same as SENet; the reduction ratio r of the dimension-reduction and dimension-restoration FC layers is 2 or 16), sSENet (introduces the attention mechanism from the spatial angle), csSENet (introduces the attention mechanism in both the channel and spatial dimensions)
  • SKNet

sSENet introduces the attention mechanism from the spatial perspective. Its SSE module was used in the paper “Non-deep Networks” from October of this year, which reaches 80% top-1 accuracy on ImageNet with a 12-layer network (it combines the SSE module with a non-deep network design to build a new block, RepVGG-SSE). It reaches 96% on CIFAR-10 and 81% on CIFAR-100.

SENet

SENet (Squeeze-and-Excitation Networks) is the winner of the ImageNet 2017 classification task.

SENet’s idea is to use global information to enhance useful information and suppress useless information at the same time; that is, different weights are applied to different channels, so that useful features are amplified and useless ones are suppressed. Below is the main framework of the Squeeze-and-Excitation module. The feature map X output by the previous layer has dimensions H′ × W′ × C′, which are its height, width, and number of channels. After an ordinary convolution we obtain U, whose dimensions are H × W × C. Now an SE structure is applied: a bypass branch is added to the normal path, where the Squeeze operation globally pools U and compresses each channel into a scalar, and the Excitation operation then computes a weight for each channel. The output is multiplied with the initial U, and the end result is a re-weighted feature map.
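The three steps can be written compactly. The following is a sketch in the SE paper's notation, where δ denotes ReLU, σ the sigmoid, and r the reduction ratio discussed below:

```latex
% Squeeze: global average pooling compresses each channel of U into a scalar
z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

% Excitation: FC reduction by r, ReLU, FC restoration, sigmoid
s = F_{ex}(z, W) = \sigma\big(W_2\,\delta(W_1 z)\big),
\quad W_1 \in \mathbb{R}^{(C/r) \times C},\ W_2 \in \mathbb{R}^{C \times (C/r)}

% Scale: the weight s_c re-weights channel c of the original U
\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c
```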

SE Block integrated with Inception and ResNet

SE Block can be integrated with other architectures, first with Inception and then with ResNet. One diagram shows the normal Inception module, and the other the same module with SE Block added. Starting from an H×W×C feature map, the Squeeze step (global average pooling) turns H×W×C into 1×1×C. An FC (fully connected) layer then reduces the dimension to C/r to cut the amount of computation; r=16 was found to give the best results by experiment. After ReLU, a second FC layer restores the dimension, and a sigmoid gives each channel a weight between 0 and 1, which can be multiplied with the original feature map to output the re-weighted feature map. The ResNet integration with SE Block differs from the Inception one only in the residual connection: the SE operation is applied to the residual branch.
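As a concrete illustration, here is a minimal PyTorch sketch of an SE block following the steps above (the class name, layer layout, and defaults are my assumptions, not the official repo's code):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block sketch; names are assumptions."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # H x W x C -> 1 x 1 x C
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // r),  # FC dimension reduction by 1/r
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # FC dimension restoration
            nn.Sigmoid(),                        # each channel gets a weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)           # Squeeze: global average pooling
        w = self.excite(w).view(b, c, 1, 1)      # Excitation: per-channel weights
        return x * w                             # multiply with the original feature map
```

For example, `SEBlock(256)(torch.randn(1, 256, 32, 32))` returns a re-weighted feature map of the same shape.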

SKNet

Selective Kernel Network, or SKNet, was developed by the same team as SENet.

The idea of SKNet is to dynamically adjust the size of the receptive field by fusing information in a non-linear way; that is, to find the receptive field best suited to the actual content of the image. Its core method is the three-step Split-Fuse-Select.

Split-Fuse-Select

  • Split: the input is split into multiple branches, each with filters/kernels of a different size, realizing receptive fields of different sizes;
  • Fuse: the branch information is integrated to obtain the selection weights;
  • Select: the feature maps are aggregated according to the selection weights (a code sketch follows this list).
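Here is a minimal PyTorch sketch of the three steps for M = 2 branches. The 5×5 kernel is realized as a dilated 3×3, a trick the paper itself uses for efficiency; the names, defaults, and grouped-conv setting are otherwise my assumptions:

```python
import torch
import torch.nn as nn

class SKConv(nn.Module):
    """Sketch of Selective Kernel convolution with M = 2 branches.
    Hypothetical names/defaults; channels must be divisible by groups."""
    def __init__(self, channels: int, groups: int = 32, r: int = 16, L: int = 32):
        super().__init__()
        d = max(channels // r, L)                # width of the compact feature z
        # Split: two branches with different receptive fields
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5x5 = nn.Sequential(          # 5x5 realized as a dilated 3x3
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, groups=groups, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Fuse: global average pooling + FC down to the compact feature z
        self.fc_z = nn.Sequential(nn.Linear(channels, d), nn.ReLU(inplace=True))
        # Select: one FC per branch; softmax is taken across the branches
        self.fc_a = nn.Linear(d, channels)
        self.fc_b = nn.Linear(d, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u1, u2 = self.branch3x3(x), self.branch5x5(x)        # Split
        u = u1 + u2                                          # Fuse: element-wise sum
        s = u.mean(dim=(2, 3))                               # global average pooling -> (B, C)
        z = self.fc_z(s)                                     # compact feature
        weights = torch.softmax(                             # Select: softmax across branches
            torch.stack([self.fc_a(z), self.fc_b(z)], dim=0), dim=0)
        a, b = weights[0, :, :, None, None], weights[1, :, :, None, None]
        return a * u1 + b * u2                               # weighted aggregation
```

For example, `SKConv(256)(torch.randn(2, 256, 32, 32))` keeps the input shape; the channel count must be divisible by `groups`.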

This is the structure diagram from the paper, and it is very straightforward.

First, X splits into two branches, U1 and U2. U1 and U2 are then fused to get U, and global average pooling on U gives s. An FC operation on s gives z, which then splits into two small branches, a and b. A softmax is taken across these two small branches, and the calculation above is carried out: you get a yellow and a green matrix (U1 and U2 weighted by a and b), which are then combined by addition.
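Written out, the Fuse and Select steps look roughly like the following (a sketch in the paper's notation; the batch normalization after the FC is omitted, δ denotes ReLU, and A, B are the FC weights of the two small branches):

```latex
% Fuse: element-wise sum, global average pooling, then a compact feature z
U = \tilde{U} + \hat{U}, \qquad
s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j), \qquad
z = \delta(W s),\ W \in \mathbb{R}^{d \times C}

% Select: softmax across the two small branches, per channel c
a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \qquad
b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}, \qquad a_c + b_c = 1

% Aggregation: the yellow and green matrices are the two weighted maps
V_c = a_c \cdot \tilde{U}_c + b_c \cdot \hat{U}_c
```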

How is the size of the receptive field adjusted according to the image?

The first place is the Split step: the two branches it splits into have different receptive fields. If the object in the image is larger, the branch with the larger receptive field can be chosen; if it is smaller, the branch with the smaller receptive field. The second place is the Select operation in the fusion: the proportions taken from the upper and lower branches are dynamically selected.

Here is the complete architecture of SKNet, where M is the number of branches and G is the number of groups. These are the three network diagrams. It can be seen that SENet adds its fully connected layers directly after the complete convolution operation (1×1 convolution + 3×3 convolution + 1×1 convolution), while SKNet replaces the 3×3 convolution part of ResNeXt. Because of the different embedding positions of the modules, the number of parameters of SKNet is roughly the same as that of SENet, and the amount of computation also goes up slightly.
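To make the embedding position concrete, here is a sketch of how an SK unit could replace the 3×3 grouped convolution of a ResNeXt-style bottleneck, reusing the SKConv sketch above (the structure is my reading of the diagram, not the official code):

```python
import torch
import torch.nn as nn
# assumes the SKConv sketch defined earlier is in scope

class SKUnit(nn.Module):
    """Sketch of an SK bottleneck: 1x1 conv -> SKConv -> 1x1 conv + residual.
    mid_ch must be divisible by SKConv's groups (32 by default)."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int):
        super().__init__()
        self.reduce = nn.Sequential(             # 1x1 convolution
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.sk = SKConv(mid_ch)                 # replaces the 3x3 convolution part
        self.expand = nn.Sequential(             # 1x1 convolution
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.shortcut = (nn.Identity() if in_ch == out_ch else
                         nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False),
                                       nn.BatchNorm2d(out_ch)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.expand(self.sk(self.reduce(x))) + self.shortcut(x))
```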

The table above is a horizontal comparison showing how SKNet compares with other common models, including SENet. You can see that SKNet achieves the best results.

After analyzing the influence of D (dilation) and G (number of groups) on the model, it can be seen that the best cases are a 3×3 kernel with D=2, G=32 and a 5×5 kernel with D=1, G=64.

Then comes the influence of different branch combinations on the model. The paper concludes that increasing the number of branches (M) reduces the overall error rate, and that using SK is better than not using it; once SK is used, increasing the number of branches only improves the accuracy slightly.

Then the paper also analyzes the attention weights, namely the two small branches separated from z. It concludes that as the size of the target object increases, the attention weight of the 5×5 branch also increases, which means the receptive field grows.

The figure above shows the difference in the average attention weights.

The paper concludes that in the early layers, the attention to the larger kernel increases as the object gets bigger. In the later layers this rule no longer holds, which means the SK units in the later layers could be replaced with other units.

conclusion

  • SENet proposes the Squeeze-and-Excitation block, and SKNet proposes Selective Kernel convolution.
  • SENet and SKNet are lightweight modules that can be embedded directly into networks such as ResNet, Inception, and ShuffleNet to improve accuracy.