Contents
Abstract
I. Overview of SENet
II. Detailed explanation of SENet structure
III. Detailed calculation process
Application of SENet in a specific network (code implementation SE_ResNet)
The first residual module
The second residual module
ResNet18, ResNet34 model complete code
ResNet50, ResNet101, ResNet152 complete code
Abstract
I. Overview of SENet
Squeeze-and-Excitation Networks (SENet for short) is a new network architecture proposed by Momenta and WMW. Using SENet, they won the image classification task of the last ImageNet 2017 competition, reducing the top-5 error on the ImageNet dataset to 2.251%, where the previous best score was 2.991%.
In the paper, the SE block is inserted into a variety of existing classification networks and achieves good results. The authors' motivation is to explicitly model the interdependencies between feature channels. Notably, they do not introduce a new spatial dimension for fusing feature channels, but instead adopt a "feature recalibration" strategy: the importance of each feature channel is acquired automatically through learning, and this importance is then used to promote useful features and suppress features that are of little use for the current task.
Generally speaking, the core idea of SENet is to learn feature weights from the loss through the network, so that effective feature maps receive large weights and ineffective or less useful feature maps receive small weights, leading to better results. Embedding an SE block into an existing classification network inevitably adds some parameters and computation, but the overhead is acceptable given the gain in accuracy. The Squeeze-and-Excitation (SE) block is not a complete network structure, but a substructure that can be nested into other classification or detection models.
II. Detailed explanation of SENet structure
Squeeze and Excitation are the two key operations in the structure above, and they are explained in detail below.
The diagram above shows the SE module. Given an input x with C′ feature channels, a feature U with C channels is obtained after a series of general transformations such as convolution. Unlike in traditional CNNs, the features obtained above are then recalibrated through the following three operations:
1. The Squeeze operation compresses features along the spatial dimensions, turning each two-dimensional feature channel into a single real number. This number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. It characterizes the global distribution of responses over the feature channels and allows layers close to the input to obtain a global receptive field, which is very useful in many tasks.
2. The Excitation operation is a gating mechanism similar to the gates in recurrent neural networks. It generates a weight for each feature channel through a parameter W, where W is learned to explicitly model the correlation between feature channels.
3. The Reweight operation treats the output weights of the Excitation as the importance of each feature channel after feature selection, and then multiplies them channel by channel onto the previous features, completing the recalibration of the original features along the channel dimension (a minimal code sketch of all three operations follows this list).
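For reference, the three operations can be sketched directly in Keras. This is only a minimal illustrative sketch, not the article's implementation further below; the function name se_block and the reduction ratio r=16 are assumptions here:

import tensorflow as tf
from tensorflow.keras import layers

def se_block(u, r=16):
    """Minimal SE block sketch: Squeeze -> Excitation -> Reweight."""
    c = u.shape[-1]                                  # number of feature channels C
    z = layers.GlobalAveragePooling2D()(u)           # Squeeze: H x W x C -> C
    s = layers.Dense(c // r, activation='relu')(z)   # Excitation, step 1: C -> C/r
    s = layers.Dense(c, activation='sigmoid')(s)     # Excitation, step 2: C/r -> C, weights in (0, 1)
    s = layers.Reshape((1, 1, c))(s)                 # align shapes for channel-wise broadcasting
    return u * s                                     # Reweight: scale each feature map by its weight

# Example: recalibrate a dummy feature map with 64 channels
x = tf.random.normal((2, 32, 32, 64))
print(se_block(x).shape)                             # (2, 32, 32, 64)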
III. Detailed calculation process
First comes the transformation F_tr. Strictly speaking, this step does not belong to SENet but to the original network (as can be seen from how SENet is combined with the Inception and ResNet networks); in the paper it is simply a standard convolution. Its input and output are defined as follows:

$$\mathbf{F}_{tr}: \mathbf{X} \rightarrow \mathbf{U}, \quad \mathbf{X} \in \mathbb{R}^{H' \times W' \times C'}, \quad \mathbf{U} \in \mathbb{R}^{H \times W \times C}$$

The formula for F_tr is Equation 1 below (a convolution operation, where v_c denotes the c-th convolution kernel and x^s denotes the s-th input channel):

$$\mathbf{u}_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s \tag{1}$$

The resulting U is the second three-dimensional matrix from the left in Figure 1, also called a tensor, i.e. C feature maps of size H×W. u_c denotes the c-th two-dimensional matrix in U, with the subscript c indexing the channel.
Then comes the Squeeze operation, whose formula is very simple: a global average pooling:

$$z_c = \mathbf{F}_{sq}(\mathbf{u}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \tag{2}$$

Equation 2 therefore converts an H×W×C input into a 1×1×C output, corresponding to the F_sq operation in Figure 1. Why do this? The result of this step describes the numerical distribution of the C feature maps at this layer, i.e. their global information.
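To make Equation 2 concrete: the squeeze of each feature map is simply its mean over all H×W positions. The short check below (illustrative only, with arbitrary shapes) confirms that GlobalAveragePooling2D produces exactly this per-channel mean:

import tensorflow as tf
from tensorflow.keras import layers

u = tf.random.normal((1, 7, 7, 32))                    # a batch of C=32 feature maps of size 7x7
z_pool = layers.GlobalAveragePooling2D()(u)            # F_sq: output shape (1, 32), i.e. 1x1xC
z_mean = tf.reduce_mean(u, axis=[1, 2])                # Equation 2 written out: mean over H and W
print(tf.reduce_max(tf.abs(z_pool - z_mean)).numpy())  # ~0 (up to float error), the two match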
The next operation is Equation 3. Here z is first multiplied by W1, which is a fully connected layer; the dimension of W1 is C/r × C, where r is a scaling ratio that is 16 in the paper. Its purpose is to reduce the number of channels and thus the amount of computation. Since z is 1×1×C, W1z is 1×1×C/r. This then passes through a ReLU layer, which leaves the dimension unchanged, and is multiplied by W2, another fully connected layer of dimension C × C/r, so the output dimension is 1×1×C. Finally, a sigmoid function yields s:

$$\mathbf{s} = \mathbf{F}_{ex}(\mathbf{z}, \mathbf{W}) = \sigma(\mathbf{W}_2\, \delta(\mathbf{W}_1 \mathbf{z})) \tag{3}$$
In other words, the resulting s has dimension 1×1×C, where C is the number of channels. This s is really the core of the paper: it describes the weights of the C feature maps in the tensor U, and because these weights are learned through the preceding fully connected and nonlinear layers, they can be trained end to end. The role of the two fully connected layers is to fuse the feature-map information across channels, since the preceding Squeeze only operates within each individual channel's feature map.
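The dimension bookkeeping of Equation 3 can be traced with a small shape check (illustrative values only; C=256 is an arbitrary example, r=16 matches the paper's default):

import tensorflow as tf
from tensorflow.keras import layers

C, r = 256, 16
z = tf.random.normal((1, C))                                   # squeezed descriptor, 1x1xC flattened to (batch, C)
w1 = layers.Dense(C // r, activation='relu', use_bias=False)   # W1 of size C/r x C, followed by ReLU (delta)
w2 = layers.Dense(C, activation='sigmoid', use_bias=False)     # W2 of size C x C/r, followed by sigmoid
s = w2(w1(z))
print(w1(z).shape, s.shape)                                    # (1, 16) then (1, 256): 1x1xC/r -> 1x1xC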
Having obtained s, we can operate on the original tensor U, which is Equation 4 below. The multiplication is channel-wise: u_c is a two-dimensional matrix and s_c is a single number, a weight, so every value in u_c is multiplied by s_c. This corresponds to F_scale in Figure 1:

$$\widetilde{\mathbf{x}}_c = \mathbf{F}_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c \tag{4}$$
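In TensorFlow, Equation 4 is nothing more than a broadcast multiplication; a tiny example with arbitrary shapes:

import tensorflow as tf

u = tf.random.normal((1, 7, 7, 32))        # tensor U: C=32 feature maps of size 7x7
s = tf.random.uniform((1, 1, 1, 32))       # one scalar weight s_c per channel
x_tilde = u * s                            # F_scale: every value of u_c is multiplied by s_c
print(x_tilde.shape)                       # (1, 7, 7, 32), same shape as U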
Application of SENet in a specific network (code implementation SE_ResNet)
After introducing the specific formula implementation, here is how SE Block is applied to a specific network.
The figure above is an example of embedding an SE module into an Inception structure. The dimension information next to the box represents the output for that layer.
Here, global average pooling is used as the Squeeze operation. Two Fully Connected layers then form a bottleneck structure to model the correlation between channels and output the same number of weights as there are input feature channels. The feature dimension is first reduced to 1/16 of the input and then, after ReLU activation, brought back to the original dimension by another Fully Connected layer. The advantage of doing this over using a single Fully Connected layer is that:
1) It has more nonlinearity and can better fit the complex correlation between channels;
2) It greatly reduces the number of parameters and the amount of computation. A Sigmoid gate then produces normalized weights between 0 and 1, and finally a Scale operation applies the normalized weights to the features of each channel (see the quick parameter count below).
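The parameter saving from the bottleneck is easy to quantify: the two fully connected layers of sizes C×C/r and C/r×C hold 2C²/r weights, versus C² for a single C×C layer. A back-of-the-envelope check with example values C=256 and r=16:

# Rough parameter count for the excitation MLP (bias-free), with example values C = 256, r = 16
C, r = 256, 16
bottleneck = C * (C // r) + (C // r) * C   # W1 plus W2: 2 * C^2 / r = 8192
single_fc = C * C                          # one full C x C layer: 65536
print(bottleneck, single_fc)               # the bottleneck uses only 2/r = 1/8 of the parameters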
In addition, the SE module can be embedded in modules that contain skip-connections. The picture on the upper right is an example of SE embedded in a ResNet module. The operation process is basically the same as SE-Inception, except that the residual features on the branch are recalibrated before the Addition. If the features on the main branch after the Addition were recalibrated instead, the 0~1 scale operation on the trunk would, when the network is deep enough, easily cause vanishing gradients near the input layers during backpropagation, making the model difficult to optimize.
Most current mainstream networks are built by repeatedly stacking these two kinds of similar units. Thus, the SE module can be embedded in almost all current network structures: by embedding SE modules in the building-block units of the original network structure, we obtain different kinds of SENet, such as SE-BN-Inception, SE-ResNet, SE-ResNeXt, SE-Inception-ResNet-v2, etc.
This example shows how to embed the SE module into the ResNet network by implementing SE-ResNet. The SE-ResNet model is implemented as follows:
The first residual module
The first residual module is used to build the ResNet18 and ResNet34 models; the SE block is embedded after the second convolution.
# The first residual module
# (imports as in the complete listing below)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential

class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        # se-block: squeeze (global pooling) + excitation (two FC layers, r=16)
        self.se_globalpool = keras.layers.GlobalAveragePooling2D()
        self.se_resize = keras.layers.Reshape((1, 1, filter_num))
        self.se_fc1 = keras.layers.Dense(units=filter_num // 16, activation='relu',
                                         use_bias=False)
        self.se_fc2 = keras.layers.Dense(units=filter_num, activation='sigmoid',
                                         use_bias=False)
        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x

    def call(self, input, training=None):
        out = self.conv1(input)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # se_block: recalibrate the residual branch before the addition
        b = out
        out = self.se_globalpool(out)
        out = self.se_resize(out)
        out = self.se_fc1(out)
        out = self.se_fc2(out)
        out = keras.layers.Multiply()([b, out])
        identity = self.downsample(input)
        output = layers.add([out, identity])
        output = tf.nn.relu(output)
        return output
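A quick sanity check (not part of the original article) is to push a dummy feature map through the block and confirm the output shape; the input shape below is only an example:

# Hypothetical shape check for the SE BasicBlock defined above
import tensorflow as tf

block = BasicBlock(filter_num=128, stride=2)
x = tf.random.normal((1, 56, 56, 64))      # e.g. the output of an earlier 64-channel stage
y = block(x)
print(y.shape)                             # expected: (1, 28, 28, 128)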
The second residual module
The second residual module is used to build the ResNet50, ResNet101 and ResNet152 models; the SE block is embedded after the third convolution.
# The second residual module
# (imports as in the complete listing below)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential

class Block(layers.Layer):
    def __init__(self, filters, downsample=False, stride=1):
        super(Block, self).__init__()
        self.downsample = downsample
        self.conv1 = layers.Conv2D(filters, (1, 1), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        self.conv2 = layers.Conv2D(filters, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        self.conv3 = layers.Conv2D(4 * filters, (1, 1), strides=1, padding='same')
        self.bn3 = layers.BatchNormalization()
        # se-block: squeeze (global pooling) + excitation (two FC layers, r=16)
        self.se_globalpool = keras.layers.GlobalAveragePooling2D()
        self.se_resize = keras.layers.Reshape((1, 1, 4 * filters))
        self.se_fc1 = keras.layers.Dense(units=4 * filters // 16, activation='relu',
                                         use_bias=False)
        self.se_fc2 = keras.layers.Dense(units=4 * filters, activation='sigmoid',
                                         use_bias=False)
        if self.downsample:
            self.shortcut = Sequential()
            self.shortcut.add(layers.Conv2D(4 * filters, (1, 1), strides=stride))
            self.shortcut.add(layers.BatchNormalization(axis=3))

    def call(self, input, training=None):
        out = self.conv1(input)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # se_block: recalibrate the residual branch before the addition
        b = out
        out = self.se_globalpool(out)
        out = self.se_resize(out)
        out = self.se_fc1(out)
        out = self.se_fc2(out)
        out = keras.layers.Multiply()([b, out])
        if self.downsample:
            shortcut = self.shortcut(input)
        else:
            shortcut = input
        output = layers.add([out, shortcut])
        output = tf.nn.relu(output)
        return output
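The bottleneck block expands the channel count by a factor of 4, which the same kind of dummy forward pass makes visible (again only an illustrative check, not from the original article):

# Hypothetical shape check for the SE bottleneck Block defined above
import tensorflow as tf

block = Block(filters=64, downsample=True, stride=1)
x = tf.random.normal((1, 56, 56, 64))      # e.g. the output of the stem
y = block(x)
print(y.shape)                             # expected: (1, 56, 56, 256)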
ResNet18, ResNet34 model complete code
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential

# The first residual module
class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        # se-block
        self.se_globalpool = keras.layers.GlobalAveragePooling2D()
        self.se_resize = keras.layers.Reshape((1, 1, filter_num))
        self.se_fc1 = keras.layers.Dense(units=filter_num // 16, activation='relu',
                                         use_bias=False)
        self.se_fc2 = keras.layers.Dense(units=filter_num, activation='sigmoid',
                                         use_bias=False)
        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x

    def call(self, input, training=None):
        out = self.conv1(input)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # se_block
        b = out
        out = self.se_globalpool(out)
        out = self.se_resize(out)
        out = self.se_fc1(out)
        out = self.se_fc2(out)
        out = keras.layers.Multiply()([b, out])
        identity = self.downsample(input)
        output = layers.add([out, identity])
        output = tf.nn.relu(output)
        return output

class ResNet(keras.Model):
    def __init__(self, layer_dims, num_classes=10):
        super(ResNet, self).__init__()
        # Preprocessing layer
        self.padding = keras.layers.ZeroPadding2D((3, 3))
        self.stem = Sequential([
            layers.Conv2D(64, (7, 7), strides=(2, 2)),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')])
        # resblock
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        # Global pooling
        self.avgpool = layers.GlobalAveragePooling2D()
        # Fully connected layer
        self.fc = layers.Dense(num_classes, activation=tf.keras.activations.softmax)

    def call(self, input, training=None):
        x = self.padding(input)
        x = self.stem(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # [b, c]
        x = self.avgpool(x)
        x = self.fc(x)
        return x

    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        res_blocks.add(BasicBlock(filter_num, stride))
        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))
        return res_blocks

def ResNet18(num_classes=10):
    return ResNet([2, 2, 2, 2], num_classes=num_classes)

def ResNet34(num_classes=10):
    return ResNet([3, 4, 6, 3], num_classes=num_classes)

model = ResNet34(num_classes=1000)
model.build(input_shape=(1, 224, 224, 3))
model.summary()  # Print the network structure and parameter counts
ResNet50, ResNet101, ResNet152 complete code
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential

# The second residual module
class Block(layers.Layer):
    def __init__(self, filters, downsample=False, stride=1):
        super(Block, self).__init__()
        self.downsample = downsample
        self.conv1 = layers.Conv2D(filters, (1, 1), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        self.conv2 = layers.Conv2D(filters, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        self.conv3 = layers.Conv2D(4 * filters, (1, 1), strides=1, padding='same')
        self.bn3 = layers.BatchNormalization()
        # se-block
        self.se_globalpool = keras.layers.GlobalAveragePooling2D()
        self.se_resize = keras.layers.Reshape((1, 1, 4 * filters))
        self.se_fc1 = keras.layers.Dense(units=4 * filters // 16, activation='relu',
                                         use_bias=False)
        self.se_fc2 = keras.layers.Dense(units=4 * filters, activation='sigmoid',
                                         use_bias=False)
        if self.downsample:
            self.shortcut = Sequential()
            self.shortcut.add(layers.Conv2D(4 * filters, (1, 1), strides=stride))
            self.shortcut.add(layers.BatchNormalization(axis=3))

    def call(self, input, training=None):
        out = self.conv1(input)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # se_block
        b = out
        out = self.se_globalpool(out)
        out = self.se_resize(out)
        out = self.se_fc1(out)
        out = self.se_fc2(out)
        out = keras.layers.Multiply()([b, out])
        if self.downsample:
            shortcut = self.shortcut(input)
        else:
            shortcut = input
        output = layers.add([out, shortcut])
        output = tf.nn.relu(output)
        return output

class ResNet(keras.Model):
    def __init__(self, layer_dims, num_classes=10):
        super(ResNet, self).__init__()
        # Preprocessing layer
        self.padding = keras.layers.ZeroPadding2D((3, 3))
        self.stem = Sequential([
            layers.Conv2D(64, (7, 7), strides=(2, 2)),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')])
        # resblock
        self.layer1 = self.build_resblock(64, layer_dims[0], stride=1)
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        # Global pooling
        self.avgpool = layers.GlobalAveragePooling2D()
        # Fully connected layer
        self.fc = layers.Dense(num_classes, activation=tf.keras.activations.softmax)

    def call(self, input, training=None):
        x = self.padding(input)
        x = self.stem(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # [b, c]
        x = self.avgpool(x)
        x = self.fc(x)
        return x

    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        if stride != 1 or filter_num * 4 != 64:
            res_blocks.add(Block(filter_num, downsample=True, stride=stride))
        for _ in range(1, blocks):
            res_blocks.add(Block(filter_num, stride=1))
        return res_blocks

def ResNet50(num_classes=10):
    return ResNet([3, 4, 6, 3], num_classes=num_classes)

def ResNet101(num_classes=10):
    return ResNet([3, 4, 23, 3], num_classes=num_classes)

def ResNet152(num_classes=10):
    return ResNet([3, 8, 36, 3], num_classes=num_classes)

model = ResNet50(num_classes=1000)
model.build(input_shape=(1, 224, 224, 3))
model.summary()  # Print the network structure and parameter counts