The deep residual shrinkage network is a novel deep neural network architecture. It is essentially an upgraded version of the deep residual network and improves the feature learning ability of deep neural networks on data with strong noise.

First, let’s review the deep residual network. Its basic module is shown in the figure below. Compared with an ordinary convolutional neural network, the deep residual network introduces cross-layer identity connections, which reduce the difficulty of training the model and improve its accuracy.
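In other words, a basic residual module computes output = F(x) + x, where F(x) is a small stack of batch normalization, activation, and convolution layers. The snippet below is a minimal sketch of this idea in TFLearn; it assumes the input already has out_channels channels, and the 3x3 kernel size is only illustrative, not taken from the figure.

import tflearn
from tflearn.layers.conv import conv_2d

def basic_residual_block(x, out_channels):
    # F(x): BN -> ReLU -> conv -> BN -> ReLU -> conv
    f = tflearn.batch_normalization(x)
    f = tflearn.activation(f, 'relu')
    f = conv_2d(f, out_channels, 3, activation='linear')
    f = tflearn.batch_normalization(f)
    f = tflearn.activation(f, 'relu')
    f = conv_2d(f, out_channels, 3, activation='linear')
    return f + x  # cross-layer identity connection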



Then, building on the deep residual network, the deep residual shrinkage network borrows ideas from the squeeze-and-excitation network and introduces a small sub-network. This sub-network learns a set of thresholds and applies soft thresholding to each channel of the feature map. The process can be regarded as a trainable feature selection step. In other words, the preceding convolutional layers turn the important features into values with large absolute values and the features corresponding to redundant information into values with small absolute values; the boundary between the two is learned by the sub-network, so soft thresholding sets the redundant features to zero while the important features keep non-zero outputs.
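For reference, soft thresholding itself is the operation y = sign(x) · max(|x| − τ, 0): values whose absolute value is below the threshold τ are set to zero, and larger values are shrunk toward zero by τ. A minimal NumPy sketch (the threshold value here is arbitrary, just for illustration):

import numpy as np

def soft_threshold(x, tau):
    # zero out values with |x| <= tau, shrink the rest toward zero by tau
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.3, 0.1, 0.5, 3.0])
print(soft_threshold(x, tau=0.4))  # [-1.6  0.   0.   0.1  2.6]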



The deep residual shrinkage network is a general feature learning method that can be used not only for data with strong noise but also for data without noise. This is because the threshold in the deep residual shrinkage network is determined adaptively for each sample. In other words, if a sample contains no redundant information and soft thresholding is not needed, the threshold can be trained to be so close to zero that soft thresholding is effectively absent.
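Concretely, in the code below each channel's threshold is computed as τ = sigmoid(z) · mean(|feature|), where z is the raw output of the small fully connected sub-network. Since the sigmoid lies in (0, 1), the threshold is always positive but never larger than the channel's average absolute value, and it can be driven toward zero when thresholding is not needed. A rough NumPy sketch of this per-channel threshold (the sub-network output z is only a placeholder here):

import numpy as np

def channelwise_threshold(feature_map, z):
    # feature_map: (height, width, channels); z: raw sub-network output, one value per channel
    abs_mean = np.abs(feature_map).mean(axis=(0, 1))  # average |feature| per channel
    alpha = 1.0 / (1.0 + np.exp(-z))                  # sigmoid keeps the scale in (0, 1)
    return alpha * abs_mean                           # 0 < threshold < abs_mean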

Finally, by stacking a certain number of these basic modules, the complete network structure is obtained.
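As a sketch, stacking several modules with the residual_shrinkage_block function defined in the code below might look like this; the block counts and channel numbers are illustrative assumptions, not the configuration used in the experiment:

net = tflearn.input_data(shape=[None, 28, 28, 1])
net = tflearn.conv_2d(net, 8, 3, regularizer='L2', weight_decay=0.0001)
net = residual_shrinkage_block(net, 2, 8)                    # two blocks, 8 channels
net = residual_shrinkage_block(net, 1, 16, downsample=True)  # downsample and widen
net = tflearn.batch_normalization(net)
net = tflearn.activation(net, 'relu')
net = tflearn.global_avg_pool(net)
net = tflearn.fully_connected(net, 10, activation='softmax')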



Below, MNIST handwritten digits are classified with a deep residual shrinkage network; the result is good even though no noise is added. Deep residual shrinkage network code:

#! /usr/bin/env python3
# -*- coding: utf-8 -*-
""Implemented using TensorFlow 1.0 and TFLearn 0.3.2 M. Zhao, S. Zhong, X. Fu, "Implemented on Thu Dec 26 07:46:00 2019 Implemented using TensorFlow 1.0 and TFLearn 0.3.2 M. Zhao, S. Zhong, X. Fu, et al., Deep Residual Shrinkage Networks for Fault Diagnosis, IEEE Transactions on Industrial Informatics, 2019, DOI: 10.1109 / TII. 2019.2943898 @ author: me"""

import tflearn
import tensorflow as tf
from tflearn.layers.conv import conv_2d

# Data loading
from tflearn.datasets import mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])

def residual_shrinkage_block(incoming, nb_blocks, out_channels, downsample=False,
                   downsample_strides=2, activation='relu', batch_norm=True,
                   bias=True, weights_init='variance_scaling',
                   bias_init='zeros', regularizer='L2', weight_decay=0.0001,
                   trainable=True, restore=True, reuse=False, scope=None,
                   name="ResidualBlock") :# residual shrinkage blocks with channel-wise thresholds

    residual = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    # Variable Scope fix for older TF
    try:
        vscope = tf.variable_scope(scope, default_name=name, values=[incoming],
                                   reuse=reuse)
    except Exception:
        vscope = tf.variable_op_scope([incoming], scope, name, reuse=reuse)

    with vscope as scope:
        name = scope.name #TODO

        for i in range(nb_blocks):

            identity = residual

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                residual = tflearn.batch_normalization(residual)
            residual = tflearn.activation(residual, activation)
            residual = conv_2d(residual, out_channels, 3,
                             downsample_strides, 'same', 'linear',
                             bias, weights_init, bias_init,
                             regularizer, weight_decay, trainable,
                             restore)

            if batch_norm:
                residual = tflearn.batch_normalization(residual)
            residual = tflearn.activation(residual, activation)
            residual = conv_2d(residual, out_channels, 3, 1, 'same', 'linear', bias, weights_init,
                             bias_init, regularizer, weight_decay,
                             trainable, restore)
            
            # get thresholds and apply thresholding
            abs_mean = tf.reduce_mean(tf.reduce_mean(tf.abs(residual),axis=2,keep_dims=True),axis=1,keep_dims=True)
            # small sub-network: FC -> BN -> ReLU -> FC, producing per-channel scaling factors
            scales = tflearn.fully_connected(abs_mean, out_channels//4, activation='linear', regularizer='L2', weight_decay=0.0001, weights_init='variance_scaling')
            scales = tflearn.batch_normalization(scales)
            scales = tflearn.activation(scales, 'relu')
            scales = tflearn.fully_connected(scales, out_channels, activation='linear', regularizer='L2', weight_decay=0.0001, weights_init='variance_scaling')
            scales = tf.expand_dims(tf.expand_dims(scales,axis=1),axis=1)
            thres = tf.multiply(abs_mean,tflearn.activations.sigmoid(scales))
            residual = tf.multiply(tf.sign(residual), tf.maximum(tf.abs(residual)-thres,0))
            

            # Downsampling
            if downsample_strides > 1:
                identity = tflearn.avg_pool_2d(identity, 1,
                                               downsample_strides)

            # Projection to new dimension
            if in_channels != out_channels:
                # zero-pad the channels of the identity branch to match out_channels
                if (out_channels - in_channels) % 2 == 0:
                    ch = (out_channels - in_channels)//2
                    identity = tf.pad(identity,
                                      [[0, 0], [0, 0], [0, 0], [ch, ch]])
                else:
                    ch = (out_channels - in_channels)//2
                    identity = tf.pad(identity,
                                      [[0, 0], [0, 0], [0, 0], [ch, ch+1]])
                in_channels = out_channels

            residual = residual + identity

    return residual


# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Build a deep residual shrinkage network
net = tflearn.input_data(shape=[None, 28, 28, 1])
net = tflearn.conv_2d(net, 8, 3, regularizer='L2', weight_decay=0.0001)
net = residual_shrinkage_block(net, 1,  8, downsample=True)
net = tflearn.batch_normalization(net)
net = tflearn.activation(net, 'relu')
net = tflearn.global_avg_pool(net)
# Regression
net = tflearn.fully_connected(net, 10, activation='softmax')
mom = tflearn.Momentum(0.1, lr_decay=0.1, decay_step=40000, staircase=True)
net = tflearn.regression(net, optimizer=mom, loss='categorical_crossentropy')
# Train
model = tflearn.DNN(net, checkpoint_path='model_mnist',
                    max_checkpoints=10, tensorboard_verbose=0,
                    clip_gradients=0.)

model.fit(X, Y, n_epoch=200, snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=100, shuffle=True, run_id='model_mnist')
# Test
training_acc = model.evaluate(X, Y)[0]
validation_acc = model.evaluate(testX, testY)[0]

Next is the program for the ordinary deep residual network (ResNet):

#! /usr/bin/env python3
# -*- coding: utf-8 -*-
""Implemented using TensorFlow 1.0 and TFLearn 0.3.2 K. He, X. Zhang, S. Ren, "Implemented on Thu Dec 26 07:46:00 2019 Implemented using TensorFlow 1.0 and TFLearn 0.3.2 K. J. Sun, Deep Residual Learning for Image Recognition, CVPR, 2016. @author: me """

import tflearn

# Data loading
from tflearn.datasets import mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Build a deep residual network
net = tflearn.input_data(shape=[None, 28, 28, 1])
net = tflearn.conv_2d(net, 8, 3, regularizer='L2', weight_decay=0.0001)
net = tflearn.residual_block(net, 1, 8, downsample=True)
net = tflearn.batch_normalization(net)
net = tflearn.activation(net, 'relu')
net = tflearn.global_avg_pool(net)
# Regression
net = tflearn.fully_connected(net, 10, activation='softmax')
mom = tflearn.Momentum(0.1, lr_decay=0.1, decay_step=40000, staircase=True)
net = tflearn.regression(net, optimizer=mom, loss='categorical_crossentropy')
# Train
model = tflearn.DNN(net, checkpoint_path='model_mnist',
                    max_checkpoints=10, tensorboard_verbose=0,
                    clip_gradients=0.)

model.fit(X, Y, n_epoch=200, snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=100, shuffle=True, run_id='model_mnist')
# Test
training_acc = model.evaluate(X, Y)[0]
validation_acc = model.evaluate(testX, testY)[0]

The two programs above each build a small neural network with only one basic module, and no artificial noise was added to the MNIST images. The accuracies are shown in the table below. It can be seen that even for data without noise, the deep residual shrinkage network gives good results:



References:

M. Zhao, S. Zhong, X. Fu, et al., Deep residual shrinkage networks for fault diagnosis, IEEE Transactions on Industrial Informatics, DOI: 10.1109/TII.2019.2943898

Ieeexplore.ieee.org/document/88…