# Introduction of BN

## Background
The Batch Normalization (BN) algorithm was developed to overcome the training difficulties that arise as neural networks become deeper. According to the theory of Internal Covariate Shift (ICS), when the distribution of the training samples differs from that of the target samples, the trained model does not generalize well.
Inside a neural network, the operations performed in each layer inevitably change the distribution of the signal passed on, so the inputs of later layers no longer follow the original distribution; moreover, small distribution shifts introduced by earlier layers accumulate and are amplified by the layers behind them. One way to address this is to reweight the training samples according to their proportion relative to the target samples. The BN algorithm (batch normalization) instead normalizes the inputs of some or all layers, thereby fixing the mean and variance of each layer's input signal.
## Method of use
Batch normalization is generally applied to y = Wx + b before the nonlinear mapping (the activation function), so that each dimension of the output has mean 0 and variance 1. Giving every layer a stably distributed input is beneficial to network training; a minimal placement sketch follows the list of advantages below.
BN is worth trying when the network converges too slowly or becomes untrainable because of exploding gradients. Its advantages:
- Reduces manual parameter tuning: dropout and the L2 regularization term can be removed, or a smaller L2 constraint can be used
- Relaxes the requirements on the learning rate
- Local response normalization (LRN, used in AlexNet) is no longer needed, since BN itself normalizes the network
- It also perturbs the original data distribution (each sample is normalized with its mini-batch statistics), which alleviates overfitting to a certain extent
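A minimal sketch of the placement described above (the layer sizes and the names `x`, `W`, `b`, `gamma`, `beta` are assumed here for illustration and are not taken from the original post):

```python
import tensorflow as tf

# Hypothetical fully connected layer
x = tf.placeholder(tf.float32, [None, 256])
W = tf.Variable(tf.random_normal([256, 128]))
b = tf.Variable(tf.zeros([128]))
gamma = tf.Variable(tf.ones([128]))   # learnable scale
beta = tf.Variable(tf.zeros([128]))   # learnable shift

z = tf.matmul(x, W) + b                      # y = Wx + b
z_mean, z_var = tf.nn.moments(z, axes=[0])   # per-dimension mini-batch statistics
z_bn = tf.nn.batch_normalization(z, z_mean, z_var, beta, gamma,
                                 variance_epsilon=0.001)
a = tf.nn.relu(z_bn)                         # the nonlinearity comes after BN
```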
## The calculation formula
The computation resembles ordinary standardization (subtract the mean, divide by the standard deviation), but BN computes the statistics over each mini-batch and then applies a learnable scale and shift.
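For reference, the standard per-mini-batch computation (following the original BN paper, with learnable parameters $\gamma$ and $\beta$) is:

$$
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2
$$

$$
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta
$$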
## Reference
For a detailed explanation of the BN principle, see the referenced BN study notes.
# BN with TF
## Components
BN in TensorFlow mainly involves two functions, `tf.nn.moments` and `tf.nn.batch_normalization`, which are used together: the former returns the mean and variance, and the latter performs the batch normalization.
## tf.nn.moments
The function signature in TensorFlow is:
```
moments(
    x,
    axes,
    shift=None,
    name=None,
    keep_dims=False
)
```

Returns two `Tensor` objects: `mean` and `variance`.
`x` is the input tensor and `axes` is a list of ints giving the dimensions over which the statistics are computed. The return values are two tensors, `mean` and `variance`. We use this function to compute the first two quantities needed by the BN algorithm.
```python
# Calculate the mean and variance of Wx_plus_b over the first three dimensions
img_shape = [128, 32, 32, 64]
Wx_plus_b = tf.Variable(tf.random_normal(img_shape))
axis = list(range(len(img_shape) - 1))  # [0, 1, 2]
wb_mean, wb_var = tf.nn.moments(Wx_plus_b, axis)
```
Sample run result (since the initial data is random, the output differs from run to run):
```
*** wb_mean ***
[ 1.05310767e-03  1.16801530e-03  4.95071337e-03 -1.50891789e-03
 -2.95298663e-03 -2.07848335e-03 ...  1.74984348e-03 -4.17272677e-04]
*** wb_var ***
[ 0.99813616  0.9983741   1.00014114  1.0012747   0.99496585  1.00168002
  ...  0.99983472  1.00523198]
```
Here we assumed an image tensor of shape [128, 32, 32, 64] (batch, height, width, channels); computing moments over axes [0, 1, 2] yields one mean and one variance per channel, i.e. two vectors of length 64.
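As a quick sanity check of the per-channel statistics, one could inspect the shapes of the tensors defined above (a sketch reusing `wb_mean` and `wb_var`):

```python
# Both statistics have one entry per channel of the last dimension (64)
print(wb_mean.get_shape())  # (64,)
print(wb_var.get_shape())   # (64,)
```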
## tf.nn.batch_normalization
The function signature in TensorFlow is:
```
batch_normalization(
    x,
    mean,
    variance,
    offset,
    scale,
    variance_epsilon,
    name=None
)
```
`x` is the input tensor; `mean` and `variance` are the values computed by `moments()`; `offset` and `scale` are initialized to 0 and 1 respectively; and `variance_epsilon` is set to a small number (e.g. 0.001) to avoid division by zero.
```python
scale = tf.Variable(tf.ones([64]))
offset = tf.Variable(tf.zeros([64]))
variance_epsilon = 0.001

Wx_plus_b0 = tf.nn.batch_normalization(Wx_plus_b, wb_mean, wb_var,
                                       offset, scale, variance_epsilon)

# The same normalization written out by hand
Wx_plus_b1 = (Wx_plus_b - wb_mean) / tf.sqrt(wb_var + variance_epsilon)
Wx_plus_b1 = Wx_plus_b1 * scale + offset
# Because of how the op is computed internally, the hand-written result
# differs slightly from the one returned by tf.nn.batch_normalization
```
Sample run result (the initial data is random, so the output differs from run to run, but a small numerical difference between the two computations is always present):
```
*** Wx_plus_b ***
[[[[ 3.32006335e-01 -1.00865233e+00  4.68401730e-01 ...
     1.66336447e-01  1.34183773e-01  1.18540943e+00]
   [-7.14844346e-01 -1.56187916e+00 -8.09686005e-01 ...
    -4.23679769e-01 -4.32125211e-01 -3.35091174e-01]
   ...
**** Wx_plus_b1 ****
[[[[ 3.31096262e-01 -1.01013660e+00  4.63186830e-01 ...
     1.64460197e-01  2.32336998e-01  1.18214881e+00]
   [-7.16206789e-01 -1.56353664e+00 -8.14172268e-01 ...
    -4.26598638e-01 -4.33694094e-01 -3.33635926e-01]
   ...
```
## Complete code
```python
# -*- coding: utf-8 -*-
# Created by FontTian
# http://blog.csdn.net/fontthrone

import tensorflow as tf
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Calculate the mean and variance of Wx_plus_b over the first three dimensions
img_shape = [128, 32, 32, 64]
Wx_plus_b = tf.Variable(tf.random_normal(img_shape))
axis = list(range(len(img_shape) - 1))  # [0, 1, 2]
wb_mean, wb_var = tf.nn.moments(Wx_plus_b, axis)

scale = tf.Variable(tf.ones([64]))
offset = tf.Variable(tf.zeros([64]))
variance_epsilon = 0.001

Wx_plus_b0 = tf.nn.batch_normalization(Wx_plus_b, wb_mean, wb_var,
                                       offset, scale, variance_epsilon)

# The same normalization written out by hand; because of how the op is
# computed internally, the result differs slightly from tf.nn.batch_normalization
Wx_plus_b1 = (Wx_plus_b - wb_mean) / tf.sqrt(wb_var + variance_epsilon)
Wx_plus_b1 = Wx_plus_b1 * scale + offset

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print('*** wb_mean ***')
    print(sess.run(wb_mean))
    print('*** wb_var ***')
    print(sess.run(wb_var))
    print('*** Wx_plus_b ***')
    print(sess.run(Wx_plus_b0))
    print('**** Wx_plus_b1 ****')
    print(sess.run(Wx_plus_b1))
```
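To see how small the discrepancy mentioned in the comment actually is, a check like the following could be appended to the script (a sketch that assumes the `Wx_plus_b0` and `Wx_plus_b1` tensors defined above):

```python
# Maximum absolute difference between the built-in op and the manual formula;
# it is a tiny floating-point residual rather than an exact zero
max_diff = tf.reduce_max(tf.abs(Wx_plus_b0 - Wx_plus_b1))
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(max_diff))
```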