An image is essentially a grid of pixel values, and this simple representation is what lets computer scientists and researchers build neural networks that, loosely inspired by the human brain, perform specific visual tasks. On some of these tasks, such networks can even exceed human accuracy.
The image above is a good example of how images are represented by pixel values. These small blocks of pixel values are the raw input that a convolutional neural network operates on.
Convolutional neural networks are very similar to ordinary neural networks: both are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a dot product (a scalar), and optionally applies a nonlinearity. The whole network still expresses a single differentiable score function, with raw image pixels input at one end and class probabilities output at the other. The network still has a loss function (e.g., SVM/Softmax) on the last (fully connected) layer, and all of the tips and tricks developed for training ordinary neural networks still apply.
Figure 2: How convolution works. Each pixel is replaced by a weighted sum of its surrounding pixels, and the neural network learns these weights.
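To make "a weighted sum of surrounding pixels" concrete, here is a minimal NumPy sketch; the 3×3 kernel values below are invented for illustration, whereas in a ConvNet they would be learned weights:

import numpy as np

image = np.random.rand(5, 5)        # a tiny 5x5 grayscale "image"
kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])   # example 3x3 weights (a ConvNet learns these)

# The output value at position (2, 2) is the weighted sum of the
# 3x3 neighbourhood centred on the input pixel at (2, 2).
patch = image[1:4, 1:4]
print(np.sum(patch * kernel))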
More recently, as data volumes and computing power have increased dramatically, ConvNets have performed remarkably well in areas such as face recognition, object recognition, traffic sign recognition, robotics, and autonomous driving.
The following figure shows the four main operations in ConvNet:
1. Convolution
2. Nonlinearity (e.g. ReLU)
3. Pooling or sub-sampling
4. Classification
A picture of a car passes through the ConvNet, and the category "car" is output at the fully connected layer
The All Convolutional Network
Most modern convolutional neural networks (CNNs) for object recognition are built on the same principle: alternating convolution and max-pooling layers, followed by a small number of fully connected layers. A previous paper proposed that max-pooling can simply be replaced by a convolution layer with increased stride, without any loss of accuracy on image recognition benchmarks. Another interesting idea in the paper is replacing the fully connected layer with global average pooling.
If you want to learn more about the all-convolutional network, you can refer to the paper: https://arxiv.org/abs/1412.6806#
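As a rough illustration of the contrast (a sketch only, using the same Keras 1.x API as the code later in this post, with arbitrary layer sizes), the two design patterns look like this:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Activation, GlobalAveragePooling2D

# Classic pattern: convolution + max pooling + fully connected classifier
classic = Sequential()
classic.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
classic.add(Activation('relu'))
classic.add(MaxPooling2D(pool_size=(2, 2)))      # downsampling by pooling
classic.add(Flatten())
classic.add(Dense(10))                           # fully connected classifier
classic.add(Activation('softmax'))

# All-convolutional pattern: a strided convolution replaces max pooling,
# and global average pooling replaces the fully connected layer
all_conv = Sequential()
all_conv.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
all_conv.add(Activation('relu'))
all_conv.add(Convolution2D(32, 3, 3, border_mode='same', subsample=(2, 2)))  # stride 2
all_conv.add(Convolution2D(10, 1, 1))            # 1x1 conv producing 10 class maps
all_conv.add(GlobalAveragePooling2D())           # one value per class map
all_conv.add(Activation('softmax'))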
Getting rid of the fully connected layer is perhaps not a surprise; people have been questioning it for a long time. Not long ago, Yann LeCun even wrote on Facebook that he never used fully connected layers in the first place.
This makes sense. The only difference between a fully connected layer and a convolution layer is that the neurons of the latter are connected only to a local region of the input, and many of them share parameters. Nevertheless, neurons in both layers still compute dot products, so their functional form is identical. It is therefore possible to convert between fully connected and convolution layers, and sometimes to replace a fully connected layer with a convolution layer outright.
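As a concrete sketch of this equivalence (the sizes here are hypothetical, borrowed from a typical VGG-style layer rather than this post's model): a fully connected layer with 4096 units sitting on a 7×7×512 feature map computes exactly the same dot products as a convolution whose 7×7 kernel spans the entire map:

from keras.models import Sequential
from keras.layers import Flatten, Dense, Convolution2D

# Fully connected view: flatten the 7x7x512 volume, then one dot
# product per output unit.
fc_view = Sequential()
fc_view.add(Flatten(input_shape=(512, 7, 7)))
fc_view.add(Dense(4096))

# Convolutional view: a 7x7 kernel spanning the whole input gives a
# 1x1 output volume with 4096 channels -- the same weights and the
# same dot products, just arranged as a convolution.
conv_view = Sequential()
conv_view.add(Convolution2D(4096, 7, 7, border_mode='valid', input_shape=(512, 7, 7)))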
As mentioned above, the next step is to remove spatial pooling from the network. This might cause some confusion, so let's look at the concept in detail.
Spatial pooling, also known as subsampling or downsampling, reduces the dimensionality of each feature map while retaining the most important information.
Let's take max pooling as an example. We slide a spatial window over the feature map and keep the largest element inside it. Now recall Figure 2 (how convolution works). Intuitively, a convolution layer with a larger stride can serve as a subsampling (downsampling) layer, making the input representation smaller and more manageable. It also reduces the number of parameters and the amount of computation in the network, which in turn helps control overfitting.
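Here is a small NumPy illustration of max pooling (with invented values): a 2×2 window with stride 2 keeps the largest element of each window, shrinking a 4×4 feature map to 2×2. A convolution with subsample=(2, 2), as used in the model below, achieves the same spatial reduction but with learned weights instead of a fixed max:

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 6, 8]])

# 2x2 max pooling with stride 2: take the maximum of each 2x2 block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]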
Using a larger stride in the convolution layer is often the best way to reduce the representation size. Discarding pooling layers is also important when training generative models such as variational autoencoders (VAEs) or generative adversarial networks (GANs). It seems likely that future neural network architectures will have very few or no pooling layers at all.
With all of the above tips and tweaks in mind, we have released a Keras implementation of the all-convolutional network on GitHub: https://github.com/MateLabs/All-Conv-Keras
Import libraries and dependencies
from __future__ import print_function
import tensorflow as tf
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Activation, Convolution2D, GlobalAveragePooling2D
from keras.layers import merge  # used below to concatenate the per-GPU outputs
from keras.utils import np_utils
from keras.optimizers import SGD
from keras import backend as K
from keras.models import Model
from keras.layers.core import Lambda
from keras.callbacks import ModelCheckpoint
import pandas
Train on multiple GPUs
For a multi-GPU implementation of the model, we use a custom function that distributes the training data across the available GPUs.
The computation is done on the GPUs, and the outputs are gathered on the CPU to assemble the final model.
def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat(0, [shape[:1] // parts, shape[1:]])
        stride = tf.concat(0, [shape[:1] // parts, shape[1:] * 0])
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    # Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:
                inputs = []

                # Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx': i, 'parts': gpu_count})(x)
                    inputs.append(slice_n)

                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # Merge the per-GPU outputs on the CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(merge(outputs, mode='concat', concat_axis=0))

        return Model(input=model.inputs, output=merged)
Set the batch size, the number of classes, and the number of epochs
Since we are using the CIFAR-10 dataset, which has 10 classes (types of object), the number of classes is 10 and the batch size is 32. The number of epochs depends on your available time and the computing power of your device; in this case we train for 1,000 epochs.
The image size is 32 × 32, with 3 color channels (RGB).
batch_size = 32
nb_classes = 10
nb_epoch = 1000
rows, cols = 32, 32
channels = 3
The dataset is split into a training set and a test set; the test set also serves as validation data during training.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
print(X_train.shape[1:])
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
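to_categorical converts each integer label into a one-hot vector of length nb_classes, which is the target format categorical cross-entropy expects. For example:

from keras.utils import np_utils

# Label 3 out of 10 classes becomes a one-hot row vector:
print(np_utils.to_categorical([3], 10))
# [[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]]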
Build the model
model = Sequential()
model.add(Convolution2D(96, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
model.add(Activation('relu'))
model.add(Convolution2D(96, 3, 3, border_mode='same'))
model.add(Activation('relu'))

# The next layer is the substitute for max pooling: a strided convolution
# layer reduces the spatial dimensionality of the image.
model.add(Convolution2D(96, 3, 3, border_mode='same', subsample=(2, 2)))
model.add(Dropout(0.5))
model.add(Convolution2D(192, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(192, 3, 3, border_mode='same'))
model.add(Activation('relu'))

# Again, a strided convolution layer replaces max pooling.
model.add(Convolution2D(192, 3, 3, border_mode='same', subsample=(2, 2)))
model.add(Dropout(0.5))
model.add(Convolution2D(192, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(192, 1, 1, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(10, 1, 1, border_mode='valid'))

model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))

# Distribute the model across 4 GPUs (see make_parallel above).
model = make_parallel(model, 4)

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
Print the model. This gives you an overview of the model, which is very helpful for visualizing its dimensions and the number of parameters.
print(model.summary())
Data augmentation
datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=0,                     # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,                # randomly flip images horizontally
    vertical_flip=False)                  # randomly flip images vertically

datagen.fit(X_train)
Save the best weights in your model and add checkpoints
filepath = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='max')
callbacks_list = [checkpoint]
# Fit the model on the batches generated by datagen.flow().
history_callback = model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size), samples_per_epoch=X_train.shape[0], nb_epoch=nb_epoch, validation_data=(X_test, Y_test), callbacks=callbacks_list, verbose=0)
Finally, get a log of the training session and save your model
pandas.DataFrame(history_callback.history).to_csv("history.csv")
model.save('keras_allconv.h5')
The above model easily reaches over 90% accuracy within the first 350 epochs. If you want to push accuracy higher, you can sacrifice computation time and train with more (augmented) data.