An image is essentially a grid of pixel values, and this simple representation is what lets computer scientists and researchers build neural networks, loosely inspired by the human brain, that perform specific visual tasks. In some cases these networks even exceed human accuracy.


The image above is a good example of how an image is represented by pixel values. These small blocks of pixels are the basic input that a convolutional neural network operates on.


Convolutional neural networks are very similar to ordinary neural networks: both are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a dot product (a scalar) with its weights, and optionally applies a non-linearity. The whole network still expresses a single differentiable score function, from the raw image pixels at one end to class probabilities at the other. It still has a loss function (e.g. SVM or Softmax) on the last, fully connected layer, and the various training techniques developed for ordinary neural networks still apply.
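As a minimal illustration (the numbers below are made up), a single neuron's computation boils down to a dot product plus a bias, followed by an optional non-linearity:

import numpy as np

x = np.array([0.2, 0.5, 0.1])    # inputs to the neuron (made-up values)
w = np.array([0.4, -0.6, 0.9])   # learnable weights
b = 0.1                          # learnable bias

z = np.dot(w, x) + b             # dot product plus bias: a single scalar
output = max(0.0, z)             # optional non-linearity, here a ReLU
print(output)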


How convolution works. Each pixel is replaced by a weighted sum of surrounding pixels, and the neural network learns these weights.
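A minimal NumPy sketch of that weighted-sum idea (the 3×3 kernel below is hand-picked for illustration; in a ConvNet its values would be learned):

import numpy as np

def convolve2d(image, kernel):
    """Replace each pixel by a weighted sum of its neighbourhood (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

image = np.random.rand(5, 5)              # a toy 5x5 grayscale image
kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])         # an example 3x3 weight matrix
print(convolve2d(image, kernel))          # a 3x3 feature map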


More recently, as data volumes and computing power have increased dramatically, ConvNets have performed well in areas such as face recognition, object recognition, traffic sign recognition, robotics and autonomous driving.



The following figure shows the four main operations in ConvNet:


1. Convolution

2. Nonlinearity (e.g. ReLU)

3. Pooling or sub-sampling

4. Classification


A picture of a car is passed through the ConvNet, and the fully connected layer outputs the "car" category.


The All Convolutional Network


Most modern convolutional neural networks (CNNs) for object recognition are built on the same principle: alternating convolution and max-pooling layers, followed by a small number of fully connected layers. The paper referenced below showed that max-pooling can simply be replaced by a convolution layer with a larger stride, without any loss of accuracy on image recognition benchmarks. Another interesting idea in the paper is replacing the fully connected layers with global average pooling.


If you want to learn more about the all-convolutional network, refer to the paper: https://arxiv.org/abs/1412.6806#
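As a rough sketch of the second substitution, here are two classification heads side by side, written in the same Keras 1 style API used later in this post (the input shape and layer sizes are arbitrary, and channels-first ordering is assumed, as in the model below):

from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dense, Activation, GlobalAveragePooling2D

# Conventional head: flatten the feature maps and classify with a fully connected layer.
fc_head = Sequential()
fc_head.add(Convolution2D(10, 3, 3, border_mode='same', input_shape=(3, 8, 8)))
fc_head.add(Flatten())
fc_head.add(Dense(10))
fc_head.add(Activation('softmax'))

# All-convolutional head: one feature map per class, averaged over all spatial positions.
gap_head = Sequential()
gap_head.add(Convolution2D(10, 3, 3, border_mode='same', input_shape=(3, 8, 8)))
gap_head.add(GlobalAveragePooling2D())
gap_head.add(Activation('softmax'))

The global average pooling head adds no parameters of its own, which is part of its appeal.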


Getting rid of the fully connected layer is perhaps not a surprise, since the idea has been around for a while. Not long ago, Yann LeCun even wrote on Facebook that he never used fully connected layers in the first place.


This makes sense. The only difference between a fully connected layer and a convolution layer is that the neurons of the latter are connected only to a local region of the input, and many neurons in a convolution layer share parameters. Nevertheless, neurons in both kinds of layer still compute dot products, so their functional form is identical. It is therefore possible to convert between fully connected layers and convolution layers, and sometimes a fully connected layer can simply be replaced by a convolution layer.
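A tiny NumPy check of that equivalence for a single neuron (the sizes are made up): a fully connected neuron applied to a flattened patch computes exactly the same number as a convolution whose kernel covers the whole patch.

import numpy as np

patch = np.random.randn(4, 4)     # a 4x4 single-channel input
weights = np.random.randn(4, 4)   # one neuron's weights, reshaped to match the patch

fc_out = patch.ravel().dot(weights.ravel())   # fully connected: dot product with the flattened input
conv_out = np.sum(patch * weights)            # convolution with a 4x4 kernel and no padding

print(np.isclose(fc_out, conv_out))           # True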


As mentioned above, the next step is to remove spatial pooling from the network. This might cause some confusion, so let’s look at the concept in detail.


Spatial pooling, also known as subsampling or downsampling, reduces the dimensions of each feature map while retaining the most important information.



Take max pooling as an example: we define a spatial window and keep the maximum element of the feature map within it. Now recall the figure above (how convolution works). Intuitively, a convolution layer with a larger stride can serve as a subsampling layer, making the input representation smaller and more manageable. It also reduces the number of parameters and the amount of computation in the network, which helps control overfitting.
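A small sketch of that substitution, again in the Keras 1 style used below (channels-first input, arbitrary filter counts): both variants halve the spatial resolution, but the strided convolution learns how to do the downsampling.

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D

# Downsampling with max pooling
pooled = Sequential()
pooled.add(Convolution2D(96, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
pooled.add(MaxPooling2D(pool_size=(2, 2)))

# Downsampling with a stride-2 convolution instead
strided = Sequential()
strided.add(Convolution2D(96, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
strided.add(Convolution2D(96, 3, 3, border_mode='same', subsample=(2, 2)))

print(pooled.output_shape)   # (None, 96, 16, 16) with channels-first ordering
print(strided.output_shape)  # (None, 96, 16, 16)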


To reduce the size of the representation, using a larger stride in the convolution layer is often the better choice. Discarding pooling layers has also proven important for training generative models such as variational autoencoders (VAEs) and generative adversarial networks (GANs). It seems likely that future architectures will have very few or no pooling layers at all.


With all of the above points in mind, we have released a Keras implementation of the all-convolutional network on GitHub: https://github.com/MateLabs/All-Conv-Keras


Import libraries and dependencies



from __future__ import print_function
import tensorflow as tf
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Activation, Convolution2D, GlobalAveragePooling2D
from keras.layers import merge          # used below to concatenate the per-GPU outputs
from keras.utils import np_utils
from keras.optimizers import SGD
from keras import backend as K
from keras.models import Model
from keras.layers.core import Lambda
from keras.callbacks import ModelCheckpoint
import pandas



Train on multiple GPUs


For a multi-GPU implementation of the model, we have a custom function that distributes the training data across the available GPUs.


The computation is done on the GPUs, and the outputs are merged on the CPU to complete the model.


def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        # Cut out this GPU's share of the batch (old TF concat signature: axis comes first).
        shape = tf.shape(data)
        size = tf.concat(0, [shape[:1] // parts, shape[1:]])
        stride = tf.concat(0, [shape[:1] // parts, shape[1:] * 0])
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    # Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:
                inputs = []
                # Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape,
                                     arguments={'idx': i, 'parts': gpu_count})(x)
                    inputs.append(slice_n)

                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # Merge the per-GPU outputs on the CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(merge(outputs, mode='concat', concat_axis=0))

        return Model(input=model.inputs, output=merged)


Set the batch size, the number of classes, and the number of epochs


Since we are using the CIFAR-10 dataset, which has 10 classes (types of objects), the number of classes is 10 and the batch size is 32. The number of epochs depends on your available time and your machine's computing power; here we train for 1000 epochs.


The image size is 32 × 32, with 3 color channels (RGB).


batch_size = 32
nb_classes = 10
nb_epoch = 1000
rows, cols = 32, 32
channels = 3


The dataset comes split into a training set and a test set; the test set is also used for validation during training.



(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
print(X_train.shape[1:])

# Convert class vectors to one-hot class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)


Build the model



model = Sequential()

model.add(Convolution2D(96, 3, 3, border_mode='same', input_shape=(3, 32, 32)))
model.add(Activation('relu'))
model.add(Convolution2D(96, 3, 3, border_mode='same'))
model.add(Activation('relu'))

# The next layer is the substitute for max pooling: a strided convolution layer
# that reduces the spatial dimensions of the image.
model.add(Convolution2D(96, 3, 3, border_mode='same', subsample=(2, 2)))
model.add(Dropout(0.5))

model.add(Convolution2D(192, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(192, 3, 3, border_mode='same'))
model.add(Activation('relu'))

# Again, a strided convolution layer replaces max pooling.
model.add(Convolution2D(192, 3, 3, border_mode='same', subsample=(2, 2)))
model.add(Dropout(0.5))

model.add(Convolution2D(192, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(192, 1, 1, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(10, 1, 1, border_mode='valid'))

# Global average pooling replaces the fully connected layers.
model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))

model = make_parallel(model, 4)

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])


Print a summary of the model. This gives you an overview of the model and is very helpful for checking its dimensions and the number of parameters.



print(model.summary())


Data augmentation



datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by the std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=0,                     # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,                # randomly flip images horizontally
    vertical_flip=False)                  # randomly flip images vertically

datagen.fit(X_train)


Save the model's best weights and add checkpoints



        filepath="weights.{epoch:02d}-{val_loss:.2f}.hdf5"Copy the code
        checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='max')Copy the code
        callbacks_list = [checkpoint]Copy the code
        # Fit the model on the batches generated by datagen.flow().Copy the code
        history_callback = model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size), samples_per_epoch=X_train.shape[0], nb_epoch=nb_epoch, validation_data=(X_test, Y_test), callbacks=callbacks_list, verbose=0)Copy the code


Finally, get a log of the training session and save your model



pandas.DataFrame(history_callback.history).to_csv("history.csv")
model.save('keras_allconv.h5')


The above model easily reaches over 90% accuracy within the first 350 epochs. If you want to push the accuracy higher, you can trade extra computation time for more aggressive data augmentation.
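For example, one could widen the augmentation ranges along these lines (the values and the stronger_datagen name below are only a suggestion for illustration, not settings taken from the repository):

from keras.preprocessing.image import ImageDataGenerator

# A more aggressive augmentation setup (hypothetical values; tune to your time budget)
stronger_datagen = ImageDataGenerator(
    rotation_range=15,        # small random rotations
    width_shift_range=0.15,   # wider horizontal shifts
    height_shift_range=0.15,  # wider vertical shifts
    horizontal_flip=True)     # CIFAR-10 objects usually still look valid when mirrored

# Then pass stronger_datagen.flow(X_train, Y_train, batch_size=batch_size)
# to fit_generator exactly as above.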