Overview

This article is the first part of the "SkySeraph AI: From Practice to Theory" series. Using the classic MNIST dataset, the "Hello World" of the AI field, we implement CNN-based handwritten digit recognition on the Android platform with TensorFlow.


Practice

Environment

  • TensorFlow: 1.2.0
  • Python: 3.6
  • Python IDE: PyCharm 2017.2
  • Android IDE: Android Studio 3.0

Train & Evaluate (Python+TensorFlow)

The main purpose of the training and evaluation step is to generate the PB file used for testing; it saves the network topology and the parameter values learned during training, via the TensorFlow Python API. There are many ways to build the network: besides a CNN, an RNN, a fully connected network, and so on could also be used. tf.layers.conv2d uses tf.nn.conv2d as its back end; its filters parameter is an integer, while the filter parameter of tf.nn.conv2d is a 4-dimensional tensor. The prototypes are as follows.

convolutional.py:

def conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=init_ops.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, trainable=True, name=None, reuse=None)

gen_nn_ops.py:

def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", name=None)

The official demo uses the tf.layers module, with the following structure:

  • Convolutional Layer #1: 32 5×5 filters, with ReLU activation
  • Pooling Layer #1: 2×2 max pooling, stride 2
  • Convolutional Layer #2: 64 5×5 filters, with ReLU activation
  • Pooling Layer #2: 2×2 max pooling, stride 2
  • Dense Layer #1: 1,024 neurons, ReLU activation, dropout rate 0.4 (to avoid overfitting, 40% of the neurons are randomly dropped during training)
  • Dense Layer #2 (logits layer): 10 neurons, one per class (0-9)
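A quick sanity check on this structure is to track the feature-map shape through the stack. The sketch below assumes 'same' padding for the convolutions and 2×2/stride-2 pooling, as in the official tutorial; the function and variable names are illustrative, not from the original code:

```python
def shape_after(layers, h=28, w=28, c=1):
    """Track (height, width, channels) through the demo's layer stack.

    Assumes 'same' padding for convolutions (spatial size unchanged)
    and 2x2 pooling with stride 2 (spatial size halved).
    """
    for kind, param in layers:
        if kind == 'conv':      # 'same' padding keeps h and w
            c = param           # param = number of filters
        elif kind == 'pool':    # 2x2 pooling, stride 2, halves h and w
            h, w = h // 2, w // 2
    return h, w, c

stack = [('conv', 32), ('pool', 2), ('conv', 64), ('pool', 2)]
h, w, c = shape_after(stack)
print(h, w, c, h * w * c)  # 7 7 64 3136
```

The flattened 7×7×64 = 3,136 values are the input to Dense Layer #1.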

The complete convolution structure is defined in the cnn_model_fn(features, labels, mode) function.

Alternatively, the lower-level tf.nn.conv2d function can be used directly.
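To make the operation itself concrete, here is a minimal pure-Python sketch of the single-channel, stride-1, 'valid'-padding computation that tf.nn.conv2d performs (strictly speaking cross-correlation, since the kernel is not flipped; all names are illustrative):

```python
def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation, 'valid' padding, stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Sum of element-wise products of the window and the kernel.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 image convolved with a 2x2 kernel yields a 2x2 output.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]
print(conv2d_valid(img, k))  # [[6, 8], [12, 14]]
```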

Test (Android + TensorFlow)

  • The core is the API class TensorFlowInferenceInterface.java
  • Configure Gradle (or import the JAR compiled from TensorFlow source): compile 'org.tensorflow:tensorflow-android:1.2.0'
  • Copy the PB file into the assets directory and read it

    String actualFilename = labelFilename.split("file:///android_asset/")[1];
    Log.i(TAG, "Reading labels from: " + actualFilename);
    BufferedReader br = new BufferedReader(new InputStreamReader(assetManager.open(actualFilename)));
    String line;
    while ((line = br.readLine()) != null) {
        c.labels.add(line);
    }
    br.close();

  • TensorFlow interface

  • End result:

Theory

MNIST

MNIST, one of the most classic machine learning datasets, is a database of 28×28 single-channel grayscale images of handwritten digits 0-9, containing 60,000 training examples and 10,000 test examples.

The file layout is as follows; it mainly consists of four binary files: the training and test images and their labels.



The following is the binary structure of the training-image file. Before the actual data (pixels) come several description fields (the magic number, the number of images, and the numbers of rows and columns of each image), and the data is stored using the big-endian convention.

(Big-endian rule: the high-order bytes of a value are stored at the lower memory addresses, and the low-order bytes at the higher addresses.)



In the experiments, the raw data can be extracted with the unpack_from method of Python's struct library, which is designed for byte handling; the core call is:

struct.unpack_from(self._fourBytes2, buf, index)
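As a self-contained sketch (the helper name is illustrative, not the article's code), the 16-byte header of the image file can be parsed with big-endian format codes, where '>' selects big-endian byte order:

```python
import struct

def parse_idx_image_header(buf):
    """Parse the 16-byte big-endian header of an MNIST IDX image file.

    Returns (magic, num_images, num_rows, num_cols).
    '>IIII' means four unsigned 32-bit integers, big-endian.
    """
    return struct.unpack_from('>IIII', buf, 0)

# A synthetic header: magic 2051, 60000 images of 28x28 pixels.
header = struct.pack('>IIII', 2051, 60000, 28, 28)
print(parse_idx_image_header(header))  # (2051, 60000, 28, 28)
```

The pixel bytes follow immediately after the header, one unsigned byte per pixel.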

Alternatively, TensorFlow's tutorial helper loads the dataset directly:

mnist = input_data.read_data_sets('MNIST', one_hot=True)

Convolutional Neural Network

CNN Keys

  • CNN is short for Convolutional Neural Network.
  • Convolution is arguably the single most important operation in modern deep learning. It is a mathematical operation; [23] covers the underlying mathematics, deriving the definition of convolution both from the Fourier transform and from the Dirac delta function. Informally, it can be understood as flipping one factor, sliding it over the other, multiplying, and summing.
  • Convolution animation: a demonstration is shown below [26]; for more animations, see [27].
  • Neural network: a system made up of a large number of interconnected neurons, as shown below [21].



where x denotes the input vector, w the weights, b the bias, and f the activation function.

  • Activation function: commonly used nonlinear activation functions include Sigmoid, Tanh, and ReLU; their formulas are shown below.

    • Sigmoid drawbacks
      • Saturation kills the gradient (the neuron saturates near outputs of 0 or 1, and in these regions the gradient is almost zero)
      • The Sigmoid output is not zero-centered
    • Tanh: also saturates, but its output is zero-centered, so in practice tanh is preferred over sigmoid.
    • ReLU
      • Advantage 1: ReLU can greatly accelerate the convergence of SGD
      • Advantage 2: You only need a threshold to get the activation value instead of doing a lot of complicated (exponential) operations
      • Disadvantage: the learning rate must be set carefully to keep neurons from "dying" during training; Leaky ReLU, PReLU, Maxout, etc. can be used instead
  • Pooling: generally divided into mean (average) pooling and max pooling; max pooling is shown in the figure below [21]. There are also overlapping pooling [24] and spatial pyramid pooling [25].
    • Average pooling: take the mean of an image region as the pooled value for that region.
    • Max pooling: take the maximum of an image region as the pooled value for that region.
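The activation functions and pooling operations above can be sketched in a few lines of plain Python (a didactic sketch, not TensorFlow's implementation; names are illustrative):

```python
import math

def sigmoid(x):
    """Sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """ReLU: a simple threshold at zero, no exponentials needed."""
    return max(0.0, x)

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2."""
    return [[(fmap[i][j] + fmap[i][j + 1]
              + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
print(max_pool_2x2(fmap))  # [[7, 8], [9, 6]]
print(avg_pool_2x2(fmap))  # [[4.0, 5.0], [4.5, 3.0]]
```

For tanh, Python's math.tanh can be used directly; note how each 4×4 map is halved to 2×2 by the stride-2 pooling.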

CNN Architecture

  • Three-layer neural network: input layer, hidden layer, and output layer, as shown in the figure below [21]
  • CNN hierarchy: Stanford CS231n describes the pattern [INPUT - CONV - RELU - POOL - FC], as shown in the figure below [21]: input layer, convolutional layer, activation layer, pooling layer, and fully connected layer
  • The general CNN architecture consists of three kinds of layers:
    • Convolutional layers
    • Pooling layers
    • Dense (fully connected) layers
  • Animation demo. Reference [22].

Regression + Softmax

Supervised machine learning has two major families of algorithms: classification and regression. Classification predicts discrete values; regression predicts continuous values. The goal of regression is to build a regression equation for predicting a target value, and solving a regression means finding the equation's coefficients. Regression algorithms include Linear Regression, Logistic Regression, and so on. Softmax Regression generalizes Logistic Regression to multi-class classification; the classic example is MNIST handwritten digit classification.

Linear Regression

Linear Regression is the most basic model in machine learning; its goal is to make the predictions fit the target labels as closely as possible.

  • Multiple linear regression model definition
  • Multiple linear regression solution
  • Mean Square Error (MSE)
    • “Gradient Descent”
    • Normal Equation (least square method)
    • Locally Weighted Linear Regression (LWLR): plain linear regression tends to underfit, so LWLR introduces some bias into the estimate in order to reduce the mean squared prediction error.
    • Ridge regression and reduction methods
  • Selection: the Normal Equation costs more than Gradient Descent (it must compute the transpose and inverse of X), so it is only practical when the number of features is below roughly 100,000; above that, use gradient methods. When X is not invertible, ridge regression can be used instead. LWLR increases the amount of computation because every prediction must use the whole dataset rather than a fixed regression equation, so it is generally not chosen.
  • Tuning: balance prediction bias against model variance (high bias means underfitting; high variance means overfitting)
    • Get more training samples: fixes high variance
    • Try a smaller set of features: fixes high variance
    • Try obtaining additional features: fixes high bias
    • Try adding combination features: fixes high bias
    • Try decreasing λ: fixes high bias
    • Try increasing λ: fixes high variance
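As an illustrative sketch of the gradient-descent option above (toy data and a hypothetical function name, not from the article):

```python
def fit_linear_gd(xs, ys, lr=0.01, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated from y = 2x + 1; gradient descent should recover w≈2, b≈1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_gd(xs, ys)
print(round(w, 2), round(b, 2))  # ≈ 2.0 1.0
```

The Normal Equation would find the same coefficients in closed form, at the cost of a matrix inversion.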

Softmax Regression

  • Softmax Regression estimation function (Hypothesis)
  • Softmax Regression Cost Function
  • Softmax Regression & Logistic Regression:
    • Multi-class vs. binary classification: Logistic Regression is the special case of Softmax Regression with K = 2
    • For a K-class problem, use Softmax Regression when the classes are mutually exclusive, and K independent Logistic Regressions when they are not
  • Summary: Softmax Regression suits classification with more than two classes; here it is used to estimate the probability that each image is each of the digits.
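For MNIST this means turning the 10 logits of the final dense layer into class probabilities. A minimal sketch of the softmax function itself (illustrative code, not the article's):

```python
import math

def softmax(logits):
    """Convert a vector of logits into probabilities that sum to 1.

    Subtracting max(logits) first is the standard numerical-stability
    trick; it does not change the result.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Ten logits, one per digit class 0-9; the largest logit gets the
# highest probability, and predicting means taking the argmax.
logits = [0.1, 0.2, 0.1, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2]
probs = softmax(logits)
print(probs.index(max(probs)))  # 3
```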

References & Recommends

MNIST

Softmax

  • [11] Convex functions
  • [12] Machine Learning Lesson 7: Regularization
  • [13] MachineLearning_Python

CNN

  • [21] Stanford CS231n: Convolutional Neural Networks for Visual Recognition, course notes (translation)
  • [22] July's CNN notes: an intuitive understanding of convolutional neural networks
  • [23] Understanding Convolution
  • [24] ImageNet Classification with Deep Convolutional Neural Networks
  • [25] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
  • [26] Convolutional Neural Networks - Basics
  • [27] A technical report on convolution arithmetic in the context of deep learning

TensorFlow+CNN / TensorFlow+Android

  • [31] Google official demo
  • [32] Google official Codelab
  • [33] deep-learning-cnns-in-tensorflow (GitHub)
  • [34] tensorflow-classifier-android
  • [35] creating-custom-model-for-android-using-tensorflow
  • [36] TF-NN MNIST example











Android+TensorFlow+CNN+MNIST Handwritten Number Recognition


Copyright statement

Author: SkySeraph (Bob), skyseraph.com
Licensed under the Creative Commons BY-NC-ND 4.0 International License.



Permanent link: skyseraph.com/2018/01/10/…