Author | Chen Long    Editor | Jing Lou

Chen Long is an Android development engineer in Tencent's Jitong Product Department, responsible for the development and maintenance of Android QQ. He is keen on machine learning research and sharing.

It is very easy for humans to recognize handwritten numbers. Without even thinking about it, we can see that the numbers below are 5, 0, 4, 1.

But getting a machine to recognize the numbers is much more difficult.

If you were to write a program in a traditional programming language such as Java to recognize these differently shaped digits, what would you do? You might write many methods to detect basic shapes such as horizontal strokes, vertical strokes and circles, and then calculate their relative positions. I think you would soon sink into despair. Even after spending a lot of time on such a program, it would not be accurate. While you struggle in the dark room of traditional programming methods, the higher-order approach of machine learning opens a window.

To find a way to recognize handwritten digits, Yann LeCun, a master of machine learning, used the handwriting database of NIST (National Institute of Standards and Technology) to construct a subset, MNIST, that is convenient for machine learning research. MNIST consists of 70,000 grayscale images of handwritten digits (0~9) written by many different people; 60,000 of them form the training set and the other 10,000 form the test set. Each image is 28 x 28 pixels, and the digit itself occupies a 20 x 20 region centered in the image. More detailed information is available on Yann LeCun's website: yann.lecun.com/exdb/mnist/

Many researchers have made use of this data set to conduct research on handwritten number recognition, and many methods have been proposed, such as KNN, SVM, neural network, etc., with the accuracy reaching the human level.

So let's go back to the drawing board and figure out how to use machine learning to recognize these handwritten digits. Since the digits range only from 0 to 9, for any given image we need to determine which of the digits 0 to 9 it is, so this is a classification problem. We can think of the original image as a 28 x 28 matrix, or more simply as a one-dimensional array of length 784. Treating the image as a one-dimensional array throws away its two-dimensional structure, but let's do that for the moment and see what accuracy we can achieve. With this analysis, it is natural to think that softmax regression can solve the problem. For softmax regression, please refer to the following article:

ufldl.stanford.edu/wiki/index….
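Flattening a 28 x 28 image into a length-784 vector is just a reshape; here is a minimal sketch with NumPy (the zero array stands in for a real MNIST image):

import numpy as np

image = np.zeros((28, 28), dtype=np.float32)  # stand-in for one 28 x 28 grayscale image
flat = image.reshape(784)                     # the 1-D vector of length 784 used as model input
print(flat.shape)                             # (784,)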

Our model is as follows:

For a given image, we compute the probability that it is each of the digits 0 through 9, and the digit with the highest probability is what we take the image to be. We give each pixel 10 weights, one for each of the digits 0 to 9, so our weight matrix is 784 x 10. The model in the figure above can be expressed by the following formula:

Write it as a vector:

The softmax function normalizes n real values into values between 0 and 1 that sum to 1, forming a probability distribution.
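In symbols (a sketch consistent with the TensorFlow code below, where x is a 1 × 784 row vector of pixel values, W is the 784 × 10 weight matrix and b is a 10-dimensional bias vector):

y = \mathrm{softmax}(xW + b), \qquad \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}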

So now that we have the model, what is our cost function? How do we evaluate the gap between the model's output and the real value? We can represent the true digit as a 10-dimensional vector in which the element at the digit's index is 1 and the rest are 0, as shown below:

For example, 1 can be expressed as: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]. In this way, we can use cross-entropy to measure the gap between the output of the model and the real value, which is defined as:
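A sketch of the standard definition:

H_{y'}(y) = -\sum_{i} y'_i \log(y_i)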

Here y is the probability distribution output by the model, and y′ is the true value, which you can also view as a probability distribution in which one value is 1 and all the others are 0. The cost function is obtained by summing the cross-entropy over every sample in the training set. This function can be minimized by gradient descent.

For cross-entropy, see the following article:

colah.github.io/posts/2015-…

Now that we have the model and cost function, let’s use TensorFlow to implement it as follows:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets('input_data/', one_hot=True)

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b

y_ = tf.placeholder(tf.float32, [None, 10])

# Define loss and optimizer
# The raw formulation of cross-entropy,
#
#   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
#                                 reduction_indices=[1]))
#
# can be numerically unstable.
#
# So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
# outputs of 'y', and then average across the batch.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Train
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))

Let me explain some of the more important code:

 mnist = input_data.read_data_sets('input_data/', one_hot=True)

This line of code downloads (if not already downloaded) and reads MNIST's training set, test set and validation set (the validation set can be ignored for now). The input_data module is included in the TensorFlow installation package, so we can import it directly. mnist.test.images and mnist.test.labels give the test-set images and labels. The method provided by TensorFlow takes 5,000 samples out of the training set as the validation set, so the sizes of the training set, test set and validation set are 55,000, 10,000 and 5,000 respectively.
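As a quick sanity check of those split sizes (assuming the dataset has already been downloaded):

print(mnist.train.num_examples)       # 55000
print(mnist.validation.num_examples)  # 5000
print(mnist.test.num_examples)        # 10000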

‘input_data/’ is the name of the folder where you store the downloaded dataset.

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Here we simply initialize all the parameters to 0. There are many tricks to initializing parameters in complex models.
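For example, one common alternative (not used in this article) is to start the weights from small random values instead of zeros; a sketch:

W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))  # small random initial weights
b = tf.Variable(tf.constant(0.1, shape=[10]))                # small positive initial bias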

y = tf.matmul(x, W) + b

This line is the model we built, a very simple one. tf.matmul performs matrix multiplication.
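The shapes line up as follows (a sketch, with None standing in for the batch size):

# x: [None, 784], W: [784, 10], b: [10]
# tf.matmul(x, W) has shape [None, 10]; adding b broadcasts it across the batch
y = tf.matmul(x, W) + b  # unnormalized scores (logits) for the 10 digits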

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

This line is equivalent to:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)), reduction_indices=[1]))

Here y_ * tf.log(tf.nn.softmax(y)) computes the cross-entropy for each sample. Since this calculation is numerically unstable, tf.nn.softmax_cross_entropy_with_logits is used instead.
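If you do want to compute it manually, a common workaround (a sketch, not part of the original code) is to clip the softmax output away from zero before taking the log:

cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(tf.nn.softmax(y), 1e-10, 1.0)),
                   reduction_indices=[1]))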

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In this line, the gradient descent method provided by TensorFlow is used to adjust parameters in the process of minimizing the cost function, and the learning rate is set to 0.5.

for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

These three lines do the training. Since the training set contains 55,000 samples, which is too many to process at once, stochastic gradient descent is adopted; it greatly reduces the amount of computation while still training the parameters effectively so that they converge. The mnist.train.next_batch method randomly selects 100 samples from the training set for each training step, and the number of iterations is 1000.
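As a quick check on the amount of training: 1000 iterations × 100 samples per batch = 100,000 samples, which is roughly 1.8 passes over the 55,000-sample training set.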

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

These three lines test the accuracy of the trained model on the test set.
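For intuition, here is a hypothetical three-sample batch (the values are made up purely for illustration):

# tf.argmax(y, 1)    -> [7, 2, 1]           predicted digits
# tf.argmax(y_, 1)   -> [7, 2, 6]           true digits
# correct_prediction -> [True, True, False]
# tf.cast(...)       -> [1.0, 1.0, 0.0]
# accuracy           -> about 0.67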

Running our code results in the following:

As you can see, such a simple model can achieve an accuracy of 92%. Amazing~