This is the official classic example of using TensorFlow to recognize handwritten digits. The data comes ready-made, but the official example stops at computing the model's accuracy; with no actual images to try, beginners are left confused after reading it. So the second part of this article uses some real pictures to verify our model. The examples in this article are based on TensorFlow 1.0.0.

The first part introduces some basics of data handling, then builds a simple model, trains it on one set of labeled data, and afterwards evaluates the model's accuracy on another set (this is also what the official example covers). It's important to work through this, because otherwise the official example will look impressive but you won't know what it actually does.

In the second part, we take a couple of images, tell the model what we think each image is (arbitrarily, of course), and the model tells us whether its judgment agrees with ours.

If many of the terms and mathematical algorithms don't make sense yet, don't worry; look them up as you go. First run an example to get a feel for it.

Some of the images in this article are from official documents.

Recognizing handwritten digits

Because this is the official TensorFlow example, I won't go into too much detail, only adding a little personal understanding. The English documentation is up to date, while the Chinese documentation covers TensorFlow 0.5, whose code can't run on version 1.0. I suggest reading the Chinese and English documentation side by side, which helps with understanding.

Preparing the data

The handwritten images used for recognition look something like this; each image is 28 by 28 pixels to keep the complexity down.

But we can't just throw an image at our model; the model doesn't understand it, so we have to process the image first.

If you know linear algebra, you know that an image's pixels can be represented as a two-dimensional matrix, on which you can perform all kinds of transformations, such as flipping. So your handwritten image might look something like this:

This matrix is flattened into a vector of length 28 × 28 = 784. We also need a way to tell the model what we think the image is, i.e. to label it. A label is an array of 10 elements, exactly one of which is 1 and the rest 0; the position of the 1 indicates which digit the image shows. For example, the label for the digit 8 is [0,0,0,0,0,0,0,0,1,0].
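To make this concrete, here is a toy sketch in plain numpy (the 28 × 28 array img is a stand-in, not a real image from the dataset):

  import numpy as np

  img = np.zeros((28, 28))      # stand-in for a handwritten-digit image
  vec = img.reshape(784)        # flatten 28*28 pixels into one 784-long vector

  label = np.zeros(10)
  label[8] = 1                  # one-hot label for the digit 8
  # label is now [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]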

That is the processing for a single image. In practice, to train the model efficiently, the image data and label data are each packed together, forming the MNIST dataset.

The MNIST dataset

The MNIST dataset is an entry-level computer vision dataset, available on Yann LeCun's website. There is no need to download it manually; TensorFlow 1.0 downloads it automatically.

The training set holds 55,000 images, organized as a tensor of shape [55000, 784], as shown below:

Remember why it's 784: because 28 × 28. The same goes for the labels, whose shape is [55000, 10]:

In addition to the training data, the dataset contains 10,000 images for testing the model's accuracy.
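Once the dataset has been loaded (with read_data_sets, shown in the training code further below), these shapes are easy to verify; a quick sanity check might look like this:

  print(mnist.train.images.shape)   # (55000, 784)
  print(mnist.train.labels.shape)   # (55000, 10)
  print(mnist.test.images.shape)    # (10000, 784)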

The entire dataset looks something like this:

Now that we have the data, let’s look at our model.

Softmax regression model

Softmax is called the normalized exponential function in Chinese. The model can be used to assign probabilities to the different possibilities, for example when judging the image below:

it might think there is an 80% chance the digit is a 9 and a 5% chance it is an 8, because both digits have a circle on top.

We take a weighted sum over the image's pixel values. For example, if a pixel is strong evidence that the image is not a 1, its corresponding weight is negative; conversely, if the pixel is particularly favorable evidence, the weight is positive.

In the figure below, the red areas represent negative weights and the blue represent positive weights.

There is also a bias term, to capture evidence that is independent of the input and reduce irrelevant interference.
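In formulas, following the official tutorial's notation, the model first gathers the weighted evidence for each digit i and then turns it into probabilities with softmax:

  \text{evidence}_i = \sum_j W_{i,j} x_j + b_i, \qquad
  y = \text{softmax}(\text{evidence}), \qquad
  \text{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}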

That is roughly the principle of the softmax regression model. For the full derivation, refer to the official documentation.

After all that talk, we’re finally ready to code.

Training the model

I won't list all the imports here; the main two are TensorFlow itself and the helper used in the official example to load the input data.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

And then we can build our model.

  # Create the model
  x = tf.placeholder(tf.float32, [None, 784])
  W = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.matmul(x, W) + b

The x here is a placeholder, which only becomes useful once data is fed into it.

  • x is read from the image data files; think of it as the input value. Because each image is 28 × 28, the second dimension is 784, and None means any number of images can be fed in at once;
  • W stands for the weights, with shape [784, 10]: 784 pixels, with one weight per pixel for each of the 10 digits. The model adjusts this value continuously as it runs, so think of it as a variable;
  • b is the bias; every digit has its own bias, so the shape is [10]. It too is adjusted as the model runs, so it is also a variable;
  • y is computed as the matrix product of the previous values (see the shape check below).
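As an illustrative aside (not in the original example), the shape of y confirms that the matrix product works out:

  print(y.get_shape())   # (?, 10): ten scores, one per digit, for each image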

tf.zeros creates a tensor filled with zeros.

We also need a placeholder to take the correct answers, i.e. the true labels fed in during training.

  # Define loss and optimizer
  y_ = tf.placeholder(tf.float32, [None, 10])

We use something called cross entropy to measure the “surprise” of our predictions.

By analogy: when we write code and it compiles and runs on the first try, we're not that "surprised", because our code is just that good; but if we hit compile and the whole thing crashes, we're very "surprised", staring blankly and wondering how that's possible.

Intuitively, that is what cross entropy expresses: when the output matches our expectation, the "surprise" value is low; when it doesn't, the "surprise" value is high.
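The standard definition, as given in the official tutorial, where y is our predicted probability distribution and y' is the true distribution (the one-hot label):

  H_{y'}(y) = -\sum_i y'_i \log(y_i)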

  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

Here, TensorFlow's built-in softmax implementation is used to compute the cross entropy. The cross entropy is what we want to minimize, so that the output matches our expectations and doesn't surprise us too much.

TensorFlow automatically uses a backpropagation algorithm to effectively determine how variables affect the cross entropy you want to minimize. TensorFlow then continuously modifies the variables to reduce the cross entropy using the optimization algorithm of your choice.

  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

Here, the gradient descent algorithm is used to minimize the cross entropy, nudging it down a little at a time with a learning rate of 0.5.
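As an aside (a variation of my own, not part of the official example), other optimizers from tf.train can be swapped in the same way:

  # e.g. Adam instead of plain gradient descent
  train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)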

After that, it's time to initialize the variables and start the session:

  sess = tf.InteractiveSession()
  tf.global_variables_initializer().run()

After that, start training!

  # Train (mnist comes from input_data.read_data_sets, shown further below)
  for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

The feed_dict={x: batch_xs, y_: batch_ys} fills in the two placeholders we defined earlier.

At this point basic training is complete.

Evaluating the model

After the training, we evaluate the accuracy of the model.

  # Test trained model
  correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                      y_: mnist.test.labels}))

tf.argmax returns the index of the largest value of a tensor along a given dimension. Since our label has exactly one 1, tf.argmax(y_, 1) is the index of that 1, i.e. the digit the image shows, while tf.argmax(y, 1) is the digit the model considers most likely. tf.equal compares the predicted value with the true value to determine whether the model got it right, and accuracy above averages that into an overall accuracy.
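A toy illustration of the same computation in plain numpy (made-up numbers, 3 classes instead of 10):

  import numpy as np

  y_pred = np.array([[0.1, 0.8, 0.1],    # model says: class 1
                     [0.3, 0.3, 0.4]])   # model says: class 2
  y_true = np.array([[0, 1, 0],          # truth: class 1
                     [1, 0, 0]])         # truth: class 0
  correct = np.argmax(y_pred, 1) == np.argmax(y_true, 1)   # [True, False]
  print(correct.astype(np.float32).mean())                 # 0.5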

The data fed in here is not the training data, but the test images and test labels.

In the end, mine came out at 0.9151, i.e. 91.51% accuracy. The official tutorial notes this is not great; with better models, such as multilayer convolutional networks, the recognition rate can exceed 99%.

That's the end of the official example. Although tens of thousands of images were recognized, I never saw a single actual picture, so I decided to find some real images to test this model on.

Testing the model

Looking at the example above, the key is what gets fed in here. If we want to test with our own pictures, we need to convert them into the corresponding format.

sess.run(accuracy, feed_dict={x: mnist.test.images,
                              y_: mnist.test.labels})

Here, mnist comes from the read_data_sets method of tensorflow.examples.tutorials.mnist.input_data.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

That's the nice thing about Python: whenever something is unclear, you can just read the source. The source code is available online here.

Open the read_data_sets method and find:

from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets

It’s in this file.

def read_data_sets(train_dir,
                   fake_data=False,
                   one_hot=False,
                   dtype=dtypes.float32,
                   reshape=True,
                   validation_size=5000):
  ...
  train = DataSet(train_images, train_labels, dtype=dtype, reshape=reshape)
  validation = DataSet(validation_images, validation_labels,
                       dtype=dtype, reshape=reshape)
  test = DataSet(test_images, test_labels, dtype=dtype, reshape=reshape)

  return base.Datasets(train=train, validation=validation, test=test)

So the mnist.test we fed in via feed_dict={x: mnist.test.images, y_: mnist.test.labels} is a DataSet object, which is also defined in this file.

class DataSet(object):

  def __init__(self, images, labels, fake_data=False, one_hot=False,
               dtype=dtypes.float32, reshape=True):
    """Construct a DataSet.

    one_hot arg is used only if fake_data is true. `dtype` can be either
    `uint8` to leave the input as `[0, 255]`, or `float32` to rescale into
    `[0, 1]`.
    """
    ...

The class is quite long, so I'll only focus on the constructor; the important part is that you pass in images and labels. At this point things are fairly clear: as long as we convert a single picture into the MNIST format and feed it into such a DataSet, we get the data we need.

Code for this does exist online; see the corresponding article at www.jianshu.com/p/419557758… It needs modifications for TensorFlow 1.0 and for the actual directory layout.

Go straight to my modified code:

from PIL import Image
from numpy import *

def GetImage(filelist):
    width = 28
    height = 28
    value = zeros([1, width, height, 1])
    value[0, 0, 0, 0] = -1          # sentinel: no image stored yet
    label = zeros([1, 10])
    label[0, 0] = -1                # sentinel: no label stored yet

    for filename in filelist:
        img = array(Image.open(filename).convert("L"))   # grayscale
        width, height = shape(img)
        tmp_value = zeros([1, width, height, 1])
        for i in range(width):
            for j in range(height):
                tmp_value[0, i, j, 0] = img[i, j]

        if value[0, 0, 0, 0] == -1:
            value = tmp_value
        else:
            value = concatenate((value, tmp_value))

        tmp_label = zeros([1, 10])
        index = int(filename.strip().split('/')[2][0])   # digit from file name
        print("input:", index)
        tmp_label[0, index] = 1
        if label[0, 0] == -1:
            label = tmp_label
        else:
            label = concatenate((label, tmp_label))

    return array(value), array(label)

We rely on PIL to read the images here. Since PIL is no longer actively maintained, you can use its fork, Pillow, instead. The code also depends on the scientific computing library numpy; install it if you haven't.

This reads an image and formats it the way MNIST expects. The label is taken from the first character of the image's file name, so the images should be named starting with their digit (e.g. something like 8_1.png for an 8).

If you don't feel like drawing digits in Photoshop or by hand, you can also convert data from the test set back into images, as sketched below.
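A minimal sketch of that conversion (assuming mnist has been loaded with read_data_sets as above, and that the test_num folder from the next step already exists): take one test image, reshape the 784-long vector back into 28 × 28, and save it as a PNG named after its label.

  import numpy as np
  from PIL import Image

  img_vec = mnist.test.images[0]                    # shape (784,), values in [0, 1]
  img = (img_vec.reshape(28, 28) * 255).astype(np.uint8)
  digit = np.argmax(mnist.test.labels[0])           # one-hot label -> digit
  Image.fromarray(img).save("test_num/%d_0.png" % digit)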

Create a new folder called test_num, which contains the following image:

Start testing:

  print("Start Test Images")

  dir_name = "./test_num"
  files = glob2.glob(dir_name + "/*.png")
  cnt = len(files)
  for i in range(cnt):
    print(files[i])
    test_img, test_label = GetImage([files[i]])

    testDataSet = DataSet(test_img, test_label, dtype=tf.float32)

    res = accuracy.eval({x: testDataSet.images, y_: testDataSet.labels})

    print("output: ",  res)
    print("-- -- -- -- -- -- -- -- -- --")Copy the code

The glob2 library, which needs to be installed, is used to traverse and filter the files; a plain directory listing on the Mac would also pick up .DS_Store files.

It can be seen that the label we typed is consistent with that calculated by the model.

Then we can deliberately mislabel the file names, say calling a 9 an 8 and a 0 an 8:

It can be seen that our label is inconsistent with that calculated by the model.

Congratulations, you’ve just completed a simple artificial intelligence journey.

Conclusion

From this example we can get a general idea of how TensorFlow operates:

In the example we go through the full training flow every time; in practice you can use tf.train.Saver() to save the trained model (a minimal sketch follows). This introductory example just exists to give you a sense of TensorFlow.
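Here is what that could look like (the checkpoint path here is my own choice):

  saver = tf.train.Saver()
  saver.save(sess, "./model/mnist_softmax.ckpt")    # after training
  # ... later, in a fresh session, restore instead of retraining:
  # saver.restore(sess, "./model/mnist_softmax.ckpt")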

TensorFlow is not as mysterious as we might have thought, nor as complex, but not as simple either; there is a lot of math to catch up on.

Besides, I'm a beginner in this area myself; if there are mistakes in the article, corrections are welcome.

Demo code address: github.com/bob-chen/te…

twitter

I record thoughts and reflections there: technology and the humanities, life, and reading notes; mostly rambling and feelings. You're welcome to follow and exchange ideas.

WeChat official account: Poetry and Distance of a Programmer

Official account ID: Monkeycoder-Life

References

github.com/wlmnzf/tens…

blog.csdn.net/u014046170/…

www.jianshu.com/p/419557758…

zhuanlan.zhihu.com/p/22410917?…

stackoverflow.com/questions/3…