Using TensorFlow to recognize simple image captchas

One of our company’s services scrapes data from a website that requires a captcha at login. The captcha looks like the one below, a type that many websites probably use.

Because the captcha is fairly simple (the image is not complex and contains only digits), I first tried a traditional approach, lightly adapting an online tutorial and doing the recognition in PHP. Roughly: preprocess the samples (cropping, binarization, denoising) and store them as arrays of strings; then preprocess each incoming image the same way, compare string similarity, and pick the class with the highest similarity. The recognition rate was not great (the captcha is simple enough that more tuning should have helped); I vaguely remember it being only a little over 60%.

Since recognition was not good enough, and the target site’s login state lasts a long time anyway, it was not worth spending much effort on this, so I turned to a human captcha-solving service. It is cheap, costing very little per month, and it works fine, although the latency is sometimes high. It was good enough for our business.

With machine learning on the rise, I wondered whether it could do the job, so I followed TensorFlow’s tutorial on recognizing handwritten digits.

All the code and data are on GitHub; the link is at the end of the article. I recommend reading this article alongside the code. If you find a misunderstanding or a bug, comments are welcome.

What is TensorFlow

TensorFlow is a machine learning framework from Google. As the name says, TensorFlow is a flow of tensors. So what is a tensor? As I understand it, tensors are simply data. Every tensor has a shape: a tensor of order 0 is a scalar, order 1 a vector, order 2 a matrix, and so on. That is why, as we will see, almost every quantity used in TensorFlow comes with a shape: they are all tensors.
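
To make order and shape concrete, here is a small sketch that uses NumPy arrays as stand-ins for tensors. NumPy and the variable names are my choice for illustration, not something the original code uses at this point.

```python
import numpy as np

scalar = np.array(7)              # order 0: a single number
vector = np.array([0, 1, 0, 1])   # order 1: e.g. a short run of pixels
matrix = np.zeros((21, 16))       # order 2: one digit image, 21 rows x 16 columns

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim is the order of the tensor, shape is its size along each axis
    print(name, "order:", t.ndim, "shape:", t.shape)
```

The shapes printed here are the same kind of information that the placeholders later in the article declare.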

We can think of TensorFlow as a black box with some pipes in it. We feed it some tensors. It spits out some tensors, and what it spits out is what we want.

So we need to determine what’s being fed, what’s being spit out, how the pipes are constructed.

For more introductory concepts, see the “Some basic concepts” section of the Keras beginner’s guide (linked at the end).

Why use TensorFlow

No special reason beyond Google’s reputation. Roll up your sleeves and get to work. For rapid prototyping, I suggest taking a look at Keras, a machine learning framework that bills itself as designed for humans; it is user-friendly and provides a higher-level interface on top of several machine learning frameworks.

The process

Capture captcha
Label the captcha
Image preprocessing
Save the data set
Build and train the model
Export and use the model

Capture captcha

This step is simple: just download a batch of captchas in a loop, so I won’t go over it here. I downloaded 750 captchas, used 500 for training, and kept the remaining 250 to validate the model.

Label the captcha

Labeling 750 captchas by hand would be exhausting. Instead, we can let a human captcha-solving service do it cheaply and save the results it returns. I don’t provide code for this, since it depends on which service you use; I’m sure you can work it out.

Image preprocessing

Image information: the captcha is 68×23 pixels, in JPG format.
Binarization: I am confident that a captcha this simple can still be recognized well after its color information is discarded, and doing so reduces model complexity. So we binarize the image, leaving only two colors: pure black and pure white.
Cutting: on inspection, the captcha has no particular distortion or touching characters, so we can cut it evenly into 4 pieces and recognize each one separately. The model then only has to handle 10 classes (36 if letters were included). The captcha also has a border around it, so we strip that off as well.
Result: four 16×21 images, each pixel either black or white.

The Python code is as follows:

import numpy as np
from PIL import Image

img = Image.open(file).convert('L')  # Read the image (`file` is its path) and convert it to grayscale

img = img.crop((2, 1, 66, 22))  # Crop off the border, leaving 64x21

# Split into the four digits, 16x21 each
img1 = img.crop((0, 0, 16, 21))
img2 = img.crop((16, 0, 32, 21))
img3 = img.crop((32, 0, 48, 21))
img4 = img.crop((48, 0, 64, 21))

img1 = np.array(img1).flatten()  # Flatten: two dimensions become one
img1 = list(map(lambda x: 1 if x <= 180 else 0, img1))  # Binarize: dark pixels become 1
img2 = np.array(img2).flatten()
img2 = list(map(lambda x: 1 if x <= 180 else 0, img2))
img3 = np.array(img3).flatten()
img3 = list(map(lambda x: 1 if x <= 180 else 0, img3))
img4 = np.array(img4).flatten()
img4 = list(map(lambda x: 1 if x <= 180 else 0, img4))
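
The same crop, split and binarize steps can also be sketched as one function over a NumPy array. This is an illustrative rewrite, not the author's code: the function name split_and_binarize is hypothetical, and it assumes the grayscale image has already been loaded into a (23, 68) array, for example with np.array(Image.open(path).convert('L')).

```python
import numpy as np

def split_and_binarize(gray, threshold=180):
    """Turn a (23, 68) grayscale array into four flat 0/1 lists, one per digit."""
    body = gray[1:22, 2:66]                    # same region as crop((2, 1, 66, 22)): 21 rows x 64 columns
    binary = (body <= threshold).astype(int)   # dark pixels become 1, light pixels 0
    # Cut into four 16-column slices and flatten each one row by row
    return [binary[:, 16 * i:16 * (i + 1)].flatten().tolist() for i in range(4)]

# Synthetic test image: all white, with a dark block where the second digit sits
demo = np.full((23, 68), 255, dtype=np.uint8)
demo[1:22, 18:34] = 0
parts = split_and_binarize(demo)
print([sum(p) for p in parts])  # -> [0, 336, 0, 0]
```

Each returned list has 16*21 = 336 entries, the length the model expects for one digit.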

Save the data set

The data set consists of input data and label data, each split into training and test sets. Because there is so little data, I simply store it in Python files for the model to import. With more data, you could save it in another format and read it with pandas or similar.

The data we finally feed to the model has the shape [[0,1,0,1,0,1,0,1…], [0,1,0,1,0,1,0,1…], …]. The label data is a little special. Essentially we are classifying the input, so although the labels are the digits 0 to 9, we store them as one-hot vectors: [[1,0,0,0,0,0,0,0,0,0], …]. A one-hot vector is 0 in every dimension except one, which is 1. With the encoding used here, [1,0,0,0,0,0,0,0,0,0] represents 0 and [0,1,0,0,0,0,0,0,0,0] represents 1. In other words, the one-hot vector gives the probability that the sample belongs to each of the ten classes, and the class with probability 1 is the correct one.
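
As a small sketch of the encoding just described, the helpers below convert a digit to a one-hot vector and back. The names one_hot, labels_for and decode are hypothetical and only serve the illustration.

```python
def one_hot(digit, classes=10):
    """Return a list of zeros with a single 1 at index `digit`."""
    vec = [0] * classes
    vec[digit] = 1
    return vec

def labels_for(code):
    """Turn a 4-character captcha string such as '3071' into four one-hot vectors."""
    return [one_hot(int(ch)) for ch in code]

def decode(vec):
    """Recover the digit: the index holding the largest value."""
    return vec.index(max(vec))

print(one_hot(3))  # -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print([decode(v) for v in labels_for("3071")])  # -> [3, 0, 7, 1]
```

The decode step is the same idea as the argmax used later to read the model's prediction.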

The Python code is as follows:

# Save the input data: one flattened digit per line
def px(prefix, img1, img2, img3, img4):
    with open('./data/' + prefix + '_images.py', 'a+') as f:
        print(img1, file=f, end=",\n")
        print(img2, file=f, end=",\n")
        print(img3, file=f, end=",\n")
        print(img4, file=f, end=",\n")

# Save the label data: one one-hot vector per digit of the captcha
def py(prefix, code):
    with open('./data/' + prefix + '_labels.py', 'a+') as f:
        for x in range(4):
            tmp = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
            tmp[int(code[x])] = 1
            print(tmp, file=f, end=",\n")

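After the previous two steps, we have the training and test data along with their labels. Each line of the saved files is one list followed by a comma: a 336-element list of 0s and 1s in the image files, and a 10-element one-hot list in the label files.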

Build and train the model

The data is ready, and it’s time to build the pipeline. You need to tell TensorFlow:

1. What is the shape of the input data?

x = tf.placeholder(tf.float32, [None, DLEN])

None means the number of samples is not fixed in advance, and DLEN is 16*21 = 336, the length of one flattened digit image.

2. What is the shape of output data?

y_ = tf.placeholder("float", [None, 10])

Again, None means the number of samples is not fixed, and 10 is the dimension of the label data: each image falls into one of 10 classes. Each class has a probability, hence the floating-point type.

3. How to fit input data, model and label data?

W = tf.Variable(tf.zeros([DLEN, 10])) # weights
b = tf.Variable(tf.zeros([10])) # bias

y = tf.nn.softmax(tf.matmul(x, W) + b)

A very simple model, isn’t it? It is basically y = softmax(xW + b), where W and b are TensorFlow variables. They hold the model’s parameters during training and must be defined up front. The goal of training is to find values of W and b that make this formula fit the data well. Softmax is a so-called activation function: it turns the linear result into the form we need, a probability distribution over the classes. See the reference links for more on softmax and related topics.
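
To see what the formula computes, here is a NumPy sketch of the same forward pass. It is an illustration under assumptions, not TensorFlow's implementation: the softmax helper is written by hand, and the input is random 0/1 data standing in for four flattened digits.

```python
import numpy as np

DLEN = 16 * 21  # length of one flattened digit image

def softmax(z):
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((DLEN, 10))  # weights, initialized to zero as in the article
b = np.zeros(10)          # bias

x = np.random.randint(0, 2, size=(4, DLEN)).astype(float)  # four fake digit images
y = softmax(x @ W + b)

print(y.shape)        # one row of 10 class probabilities per input image
print(y.sum(axis=1))  # every row sums to 1
```

With W and b still at zero, every class gets the same probability of 0.1; training is what moves the probabilities toward the correct class.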

4. How to evaluate the quality of the model?

Training means making the difference between the model’s output and the true labels as small as possible, so we need to define how to measure that difference. Here it is measured by something called cross entropy.

cross_entropy = -tf.reduce_sum(y_*tf.log(y))
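
A short worked example may help show why cross entropy is a reasonable error measure. The sketch below reimplements the same formula in NumPy (the function name and the sample probabilities are made up for illustration) and compares a confident correct prediction with an undecided one.

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """Summed cross entropy: -sum(y_true * log(y_pred))."""
    return -np.sum(y_true * np.log(y_pred))

# True label: the digit 2, as a one-hot vector
y_true = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=float)

# A confident, correct prediction: 0.91 on class 2, 0.01 on each other class
confident = np.full((1, 10), 0.01)
confident[0, 2] = 0.91

# An undecided prediction: 0.1 on every class
unsure = np.full((1, 10), 0.1)

print(cross_entropy(y_true, confident))  # small: -log(0.91)
print(cross_entropy(y_true, unsure))     # larger: -log(0.1)
```

Only the probability assigned to the correct class contributes, and the loss shrinks as that probability approaches 1. Note that a predicted probability of exactly 0 makes the logarithm infinite, which is why numerically safer variants clip the prediction first.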

5. How to minimize errors?

TensorFlow now knows enough. All that is left is to make the model’s error small enough, that is, to push the cross_entropy defined above as low as possible. TensorFlow has a number of built-in optimizers for this, each with different characteristics and use cases. Here we use gradient descent.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
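
To show roughly what one gradient descent step does for this model, here is a NumPy sketch. It assumes the standard gradient of summed cross entropy through softmax (prediction minus label); train_step is a hypothetical helper written for this illustration, not what TensorFlow runs internally.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(W, b, x, y_true, lr=0.01):
    """Update W and b in place with one gradient descent step; return the loss before the update."""
    y = softmax(x @ W + b)
    loss = -np.sum(y_true * np.log(y))
    grad_logits = y - y_true                 # gradient of the loss w.r.t. xW + b
    W -= lr * (x.T @ grad_logits)            # gradient w.r.t. W
    b -= lr * grad_logits.sum(axis=0)        # gradient w.r.t. b
    return loss

DLEN = 16 * 21
W = np.zeros((DLEN, 10))
b = np.zeros(10)

x = np.zeros((1, DLEN))
x[0, ::2] = 1.0                              # a fake digit: every other pixel is dark
y_true = np.zeros((1, 10))
y_true[0, 7] = 1.0                           # pretend its label is 7

losses = [train_step(W, b, x, y_true) for _ in range(5)]
print(losses)                                # the loss shrinks step by step
```

The value 0.01 plays the same role as the learning rate passed to GradientDescentOptimizer: it scales how far each step moves the parameters.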

Preparing for training

As you know, Python is an interpreted language and not very fast, while machine learning is compute-intensive. TensorFlow’s core is written in C++; the Python side is just a front end, and all computation is handled by the back end. The front end and back end need to communicate, which naturally leads to the concept of a session. For the same reason, TensorFlow can offer bindings for other languages such as Java and C, not just Python. Communication with the back end goes through a session, which we can start with a single line of code:

sess = tf.Session()
# code...
sess.close()

Don’t forget to close the session when you’re done using it. Of course, you can also use Python’s with statement for automatic management.

In TensorFlow, variables need to be initialized after the session starts.

sess.run(tf.global_variables_initializer())

Start training

for i in range(DNUM):
    batch_xs = [train_images.data[i]]
    batch_ys = [train_labels.data[i]]
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

We hand the model and training data to the session, and the back end handles the rest. We can pass any number of samples at a time (that is what None above allows), so the batch size can be tuned. Sometimes samples are picked at random to get better training results. Here we train on one sample at a time; the final result is fine anyway. See the reference links for more information.
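
For the random selection mentioned above, one common approach is to shuffle the sample order and cut it into mini-batches. The sketch below is an assumption about how that could look, not part of the original code; minibatches is a hypothetical helper and the toy arrays stand in for the real images and labels.

```python
import numpy as np

def minibatches(images, labels, batch_size, rng):
    """Yield (images, labels) batches in a random order, visiting every sample once."""
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

# Toy data: 10 "images" of 3 pixels each, where image i starts with the value 3*i
images = np.arange(30).reshape(10, 3)
labels = np.arange(10)

batches = list(minibatches(images, labels, batch_size=4, rng=np.random.default_rng(0)))
print([len(ys) for _, ys in batches])  # -> [4, 4, 2]
```

Each yielded pair could be fed to the session in place of batch_xs and batch_ys in the loop above.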

Check training results

This is where our test data comes in handy

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: test_images.data, y_: test_labels.data}))

The model’s output is an array holding the probability of each class, so we take the class with the highest probability and compare it with the test label, then see how accurate we are on the 250 test captchas. These, too, are just defined steps that the session runs.
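
The accuracy computation can be traced by hand with a NumPy sketch of the same three steps: argmax, compare, average. The sample predictions are made up for illustration.

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of rows where the most probable class matches the one-hot label."""
    correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
    return correct.astype(float).mean()

# True digits 3, 1, 4, 1 as one-hot rows
y_true = np.eye(10)[[3, 1, 4, 1]]

# Predictions that favor 3, 1, 4, 7: the last one is wrong
y_pred = np.full((4, 10), 0.05)
y_pred[np.arange(4), [3, 1, 4, 7]] = 0.55

print(accuracy(y_pred, y_true))  # -> 0.75
```

Three of the four rows agree with their label, so the mean of the correct flags is 0.75.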

Export and use the model

The model trains well and the result is pretty good: close to 99% accuracy, perhaps even higher than the human solving service (which often returned wrong values when I was collecting the test data). But how do we use this model in production? We can’t retrain it every time. This is where TensorFlow’s model saving and loading comes in.

Save the model

To save the model, define a saver before training, and once training is done simply save the session to a directory.

saver = tf.train.Saver()
# Training code
#...
saver.save(sess, 'model/model')
sess.close()

Of course, the Saver has many settings, such as how many of the most recent checkpoints to keep.

Restore the model

Restoring the model is also simple

saver.restore(sess, "model/model")

Of course, you still need to define the model before restoring it. As I understand it, what gets saved is the values of the variables from training (weights, biases and so on), so the structure has to be built first.
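
The idea that a saved model is just variable values plus a structure defined in code can be sketched without TensorFlow. The example below is an analogy only: it is not TensorFlow's checkpoint format, it uses NumPy's savez and load instead of tf.train.Saver, and the helper names and random weights are assumptions for illustration.

```python
import os
import tempfile
import numpy as np

DLEN = 16 * 21

def save_params(path, W, b):
    """Store only the variable values; the model structure lives in the code."""
    np.savez(path, W=W, b=b)

def load_params(path):
    with np.load(path) as data:
        return data["W"], data["b"]

def predict_code(segments, W, b):
    """Classify four flattened digits and join the results into a captcha string."""
    logits = np.asarray(segments, dtype=float) @ W + b
    # argmax of the logits equals argmax of softmax, so softmax is skipped here
    return "".join(str(d) for d in logits.argmax(axis=1))

rng = np.random.default_rng(0)
W = rng.normal(size=(DLEN, 10))   # stand-in for trained weights
b = np.zeros(10)

path = os.path.join(tempfile.mkdtemp(), "model.npz")
save_params(path, W, b)
W2, b2 = load_params(path)

segments = rng.integers(0, 2, size=(4, DLEN))   # four fake digit images
print(predict_code(segments, W2, b2))           # a 4-digit string
```

The restored parameters only make sense because predict_code rebuilds the same xW + b structure, which mirrors why the TensorFlow graph has to be defined before saver.restore is called.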

Closing thoughts

This is just a small example of using TensorFlow to recognize a simple captcha, but it shows that machine learning is not a waste of time. Building the model takes little thought, which saves a lot of effort. For more distorted or more complex captchas, you would probably need a convolutional neural network or something similar, so that structure and color information are not thrown away. On the other hand, when it comes to website security, a plain image captcha is probably no longer a sound basis for telling robots from humans. Take that arms race to its end and you get truly perverse captchas, hahaha.

Links

https://github.com/purocean/tensorflow-simple-captcha
https://keras-cn.readthedocs.io/en/latest/for_beginners/concepts/
http://wiki.jikexueyuan.com/project/tensorflow-zh/