In this article, we use TensorFlow to implement a deep learning model for captcha recognition. The captchas recognized here are image captchas. First we train a model on annotated data, then use the trained model to recognize new captchas.

1. Prepare the captchas

Here we use Python's captcha library, which is not installed by default, so we need to install it first, along with the Pillow library, for example with pip3 install captcha pillow.

Once both are installed, we can use the following code to generate a simple image captcha.

You can see that the text in the picture is exactly what we defined, so now we have an image and the ground-truth text that corresponds to it. We can then use this approach to generate a batch of training data and test data.

2. Preprocessing

First we define the captcha text to be generated, which serves as the label, and then use it to generate the captcha image, which serves as the input data x. We begin by defining the input vocabulary. If we allowed captchas with upper-case letters, lower-case letters, and digits, then for a four-character captcha the total number of possible combinations would be (26 + 26 + 10)^4 = 14,776,336, which is rather large to train on. So let's simplify here and train on digit-only captchas, giving 10^4 = 10,000 combinations, obviously far fewer.

So here we define the vocabulary and its length as variables:

VOCAB is the vocabulary, i.e., the ten digits 0 to 9; CAPTCHA_LENGTH, the number of characters in the captcha, is 4; and VOCAB_LENGTH, the length of the vocabulary, is 10.

Next, we define a method to generate the captcha data in a similar way, except that it returns the data as a NumPy array:

This method converts the captcha image into an array of per-pixel RGB values. Let's call this method and try it:

The contents are as follows:

As you can see, its shape is (60, 160, 3), which represents the captcha image: the height is 60 pixels, the width is 160 pixels, and each pixel has an RGB value, so the last dimension holds the three RGB channels.

Next, we need to define the label. Since we are training a deep learning model, it is better to one-hot encode the label data here. That is, if the captcha text is 1234, then for each character the corresponding index position in the vocabulary is set to 1, and the total vector length is 4 × 10 = 40. We implement the conversion between text and one-hot encoding as follows:
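The conversion code is not shown; a sketch under the definitions above (VOCAB, CAPTCHA_LENGTH, VOCAB_LENGTH), with the two method names taken from the text:

```python
import numpy as np

VOCAB = '0123456789'
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)

def text2vec(text):
    """Convert captcha text to a flat one-hot vector of length 40."""
    if len(text) > CAPTCHA_LENGTH:
        raise ValueError('captcha text is too long')
    vector = np.zeros(CAPTCHA_LENGTH * VOCAB_LENGTH)
    for i, c in enumerate(text):
        # Set the position of character c within its 10-slot block to 1.
        vector[i * VOCAB_LENGTH + VOCAB.index(c)] = 1
    return vector

def vec2text(vector):
    """Convert a one-hot vector back to captcha text."""
    # The argmax of each 10-element block gives that character's index.
    indices = np.reshape(vector, (CAPTCHA_LENGTH, VOCAB_LENGTH)).argmax(axis=1)
    return ''.join(VOCAB[i] for i in indices)
```

For example, text2vec('1234') yields a length-40 vector, and vec2text() maps it back to '1234'.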

The text2vec() method converts the real text to its one-hot encoding, and the vec2text() method converts the one-hot encoding back to real text.

For example, we can call these two methods to convert the text 1234 to its one-hot encoding and then turn it back:

In this way we can convert between text and one-hot encodings.

Next, we can construct a batch of data: the x data is the NumPy array of the captcha image, and the y data is the one-hot encoding of the captcha text. The generated content is as follows:

Here we define a get_random_text() method that randomly generates captcha text. We then use this randomly generated text to generate the corresponding x and y data, which we write to a pickle file, thus finishing the preprocessing.

3. Build the model

After we have the data, we start to build the model. Here we use the train_test_split() method to divide the data into three parts: a training set, a development set, and a validation set:
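The splitting code is omitted; a sketch using scikit-learn's train_test_split (the 0.4/0.5 ratios are assumptions, chosen so the three splits come out roughly 60/20/20, and the random arrays merely stand in for the pickled captcha data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the pickled captcha data.
x = np.random.rand(100, 60, 160, 3).astype('float32')
y = np.random.rand(100, 40).astype('float32')

# First split off 40% of the data, then split that 40% half-and-half
# into a development set and a validation set.
train_x, rest_x, train_y, rest_y = train_test_split(x, y, test_size=0.4,
                                                    random_state=40)
dev_x, test_x, dev_y, test_y = train_test_split(rest_x, rest_y,
                                                test_size=0.5,
                                                random_state=40)
```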

Next we build three Dataset objects from the three splits:

Then we initialize an iterator and bind it to the datasets:
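Neither listing survives in the text; in TF 1.x style (written here via tf.compat.v1 so it also runs under TensorFlow 2), the three Dataset objects plus a reinitializable iterator might look like the following. The batch size and the random stand-in data are assumptions:

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Toy stand-ins for the three data splits.
train_x = np.random.rand(60, 60, 160, 3).astype('float32')
train_y = np.random.rand(60, 40).astype('float32')
dev_x = np.random.rand(20, 60, 160, 3).astype('float32')
dev_y = np.random.rand(20, 40).astype('float32')
test_x = np.random.rand(20, 60, 160, 3).astype('float32')
test_y = np.random.rand(20, 40).astype('float32')

batch_size = 20

# One Dataset per split, each batched.
train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(batch_size)
dev_dataset = tf.data.Dataset.from_tensor_slices((dev_x, dev_y)).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(batch_size)

# A reinitializable iterator shares one structure across all three
# datasets; each initializer rebinds it to a different split.
iterator = tf.compat.v1.data.Iterator.from_structure(
    tf.compat.v1.data.get_output_types(train_dataset),
    tf.compat.v1.data.get_output_shapes(train_dataset))
train_initializer = iterator.make_initializer(train_dataset)
dev_initializer = iterator.make_initializer(dev_dataset)
test_initializer = iterator.make_initializer(test_dataset)
```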

Next comes the key part. Here we build the network from three convolutional layers and two fully connected layers. To simplify the code, we directly use TensorFlow's layers module:
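The network definition is not included in the text; a sketch of a three-conv, two-dense network in tf.layers style (kernel size 3, SAME padding, relu activation, as described below), written via tf.compat.v1 so it also runs under TF 2. The filter counts, pooling layers, and hidden size are assumptions:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10
n_classes = CAPTCHA_LENGTH * VOCAB_LENGTH  # 40

# Placeholder input: batches of 60 x 160 RGB captcha images.
x = tf.compat.v1.placeholder(tf.float32, [None, 60, 160, 3])

y = x
# Three conv + max-pool blocks; kernel size 3, SAME padding, relu.
for filters in [32, 64, 128]:
    y = tf.compat.v1.layers.conv2d(y, filters=filters, kernel_size=3,
                                   padding='same', activation=tf.nn.relu)
    y = tf.compat.v1.layers.max_pooling2d(y, pool_size=2, strides=2)

# Flatten, then two fully connected layers; the last one outputs one
# logit per (position, digit) pair, i.e. n_classes = 40 logits.
y = tf.compat.v1.layers.flatten(y)
y = tf.compat.v1.layers.dense(y, 1024, activation=tf.nn.relu)
y = tf.compat.v1.layers.dense(y, n_classes)
```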

Here the convolution kernel size is 3, the padding uses SAME mode, and the activation is relu.


After the fully connected layers, the shape of y becomes [batch_size, n_classes]. Our label is a concatenation of CAPTCHA_LENGTH one-hot vectors. We want to compute the cross entropy, but when computing cross entropy, the elements in the last dimension of the label vector must sum to 1; otherwise there will be problems when computing the gradient. See the official TensorFlow documentation for details:

https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits


But the current label is a concatenation of CAPTCHA_LENGTH one-hot vectors, so its elements sum to CAPTCHA_LENGTH rather than 1. We therefore reshape both y and the label so that the last dimension becomes a single one-hot vector:
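The reshape itself is not shown; given the shapes described, it is presumably a single tf.reshape on each tensor (the placeholders here merely stand in for the network output and labels):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10

# Stand-ins for the network output and the one-hot labels.
y = tf.compat.v1.placeholder(tf.float32, [None, CAPTCHA_LENGTH * VOCAB_LENGTH])
label = tf.compat.v1.placeholder(tf.float32, [None, CAPTCHA_LENGTH * VOCAB_LENGTH])

# Fold the character dimension into the batch dimension so that the
# last dimension is one VOCAB_LENGTH-sized one-hot vector per character.
y_reshaped = tf.reshape(y, [-1, VOCAB_LENGTH])
label_reshaped = tf.reshape(label, [-1, VOCAB_LENGTH])
```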



This way the last dimension is VOCAB_LENGTH, i.e., a single one-hot vector, so its elements must sum to 1.

Then we compute the loss and accuracy:
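The loss and accuracy code is missing; a sketch consistent with the reshaping described above (the placeholders stand in for the reshaped logits and labels):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10

# Stand-ins for the reshaped logits and one-hot labels.
y_reshaped = tf.compat.v1.placeholder(tf.float32, [None, VOCAB_LENGTH])
label_reshaped = tf.compat.v1.placeholder(tf.float32, [None, VOCAB_LENGTH])

# Cross entropy per character; each label row now sums to 1.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=label_reshaped,
                                            logits=y_reshaped))

# A character is correct when the argmax of its logits matches the
# argmax of its one-hot label; accuracy is the mean over characters.
correct = tf.equal(tf.argmax(y_reshaped, axis=-1),
                   tf.argmax(label_reshaped, axis=-1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```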

Then you can perform the training:

Here, we first run train_initializer to bind the iterator to the training Dataset, then execute train_op, fetch the loss, accuracy, global step, and other results, and print them.
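The training loop does not survive in the text; a much-reduced, self-contained sketch of its structure, where tiny random data and a single dense layer stand in for the real captcha data and conv network:

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10
n_classes = CAPTCHA_LENGTH * VOCAB_LENGTH

# Tiny random data standing in for the real captcha arrays.
train_x = np.random.rand(8, 60, 160, 3).astype('float32')
train_y = np.random.rand(8, n_classes).astype('float32')

train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(4)
iterator = tf.compat.v1.data.Iterator.from_structure(
    tf.compat.v1.data.get_output_types(train_dataset),
    tf.compat.v1.data.get_output_shapes(train_dataset))
train_initializer = iterator.make_initializer(train_dataset)
x, label = iterator.get_next()

# A single dense layer stands in for the real conv network.
y = tf.compat.v1.layers.dense(tf.compat.v1.layers.flatten(x), n_classes)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    labels=tf.reshape(label, [-1, VOCAB_LENGTH]),
    logits=tf.reshape(y, [-1, VOCAB_LENGTH])))

global_step = tf.Variable(0, trainable=False, name='global_step')
train_op = tf.compat.v1.train.AdamOptimizer(1e-3).minimize(
    loss, global_step=global_step)

losses = []
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    for epoch in range(2):
        # Rebind the iterator to the training set at each epoch.
        sess.run(train_initializer)
        while True:
            try:
                _, loss_val, gstep = sess.run([train_op, loss, global_step])
                losses.append(loss_val)
                print('step', gstep, 'loss', loss_val)
            except tf.errors.OutOfRangeError:
                # The iterator is exhausted; the epoch is over.
                break
```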

Training

The results of running the training process are similar to the following:

Testing

We can also save the model every few epochs during training:
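The saving code is omitted; a minimal sketch using tf.train.Saver (the checkpoint directory, the trivial stand-in variable, and the every-two-epochs cadence are assumptions):

```python
import os
import tempfile

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# A trivial variable stands in for the real model parameters.
w = tf.Variable(tf.zeros([2, 2]), name='w')
global_step = tf.Variable(0, trainable=False, name='global_step')

saver = tf.compat.v1.train.Saver()
ckpt_dir = tempfile.mkdtemp()  # a real run would use a fixed directory

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    for epoch in range(4):
        # ... run the training steps for this epoch here ...
        if epoch % 2 == 0:
            # Write a checkpoint every two epochs, tagged with the step.
            path = saver.save(sess, os.path.join(ckpt_dir, 'model.ckpt'),
                              global_step=global_step)
            print('saved to', path)
```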

Of course, we can also save only the model with the highest accuracy on the validation set.

To verify, we can reload the model and then run validation:
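The reloading code is likewise missing; a minimal sketch that restores the latest checkpoint before evaluating (a trivial variable stands in for the real model, and the checkpoint location is an assumption; in practice the same graph that was trained must be rebuilt before restoring):

```python
import os
import tempfile

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# A trivial variable stands in for the real model parameters.
w = tf.Variable(tf.fill([2, 2], 3.0), name='w')
saver = tf.compat.v1.train.Saver()
ckpt_dir = tempfile.mkdtemp()

# First save a checkpoint so there is something to restore.
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    saver.save(sess, os.path.join(ckpt_dir, 'model.ckpt'))

# Later (or in another process): restore the latest checkpoint,
# then run evaluation; here we just read back the variable.
with tf.compat.v1.Session() as sess:
    ckpt = tf.train.latest_checkpoint(ckpt_dir)
    saver.restore(sess, ckpt)
    restored = sess.run(w)
    # ... run test_initializer and compute accuracy here ...
```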