1. Introduction

Recently we wanted to implement captcha recognition and wavered between a neural network and traditional computer vision (CV). A convolutional neural network (CNN) can work "end-to-end", taking the captcha image as input and producing the captcha text as output, so we abandoned CV and chose a CNN.

I was already familiar with PyTorch and even had a PyTorch + CUDA + cuDNN environment set up. However, TensorFlow has TensorFlow.js, which combines perfectly with the cross-platform framework Electron, so I moved to TensorFlow.

After a while of studying (that is, tweaking) code from the official documentation, GitHub, StackOverflow, Zhihu and CSDN, I could finally put my GTX 1660 to work on alchemy (model training)!

2. Implementation

2.1 Image preprocessing

The first step is image preprocessing. At the very least, the image should be converted to grayscale.

Take a look at the captcha image we are working with:

You can see two lines, one red and one green, running through the picture. In theory, these two lines can be removed by image manipulation (original picture minus the red and green components).

But since we are using a convolutional neural network, let's skip the fancy tricks and feed the image in as it is.

A few lines of code convert the image to grayscale and paint over the black border around it:

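    import cv2
    import numpy as np

    # Load the captcha directly as a single-channel grayscale image.
    img = cv2.imread(img_file, cv2.IMREAD_GRAYSCALE)
    np_img = np.array(img)
    flat_img = np_img.flatten()
    # The most frequent gray level is taken as the background color.
    ind = np.argmax(np.bincount(flat_img))
    if ind < 255:
        # Paint the one-pixel border with the background color to hide the black frame.
        np_img[0] = ind
        np_img[-1] = ind
        np_img[:, 0] = ind
        np_img[:, -1] = ind
    # Scale pixel values to the range [0, 1].
    img = np_img / 255.0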

Output picture:

Of course, the preprocessing can be this simple only because my network does not require a fixed input shape. Many well-known networks do, such as InceptionV4 (299, 299) and ResNet-18 (224, 224). In that case the image can be padded and scaled with opencv-python. One implementation is to use cv2.copyMakeBorder to pad the short side until it is as long as the long side, and then call cv2.resize.
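
The pad-then-resize idea could look roughly like the sketch below. The helper names `square_padding` and `pad_to_square_and_resize`, the white fill value and the 224 target size are my own assumptions for illustration, not code from this project.

```python
def square_padding(height, width):
    """Return (top, bottom, left, right) border sizes that make the image square."""
    side = max(height, width)
    pad_h = side - height
    pad_w = side - width
    top, left = pad_h // 2, pad_w // 2
    return top, pad_h - top, left, pad_w - left


def pad_to_square_and_resize(img, size=224, fill=255):
    """Pad the short side with a constant color, then resize to (size, size)."""
    import cv2  # imported lazily so square_padding also works without OpenCV

    top, bottom, left, right = square_padding(*img.shape[:2])
    squared = cv2.copyMakeBorder(img, top, bottom, left, right,
                                 cv2.BORDER_CONSTANT, value=fill)
    return cv2.resize(squared, (size, size), interpolation=cv2.INTER_AREA)


# An 80x300 captcha needs 110 rows of padding above and below.
print(square_padding(80, 300))
```

Padding before resizing keeps the aspect ratio of the characters, at the cost of spending most of the input on empty border for a wide image like this one.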

Good. Now it's time to collect enough captcha images for training.

I originally also binarized the image, but later found that grayscale alone performs just as well, so I removed the binarization step.

2.2 Convolutional neural network construction and training

2.2.1 Training process

After a few days of gathering and hand-labeling hundreds of images (later expanded to 18,000+) as a training set, it was time to try alchemy.

I have to say the TensorFlow 1.x era was so glorious that most of the source code you can find today is still written for 1.x.

So I looked through the GitHub repository and started writing (read: hacking up) my own training code, following the tutorial in the official documentation.

The idea is to feed the captcha images into the model together with their labels, one-hot encoded against the character set. As you can see from the captcha image earlier, each captcha has six characters, and each character is one of 10 digits + 26 uppercase letters = 36 possibilities. This is therefore a multi-class classification model with 6*36=216 output units in total, using categorical_crossentropy as the loss function and softmax as the output activation.
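
As a minimal sketch of that label encoding: the snippet below turns a six-character string into a 6 x 36 one-hot matrix. The names `CHAR_SET`, `MAX_CAPTCHA` and `text_to_onehot` are hypothetical, and the character ordering (digits first, then A-Z) is an assumption, since the real character set definition is not shown here.

```python
import string

CHAR_SET = string.digits + string.ascii_uppercase  # 10 digits + 26 letters = 36
MAX_CAPTCHA = 6


def text_to_onehot(text):
    """Encode a captcha string as a MAX_CAPTCHA x len(CHAR_SET) one-hot matrix."""
    if len(text) != MAX_CAPTCHA:
        raise ValueError(f"expected {MAX_CAPTCHA} characters, got {len(text)}")
    rows = []
    for ch in text:
        row = [0.0] * len(CHAR_SET)
        row[CHAR_SET.index(ch)] = 1.0  # raises ValueError for unknown characters
        rows.append(row)
    return rows


label = text_to_onehot("A1B2C3")
print(len(label), len(label[0]))  # 6 36
```

Before training, such nested lists would be stacked into a single array of shape (N, 6, 36), for example with np.array.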

Here is the code that builds the model (it lives in a class, of course; the rest of the class is omitted):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def generate_model(self):
        self.model = model = models.Sequential()
        input_shape = (self.image_height, self.image_width, 1)
        # conv1
        model.add(layers.Conv2D(96, (3, 3), activation='relu', padding="SAME",
                                input_shape=input_shape))
        model.add(layers.Conv2D(96, (3, 3), activation='relu', padding="SAME"))
        model.add(layers.Conv2D(96, (3, 3), activation='relu', padding="SAME"))
        model.add(layers.MaxPooling2D((2, 2)))
        # conv2
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="SAME"))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="SAME"))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="SAME"))
        model.add(layers.MaxPooling2D((2, 2)))
        # conv3
        model.add(layers.Conv2D(128, (3, 3), activation='relu'))
        model.add(layers.Conv2D(128, (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))
        # conv4
        model.add(layers.Conv2D(128, (3, 3), activation='relu'))
        model.add(layers.Conv2D(128, (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))
        # flatten
        model.add(layers.Flatten())
        # fc1
        model.add(layers.Dense(384, activation='relu'))
        model.add(layers.AlphaDropout(rate=0.2))
        # fc2
        model.add(layers.Dense(512, activation='relu'))
        model.add(layers.AlphaDropout(rate=0.2))
        # output: 6 characters x 36 classes, reshaped to (6, 36)
        model.add(layers.Dense(self.max_captcha * self.char_set_len, activation='softmax'))
        model.add(layers.Reshape((self.max_captcha, self.char_set_len)))
        model.summary()
        model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-5, clipnorm=1),
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model

model.summary() prints the resulting network:

    Model: "sequential"
    _________________________________________________________________
     Layer (type)                    Output Shape              Param #
    =================================================================
     conv2d (Conv2D)                 (None, 80, 300, 96)       960
     conv2d_1 (Conv2D)               (None, 80, 300, 96)       83040
     conv2d_2 (Conv2D)               (None, 80, 300, 96)       83040
     max_pooling2d (MaxPooling2D)    (None, 40, 150, 96)       0
     conv2d_3 (Conv2D)               (None, 40, 150, 128)      110720
     conv2d_4 (Conv2D)               (None, 40, 150, 128)      147584
     conv2d_5 (Conv2D)               (None, 40, 150, 128)      147584
     max_pooling2d_1 (MaxPooling2D)  (None, 20, 75, 128)       0
     conv2d_6 (Conv2D)               (None, 18, 73, 128)       147584
     conv2d_7 (Conv2D)               (None, 16, 71, 128)       147584
     max_pooling2d_2 (MaxPooling2D)  (None, 8, 35, 128)        0
     conv2d_8 (Conv2D)               (None, 6, 33, 128)        147584
     conv2d_9 (Conv2D)               (None, 4, 31, 128)        147584
     max_pooling2d_3 (MaxPooling2D)  (None, 2, 15, 128)        0
     flatten (Flatten)               (None, 3840)              0
     dense (Dense)                   (None, 384)               1474944
     alpha_dropout (AlphaDropout)    (None, 384)               0
     dense_1 (Dense)                 (None, 512)               197120
     alpha_dropout_1 (AlphaDropout)  (None, 512)               0
     dense_2 (Dense)                 (None, 216)               110808
     reshape (Reshape)               (None, 6, 36)             0
    =================================================================
    Total params: 2946136
    Trainable params: 2946136
    Non-trainable params: 0
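
The parameter counts in this summary can be checked by hand. The sketch below uses the standard formulas (kernel weights plus one bias per filter or unit); the helper names `conv_params` and `dense_params` are mine, and the layer lists are read off the summary.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a Conv2D layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a Dense layer."""
    return (in_units + 1) * out_units


# (in_channels, filters) for the ten 3x3 convolutions in the summary
conv_layers = [(1, 96), (96, 96), (96, 96), (96, 128)] + [(128, 128)] * 6
# (in_units, out_units) for the three dense layers; 3840 = 2 * 15 * 128
dense_layers = [(2 * 15 * 128, 384), (384, 512), (512, 6 * 36)]

total = sum(conv_params(3, 3, c_in, c_out) for c_in, c_out in conv_layers)
total += sum(dense_params(n_in, n_out) for n_in, n_out in dense_layers)
print(total)  # 2946136
```

Half of the parameters sit in the first dense layer (1474944), which is why shrinking the feature map before Flatten matters so much for model size.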

It is well known that a network can be improved by increasing its depth, for example ResNet (He et al.) going from ResNet-18 to ResNet-200, or by increasing its width, for example WideResNet (Zagoruyko & Komodakis, 2016) and MobileNets (Howard et al., 2017), which scale the number of channels. A larger input image size (resolution) can also help improve accuracy. (Quoted from a Zhihu user.)

The code I started from used an AlexNet-style model, but it is computationally expensive, has many layers and too many parameters, and produces a large model file. After training it for a while, I found that it converged extremely slowly. The current model is the result of some trade-offs made with the application scenario in mind.

I have to say TF2 is much nicer to use than TF1! Just call model.fit and the alchemy begins!

I passed two callbacks to model.fit, one to save checkpoints and the other to save the training history.

    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow as tf

    def train_cnn(self):
        x_train, y_train = self.get_train_data()
        cp_path = './cp.h5'
        # Sanity checks: the labels must not contain NaN.
        print(np.any(np.isnan(y_train)))
        print(y_train[0])
        # Callback 1: keep the best full model seen so far as a checkpoint.
        save_checkpoints = tf.keras.callbacks.ModelCheckpoint(
            filepath=cp_path, save_weights_only=False, save_best_only=True)
        try:
            # Resume from a previously saved model if one exists.
            model = tf.keras.models.load_model(self.model_save_dir)
        except Exception:
            model = self.generate_model()
            try:
                model.load_weights(cp_path)
            except Exception:
                pass  # no checkpoint yet, start from scratch
        # Callback 2: append the per-epoch metrics to a CSV file.
        filename = 'log.csv'
        history_logger = tf.keras.callbacks.CSVLogger(
            filename, separator=",", append=True)

        history = model.fit(x_train, y_train, batch_size=24,
                            epochs=200, validation_split=0.1,
                            callbacks=[save_checkpoints, history_logger],
                            shuffle=True)
        model.save(self.model_save_dir)
        plt.plot(history.history['accuracy'], label='accuracy')
        plt.plot(history.history['val_accuracy'], label='val_accuracy')
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy')
        plt.ylim([0, 1])
        plt.legend(loc='lower right')
        plt.show()

The two callbacks store the best checkpoint and the per-epoch metrics, respectively.

I found a long stretch of time to leave the computer on for alchemy, trained for 200 epochs, and plotted the following curve (I forgot to plot the loss, never mind):

It can be seen that after 200 epochs the accuracy of the model reaches about 0.9. I then trained for another 50 epochs and it stabilized at 0.94.

Show this "artificial unintelligence" 100 images and it can read about 94 of them. That will do! For something that had no tuning at all, what more could you ask for?
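
One thing worth double-checking when reading that number: as far as I understand it, with an output of shape (N, 6, 36) the Keras 'accuracy' metric compares the argmax at each of the six positions separately, so it is a per-character score. The fraction of captchas with all six characters right can be lower. The sketch below illustrates the difference on made-up strings; the function names and the sample data are hypothetical.

```python
def per_character_accuracy(truths, predictions):
    """Fraction of individual characters that match."""
    pairs = [(t, p) for truth, pred in zip(truths, predictions)
             for t, p in zip(truth, pred)]
    return sum(t == p for t, p in pairs) / len(pairs)


def whole_captcha_accuracy(truths, predictions):
    """Fraction of captchas where every character matches."""
    return sum(t == p for t, p in zip(truths, predictions)) / len(truths)


truths = ["A1B2C3", "XYZ789", "Q0W9E8", "HELLO1"]
# Two of the four predictions have exactly one wrong character.
predictions = ["A1B2C3", "XYZ780", "Q0W9E8", "HELL01"]

print(per_character_accuracy(truths, predictions))
print(whole_captcha_accuracy(truths, predictions))  # 0.5
```

Evaluating whole-captcha accuracy on a held-out set is a cheap way to see which of the two numbers the 0.94 corresponds to.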

2.2.2 Pitfalls

When you write code, you are bound to fall into a few pits.

First, I fell into a really deep one. Every time I trained, the loss shot up within the first epoch until it became NaN... What was going on? The loss function and the activation function both looked correct!

It took a long time to track down!

It turned out that my model code was adapted from the tutorial in the official documentation, where the last layer finishes with a softmax activation, while the training part was adapted from someone else's TF1 code. I then found out that categorical_crossentropy needs the output to have the shape (N, 6, 36) rather than (N, 216), matching the one-hot labels.

The fix was to add model.add(layers.Reshape((self.max_captcha, self.char_set_len))) as the last layer, so that the output takes the shape (None, 6, 36).
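
A side note on the order of softmax and Reshape, shown as a pure-Python sketch with made-up numbers: a softmax over all 216 values followed by a reshape gives six rows that share a total of 1, whereas reshaping first and applying softmax per row gives six separate distributions. Variant B is a hypothetical alternative (in Keras it would presumably be a Reshape layer followed by layers.Softmax(axis=-1)), not what the model above does.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def reshape(flat, rows, cols):
    """Split a flat list into rows x cols, like Reshape((rows, cols))."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


logits = [((i * 7) % 11) / 3.0 for i in range(6 * 36)]  # made-up network outputs

# Variant A (as in the model above): softmax over all 216 values, then reshape.
flat_then_reshape = reshape(softmax(logits), 6, 36)
# Variant B (hypothetical alternative): reshape first, then softmax per character.
reshape_then_softmax = [softmax(row) for row in reshape(logits, 6, 36)]

print([round(sum(row), 3) for row in flat_then_reshape])     # rows share a total of 1
print([round(sum(row), 3) for row in reshape_then_softmax])  # each row sums to 1
```

Within one row the two variants differ only by a positive scale factor, so the most likely character at each position is the same either way.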

Next, I found that the loss would turn into NaN at some random epoch in the middle of training, and the history log gave no clue. Following the advice on several websites, I checked whether the inputs and labels contained NaN, and found that they were not the cause.

What I did find is that when the optimizer is Adam and the loss function is categorical_crossentropy, the loss turns into NaN for no apparent reason. After switching the optimizer to SGD or another algorithm, the problem no longer occurs.

Later, when training other captcha recognition models, I found that adding BatchNormalization layers makes convergence faster and makes the loss and accuracy change more steadily from epoch to epoch.

To use the trained model, you feed in a tensor of shape (N, 80, 300, 1), get back a tensor of shape (N, 6, 36), and then map each of the six positions in the output back to a character according to the one-hot ordering.
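
Mapping the output back to text could look like the sketch below, which works on one (6, 36) output given as nested lists. `CHAR_SET` and `decode_prediction` are hypothetical names, and the ordering (digits first, then A-Z) is an assumption that has to match whatever ordering was used for the labels.

```python
import string

CHAR_SET = string.digits + string.ascii_uppercase  # assumed ordering: 0-9 then A-Z


def decode_prediction(scores):
    """Pick the highest-scoring character at each position of one (6, 36) output."""
    chars = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over the 36 classes
        chars.append(CHAR_SET[best])
    return "".join(chars)


# Fake (6, 36) output that puts most of its mass on "A1B2C3".
fake = [[0.01] * len(CHAR_SET) for _ in range(6)]
for position, ch in enumerate("A1B2C3"):
    fake[position][CHAR_SET.index(ch)] = 0.9

print(decode_prediction(fake))  # A1B2C3
```

For a whole batch, the same idea is usually written as an argmax over the last axis of the (N, 6, 36) array.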

3. Summary

Although it was just a toy project, I still put a lot of effort into it, from setting up the environment to writing the code to the alchemy itself. It is my first successful TensorFlow 2 project. Along the way, I am very grateful for the silent dedication of my GTX 1660!

This should be my last technical article of 2021. That's a wrap, cue the confetti!