The previous article implemented single-label image classification with a CNN (a cat-versus-dog classification task).

Address: juejin.cn/post/684490…

The next article uses LSTM+CTC to implement OCR for variable-length text, which is essentially a multi-label classification problem with a variable number of labels.

Address: juejin.cn/post/684490…

The 100,000-image captcha dataset used in this article can be downloaded from Baidu Netdisk (it can also be generated with the code below):

pan.baidu.com/s/1N7bDHxIM…

The model trained with the code in this article (the model folder in the project) is available here:

pan.baidu.com/s/1GyEpLdM5…

Project Introduction:

Environment setup: pip install captcha==0.1.1, pip install opencv-python, pip install flask, pip install tensorflow (or pip install tensorflow-gpu). The code below uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).

This article uses a CNN to implement OCR for fixed-length, 4-character captcha images (each generated captcha consists of 4 random uppercase letters). It is essentially a multi-label classification problem on a single image (sample data is shown in the figure below).

Overall training logic:

1. The image is passed into the CNN to extract features

2. Flatten the feature map and feed it into the FC layers to obtain the classification prediction vector

3. Train with the sigmoid cross-entropy loss between the prediction vector and the label vector to obtain the final model (note: sigmoid is used for multi-label classification tasks, softmax for single-label classification)
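To make step 3 concrete, here is a minimal NumPy sketch of how a 4-letter captcha becomes a 104-element 0/1 label vector and how the sigmoid cross entropy is computed against it. The helper names `encode_label` and `sigmoid_cross_entropy` are illustrative and do not appear in the project; the loss uses the standard numerically stable form, which is assumed here to mirror what `tf.nn.sigmoid_cross_entropy_with_logits` computes.

```python
import numpy as np

WORDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # same alphabet as self.words in the project code
WORDS_NUM = 4                          # characters per captcha


def encode_label(text):
    """Return a 0/1 vector of length 4 * 26 with exactly one 1 per character slot."""
    label = np.zeros(WORDS_NUM * len(WORDS), dtype=np.float32)
    for slot, ch in enumerate(text):
        label[slot * len(WORDS) + WORDS.index(ch)] = 1.0
    return label


def sigmoid_cross_entropy(logits, labels):
    """Element-wise sigmoid cross entropy in the stable form
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))


label = encode_label('HZDZ')
logits = np.zeros_like(label)  # stand-in for an untrained model that outputs 0 everywhere
loss = sigmoid_cross_entropy(logits, label).mean()
print(label.shape, int(label.sum()), loss)
```

With all-zero logits every element contributes log(2) to the loss, which is a handy sanity check for the first training step.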

Overall prediction logic:

1. The image is passed into the CNN (a VGG16-style network) to extract features

2. Flatten the feature map and feed it into the FC layers to obtain the classification prediction vector

3. Apply the sigmoid operation to the prediction vector. Since the captcha is fixed at 4 characters, split the vector into 4 segments of 26 values each, find the maximum value in each segment, and map its index to the corresponding letter
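The decoding in step 3 can be sketched in a few lines of NumPy. The function name `decode_prediction` and the fake logits are assumptions made for illustration; the slot layout (4 consecutive segments of 26 values) follows the project's label encoding.

```python
import numpy as np

WORDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # same alphabet as self.words in the project code
WORDS_NUM = 4                          # characters per captcha


def decode_prediction(logits):
    """Map a 104-element prediction vector to a 4-letter string."""
    scores = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))  # sigmoid
    scores = scores.reshape(WORDS_NUM, len(WORDS))  # one row per character slot
    return ''.join(WORDS[i] for i in scores.argmax(axis=1))


# Fake logits: strongly negative everywhere except the slots that spell "CKAN"
fake = np.full(WORDS_NUM * len(WORDS), -5.0)
for slot, ch in enumerate('CKAN'):
    fake[slot * len(WORDS) + WORDS.index(ch)] = 5.0

print(decode_prediction(fake))
```

Because sigmoid is monotonic, taking the argmax of the raw logits gives the same letters, so the sigmoid step can be skipped when only the decoded text is needed.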

Make it a Web service:

The Flask framework is used to run the whole project as a web service so that it can be called over HTTP. After starting the service, it can be tested by calling the following addresses:

http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/0_HZDZ.png

http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/1_CKAN.png
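The same requests can be issued from a script. The sketch below uses only the Python standard library; `build_ocr_url` and `query_ocr` are hypothetical helper names, and the response keys `img_path` and `ocr_ret` are taken from the route handler in the code listing. Note that `img_path` is resolved on the server's filesystem, not the client's.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ocr_url(img_path, host='127.0.0.1', port=5050):
    """Build the GET URL for the /captchaOcr endpoint, URL-encoding the path."""
    query = urlencode({'img_path': img_path})
    return 'http://%s:%d/captchaOcr?%s' % (host, port, query)


def query_ocr(img_path, host='127.0.0.1', port=5050, timeout=10):
    """Call the running service and return the recognized text."""
    with urlopen(build_ocr_url(img_path, host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))['ocr_ret']


if __name__ == '__main__':
    print(build_ocr_url('./dataset/test/0_HZDZ.png'))
    # query_ocr('./dataset/test/0_HZDZ.png') only works while the service is running
```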

Subsequent optimization logic:

The CNN used for feature extraction can be replaced by an RNN

This scheme can only OCR fixed-length text; LSTM+CTC can be adopted to OCR variable-length text

Run the command:

Generate the captcha dataset (self.im_total_num images): python cnnocr.py create_dataset

Train on the dataset: python cnnocr.py train

Test a new image: python cnnocr.py test

Start as an HTTP service: python cnnocr.py start
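The create_dataset command writes files named `<index>_<LABEL>.png`, and training recovers each label from the filename. A minimal sketch of that convention follows; `make_filename` and `label_from_filename` are illustrative helpers, not functions from the project.

```python
import os
import random

WORDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # same alphabet as self.words in the project code


def make_filename(index, text):
    """Follow the '%d_%s.png' pattern used when the dataset is generated."""
    return '%d_%s.png' % (index, text)


def label_from_filename(filename):
    """Recover the captcha text from a name such as '0_HZDZ.png'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.split('_')[1]


text = ''.join(random.choice(WORDS) for _ in range(4))  # 4 random uppercase letters
name = make_filename(7, text)
print(name, label_from_filename(name))
```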

Project directory structure:

Training process:

The overall code is as follows:

# coding:utf-8

from captcha.image import ImageCaptcha
import numpy as np
import cv2
import tensorflow as tf
import random, os, sys

from flask import request
from flask import Flask
import json
app = Flask(__name__)


class CnnOcr:
    def __init__(self):
        self.epoch_max = 6  # Maximum number of training epochs
        self.batch_size = 64  # Images per training batch; reduce this if GPU memory is insufficient
        self.lr = 1e-3  # Initial learning rate
        self.save_epoch = 1  # Save the model every this many epochs


        self.im_width = 128
        self.im_height = 64
        self.im_total_num = 100000  # Total number of captcha images generated
        self.train_max_num = self.im_total_num  # Maximum number of images read during training
        self.val_num = 50 * self.batch_size  # cannot be greater than self.train_max_num
        self.words_num = 4  # Number of characters on each captcha image
        self.words = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        self.label_num = self.words_num * len(self.words)
        self.keep_drop = tf.placeholder(tf.float32)
        self.x = None
        self.y = None


    def captchaOcr(self, img_path):
        """Captcha recognition :param img_path: path of the image to recognize :return: recognized text"""
        im = cv2.imread(img_path)
        im = cv2.resize(im, (self.im_width, self.im_height))
        im = [im]
        im = np.array(im, dtype=np.float32)
        im -= 147
        output = self.sess.run(self.max_idx_p, feed_dict={self.x: im, self.keep_drop: 1.})
        ret = ''
        for i in output.tolist()[0]:
            ret = ret + self.words[int(i)]
        return ret


    def test(self, img_path):
        """Test interface :param img_path: :return:"""
        self.x = tf.placeholder(tf.float32, [None, self.im_height, self.im_width, 3])  # Input data
        self.pred = self.cnnNet()
        self.output = tf.nn.sigmoid(self.pred)
        self.predict = tf.reshape(self.pred, [-1, self.words_num, len(self.words)])
        self.max_idx_p = tf.argmax(self.predict, 2)

        saver = tf.train.Saver()
        # tfconfig = tf.ConfigProto(allow_soft_placement=True)
        # tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.3  # Fraction of GPU memory to use
        # self.sess = tf.Session(config=tfconfig)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())  # Global TF variable initialization

        # Load the w, b parameters
        saver.restore(self.sess, './model/CnnOcr-6')
        im = cv2.imread(img_path)
        im = cv2.resize(im, (self.im_width, self.im_height))
        im = [im]
        im = np.array(im, dtype=np.float32)
        im -= 147
        output = self.sess.run(self.max_idx_p, feed_dict={self.x: im, self.keep_drop: 1.})
        ret = ''
        for i in output.tolist()[0]:
            ret = ret + self.words[int(i)]
        print(ret)


    def train(self):
        x_train_list, y_train_list, x_val_list, y_val_list = self.getTrainDataset()

        print('Start transforming the tensor queue')
        x_train_list_tensor = tf.convert_to_tensor(x_train_list, dtype=tf.string)
        y_train_list_tensor = tf.convert_to_tensor(y_train_list, dtype=tf.float32)

        x_val_list_tensor = tf.convert_to_tensor(x_val_list, dtype=tf.string)
        y_val_list_tensor = tf.convert_to_tensor(y_val_list, dtype=tf.float32)

        x_train_queue = tf.train.slice_input_producer(tensor_list=[x_train_list_tensor], shuffle=False)
        y_train_queue = tf.train.slice_input_producer(tensor_list=[y_train_list_tensor], shuffle=False)

        x_val_queue = tf.train.slice_input_producer(tensor_list=[x_val_list_tensor], shuffle=False)
        y_val_queue = tf.train.slice_input_producer(tensor_list=[y_val_list_tensor], shuffle=False)

        train_im, train_label = self.dataset_opt(x_train_queue, y_train_queue)
        train_batch = tf.train.batch(tensors=[train_im, train_label], batch_size=self.batch_size, num_threads=2)

        val_im, val_label = self.dataset_opt(x_val_queue, y_val_queue)
        val_batch = tf.train.batch(tensors=[val_im, val_label], batch_size=self.batch_size, num_threads=2)

        print('Start training')
        self.learning_rate = tf.placeholder(dtype=tf.float32)  # Dynamic learning rate
        self.x = tf.placeholder(tf.float32, [None, self.im_height, self.im_width, 3])  # Training data
        self.y = tf.placeholder(tf.float32, [None, self.label_num])  # label
        self.pred = self.cnnNet()
        self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.pred, labels=self.y))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)

        self.predict = tf.reshape(self.pred, [-1, self.words_num, len(self.words)])
        self.max_idx_p = tf.argmax(self.predict, 2)

        self.y_predict = tf.reshape(self.y, [-1, self.words_num, len(self.words)])
        self.max_idx_l = tf.argmax(self.y_predict, 2)

        self.correct_pred = tf.equal(self.max_idx_p, self.max_idx_l)
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_pred, tf.float32))

        with tf.Session() as self.sess:
            # global TF variable initialization
            self.sess.run(tf.global_variables_initializer())
            coordinator = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=self.sess, coord=coordinator)

            # Save the model
            saver = tf.train.Saver()

            batch_max = len(x_train_list) // self.batch_size
            total_step = 1
            for epoch_num in range(self.epoch_max):
                lr = self.lr * (1 - (epoch_num/self.epoch_max) ** 2)  # Dynamic learning rate
                for batch_num in range(batch_max):
                    x_train_tmp, y_train_tmp = self.sess.run(train_batch)
                    # print(x_train_tmp.shape, y_train_tmp.shape)
                    # sys.exit()

                    self.sess.run(self.optimizer, feed_dict={self.x: x_train_tmp, self.y: y_train_tmp, self.learning_rate: lr, self.keep_drop: .5})

                    # Output evaluation criteria
                    if total_step % 50 == 0 or total_step == 1:
                        print()
                        print('epoch:%d/%d batch:%d/%d step:%d lr:%.10f' % ((epoch_num + 1), self.epoch_max, (batch_num + 1), batch_max, total_step, lr))

                        # Output training set evaluation
                        train_loss, train_acc = self.sess.run([self.loss, self.accuracy], feed_dict={self.x: x_train_tmp, self.y: y_train_tmp, self.keep_drop: 1.})
                        print('train_loss:%.10f train_acc:%.10f' % (np.mean(train_loss), train_acc))

                        # output validation set evaluation
                        val_loss_list, val_acc_list = [], []
                        for i in range(int(self.val_num/self.batch_size)):
                            x_val_tmp, y_val_tmp = self.sess.run(val_batch)
                            val_loss, val_acc = self.sess.run([self.loss, self.accuracy], feed_dict={self.x: x_val_tmp, self.y: y_val_tmp, self.keep_drop: 1.})
                            val_loss_list.append(np.mean(val_loss))
                            val_acc_list.append(np.mean(val_acc))
                        # Average over all validation batches, not just the last one
                        print('val_loss:%.10f val_acc:%.10f' % (np.mean(val_loss_list), np.mean(val_acc_list)))

                    total_step += 1

                # Save the model
                if (epoch_num + 1) % self.save_epoch == 0:
                    print('Saving model:')
                    saver.save(self.sess, './model/CnnOcr', global_step=(epoch_num + 1))
            coordinator.request_stop()
            coordinator.join(threads)



    def cnnNet(self):
        """CNN network :return: logits of shape [batch, self.label_num]"""
        # Note: the wc*_2 variables are declared but not used in the forward pass below
        weight = {
            # Input: 128*64*3

            # Layer 1
            'wc1_1': tf.get_variable('wc1_1', [5, 5, 3, 32]),  # Convolution output: 128*64*32
            'wc1_2': tf.get_variable('wc1_2', [5, 5, 32, 32]),  # Convolution output: 128*64*32
            # Pooled output: 64*32*32

            # Layer 2
            'wc2_1': tf.get_variable('wc2_1', [5, 5, 32, 64]),  # Convolution output: 64*32*64
            'wc2_2': tf.get_variable('wc2_2', [5, 5, 64, 64]),  # Convolution output: 64*32*64
            # Pooled output: 32*16*64

            # Layer 3
            'wc3_1': tf.get_variable('wc3_1', [3, 3, 64, 64]),  # Convolution output: 32*16*64
            'wc3_2': tf.get_variable('wc3_2', [3, 3, 64, 64]),  # Convolution output: 32*16*64
            # Pooled output: 16*8*64

            # Layer 4
            'wc4_1': tf.get_variable('wc4_1', [3, 3, 64, 64]),  # Convolution output: 16*8*64
            'wc4_2': tf.get_variable('wc4_2', [3, 3, 64, 64]),  # Convolution output: 16*8*64
            # Pooled output: 8*4*64

            # Fully connected layer 1
            'wfc_1': tf.get_variable('wfc_1', [8 * 4 * 64, 2048]),

            # Fully connected layer 2
            'wfc_2': tf.get_variable('wfc_2', [2048, 2048]),

            # Fully connected layer 3
            'wfc_3': tf.get_variable('wfc_3', [2048, self.label_num]),
        }

        biase = {
            # Layer 1
            'bc1_1': tf.get_variable('bc1_1', [32]),
            'bc1_2': tf.get_variable('bc1_2', [32]),

            # Layer 2
            'bc2_1': tf.get_variable('bc2_1', [64]),
            'bc2_2': tf.get_variable('bc2_2', [64]),

            # Layer 3
            'bc3_1': tf.get_variable('bc3_1', [64]),
            'bc3_2': tf.get_variable('bc3_2', [64]),

            # Layer 4
            'bc4_1': tf.get_variable('bc4_1', [64]),
            'bc4_2': tf.get_variable('bc4_2', [64]),

            # Fully connected layer 1
            'bfc_1': tf.get_variable('bfc_1', [2048]),

            # Fully connected layer 2
            'bfc_2': tf.get_variable('bfc_2', [2048]),

            # Fully connected layer 3
            'bfc_3': tf.get_variable('bfc_3', [self.label_num]),
        }

        # Layer 1
        net = tf.nn.conv2d(self.x, weight['wc1_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc1_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv1', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # pooling
        print('pool1', net)

        # Layer 2
        net = tf.nn.conv2d(net, weight['wc2_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc2_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv2', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # pooling
        print('pool2', net)

        # Layer 3
        net = tf.nn.conv2d(net, weight['wc3_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc3_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv3', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # pooling
        print('pool3', net)

        # Layer 4
        net = tf.nn.conv2d(net, weight['wc4_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc4_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv4', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # pooling
        print('pool4', net)

        # Flatten: Stretch multiple images simultaneously into a vector
        net = tf.reshape(net, shape=[-1, weight['wfc_1'].get_shape()[0]])
        print('tensile flatten', net)

        # Full link layer
        # fc layer 1
        net = tf.matmul(net, weight['wfc_1']) + biase['bfc_1']
        net = tf.nn.dropout(net, self.keep_drop)
        net = tf.nn.relu(net)

        print('FC Layer 1', net)
        # fc layer 2
        net = tf.matmul(net, weight['wfc_2']) + biase['bfc_2']
        net = tf.nn.dropout(net, self.keep_drop)
        net = tf.nn.relu(net)

        print('FC Layer 2', net)
        # fc layer 3
        net = tf.matmul(net, weight['wfc_3']) + biase['bfc_3']
        print('FC Tier 3', net)
        return net


    def getTrainDataset(self):
        """List the training images and build their 0/1 label vectors; images are decoded and resized to 128*64*3 later in dataset_opt :return: x_train_list, y_train_list, x_val_list, y_val_list"""
        train_data_list = os.listdir('./dataset/train/')
        print('There are %d training images, read %d images:' % (len(train_data_list), self.train_max_num))
        random.shuffle(train_data_list)  # shuffle the file order

        y_val_list, y_train_list = [], []
        x_val_list = train_data_list[:self.val_num]
        for x_val in x_val_list:
            # File names look like '0_HZDZ.png'; take the text between '_' and '.'
            words_tmp = x_val.split('.')[0].split('_')[1]
            y_val_list.append([1 if _w == w else 0 for w in words_tmp for _w in self.words])

        x_train_list = train_data_list[self.val_num:self.train_max_num]
        for x_train in x_train_list:
            words_tmp = x_train.split('.')[0].split('_')[1]
            y_train_list.append([1 if _w == w else 0 for w in words_tmp for _w in self.words])

        return x_train_list, y_train_list, x_val_list, y_val_list


    def createCaptchaDataset(self):
        """Generate training image data set :return:"""
        image = ImageCaptcha(width=self.im_width, height=self.im_height, font_sizes=(56,))
        for i in range(self.im_total_num):
            words_tmp = ''
            for j in range(self.words_num):
                words_tmp = words_tmp + random.choice(self.words)
            print(words_tmp, type(words_tmp))
            im_path = './dataset/train/%d_%s.png' % (i, words_tmp)
            print(im_path)
            image.write(words_tmp, im_path)
        return True


    def dataset_opt(self, x_train_queue, y_train_queue):
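        """Load and preprocess one image and pair it with its label :param x_train_queue: file name queue :param y_train_queue: label queue :return: image tensor, label tensor"""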
        queue = x_train_queue[0]
        contents = tf.read_file('./dataset/train/' + queue)
        im = tf.image.decode_jpeg(contents)
        im = tf.image.resize_images(images=im, size=[self.im_height, self.im_width])
        im = tf.reshape(im, tf.stack([self.im_height, self.im_width, 3]))
        im -= 147  # de-mean
        # im /= 255 # Process pixels between 0 and 1 to accelerate convergence
        # im -= 0.5 # Process the pixel between -0.5 and 0.5
        return im, y_train_queue[0]




if __name__ == '__main__':
    opt_type = sys.argv[1:][0]

    instance = CnnOcr()

    if opt_type == 'create_dataset':
        instance.createCaptchaDataset()
    elif opt_type == 'train':
        instance.train()
    elif opt_type == 'test':
        instance.test('./dataset/test/0_HZDZ.png')
    elif opt_type == 'start':
        # Load the model once so the session stays in memory
        instance.test('./dataset/test/0_HZDZ.png')

        # Start the web service
        # http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/2_SYVD.png
        @app.route('/captchaOcr', methods=['GET'])
        def captchaOcr():
            img_path = request.args.to_dict().get('img_path')
            print(img_path)
            ret = instance.captchaOcr(img_path)
            print(ret)
            return json.dumps({'img_path': img_path, 'ocr_ret': ret})

        app.run(host='0.0.0.0', port=5050, debug=False)
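The per-epoch learning-rate decay used in train() can be checked in isolation. This sketch assumes Python 3 true division and the defaults from `__init__` (epoch_max = 6, lr = 1e-3); the function name `epoch_learning_rate` is illustrative and not part of the project.

```python
def epoch_learning_rate(epoch_num, epoch_max=6, base_lr=1e-3):
    """Quadratic decay: base_lr * (1 - (epoch_num / epoch_max) ** 2)."""
    return base_lr * (1 - (epoch_num / epoch_max) ** 2)


# epoch_num is zero-based, as in the training loop
for epoch_num in range(6):
    print('epoch %d: lr=%.6f' % (epoch_num + 1, epoch_learning_rate(epoch_num)))
```

The rate starts at the initial value and falls slowly at first, then faster in the last epochs, without ever reaching zero inside the training loop.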