Author | Jiang Yu

Preface

The concept of Serverless has attracted wide attention since it was first proposed, and in recent years it has shown unprecedented vitality. Engineers in many fields are trying to combine the Serverless architecture with their own work in order to capture the "technical dividend" it brings.

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." It is a fully automated public program that distinguishes computers from human users. It helps prevent malicious password cracking, ticket scalping, and forum flooding, and it effectively stops hackers from brute-forcing the password of a specific registered user with an automated program. CAPTCHAs are now common on many websites and are implemented in a relatively simple way: the question is generated and judged by a computer, but only a human can answer it. Since a computer cannot answer the question, any user who answers it correctly can be assumed to be human. Put plainly, a captcha is a test used to verify whether a visitor is a person or a machine.

**What sparks fly when AI-powered captcha recognition meets the Serverless architecture?** This article implements captcha recognition with a Serverless architecture and a convolutional neural network (CNN).

A brief discussion of captchas

Captchas have evolved rapidly: from the early purely numeric captchas, to digit-and-letter captchas, then digit, letter, and Chinese-character captchas, and on to graphical image captchas, the raw material of captchas has grown richer and richer. The interaction styles also vary widely, from typing, clicking, and dragging to SMS and voice verification codes.

Bilibili's login captcha includes several modes, such as slider verification:

as well as verification by clicking on characters in a specified order:

On sites such as Baidu Tieba, Zhihu, and Google, captchas vary widely: typing in the displayed text, selecting the images that contain a specified object, clicking the characters in an image in a given order, and so on.

How a captcha is recognized depends on its type, and the simplest type is the original text captcha:

Even among text captchas there are many variants: purely numeric captchas, digit-and-letter captchas, Chinese-character captchas, captchas that embed an arithmetic problem, and simple captchas turned into complex ones by adding interference.

Captcha recognition

1. Simple captcha recognition

Captcha recognition is an old research field; simply put, it is the process of converting the text in an image into a string. In recent years, with the rise of big data, crawler engineers fighting against anti-crawling strategies have placed ever higher demands on captcha recognition. In the era of simple captchas, recognition mainly targeted text captchas: the image is cut into pieces, each piece is compared against templates to find the most similar character, and the characters are finally concatenated into the result, for example:

Perform binarization and other operations:

After cutting:

Recognizing each character after cutting and then concatenating the results is relatively easy.
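As a reference only, here is a minimal sketch of this cut-and-compare idea (it is not code from the original project). It assumes Pillow and NumPy, an equal-width split into four characters, and a caller-supplied dictionary of character templates; real captchas usually need smarter segmentation.

# A minimal sketch of classical captcha recognition: binarize, cut, compare with templates.
# The threshold, the equal-width cut, and the template set are illustrative assumptions.
import numpy as np
from PIL import Image

def binarize(img, threshold=150):
    """Convert to grayscale, then to a 0/1 array (1 = ink, 0 = background)."""
    gray = np.array(img.convert("L"))
    return (gray < threshold).astype(np.uint8)

def cut(binary, char_count=4):
    """Naively split the captcha into equal-width slices, one per character."""
    width = binary.shape[1] // char_count
    return [binary[:, i * width:(i + 1) * width] for i in range(char_count)]

def match(piece, templates):
    """Pick the template character with the highest pixel-overlap similarity."""
    def similarity(a, b):
        h, w = min(a.shape[0], b.shape[0]), min(a.shape[1], b.shape[1])
        return (a[:h, :w] == b[:h, :w]).mean()
    return max(templates, key=lambda ch: similarity(piece, templates[ch]))

def recognize(path, templates):
    """templates: dict mapping each candidate character to its binarized template array."""
    return "".join(match(p, templates) for p in cut(binarize(Image.open(path))))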

However, as time went by and simple captchas could no longer settle the question of "human or machine", captchas received a small upgrade: interference lines were added, the text was heavily distorted, and strong color-block noise was introduced, as in the captcha used by the Dynadot website:

It contains not only distorted and overlapping characters but also interference lines and color blocks. For such captchas, simple cut-and-match recognition can hardly achieve good results, whereas deep learning can.

2. Captcha recognition based on CNN

A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to units within their local receptive field, which makes it well suited to large-scale image processing. A CNN includes convolutional layers and pooling layers.

As shown in the figure, the left diagram is a traditional neural network, whose basic structure is an input layer, hidden layers, and an output layer. The right diagram is a convolutional neural network, composed of an input layer, an output layer, convolutional layers, pooling layers, and fully connected layers. A CNN is in fact an extension of the neural network; structurally, a naive CNN and a naive NN are not fundamentally different (although complex CNNs with special structures differ considerably from plain NNs). Compared with a traditional neural network, a CNN greatly reduces the number of parameters in practice, so a better model can be trained with fewer parameters, typically achieving twice the result with half the effort, while effectively avoiding overfitting. In addition, because the filter parameters are shared, features can still be recognized when the image is translated to some extent; this "translation invariance" makes the model more robust.
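To make the parameter savings concrete, here is a quick back-of-the-envelope calculation using the dimensions from this article: a 60×160 grayscale input, a 1024-neuron fully connected layer, and a 3×3 convolution with 32 filters (the first convolution used later in cnnGraph).

# Rough parameter counts: a dense layer that touches every pixel vs. a small shared filter.
fc_params = 60 * 160 * 1024 + 1024   # fully connected: one weight per pixel-neuron pair, plus biases
conv_params = 3 * 3 * 1 * 32 + 32    # 3x3 convolution, 1 input channel, 32 filters, plus biases
print(fc_params)    # 9831424
print(conv_params)  # 320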

1) Captcha generation

Generating captchas is a very important step, because the captchas generated here will serve as our training and test sets, and they also determine what kind of captcha the final model will be able to recognize.

# coding:utf-8
import random
import numpy as np
from PIL import Image
from captcha.image import ImageCaptcha

CAPTCHA_LIST = [eve for eve in "0123456789abcdefghijklmnopqrsruvwxyzABCDEFGHIJKLMOPQRSTUVWXYZ"]
CAPTCHA_LEN = 4  # captcha length
CAPTCHA_HEIGHT = 60  # captcha height
CAPTCHA_WIDTH = 160  # captcha width

# Random captcha text
randomCaptchaText = lambda char=CAPTCHA_LIST, size=CAPTCHA_LEN: "".join([random.choice(char) for _ in range(size)])

def genCaptchaTextImage(width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT, save=None):
    image = ImageCaptcha(width=width, height=height)
    captchaText = randomCaptchaText()
    if save:
        image.write(captchaText, './img/%s.jpg' % captchaText)
    return captchaText, np.array(Image.open(image.generate(captchaText)))

print(genCaptchaTextImage(save=True))

With the above code, you can generate a simple captcha made up of digits and letters:

2) Model training

The code for model training is as follows (part of it is adapted from code found online).

The util.py file mainly collects the shared helper methods:

# -*- coding:utf-8 -*-
import numpy as np
from captcha_gen import genCaptchaTextImage
from captcha_gen import CAPTCHA_LIST, CAPTCHA_LEN, CAPTCHA_HEIGHT, CAPTCHA_WIDTH

# Convert the image to grayscale (3 channels to 1)
convert2Gray = lambda img: np.mean(img, -1) if len(img.shape) > 2 else img
# Convert a captcha vector back to text
vec2Text = lambda vec, captcha_list=CAPTCHA_LIST: ''.join([captcha_list[int(v)] for v in vec])

def text2Vec(text, captchaLen=CAPTCHA_LEN, captchaList=CAPTCHA_LIST):
    """
    Convert captcha text to a one-hot vector
    """
    vector = np.zeros(captchaLen * len(captchaList))
    for i in range(len(text)):
        vector[captchaList.index(text[i]) + i * len(captchaList)] = 1
    return vector

def getNextBatch(batchCount=60, width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT):
    """
    Generate a batch of training images
    """
    batchX = np.zeros([batchCount, width * height])
    batchY = np.zeros([batchCount, CAPTCHA_LEN * len(CAPTCHA_LIST)])
    for i in range(batchCount):
        text, image = genCaptchaTextImage()
        image = convert2Gray(image)
        # Flatten the image to 1-D; keep the text label on the matching row of the label array
        batchX[i, :] = image.flatten() / 255
        batchY[i, :] = text2Vec(text)
    return batchX, batchY

# print(getNextBatch(batchCount=1))

The model_train.py file is used for model training and defines the basic structure of the model: a three-layer convolutional neural network. The original image is 60×160; it stays 60×160 after the first convolution and becomes 30×80 after the first pooling; it stays 30×80 after the second convolution and becomes 15×40 after the second pooling; it stays 15×40 after the third convolution and becomes 8×20 after the third pooling (the 2×2 "SAME" pooling rounds up). After three rounds of convolution and pooling, the original image is reduced to an 8×20 feature map (with 64 channels). During training, the project also runs a test every 100 steps to calculate the accuracy:

# -*- coding:utf-8 -*-
import tensorflow.compat.v1 as tf
from datetime import datetime
from util import getNextBatch
from captcha_gen import CAPTCHA_HEIGHT, CAPTCHA_WIDTH, CAPTCHA_LEN, CAPTCHA_LIST

tf.compat.v1.disable_eager_execution()

variable = lambda shape, alpha=0.01: tf.Variable(alpha * tf.random_normal(shape))
conv2d = lambda x, w: tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
maxPool2x2 = lambda x: tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
optimizeGraph = lambda y, y_conv: tf.train.AdamOptimizer(1e-3).minimize(
    tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_conv)))
hDrop = lambda image, weight, bias, keepProb: tf.nn.dropout(
    maxPool2x2(tf.nn.relu(conv2d(image, variable(weight, 0.01)) + variable(bias, 0.1))), keepProb)

def cnnGraph(x, keepProb, size, captchaList=CAPTCHA_LIST, captchaLen=CAPTCHA_LEN):
    """
    Three-layer convolutional neural network
    """
    imageHeight, imageWidth = size
    xImage = tf.reshape(x, shape=[-1, imageHeight, imageWidth, 1])
    hDrop1 = hDrop(xImage, [3, 3, 1, 32], [32], keepProb)
    hDrop2 = hDrop(hDrop1, [3, 3, 32, 64], [64], keepProb)
    hDrop3 = hDrop(hDrop2, [3, 3, 64, 64], [64], keepProb)
    # Fully connected layer
    imageHeight = int(hDrop3.shape[1])
    imageWidth = int(hDrop3.shape[2])
    wFc = variable([imageHeight * imageWidth * 64, 1024], 0.01)  # 64 channels in, 1024 neurons in the fully connected layer
    bFc = variable([1024], 0.1)
    hDrop3Re = tf.reshape(hDrop3, [-1, imageHeight * imageWidth * 64])
    hFc = tf.nn.relu(tf.matmul(hDrop3Re, wFc) + bFc)
    hDropFc = tf.nn.dropout(hFc, keepProb)
    # Output layer
    wOut = variable([1024, len(captchaList) * captchaLen], 0.01)
    bOut = variable([len(captchaList) * captchaLen], 0.1)
    yConv = tf.matmul(hDropFc, wOut) + bOut
    return yConv

def accuracyGraph(y, yConv, width=len(CAPTCHA_LIST), height=CAPTCHA_LEN):
    """
    Accuracy graph: compare the predicted characters with the labels
    """
    maxPredictIdx = tf.argmax(tf.reshape(yConv, [-1, height, width]), 2)
    maxLabelIdx = tf.argmax(tf.reshape(y, [-1, height, width]), 2)
    correct = tf.equal(maxPredictIdx, maxLabelIdx)
    return tf.reduce_mean(tf.cast(correct, tf.float32))

def train(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH, ySize=len(CAPTCHA_LIST) * CAPTCHA_LEN):
    """
    CNN training
    """
    accRate = 0.15  # starting accuracy threshold, raised by 0.01 each time it is reached
    x = tf.placeholder(tf.float32, [None, height * width])
    y = tf.placeholder(tf.float32, [None, ySize])
    keepProb = tf.placeholder(tf.float32)
    yConv = cnnGraph(x, keepProb, (height, width))
    optimizer = optimizeGraph(y, yConv)
    accuracy = accuracyGraph(y, yConv)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # initialize
        step = 0
        while True:
            batchX, batchY = getNextBatch(64)
            sess.run(optimizer, feed_dict={x: batchX, y: batchY, keepProb: 0.75})
            # Test once every 100 training steps
            if step % 100 == 0:
                batchXTest, batchYTest = getNextBatch(100)
                acc = sess.run(accuracy, feed_dict={x: batchXTest, y: batchYTest, keepProb: 1.0})
                print(datetime.now().strftime('%c'), ' step:', step, ' accuracy:', acc)
                # Save the model whenever the current threshold is reached
                if acc > accRate:
                    modelPath = "./model/captcha.model"
                    saver.save(sess, modelPath, global_step=step)
                    accRate += 0.01
                    if accRate > 0.90:
                        break
            step = step + 1

train()
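As a quick sanity check of the feature-map sizes described above: each 2×2 max-pooling with "SAME" padding and stride 2 halves a dimension, rounding up.

# Feature-map size after each of the three pooling layers, starting from 60x160.
from math import ceil

h, w = 60, 160
for i in range(3):
    h, w = ceil(h / 2), ceil(w / 2)
    print("after pooling layer %d: %d x %d" % (i + 1, h, w))
# after pooling layer 1: 30 x 80
# after pooling layer 2: 15 x 40
# after pooling layer 3: 8 x 20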

After completing this part, we can train the model on a local machine. To speed up training, I set the accRate part of the code to:

if accRate > 0.90:
    break

That is, when the accuracy exceeds 90%, the system automatically stops and saves the model.

Then you can train:

Training may take a long time. Once it is complete, you can plot the results to see how the accuracy changes as the number of steps increases:

The horizontal axis is the training step and the vertical axis is the accuracy.
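One possible way to produce such a plot (this is not part of the original project) is to redirect the training output to a file, for example python model_train.py > train.log, and then parse the step and accuracy values printed by model_train.py:

# Parse the "step: ... accuracy: ..." lines printed during training and plot them.
# Assumes the log was saved to train.log; the file name is an illustrative choice.
import re
import matplotlib.pyplot as plt

steps, accs = [], []
with open("train.log") as f:
    for line in f:
        m = re.search(r"step:\s*(\d+)\s+accuracy:\s*([\d.]+)", line)
        if m:
            steps.append(int(m.group(1)))
            accs.append(float(m.group(2)))

plt.plot(steps, accs)
plt.xlabel("step")
plt.ylabel("accuracy")
plt.title("Training accuracy vs. step")
plt.show()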

3. Captcha recognition based on the Serverless architecture

To pull together the code above, we now reorganize it according to the conventions of Function Compute:

# -*- coding:utf-8 -*-
# Core backend service
import base64
import json
import uuid
import tensorflow as tf
import random
import numpy as np
from PIL import Image
from captcha.image import ImageCaptcha
# Response
class Response:
    def __init__(self, start_response, response, errorCode=None):
        self.start = start_response
        responseBody = {
            'Error': {"Code": errorCode, "Message": response},
        } if errorCode else {
            'Response': response
        }
        # Attach a uuid by default to make later troubleshooting easier
        responseBody['ResponseId'] = str(uuid.uuid1())
        print("Response: ", json.dumps(responseBody))
        self.response = json.dumps(responseBody)
    def __iter__(self):
        status = '200'
        response_headers = [('Content-type', 'application/json; charset=UTF-8')]
        self.start(status, response_headers)
        yield self.response.encode("utf-8")
CAPTCHA_LIST = [eve for eve in "0123456789abcdefghijklmnopqrsruvwxyzABCDEFGHIJKLMOPQRSTUVWXYZ"]
CAPTCHA_LEN = 4  # captcha length
CAPTCHA_HEIGHT = 60  # captcha height
CAPTCHA_WIDTH = 160  # captcha width
# Random string
randomStr = lambda num=5: "".join(random.sample('abcdefghijklmnopqrstuvwxyz', num))
randomCaptchaText = lambda char=CAPTCHA_LIST, size=CAPTCHA_LEN: "".join([random.choice(char) for _ in range(size)])
# Convert the image to grayscale (3 channels to 1)
convert2Gray = lambda img: np.mean(img, -1) if len(img.shape) > 2 else img
# Convert a captcha vector back to text
vec2Text = lambda vec, captcha_list=CAPTCHA_LIST: ''.join([captcha_list[int(v)] for v in vec])
variable = lambda shape, alpha=0.01: tf.Variable(alpha * tf.random_normal(shape))
conv2d = lambda x, w: tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
maxPool2x2 = lambda x: tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
optimizeGraph = lambda y, y_conv: tf.train.AdamOptimizer(1e-3).minimize(
    tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_conv)))
hDrop = lambda image, weight, bias, keepProb: tf.nn.dropout(
    maxPool2x2(tf.nn.relu(conv2d(image, variable(weight, 0.01)) + variable(bias, 0.1))), keepProb)
def genCaptchaTextImage(width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT, save=None):
    image = ImageCaptcha(width=width, height=height)
    captchaText = randomCaptchaText()
    if save:
        image.write(captchaText, save)
    return captchaText, np.array(Image.open(image.generate(captchaText)))
def text2Vec(text, captcha_len=CAPTCHA_LEN, captcha_list=CAPTCHA_LIST):
    """
    验证码文本转为向量
    """
    vector = np.zeros(captcha_len * len(captcha_list))
    for i in range(len(text)):
        vector[captcha_list.index(text[i]) + i * len(captcha_list)] = 1
    return vector
def getNextBatch(batch_count=60, width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT):
    """
    获取训练图片组
    """
    batch_x = np.zeros([batch_count, width * height])
    batch_y = np.zeros([batch_count, CAPTCHA_LEN * len(CAPTCHA_LIST)])
    for i in range(batch_count):
        text, image = genCaptchaTextImage()
        image = convert2Gray(image)
        # Flatten the image to 1-D; keep the text label on the matching row of the label array
        batch_x[i, :] = image.flatten() / 255
        batch_y[i, :] = text2Vec(text)
    return batch_x, batch_y
def cnnGraph(x, keepProb, size, captchaList=CAPTCHA_LIST, captchaLen=CAPTCHA_LEN):
    """
    三层卷积神经网络
    """
    imageHeight, imageWidth = size
    xImage = tf.reshape(x, shape=[-1, imageHeight, imageWidth, 1])
    hDrop1 = hDrop(xImage, [3, 3, 1, 32], [32], keepProb)
    hDrop2 = hDrop(hDrop1, [3, 3, 32, 64], [64], keepProb)
    hDrop3 = hDrop(hDrop2, [3, 3, 64, 64], [64], keepProb)
    # Fully connected layer
    imageHeight = int(hDrop3.shape[1])
    imageWidth = int(hDrop3.shape[2])
    wFc = variable([imageHeight * imageWidth * 64, 1024], 0.01)  # 64 channels in, 1024 neurons in the fully connected layer
    bFc = variable([1024], 0.1)
    hDrop3Re = tf.reshape(hDrop3, [-1, imageHeight * imageWidth * 64])
    hFc = tf.nn.relu(tf.matmul(hDrop3Re, wFc) + bFc)
    hDropFc = tf.nn.dropout(hFc, keepProb)
    # Output layer
    wOut = variable([1024, len(captchaList) * captchaLen], 0.01)
    bOut = variable([len(captchaList) * captchaLen], 0.1)
    yConv = tf.matmul(hDropFc, wOut) + bOut
    return yConv
def captcha2Text(image_list):
    """
    验证码图片转化为文本
    """
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint('model/'))
        predict = tf.argmax(tf.reshape(yConv, [-1, CAPTCHA_LEN, len(CAPTCHA_LIST)]), 2)
        vector_list = sess.run(predict, feed_dict={x: image_list, keepProb: 1})
        vector_list = vector_list.tolist()
        text_list = [vec2Text(vector) for vector in vector_list]
        return text_list
x = tf.placeholder(tf.float32, [None, CAPTCHA_HEIGHT * CAPTCHA_WIDTH])
keepProb = tf.placeholder(tf.float32)
yConv = cnnGraph(x, keepProb, (CAPTCHA_HEIGHT, CAPTCHA_WIDTH))
saver = tf.train.Saver()
def handler(environ, start_response):
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except (ValueError):
        request_body_size = 0
    requestBody = json.loads(environ['wsgi.input'].read(request_body_size).decode("utf-8"))
    imageName = randomStr(10)
    imagePath = "/tmp/" + imageName
    print("requestBody: ", requestBody)
    reqType = requestBody.get("type", None)
    if reqType == "get_captcha":
        genCaptchaTextImage(save=imagePath)
        with open(imagePath, 'rb') as f:
            data = base64.b64encode(f.read()).decode()
        return Response(start_response, {'image': data})
    if reqType == "get_text":
        # Fetch the image
        print("Get picture")
        imageData = base64.b64decode(requestBody["image"])
        with open(imagePath, 'wb') as f:
            f.write(imageData)
        # Run prediction
        img = Image.open(imagePath)
        img = img.resize((160, 60), Image.ANTIALIAS)
        img = img.convert("RGB")
        img = np.asarray(img)
        image = convert2Gray(img)
        image = image.flatten() / 255
        return Response(start_response, {'result': captcha2Text([image])})

This function exposes two main interfaces (see the client sketch after the list):

• Get captcha: randomly generates a captcha image for the user to test with.
• Get recognition result: recognizes the captcha submitted by the user and returns its text.
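A minimal client sketch for these two interfaces, based on the request and response shapes in handler() and the Response class above; the endpoint URL is a placeholder for the HTTP trigger domain you get after deployment:

# Hypothetical client for the two interfaces; replace the placeholder URL with your own.
import json
import urllib.request

url = "http://<your-http-trigger-domain>/"  # placeholder, not a real endpoint

def post(payload):
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"))
    return json.loads(urllib.request.urlopen(req).read().decode("utf-8"))

# 1. Get a captcha: the image comes back base64-encoded under Response.image
image_b64 = post({"type": "get_captcha"})["Response"]["image"]

# 2. Get the recognition result: send the same base64 image back
result = post({"type": "get_text", "image": image_b64})["Response"]["result"]
print(result)  # a list containing the recognized text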

For this part of the code, the required dependencies are as follows:

tensorflow==1.13.1
numpy==1.19.4
scipy==1.5.4
pillow==8.0.1
captcha==0.3

In addition, to make the demo easier to try out, a test page is provided. The backend service of the test page uses the Python web framework Bottle:

# -*- coding:utf-8 -*-
import os
import json
from bottle import route, run, static_file, request
import urllib.request

url = "http://" + os.environ.get("url")

@route('/')
def index():
    return static_file("index.html", root='html/')

@route('/get_captcha')
def getCaptcha():
    data = json.dumps({"type": "get_captcha"}).encode("utf-8")
    reqAttr = urllib.request.Request(data=data, url=url)
    return urllib.request.urlopen(reqAttr).read().decode("utf-8")

@route('/get_captcha_result', method='POST')
def getCaptchaResult():
    data = json.dumps({"type": "get_text",
                       "image": json.loads(request.body.read().decode("utf-8"))["image"]}).encode("utf-8")
    reqAttr = urllib.request.Request(data=data, url=url)
    return urllib.request.urlopen(reqAttr).read().decode("utf-8")

run(host='0.0.0.0', debug=False, port=9000)

This backend service depends on:

bottle==0.12.19

Front-end page code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Captcha recognition test system</title>
    <link href="https://www.bootcss.com/p/layoutit/css/bootstrap-combined.min.css" rel="stylesheet">
    <script>
        var image = undefined

        function getCaptcha() {
            const xmlhttp = window.XMLHttpRequest ? new XMLHttpRequest() : new ActiveXObject("Microsoft.XMLHTTP");
            xmlhttp.open("GET", '/get_captcha', false);
            xmlhttp.onreadystatechange = function () {
                if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
                    image = JSON.parse(xmlhttp.responseText).Response.image
                    document.getElementById("captcha").src = "data:image/png;base64," + image
                    document.getElementById("getResult").style.visibility = 'visible'
                }
            }
            xmlhttp.setRequestHeader("Content-type", "application/json");
            xmlhttp.send();
        }

        function getCaptchaResult() {
            const xmlhttp = window.XMLHttpRequest ? new XMLHttpRequest() : new ActiveXObject("Microsoft.XMLHTTP");
            xmlhttp.open("POST", '/get_captcha_result', false);
            xmlhttp.onreadystatechange = function () {
                if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
                    document.getElementById("result").innerText = "Result: " + JSON.parse(xmlhttp.responseText).Response.result
                }
            }
            xmlhttp.setRequestHeader("Content-type", "application/json");
            xmlhttp.send(JSON.stringify({"image": image}));
        }
    </script>
</head>
<body>
<div class="container-fluid" style="margin-top: 10px">
    <div class="row-fluid">
        <div class="span12">
            <center><h3>Captcha recognition test system</h3></center>
        </div>
    </div>
    <div class="row-fluid">
        <div class="span2"></div>
        <div class="span8">
            <center>
                <img src="" id="captcha"/>
                <br><br>
                <p id="result"></p>
            </center>
            <fieldset>
                <legend></legend>
                <button class="btn" onclick="getCaptcha()">Get captcha</button>
                <button class="btn" id="getResult" onclick="getCaptchaResult()" style="visibility: hidden">Recognize captcha</button>
            </fieldset>
        </div>
        <div class="span2"></div>
    </div>
</div>
</body>
</html>

With the code ready, start writing the deployment file:

Global:
  Service:
    Name: ServerlessBook
    Description: Serverless book case
    Log: Auto
    Nas: Auto
ServerlessBookCaptchaDemo:
  Component: fc
  Provider: alibaba
  Access: release
  Extends:
    deploy:
      - Hook: s install docker
        Path: ./
        Pre: true
  Properties:
    Region: cn-beijing
    Service: ${Global.Service}
    Function:
      Name: serverless_captcha
      Description:
      CodeUri:
        Src: ./src/backend
        Excludes:
          - src/backend/.fun
          - src/backend/model
      Handler: index.handler
      Environment:
        - Key: PYTHONUSERBASE
          Value: /mnt/auto/.fun/python
      MemorySize: 3072
      Runtime: python3
      Timeout: 60
      Triggers:
        - Name: ImageAI
          Type: HTTP
          Parameters:
            AuthType: ANONYMOUS
            Methods:
              - GET
              - POST
              - PUT
            Domains:
              - Domain: Auto
ServerlessBookCaptchaWebsiteDemo:
  Component: bottle
  Provider: alibaba
  Access: release
  Extends:
    deploy:
      - Hook: pip3 install -r requirements.txt -t ./
        Path: ./src/website
        Pre: true
  Properties:
    Region: cn-beijing
    CodeUri: ./src/website
    App: index.py
    Environment:
      - Key: url
        Value: ${ServerlessBookCaptchaDemo.Output.Triggers[0].Domains[0]}
    Detail:
      Service: ${Global.Service}
      Function:
        Name: serverless_captcha_website

Overall directory structure:

|-- src                       # project directory
|   |-- backend               # project backend, core interfaces
|   |   |-- index.py          # backend core code
|   |   |-- requirements.txt  # backend core code dependencies
|   |-- website               # project frontend, for easier testing
|   |   |-- html              # frontend pages
|   |   |   |-- index.html    # frontend page
|   |   |-- index.py          # backend service of the frontend (Bottle framework)
|   |   |-- requirements.txt  # dependencies of the frontend's backend service

After completion, we can deploy the project under the project directory:

s deploy

After deployment is complete, open the returned page address:

Click "Get captcha" to generate a captcha online:

Then click "Recognize captcha" to run recognition on it:

Since the target accuracy of the model during training was 90%, the overall accuracy over a large number of captchas of the same type can be considered to be roughly 90% as well.

Conclusion

Serverless is developing rapidly, and building a captcha recognition tool on a Serverless architecture strikes me as a very cool thing to do. A solid captcha recognition tool will be very useful in future data collection work. Of course, there are many kinds of captchas, and recognizing each different type remains a challenging task in its own right.