Captcha recognition solutions

For security reasons, web applications usually add a captcha (verification code) to the login page. Captchas come in many types: images from which letters and digits must be read, prompts to click specified characters in an image, arithmetic problems whose result must be entered, and, more complex still, slider verification. These captchas make our systems more secure, but for testers they are undoubtedly a tricky problem in automated testing.

When a test has to log in through one of the captchas described above, there are the following solutions:

First, ask the developers to remove the captcha
Second, set up a universal captcha
Third, bypass the login with cookies
Fourth, recognize the captcha with automatic recognition technology

2. Recognizing captchas with automatic recognition technology

The first three solutions are familiar to everyone, so this article focuses on the fourth: automatic captcha recognition. There are two ways to approach it.

The first is OCR automatic recognition; the second is recognition through the interface of a third-party captcha-solving platform. Tesseract is a well-known open-source OCR framework. Combined with the Leptonica image processing library, Tesseract can read images in various formats and convert them into text in more than 60 languages. You can keep training your own recognition data to improve its image-to-text accuracy, and, if your team has the depth for it, use it as a base for developing an OCR engine that meets your own needs. Here is how to use Tesseract to recognize our captcha.

OCR recognition requires installing Tesseract and configuring the environment. The steps are as follows:

1) Install Tesseract

These steps apply to Tesseract 3.05-02 and Tesseract 4.00-beta.

Download the Windows installer from github.com/UB-Mannheim…

2) Add training data

Tesseract only recognizes English by default. If you want to recognize other languages, you need to download the corresponding training data

Download: github.com/tesseract-o…

(Figure: the Chinese training data files)

Since we only deal with Chinese here, download just the Chinese training data for now, then copy the .traineddata file into the tessdata directory of the installation: C:\OCR\Tesseract-OCR\tessdata

3) Configure environment variables

To run Tesseract from any location, add the directory that contains the Tesseract binary to the Path environment variable: C:\OCR\Tesseract-OCR.
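A quick way to confirm steps 1) to 3) is a small check script. The sketch below assumes the executable is named tesseract and that language files follow the <lang>.traineddata naming (as with chi_sim, used later in this article). The helper names find_tesseract, has_language, and check_setup are made up for illustration.

```python
import shutil
from pathlib import Path


def find_tesseract(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def has_language(tessdata_dir, lang):
    """Check whether <lang>.traineddata exists in the given tessdata directory."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def check_setup(tessdata_dir, lang="chi_sim"):
    """Collect readable problem descriptions; an empty list means nothing was found missing."""
    problems = []
    if find_tesseract() is None:
        problems.append("tesseract is not on PATH")
    if not has_language(tessdata_dir, lang):
        problems.append(f"{lang}.traineddata not found in {tessdata_dir}")
    return problems
```

For the installation path used above, printing check_setup(r'C:\OCR\Tesseract-OCR\tessdata') shows whether anything is missing.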

Installing Tesseract does not by itself make it usable from Python. To call it from Python, install the pytesseract package:

pip install pytesseract

Find a captcha image (name it test.jpg) and place it in the same directory as the current Python file.

Open the captcha image with the open method of the Image module in PIL, then call pytesseract.image_to_string to recognize the text in the image and convert it to a string, as shown in the code below.

import pytesseract
from PIL import Image

pic = Image.open('test.jpg')
# pic is the opened image; lang specifies the language data used for recognition
text = pytesseract.image_to_string(pic, lang='chi_sim')
print(text)

Captchas with interference lines usually cannot be recognized correctly this way.
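One common way to give OCR a better chance on noisy captchas is to convert the image to grayscale and binarize it before recognition. The following is a minimal sketch of that idea and is not part of the original walkthrough. The threshold of 140 and the helper names are assumptions to tune per captcha style. Thresholding only removes noise that is lighter than the characters, so dark interference lines may still survive.

```python
def threshold_table(threshold=140):
    """Lookup table for Image.point(): values below threshold become black, the rest white."""
    return [0 if value < threshold else 255 for value in range(256)]


def recognize_binarized(path, lang="chi_sim", threshold=140):
    """Grayscale and binarize the image, then run Tesseract on the result."""
    # Imported here so the table helper above works without these packages installed.
    import pytesseract
    from PIL import Image

    gray = Image.open(path).convert("L")             # 8-bit grayscale
    binary = gray.point(threshold_table(threshold))  # only pure black and white remain
    return pytesseract.image_to_string(binary, lang=lang).strip()
```

Comparing recognize_binarized('test.jpg') with the plain image_to_string result on the same file shows whether the preprocessing helps for a given captcha style.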

Next, the second approach: recognition through a third-party captcha-solving platform.

Compared with OCR, third-party captcha-solving platforms have the advantage of high recognition accuracy. There are many such platforms; a Baidu search turns up dozens of them.

Here we use Chaojiying (Super Eagle) for the demonstration.

First, register and log in on the Chaojiying website, www.chaojiying.com. On the site, find the Python development documentation and download it. After unzipping the download and opening the chaojiying.py file, you will see that the interface it provides is very simple.

The first step creates a client object with three parameters: account, password, and software ID. The account and password are the ones for the website. As for the software ID, go to the user center, open the Software ID page, and click to generate one.

The second line of code opens a captcha image and reads its contents. The third line calls the PostPic method to recognize the captcha. It takes two parameters: the captcha image content and the captcha type. For the captcha type, refer to the price list on the website and pass in the number that corresponds to your type of captcha.

Extracting the result: PostPic returns a dictionary, and the recognized captcha is stored under the pic_str key.

res = cjy.PostPic(im, codetype)  # codetype: the captcha type number taken from the price list
print(res)
data = res['pic_str']
print(data)

Tip: captcha-solving platforms generally charge a fee (about one cent per recognition).
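Because every request is billed, it is worth checking the response before using it. The sketch below relies on the pic_str key described above. The err_no and err_str fields, and the convention that 0 means success, are assumptions about the response format that should be checked against the platform's documentation. extract_captcha is a hypothetical helper, not part of chaojiying.py.

```python
def extract_captcha(res):
    """Return the recognized text from a PostPic-style response dict.

    Raises ValueError when the platform reports an error or no text came back.
    """
    # Assumption: a non-zero 'err_no' signals failure; a missing key is treated as success.
    if res.get("err_no", 0) != 0:
        raise ValueError(f"recognition failed: {res.get('err_str', 'unknown error')}")
    text = res.get("pic_str", "").strip()
    if not text:
        raise ValueError("response contained no recognized text")
    return text
```

Calling data = extract_captcha(res) in place of reading res['pic_str'] directly makes a failed recognition stop the script instead of typing an empty string into the form.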

Next, Selenium, the web automation testing framework, is used to implement a login with automatic captcha recognition. The libraries needed are selenium, Pillow, and time. The code below uses the find_element_by_xpath helper, which was removed in Selenium 4.3; on newer releases, write find_element(By.XPATH, ...) instead.

1. Install Selenium

pip install selenium

2. Install chromedriver

Download address: chromedriver.storage.googleapis.com/index.html

Download the chromedriver that matches your Chrome version and configure the environment variables.

Pillow is a library for image processing:

pip install Pillow

Step analysis

1. Locate the account and password input boxes and enter the account and password
2. Obtain the captcha image

Open the login page with Selenium:

import time
from selenium import webdriver
from PIL import Image
from chaojiying import Chaojiying_Client

# Create a browser
browser = webdriver.Chrome()

# Access the login page
url = 'www.chaojiying.com/user/mysoft…'
browser.get(url)
time.sleep(1)  # pause for one second

# Select the account and password input fields and enter the corresponding account and password
input_user = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[1]/input')
# Enter the account
input_user.send_keys('account')
input_pwd = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[2]/input')
# Enter the password
input_pwd.send_keys('password')

2. Obtain the captcha image

# Take a screenshot of the current page
browser.save_screenshot('login.png')

Select the image element and work out the left, top, right, and bottom positions of the region to cut out of the screenshot:

# Select the captcha image element
yzm_btn = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/div/img')
# Get the position of the image element
loc = yzm_btn.location
# Get the width and height of the image
size = yzm_btn.size
# Get the position of the captcha in the screenshot
left = loc['x'] * 1.25
top = loc['y'] * 1.25
right = (loc['x'] + size['width']) * 1.25
bottom = (loc['y'] + size['height']) * 1.25
# Put the boundary values into a tuple (note the order: upper-left corner, then lower-right corner)
local = (left, top, right, bottom)

# Open the page screenshot, crop out the captcha, and save it
login_pic = Image.open('login.png')
yzm_pic = login_pic.crop(local)  # crop() returns a new image
yzm_pic.save('yzm.png')
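The box calculation can be wrapped in a small function. In this sketch the scale argument stands for the ratio between screenshot pixels and the page coordinates reported by Selenium. That the factor 1.25 corresponds to a display scaled to 125% is an assumption, and captcha_box is a hypothetical helper name.

```python
def captcha_box(location, size, scale=1.25):
    """Return (left, top, right, bottom) in screenshot pixels, ready for Image.crop().

    location and size are the dicts returned by a WebElement's .location
    and .size attributes; scale converts page coordinates to screenshot
    pixels (use 1.0 when the display is not scaled).
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    bottom = (location["y"] + size["height"]) * scale
    # Round to whole pixels so the crop box is unambiguous.
    return tuple(round(v) for v in (left, top, right, bottom))
```

For example, Image.open('login.png').crop(captcha_box(yzm_btn.location, yzm_btn.size)) cuts out the same region as the step-by-step code, and passing scale=1.0 covers machines without display scaling.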

3. Call the third-party interface to recognize the captcha

# Recognize the captcha
cjy = Chaojiying_Client('account', 'password', 'software ID')  # generate a software ID under User Center >> Software ID
im = open('yzm.png', 'rb').read()  # path of the local image file (replaces a.jpg from the demo)
res = cjy.PostPic(im, codetype)  # codetype: the captcha type number taken from the price list
data = res['pic_str']
print(data)

4. Enter the captcha

# Enter the captcha in the input box
yzm_input = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[3]/input')
yzm_input.send_keys(data)

5. Click "log in"

# Click login
submit = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[4]/input')
submit.click()

The complete code:

import time
from selenium import webdriver
from PIL import Image
from chaojiying import Chaojiying_Client

# Create a browser
browser = webdriver.Chrome()

# Access the login page
url = 'www.chaojiying.com/user/mysoft…'
browser.get(url)
time.sleep(1)  # pause for one second

# Select the account and password input fields and enter the corresponding account and password
input_user = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[1]/input')
input_user.send_keys('qq121292679')
input_pwd = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[2]/input')
input_pwd.send_keys('546245426')

# Obtain the captcha image, recognize it, and enter the result in the captcha input box
# Take a screenshot of the current page
browser.save_screenshot('login.png')
# Select the captcha image element
yzm_btn = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/div/img')
# Get the position of the image element
loc = yzm_btn.location
# Get the width and height of the image
size = yzm_btn.size
# Get the position of the captcha in the screenshot
left = loc['x'] * 1.25
top = loc['y'] * 1.25
right = (loc['x'] + size['width']) * 1.25
bottom = (loc['y'] + size['height']) * 1.25
val = (left, top, right, bottom)
# Open the page screenshot
login_pic = Image.open('login.png')
# Crop out the captcha using the left, top, right, and bottom values
yzm_pic = login_pic.crop(val)
yzm_pic.save('yzm.png')

# Recognize the captcha
cjy = Chaojiying_Client('qq121292679', '546245426', '96001')  # generate a software ID under User Center >> Software ID and replace 96001
im = open('yzm.png', 'rb').read()  # path of the local image file (replaces a.jpg from the demo)
res = cjy.PostPic(im, codetype)  # codetype: the captcha type number taken from the price list
print(res)
data = res['pic_str']
print(data)

# Enter the captcha in the input box
yzm_input = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[3]/input')
yzm_input.send_keys(data)

# Click login
submit = browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div[1]/form/p[4]/input')
submit.click()

If you found this useful, remember to bookmark it!