This is the 13th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

Getting a dynamic image captcha with Selenium

Most of us have read an article or two about image captchas.

In UI automation, image verification codes come up all the time.

When the developers don't give us a universal captcha, or when we test a third-party website such as Zhihu, we have to recognize the captcha ourselves.

OCR

OCR (optical character recognition) is a technique for recognizing text in images. Take the captcha in the picture: our eyes can tell at a glance that it says C5S3, but a machine is not as good as our eyes. So we use OCR to let our Python script read the text from the image automatically.

Common recognition libraries

There are plenty of recognition libraries for Python. I'll only cover the one that has given me a good success rate in practice: Baidu OCR.

Put simply, Baidu provides an SDK: we pass in image data and get the recognition result back, without having to worry about the details of OCR.

Applying for OCR access

First of all you need a Baidu account. Most people already have one; if you don't, register one.

  • Log in to the Baidu console

    Go to login.bce.baidu.com/ and log in.

  • Select Text Recognition

  • Create an application

  • Enter application information

Once the application is created, you can see its details. Note down these three key values; we'll need them later.

  • appid
  • apikey
  • secret key

Getting familiar with the OCR documentation

The official documentation: cloud.baidu.com/doc/OCR/s/w…

The documentation explains things quite clearly. In short, you use your appid, API key and secret key to create a client, then call the client's API to get the text in an image. The official SDK is very pleasant to use.

  • Installing the SDK
pip install baidu-aip
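
Once the SDK is installed, the three values from the console are all the client needs. Here is a minimal sketch, assuming the credentials are kept in environment variables. The variable names (BAIDU_OCR_APP_ID and friends) and both helper functions are made up for this example; the SDK itself does not define them.

```python
import os


def load_baidu_credentials(env=None):
    """Return (app_id, api_key, secret_key) read from environment variables.

    The variable names below are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    names = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in names)


def make_ocr_client(env=None):
    """Build the Baidu OCR client; needs `pip install baidu-aip`."""
    from aip import AipOcr  # imported lazily so this file loads without the SDK

    app_id, api_key, secret_key = load_baidu_credentials(env)
    return AipOcr(app_id, api_key, secret_key)
```

Keeping the keys out of the script also means you can share the script without leaking them.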

That covers how to recognize text. Now for the topic in the title: the dynamic image captcha.

Dynamic image captcha

I made this term up myself. Normally, one image corresponds to one URL, for example:

yuque.com?image=dshqadiau

(I made up the address)

Normally, different values of the image parameter give different images. The value is a random or sequential non-repeating string, which guarantees that each image is distinct.

But I recently ran into a situation like this:

The URL stays the same, yet every time you request it you get a different image.

This causes a serious problem when reading the captcha from the page. If we pass the image URL to the Baidu SDK, the URL is requested again and the image changes.

For example, the website displays C5S3. When the Baidu SDK is called, Baidu fetches the image through the URL, but on that second fetch the image may have become Lfew.
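
One way to check whether a captcha URL behaves like this is to fetch it twice and compare the bytes. This is a rough sketch using only the standard library. `is_dynamic_image` and its `fetch` parameter are names invented here, and some sites may need cookies or headers that this does not send.

```python
import hashlib
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the raw bytes behind a URL."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def is_dynamic_image(url, fetch=fetch_bytes):
    """Return True if two requests to the same URL give different content."""
    first = hashlib.sha256(fetch(url)).hexdigest()
    second = hashlib.sha256(fetch(url)).hexdigest()
    return first != second
```

If this returns True for your captcha URL, handing that URL to any third party will make it see a different image than the one on your page.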

If you don’t believe me, you can take a look at this picture address:

How do you solve it?

Fortunately, the Baidu SDK supports not only URLs but also image files and base64 image data. Let's take a look at the official documentation:
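
As a side note, the base64 conversion itself needs only the standard library. This is a sketch of the conversion, not a quote from Baidu's documentation, and `to_base64` / `from_base64` are names made up here. As far as I know the Python SDK accepts raw bytes and does the encoding for you, so you would only need this when building an HTTP request by hand.

```python
import base64


def to_base64(image_bytes):
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


def from_base64(text):
    """Decode a base64 string back into the original bytes."""
    return base64.b64decode(text)
```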

Back in Selenium, how can we get the image of the captcha?

Think about it:

  1. Read the src of the img tag, download the image, save it to a file and convert it to base64

Obviously, this doesn't work. Why?

Because the src attribute of the img is that same URL, and requesting it again changes the image.

  2. Take a screenshot of the page, crop out the captcha, and hand it to Baidu for recognition

Yes, this works, but isn't it too complicated?

Wouldn't it be easier to screenshot just the captcha's img element and get the image data from that?

In fact Selenium, as a mature automated testing tool, offers more methods than we'll ever need. So yes, it has one for this!

Taking a screenshot of a specific element with Selenium

As we all know, Selenium has methods for taking screenshots of the page.

driver.get_screenshot_as_file(filename)

However, there are also screenshot methods for elements.

The pseudocode is as follows:

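# Locate the captcha image by its id
# (recent Selenium 4 releases removed find_element_by_id; use find_element with
#  By.ID, imported via: from selenium.webdriver.common.by import By)
img = driver.find_element(By.ID, "image")
# screenshot_as_png is a property that returns the element's screenshot as PNG bytes
data = img.screenshot_as_png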

Then we can take this image data and ask Baidu for the answer!
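
Before sending the bytes off, it can help to confirm that the element screenshot really is a PNG. This is a small sketch, assuming `element` is a Selenium WebElement (anything with a `screenshot_as_png` attribute works). The function name `element_png` is made up for this example.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def element_png(element):
    """Return the element's screenshot as PNG bytes, or raise if it is not a PNG."""
    data = element.screenshot_as_png
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("element screenshot is not PNG data")
    return data
```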

Full version code:

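from aip import AipOcr
from selenium import webdriver
from selenium.webdriver.common.by import By

# Create the OCR client with the three values from the Baidu console
client = AipOcr("Your appid", "Your app_key", "Your secret_key")

# Open the login page that shows the captcha
driver = webdriver.Chrome()
driver.get("https://iam.pt.ouchn.cn/am/UI/Login")

# Screenshot the captcha element itself, so the image URL is never requested again
# (recent Selenium 4 releases removed find_element_by_id; By.ID replaces it)
img = driver.find_element(By.ID, "kaptchaImage")
data = img.screenshot_as_png

# Send the PNG bytes to Baidu's general text recognition and print the result
res = client.basicGeneral(data, {})
print(res)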

As you can see, only CFX was recognized this time, but the image no longer changes between what the page shows and what gets recognized.
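
The printed result is a dictionary. Here is a sketch of pulling the captcha text out of it, assuming the response has the shape described in Baidu's documentation: a `words_result` list of `{"words": ...}` entries, or `error_code` / `error_msg` on failure. `extract_captcha_text` is a helper invented here.

```python
import re


def extract_captcha_text(res):
    """Join the recognized words and keep only letters and digits."""
    if "error_code" in res:
        raise RuntimeError(f"OCR failed: {res.get('error_msg', res['error_code'])}")
    words = "".join(item.get("words", "") for item in res.get("words_result", []))
    # Interference may come back as punctuation or spaces, so drop those
    return re.sub(r"[^0-9A-Za-z]", "", words)
```

This assumes the captcha contains only letters and digits; adjust the pattern if yours uses other characters.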

After all, text recognition means finding text in an image, and captcha text often comes with interference such as horizontal lines. So if recognition fails at first, try a few more times.

The idea is to write a while loop that recognizes the captcha and tries to log in, then checks whether the login succeeded; if it didn't, repeat the previous steps.

In my experience, it usually takes 1 to 10 attempts to succeed.
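
The retry loop could look something like this. It is only a sketch: `attempt_login` stands for whatever function of yours refreshes the captcha, recognizes it, submits the form and returns True on success, and the limit of 10 attempts is just the upper end of the range above.

```python
def login_with_retries(attempt_login, max_attempts=10):
    """Call attempt_login() until it returns True or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if attempt_login():
            return attempts
    return None
```

Capping the number of attempts keeps the script from looping forever when something other than the captcha is wrong, such as a bad password.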

OK, that's my simple walkthrough of recognizing verification codes in UI automation. The main points are recognizing the captcha text and taking a screenshot of just part of the page.