"This is day 20 of my participation in the November Writing Challenge. For details of the event, see: The Last Writing Challenge of 2021."
Preface
This post uses Python to recognize an image captcha (graphic verification code) and log in automatically. Without further ado, let's get started.
Development tools
Python version: 3.6.4
Related modules:
re module;
numpy module;
pytesseract module;
selenium module;
opencv-python (cv2) module;
Pillow (PIL) module;
and some other modules that come with Python.
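Environment setup
Install Python and add it to the PATH environment variable, then use pip to install the required modules. Note that pytesseract is only a wrapper, so the Tesseract OCR engine must be installed separately, and selenium needs a browser driver such as ChromeDriver.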
1. Grayscale processing: convert the color captcha image into a grayscale image.
```python
import cv2

# Flag 0 (cv2.IMREAD_GRAYSCALE) loads the image as a single-channel grayscale array
image = cv2.imread('1.jpeg', 0)
cv2.imwrite('1.jpg', image)
```
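For intuition about what the grayscale conversion computes, here is a minimal NumPy sketch of the standard luminance weighting (ITU-R BT.601: 0.299 R + 0.587 G + 0.114 B), which OpenCV documents for its color-to-gray conversion. It assumes an H×W×3 array in BGR channel order, which is how `cv2.imread` returns color images. The helper name `to_gray` is hypothetical, and because OpenCV uses fixed-point arithmetic internally, its output may differ from this sketch by about one gray level.

```python
import numpy as np


def to_gray(bgr: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 BGR uint8 image to an H x W grayscale uint8 image."""
    b = bgr[..., 0].astype(np.float64)
    g = bgr[..., 1].astype(np.float64)
    r = bgr[..., 2].astype(np.float64)
    # BT.601 luma weights: green contributes the most, blue the least
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest integer and keep the result in the 0..255 range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# One row of four pixels in BGR order: blue, green, red, white
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(to_gray(pixels).tolist())  # [[29, 150, 76, 255]]
```

Pure green comes out much brighter than pure blue, which is why colored captcha characters end up at quite different gray levels before thresholding.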
2. Binarization: convert the image into one that contains only black and white pixels. Here we can see that there are no interference lines, so we only need to deal with interference points (noise).
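```python
import cv2

# Read the captcha as grayscale
image = cv2.imread('1.jpeg', 0)
# Threshold 100, maximum value 255, type 1 (cv2.THRESH_BINARY_INV):
# pixels brighter than 100 become 0 (black), all others become 255 (white)
ret, image = cv2.threshold(image, 100, 255, 1)
height, width = image.shape
# Crop the image to its first 150 columns
new_image = image[0:height, 0:150]
cv2.imwrite('1.jpg', new_image)
```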
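Threshold type 1 in the call above is `cv2.THRESH_BINARY_INV`. As a sketch of what it does per pixel (not OpenCV's actual implementation), the rule can be written with NumPy; the helper name `threshold_binary_inv` is hypothetical. Note that the comparison is strict, so a pixel exactly equal to the threshold becomes white.

```python
import numpy as np


def threshold_binary_inv(gray: np.ndarray, thresh: int, maxval: int = 255) -> np.ndarray:
    """Mimic cv2.threshold(gray, thresh, maxval, cv2.THRESH_BINARY_INV) on a uint8 array."""
    # Pixels brighter than the threshold become 0; all others become maxval
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)


row = np.array([[50, 100, 101, 200]], dtype=np.uint8)
print(threshold_binary_inv(row, 100).tolist())  # [[255, 255, 0, 0]]
```

Because dark strokes fall below the threshold, they come out white on a black background, which is why the noise reduction code inverts the colors afterwards.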
3. Noise reduction: remove small black dots, that is, isolated black pixels.
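Point noise reduction works like this: for each black pixel, check the colors of its 8 neighboring pixels. If all 8 are white, the black pixel is considered noise and is turned white. For example, in a 3×3 grid (shaped like the character 田) centered on point ⑤, points ①②③④⑥⑦⑧⑨ are its 8 neighbors.
The coordinates of points ①②③ are shown in the figure below; the coordinates of points ④⑤⑥⑦⑧⑨ follow in the same way. Pixels on the border of the image have fewer neighbors: a corner pixel's region contains only 4 pixels (itself plus 3 neighbors) and a non-corner edge pixel's region contains 6, which is why the code handles those cases separately.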
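The same rule can be expressed without per-position special cases. The sketch below is an alternative swapped in for illustration, not the author's method: it assumes a binarized NumPy array with black = 0 and white = 255, pads the black mask with "not black" so border pixels automatically get their 4- and 6-pixel regions, and sums nine shifted copies of the mask. The function names are hypothetical.

```python
import numpy as np


def count_black_3x3(img: np.ndarray) -> np.ndarray:
    """For every pixel, count the black (0) pixels in its 3x3 region, itself included.

    Pixels outside the image count as white, so corner pixels effectively see a
    4-pixel region and edge pixels a 6-pixel region.
    """
    black = (img == 0).astype(np.int32)
    # Pad with zeros (= "not black") so border pixels need no special cases
    padded = np.pad(black, 1, mode='constant', constant_values=0)
    h, w = img.shape
    counts = np.zeros((h, w), dtype=np.int32)
    # Add up the 9 shifted copies of the black mask
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            counts += padded[dy:dy + h, dx:dx + w]
    return counts


def remove_isolated_points(img: np.ndarray) -> np.ndarray:
    """Turn black pixels whose 8 neighbors are all white into white pixels."""
    out = img.copy()
    # A count below 2 means the pixel itself is the only black pixel in its region
    isolated = (img == 0) & (count_black_3x3(img) < 2)
    out[isolated] = 255
    return out


img = np.full((4, 5), 255, dtype=np.uint8)
img[0, 0] = 0               # an isolated black point in a corner
img[2, 2] = img[2, 3] = 0   # two touching black pixels, part of a stroke
cleaned = remove_isolated_points(img)
print(cleaned[0, 0], cleaned[2, 2], cleaned[2, 3])  # 255 0 0
```

Since an isolated pixel has no black neighbors, turning it white cannot change any other pixel's decision, so computing all counts from the original image gives the same result as cleaning pixels one at a time in place.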
The noise reduction code
```python
import cv2
import numpy as np
from PIL import Image


def inverse_color(image, col_range):
    # Read the image; flag 0 loads it as grayscale
    image = cv2.imread(image, 0)
    # 110 = threshold, 255 = maximum value, 1 = threshold type (cv2.THRESH_BINARY_INV):
    # a pixel whose value is greater than the threshold is set to 0, otherwise to 255.
    # ret (short for "return value") is the threshold that was used
    ret, image = cv2.threshold(image, 110, 255, 1)
    # Invert the colors: the thresholding above produces white characters on a black
    # background, but we need black characters on a white background
    img = 255 - image
    # Crop the columns given by col_range and save the processed image
    height, width = img.shape
    new_image = img[0:height, col_range[0]:col_range[1]]
    cv2.imwrite('handle_one.png', new_image)
    # Reopen it with PIL so the denoising functions can use getpixel/putpixel
    image = Image.open('handle_one.png')
    return image


def clear_noise(img):
    # Image denoising: turn isolated black pixels white
    x, y = img.width, img.height
    for i in range(x):
        for j in range(y):
            # Fewer than 2 black pixels in the 3x3 region means the current pixel
            # is black but none of its neighbors are, so it is treated as noise
            if sum_9_region(img, i, j) < 2:
                # Change the pixel color to white
                img.putpixel((i, j), 255)
    img = np.array(img)
    cv2.imwrite('handle_two.png', img)
    img = Image.open('handle_two.png')
    return img


def sum_9_region(img, x, y):
    """Count the black pixels in the 3x3 (田-grid) region around (x, y), including (x, y)."""
    # Get the color value of the current pixel
    cur_pixel = img.getpixel((x, y))
    width = img.width
    height = img.height
    if cur_pixel == 255:  # White pixel: its neighborhood is not counted
        return 10
    # Each pixel is 0 (black) or 255 (white), so "region size - sum / 255"
    # is the number of black pixels in the region
    if y == 0:  # First row
        if x == 0:  # Top-left vertex: region of 4 pixels
            # The center point plus the 3 points next to it
            sum_1 = cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 4 - sum_1 / 255
        elif x == width - 1:  # Top-right vertex
            sum_2 = cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1))
            return 4 - sum_2 / 255
        else:  # Top row, not a vertex: region of 6 pixels
            sum_3 = img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 6 - sum_3 / 255
    elif y == height - 1:  # Bottom row
        if x == 0:  # Bottom-left vertex
            # The center point plus the 3 points next to it
            sum_4 = cur_pixel + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x, y - 1))
            return 4 - sum_4 / 255
        elif x == width - 1:  # Bottom-right vertex
            sum_5 = cur_pixel + img.getpixel((x, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y - 1))
            return 4 - sum_5 / 255
        else:  # Bottom row, not a vertex: region of 6 pixels
            sum_6 = cur_pixel + img.getpixel((x - 1, y)) + img.getpixel((x + 1, y)) + img.getpixel((x, y - 1)) + img.getpixel((x - 1, y - 1)) + img.getpixel((x + 1, y - 1))
            return 6 - sum_6 / 255
    else:  # y is not on the top or bottom edge
        if x == 0:  # Left edge, not a vertex
            sum_7 = img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 6 - sum_7 / 255
        elif x == width - 1:  # Right edge, not a vertex
            sum_8 = img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x - 1, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1))
            return 6 - sum_8 / 255
        else:  # Interior pixel: full 3x3 region of 9 pixels
            sum_9 = img.getpixel((x - 1, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1)) + img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 9 - sum_9 / 255


def main():
    img = '1.jpeg'
    img = inverse_color(img, (0, 160))
    clear_noise(img)


if __name__ == '__main__':
    main()
```
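The module list includes pytesseract, but the recognition step itself is not shown here. A minimal sketch, assuming the Tesseract engine is installed and that the captcha consists of four digits (as in the 8863 example mentioned later): `--psm 7` tells Tesseract to treat the image as a single line of text, and the character whitelist restricts its output to digits. Older Tesseract versions may ignore the whitelist, so the raw output is also cleaned with `re`. The helper names `clean_ocr_text` and `recognize` are hypothetical.

```python
import re


def clean_ocr_text(text: str, length: int = 4) -> str:
    """Keep only the digits from raw OCR output and return at most `length` of them."""
    return re.sub(r'\D', '', text)[:length]


def recognize(path: str = 'handle_two.png') -> str:
    """Run Tesseract on the denoised captcha image (requires Tesseract to be installed)."""
    # Imported here so clean_ocr_text can be used without Tesseract available
    import pytesseract
    from PIL import Image

    raw = pytesseract.image_to_string(
        Image.open(path),
        # Single text line; digits only (assumes a numeric captcha)
        config='--psm 7 -c tessedit_char_whitelist=0123456789',
    )
    return clean_ocr_text(raw)


print(clean_ocr_text(' 88 63\n\x0c'))  # 8863
```

Calling `recognize()` after `main()` has written `handle_two.png` returns the recognized code as a string, ready to be typed into the login form.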
With the hardest part solved, the next step is automatic login. First, use Selenium to click the login button automatically.
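A minimal sketch of this step, assuming Selenium 4 with Chrome and ChromeDriver installed. The URL `https://example.com/login` and the locator `('id', 'login-btn')` are hypothetical placeholders, not the site used in this post; `'id'` is the string value of Selenium's `By.ID`. The click logic takes the driver as a parameter, so it can be exercised without a real browser.

```python
# Hypothetical values: replace with the real login page URL and button locator
LOGIN_URL = 'https://example.com/login'
LOGIN_BUTTON = ('id', 'login-btn')  # 'id' is what By.ID stands for in Selenium


def open_login_form(driver, url=LOGIN_URL, locator=LOGIN_BUTTON):
    """Load the page and click the login button; returns the clicked element."""
    driver.get(url)
    button = driver.find_element(*locator)
    button.click()
    return button


def start_browser():
    """Launch Chrome and open the login form (requires selenium and ChromeDriver)."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    open_login_form(driver)
    # Keep the browser open for the captcha and form-filling steps
    return driver
```

Calling `start_browser()` opens a real browser window; the returned driver is reused for grabbing the captcha.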
Finally, the captcha image is obtained successfully.
Why take a screenshot? Because the captcha image changes every time it is requested. For example, if I copy the link of the image showing 8863 and open it in a new tab, the captcha has changed: it is no longer 8863 but a different image. So getting the captcha image by requesting the captcha link found on the current page does not work.
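Taking the screenshot can be sketched with Selenium's `WebElement.screenshot(filename)`, which saves a PNG of just that element and returns `False` if the file cannot be written. The locator `('id', 'captcha-img')` and the helper name `save_captcha` are hypothetical; the saved file can then be passed to the processing code above in place of `'1.jpeg'`.

```python
CAPTCHA_IMAGE = ('id', 'captcha-img')  # hypothetical locator for the captcha <img>


def save_captcha(driver, path='1.png', locator=CAPTCHA_IMAGE):
    """Save a screenshot of the captcha element currently displayed on the page."""
    element = driver.find_element(*locator)
    # WebElement.screenshot writes a PNG of only this element and returns True on success
    if not element.screenshot(path):
        raise RuntimeError('could not save the captcha screenshot')
    return path
```

Because the screenshot is taken from the page that is already loaded, it shows the same code the server expects for this session.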
After some research, I learned that this problem can be solved by requesting the captcha link together with the page's cookies. However, I could not get the required library to import, so I gave up on that approach for now. We'll solve it next time, when we tackle captchas with machine learning.
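For reference, the cookie-based approach can be sketched with only the standard library: copy the browser session's cookies from Selenium (`driver.get_cookies()` returns a list of dicts with `name` and `value` keys) into a `Cookie` header, then request the captcha URL with it. This is untested against any real site and assumes the server checks the captcha against the session identified by those cookies; the captcha URL and the helper names are hypothetical.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return '; '.join('{}={}'.format(c['name'], c['value']) for c in cookies)


def fetch_captcha(driver, captcha_url, path='captcha.png'):
    """Download the captcha image using the browser's current session cookies."""
    request = urllib.request.Request(
        captcha_url,  # hypothetical: the captcha image link found on the login page
        headers={'Cookie': cookie_header(driver.get_cookies())},
    )
    with urllib.request.urlopen(request) as response, open(path, 'wb') as f:
        f.write(response.read())
    return path


print(cookie_header([{'name': 'JSESSIONID', 'value': 'abc'}, {'name': 'lang', 'value': 'en'}]))
# JSESSIONID=abc; lang=en
```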