Crawler is always confronted with a variety of anti-crawler restrictions, the first line of defense of anti-crawler often appears in the login, in order to limit the automatic login of crawler, all the people try their best, the so-called “one foot higher than the devil is one foot higher”.
Today I will share a simple example of how to handle the captcha of sliding images.
This kind of login verification, where you drag the slider to the gap in the image, is common on many websites or apps because it is user-friendly and easy to identify. It also intercepts most of the primary crawlers.
As a Python crawler, how do you properly automate this validation process?
First, the core problem is how to find the location of the target gap. Once we know the location, we can use a tool like Selenium to do the dragging.
We can use OpencV to solve this problem. The main steps are:
What is OpencV?
OpenCV (Open Source Computer Vision Library) is an Open Source Computer Vision Library, the main algorithm involves image processing, Computer Vision and machine learning related methods, can be used to develop real-time image processing, Computer Vision and pattern recognition programs.
Installed directly
pip install opencv-python
Copy the code
Firstly, the image is processed with Gaussian blur. The main function of Gaussian blur is to reduce the noise of the image, which is used in the pretreatment stage.
Import cv2 as CV image = CV. Imread (image_path) = CV.GaussianBlur(image, (5,5), 0) CV. Imshow ("blurred")Copy the code
The effect after treatment
Canny edge detection is then used to obtain a binary image containing a “narrow boundary”. A binary image is a black and white image, just black and white.
Canny = CV. Canny (blurred,200,400) CV. Imshow ("canny", canny)Copy the code
Contour detection
contours, hierarchy = cv.findContours(canny, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE) for i, contour in enumerate(contours): Rectangle (image, (x, y), (x + w, y + h), (0, 0, 255); 2) cv.imshow('image', image)Copy the code
Find out all the contours and mark them with a red wireframe. There are dozens of contours, big and small
The rest of the problem is easy to solve. We only need to limit the area or perimeter range of the contour to filter out the position of the target contour, provided that we determine the size of the target contour in advance.
for i, contour in enumerate(contours): If 6000 < CV. ContourArea (contour) <= 8000 and 300 < CV. ArcLength (contour, True) < 500: Rectangle (image, (x, y), (x + w, y + h), (0, 0, 255), rectangle(x + w, y + h), If x <= 200: continue return x + int(w / 2), 675Copy the code
The area of the contour is approximately between 6000 and 8000, and the circumference is between 300 and 500. Finally, the coordinate position and width and height of the contour are obtained by the outer rectangle.
Now that the target position is found, all that remains is to move the slider to the desired position. For more practical Python applications, please leave your comments and discussions in the comments section. Please be sure to retweet, like, comment and follow!