Common anti-crawler measures

Verification codes (CAPTCHA)

The site requires the visitor to type the characters shown in an image, enter a code sent to their mobile phone, or click or drag a slider to pass verification.

The solution

Use an image-recognition API to read the code, call the API of a code-solving platform, or simulate the login with manual assistance.

IP restrictions

Websites block an IP address for a short time based on how often and how many times it crawls.

The solution

Build an IP proxy pool and randomly select proxies.
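A minimal sketch of random proxy selection, using only the Python standard library. The proxy addresses are placeholders from a documentation-only IP range, and the helper names `pick_proxy`, `build_opener_with_proxy`, and `fetch` are assumptions made for illustration; a real pool would also need to check that its proxies are alive.

```python
import random
import urllib.request

# Hypothetical proxy addresses (documentation-only IP range); replace with live proxies.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:3128",
]


def pick_proxy(pool=PROXY_POOL):
    """Return one proxy URL chosen uniformly at random from the pool."""
    return random.choice(pool)


def build_opener_with_proxy(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Fetch url through a randomly selected proxy and return the body as bytes."""
    opener = build_opener_with_proxy(pick_proxy())
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Choosing a new proxy on every call to `fetch` spreads requests across the pool, so no single IP address accumulates a high request count.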

UA restrictions

Websites restrict access based on the browser identifier (the User-Agent header) sent with each request.

The solution

Build a UA pool, pick a random UA for each request, and wait a random interval between requests.
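The idea can be sketched with the standard library as follows. The User-Agent strings are only examples of the usual format, and `random_headers`, `polite_request`, and the wait bounds are hypothetical names and values chosen for this sketch.

```python
import random
import time
import urllib.request

# Example User-Agent strings; a real pool would be larger and kept up to date.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers(pool=USER_AGENTS):
    """Return request headers carrying a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(pool)}


def polite_request(url, min_wait=1.0, max_wait=3.0):
    """Sleep for a random interval, then build a request with a random User-Agent."""
    time.sleep(random.uniform(min_wait, max_wait))
    return urllib.request.Request(url, headers=random_headers())
```

The returned object can be passed to `urllib.request.urlopen`. The random delay makes the request rhythm less regular, which also helps against frequency-based blocking.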

Cookie restrictions

The site validates each request against a random cookie that it generates every time the page is opened.

The solution

Construct the cookie yourself or, if the logic is complex, use Selenium to simulate the login.
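For the simple case, constructing the cookie by hand might look like the sketch below. The cookie names and values in the usage note are invented for illustration. The cookie-jar helper is an addition not mentioned above: it lets urllib store whatever cookie the server issues on the first visit and send it back on later requests. The Selenium route is not shown here.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_cookie_header(cookies):
    """Join name/value pairs into the value of a Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def request_with_cookies(url, cookies):
    """Build a request that carries the given cookies."""
    return urllib.request.Request(url, headers={"Cookie": build_cookie_header(cookies)})


def opener_with_cookie_jar():
    """Return an opener that remembers server-issued cookies, plus its jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

For example, `request_with_cookies("http://example.com/", {"sessionid": "abc123"})` produces a request whose `Cookie` header is `sessionid=abc123`.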

Referer hotlinking prevention

Hotlink protection verifies that a request is legitimate based on key information the client sends with it. There are several kinds, such as Referer-based and timestamp-based hotlink protection. The Referer header tells the server which page the request was linked from, so a site can block crawlers by restricting which Referer values it accepts.

The solution

Forge the Referer field and its value in the request headers.
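A short sketch of setting the header with the standard library. The function name `request_with_referer` and the URLs in the usage note are placeholders; which Referer value a given site accepts has to be found by inspecting the requests a real browser sends.

```python
import urllib.request


def request_with_referer(url, referer):
    """Build a request that claims to have been linked from the referer page."""
    return urllib.request.Request(url, headers={"Referer": referer})
```

For an image protected against hotlinking, this might be called as `request_with_referer("http://example.com/img/1.jpg", "http://example.com/gallery")`, so that the request appears to come from a page on the same site.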

HTML/JS/CSS obfuscation

The site obfuscates its HTML, JS, or CSS, for example by inserting random useless code, to make parsing harder.

The solution

Use tools to clean up the code, then parse it.
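As one illustration of the clean-up step, the sketch below uses Python's built-in `html.parser` to drop comments and the contents of `<script>` and `<style>` tags before collecting the visible text. It assumes the junk lives in those places; heavier obfuscation (for example CSS that hides or reorders characters) needs site-specific handling. The names `TextExtractor` and `clean_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style> contents and comments."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

    # Comments reach handle_comment, which is not overridden, so they are dropped.


def clean_text(html):
    """Return the visible text of an HTML fragment as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```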

Ajax dynamic loading

After the page source is loaded from its URL, JavaScript runs in the browser, loads more content, and inserts it into the page. A crawler cannot obtain that content if it has no JS engine, if it has one but cannot handle what the JS returns, or if it has one but cannot make the site detect that scripting is enabled.

The solution

Analyze the Ajax request and fetch the data it returns directly.
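Once the Ajax endpoint has been found (for example in the browser's network panel), it can often be called directly. The sketch below assumes a hypothetical JSON response shaped like `{"data": {"items": [{"title": ...}]}}`; the real URL, headers, and field names depend on the site, and the function names are made up for this example.

```python
import json
import urllib.request


def build_ajax_request(api_url):
    """Build a request that resembles a browser's XMLHttpRequest call."""
    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    }
    return urllib.request.Request(api_url, headers=headers)


def parse_items(payload):
    """Extract titles from a JSON body of the assumed shape."""
    doc = json.loads(payload)
    return [item["title"] for item in doc.get("data", {}).get("items", [])]


def fetch_items(api_url, timeout=10):
    """Call the Ajax endpoint and return the parsed titles."""
    request = build_ajax_request(api_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_items(response.read().decode("utf-8"))
```

Reading the JSON directly avoids running a JS engine at all, since the data arrives already structured.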
