Anti-crawling strategy 1: restriction by User-Agent or other request headers
Solution: Build a pool of User-Agent strings (and other header values) and rotate through it on each request
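A minimal sketch of such a pool: pick a random User-Agent for every request so no single browser fingerprint dominates the traffic. The pool contents here are just example strings; a real pool would be larger and refreshed periodically.

```python
import random

# Example pool of common desktop browser User-Agent strings
# (illustrative values only; keep a larger, up-to-date list in practice).
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def random_headers():
    """Build request headers with a User-Agent drawn at random from the pool."""
    return {
        "User-Agent": random.choice(USER_AGENT_POOL),
        "Accept-Language": "en-US,en;q=0.9",
    }
```

The returned dict can be passed straight to an HTTP client, e.g. `requests.get(url, headers=random_headers())`.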
Anti-crawling strategy 2: restriction by visitor IP address
Solution: Build an IP proxy pool and rotate proxies so requests come from many addresses
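A small sketch of a proxy pool, assuming a pre-collected list of proxy URLs (the addresses below are placeholders): it hands out proxies round-robin and lets the caller drop ones that stop working.

```python
class ProxyPool:
    """Round-robin over a list of proxy URLs, dropping proxies reported dead."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._i = 0

    def get(self):
        """Return the next proxy in rotation."""
        if not self._proxies:
            raise RuntimeError("proxy pool exhausted")
        proxy = self._proxies[self._i % len(self._proxies)]
        self._i += 1
        return proxy

    def mark_dead(self, proxy):
        """Remove a proxy that failed (timeout, ban, etc.) from the pool."""
        if proxy in self._proxies:
            self._proxies.remove(proxy)
```

With `requests`, a proxy from the pool would typically be used as `requests.get(url, proxies={"http": proxy, "https": proxy})`.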
Anti-crawling strategy 3: restriction by verification code (CAPTCHA)
Solution: Manual coding (human captcha-solving services), automatic recognition via a captcha-recognition API, or automatic recognition through machine learning
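The solutions above can be combined: try an automatic recognizer first and fall back to manual coding when it is not confident. This dispatcher is a hypothetical sketch; `recognizer` and `fallback` stand in for whatever OCR model or human-solving service is actually used.

```python
def solve_captcha(image_bytes, recognizer, fallback=None, min_confidence=0.9):
    """Try automatic recognition first; fall back to a manual-coding
    service when the recognizer returns nothing or low confidence.

    `recognizer(image_bytes)` is assumed to return (text, confidence);
    `fallback(image_bytes)` is assumed to return the solved text.
    Both are hypothetical interfaces, not a specific library's API.
    """
    text, confidence = recognizer(image_bytes)
    if text and confidence >= min_confidence:
        return text
    if fallback is not None:
        return fallback(image_bytes)
    return None
```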
Anti-crawling strategy 4: restriction through asynchronously loaded data
Solution: Capture and analyze the network traffic to find the underlying API, or render the page with PhantomJS (note: PhantomJS is no longer maintained; headless Chrome driven by Selenium is a common replacement)
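The packet-analysis route usually means watching the XHR requests in the browser's DevTools, then calling the discovered JSON endpoint directly instead of rendering the page. The endpoint and parameters below are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint discovered by inspecting XHR traffic in DevTools;
# the real path and parameter names depend on the target site.
API_BASE = "https://example.com/api/items"

def build_api_url(page, page_size=20):
    """Reproduce the request the page's JavaScript makes, so the JSON
    data can be fetched directly without executing any JS."""
    query = urlencode({"page": page, "size": page_size})
    return f"{API_BASE}?{query}"
```

The resulting URL can then be fetched with any HTTP client and parsed as JSON.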
Anti-crawling strategy 5: restriction by Cookie
Solution: Proper cookie handling — persist the cookies the site sets and replay them on subsequent requests
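In practice an HTTP session object (e.g. `requests.Session`) handles this automatically; as a minimal standard-library sketch, the snippet below parses a `Set-Cookie` header into a dict that can be replayed on later requests. The header value shown is an example.

```python
from http.cookies import SimpleCookie

def parse_set_cookie(header_value):
    """Parse a Set-Cookie header into a {name: value} dict so the cookie
    can be sent back on subsequent requests."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```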
Anti-crawling strategy 6: restriction via JavaScript (e.g., request parameters generated or encrypted by JS)
Solution: Analyze the JS and reimplement its logic, or execute the page with PhantomJS (or a maintained headless browser)
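Reimplementing the JS logic often means porting a small signing routine to Python. As a purely hypothetical example, suppose analysis shows the page signs each request as `md5(query_string + salt)`; the salt value here is a placeholder for whatever constant is recovered from the site's JS.

```python
import hashlib

# Hypothetical salt recovered by reading the site's JavaScript;
# the real value and algorithm vary per site.
SALT = "example_salt"

def sign_request(query_string, salt=SALT):
    """Recompute the request signature the page's JS would generate,
    assuming a simple md5(query + salt) scheme."""
    return hashlib.md5((query_string + salt).encode("utf-8")).hexdigest()
```

The computed signature would then be appended to the request (e.g. as a `sign=` parameter) alongside the original query.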
Of course, when running a crawler you should follow the website's robots.txt conventions and avoid degrading the site's normal operation.