Anti-crawling strategy 1: restriction by User-Agent or other request headers
Solution: Build a pool of User-Agent strings (and other header values) and rotate through it on each request
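A minimal sketch of such a pool: pick a random User-Agent for every request so no single browser fingerprint dominates the traffic. The pool contents here are just example strings; a real pool would be larger and refreshed periodically.

```python
import random

# Example pool of common desktop browser User-Agent strings
# (illustrative values only; keep a larger, up-to-date list in practice).
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def random_headers():
    """Build request headers with a User-Agent drawn at random from the pool."""
    return {
        "User-Agent": random.choice(USER_AGENT_POOL),
        "Accept-Language": "en-US,en;q=0.9",
    }
```

The returned dict can be passed straight to an HTTP client, e.g. `requests.get(url, headers=random_headers())`.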
Anti-crawling strategy 2: restriction by visitor IP address
Solution: Build an IP proxy pool and rotate proxies so requests come from many addresses
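A small sketch of a proxy pool, assuming a pre-collected list of proxy URLs (the addresses below are placeholders): it hands out proxies round-robin and lets the caller drop ones that stop working.

```python
class ProxyPool:
    """Round-robin over a list of proxy URLs, dropping proxies reported dead."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._i = 0

    def get(self):
        """Return the next proxy in rotation."""
        if not self._proxies:
            raise RuntimeError("proxy pool exhausted")
        proxy = self._proxies[self._i % len(self._proxies)]
        self._i += 1
        return proxy

    def mark_dead(self, proxy):
        """Remove a proxy that failed (timeout, ban, etc.) from the pool."""
        if proxy in self._proxies:
            self._proxies.remove(proxy)
```

With `requests`, a proxy from the pool would typically be used as `requests.get(url, proxies={"http": proxy, "https": proxy})`.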
Anti-crawling strategy 3: restriction by verification code (CAPTCHA)
Solution: Manual coding (human captcha-solving services), automatic recognition via a captcha-recognition API, or automatic recognition through machine learning
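The solutions above can be combined: try an automatic recognizer first and fall back to manual coding when it is not confident. This dispatcher is a hypothetical sketch; `recognizer` and `fallback` stand in for whatever OCR model or human-solving service is actually used.

```python
def solve_captcha(image_bytes, recognizer, fallback=None, min_confidence=0.9):
    """Try automatic recognition first; fall back to a manual-coding
    service when the recognizer returns nothing or low confidence.

    `recognizer(image_bytes)` is assumed to return (text, confidence);
    `fallback(image_bytes)` is assumed to return the solved text.
    Both are hypothetical interfaces, not a specific library's API.
    """
    text, confidence = recognizer(image_bytes)
    if text and confidence >= min_confidence:
        return text
    if fallback is not None:
        return fallback(image_bytes)
    return None
```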
Anti-crawling strategy 4: restriction through asynchronously loaded data
Solution: Capture and analyze the network traffic to find the underlying API, or render the page with PhantomJS (note: PhantomJS is no longer maintained; headless Chrome driven by Selenium is a common replacement)
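The packet-analysis route usually means watching the XHR requests in the browser's DevTools, then calling the discovered JSON endpoint directly instead of rendering the page. The endpoint and parameters below are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint discovered by inspecting XHR traffic in DevTools;
# the real path and parameter names depend on the target site.
API_BASE = "https://example.com/api/items"

def build_api_url(page, page_size=20):
    """Reproduce the request the page's JavaScript makes, so the JSON
    data can be fetched directly without executing any JS."""
    query = urlencode({"page": page, "size": page_size})
    return f"{API_BASE}?{query}"
```

The resulting URL can then be fetched with any HTTP client and parsed as JSON.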
Anti-crawling strategy 5: restriction by Cookie
Solution: Proper cookie handling — persist the cookies the site sets and replay them on subsequent requests
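In practice an HTTP session object (e.g. `requests.Session`) handles this automatically; as a minimal standard-library sketch, the snippet below parses a `Set-Cookie` header into a dict that can be replayed on later requests. The header value shown is an example.

```python
from http.cookies import SimpleCookie

def parse_set_cookie(header_value):
    """Parse a Set-Cookie header into a {name: value} dict so the cookie
    can be sent back on subsequent requests."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```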
Anti-crawling strategy 6: restriction via JavaScript (e.g., request parameters generated or encrypted by JS)
Solution: Analyze the JS and reimplement its logic, or execute the page with PhantomJS (or a maintained headless browser)
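Reimplementing the JS logic often means porting a small signing routine to Python. As a purely hypothetical example, suppose analysis shows the page signs each request as `md5(query_string + salt)`; the salt value here is a placeholder for whatever constant is recovered from the site's JS.

```python
import hashlib

# Hypothetical salt recovered by reading the site's JavaScript;
# the real value and algorithm vary per site.
SALT = "example_salt"

def sign_request(query_string, salt=SALT):
    """Recompute the request signature the page's JS would generate,
    assuming a simple md5(query + salt) scheme."""
    return hashlib.md5((query_string + salt).encode("utf-8")).hexdigest()
```

The computed signature would then be appended to the request (e.g. as a `sign=` parameter) alongside the original query.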
Of course, when running a crawler you should follow the website's robots.txt conventions and avoid degrading the site's normal operation.