Many people assume that crawling always requires proxy IPs. In fact it does not: a crawler can run without a proxy. In essence, a crawler is just another user visiting web pages, albeit a special one that does not always follow the rules. Servers generally do not welcome such visitors and use various means to detect and ban them. So let's look at the cases where a proxy IP is not needed.
Small workload
A small crawling job can sometimes be completed without proxy IPs. Crawling a few hundred articles, for example, is easily handled by a collection tool such as Locomotive (LocoySpider). Likewise, if speed is not a priority, the crawler can imitate a normal human browsing pace and crawl slowly.
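Crawling slowly at a human-like pace can be sketched as follows. This is only an illustration under stated assumptions: the function name crawl_slowly, the 2 to 6 second delay range, and the fetch_with_urllib helper are hypothetical choices, not values taken from any particular site or tool.

```python
import random
import time
import urllib.request
from typing import Callable, Iterable, List


def fetch_with_urllib(url: str) -> str:
    """Hypothetical helper: download one page with the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_slowly(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_with_urllib,
    min_delay: float = 2.0,  # assumed lower bound, in seconds
    max_delay: float = 6.0,  # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing a random interval between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # A random pause avoids the fixed, machine-like rhythm
            # that makes a crawler easy to spot.
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

Passing fetch and sleep as parameters is a design convenience: the same loop can be tried out with stand-in functions before it is pointed at a real site. For a job of a few hundred pages, the total delay stays within tens of minutes, which is often acceptable.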
Weak anti-crawler strategy
Some websites have no anti-crawler strategy at all, or only a very weak one, so a crawler can work normally without proxy IPs. Even so, it is best not to crawl too aggressively, so as not to crash the website's server.
Low access frequency
The most common anti-crawler measure is to judge how frequently you visit. An ordinary person does not load pages very quickly, so an IP that sends requests too fast gets blocked. You can of course lower your request frequency to avoid detection, but if your crawler visits at the same frequency and with the same access logic as a normal person, it loses much of its purpose.
People who write crawlers want to capture a large amount of data as quickly as possible, and often need frequent updates as well. A crawler author with integrity keeps the crawl frequency within a reasonable range to reduce pressure on the target server, rather than crawling at full speed just to show off. No anti-crawler measure is absolutely effective, so crawlers and site operators tend to maintain a subtle, tacit understanding. If you crawl a site too aggressively, expect it to push back, but that is a whole other level of discussion.
When a low frequency is not enough, the more common approach is to use proxy IPs, for example Tomato Accelerator (www.fanqieip.net/), to get past the server's anti-crawler mechanism and keep crawling at a higher frequency.
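To make the frequency check concrete, here is a minimal sketch of how a server might decide that an IP is visiting too fast. It is an assumption for illustration, not the mechanism of any real website: the class name FrequencyGuard, the sliding-window approach, and the thresholds are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class FrequencyGuard:
    """Hypothetical per-IP check: too many requests in a window means a ban."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests      # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def allow(self, ip: str, now: float) -> bool:
        """Record a request from ip at time now; return False if it is banned."""
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the time window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            # Faster than the assumed human limit: ban this IP.
            self.blocked.add(ip)
            return False
        return True
```

Under a check like this, a single IP has only two options: stay below the threshold, or be banned. Spreading requests across several proxy IPs keeps each individual IP under the limit while the overall crawl rate stays high, which is why proxies come into play once slow crawling is no longer enough.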