Euniu Cloud's crawler proxy is reached through a fixed cloud proxy service address, over which a dedicated network link is established. The proxy platform automatically manages a large IP pool and load balancing, and switches proxy IPs in real time at the millisecond level, transparently to the client. This gives enterprise cloud services stable connectivity and fast request responses while reducing the computing load on the client. Crawler clients no longer need to invest in optimizing proxy IP rotation policies, which improves overall crawling efficiency.
Using a proxy:
1. If you use the crawler proxy through a browser, set the browser's proxy server address and port and save the configuration. When you then open any URL in the browser, an authentication window appears; enter the user name and password to confirm.
2. If you use the crawler proxy in code, note that most programming languages have libraries that support this authentication mode; see the demos at www.16yun.cn/help/ss_dem… Identity authentication is performed with a user name and password, which are ultimately converted into a Proxy-Authorization header and sent together with the request. If the HTTP request method in your code does not support setting user name and password credentials, manually add a Proxy-Authorization header to every HTTP request. Its value is "Basic " followed by the Base64 encoding of the user name and password joined by a colon (username:password). If authentication fails, the system returns 401 Unauthorized or 407 Proxy Authentication Required.
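A minimal Python sketch of the manual approach, using only the standard library's urllib. PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASS are hypothetical placeholders, not real 16yun values; substitute the address, port and credentials issued for your account. proxy_auth_header builds the header value described above, and fetch attaches it to each request.

```python
import base64
import urllib.request

# Hypothetical placeholders: replace with the proxy address, port and
# credentials issued for your account.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080
PROXY_USER = "username"
PROXY_PASS = "password"


def proxy_auth_header(user: str, password: str) -> str:
    """Return 'Basic ' + Base64('user:password') for the Proxy-Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_opener(host: str, port: int) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http:// and https:// URLs via the proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str) -> bytes:
    """Fetch url through the proxy, attaching the credentials manually."""
    opener = build_opener(PROXY_HOST, PROXY_PORT)
    req = urllib.request.Request(url)
    req.add_header("Proxy-Authorization", proxy_auth_header(PROXY_USER, PROXY_PASS))
    with opener.open(req, timeout=30) as resp:
        return resp.read()
```

For example, proxy_auth_header("user", "pass") yields "Basic dXNlcjpwYXNz". urllib's ProxyHandler can also take credentials embedded in the proxy URL (http://user:pass@host:port) and then sends this header itself; the manual header is the fallback for clients that lack such an option.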
Crawler access errors:
Proxy server failed: the local DNS server is faulty. Set the DNS server to the Aliyun public DNS, 223.5.5.5.
HTTP code 407: proxy authentication required. The proxy requires user authentication; add the Proxy-Authorization header carrying the user name and password.
HTTP code 401: incorrect proxy user name or password. Check the credentials.
Check whether the user's bandwidth is sufficient and whether the target website is responding slowly.
HTTP code 429: if you receive too many 429 responses, reduce the number of threads (concurrency) or add an interval between requests (more than 300 ms is recommended).
HTTP code 504: the proxy is switching IPs or the target site is unreachable. If this occurs in large numbers, check whether the target website can be reached without the proxy. The errors may be caused by the target website's protection measures; in that case, send correct Cookie, Referer, User-Agent and similar headers.
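To tell the 407 and 401 cases apart in code, the client needs the status code the proxy returned. A minimal Python sketch, assuming the standard library's urllib: for http:// URLs the proxy's answer surfaces as urllib.error.HTTPError, while for https:// URLs a rejected CONNECT tunnel surfaces, in current CPython, as a URLError wrapping an OSError whose message reads "Tunnel connection failed: <code> <reason>". The regex relies on that CPython message text, which is an implementation detail rather than a documented API; proxy_status, explain_auth_error and AUTH_HINTS are hypothetical names.

```python
import re
import urllib.error
from typing import Optional

# Hints paraphrased from the error list above.
AUTH_HINTS = {
    401: "Incorrect proxy user name or password: check the credentials.",
    407: "Proxy authentication required: send a Proxy-Authorization header.",
}


def proxy_status(exc: Exception) -> Optional[int]:
    """Extract the proxy's HTTP status code from a urllib exception, if any."""
    # HTTPError subclasses URLError, so check it first (plain http:// URLs).
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code
    # For https:// URLs the proxy answers the CONNECT request; CPython's
    # http.client reports a rejected tunnel as an OSError whose message is
    # "Tunnel connection failed: <code> <reason>", wrapped in a URLError.
    if isinstance(exc, urllib.error.URLError):
        match = re.search(r"Tunnel connection failed: (\d{3})", str(exc.reason))
        if match:
            return int(match.group(1))
    return None


def explain_auth_error(exc: Exception) -> str:
    """Map an exception to the matching hint, or say it is not an auth error."""
    code = proxy_status(exc)
    return AUTH_HINTS.get(code, f"Not a proxy authentication error: {exc!r}")
```

A caller would wrap its request in try/except urllib.error.URLError and log explain_auth_error(err), which covers HTTPError as well because it is a subclass.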
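One way to apply the 429 advice is to cap concurrency with a thread pool and space out request start times. A minimal Python sketch; Throttle and crawl are hypothetical helpers, and fetch stands for whatever function performs a single proxied request. The default 0.35 s interval is an assumption chosen to sit above the recommended 300 ms.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class Throttle:
    """Space request start times at least min_interval seconds apart, across threads."""

    def __init__(self, min_interval: float = 0.3) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = float("-inf")

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots meanwhile.
        with self._lock:
            slot = max(time.monotonic(), self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def crawl(urls: Iterable[str], fetch: Callable[[str], T],
          max_workers: int = 4, min_interval: float = 0.35) -> List[T]:
    """Fetch urls with at most max_workers threads and min_interval spacing."""
    throttle = Throttle(min_interval)

    def task(url: str) -> T:
        throttle.wait()
        return fetch(url)

    # pool.map returns results in input order regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, urls))
```

Lowering max_workers reduces concurrency and raising min_interval reduces the request rate; both knobs correspond directly to the two remedies suggested for 429.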
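For the 504 case, a sketch that sends browser-like headers and retries a few times, assuming that many 504s are transient while the proxy switches IPs. The header values, DEFAULT_HEADERS, build_request and fetch_with_retry are hypothetical, and the exponential backoff is an assumption, not something the documentation above prescribes. opener is any object with the same open(request, timeout=...) method as urllib's OpenerDirector.

```python
import time
import urllib.error
import urllib.request
from typing import Dict, Optional

# Hypothetical example values: copy real ones from a browser session that
# can open the target site.
DEFAULT_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"),
    "Referer": "https://www.example.com/",
    "Cookie": "session=replace-me",
}


def build_request(url: str, headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Create a Request carrying the default headers, with per-call overrides."""
    req = urllib.request.Request(url)
    for name, value in {**DEFAULT_HEADERS, **(headers or {})}.items():
        req.add_header(name, value)
    return req


def fetch_with_retry(opener, url: str, retries: int = 3, backoff: float = 1.0) -> bytes:
    """Retry only on HTTP 504, waiting backoff * 2**attempt seconds between tries."""
    for attempt in range(retries + 1):
        try:
            with opener.open(build_request(url), timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # 504 may just mean the proxy is switching IPs; other codes are raised.
            if err.code != 504 or attempt == retries:
                raise
        time.sleep(backoff * 2 ** attempt)
    raise RuntimeError("unreachable: the loop always returns or raises")
```

Other status codes, including 401 and 407, are raised immediately, since retrying cannot fix bad credentials.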