Crawler developers know that routing data collection through HTTP proxies can improve both throughput and data quality. Anyone who has used HTTP proxies for crawling will also know that they come in two kinds: the traditional API extraction proxy, and the enhanced tunnel forwarding crawler proxy.
API extraction proxy:
With a traditional API extraction proxy, the crawler periodically calls a URL to fetch a list of proxy IPs. The client then has to verify that each IP is still available, update its proxy settings every time it switches IPs, and build multi-threaded or asynchronous I/O to use the proxy IPs concurrently. This is cumbersome and also costs efficiency.
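To make that client-side overhead concrete, here is a minimal C# sketch of what an API extraction proxy leaves to the crawler. It assumes a hypothetical extraction URL that returns plain text with one `ip:port` per line (real providers define their own formats). It checks every fetched proxy concurrently against a test URL and hands out the survivors round-robin. The class `ApiProxyPool` and its methods are illustrative names, not part of any vendor SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating the chores an API extraction proxy leaves to the client.
public sealed class ApiProxyPool
{
    private readonly HttpClient _api = new HttpClient();
    private List<string> _proxies = new List<string>();
    private int _next = -1;

    // Assumption: the extraction API returns plain text with one "ip:port" per line.
    public static List<string> Parse(string body) =>
        body.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Contains(":"))
            .ToList();

    // Availability check: send one request through the proxy with a short timeout.
    public static async Task<bool> IsAliveAsync(string proxy, string testUrl)
    {
        var handler = new HttpClientHandler { Proxy = new WebProxy("http://" + proxy), UseProxy = true };
        using (var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(5) })
        {
            try
            {
                using (var resp = await client.GetAsync(testUrl))
                    return resp.IsSuccessStatusCode;
            }
            catch (Exception)
            {
                return false; // timeout, refused connection, DNS failure, etc.
            }
        }
    }

    // Fetch a fresh list from the extraction URL and keep only proxies that pass the check.
    // The checks run concurrently instead of one after another.
    public async Task RefreshAsync(string apiUrl, string testUrl)
    {
        List<string> candidates = Parse(await _api.GetStringAsync(apiUrl));
        var results = await Task.WhenAll(
            candidates.Select(async p => (proxy: p, ok: await IsAliveAsync(p, testUrl))));
        Replace(results.Where(r => r.ok).Select(r => r.proxy).ToList());
    }

    // Atomically swap in a new proxy list (also handy for tests).
    public void Replace(List<string> proxies) => Volatile.Write(ref _proxies, proxies);

    // Round-robin selection that can be called from several crawler threads at once.
    public string Next()
    {
        List<string> list = Volatile.Read(ref _proxies);
        if (list.Count == 0)
            throw new InvalidOperationException("Proxy pool is empty; call RefreshAsync first.");
        uint i = (uint)Interlocked.Increment(ref _next);
        return list[(int)(i % (uint)list.Count)];
    }
}
```

In use, a crawler would call `await pool.RefreshAsync(apiUrl, "http://httpbin.org/ip")` on a timer and build `new WebProxy("http://" + pool.Next())` for each request. All of this bookkeeping is exactly what a tunnel forwarding proxy takes off the client's hands.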
Tunnel forwarding proxy:
"Yiniu Cloud crawler proxy IP" gives the crawler a dedicated network link through a fixed cloud proxy address, and the proxy platform switches the exit IP automatically, within milliseconds. This keeps the connection stable and fast, and spares crawler customers from having to develop and tune their own proxy IP rotation strategy.
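Workflow of a tunnel forwarding proxy:
The client sends its requests to the tunnel proxy server, which randomly assigns a proxy IP to each request. That proxy server forwards the request to the target website's server, and the response travels back through the tunnel proxy server to the client. On the client side, only the single tunnel address and the account credentials need to be configured, as in this C# example using HttpWebRequest: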
using System;
using System.IO;
using System.Net;
using System.Text;

class Program
{
    static void Main()
    {
        // httpbin.org/ip echoes the IP address the request arrived from
        string targetUrl = "http://httpbin.org/ip";

        // Tunnel proxy endpoint and credentials (replace with your own account details)
        string proxyHost = "http://t.16yun.cn";
        string proxyPort = "31111";
        string proxyUser = "username";
        string proxyPass = "password";

        // One fixed proxy address; the platform switches the exit IP behind it
        var proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);

        ServicePointManager.Expect100Continue = false;

        var request = (HttpWebRequest)WebRequest.Create(targetUrl);
        request.AllowAutoRedirect = true;
        request.KeepAlive = true;
        request.Method = "GET";
        request.Proxy = proxy;

        //request.Proxy.Credentials = CredentialCache.DefaultCredentials;
        request.Proxy.Credentials = new NetworkCredential(proxyUser, proxyPass);

        // Set Proxy Tunnel (optional)
        // Random ran = new Random();
        // int tunnel = ran.Next(1, 10000);
        // request.Headers.Add("Proxy-Tunnel", tunnel.ToString());

        //request.Timeout = 20000;
        //request.ServicePoint.ConnectionLimit = 512;

        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
        //request.Headers.Add("Cache-Control", "max-age=0");
        //request.Headers.Add("DNT", "1");

        // Alternative to Proxy.Credentials: send the Proxy-Authorization header manually
        //string encoded = Convert.ToBase64String(Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
        //request.Headers.Add("Proxy-Authorization", "Basic " + encoded);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            string htmlStr = sr.ReadToEnd();
            Console.WriteLine(htmlStr);
        }
    }
}
As the comparison above shows, the API extraction proxy is cumbersome to use, while the tunnel forwarding proxy is simple and quick to set up: the client configures one fixed proxy address and leaves IP switching to the platform.