A friend recently ran into a problem: when making requests through a crawler proxy, each HTTP request was not automatically assigned a different proxy IP as expected. Instead, all requests kept using the same proxy IP, which only switched to a new proxy IP after a fixed 20 seconds. What causes this? Part of the code provided by the friend is as follows:
#! -*- encoding:utf-8 -*-
import requests
import random
import time

# Target page to access
targetUrl = "http://httpbin.org/ip"
# Target HTTPS page to access
# targetUrl = "https://httpbin.org/ip"

# Proxy server (product website www.16yun.cn)
proxyHost = "t.16yun.cn"
proxyPort = "31111"

# Proxy authentication information
proxyUser = "username"
proxyPass = "password"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

# Set HTTP and HTTPS access to go through the proxy
proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

# Tunnel value sent to the proxy with every request
tunnel = random.randint(1, 10000)

headers = {
    "Connection": "keep-alive",
    "Accept-Language": "en",
    "Proxy-Tunnel": str(tunnel),
}

for i in range(100):
    resp = requests.get(targetUrl, proxies=proxies, headers=headers)
    print(resp.status_code)
    print(resp.text)
    time.sleep(0.2)
After debugging and analysis, there are two main problems with the code above:
1. The 'Connection': 'keep-alive' header needs to be turned off
Keep-alive is a convention between the client and the server. When keep-alive is enabled, the server does not close the TCP connection after returning a response, and the client likewise keeps the connection open after receiving the response; the next HTTP request then reuses the same connection. Because the TCP connection stays open, the crawler proxy's automatic IP rotation never takes effect, and the same proxy IP keeps being used. Only when that proxy IP's validity period of 20 seconds expires does the proxy forcibly close the TCP connection and switch to a new proxy IP.
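A minimal sketch of the fix for this point (the header names are taken from the snippet above; setting 'Connection' to 'close' tells the server and proxy to drop the TCP connection after each response, so the next request has to open a new connection and can be routed through a different proxy IP):

import requests

# Sketch: a single request with keep-alive disabled.
# "Connection": "close" makes the TCP connection close after the response,
# so the next request opens a new connection instead of reusing this one.
headers = {
    "Connection": "close",
    "Accept-Language": "en",
}

resp = requests.get("http://httpbin.org/ip", headers=headers)
print(resp.status_code, resp.text)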
2. The tunnel parameter is set incorrectly
The tunnel value is used to control proxy IP switching. The crawler proxy checks the Proxy-Tunnel value of each request: if the value differs from the previous request, the HTTP request is forwarded through a newly assigned random proxy IP; if the value is the same, the request is forwarded through the same proxy IP. Therefore, to ensure that every HTTP request is forwarded through a different proxy IP, tunnel = random.randint(1, 10000) should be placed inside the for loop, so that each HTTP request carries a different tunnel value.
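Putting both fixes together, a minimal corrected version of the loop might look like this (the proxy host, port, and credentials are the placeholders from the snippet above):

#! -*- encoding:utf-8 -*-
import requests
import random
import time

targetUrl = "http://httpbin.org/ip"

# Proxy server and credentials (placeholders from the snippet above)
proxyHost = "t.16yun.cn"
proxyPort = "31111"
proxyUser = "username"
proxyPass = "password"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

for i in range(100):
    # A new random tunnel value per request asks the proxy to assign a new IP.
    tunnel = random.randint(1, 10000)
    headers = {
        "Connection": "close",        # disable keep-alive so the TCP connection is not reused
        "Accept-Language": "en",
        "Proxy-Tunnel": str(tunnel),  # different value per request -> different proxy IP
    }
    resp = requests.get(targetUrl, proxies=proxies, headers=headers)
    print(resp.status_code)
    print(resp.text)
    time.sleep(0.2)

With these two changes, every request opens a fresh TCP connection and carries a new Proxy-Tunnel value, so the crawler proxy assigns a different proxy IP to each request instead of rotating only every 20 seconds.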