The problem
To recap the previous two articles in this Selenium series, "Selenium crawlers use proxies: why does the server still treat Selenium as a robot?" and "If your Selenium crawler uses a proxy without setting these parameters, the proxy is useless": correctly turning off WebRTC and setting the browser's time zone and geolocation to match the proxy goes a long way toward disguising our browser as a perfectly normal one, just as a beauty filter can turn Wang Mama into the popular nerdy Joe Bilo. But that is not enough. Risk-control systems still have ways of picking you out. So, at my fans' request, today's topic is:
How to get rid of Selenium's annoying WebDriver traces, as smoothly as Dove chocolate
Why
The reason is simple. When we start Chrome through Selenium + ChromeDriver, some properties are injected into Chrome's navigator and document objects. If the JavaScript returned by the web server checks for these properties, we will be recognized as a robot.
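To see these traces for yourself, you can read them back from the page with Selenium's execute_script. The sketch below is illustrative and rests on assumptions: it checks navigator.webdriver (which is true in a WebDriver-controlled Chrome) and, as a hedged extra, looks for document keys containing "cdc_", a prefix that ChromeDriver builds are commonly reported to inject. The names FINGERPRINT_JS, collect_fingerprint and looks_automated are hypothetical helpers, not part of Selenium.

```python
"""Minimal sketch: inspect the automation traces a detection script might check.

Assumptions (not from the article): the JS snippet, the "cdc_" key prefix and
all helper names here are illustrative only.
"""

# JavaScript run inside the page; Selenium converts the returned object to a dict.
FINGERPRINT_JS = """
return {
    webdriver: navigator.webdriver,
    cdcKeys: Object.keys(document).filter(function (k) { return k.indexOf('cdc_') !== -1; })
};
"""


def collect_fingerprint(driver):
    """Run FINGERPRINT_JS in the current page of a live Selenium driver."""
    return driver.execute_script(FINGERPRINT_JS)


def looks_automated(fingerprint):
    """Return a list of reasons the fingerprint looks like Selenium (empty if none)."""
    reasons = []
    # True under WebDriver control; False or None (undefined) otherwise.
    if fingerprint.get("webdriver"):
        reasons.append("navigator.webdriver is true")
    # Hypothetical check: ChromeDriver-style "cdc_" properties on document.
    for key in fingerprint.get("cdcKeys") or []:
        reasons.append("document has key " + key)
    return reasons


if __name__ == "__main__":
    print(looks_automated({"webdriver": True, "cdcKeys": []}))
    # prints ['navigator.webdriver is true']
```

Against a live browser you would call looks_automated(collect_fingerprint(driver)) after driver.get(...); an empty list means neither of these two checks fired, not that the browser is undetectable.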
The solution
The solution is logically simple: we remove whatever selenium adds.
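There are two broad ways to "remove" such a property: stop Chrome from adding it, which is what the Chrome flag used in this article does, or overwrite it before any page script runs. The second is sketched here as a swapped-in alternative, not the author's method: it registers a script through the Chrome DevTools Protocol command Page.addScriptToEvaluateOnNewDocument, which Selenium 4 exposes for Chromium browsers as driver.execute_cdp_cmd. The helper name hide_webdriver is hypothetical.

```python
"""Sketch: hide navigator.webdriver with a CDP script injected into every new document.

Assumes a Selenium 4 Chrome driver (execute_cdp_cmd is Chromium-specific).
`hide_webdriver` is a hypothetical helper, not part of Selenium.
"""

# Evaluated before the page's own scripts on every navigation.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""


def hide_webdriver(driver):
    """Register HIDE_WEBDRIVER_JS so Chrome runs it in each new document."""
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )
```

Call hide_webdriver(driver) right after creating the driver and before driver.get(...). Keep in mind that a redefined getter can itself be noticed by a careful detection script, so this is a patch over the trace rather than a way of never creating it.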
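Let's start with Selenium code that does not remove the WebDriver traces:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
# Hide the "Chrome is being controlled by automated test software" infobar
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
# Don't load ChromeDriver's automation extension
chrome_options.add_experimental_option("useAutomationExtension", False)
# Browser language and a desktop user agent
chrome_options.add_argument("--lang=zh-CN,zh,zh-TW,en-US,en")
chrome_options.add_argument(
    "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
)

# Selenium 4 style: chromedriver path via Service, options via options=
driver = webdriver.Chrome(service=Service("./chromedriver"), options=chrome_options)
driver.get("https://bot.sannysoft.com/")
```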
Test results screenshot:
As we can see, WebDriver is detected, which means that the server knows that you are using Selenium to access its site.
Now the same Selenium code with the WebDriver trace removed:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_argument("--lang=zh-CN,zh,zh-TW,en-US,en")
chrome_options.add_argument(
    "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
)
# This line tells Chrome to remove the WebDriver trace (navigator.webdriver)
chrome_options.add_argument("--disable-blink-features=AutomationControlled")

driver = webdriver.Chrome(service=Service("./chromedriver"), options=chrome_options)
driver.get("https://bot.sannysoft.com/")
```
Test results screenshot:
By adding the --disable-blink-features=AutomationControlled flag when starting Chrome, you stop Chrome from reporting navigator.webdriver as true, so the WebDriver check on the test page passes.
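If you reuse these settings across several scripts, one option is to collect the automation-related ones in a helper. The sketch below assumes options is a selenium.webdriver.ChromeOptions, or any object with the same add_argument and add_experimental_option methods; the function name apply_stealth_flags and its constants are hypothetical. It only bundles the options already used in the code above.

```python
"""Sketch: apply the article's anti-detection options to a ChromeOptions-like object.

`apply_stealth_flags` is a hypothetical helper; `options` is assumed to be a
selenium.webdriver.ChromeOptions (or any object with the same two methods).
"""

STEALTH_ARGUMENTS = [
    # Stops Blink from setting navigator.webdriver to true.
    "--disable-blink-features=AutomationControlled",
]

STEALTH_EXPERIMENTAL = {
    # Drops the enable-automation switch (and its "controlled by automated software" infobar).
    "excludeSwitches": ["enable-automation"],
    # Skips loading ChromeDriver's automation extension.
    "useAutomationExtension": False,
}


def apply_stealth_flags(options, extra_arguments=()):
    """Add the stealth arguments and experimental options to `options`; return it."""
    for arg in list(STEALTH_ARGUMENTS) + list(extra_arguments):
        options.add_argument(arg)
    for name, value in STEALTH_EXPERIMENTAL.items():
        options.add_experimental_option(name, value)
    return options
```

Usage would look like options = apply_stealth_flags(webdriver.ChromeOptions(), ["--lang=en-US"]); after starting the driver, driver.execute_script("return navigator.webdriver") should then come back falsy if the flag took effect.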
So is it enough to turn off WebRTC, set the browser's time zone and geolocation, and get rid of WebDriver?
No, it is not. The server can still find you, and all you can do is send your boss the following text message:
Then keep working overtime. More anti-detection techniques next time, see you then!
Image from: game