I. Introduction to Selenium

As web technology has developed, many websites have come to load their content dynamically, for example through JavaScript rendering or Ajax requests.

There are two general approaches to crawling such sites:

  • Analyze the Ajax requests and replay them to obtain the real data. This approach has been used many times in previous articles and won't be covered here.
  • Use Selenium to drive a real browser that renders the page, then capture the data the site actually returns. This approach is described in detail below.

What exactly is Selenium? Simply put, it is a tool for testing web applications.

According to the official documentation, one of Selenium's biggest advantages is that it runs directly in the browser and simulates real user behavior.

That is also its biggest drawback: it has to go through the real rendering process, so it runs slowly.

Please refer to the official documentation for other details.

II. Using Selenium

0. Preparation

  • Install Selenium
pip install selenium
  • Install the driver

When using Selenium, the driver executable for your browser must be in a directory on the system PATH. The Python installation directory is a common choice because it is usually on the PATH. Otherwise, an exception is raised.

The ChromeDriver download site is: sites.google.com/a/chromium…

Some networks can reach the official site above only through a proxy, so here is a simple alternative way to install the driver. The steps are as follows:

  1. Open Chrome and enter chrome://settings/help in the address bar to view Chrome's version information.

    For example 70.0.3538.67

  2. Remove the last part of that version number and append the rest to chromedriver.storage.googleapis.com/LATEST_RELE…

    For example chromedriver.storage.googleapis.com/LATEST_RELE…

  3. Visit that link to get the matching driver version.

    For example 70.0.3538.97

  4. Append that driver version, followed by a slash, to chromedriver.storage.googleapis.com/index.html?…

    For example chromedriver.storage.googleapis.com/index.html?…

  5. Visit that link and download the package for your platform (Linux, Mac, or Windows).

  6. Once the download completes, extract the archive and put the chromedriver executable in the Python installation directory.
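Steps 1 to 3 reduce to one rule. Drop the last dot-separated part of the Chrome version, and the matching driver version begins with what remains. The sketch below expresses that rule in plain Python and adds a PATH check. It is only an illustration. The helper names are hypothetical, and the prefix comparison is a heuristic inferred from the example versions above (70.0.3538.67 and 70.0.3538.97). `shutil.which` only reports whether an executable called `chromedriver` can be found on the PATH.

```python
import shutil


def driver_lookup_prefix(chrome_version: str) -> str:
    """Drop the last dot-separated part of a Chrome version (step 2)."""
    parts = chrome_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version string: {chrome_version!r}")
    return ".".join(parts[:-1])


def driver_matches(chrome_version: str, driver_version: str) -> bool:
    """Heuristic: the driver version starts with the browser's lookup prefix."""
    prefix = driver_lookup_prefix(chrome_version)
    return driver_version.strip().startswith(prefix + ".")


def chromedriver_on_path() -> bool:
    """True if an executable named 'chromedriver' is found on the PATH."""
    return shutil.which("chromedriver") is not None


if __name__ == "__main__":
    print(driver_lookup_prefix("70.0.3538.67"))            # 70.0.3538
    print(driver_matches("70.0.3538.67", "70.0.3538.97"))  # True
    print(chromedriver_on_path())  # depends on the local machine
```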

1. Import modules

>>> from selenium import webdriver

The webdriver module provides the browser drivers. It supports a variety of browsers, and the examples here use Chrome.

2. Open the browser

>>> browser = webdriver.Chrome()
>>> type(browser)
# <class 'selenium.webdriver.chrome.webdriver.WebDriver'>

3. Visit the page

Use the get(url) method of the WebDriver object to open the page at a given URL:

>>> browser.get('https://www.baidu.com')
>>> print(browser.current_url)  # the current_url property holds the URL of the current page
# https://www.baidu.com/
>>> print(browser.page_source)  # the page_source property holds the source code of the current page
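`page_source` holds the HTML of the page as currently rendered. It is an ordinary string, so it can be passed to any HTML parser, and that is what makes Selenium useful for crawling dynamic pages. The sketch below uses only the standard library's `html.parser`. `TitleParser` and `extract_title` are hypothetical helpers. For real scraping, a dedicated library such as lxml or BeautifulSoup would be the more usual choice.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # data may arrive in several chunks


def extract_title(page_source: str):
    """Return the stripped <title> text, or None if there is no title."""
    parser = TitleParser()
    parser.feed(page_source)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


if __name__ == "__main__":
    html = "<html><head><title> Example page </title></head><body></body></html>"
    print(extract_title(html))  # Example page
```

With a live session, it would be called as `extract_title(browser.page_source)`.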

4. Find elements

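Method one (these find_element_by_* methods were deprecated in Selenium 4 and removed in Selenium 4.3; on newer versions, use method two):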

Method                                          Description
find_element_by_id(id)                          Matches by id
find_element_by_name(name)                      Matches by name
find_element_by_class_name(name)                Matches by class name
find_element_by_tag_name(name)                  Matches by tag name
find_element_by_link_text(link_text)            Matches by link text
find_element_by_partial_link_text(link_text)    Matches by partial link text
find_element_by_css_selector(css_selector)      Matches by CSS selector
find_element_by_xpath(xpath)                    Matches by XPath

Here are several ways to match the search input box:

>>> search_bar = browser.find_element_by_id('kw')
>>> search_bar = browser.find_element_by_css_selector('#kw')
>>> search_bar = browser.find_element_by_xpath('//*[@id="kw"]')
>>> type(search_bar)
# <class 'selenium.webdriver.remote.webelement.WebElement'>

Method two:

>>> from selenium.webdriver.common.by import By
>>> element = browser.find_element(by, value)
  • The value argument is the matching expression corresponding to the matching method
  • The by argument specifies the matching method. Its possible values are listed below, and they mirror method one.
Value                 Description
By.ID                 Matches by id
By.NAME               Matches by name
By.CLASS_NAME         Matches by class name
By.TAG_NAME           Matches by tag name
By.LINK_TEXT          Matches by link text
By.PARTIAL_LINK_TEXT  Matches by partial link text
By.CSS_SELECTOR       Matches by CSS selector
By.XPATH              Matches by XPath
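Each `By` constant is just a string. For example, `By.ID` is `"id"` and `By.CSS_SELECTOR` is `"css selector"`. Each method-one name maps onto one of these strings by replacing underscores with spaces. The sketch below uses that observation to turn a method-one name into the `by` argument for method two, which can help when porting old code to Selenium versions that lack the `find_element_by_*` helpers. `legacy_to_by` is a hypothetical helper, not part of Selenium.

```python
# String values of Selenium's By constants, e.g. By.ID == "id".
BY_VALUES = {
    "ID": "id",
    "NAME": "name",
    "CLASS_NAME": "class name",
    "TAG_NAME": "tag name",
    "LINK_TEXT": "link text",
    "PARTIAL_LINK_TEXT": "partial link text",
    "CSS_SELECTOR": "css selector",
    "XPATH": "xpath",
}

LEGACY_PREFIXES = ("find_element_by_", "find_elements_by_")


def legacy_to_by(method_name: str) -> str:
    """Translate e.g. 'find_element_by_css_selector' into 'css selector'."""
    for prefix in LEGACY_PREFIXES:
        if method_name.startswith(prefix):
            strategy = method_name[len(prefix):].replace("_", " ")
            if strategy in BY_VALUES.values():
                return strategy
    raise ValueError(f"not a find_element(s)_by_* method: {method_name!r}")


if __name__ == "__main__":
    print(legacy_to_by("find_element_by_id"))                  # id
    print(legacy_to_by("find_elements_by_partial_link_text"))  # partial link text
```

So `browser.find_element_by_id('kw')` corresponds to `browser.find_element(legacy_to_by('find_element_by_id'), 'kw')`, which is the same call as `browser.find_element(By.ID, 'kw')`.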

Here are several ways to match the search (submit) button:

>>> from selenium.webdriver.common.by import By
>>> button = browser.find_element(By.ID,'su')
>>> button = browser.find_element(By.CSS_SELECTOR,'#su')
>>> button = browser.find_element(By.XPATH,'//*[@id="su"]')
>>> type(button)
# <class 'selenium.webdriver.remote.webelement.WebElement'>

Note:

Both methods return a WebElement object on success and raise NoSuchElementException on failure.

To find multiple elements, change element to elements in the method name (for example find_elements). This returns a list of all matches, which is empty if nothing matches.

5. Element interaction

Common element interactions include:

  • Get the element's text, using the text property
  • Get an attribute value
>>> button.get_attribute('type')
# 'submit'
  • Type into the input box
>>> search_bar.send_keys('Selenium')  # type text into the input box
>>> search_bar.clear()  # empty the input box
>>> search_bar.send_keys('Selenium')
>>> from selenium.webdriver.common.keys import Keys
>>> search_bar.send_keys(Keys.ENTER)  # press Enter in the input box
  • Click the submit button
>>> button.click()  # click the submit button; equivalent to search_bar.send_keys(Keys.ENTER) above

6. Perform interactive actions

Common methods for adding actions to an action chain (ActionChains):

Method                                        Description
click(on_element=None)                        Left-click the element (or the current mouse position)
double_click(on_element=None)                 Double-click the element
context_click(on_element=None)                Right-click the element
click_and_hold(on_element=None)               Press and hold the left mouse button on the element
release(on_element=None)                      Release the held mouse button
move_to_element(to_element)                   Move the mouse to the center of the element
drag_and_drop(source, target)                 Drag source and drop it onto target
key_down(value, element=None)                 Press a key; should only be used with modifier keys (Ctrl, Alt, Shift)
key_up(value, element=None)                   Release a key
send_keys(keys_to_send)                       Send keys to the currently focused element
send_keys_to_element(element, keys_to_send)   Send keys to the given element
pause(seconds)                                Pause all input for the given number of seconds
perform()                                     Perform all actions queued in the chain

The following example moves the mouse to the "next page" button and clicks it to turn the page:

>>> from selenium.webdriver.common.action_chains import ActionChains
>>> from selenium.webdriver.common.by import By
>>> target = browser.find_element(By.CLASS_NAME, 'n')  # the "next page" button
>>> ActionChains(browser).move_to_element(target).click(target).perform()
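`perform()` is needed because each ActionChains method only records an action and returns the chain, which is what lets calls be strung together. Nothing happens until `perform()` runs the recorded actions. The toy class below imitates that builder pattern in plain Python to make the behaviour visible. It is a hypothetical illustration, not Selenium's implementation, and it records actions in a list instead of sending input events.

```python
class ToyActionChain:
    """Toy imitation of the ActionChains builder pattern."""

    def __init__(self, log):
        self._log = log    # receives actions once they are "performed"
        self._queue = []   # actions recorded but not yet performed

    def move_to_element(self, element):
        self._queue.append(("move_to_element", element))
        return self        # returning self is what allows chaining

    def click(self, element=None):
        self._queue.append(("click", element))
        return self

    def perform(self):
        # A real chain would send input events to the browser here.
        self._log.extend(self._queue)


if __name__ == "__main__":
    log = []
    chain = ToyActionChain(log).move_to_element("next").click("next")
    print(log)  # []  (nothing has happened yet)
    chain.perform()
    print(log)  # [('move_to_element', 'next'), ('click', 'next')]
```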

7. Execute JavaScript

JavaScript can do a great deal on a web page. The topic is large, so I won't go into detail here.

Here's a simple example of what it can do: scrolling the page to the bottom.

>>> js = "window.scrollTo(0,document.body.scrollHeight)"
>>> browser.execute_script(js)

8. Wait

An explicit wait keeps checking a condition until it is met or the time limit runs out. If the element is not found within the specified time, a TimeoutException is raised.

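>>> from selenium.webdriver.common.by import By
>>> from selenium.webdriver.support.wait import WebDriverWait
>>> from selenium.webdriver.support import expected_conditions as EC
>>> from selenium.common.exceptions import TimeoutException
>>> wait = WebDriverWait(browser, 10)  # wait at most 10 seconds
>>> try:
...     element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'n')))
... except TimeoutException:
...     browser.quit()
...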
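Under the hood, `WebDriverWait.until` calls the condition repeatedly, every 0.5 seconds by default (set by `poll_frequency`). By default it treats `NoSuchElementException` as "not yet", and it raises `TimeoutException` when time runs out. The pure-Python sketch below mirrors that polling loop so the behaviour can be seen without a browser. It is a simplification, not Selenium's source, and `wait_until` and `WaitTimeout` are hypothetical names.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy after the timeout."""


def wait_until(condition, timeout, poll=0.5, ignored=()):
    """Call condition() every `poll` seconds until it returns something truthy.

    Returns that value, or raises WaitTimeout once `timeout` seconds have
    passed. Exceptions whose types are in `ignored` count as "not yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    attempts = iter([False, False, "element"])
    print(wait_until(lambda: next(attempts), timeout=5, poll=0.01))  # element
```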

Other common expected_conditions methods:

Condition                                           Description
title_is(title)                                     Verify that the page title equals title
title_contains(title)                               Verify that the page title contains title
presence_of_element_located(locator)                Verify that the locator element is present in the DOM
presence_of_all_elements_located(locator)           Verify that at least one locator element is present in the DOM (returns all matches)
visibility_of_element_located(locator)              Verify that the locator element is present and visible
invisibility_of_element_located(locator)            Verify that the locator element is invisible or absent
text_to_be_present_in_element(locator, text)        Verify that text is contained in the text of the locator element
text_to_be_present_in_element_value(locator, text)  Verify that text is contained in the value of the locator element
frame_to_be_available_and_switch_to_it(locator)     Verify that the locator frame is available, then switch to it
element_to_be_clickable(locator)                    Verify that the locator element is visible and enabled (clickable)
element_located_to_be_selected(locator)             Verify that the locator element is selected

9. Close the browser

Method   Description
close()  Close the current window
quit()   Quit the driver and close all associated windows

A simple example is as follows:

>>> browser.quit()