Introduction to Selenium
As web technology has developed, most websites have come to load their content dynamically, for example through JavaScript rendering or Ajax requests.
There are two general approaches to crawling such sites:
- Analyze the Ajax requests and simulate them to get the real data. This method has been used many times in previous articles and won’t be covered here.
- Use Selenium to drive a real browser, let it render the page, and capture the data the site actually returns. This approach is described in detail below.
What exactly is Selenium? Put simply, Selenium is a testing tool for web applications.
According to the official documentation, one of its biggest advantages is that it runs directly in the browser and simulates real user behavior.
This is also its biggest drawback: because the page has to be rendered for real, it is slow.
Please refer to the official documentation for other details.
Using Selenium
0. Preparation
- Install Selenium
pip install selenium
- Install the driver
Selenium needs the driver executable that matches your browser. Here it is placed in the Python installation directory, which is normally on PATH; if the driver cannot be found, an exception is raised.
The Chrome driver download site is: sites.google.com/a/chromium….
Because that site is not reachable from every network, here is another way to get the driver. The steps are as follows:
- Open Chrome and enter chrome://settings/help in the address bar to see Chrome's version number
  For example: 70.0.3538.67
- Remove the last part of that version number and append the rest to chromedriver.storage.googleapis.com/LATEST_RELE…
  For example: chromedriver.storage.googleapis.com/LATEST_RELE…
- Visit that link to get the corresponding driver version
  For example: 70.0.3538.97
- Append the driver version to chromedriver.storage.googleapis.com/index.html?… with a slash at the end
  For example: chromedriver.storage.googleapis.com/index.html?…
- Visit that link and download the package for your platform (Linux, Mac, or Windows)
- When the download finishes, extract the archive and put the driver executable in the Python installation directory
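The only computation in the steps above is trimming the Chrome version number before it is appended to the first link. A minimal sketch of that step follows; `version_prefix` is a hypothetical helper name used only for illustration, and the download links themselves are not reconstructed here.

```python
def version_prefix(chrome_version: str) -> str:
    """Drop the last dot-separated part of a Chrome version string.

    Hypothetical helper for illustration; not part of Selenium.
    """
    prefix, _, _ = chrome_version.rpartition(".")
    return prefix


# The Chrome version used as the example in the steps above
print(version_prefix("70.0.3538.67"))  # prints 70.0.3538
```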
1. Import modules
>>> from selenium import webdriver
The webdriver module is what drives the browser. It supports a variety of browsers; Chrome is used as the example here
2. Open the browser
>>> browser = webdriver.Chrome()
>>> type(browser)
# <class 'selenium.webdriver.chrome.webdriver.WebDriver'>
3. Visit the page
Use the get(url) method of the WebDriver object to open the page at the given URL
>>> browser.get('https://www.baidu.com')
>>> print(browser.current_url)  # the current_url property gets the URL of the current page
# https://www.baidu.com/
>>> print(browser.page_source)  # the page_source property gets the source code of the current page
4. Find elements
Method one (Selenium 3 style; these helpers were deprecated in Selenium 4.0 and removed in 4.3):
Method | Description |
---|---|
find_element_by_id(id) | Match by id |
find_element_by_name(name) | Match by name |
find_element_by_class_name(name) | Match by class name |
find_element_by_tag_name(name) | Match by tag name |
find_element_by_link_text(link_text) | Match by the full link text |
find_element_by_partial_link_text(link_text) | Match by part of the link text |
find_element_by_css_selector(css_selector) | Match by CSS selector |
find_element_by_xpath(xpath) | Match by XPath |
Here are several ways to match the search input box:
>>> search_bar = browser.find_element_by_id('kw')
>>> search_bar = browser.find_element_by_css_selector('#kw')
>>> search_bar = browser.find_element_by_xpath('//*[@id="kw"]')
>>> type(search_bar)
# <class 'selenium.webdriver.remote.webelement.WebElement'>
Method two:
>>> from selenium.webdriver.common.by import By
>>> element = browser.find_element(by,value)
- The value argument is the expression to match with the chosen strategy
- The by argument specifies the matching strategy; its possible values are listed below (they mirror method one)
Value | Description |
---|---|
By.ID | Match by id |
By.NAME | Match by name |
By.CLASS_NAME | Match by class name |
By.TAG_NAME | Match by tag name |
By.LINK_TEXT | Match by the full link text |
By.PARTIAL_LINK_TEXT | Match by part of the link text |
By.CSS_SELECTOR | Match by CSS selector |
By.XPATH | Match by XPath |
Here are several ways to match the search button:
>>> from selenium.webdriver.common.by import By
>>> button = browser.find_element(By.ID,'su')
>>> button = browser.find_element(By.CSS_SELECTOR,'#su')
>>> button = browser.find_element(By.XPATH,'//*[@id="su"]')
>>> type(button)
# <class 'selenium.webdriver.remote.webelement.WebElement'>
Note:
With both methods, a WebElement object is returned on success and NoSuchElementException is raised on failure
To find more than one element, change element in the method name to elements; a list of all matches is returned, and the list is empty when nothing matches
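Since a single-element lookup raises on failure while a multi-element lookup just returns a list, one way to probe for an optional element is to go through find_elements. The sketch below assumes only that the driver offers find_elements(by, value) as described above; `first_match` is a hypothetical helper name, not part of Selenium.

```python
def first_match(driver, by, value):
    """Return the first element matching the locator, or None if there is none.

    Hypothetical helper for illustration. find_elements returns an empty
    list when nothing matches, so no exception handling is needed.
    """
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None
```

With the browser object from the earlier steps, first_match(browser, By.ID, 'kw') would give the search box, and a locator that matches nothing would give None.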
5. Element interaction
Common element interactions are as follows:
- Get the element's text (use the text property)
- Get the value of an element attribute
>>> button.get_attribute('type')
# 'submit'
- Type into the input box
>>> search_bar.send_keys('Selenium')  # type text into the input box
>>> search_bar.clear()  # empty the input box
>>> search_bar.send_keys('Selenium')
>>> from selenium.webdriver.common.keys import Keys
>>> search_bar.send_keys(Keys.ENTER)  # press Enter in the input box
- Click the submit button
>>> button.click()  # click the submit button; equivalent to search_bar.send_keys(Keys.ENTER) above
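The interactions above can be strung together into one small routine. This is only a sketch: `run_search` is a hypothetical name, and it assumes it is handed the input box and button elements located in the previous section.

```python
def run_search(search_bar, button, query):
    """Replace whatever is in the input box with `query`, then submit it.

    Hypothetical helper for illustration; it uses only the element
    methods shown above (clear, send_keys, click).
    """
    search_bar.clear()           # remove any text left over from earlier input
    search_bar.send_keys(query)  # type the new query
    button.click()               # submit by clicking the button
```

Calling run_search(search_bar, button, 'Selenium') would repeat the manual steps shown above in a single call.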
6. Perform interactive actions
Common methods for adding actions to an action chain are as follows:
Method | Description |
---|---|
click(on_element=None) | Click the element with the left mouse button |
double_click(on_element=None) | Double-click the element |
context_click(on_element=None) | Right-click the element |
click_and_hold(on_element=None) | Press and hold the left mouse button on the element |
release(on_element=None) | Release the held mouse button |
move_to_element(to_element) | Move the mouse to the center of the specified element |
drag_and_drop(source, target) | Drag the source element and drop it on the target element |
key_down(value, element=None) | Press a key without releasing it; usually used only with Ctrl, Alt and Shift |
key_up(value, element=None) | Release a pressed key |
send_keys(*keys_to_send) | Send keyboard input to the currently focused element |
send_keys_to_element(element, *keys_to_send) | Send keyboard input to the specified element |
pause(seconds) | Pause all input for the specified time |
perform() | Perform all the actions stored in the action chain |
The following example moves to the next-page button and clicks it to turn the page
>>> from selenium.webdriver.common.action_chains import ActionChains
>>> target = browser.find_element_by_class_name('n')
>>> ActionChains(browser).move_to_element(target).click(target).perform()
7. Execute JavaScript
JavaScript can do almost anything on a web page, but I won’t go into details here because the topic is too large
Here is a simple example of what it can do, scrolling the page to the bottom:
>>> js = "window.scrollTo(0,document.body.scrollHeight)"
>>> browser.execute_script(js)
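On pages that load more content as you scroll, a single scroll may not reach the real bottom. One possible extension of the example is to keep scrolling until the page height stops growing. The sketch below is an assumption-laden illustration: `scroll_to_end`, `pause` and `max_rounds` are hypothetical names, and the pause length would need tuning for a real site.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing; return the final height.

    Hypothetical helper for illustration; it relies only on execute_script.
    """
    get_height = "return document.body.scrollHeight"
    height = driver.execute_script(get_height)
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == height:
            break  # nothing new was loaded, so this is the bottom
        height = new_height
    return height
```

The max_rounds limit is there so that a page that never stops growing cannot keep the loop running forever.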
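8. Wait
An explicit wait polls the page until a condition holds. If the specified element is not found within the specified time, a TimeoutException is raised.
>>> from selenium.common.exceptions import TimeoutException
>>> from selenium.webdriver.support.wait import WebDriverWait
>>> from selenium.webdriver.support import expected_conditions as EC
>>> wait = WebDriverWait(browser, 10)  # wait for up to 10 seconds
>>> try:
...     element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'n')))
... except TimeoutException:
...     browser.quit()  # give up and close the browser if the element never appears
...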
Common expected_conditions are listed as follows:
Condition | Description |
---|---|
title_is(title) | Verify that title is equal to browser.title |
title_contains(title) | Verify that title is contained in browser.title |
presence_of_element_located(locator) | Verify that the locator element is present in the DOM |
presence_of_all_elements_located(locator) | Verify that at least one element matching the locator is present in the DOM, and return all matches |
visibility_of_element_located(locator) | Verify that the locator element is present and visible |
invisibility_of_element_located(locator) | Verify that the locator element is invisible or not present in the DOM |
text_to_be_present_in_element(locator, text) | Verify that text is contained in the text of the locator element |
text_to_be_present_in_element_value(locator, text) | Verify that text is contained in the value attribute of the locator element |
frame_to_be_available_and_switch_to_it(locator) | Verify that the frame at locator is available, and switch to it |
element_to_be_clickable(locator) | Verify that the locator element is visible and enabled, so it can be clicked |
element_located_to_be_selected(locator) | Verify that the locator element is selected |
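wait.until accepts any callable that takes the driver and returns a truthy value once the condition holds, so a condition that is not in the table can be written by hand. A minimal sketch follows; `at_least_n_elements` is a hypothetical name, not something that expected_conditions provides.

```python
def at_least_n_elements(locator, n):
    """Build a wait condition that holds once `n` or more elements match.

    Hypothetical condition for illustration. `locator` is a (by, value)
    tuple, the same shape the built-in conditions take.
    """
    def condition(driver):
        matches = driver.find_elements(*locator)
        # Returning False tells the wait to keep polling
        return matches if len(matches) >= n else False

    return condition
```

With the wait object from the example above, wait.until(at_least_n_elements((By.CLASS_NAME, 'n'), 1)) would return the list of matching elements, or raise TimeoutException once the 10 seconds are up.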
9. Close the browser
Method | Description |
---|---|
close() | Close the current window |
quit() | Close all associated windows and end the session |
A simple example is as follows:
>>> browser.quit()
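If a script fails halfway through, quit() may never be reached and the browser process is left running. One common way to guard against that is a try/finally wrapper. The sketch below is an illustration under that assumption; `run_with_browser`, `make_browser` and `task` are hypothetical names.

```python
def run_with_browser(make_browser, task):
    """Create a browser, run `task(browser)`, and always quit afterwards.

    Hypothetical helper for illustration; `make_browser` is any callable
    that returns an object with a quit() method, such as webdriver.Chrome.
    """
    browser = make_browser()
    try:
        return task(browser)
    finally:
        browser.quit()  # runs even if task raised an exception
```

For example, run_with_browser(webdriver.Chrome, lambda b: b.get('https://www.baidu.com')) would open the page and then close the browser whether or not the page load succeeded.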