
Install Selenium

Based on the Windows operating system

1. Installation of Selenium library

(Python3 needs to be installed first, which will not be covered here.)

  • Run pip install selenium in CMD
  • The Selenium library can also be added to the project from within PyCharm
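
A quick way to confirm that the installation worked is to ask Python whether it can find the package. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of Selenium.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by Python."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Reports whether the selenium package is visible to this interpreter
    print("selenium installed:", is_installed("selenium"))
```

If this reports False inside PyCharm but True in CMD, the project is probably using a different interpreter (for example a virtual environment) from the one pip installed into.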

2. Download the browser driver

The WebDriver download addresses for all major browsers are listed at: docs.seleniumhq.org/download/

Browser   WebDriver download address
Firefox   github.com/mozilla/gec…
Chrome    sites.google.com/a/chromium… or chromedriver.storage.googleapis.com/index.html
IE        selenium-release.storage.googleapis.com/index.html

3. Webdriver installation directory

  • (1) Place the downloaded and decompressed driver file in the Python installation directory

(If you do not know the installation directory, run the where python command to find it.)

  • (2) Alternatively, place it in the venv -> Scripts directory of the PyCharm project

(My setup: Python 3; browser: Firefox 66.03 (the latest at the time of writing); driver: geckodriver-v0.24.0-win64.zip)
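
The directory lookup described in step 3 can also be done from Python itself. The sketch below prints the directory of the running interpreter and checks whether a driver executable is reachable through PATH. The function names interpreter_dir and locate_driver are hypothetical, and the default name "geckodriver" simply matches the Firefox driver used in this article.

```python
import os
import shutil
import sys


def interpreter_dir():
    """Directory that contains the running Python executable."""
    return os.path.dirname(os.path.abspath(sys.executable))


def locate_driver(name="geckodriver"):
    """Return the full path of the driver if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("Python directory:", interpreter_dir())
    print("geckodriver found at:", locate_driver())
```

When run from a PyCharm virtual environment on Windows, the interpreter directory is typically the venv Scripts folder, which corresponds to option (2) above.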

Simple operation of Selenium

Tiantian fund website: fund.eastmoney.com/

1. Open the Tiantian Fund website with the Firefox browser driver.

Once the page has loaded and the “Fund ranking” link appears, click it to enter the ranking page.

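from selenium import webdriver

d = webdriver.Firefox()  # Open the browser
d.get('http://fund.eastmoney.com/')  # Load the page

# Click "Fund ranking"; the link is located with an XPath expression.
# click() simulates a human mouse click.
# Note: find_element_by_xpath is the Selenium 3 API; Selenium 4 uses
# find_element(By.XPATH, ...) with By from selenium.webdriver.common.by.
d.find_element_by_xpath("//li[@class='ph']/a").click()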


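Why is the XPath "//li[@class='ph']/a"? It selects the a element that is a direct child of an li element whose class attribute equals "ph", which on this page is the “Fund ranking” link.

To learn more about XPath, refer to the basics of XPath in Python + Selenium.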
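
To see how such an expression matches, it can be tried on a small piece of markup without starting a browser. The sketch below uses the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. The markup and the href values are hypothetical and much simpler than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger.
SAMPLE = """
<ul>
  <li class="home"><a href="/">Home</a></li>
  <li class="ph"><a href="/ranking">Fund ranking</a></li>
</ul>
"""

root = ET.fromstring(SAMPLE)
# ElementTree needs a leading "." to search relative to the root element
link = root.find(".//li[@class='ph']/a")
print(link.text, link.get("href"))
```

Only the li with class "ph" passes the predicate, so the first link is skipped and the second one is returned.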

2. Enter 00 in the fund search box, select the fourth fund from the drop-down list box, and click to enter the fund details page.

import time

d.switch_to.window(d.window_handles[1])  # Switch the handle to the newly opened page
d.find_element_by_xpath("//input[@id='search-input']").clear()
d.find_element_by_xpath("//input[@id='search-input']").send_keys("00")  # send_keys("00") simulates manual input
# d.refresh()  # Refresh the page
time.sleep(3)  # Give the drop-down list time to load
# Click the fourth entry in the drop-down list
d.find_element_by_xpath("//tr[@data-submenu='590008']/td[@class='seaCol2']").click()
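
The fixed time.sleep(3) in the step above always waits three seconds, even if the drop-down list appears sooner, and fails if the list takes longer. A common alternative is to poll for a condition until a timeout expires (Selenium offers WebDriverWait for this purpose). The sketch below shows the idea with the standard library only; wait_until and its parameters are hypothetical names, not Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With the driver from this article it might be called as wait_until(lambda: d.find_elements_by_xpath("//td[@class='seaCol2']")), relying on the fact that the find_elements variant returns an empty list while nothing matches.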

3. Get the content in the red box on the fund details page and store it in a MySQL database

from bs4 import BeautifulSoup

d.switch_to.window(d.window_handles[2])  # Switch to the fund details page
# Get the page source and parse it
bs = BeautifulSoup(d.page_source, 'html.parser')
# Create a list to store the data
InfoList = []
# Get the fund code
code = bs.find("span", {"class": "ui-num"}).text
# Get the basic fund information
div = bs.find('div', {'class': 'fundInfoItem'})
InfoList.append(code)  # Add the fund code to the list
dls = div.find_all('dl')
for dl in dls:
    dds = dl.find_all('dd')
    for i, dd in enumerate(dds):
        # print(dd)
        # The dd elements look like this:
        #   dd[0]: <dd class="dataNums"><dl class="floatleft">... id="gz_gszzl">+0.15%</span></dl></dd>
        #   dd[1]: <dd><span>last 1 month: ... -4.45% ...
        #   dd[2]: <dd> ... -4.76% ...
        # dd[1] and dd[2] each hold their value in the second <span>
        if i in (1, 2):
            span = dd.find_all('span')[1].string
            InfoList.append(span)

import pymysql

# Connect to the database
conn = pymysql.connect(host='localhost', user='root', password='123456',
                       port=3306, db='mystudy', charset='utf8')
cursor = conn.cursor()  # Create a cursor
tablename = 'seleniumGetNews'
insertSql = "insert into {} values (%s,%s,%s,%s,%s,%s,%s)".format(tablename)  # MySQL statement
# Store the data
try:
    cursor.execute(insertSql, InfoList)
except pymysql.err.ProgrammingError:
    # Create the table if it does not exist, then retry the insert.
    # The column names are illustrative: one column for the fund code plus
    # six for the values collected above, in the order they were appended.
    createSql = ("create table {} ("
                 "code varchar(100) primary key, "
                 "last_1_month varchar(100), "
                 "last_1_year varchar(100), "
                 "last_3_months varchar(100), "
                 "last_3_years varchar(100), "
                 "last_6_months varchar(100), "
                 "since_inception varchar(100)"
                 ") ENGINE = InnoDB DEFAULT CHARSET = utf8;").format(tablename)
    cursor.execute(createSql)
    cursor.execute(insertSql, InfoList)
conn.commit()  # Commit the transaction
cursor.close()  # Close the cursor
conn.close()  # Close the connection

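
The storage step above inserts first and creates the table only when the insert fails. An alternative is to create the table up front with CREATE TABLE IF NOT EXISTS, which MySQL also supports, so that no exception handling is needed. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs without a database server. The function name, table name, column names and sample values are all hypothetical.

```python
import sqlite3


def save_fund_row(conn, row, table="fund_info"):
    """Insert one 7-value row, creating the table first if needed."""
    # Table names cannot be bound as parameters; `table` must be trusted input
    conn.execute(
        "CREATE TABLE IF NOT EXISTS {} ("
        "code TEXT PRIMARY KEY, m1 TEXT, y1 TEXT, m3 TEXT, "
        "y3 TEXT, m6 TEXT, since_inception TEXT)".format(table)
    )
    conn.execute(
        "INSERT INTO {} VALUES (?, ?, ?, ?, ?, ?, ?)".format(table), row
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up values, only to show the call
    save_fund_row(conn, ["000000", "1%", "2%", "3%", "4%", "5%", "6%"])
    print(conn.execute("SELECT * FROM fund_info").fetchall())
    conn.close()
```

Note that sqlite3 uses ? as its placeholder while pymysql uses %s; the overall structure stays the same when moving back to MySQL.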

4. Switch the page back to the home page (handle switch)

d.switch_to.window(d.window_handles[0])  # Switch back to the first window; window handles are discussed under "Problems encountered"
d.refresh()  # Refresh the page

Problems encountered

1. After jumping from main window A to main window B, elements in window B cannot be located

Cause: after the script starts and page B is opened from page A, the window handle (focus) still points at page A. Elements on page B therefore cannot be located, and an error is reported that the element does not exist, even though the browser has already jumped to page B.

Printing d.window_handles outputs a list in which each entry represents one open page.

Solution: switch the window handle (focus) to the current page:

d.switch_to.window(d.window_handles[1])
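
Indexing d.window_handles with a fixed position assumes that the handles come back in the order the windows were opened, which the WebDriver specification does not guarantee. A more defensive approach is to record the handles before the click and pick the one that is new afterwards. The helper below is a sketch of that idea; newest_handle is a hypothetical name and works on plain lists of handle strings.

```python
def newest_handle(handles_before, handles_after):
    """Return the handle present in handles_after but not in handles_before."""
    new = [h for h in handles_after if h not in handles_before]
    if len(new) != 1:
        raise ValueError("expected exactly one new window, found %d" % len(new))
    return new[0]
```

In the script it might be used as: before = d.window_handles, then perform the click that opens the new page, then d.switch_to.window(newest_handle(before, d.window_handles)).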