This is the 21st day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021
Experiment 3
3.1 Topic
Become proficient with Selenium: locating HTML elements, crawling Ajax-loaded web data, waiting for HTML elements, and so on.
Use Selenium together with MySQL storage to crawl stock data for the "Shanghai and Shenzhen A-shares", "Shanghai A-shares", and "Shenzhen A-shares" boards.
Candidate site: Eastmoney: quote.eastmoney.com/center/grid…
3.2 Approach
3.2.1 Sending a Request
- Load the driver

```python
from selenium import webdriver

# Path to the ChromeDriver executable
chrome_path = r"D:\Download\Dirver\chromedriver_win32\chromedriver_win32\chromedriver.exe"
browser = webdriver.Chrome(executable_path=chrome_path)
```
- Save the boards you need to crawl

```python
target = ["hs_a_board", "sh_a_board", "sz_a_board"]
target_name = {"hs_a_board": "Shanghai and Shenzhen A-shares",
               "sh_a_board": "Shanghai A-shares",
               "sz_a_board": "Shenzhen A-shares"}
```
The plan is to crawl two pages of data from each of the three boards.
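As a quick sanity check of that plan, the snippet below (a standalone sketch, no browser needed) builds the per-board URLs the same way the crawl loop does and enumerates the board/page combinations:

```python
target = ["hs_a_board", "sh_a_board", "sz_a_board"]
target_name = {"hs_a_board": "Shanghai and Shenzhen A-shares",
               "sh_a_board": "Shanghai A-shares",
               "sz_a_board": "Shenzhen A-shares"}

# URL for each board, built the same way as in the crawl loop
urls = ['http://quote.eastmoney.com/center/gridlist.html#{}'.format(k) for k in target]

# 3 boards x 2 pages = 6 scraping passes in total
passes = [(target_name[k], page) for k in target for page in range(1, 3)]
print(len(urls), len(passes))  # prints: 3 6
```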
- Send the request

```python
for k in target:
    browser.get('http://quote.eastmoney.com/center/gridlist.html#{}'.format(k))
    for i in range(1, 3):
        print("------------- page {} -------------".format(i))
        if i <= 1:
            get_data(browser, target_name[k])
            # Click the "next page" button
            browser.find_element_by_xpath('//*[@id="main-table_paginate"]/a[2]').click()
            time.sleep(2)
        else:
            get_data(browser, target_name[k])
```
The `time.sleep(2)` after clicking is essential. Without it, the next request fires so quickly that, even though the browser has flipped to the second page, you still crawl the first page's data!
3.2.2 Obtaining a Node
- When parsing the page, use an implicit wait (`implicitly_wait`) so the table has time to load before the rows are located:

```python
browser.implicitly_wait(10)
items = browser.find_elements_by_xpath('//*[@id="table_wrapper-table"]/tbody/tr')
```
Each `item` now holds the full text of one table row; split it into fields and insert them into the database:

```python
for item in items:
    try:
        info = item.text
        infos = info.split(" ")
        db.insertData([infos[0], part, infos[1], infos[2],
                       infos[4], infos[5],
                       infos[6], infos[7],
                       infos[8], infos[9],
                       infos[10], infos[11],
                       infos[12], infos[13]])
    except Exception as e:
        print(e)
```
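A side note on the split call: `split("")` with an empty separator raises `ValueError` in Python, so the separator must be an actual space. Also note the difference between `split(" ")` and `split()` with no argument, since row text sometimes contains runs of whitespace. A minimal illustration (the sample string is made up):

```python
row = "1 600000  SPD Bank 7.01"  # note the double space

print(row.split(" "))  # single-space separator keeps empty strings
print(row.split())     # no argument: runs of whitespace collapse
```

If the scraped rows ever contain consecutive spaces, `split()` is the safer choice; otherwise the field indices shift.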
3.2.3 Saving Data
- A database class that encapsulates initialization and the insert operation:

```python
import pymysql

class database:
    def __init__(self):
        self.HOSTNAME = '127.0.0.1'
        self.PORT = '3306'
        self.DATABASE = 'scrapy_homeword'
        self.USERNAME = 'root'
        self.PASSWORD = 'root'
        # Open a database connection
        self.conn = pymysql.connect(host=self.HOSTNAME, user=self.USERNAME,
                                    password=self.PASSWORD,
                                    database=self.DATABASE, charset='utf8')
        # Create a cursor object using the cursor() method
        self.cursor = self.conn.cursor()

    def insertData(self, lt):
        sql = "INSERT INTO spider_gp(serial_number, block, stock_code, stock_name, " \
              "latest_price, change_percent, change_amount, volume, turnover, " \
              "amplitude, high, low, today_open, prev_close) " \
              "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
        try:
            # Execute the statement first, then commit the transaction
            self.cursor.execute(sql, lt)
            self.conn.commit()
            print("Insert successful")
        except Exception as err:
            print("Insert failed", err)
```
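The class above targets MySQL through pymysql, which needs a running server. The same parameterized-insert pattern can be sketched with the standard library's `sqlite3` for local experimentation (an illustration only, not the article's setup; note that `sqlite3` uses `?` placeholders where pymysql uses `%s`, and the shortened table schema here is a stand-in):

```python
import sqlite3

# In-memory database stands in for the MySQL server
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE spider_gp (serial_number TEXT, block TEXT, "
               "stock_code TEXT, stock_name TEXT, latest_price TEXT)")

def insert_data(lt):
    # Same execute-then-commit order as the MySQL version
    sql = "INSERT INTO spider_gp VALUES (?, ?, ?, ?, ?)"
    try:
        cursor.execute(sql, lt)
        conn.commit()
        print("Insert successful")
    except Exception as err:
        print("Insert failed", err)

# Sample values, made up for the sketch
insert_data(["1", "Shanghai A-shares", "600000", "SPD Bank", "7.01"])
```

Passing the values as a sequence instead of formatting them into the SQL string lets the driver handle quoting, which also protects against malformed field values.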