The Mid-Autumn Festival is here, which means it's mooncake season again. Let Python help you pick your favorite mooncake flavor.
Target website: Taobao
Tools used
Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Libraries used: requests, lxml
Key learning points
- Sending a GET request
- Getting web page data
- Data extraction methods
Project approach
Taobao normally requires a login before it returns search data. You could analyze Taobao's login interface and log in from code, but here I simply copy the cookie from a logged-in browser session into the request headers to maintain the login state. (Recently Taobao has not always required a login, so you can try without it yourself.)
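A minimal sketch of the cookie approach follows. The helper names `build_headers` and `fetch` are hypothetical and not part of the original code, the cookie value is a placeholder you would copy from your own logged-in browser session, and the user-agent string is only an example.

```python
def build_headers(cookie: str) -> dict:
    """Return request headers that carry a logged-in session cookie."""
    return {
        'referer': 'https://s.taobao.com/',
        'cookie': cookie,  # copied from the browser's developer tools after logging in
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',  # example value, use your own browser's
    }


def fetch(url: str, cookie: str) -> str:
    """Send a GET request with the cookie header and return the page text."""
    import requests  # third-party library, imported lazily so build_headers has no dependencies

    response = requests.get(url, headers=build_headers(cookie), timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text
```

Keeping the headers in one helper makes it easy to swap in a fresh cookie when the old one expires.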
Decide on the keyword to search for. With the Mid-Autumn Festival approaching, I search for mooncakes here.
Taobao's pagination is controlled by the URL, so each page of products can be fetched by changing the URL.
The URL can be simplified.
Before simplification:
https://s.taobao.com/search?q=%E6%9C%88%E9%A5%BC&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20210829&ie=utf8&bcoffset=3&ntoffset=3&p4ppushleft=2%2C48&s=44
After simplification:
https://s.taobao.com/search?q={}&s={}
q is the search keyword, and s selects which page of results to retrieve. It advances in steps of 44, so successive pages use s=44, 88, 132.
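As a rough illustration of how the two parameters fit together, the simplified URL can be built with the standard library. The function name `build_search_url` and the constant `ITEMS_PER_PAGE` are hypothetical, and treating s as a multiple of 44 is an assumption based on the values used in this article.

```python
from urllib.parse import urlencode

ITEMS_PER_PAGE = 44  # step used for the s parameter in this article (assumption)


def build_search_url(keyword: str, page: int) -> str:
    """Build the simplified search URL for a keyword and a page index."""
    params = {'q': keyword, 's': page * ITEMS_PER_PAGE}
    # urlencode percent-encodes the keyword, so Chinese text is safe in the URL
    return 'https://s.taobao.com/search?' + urlencode(params)


print(build_search_url('月饼', 1))
# → https://s.taobao.com/search?q=%E6%9C%88%E9%A5%BC&s=44
```

The encoded keyword matches the `q=%E6%9C%88%E9%A5%BC` value in the full URL, which is the UTF-8 percent-encoding of 月饼 (mooncake).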
Send the network request with the requests library
Get the web page data
    key = '月饼'  # search keyword: mooncakes
    for i in range(1, 4):
        url = 'https://s.taobao.com/search?q={}&s={}'.format(key, str(i * 44))
        get_data(url)
The response is HTML, from which data can be extracted with XPath, regular expressions, PyQuery, or BeautifulSoup (bs4). Choose whichever suits your own use.
Extract the data with a regular expression
Taobao embeds the product data in the page as JSON. Once it is extracted and parsed, the values can be read by dictionary key.
Extracted data:
- Price
- Number of buyers
- Title
- Shop
- Location
    data = re.findall('"auctions":(.*?),"recommendAuctions', response.text)[0]
    for info in json.loads(data):
        item = {}
        item['url'] = info['detail_url']
        item['title'] = info['raw_title']
        item['image address'] = info['pic_url']
        item['price'] = info['view_price']
        item['location'] = info['item_loc']
        item['buy'] = info.get('view_sales')
        item['comments'] = info['comment_count']
        item['shop'] = info['nick']
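Indexing the `re.findall` result with `[0]` raises an `IndexError` when the pattern is missing, for example if a login page comes back instead of search results. A slightly more defensive sketch is shown here. The function name `extract_auctions` is hypothetical, and the sample string is an invented, heavily shortened stand-in for the real page, used only for illustration.

```python
import json
import re

# Same pattern as above, compiled once; re.S lets it span line breaks
AUCTIONS_PATTERN = re.compile(r'"auctions":(.*?),"recommendAuctions', re.S)


def extract_auctions(page_text: str) -> list:
    """Return the list of product dicts embedded in the page, or [] if absent."""
    match = AUCTIONS_PATTERN.search(page_text)
    if match is None:
        return []  # pattern not found, e.g. a login page was returned
    return json.loads(match.group(1))


# Hypothetical sample data, not real Taobao output
sample = '{"auctions":[{"raw_title":"Sample mooncake","view_price":"99.00"}],"recommendAuctions":[]}'
for info in extract_auctions(sample):
    print(info['raw_title'], info.get('view_price'))
```

Returning an empty list lets the calling loop simply do nothing for a page that contains no data.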
Finally, the data is saved to a CSV file
    def save_data(data):
        # Append one product row; the field names must match the item keys.
        with open('file.csv', 'a', newline='', encoding='utf-8') as f:
            csv_writer = csv.DictWriter(f, fieldnames=['title', 'price', 'buy', 'location', 'url', 'image address', 'comments', 'shop'])
            csv_writer.writerow(data)
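To see what `csv.DictWriter` produces without touching the disk, here is a small sketch that writes to an in-memory buffer. The function name `write_items` is hypothetical, the field names are assumed to mirror the item keys used in this article, and the sample item is invented for illustration.

```python
import csv
import io

# Assumed to match the keys of the item dictionaries
FIELDNAMES = ['title', 'price', 'buy', 'location', 'url', 'image address', 'comments', 'shop']


def write_items(stream, items):
    """Write a header row followed by one row per item dict."""
    # restval fills in any field that is missing from an item
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES, restval='')
    writer.writeheader()
    for item in items:
        writer.writerow(item)


# Hypothetical item, only to show the output format
buffer = io.StringIO()
write_items(buffer, [{'title': 'Sample mooncake', 'price': '99.00'}])
print(buffer.getvalue())
```

Note that `DictWriter` raises a `ValueError` if an item contains a key that is not listed in `fieldnames`, so the two must be kept in sync.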
Complete source code
    import csv
    import json
    import re
    import time

    import requests

    CSV_FILE = 'file.csv'
    FIELDNAMES = ['title', 'price', 'buy', 'location', 'url', 'image address', 'comments', 'shop']

    headers = {
        'referer': 'https://s.taobao.com/',
        'cookie': '',  # paste the cookie from your logged-in browser session
        # replace with your own browser's User-Agent string
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36',
    }


    def save_data(data):
        # Append one product row to the CSV file.
        with open(CSV_FILE, 'a', newline='', encoding='utf-8') as f:
            csv_writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
            csv_writer.writerow(data)


    def get_data(url):
        response = requests.get(url, headers=headers)
        print(response.text)
        # The product list is embedded in the page as JSON.
        data = re.findall('"auctions":(.*?),"recommendAuctions', response.text)[0]
        for info in json.loads(data):
            item = {}
            item['url'] = info['detail_url']
            item['title'] = info['raw_title']
            item['image address'] = info['pic_url']
            item['price'] = info['view_price']
            item['location'] = info['item_loc']
            item['buy'] = info.get('view_sales')
            item['comments'] = info['comment_count']
            item['shop'] = info['nick']
            print(item)
            save_data(item)


    if __name__ == '__main__':
        # Create the CSV file and write the header row once.
        with open(CSV_FILE, 'w', encoding='utf-8-sig', newline='') as file:
            csv_head = csv.writer(file)
            csv_head.writerow(FIELDNAMES)
        key = '月饼'  # search keyword: mooncakes
        for i in range(1, 4):
            url = 'https://s.taobao.com/search?q={}&s={}'.format(key, str(i * 44))
            get_data(url)
            time.sleep(5)  # pause between pages
With the data in hand, which mooncakes make a good gift needs no further explanation!