The Mid-Autumn Festival is almost here, which means it's mooncake season again. Let Python help you pick your favorite mooncake flavor.

Target website: Taobao

Tools used

Development tool: PyCharm

Development environment: Python 3.7, Windows 10

Libraries used: requests, lxml

Key learning points

  • Sending a GET request
  • Fetching web page data
  • Data extraction methods

Project approach

Taobao normally requires a login before it returns search results. You could try to reverse-engineer Taobao's login interface, but here I simply copy the cookie from a logged-in session into the request headers to keep the login state. (Recently Taobao has not always required a login, so you can try it without one yourself.)
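One way to keep the cookie out of the script itself is to read it from an environment variable and build the request headers from it. This is only a minimal sketch: the variable name `TAOBAO_COOKIE` and the helper `build_headers` are made up for illustration, and the user-agent value is a placeholder to replace with your own browser's.

```python
import os


def build_headers(cookie=None):
    """Return request headers that carry a logged-in session's cookie."""
    if cookie is None:
        # TAOBAO_COOKIE is a hypothetical variable name chosen for this sketch.
        cookie = os.environ.get('TAOBAO_COOKIE', '')
    return {
        'referer': 'https://s.taobao.com/',
        'cookie': cookie,
        # Placeholder: copy the User-Agent string from your own browser.
        'user-agent': 'Mozilla/5.0',
    }


headers = build_headers('example-cookie-value')
print(headers['cookie'])
```

The resulting dictionary can be passed to requests as `requests.get(url, headers=headers)`.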

Decide on the keyword to search for. With the Mid-Autumn Festival coming up, I search for mooncakes here.

Taobao's pagination is controlled by the URL, so each page of products can be fetched by changing the URL.

The URL can be simplified.

The URL before simplification:

    https://s.taobao.com/search?q=%E6%9C%88%E9%A5%BC&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20210829&ie=utf8&bcoffset=3&ntoffset=3&p4ppushleft=2%2C48&s=44

The URL after simplification:

    https://s.taobao.com/search?q={}&s={}

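q is the search keyword, and s selects the page to retrieve. It is an item offset rather than a page index, which is why the loop further down advances it by 44 for each page.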
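The two parameters can be assembled with the standard library so that the Chinese keyword is percent-encoded automatically. This is a sketch under the assumption that only `q` and `s` are needed; the helper names `build_search_url` and `page_urls` are invented for this example.

```python
from urllib.parse import urlencode

BASE_URL = 'https://s.taobao.com/search'
PAGE_STEP = 44  # the amount by which s grows from one page to the next


def build_search_url(keyword, offset):
    """Build a search URL; urlencode percent-encodes the keyword as UTF-8."""
    return BASE_URL + '?' + urlencode({'q': keyword, 's': offset})


def page_urls(keyword, pages):
    """Yield one URL per page, using the offsets 44, 88, 132, ..."""
    for i in range(1, pages + 1):
        yield build_search_url(keyword, i * PAGE_STEP)


for url in page_urls('月饼', 3):
    print(url)
```

The first URL printed is `https://s.taobao.com/search?q=%E6%9C%88%E9%A5%BC&s=44`, the same encoded keyword that appears in the long URL copied from the browser.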

Send the network requests with the requests library.

Fetching the web data

    key = '月饼'  # search keyword: "mooncake"
    for i in range(1, 4):
        # s advances by 44 for each page
        url = 'https://s.taobao.com/search?q={}&s={}'.format(key, str(i * 44))
        get_data(url)

The response is HTML. You can extract data from it with XPath, regular expressions, PyQuery, or BeautifulSoup (bs4); pick whichever suits you.

Extracting the data with regular expressions

Taobao embeds the product data in the page as JSON. Once that JSON has been extracted and parsed, each field can be read by its dictionary key.
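The idea can be tried offline on a small string before pointing it at a real page. The sketch below uses a made-up sample in place of the page source and returns an empty list when the pattern is missing, for example when something other than a results page came back. The helper name `extract_auctions` is hypothetical.

```python
import json
import re

# A tiny made-up stand-in for the page source; real pages are far larger.
SAMPLE_HTML = (
    'var data = {"auctions":'
    '[{"raw_title":"Lotus seed mooncake","view_price":"58.00","nick":"demo shop"}]'
    ',"recommendAuctions":[]};'
)

AUCTIONS_RE = re.compile(r'"auctions":(.*?),"recommendAuctions', re.S)


def extract_auctions(html):
    """Return the list of product dicts, or [] if the pattern is absent."""
    match = AUCTIONS_RE.search(html)
    if match is None:
        return []
    return json.loads(match.group(1))


for info in extract_auctions(SAMPLE_HTML):
    # After json.loads, every field is an ordinary dictionary lookup.
    print(info['raw_title'], info['view_price'], info.get('nick'))
```

Returning an empty list instead of indexing `[0]` directly avoids an `IndexError` when the page does not contain the expected JSON.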

Extracted data:

  • Price
  • Number of buyers
  • Title
  • Shop
  • Location

    # Cut the JSON array of products out of the page source
    data = re.findall('"auctions":(.*?),"recommendAuctions', response.text)[0]
    for info in json.loads(data):
        item = {}
        item['url'] = info['detail_url']
        item['title'] = info['raw_title']
        item['image_url'] = info['pic_url']
        item['price'] = info['view_price']
        item['location'] = info['item_loc']
        item['buyers'] = info.get('view_sales')  # may be missing, hence .get
        item['comments'] = info['comment_count']
        item['shop'] = info['nick']
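Because the number of buyers arrives as text, it has to be converted to a number before products can be ranked by popularity. The sketch below does that under a loudly stated assumption: that `view_sales` values look like `3000+人付款` or `1.5万+人付款` ("N+ people paid", where 万 means 10,000). Check this against the data you actually receive; the helper name `parse_sales` is invented here.

```python
import re


def parse_sales(view_sales):
    """Convert a sales string to an integer; return 0 if it cannot be parsed.

    ASSUMPTION: values look like '3000+人付款' or '1.5万+人付款'.
    """
    if not view_sales:
        return 0
    match = re.search(r'(\d+(?:\.\d+)?)(万)?', view_sales)
    if match is None:
        return 0
    number = float(match.group(1))
    if match.group(2):
        number *= 10000  # 万 is a unit of ten thousand
    return int(round(number))


# Made-up rows in the same shape as the item dictionaries.
items = [
    {'title': 'A', 'buyers': '3000+人付款'},
    {'title': 'B', 'buyers': '1.5万+人付款'},
    {'title': 'C', 'buyers': None},
]
items.sort(key=lambda item: parse_sales(item['buyers']), reverse=True)
print([item['title'] for item in items])  # ['B', 'A', 'C']
```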

Finally, save the data to a CSV file.

    def save_data(data):
        # Append one row; the with block closes the file after each write
        with open('mooncakes.csv', 'a', newline='', encoding='utf-8') as f:
            csv_writer = csv.DictWriter(f, fieldnames=['title', 'price', 'buyers', 'location', 'url', 'image_url', 'comments', 'shop'])
            csv_writer.writerow(data)
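`csv.DictWriter` can also write the header row itself and tolerate rows whose keys do not line up exactly with the columns. The following is a sketch rather than the code used in this project: the column names are illustrative, and an in-memory buffer stands in for a file opened with `newline=''`.

```python
import csv
import io

# Illustrative column names; use whatever keys your item dictionaries have.
FIELDNAMES = ['title', 'price', 'buyers', 'location', 'url', 'image_url', 'comments', 'shop']


def write_items(fileobj, items):
    """Write a header row followed by one row per item dict."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=FIELDNAMES,
        restval='',             # fill columns that an item does not have
        extrasaction='ignore',  # skip keys that are not in FIELDNAMES
    )
    writer.writeheader()
    writer.writerows(items)


buffer = io.StringIO()  # stands in for a real file object
write_items(buffer, [{'title': 'Lotus seed mooncake', 'price': '58.00', 'extra': 'x'}])
print(buffer.getvalue())
```

With `restval` and `extrasaction` set this way, a product that lacks a field or carries an unexpected one does not raise an error while the file is being written.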

Full source code

    import csv
    import json
    import re
    import time

    import requests

    CSV_FILE = 'mooncakes.csv'
    FIELDNAMES = ['title', 'price', 'buyers', 'location', 'url', 'image_url', 'comments', 'shop']

    headers = {
        'referer': 'https://s.taobao.com/',
        'cookie': '',  # paste the cookie from your own logged-in session
        # copy the full User-Agent string from your own browser
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome Safari/537.36',
    }


    def save_data(data):
        # Append one row to the CSV file
        with open(CSV_FILE, 'a', newline='', encoding='utf-8') as f:
            csv_writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
            csv_writer.writerow(data)


    def get_data(url):
        response = requests.get(url, headers=headers)
        print(response.text)
        # Cut the JSON array of products out of the page source
        data = re.findall('"auctions":(.*?),"recommendAuctions', response.text)[0]
        for info in json.loads(data):
            item = {}
            item['url'] = info['detail_url']
            item['title'] = info['raw_title']
            item['image_url'] = info['pic_url']
            item['price'] = info['view_price']
            item['location'] = info['item_loc']
            item['buyers'] = info.get('view_sales')
            item['comments'] = info['comment_count']
            item['shop'] = info['nick']
            print(item)
            save_data(item)


    if __name__ == '__main__':
        # Write the header row first and close the file before any rows are appended
        with open(CSV_FILE, 'w', encoding='utf-8-sig', newline='') as file:
            csv_head = csv.writer(file)
            csv_head.writerow(FIELDNAMES)
        key = '月饼'  # search keyword: "mooncake"
        for i in range(1, 4):
            url = 'https://s.taobao.com/search?q={}&s={}'.format(key, str(i * 44))
            get_data(url)
            time.sleep(5)  # pause between pages

With this data in hand, which mooncakes make a good gift needs no further explanation!
