I. Project Introduction
The main goal of this article is to collect Taobao product reviews and find out which features customers need, by counting which features the reviews praise, such as waterproof, large capacity, good-looking and so on.
II. Project Preparation
1. PyCharm is recommended as the Python IDE.
2. The product page to crawl is shown below:
https://detail.tmall.com/item.htm?spm=a230r.1.14.1.55a84b1721XG00&id=552918017887&ns=1&abbucket=17
3. Which libraries need to be installed?
Open PyCharm, click File > Settings, then Project: <your project name> > Project Interpreter.
Click the + sign to install the libraries needed for the project, such as requests, beautifulsoup4 and simplejson.
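If you prefer the command line to the PyCharm dialog, the same packages can be installed with pip:

pip install requests beautifulsoup4 simplejson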
III. Project Implementation
1. Import required libraries
import requests
from bs4 import BeautifulSoup as bs
import json
import csv
import re
2. Log in to Taobao.com in Google Chrome, open Developer Tools (or press F12), switch to the Network tab, and find the list_detail_rate.htm? file.
3. Define a variable to store the URL addresses: PAGE_URL = []
4. Define a link-list function that uses string concatenation to build the URL for each page of comments, as shown below.
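A minimal sketch of these two steps is given below. It assumes the review request found in the Network tab looks like the rate.tmall.com/list_detail_rate.htm URL used here (the itemId comes from the product link above; copy any other query parameters, such as sellerId, from your own capture), and the function name get_page_urls is only an illustrative choice.

# Hypothetical sketch: build the review-page URLs by string concatenation.
# The base URL is an assumption; copy the real list_detail_rate.htm request
# URL from the Network tab and keep everything except the currentPage value.
PAGE_URL = []

def get_page_urls(page_count):
    base_url = ('https://rate.tmall.com/list_detail_rate.htm'
                '?itemId=552918017887&currentPage=')
    for page in range(1, page_count + 1):
        PAGE_URL.append(base_url + str(page))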
5. Define a function to get the comment data, and define the fields that need to be extracted, such as username, comment time, color classification and comment text, as shown below.
The cookie value can be found in Developer Tools under Network: there is an item.htm?spm request whose cookie should be copied into the request headers, as shown below.
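A hedged sketch of this step is shown below. The header keys and the placeholder cookie string are assumptions; paste the cookie value you copied from the item.htm?spm request, and the function name get_comments is only illustrative.

# Hypothetical sketch: request one page of reviews with the copied cookie.
headers = {
    'user-agent': 'Mozilla/5.0',
    'cookie': 'PASTE_THE_COPIED_COOKIE_VALUE_HERE',
    'referer': 'https://detail.tmall.com/item.htm?id=552918017887',
}

def get_comments(url):
    response = requests.get(url, headers=headers)
    response.encoding = 'utf-8'  # the review text is Chinese
    return response.text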
6. Parse the contents of the JS file and write the data to the text file, as shown below.
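A sketch of the parsing step follows, assuming the response body is JSON wrapped in a jsonp(...) call. The field names (rateDetail, rateList, displayUserNick, rateDate, auctionSku, rateContent) are assumptions based on a typical Tmall review response; check them against the response you captured and adjust as needed.

# Hypothetical sketch: strip the jsonp wrapper, load the JSON, and append
# username, comment time, color classification and comment text to a text file.
def parse_and_save(js_text):
    match = re.search(r'\{.*\}', js_text, re.S)  # keep only the {...} payload
    if not match:
        return
    data = json.loads(match.group())
    with open('comments.txt', 'a', encoding='utf-8') as f:
        for rate in data['rateDetail']['rateList']:
            row = [rate['displayUserNick'], rate['rateDate'],
                   rate['auctionSku'], rate['rateContent']]
            f.write(','.join(row) + '\n')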
7. Finally, define a main function to crawl the required number of comment pages, as shown below.
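Putting the hypothetical helpers sketched above together, a main function could look like this; the page count of 10 is just an example value.

# Hypothetical sketch: crawl the required number of comment pages.
def main():
    get_page_urls(10)                 # build the list of review-page URLs
    for url in PAGE_URL:
        js_text = get_comments(url)   # fetch one page of reviews
        parse_and_save(js_text)       # extract the fields and save them

if __name__ == '__main__':
    main()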
The final result, the collected reviews written to the text file, is shown in the figure below.
IV. Summary
1. Based on a Python web crawler, this article collects Taobao product reviews, and the method is effective. However, it is recommended not to crawl too much, so as not to put pressure on the server.
2. If you need the source code for this article, reply "Taobao comment" in the backend of the official account.
Learn more about front-end, Python crawlers, big data and more at pdcfighting.com/