Collecting app data from Yingyongbao (Tencent's app store, sj.qq.com)
- Tools to prepare
- Project idea analysis
- Simple source code analysis
Tools to prepare
Data source: Yingyongbao (Tencent's app store, sj.qq.com)
Development environment: Windows 10, Python 3.7
Development tools: PyCharm, Chrome
Project idea analysis
Define the data to be collected:
- Download address of app
- Number of app downloads
- The name of the app
- The company that developed the app
Extract the category tags from the page
Get the href attribute of each <a> tag
It is used later to build the dynamic request addresses
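
The href extraction described above can be sketched on a small HTML fragment. The markup and category IDs below are made up for illustration and the real page may be structured differently. The sketch also uses the standard library's html.parser instead of lxml, so that it runs without third-party packages.

```python
from html.parser import HTMLParser

# Hypothetical fragment imitating a category list; the real page markup may differ.
SAMPLE_HTML = """
<ul data-modname="cates">
  <li><a href="?orgame=1">All</a></li>
  <li><a href="?orgame=1&amp;categoryId=101">Category A</a></li>
  <li><a href="?orgame=1&amp;categoryId=102">Category B</a></li>
</ul>
"""


class CategoryLinkParser(HTMLParser):
    """Collect the href of every <a> found inside <ul data-modname="cates">."""

    def __init__(self):
        super().__init__()
        self.in_cates = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("data-modname") == "cates":
            self.in_cates = True
        elif tag == "a" and self.in_cates and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_cates = False


def extract_category_hrefs(html_text):
    """Return the category hrefs in document order."""
    parser = CategoryLinkParser()
    parser.feed(html_text)
    return parser.hrefs


category_hrefs = extract_category_hrefs(SAMPLE_HTML)
print(category_hrefs)
```

Each returned value is a query string, which is why it can be appended directly to another address in the next step.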
Find the address from which the app data is dynamically loaded
The query string of that URL is the href value of each category tag
sj.qq.com/myapp/cate/… Append the href value to this address to form the new URL, then send the request
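
A minimal sketch of the URL concatenation, assuming the appList.htm endpoint and the pageSize and pageContext parameters that appear in the source code below. The helper name build_page_url and the sample categoryId are hypothetical.

```python
# Endpoint taken from the article's source code.
BASE_URL = "https://sj.qq.com/myapp/cate/appList.htm"


def build_page_url(category_href, page, page_size=20):
    """Return the data URL for one page of one category.

    category_href is the href taken from a category link,
    for example "?orgame=1&categoryId=101" (a made-up value).
    """
    # The source code passes page * 20 as pageContext, i.e. the start offset.
    offset = page * page_size
    return "{}{}&pageSize={}&pageContext={}".format(
        BASE_URL, category_href, page_size, offset
    )


for page in range(3):
    print(build_page_url("?orgame=1&categoryId=101", page))
```

Keeping the concatenation in one function makes it easy to check the generated addresses in the browser before sending any requests from code.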
Simple source code analysis
    import csv  # write table data

    import requests  # send network requests
    from lxml import etree  # parse the HTML into an element tree

    url = "https://sj.qq.com/myapp/category.htm?orgame=1"
    response = requests.get(url)
    html_data = etree.HTML(response.text)
    # href of every category link except the first list item
    li_list = html_data.xpath('//ul[@data-modname="cates"]/li[position()>1]/a/@href')
    del li_list[-1]  # drop the last entry as well

    for url1 in li_list:
        for i in range(10):  # at most 10 pages of 20 apps per category
            new_url = "https://sj.qq.com/myapp/cate/appList.htm" + url1 + "&pageSize=20&pageContext={}".format(i * 20)
            res = requests.get(new_url).json()
            if res["count"] == 0:  # no more data in this category
                break
            with open("app.csv", "a", newline="", encoding="utf-8") as f:
                csv_data = csv.DictWriter(f, fieldnames=["appName", "authorName", "apkUrl"])
                for info in res["obj"]:
                    appName = info["appName"]
                    authorName = info["authorName"]
                    apkUrl = info["apkUrl"]
                    print({"appName": appName, "authorName": authorName, "apkUrl": apkUrl})
                    csv_data.writerow({"appName": appName, "authorName": authorName, "apkUrl": apkUrl})
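
The paging loop can also be tried without touching the network. The sketch below is one possible variation, not the article's own code: the function name collect_category, the injected fetch_json callable and the fake responses are all assumptions made for illustration. It keeps the same stop condition (a page whose count is 0) and additionally writes a CSV header row.

```python
import csv
import io

FIELDS = ["appName", "authorName", "apkUrl"]


def collect_category(fetch_json, category_href, writer, max_pages=10, page_size=20):
    """Write the apps of one category to writer and return the number of rows.

    fetch_json is any callable that takes a URL and returns decoded JSON,
    for example: lambda url: requests.get(url, timeout=10).json()
    """
    base = "https://sj.qq.com/myapp/cate/appList.htm"
    rows = 0
    for page in range(max_pages):
        url = "{}{}&pageSize={}&pageContext={}".format(
            base, category_href, page_size, page * page_size
        )
        res = fetch_json(url)
        if res.get("count", 0) == 0:  # an empty page ends the category
            break
        for info in res.get("obj") or []:
            # Keep only the columns declared in the writer.
            writer.writerow({key: info.get(key, "") for key in writer.fieldnames})
            rows += 1
    return rows


def fake_fetch(url):
    """Stand-in for the network: two pages with one fake app each, then an empty page."""
    offset = int(url.rsplit("=", 1)[1])
    if offset >= 40:
        return {"count": 0, "obj": None}
    app = {
        "appName": "App{}".format(offset),
        "authorName": "Dev",
        "apkUrl": "http://example.com/{}.apk".format(offset),
    }
    return {"count": 1, "obj": [app]}


# For a real run, replace the buffer with
# open("app.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
written = collect_category(fake_fetch, "?orgame=1&categoryId=101", writer)
print(written)
print(buffer.getvalue())
```

Passing the fetch function in as a parameter means the same loop can be exercised with canned responses first and with requests later, and opening the file once avoids reopening it for every page.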