Collecting app data from Yingyongbao (Tencent's app store)

    • Tools to prepare
    • Project idea analysis
    • Simple source code analysis

Tools to prepare

Data source: Yingyongbao (sj.qq.com)

Development environment: Windows 10, Python 3.7

Development tools: PyCharm, Chrome

Project idea analysis

Define the data to be collected:

  • The download URL of the app
  • The number of downloads
  • The name of the app
  • The company that developed the app

Extract the category tags from the page

Get the href attribute of each a tag

These href values are used later to build the dynamic request URLs
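The extraction step above can be sketched as follows. This is a minimal illustration, not the article's method: it uses the standard library's html.parser in place of lxml so that it runs without third-party packages, and the sample markup, the CategoryLinkParser class, and the href values are hypothetical. The real category page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real category page may differ.
SAMPLE_HTML = """
<ul data-modname="cates">
  <li><a href="?orgame=1&amp;categoryId=0">All</a></li>
  <li><a href="?orgame=1&amp;categoryId=101">Category A</a></li>
  <li><a href="?orgame=1&amp;categoryId=102">Category B</a></li>
</ul>
"""


class CategoryLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <ul data-modname="cates">."""

    def __init__(self):
        super().__init__()
        self.in_cates = False  # True while inside the category list
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("data-modname") == "cates":
            self.in_cates = True
        elif tag == "a" and self.in_cates and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_cates = False


parser = CategoryLinkParser()
parser.feed(SAMPLE_HTML)

# Skip the first link (the "All" entry in this sample).
category_hrefs = parser.hrefs[1:]
print(category_hrefs)
```

Each collected value is a query string fragment that can be appended to the data-loading address in the next step.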



Find the URL from which the app data is loaded dynamically



The variable part of the URL is the href value of each category tag

sj.qq.com/myapp/cate/… Append that value to build the new URL, then send the request
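The URL concatenation can be sketched as a small helper. The base address and the pageSize and pageContext parameters are taken from the source code in this article; the build_page_url function name and the sample href are assumptions made for illustration.

```python
# Base address as used in the source code of this article.
BASE_URL = "https://sj.qq.com/myapp/cate/appList.htm"


def build_page_url(category_href, page, page_size=20):
    """Return the request URL for one page of one category.

    category_href is the href taken from a category link,
    for example "?orgame=1&categoryId=101" (a hypothetical value).
    """
    offset = page * page_size  # pageContext advances by one page of items
    return "{}{}&pageSize={}&pageContext={}".format(
        BASE_URL, category_href, page_size, offset
    )


# Build the URLs for the first three pages of a hypothetical category.
urls = [build_page_url("?orgame=1&categoryId=101", page) for page in range(3)]
for u in urls:
    print(u)
```

Keeping the URL construction in one function makes it easy to check the offsets before any request is sent.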

Simple source code analysis

    import requests  # send network requests
    from lxml import etree  # parse the HTML into an element tree
    import csv  # write tabular data

    url = "https://sj.qq.com/myapp/category.htm?orgame=1"
    response = requests.get(url)
    html_data = etree.HTML(response.text)
    # href values of the category links
    li_list = html_data.xpath('//ul[@data-modname="cates"][position()>1]/a/@href')
    del li_list[-1]  # drop the last entry
    for url1 in li_list:
        for i in range(10):  # at most 10 pages of 20 apps per category
            new_url = "https://sj.qq.com/myapp/cate/appList.htm" + url1 + "&pageSize=20&pageContext={}".format(i * 20)
            res = requests.get(new_url).json()
            if res["count"] == 0:  # no more data in this category
                break
            with open("app.csv", "a", newline="", encoding="utf-8") as f:
                csv_data = csv.DictWriter(f, fieldnames=["appName", "authorName", "apkUrl"])
                for info in res["obj"]:
                    appName = info["appName"]
                    authorName = info["authorName"]
                    apkUrl = info["apkUrl"]
                    print({"appName": appName, "authorName": authorName, "apkUrl": apkUrl})
                    csv_data.writerow({"appName": appName, "authorName": authorName, "apkUrl": apkUrl})
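One possible refinement, shown here only as a sketch, is to separate the paging loop from the CSV writing so that both can be tried without network access. The collect_rows, write_rows, and fake_fetch names are hypothetical, and FAKE_PAGES is made-up data that only imitates the "count" and "obj" keys used in the code above. Unlike the original, this version also writes a header row.

```python
import csv
import io

FIELDS = ["appName", "authorName", "apkUrl"]


def collect_rows(fetch_page, max_pages=10):
    """Yield one dict per app until a page reports count == 0."""
    for page in range(max_pages):
        res = fetch_page(page)
        if res["count"] == 0:
            break
        for info in res["obj"]:
            # Keep only the fields that go into the CSV file.
            yield {name: info[name] for name in FIELDS}


def write_rows(rows, out):
    """Write a header plus all rows to a file-like object; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Made-up data standing in for requests.get(new_url).json().
FAKE_PAGES = [
    {"count": 2, "obj": [
        {"appName": "App One", "authorName": "Company A",
         "apkUrl": "https://example.com/one.apk"},
        {"appName": "App Two", "authorName": "Company B",
         "apkUrl": "https://example.com/two.apk"},
    ]},
    {"count": 0, "obj": []},
]


def fake_fetch(page):
    """Hypothetical replacement for the real network request."""
    return FAKE_PAGES[page]


buffer = io.StringIO()
written = write_rows(collect_rows(fake_fetch), buffer)
print(written)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with newline="" and encoding="utf-8" as the out argument, and replace fake_fetch with a function that sends the real request.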