Yingyongbao ("Application Treasure") app data collection

    • Tools to prepare
    • Project idea analysis
    • Simple source code analysis

Tools to prepare

Data source: Yingyongbao ("Application Treasure", Tencent's app store)

Development environment: Win10, Python 3.7

Development tools: PyCharm, Chrome

Project idea analysis

Define the data to be collected:

  • Download address of the app
  • Number of app downloads
  • Name of the app
  • Company that developed the app
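One way to keep the four fields together is a small record type. This is a minimal sketch: the class name `AppRecord`, its field names, and the helper `from_api_item` are hypothetical. The JSON keys `appName`, `authorName` and `apkUrl` are the ones the source code reads; the source never shows which key holds the download count, so that field is left optional and unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppRecord:
    """One row of collected app data (hypothetical container)."""
    app_name: str                    # name of the app
    developer: str                   # company that developed the app
    download_url: str                # download address of the app
    downloads: Optional[int] = None  # number of downloads; source key unknown


def from_api_item(info: dict) -> AppRecord:
    """Build a record from one item of the JSON "obj" list (hypothetical helper)."""
    # The download-count key is not shown in the source, so it stays None here.
    return AppRecord(
        app_name=info["appName"],
        developer=info["authorName"],
        download_url=info["apkUrl"],
    )


rec = from_api_item({"appName": "Demo", "authorName": "Demo Co.", "apkUrl": "http://example.com/demo.apk"})
print(rec)
```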

Extract the category tags from the page

Get the href attribute of each a tag

These values are used later to build the dynamic request addresses
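The tag-extraction step can be sketched without network access. This version swaps lxml for the standard library's `html.parser` so it runs with no third-party packages. The sample markup (`ul` with `data-modname="cates"`, `li`/`a` children, the `?catId=` hrefs) is made up and the real page may differ. Slicing `[1:-1]` mirrors the source, which skips the first tag (`position()>1`) and deletes the last one.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real category list may be structured differently.
SAMPLE_HTML = """
<ul data-modname="cates">
  <li><a href="?catId=all">All</a></li>
  <li><a href="?catId=101">Video</a></li>
  <li><a href="?catId=102">Music</a></li>
  <li><a href="?catId=999">More</a></li>
</ul>
"""


class CategoryLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <ul data-modname="cates">."""

    def __init__(self):
        super().__init__()
        self.in_cates = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("data-modname") == "cates":
            self.in_cates = True
        elif tag == "a" and self.in_cates and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_cates = False


def category_hrefs(html_text: str) -> list:
    parser = CategoryLinkParser()
    parser.feed(html_text)
    parser.close()
    # Skip the first tag and drop the last one, as the source code does.
    return parser.hrefs[1:-1]


print(category_hrefs(SAMPLE_HTML))
```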

Find the address from which the app data is dynamically loaded

The URL contains the value of each category tag… Concatenate each tag's value into a new URL and send the request.
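The concatenation can be isolated into a small generator. This is a sketch: the base address is elided in the source, so `https://example.com/api` is only a placeholder, and the function name `page_urls` is hypothetical. The parameters `pageSize=20` and `pageContext` come from the source code; reading `pageContext` as the offset of the first item on a page is an inference from its `i*20` values.

```python
def page_urls(base: str, category_href: str, pages: int = 10, page_size: int = 20):
    """Yield one request URL per page for a single category (hypothetical helper)."""
    for i in range(pages):
        # pageContext advances by page_size: 0, 20, 40, ...
        yield base + category_href + "&pageSize={}&pageContext={}".format(page_size, i * page_size)


for u in page_urls("https://example.com/api", "?catId=101", pages=2):
    print(u)
```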

Simple source code analysis

```python
import csv  # Process table data

import requests  # Send network requests
from lxml import etree  # Convert HTML into an element tree

url = ""  # Category page address (omitted in the original)
response = requests.get(url)
html_data = etree.HTML(response.text)
# href of each category tag, used to build the dynamic data URLs
li_list = html_data.xpath('//ul[@data-modname="cates"][position()>1]/a/@href')
del li_list[-1]  # Drop the last category tag

with open("app.csv", "a", newline="", encoding="utf-8") as f:
    csv_data = csv.DictWriter(f, fieldnames=["appName", "authorName", "apkUrl"])
    for url1 in li_list:
        for i in range(10):  # Up to 10 pages of 20 apps per category
            # Base address omitted in the original
            new_url = "" + url1 + "&pageSize=20&pageContext={}".format(i * 20)
            res = requests.get(new_url).json()
            if res["count"] == 0:  # No more apps in this category
                break
            for info in res["obj"]:
                row = {
                    "appName": info["appName"],
                    "authorName": info["authorName"],
                    "apkUrl": info["apkUrl"],
                }
                print(row)
                csv_data.writerow(row)
```
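The page-handling logic can also be separated from the network so it is testable offline. This sketch assumes each decoded response has the shape the source code relies on, `{"count": ..., "obj": [...]}`, and the function name `write_pages` is hypothetical. Unlike the source, it writes a header row, and `extrasaction="ignore"` drops any JSON keys beyond the three collected fields.

```python
import csv
import io

FIELDS = ["appName", "authorName", "apkUrl"]


def write_pages(pages, out):
    """Write app rows from successive decoded JSON pages until one reports count == 0.

    Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for res in pages:
        if res["count"] == 0:  # empty page: this category is exhausted
            break
        for info in res["obj"]:
            writer.writerow(info)  # keys outside FIELDS are ignored
            written += 1
    return written


buf = io.StringIO()
write_pages([{"count": 1, "obj": [{"appName": "Demo", "authorName": "Demo Co.", "apkUrl": "u1"}]}], buf)
print(buf.getvalue())
```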