As the mobile Internet's market share keeps growing, mobile apps have become part of everyday life. In the past, data for analysis was gathered by crawling websites, but many newer products exist only as apps and have no web version. This chapter uses the Douguo ("Bean and Fruit") food app as an example to show how to extract data from a mobile app.
Install Fiddler
Fiddler's official download page is http://www.fiddler2.com/fiddler2/. I simply searched for the installer on Baidu and downloaded it from there.
Installation is just a matter of clicking Next until it finishes. After installing, a few settings need to be configured.
Allow capturing of HTTPS packets
Open Fiddler, go to Tools -> Options, open the HTTPS tab, and check Decrypt HTTPS traffic. In the options that then appear, check Ignore server certificate errors. Fiddler can now capture HTTPS packets.
Allow external devices to send HTTP/HTTPS traffic to Fiddler
Under the Connections tab, check Allow remote computers to connect.
Connect your phone to your computer
The main difficulty in capturing data from a mobile app is that you do not know the API addresses the app requests its data from. On a PC, you only need to visit the website and watch the requests in a packet-capture tool. So the first step is to set up an environment in which every Internet request the phone sends can be seen in Fiddler on the computer.
Step 1: Make sure the phone and the computer are on the same network. Here the computer is connected to the Internet with a network cable. I installed a separate Wi-Fi sharing tool on it, and the phone (an iPhone 6s) connects to the shared Wi-Fi.
Step 2: Find the PC's IP address. Open CMD on the PC and run ipconfig.
Note that you need the IP address of the wireless connection (the shared Wi-Fi), not that of the wired Local Area Connection. This is an easy trap to fall into.
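To avoid picking the wrong address, the ipconfig output can be split by adapter so that only the wireless one is kept. The following is a minimal sketch. It assumes English-language Windows, where adapter headings look like "Wireless LAN adapter Wi-Fi:" and address lines contain "IPv4 Address". The function names `ipv4_by_adapter` and `wireless_ips` are hypothetical helpers, and localized Windows versions use different labels.

```python
import re


def ipv4_by_adapter(ipconfig_text):
    """Map each adapter heading in English ipconfig output to its IPv4 address."""
    result = {}
    adapter = None
    for line in ipconfig_text.splitlines():
        # Assumption: adapter headings start at column 0 and end with a colon,
        # e.g. "Wireless LAN adapter Wi-Fi:"
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            adapter = line.rstrip()[:-1]
        else:
            # Detail lines look like "   IPv4 Address. . . . : 192.168.23.1"
            match = re.search(r"IPv4 Address[ .]*:\s*([\d.]+)", line)
            if match and adapter is not None:
                result[adapter] = match.group(1)
    return result


def wireless_ips(ipconfig_text):
    """Keep only the wireless adapters, which is where the phone connects."""
    return {
        name: ip
        for name, ip in ipv4_by_adapter(ipconfig_text).items()
        if name.startswith("Wireless")
    }
```

On the PC, you could feed it the output of `subprocess.run(["ipconfig"], capture_output=True, text=True).stdout`. The address it returns is what the phone will use as its proxy server.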
Step 3: Set the HTTP proxy on the phone. Open the phone's Wi-Fi settings and tap the small circled "i" icon next to the network you are connected to. Choose Configure Proxy and set it to Manual. Enter the computer's IP address from Step 2 as the server and 8888 as the port, since that is the port Fiddler listens on.
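The phone's proxy setting works like a proxy configured in any HTTP client. Before touching the phone, you can confirm that Fiddler is capturing by sending one request from the computer through the same port. This is a sketch using Python's standard `urllib.request.ProxyHandler`. It assumes Fiddler runs on the same machine at its default port 8888. The name `fiddler_opener` is hypothetical, and only plain HTTP is routed, to keep certificate handling out of the picture.

```python
import urllib.request

# Assumption: Fiddler is running locally on its default port 8888.
FIDDLER_PROXY = "http://127.0.0.1:8888"


def fiddler_opener(proxy=FIDDLER_PROXY):
    """Build an opener that sends plain-HTTP requests through Fiddler."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)
```

Calling `fiddler_opener().open("http://api.douguo.net/personalized/home/0/20", timeout=10)` while Fiddler is running should make the request appear in Fiddler's session list. That is the same result you are looking for once the phone is configured.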
Step 4: Install Fiddler's certificate on both the phone and the PC. On the PC, open http://localhost:8888/.
On the phone, open the computer's IP address with port 8888. My address here is http://192.168.23.1:8888. Download and install the FiddlerRoot certificate from that page.
The last step is testing. Open any app on the phone and browse some content, then switch to Fiddler on the computer, where the app's network requests should now appear.
Analyze the mobile app's request addresses
Looking through the requests in Fiddler, you can find http://api.douguo.net/personalized/home/0/20, which fetches the data for the home page. Copy this address straight into a browser and you can see the JSON data it returns.
In fact, this is both the most important and the most difficult part, and it is where your practical experience is tested. You have to pick out the right API requests and analyze the structure of the data they return, in preparation for the subsequent data analysis.
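When a response is large, it helps to list every field path and its type before deciding what to extract. Below is a minimal sketch of a recursive walker over JSON already decoded with `json.loads`. The function name `describe` is hypothetical. For lists it inspects only the first item, on the assumption that items in an API list share one shape.

```python
def describe(node, path="$"):
    """Yield (path, type name) pairs for every node of decoded JSON data."""
    yield path, type(node).__name__
    if isinstance(node, dict):
        for key, value in node.items():
            yield from describe(value, f"{path}.{key}")
    elif isinstance(node, list) and node:
        # Assumption: list items share one shape, so only the first is inspected.
        yield from describe(node[0], f"{path}[0]")
```

For example, `for path, kind in describe(json.loads(body)): print(path, kind)` prints an outline of the response. The actual field names have to be read from the API's real output in Fiddler or the browser.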
Retrieving the data with a Python 3.x crawler
The request is made directly with urllib.request, and no crawler framework is used. The code is as follows:
```python
import urllib.request

# Send a request to the specified URL; urlopen() returns a file-like response object
url = "http://api.douguo.net/personalized/home/0/20"
with urllib.request.urlopen(url) as response:
    # read() reads the entire response body as bytes
    html = response.read()

# Turn the \uXXXX escape sequences in the response back into readable characters
print(html.decode("unicode_escape"))
```
Running the code prints the JSON data returned by the API.
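`decode("unicode_escape")` treats the raw bytes as Latin-1, so any unescaped UTF-8 characters in a response would come out garbled. A more robust option is to parse the body with Python's `json` module and print it with `ensure_ascii=False`. The sketch below assumes the body is UTF-8-encoded JSON. The name `pretty_json` is a hypothetical helper.

```python
import json


def pretty_json(raw):
    """Decode a UTF-8 JSON response body (bytes) into readable, indented text."""
    data = json.loads(raw.decode("utf-8"))
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
    return json.dumps(data, ensure_ascii=False, indent=2)
```

With the `html` bytes from the code above, `print(pretty_json(html))` shows the same data, indented. The parsed object is also ready for storage or analysis.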
Storing or analyzing the data are follow-up steps. At this point, we have completed the process of extracting data from a mobile app.