This is the 13th day of my participation in the August More Text Challenge.

This article covers setting up an environment for capturing App traffic. App packet capture is a topic no crawler engineer can avoid. I previously wrote about automatically entering the draws in “Lucky Draw Assistant”; the packet capture tool I used then was Charles, so see that earlier article if you need it.

How App packet capture works

  1. The client sends an HTTPS request to the server.
  2. The packet capture tool intercepts the client's request and forwards it to the server, posing as the client.
  3. The server returns its certificate to the "client", which is actually the packet capture tool.
  4. The packet capture tool intercepts the server's response, obtains the public key from the server certificate, and makes its own certificate. It replaces the server certificate with its own and sends it to the client.
  5. After receiving the certificate "from the server" (actually from the packet capture tool), the client generates a symmetric key, encrypts it with the packet capture tool's public key, and sends it to the "server" (the packet capture tool).
  6. The packet capture tool intercepts this message, decrypts the symmetric key with its own private key, re-encrypts it with the server certificate's public key, and sends it to the server.
  7. The server decrypts the symmetric key with its private key and sends a response to the client.
  8. The packet capture tool intercepts the server's response and forwards it to the client. Since it now holds the symmetric key, it can decrypt all traffic in both directions.

This only works if the client trusts the certificate the tool generates, which is why every tool below requires installing its CA certificate on the device. (In practice, tools like Fiddler and mitmproxy run two independent TLS sessions, one with the client using the generated certificate and one with the server, but the effect is the same: the tool sees the plaintext.)
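To make the key relay in steps 5 to 7 concrete, here is a toy simulation. It is a sketch under heavy assumptions: textbook RSA with tiny primes (61/53 for the server, 89/97 for the tool, chosen only for illustration), no padding, and a one-byte XOR standing in for the real symmetric cipher. Like the steps above, it is a simplified model, not how TLS works in detail.

```python
# Toy simulation of the simplified key relay. NOT secure: tiny primes,
# no padding, XOR instead of a real cipher.

def make_keypair(p, q, e):
    """Return ((e, n), (d, n)) for distinct primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return (e, n), (d, n)

def rsa_encrypt(public_key, m):
    e, n = public_key
    return pow(m, e, n)

def rsa_decrypt(private_key, c):
    d, n = private_key
    return pow(c, d, n)

def xor_cipher(data, key):
    # Stand-in for a symmetric cipher: XOR every byte with the key's low byte
    return bytes(b ^ (key & 0xFF) for b in data)

server_pub, server_priv = make_keypair(61, 53, 17)  # the real server
proxy_pub, proxy_priv = make_keypair(89, 97, 5)     # the capture tool's own key pair

def handshake(symmetric_key):
    # Step 5: the client encrypts its key with the public key it received,
    # which is really the proxy's key, not the server's
    to_proxy = rsa_encrypt(proxy_pub, symmetric_key)
    # Step 6: the proxy decrypts with its private key, keeps a copy,
    # and re-encrypts for the real server
    proxy_copy = rsa_decrypt(proxy_priv, to_proxy)
    to_server = rsa_encrypt(server_pub, proxy_copy)
    # Step 7: the server decrypts with its own private key
    server_copy = rsa_decrypt(server_priv, to_server)
    return proxy_copy, server_copy

key = 1234  # must be smaller than both moduli (3233 and 8633)
proxy_key, server_key = handshake(key)
reply = xor_cipher(b'{"ok": true}', server_key)  # the server's "encrypted" response
print(xor_cipher(reply, proxy_key))  # b'{"ok": true}': the proxy can read it
```

Both sides end up with the same symmetric key, so everything the server encrypts is readable by the tool in the middle.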

In essence, a crawler “deceives” the server. Every anti-crawling measure strengthens the server's trust checks, and the crawler's job is to keep convincing the server that it is a legitimate client. If you get caught by anti-crawling, your disguise was not convincing enough and you were found out.

Is App data easy to capture?

Capturing App data can be easy or hard. So far I have only worked with the easy part, so here is a brief summary of my understanding (corrections welcome):

Easy:

App data is often easier to capture than web data: it is almost always sent over HTTP or HTTPS, and the responses are fairly clean, mostly JSON.

Hard:

1. Requires decompilation knowledge and analysis of encryption algorithms

2. Requires unpacking (removing the protective shell) plus decompilation

3. Requires cracking all kinds of signatures and certificates

So a crawler engineer gradually needs to master the following skills:

Java Programming Basics

Android Programming Basics

App reverse engineering

App unpacking (shell removal)

Cracking encryption

...

From beginner to full stack

Common packet capture tools

Fiddler

mitmproxy

Charles

Fiddler setup and use

Download: telerik-fiddler.s3.amazonaws.com/fiddler/Fid…

Installation: just click Next all the way through the installer.

Main interface overview:

Session list:

After you click a request in the session list, the monitoring panel shows the following two panels:

Request panel:

Response panel:

Installing the local CA certificate
  • Click Tools > Options > HTTPS, then check Capture HTTPS CONNECTs and Decrypt HTTPS traffic. A prompt will ask you to install the certificate; just confirm each dialog to install it.
  • Restart Fiddler, then click Actions on the right to open a drop-down menu and choose Export Root Certificate to Desktop. The certificate is saved on the desktop as fiddlerroot.cer.
  • Open the exported certificate and click Install Certificate.
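Configuration for capturing packets from a mobile phone

First, configure Fiddler on the PC: under Tools > Options > Connections, check Allow remote computers to connect and note the listening port (8888 by default), then restart Fiddler.

Then, with the phone and the PC on the same network, set the phone's Wi-Fi proxy to the PC's IP address and the configured port, open that address and port in the phone's browser, and install the certificate from there. Once the certificate is installed, packets can be captured.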
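The phone needs the PC's LAN IP address. One common way to find it from Python, sketched below, is to "connect" a UDP socket towards an outside address (8.8.8.8 here, an arbitrary choice) and read back the local address the OS picked; connecting a UDP socket sends no packets. This is a best-effort heuristic: on a machine with several network interfaces it may pick the wrong one, and it falls back to 127.0.0.1 when there is no route. Port 8888 is Fiddler's default; pass the port you actually configured.

```python
import socket

def lan_ip():
    """Best-effort guess of this PC's LAN IP address."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            # connect() on a UDP socket only selects the outgoing interface;
            # getsockname() then reports that interface's address
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route available: fall back to loopback

def proxy_address(port=8888):
    # "host:port" to type into the phone's Wi-Fi proxy settings
    return f"{lan_ip()}:{port}"

print("Set the phone's proxy to", proxy_address())
```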

That's all you need to get Fiddler up and running; there is much more to explore beyond this.

Mitmproxy installation and use

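The Linux and Windows versions of mitmproxy differ slightly.

Install it with pip: pip install mitmproxy

If the installation fails on Windows with an error saying Microsoft Visual C++ 14.0 is required, install Microsoft Visual C++ 14.0 (Build Tools) first and then retry.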

Mitmproxy has three components:

mitmproxy – interactive console packet capture tool (used on Linux)

mitmdump – command-line tool for interacting with Python scripts

mitmweb – web-based visual interface (used on Windows)

On Windows, only the latter two components are supported.

Certificate configuration

After mitmproxy runs for the first time, the following files are generated in the ~/.mitmproxy directory:

| File name | Description |
|---|---|
| mitmproxy-ca.pem | Certificate and private key in PEM format |
| mitmproxy-ca-cert.pem | Certificate in PEM format, for most non-Windows platforms |
| mitmproxy-ca-cert.p12 | Certificate in PKCS12 format, for Windows |
| mitmproxy-ca-cert.cer | Same content as mitmproxy-ca-cert.pem, with the extension some Android devices expect |
| mitmproxy-dhparam.pem | Diffie-Hellman parameters in PEM format, used to strengthen SSL security |
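To check which of these files exist on your machine, a small sketch like the following can help. It assumes mitmproxy's default configuration directory ~/.mitmproxy; the exact set of files can vary between mitmproxy versions, so a missing entry is not necessarily an error.

```python
import os
from pathlib import Path

# File names from the table above
CERT_FILES = [
    "mitmproxy-ca.pem",
    "mitmproxy-ca-cert.pem",
    "mitmproxy-ca-cert.p12",
    "mitmproxy-ca-cert.cer",
    "mitmproxy-dhparam.pem",
]

# Assumed default location; expanduser() leaves the path unchanged if ~ is unknown
DEFAULT_CONFDIR = Path(os.path.expanduser("~/.mitmproxy"))

def find_cert_files(confdir=DEFAULT_CONFDIR):
    """Map each expected file name to its path, or None if it is missing."""
    confdir = Path(confdir)
    return {name: (confdir / name if (confdir / name).is_file() else None)
            for name in CERT_FILES}

for name, path in find_cert_files().items():
    print(f"{name}: {path or 'not found'}")
```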
Installing the certificate on Windows

Double-click mitmproxy-ca-cert.p12 and click through the import wizard to the end. When a security warning pops up, just click “Yes”.

Installing the certificate on Mac

On Mac, double-click mitmproxy-ca-cert.pem. Keychain Access opens; locate the mitmproxy certificate, open its trust settings, and select Always Trust.

Installing the certificate on Android/iPhone

Method 1: Send mitmproxy-ca-cert.pem to the phone and tap it to install. On iPhone, it installs as a configuration profile; then enable full trust for it under Settings > General > About > Certificate Trust Settings.

Method 2: On Linux, start mitmproxy with mitmproxy -p 8889, set the phone's proxy to the Linux machine's IP address and port 8889, then open mitm.it in the phone's browser and install the certificate.

Basic usage

Examples of mitmproxy's filtering feature:

```
z                          # clear all flows on the screen
f                          # enter filter edit mode; edit the condition at the bottom, press Esc or Enter to finish
!(~c 200)                  # show all requests that did not return 200
!(~c 200) & ~d baidu.com   # show requests whose domain contains baidu.com and that did not return 200
~m post & ~u baidu         # show POST requests whose URL contains baidu
~d baidu.com               # show all flows whose domain contains baidu.com
```
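The same selections can also be made inside a mitmdump script. The functions below are rough plain-Python approximations of the filter expressions above, not mitmproxy's own filter engine (which lives in the mitmproxy.flowfilter module and handles more operators and edge cases). They read attributes that real mitmproxy flows expose: request.method, request.host, request.url and response.status_code. make_flow() is a hypothetical helper that builds a minimal stand-in flow for trying them offline.

```python
import re
from types import SimpleNamespace

def has_code(flow, code):
    # ~c <code>: only flows that have a response with this status code
    return flow.response is not None and flow.response.status_code == code

def domain_has(flow, pattern):
    # ~d <regex>: regex search on the request's host name
    return re.search(pattern, flow.request.host, re.IGNORECASE) is not None

def method_is(flow, pattern):
    # ~m <regex>: regex search on the HTTP method ("post" matches "POST")
    return re.search(pattern, flow.request.method, re.IGNORECASE) is not None

def url_has(flow, pattern):
    # ~u <regex>: regex search on the full URL
    return re.search(pattern, flow.request.url, re.IGNORECASE) is not None

def not_200_on_baidu(flow):
    # !(~c 200) & ~d baidu.com  (dot escaped so it only matches a literal ".")
    return not has_code(flow, 200) and domain_has(flow, r"baidu\.com")

def baidu_posts(flow):
    # ~m post & ~u baidu
    return method_is(flow, "post") and url_has(flow, "baidu")

def make_flow(method, url, status=None):
    # Hypothetical stand-in for a mitmproxy flow, for offline experiments
    host = url.split("/")[2]
    response = SimpleNamespace(status_code=status) if status is not None else None
    return SimpleNamespace(request=SimpleNamespace(method=method, url=url, host=host),
                           response=response)

flow = make_flow("POST", "https://www.baidu.com/s?wd=test", status=302)
print(not_200_on_baidu(flow), baidu_posts(flow))  # True True
```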

Examples of mitmproxy's breakpoint (intercept) feature:

```
i       # enter intercept mode; edit the breakpoint condition at the bottom (same syntax as the filter expressions)
Enter   # open the detail page of an intercepted request
e       # edit the request in the detail page
a       # when all edits are done, go back to the request list and press a to release the request
```

Request replay:

1. Select the request to replay and press r to replay it as-is, or edit it first and then replay it.
2. Press q to exit the program.
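The console breakpoint workflow above is manual. When the same edit should be applied to every matching request, a mitmdump script can do it in the request hook instead. This is a sketch: the "baidu" match and the replacement User-Agent value are placeholders, and the stand-in flow at the end (a SimpleNamespace with a plain dict for headers) only exists to show the effect without running mitmproxy.

```python
from types import SimpleNamespace

def request(flow):
    # mitmdump calls this hook for every client request before forwarding it,
    # so edits made here reach the server, much like editing at a breakpoint
    if "baidu" in flow.request.url:  # placeholder match condition
        flow.request.headers["User-Agent"] = "my-test-agent"  # placeholder edit

# Offline demo with a stand-in flow; real flows carry mitmproxy's Headers object
fake_flow = SimpleNamespace(request=SimpleNamespace(url="https://www.baidu.com/", headers={}))
request(fake_flow)
print(fake_flow.request.headers)  # {'User-Agent': 'my-test-agent'}
```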

Mitmproxy is often used with Appium:

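First, write a packet capture script like this one:

```python
import json

def save_task(info):
    # Hypothetical placeholder: replace with your own storage (database, queue, ...)
    print(info)

def response(flow):
    # mitmdump calls this hook whenever a server response has been received
    if 'aweme/v1/user/follower/list/' in flow.request.url:
        for user in json.loads(flow.response.text)['followers']:
            info = {}
            info['share_id'] = user['uid']
            info['_id'] = user['short_id']
            save_task(info)
```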

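Note: the function must be named response. mitmproxy looks up event hooks by name, and response is the hook it calls when a server response arrives.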
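Before connecting a phone, it can help to exercise the parsing logic offline. The sketch below repeats the hook logic from the script above and calls it with a stand-in flow built from types.SimpleNamespace, which only provides the two attributes the hook reads (request.url and response.text). The example.com URL, the uid/short_id values and the in-memory save_task() are made up for illustration; a real mitmproxy flow has many more fields.

```python
import json
from types import SimpleNamespace

FOLLOWER_PATH = 'aweme/v1/user/follower/list/'
collected = []  # in-memory stand-in for real storage

def save_task(info):
    # Hypothetical storage helper: just collect results in a list
    collected.append(info)

def response(flow):
    # Same parsing logic as the mitmdump script
    if FOLLOWER_PATH in flow.request.url:
        for user in json.loads(flow.response.text)['followers']:
            save_task({'share_id': user['uid'], '_id': user['short_id']})

# Fake flow exposing only request.url and response.text
fake_flow = SimpleNamespace(
    request=SimpleNamespace(url='https://example.com/aweme/v1/user/follower/list/?count=20'),
    response=SimpleNamespace(text=json.dumps({'followers': [{'uid': '1001', 'short_id': '42'}]})),
)
response(fake_flow)
print(collected)  # [{'share_id': '1001', '_id': '42'}]
```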

After writing the script, start it with mitmdump -p [port] -s [script file], then run an Appium automation script to drive the app so that packets are captured automatically.
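If you want to start and stop the capture from the same Python program that runs your Appium script, one option is to launch mitmdump as a subprocess. This is a sketch, not part of the original setup: the helper names, port 8889 and the script name capture.py are placeholders, and mitmdump must be installed and on PATH.

```python
import subprocess

def build_mitmdump_cmd(port, script):
    # Equivalent to: mitmdump -p <port> -s <script>
    return ["mitmdump", "-p", str(port), "-s", script]

def start_capture(port, script):
    # Run mitmdump in the background so the Appium script can run meanwhile
    return subprocess.Popen(build_mitmdump_cmd(port, script))

def stop_capture(proc):
    # Ask mitmdump to exit, then wait for it to finish
    proc.terminate()
    proc.wait(timeout=10)
```

For example, call proc = start_capture(8889, "capture.py") before the Appium session starts and stop_capture(proc) after it ends.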

Using Charles

I have already written a hands-on article about Charles; see the article below.

That's my summary; I hope it helps you ~