We have already covered many crawler case studies, but almost all of them were web crawlers. Even a site that can only be opened on a phone can be analyzed and crawled by using the mobile simulation feature of Chrome Developer Tools to inspect its requests. (This was the approach used in the article "Crack the WeChat Moments quiz mini-game in 3 minutes".)
But some apps, such as TikTok, which went viral this year, offer no web service at all. (Some online tutorials still rely on mobile simulation in the browser, but that method no longer works.)
Can we still crawl in this case, and if so, how? That is what we will share today.
Capturing packets from a phone
The focus of this article is on how to get requests from mobile apps.
Unlike web pages on a computer, a mobile app cannot be inspected directly through the browser, and it is not convenient to run traffic debugging tools on the phone itself. So the common approach is to install "packet capture" software on a computer and use it to display all the network requests made by the phone.
Why can a computer see the network requests made by a phone? This is where the concept of a proxy comes in. We also talked about proxies in our previous article, "I heard you wrote a crawler, only to get blocked after grabbing a few pages?". The name is quite literal: the requests you make are no longer sent directly to the destination. They go to the proxy first, and the proxy sends them on your behalf. So with a proxy you can hide your IP, get into a private network, get over... ahem, you know what, and so on. It also makes possible what we are talking about today: capturing packets from a phone.
By the way, do not connect to unknown free Wi-Fi in public places. Whoever runs it can, in theory, capture your packets.
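To make the proxy idea concrete, here is a minimal sketch using only the Python standard library. The helper name build_proxy_opener is made up for illustration, and the address is simply the example IP and port that appear later in this article; substitute a proxy you actually run.

```python
import urllib.request

def build_proxy_opener(host, port):
    """Return an opener that routes HTTP and HTTPS requests through host:port."""
    proxy_url = 'http://%s:%d' % (host, port)
    proxies = {'http': proxy_url, 'https': proxy_url}
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler), proxies

# Hypothetical proxy address; replace it with a proxy you actually run.
opener, proxies = build_proxy_opener('192.168.23.1', 8888)
print(proxies)
# opener.open('http://example.com/') would now be sent to the proxy first,
# and the proxy would forward it to example.com on our behalf.
```

Nothing is sent until opener.open() is called, so the snippet is safe to run even when no proxy is listening.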
For this job we are going to use Fiddler, a mature and free packet capture tool. It can capture network requests from web pages, desktop software, and mobile apps. It runs on Windows, Mac, and Linux, and supports both iOS and Android. (Although all of these are supported, I strongly recommend Windows + Android, for reasons I will complain about later.)
During last week's book giveaway we received projects and code from many students. Among them, the student @the islands submitted a tutorial on capturing phone packets with Fiddler:
Segmentfault.com/a/119000001…
Some of the content and images in this article are adapted from her tutorial. Her blog has many more articles and study notes and is worth following. Other students are welcome to send us contributions too.
Download and install
Fiddler's official site is easy to find: www.telerik.com/fiddler (you need to fill out a form before downloading).
On Windows, just download it and run the installer. On a Mac, the installation instructions tell you to install a framework called Mono so that you can execute Fiddler.exe. The Mac version also has quite a few pitfalls:
1. The mono command must be run with sudo.
2. You must add the --arch=32 parameter: mono --arch=32 Fiddler.exe (the parameter must also be placed before the file name).
3. Even when everything is set up correctly, the first launch hangs for so long that I thought it had frozen for good. Please be patient. (I might have given up if I had not walked away and come back to find it working.)
4. Even when it runs properly, the interface on Mac has all sorts of display bugs. Do not switch to another program while a pop-up window is open, or you will not be able to find the pop-up when you come back...
5. You cannot copy anything inside the software...
6. You cannot capture HTTPS requests from iOS without creating an additional certificate, and the tool that creates this certificate only runs on Windows...
So if you can, do this on Windows. On Mac there is also a fairly well-known tool called Charles. If you have used it, feel free to leave a comment below.
Configuration
After the tool is installed, you need to perform necessary configurations to capture packets.
1. Fiddler configuration
Allow Fiddler to capture HTTPS packets. Open Fiddler, go to Tools -> Options, and check Decrypt HTTPS traffic on the HTTPS tab. Then check Ignore server certificate errors among the options that appear. Fiddler can now capture HTTPS packets.
Allow external devices to send HTTP/HTTPS traffic to Fiddler. On the Connections tab, set the port number (8888 in this article) and check Allow remote computers to connect.
After the configuration, restart the software.
2. Set the mobile phone proxy
Before capturing packets, make sure the computer and the phone are on the same local area network and can reach each other. The simplest case is that both are connected to the same Wi-Fi. Special cases are not discussed here (on some commercial Wi-Fi networks, devices cannot talk to each other).
Open Fiddler and hover the mouse over Online in the upper-right corner to see the IP address of the computer. You can also run the ipconfig command (ifconfig on Mac/Linux) on the command line. (The screenshots are for demonstration only; use your own IP.)
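If you would rather look up the address from Python than read the ipconfig output, the following is one possible sketch. The function name local_ip is made up for illustration, and the result is only a best-effort guess: on a machine with several network interfaces it may not be the one the phone can reach.

```python
import socket

def local_ip():
    """Best-effort guess of this computer's LAN IPv4 address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface. 192.0.2.1 is a documentation address.
        s.connect(('192.0.2.1', 80))
        return s.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): fall back to loopback.
        return '127.0.0.1'
    finally:
        s.close()

ip = local_ip()
print(ip)
```

If it prints 127.0.0.1, the computer has no usable network route, and the phone will not be able to reach it either.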
Set the proxy on the phone. Open the phone's Wi-Fi settings, select the network you are already connected to, and tap the small circled exclamation mark to open its details (Android is similar). Choose to configure a proxy, then enter the computer's IP address found in the previous step and the port set in Fiddler, 8888.
3. Install the certificate
To capture HTTPS requests, a certificate must be installed and trusted. On the PC, visit http://localhost:8888/ to install it.
On the phone, visit the PC's IP address on port 8888, for example http://192.168.23.1:8888, and install the certificate from there.
On some Android phones you have to go into Settings and import the certificate manually, otherwise it will not take effect.
4. Test
With Fiddler running, open any app on the phone. The app should work normally, and you should see the network requests it sends appear in Fiddler.
If the app works but no requests show up, check that the proxy setting has taken effect. If the app cannot get online at all, check that the certificates have all been downloaded and trusted. If it still fails, go through the configuration steps above again.
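When troubleshooting, it can help to first confirm that something is actually listening on the proxy port. The sketch below only tests whether a TCP connection can be opened; it does not prove that the listener is Fiddler. The function name proxy_reachable is made up for illustration.

```python
import socket

def proxy_reachable(host, port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run on the computer itself, this checks that something listens on port 8888.
print(proxy_reachable('127.0.0.1', 8888))
```

Running the same check from another device on the Wi-Fi, with the computer's LAN IP instead of 127.0.0.1, also tells you whether the network lets devices reach each other.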
Analyzing the requests
Once all of this is done, the rest is not much different from writing a web crawler. It is just a matter of finding the requests we need among everything that was captured.
Fiddler records every request, and there are a lot of them. Before operating the app, remember to clear the existing requests so that the new ones are easier to observe. Then use Filters to define filtering rules, which makes it much easier to find what you need. Once you find the request, you can view the information you want in Fiddler, or right-click it and export the request.
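Filtering can also be done offline. The sketch below assumes the captured sessions were exported in HTTP Archive (HAR) format, a JSON layout in which each entry carries a request with a method and a url. The function name requests_for_host, the file name sessions.har, and the sample data are all made up for illustration.

```python
import json
from urllib.parse import urlsplit

def requests_for_host(har, host):
    """Return (method, url) pairs from a HAR dict whose URL host matches."""
    matches = []
    for entry in har['log']['entries']:
        request = entry['request']
        if urlsplit(request['url']).hostname == host:
            matches.append((request['method'], request['url']))
    return matches

# Tiny hand-written sample in HAR layout. A real export could be loaded with
# json.load(open('sessions.har', encoding='utf-8-sig')) instead.
sample = json.loads('''
{"log": {"entries": [
  {"request": {"method": "GET", "url": "https://api.amemv.com/aweme/v1/aweme/post/?user_id=1"}},
  {"request": {"method": "GET", "url": "https://example.com/ad.js"}}
]}}
''')
for method, url in requests_for_host(sample, 'api.amemv.com'):
    print(method, url)
```

Only the first sample entry is printed, because the second one belongs to a different host.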
After operating the app and watching the captured traffic, we find that the request that fetches the list of videos uploaded by a user is:
https://api.amemv.com/aweme/v1/aweme/post/?...
You can view the request's detailed parameters on the WebForms tab. The response is JSON data that contains the download addresses of the videos.
This step relies on experience. Different websites and apps follow different rules, but the approach is much the same. If you are not familiar with web crawlers, first read our previous article, "The essential crawler tool: master it and half the problem is solved".
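The same breakdown of parameters can be reproduced in Python with the standard library. This sketch uses the parameter values that appear in the request code later in this article.

```python
from urllib.parse import urlsplit, parse_qsl

url = ('https://api.amemv.com/aweme/v1/aweme/post/'
       '?max_cursor=0&user_id=94763945245&count=20&aid=1128')
parts = urlsplit(url)
# parse_qsl splits the query string into (name, value) pairs
params = dict(parse_qsl(parts.query))
print(parts.path)   # /aweme/v1/aweme/post/
print(params)       # {'max_cursor': '0', 'user_id': '94763945245', 'count': '20', 'aid': '1128'}
```

Note that every value comes back as a string, even when it looks like a number.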
Crawling with code
With the address in hand, and after a bit of experimentation in the browser and in code, I found the right way to make this request work:
1. Provide the following parameters: max_cursor=0&user_id=94763945245&count=20&aid=1128, where user_id is the ID of the user to crawl. The other parameters can stay fixed.
2. Set the request header: {'user-agent': 'mobile'}
Request code:
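import requests

# ID of the user whose videos we want to list
uid = 94763945245
url = 'https://api.amemv.com/aweme/v1/aweme/post/?max_cursor=0&user_id=%d&count=20&aid=1128' % uid
# Request header found by experiment, as described above
h = {'user-agent': 'mobile'}
# verify=False skips TLS certificate verification
req = requests.get(url, headers=h, verify=False)
data = req.json()
print(data)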
Replace uid with the ID of the user you want to crawl. There is a simple way to get a user ID: choose Share on the user's page and send the link to WeChat. When you open the link, you can see the user_id in it.
Extract the list of videos and download:
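import urllib.request

for video in data['aweme_list']:
    # Use the description as the file name, falling back to the video ID
    name = video['desc'] or video['aweme_id']
    # Take the first entry in the list of download addresses
    url_v = video['video']['download_addr']['url_list'][0]
    print(name, url_v, '\n')
    urllib.request.urlretrieve(url_v, name + '.mp4')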
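One thing to watch for in the download step: a video description is used directly as the file name, and descriptions may contain characters such as slashes or line breaks that cannot appear in file names. The helper below is a hypothetical addition, not part of the original code, sketching one way to clean the name first.

```python
import re

def safe_filename(desc, fallback, max_len=80):
    """Turn a video description into a file name that is safe to create."""
    # Replace characters that are invalid in file names on common systems,
    # plus control characters such as newlines.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', '_', desc or '').strip(' .')
    # Fall back to the given ID when nothing usable is left
    return (cleaned[:max_len] or str(fallback)) + '.mp4'

print(safe_filename('my/video: part 1?', '123'))  # my_video_ part 1_.mp4
print(safe_filename('', '123'))                   # 123.mp4
```

In the loop above, this would mean passing safe_filename(video['desc'], video['aweme_id']) to urlretrieve instead of name + '.mp4'.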
This method was still valid as of the Fourth of July holiday, and the request can be simulated using Chrome Developer Tools. There is no guarantee of how long it will keep working. Crawler code never lasts forever.
To sum up: the focus is capturing packets with Fiddler, the key points are the configuration, the proxy, and the certificate, and the difficult part is analyzing the requests. The final code consists of two simple steps: get the video list, then download the videos.