This is the 10th day of my participation in the August More Text Challenge.

Today’s target website

aHR0cHM6Ly93d3cuemRheWUuY29tL0ZyZWVJUExpc3QuaHRtbA==
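The address above is Base64-encoded. Here is a minimal sketch for decoding it in Node.js with the built-in Buffer. The helper name decodeBase64 is only illustrative.

```javascript
// Decode a Base64 string into UTF-8 text using Node's built-in Buffer.
function decodeBase64(encoded) {
  return Buffer.from(encoded, "base64").toString("utf8");
}

const encodedSite = "aHR0cHM6Ly93d3cuemRheWUuY29tL0ZyZWVJUExpc3QuaHRtbA==";
console.log(decodeBase64(encodedSite));
```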

This site was shared in Salted Fish’s tech discussion group.

Packet capture analysis and locating the encryption

This is the free proxy list page of a proxy provider. Our goal is to extract the free proxies listed on this page.

So what anti-scraping measures does it use?

In the response, the IP addresses are not shown in full: the last part of each IP is displayed as "wait". Checking the CSS and font files first gives us an initial conclusion.
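To spot the masked entries programmatically, here is a minimal sketch. It assumes, as an illustration only, that a masked cell contains three dotted numbers followed by the literal text wait (for example 192.0.2.wait). The real format on the page may differ, and maskedPrefix is a hypothetical helper name.

```javascript
// Assumption: a masked IP keeps its first three octets and ends with ".wait".
const MASKED_IP = /^(\d{1,3}\.\d{1,3}\.\d{1,3})\.wait$/;

// Returns the visible prefix (e.g. "192.0.2") if the cell is masked, otherwise null.
function maskedPrefix(cellText) {
  const match = MASKED_IP.exec(cellText.trim());
  return match ? match[1] : null;
}

console.log(maskedPrefix("192.0.2.wait")); // → 192.0.2
console.log(maskedPrefix("192.0.2.1"));    // → null
```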

To determine whether this is JavaScript-based anti-scraping, we can disable JavaScript for the site and see whether the page display changes.

With JavaScript disabled, the page still does not show the full IP addresses, so we can conclude that the IP display is controlled by JS.

Next we need to locate the JS logic. Let’s see whether searching for "wait" turns up anything relevant.

I’ll skip straight to the result, because going through the search is tedious: there are 90 matches and none of them look very relevant. So let’s try other approaches first and come back to this analysis if they don’t work.

The screenshot above shows something distinctive: the position being replaced carries a value v. However, the search above shows that looking for the encryption this way is a dead end.

So let’s go back to the captured requests and look again.

A careful look at the captured requests reveals one related request with a very strange URL.

This long URL is suspicious, but its response contains nothing obviously useful, so let’s find out what the returned value is used for.

Since it’s an XHR request, the easiest approach is to set an XHR breakpoint.

Once the breakpoint is hit, something interesting shows up.

Two frames up the call stack, you can see that the returned value is processed further.

Looking at it more closely, the logic here is fairly obvious: it manipulates the document directly.

Running this code in the console returns the IP value that still contains "wait".

Running the entire function, however, returns the full IP directly.

So all we need is to reproduce this string-replacement logic to recover the full IP.

Encryption analysis

How is this request built? In the figure, the first part of the URL is a fixed domain.

The latter part of the URL is concatenated from two parameters.

The unknowns here are the parameters mk, showm and ak.

Here mk is a fixed value: xxxmxxxxxxxxxm398mxxx1m402.

It later turns out that this value is the encrypted IP of the client making the request.

ak, on the other hand, is dynamic, so it also has to be fetched with a separate request.
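Because ak has to be fetched every time, here is a minimal sketch for pulling a value out of downloaded script text. It assumes, hypothetically, that the script declares the value as a plain string literal such as var ak = "..."; the real file may be structured differently. extractStringVar is an illustrative helper, not something taken from the site.

```javascript
// Hypothetical: assumes the script declares the value as a string literal,
// e.g. var ak = "..."; the real script may use a different form.
function extractStringVar(scriptText, name) {
  const pattern = new RegExp(`(?:var|let|const)\\s+${name}\\s*=\\s*["']([^"']*)["']`);
  const match = pattern.exec(scriptText);
  return match ? match[1] : null;
}

const sampleScript = 'var mk = "abc123"; var ak = "xyz789";';
console.log(extractStringVar(sampleScript, "ak")); // → xyz789
console.log(extractStringVar(sampleScript, "mk")); // → abc123
```

In Node 18 or later, the script text can be fetched with the built-in fetch, for example const text = await (await fetch(scriptUrl)).text(); where scriptUrl is whatever URL the page actually loads.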

Now look at how showm is generated. Selecting the function name lets you jump to its definition.

This part is simple and can be run directly in Node.
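One convenient way to run code copied from the page in Node is the built-in vm module. Here is a minimal sketch. The source string below is a toy placeholder, not the site's real showm logic, and runInSandbox and makeShowm are hypothetical names.

```javascript
const vm = require("vm");

// Runs copied browser code in a fresh context and calls one function it defines.
// Top-level function declarations become properties of the context object.
function runInSandbox(source, entryName, ...args) {
  const context = vm.createContext({});
  vm.runInContext(source, context);
  return context[entryName](...args);
}

// Placeholder source for illustration only, NOT the site's algorithm.
const copiedSource = "function makeShowm(s) { return s.split('').reverse().join(''); }";
console.log(runInSandbox(copiedSource, "makeShowm", "abc")); // → cba
```

vm keeps the copied code's globals separate from your own, but the Node documentation notes that it is not a security mechanism, so only run code you have inspected.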

Here is the result of running it locally.

With mk, showm and ak in hand, we can assemble the request URL, send the request, and feed the returned value into the computation.
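The post does not show the exact URL layout, so here is a hypothetical sketch of the assembly step. It assumes mk, showm and ak are sent as query parameters on a placeholder endpoint (https://example.com/endpoint). The endpoint and the parameter placement are illustrative, not the site's real format.

```javascript
// Hypothetical: the endpoint and the use of query parameters are placeholders,
// not the site's real URL layout.
function buildRequestUrl(baseUrl, { mk, showm, ak }) {
  const url = new URL(baseUrl);
  url.searchParams.set("mk", mk);
  url.searchParams.set("showm", showm);
  url.searchParams.set("ak", ak);
  return url.toString();
}

console.log(buildRequestUrl("https://example.com/endpoint", { mk: "m1", showm: "s1", ak: "a1" }));
// → https://example.com/endpoint?mk=m1&showm=s1&ak=a1
```

Using URL and searchParams rather than plain string concatenation means that special characters in the values are escaped automatically.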

Once you have that value, let’s see what happens to it next.

Next, the out() function returns the number of entries in the current IP list (the number of table rows minus the header row):

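```javascript
// Returns the number of IP rows in the element with id "ipc",
// excluding the header row; returns 0 if the element is missing.
function out() {
    var myTb = document.getElementById("ipc");
    if (myTb) {
        return myTb.getElementsByTagName("tr").length - 1;
    } else {
        return 0;
    }
}
```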
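out() depends on the browser DOM. When you work with the raw HTML in Node, you can get roughly the same count with regular expressions. This is a simplified sketch. It assumes the element with id "ipc" is a table with no nested tables, and countIpRows is a hypothetical name. A real HTML parser would be more robust.

```javascript
// Node-side approximation of out(): count the <tr> rows inside the table with
// id="ipc" and subtract the header row; return 0 if the table is missing.
function countIpRows(html) {
  const table = /<table[^>]*\bid=["']ipc["'][^>]*>([\s\S]*?)<\/table>/i.exec(html);
  if (!table) return 0;
  // "<tr" followed by whitespace or ">" so tags like <track> are not counted.
  const rows = table[1].match(/<tr[\s>]/gi) || [];
  return rows.length - 1;
}

const sampleHtml =
  '<table id="ipc"><tr><th>IP</th></tr><tr><td>192.0.2.wait</td></tr><tr><td>198.51.100.wait</td></tr></table>';
console.log(countIpRows(sampleHtml)); // → 2
```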

out1() extracts the v value from each IP row. This matches what we noticed during the packet capture analysis, where we tried to start from v to find the encryption logic.
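Here is a rough Node-side counterpart to out1(). The post does not show where v lives in the markup, so this sketch assumes, hypothetically, that each IP row carries a v="..." attribute on its tr tag. extractVValues is an illustrative name.

```javascript
// Hypothetical markup: assumes each IP row carries its value as <tr v="...">.
// The real page may store v somewhere else.
function extractVValues(html) {
  const values = [];
  const rowPattern = /<tr\b[^>]*\bv=["']([^"']*)["']/gi;
  let match;
  while ((match = rowPattern.exec(html)) !== null) {
    values.push(match[1]);
  }
  return values;
}

const sampleRows =
  '<tr><th>IP</th></tr><tr v="k1"><td>192.0.2.wait</td></tr><tr v="k2"><td>198.51.100.wait</td></tr>';
console.log(extractVValues(sampleRows)); // → [ 'k1', 'k2' ]
```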

This value, together with the value returned by the request built above, is passed into DSFGSD to compute the restored IP.
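Here is a sketch of the final substitution. DSFGSD's algorithm is not reproduced in the post, so it is passed in as a function. The sketch also assumes, as an illustration, that the function returns the text that replaces "wait". restoreIps and toyDecode are hypothetical names, and toyDecode is a stand-in with no relation to the real logic.

```javascript
// `decode` stands in for the page's DSFGSD; it receives a row's v value and the
// value returned by the request, and is assumed to return the text for "wait".
function restoreIps(maskedIps, vValues, serverValue, decode) {
  return maskedIps.map((ip, i) => {
    // A replacer function stops "$" sequences in the decoded text being
    // treated as special replacement patterns.
    return ip.replace("wait", () => decode(vValues[i], serverValue));
  });
}

// Toy decoder for demonstration only.
const toyDecode = (v, serverValue) => String(v.length + serverValue.length);
console.log(restoreIps(["192.0.2.wait"], ["abc"], "xy", toyDecode)); // → [ '192.0.2.5' ]
```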

So the correct order of requests is:

1. Request base.js to get the mk and ak values, then construct the request described above and get its return value.
2. Request the free proxy page and get each IP containing "wait" together with its v value.
3. Pass each v value and the value returned in step 1 into DSFGSD, and replace the masked IP with the result.
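The three steps can be wired together as shown below. This is a hypothetical sketch. The fetch function, URLs, parsers and the decode function are all injected placeholders, because the real endpoints, page structure and DSFGSD algorithm are not reproduced in this post. scrapeProxies, buildTokenUrl and parseRows are illustrative names. The demo stubs out the network so it runs offline.

```javascript
// Hypothetical orchestration of the three steps; every dependency is injected.
async function scrapeProxies({ fetchText, baseJsUrl, buildTokenUrl, listUrl, parseBaseJs, parseRows, decode }) {
  // Step 1: read mk/ak from base.js, then request the value DSFGSD needs.
  // buildTokenUrl is where the mk/showm/ak URL would be assembled.
  const params = parseBaseJs(await fetchText(baseJsUrl));
  const serverValue = await fetchText(buildTokenUrl(params));

  // Step 2: fetch the proxy list and collect each masked IP with its v value.
  const rows = parseRows(await fetchText(listUrl)); // [{ ip, v }, ...]

  // Step 3: decode each v with the server value and fill in the masked part.
  return rows.map(({ ip, v }) => ip.replace("wait", () => decode(v, serverValue)));
}

// Offline demo: fake "pages" keyed by URL, plus toy parsers and decoder.
const pages = {
  "base.js": 'var mk = "m"; var ak = "a";',
  "token?m&a": "xy",
  "list": "192.0.2.wait|abc",
};
const demoOptions = {
  fetchText: async (url) => pages[url],
  baseJsUrl: "base.js",
  buildTokenUrl: ({ mk, ak }) => `token?${mk}&${ak}`,
  listUrl: "list",
  parseBaseJs: (text) => ({ mk: /mk = "(\w+)"/.exec(text)[1], ak: /ak = "(\w+)"/.exec(text)[1] }),
  parseRows: (text) => text.split("\n").map((line) => {
    const [ip, v] = line.split("|");
    return { ip, v };
  }),
  decode: (v, serverValue) => String(v.length + serverValue.length),
};

scrapeProxies(demoOptions).then((ips) => console.log(ips)); // → [ '192.0.2.5' ]
```

For real requests in Node 18 or later, fetchText could be async (url) => (await fetch(url)).text().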

Even though these are only free proxies, the site has clearly put real care into protecting them.

Well, that’s all for today. See you next time.