Author: Ali0th

Date: 2018-03-19


0.1. Introduction

This article was originally posted on The Prophet, and I'm moving it over here today. When I first started writing crawlers, I cracked this protection by analyzing the JS by hand, which was rather laborious. I generally don't do it that way anymore, so treat this article mostly as an interesting read.

0.2. Level 1: Get the JavaScript content

At first, the anti-crawler JS code can't be seen in the browser.

Difficult points:

When I first opened the site, it was hard to notice the jump to the 521 page, because that page appears only once an hour and stays up for just 1500 milliseconds.

Skills:

Block cookies to get the JavaScript

Concrete implementation:

Delete the site's cookies and block new ones

View the source code
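If you would rather grab the challenge page from a script than from a cookie-blocked browser, the same idea can be sketched in Python. This is a minimal sketch, not the author's tooling: fetch_without_cookies and extract_scripts are hypothetical helper names, and it assumes the 521 page carries its JavaScript in an inline script tag. It relies on the fact that urllib's default opener installs no cookie handler, so no cookies are stored or sent.

```python
import re
import urllib.error
import urllib.request

# Matches the body of every inline <script>...</script> element.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.S | re.I)


def extract_scripts(html: str) -> list[str]:
    """Return the inline script bodies found in an HTML page."""
    return [body.strip() for body in SCRIPT_RE.findall(html)]


def fetch_without_cookies(url: str) -> tuple[int, str]:
    """Fetch url without a cookie jar and return (status, body).

    urllib's default opener has no HTTPCookieProcessor, so cookies are never
    kept. A 521 response raises HTTPError, whose body still holds the page.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


if __name__ == "__main__":
    sample = "<html><script>var x=1;eval(y)</script></html>"
    print(extract_scripts(sample))  # ['var x=1;eval(y)']
```

If the site behaves as described above, calling fetch_without_cookies on it should return status 521 together with the page whose script you then pass to extract_scripts.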

0.3. Level 2: Get the real code

Difficult points:

The real code is hidden, which discourages anyone who wants to decode it straight away.

Skills:

Find the final output point, which is eval

Concrete implementation:

1) Beautify the code using jsbeautifier.org/

2) Understand the code
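Treating eval as the final output point amounts to a text substitution: instead of executing the string handed to eval, print it. Below is a minimal Python sketch that assumes the real code is passed to the last eval( in the script. expose_final_eval is a hypothetical name, and the match is plain text, so an identifier such as myeval( would also be caught. As Level 3 shows, the assumption breaks when eval sits in the middle of the code.

```python
def expose_final_eval(js: str, sink: str = "console.log") -> str:
    """Replace the last eval( call with a printing sink.

    Assumes the obfuscated script hands the real code to its final eval(.
    The match is plain text, so e.g. myeval( would also be matched.
    """
    marker = "eval("
    idx = js.rfind(marker)
    if idx == -1:
        raise ValueError("no eval( call found")
    return js[:idx] + sink + "(" + js[idx + len(marker):]


if __name__ == "__main__":
    print(expose_final_eval("var s='alert(1)';eval(s)"))
    # var s='alert(1)';console.log(s)
```

Pasting the transformed script into the F12 console then prints the hidden code instead of running it.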

0.4. Level 3: Fix the code

Difficult points:

If you see the eval above and apply the method from Level 2, you will go wrong: this eval sits in the middle of the code and is only used for concatenation. There is also a deliberately planted trap that makes the program produce the wrong result, so you have to actually understand the code and its logic.

Skills:

Put the code in an IDE and comb through it (Eclipse is used here).

Concrete implementation:

1) Paste the code into Eclipse

Use Eclipse to build a JS test page, then refresh the browser at any time to see the results.

2) Clarify the overall logic of the code and remove unimportant code

This is the overall logic of the code, so we can cut out the unimportant parts, keep the important ones, and debug them in Eclipse.

The code is:

The final output, dc, is what we want.

3) Output dc

Add console.log(dc) at the end, then run it in the F12 developer console:

See, that’s what we got.
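To repeat the console.log(dc) step from a Python crawler instead of the F12 console, one option (a swapped-in technique, not what the article uses) is to hand the trimmed code to Node.js. The sketch assumes node is installed and on PATH and that the trimmed code defines a variable named dc; build_debug_script and run_with_node are hypothetical names. Keep in mind that Node has no browser objects such as window or document, so any part of the code that depends on the current page will not behave as it does in the browser.

```python
import shutil
import subprocess


def build_debug_script(trimmed_js: str, var: str = "dc") -> str:
    """Append console.log(<var>) to the trimmed code, as in the F12 step."""
    return f"{trimmed_js.rstrip()}\nconsole.log({var});\n"


def run_with_node(trimmed_js: str) -> str:
    """Run the debug script with Node.js and return what it printed.

    Assumes node is on PATH. Raises CalledProcessError if the script throws,
    for example when it touches a browser-only object such as window.
    """
    node = shutil.which("node")
    if node is None:
        raise RuntimeError("node executable not found on PATH")
    result = subprocess.run(
        [node, "-e", build_debug_script(trimmed_js)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, run_with_node("var dc='a'+'b';") returns the string ab when Node is available.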

In Python, however, this process sometimes works and sometimes doesn’t. Why is that? We’re going to keep digging.

0.5. Level 4: JSFuck

Difficult points:

Special characters

Skills:

Be careful

Specific analysis:

The cd array. Yes, now we tackle the cd array. It looks like JSFuck, which is often encountered in CTFs. But how do we break it down here? Look at this spot:

Take a closer look at f.reverse()[[-~[]]][i](cd). reverse() just reverses the order, so it can be dropped for now, leaving f[[-~[]]][i](cd).

This has the form f(...): it calls one of the functions held in f. So the strangest part here is the [[...]] index, an array used directly as an index! So I tried looking directly at its value:

It's just an array containing the value 1. Now look at the value of cd:

Yes, a bunch of arrays. In Python, it shows up as several layers of nested arrays:

Why is that? What feature of JS causes this? After thinking about it for a long time, I remembered an article I had read years ago: [www.freebuf.com/sectool/535…
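The JSFuck pieces in f[[-~[]]][i](cd) rest on JavaScript's type coercion. The Python sketch below emulates two of those rules; it is not a JS engine, and the function names are hypothetical. First, [] converts to the empty string and then to the number 0, so -~[] is -(~0) = 1. Second, an array used as a property key is converted to a comma-joined string, so f[[1]] reads the same property as f["1"], which for an array is simply f[1].

```python
def eval_unary_on_empty_array(expr: str) -> int:
    """Evaluate a JSFuck-style number such as "-~[]" or "-~-~[]".

    Supports only the unary operators +, - and ~ applied to [].
    """
    if not expr.endswith("[]"):
        raise ValueError("expression must end with []")
    ops = expr[:-2]
    if "--" in ops or "++" in ops:
        # In JS these parse as decrement/increment, not two unary operators.
        raise ValueError("'--' and '++' are not supported")
    value = 0  # [] -> "" -> 0 under JS ToNumber
    for op in reversed(ops):  # unary operators apply right to left
        if op == "~":
            value = ~value  # bitwise NOT; matches JS for small integers
        elif op == "-":
            value = -value  # JS would keep -0 here; Python ints have no -0
        elif op == "+":
            value = +value
        else:
            raise ValueError(f"unsupported operator {op!r}")
    return value


def js_array_key(items) -> str:
    """Mimic how JS stringifies an array of ints/strings used as a key."""
    if isinstance(items, list):
        return ",".join(js_array_key(x) for x in items)
    return str(items)


if __name__ == "__main__":
    one = eval_unary_on_empty_array("-~[]")
    print(one)                # 1
    print(js_array_key([one]))  # 1
```

Note that nested arrays flatten when stringified (js_array_key([[1], [2, 3]]) gives "1,2,3", as String([[1],[2,3]]) does in JS), which is one reason extra bracket layers are harmless inside JSFuck expressions.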

Well, this level is basically the turning point; after it we are almost at the end.

0.6. Level 5: List of lists

Difficult points:

The logic of the f function.

Skills:

Understand the logic of the f function, debugging step by step to learn what each part does.

Specific analysis:

In the last level we saw that there are arrays inside arrays, so we need to work out which value each array corresponds to. Debugging shows that there are at most three layers of arrays. We name the layers X, Y, and Z in order. Among them:

The X layer is a string; just concatenate it.

Layer Y1 contains ASCII characters.

Layers Y2 and Z map to the Nth character of the current URL. That is why the approach in Level 3 above did not always succeed.

To be more specific about the Z layer: the following function gets the URL of the current page, so the result differs from site to site. Personally, I think this layer is a pretty clever design.

For example, if we are visiting http://localhost/, the function extracts localhost/, and if the Z-layer value in cd is 4, we take the corresponding character from that string. So in the end, each level of the cd array is converted to:

Concatenating these pieces into dc gives the final result we want, la la la:
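The X/Y/Z decoding described in this level can be sketched as a recursive walk over nested lists. This is a minimal sketch under explicit assumptions: the real cd array is untagged, and its layout has to be read off the f function, so the ("chr", code) and ("url", index) leaves below are hypothetical stand-ins for the Y1 and Y2/Z layers. It further assumes that Y1 stores character codes, that the URL index is 0-based, and that only the scheme is stripped, as in the http://localhost/ to localhost/ example.

```python
# Hypothetical leaf tags; the real cd array is untagged nested arrays.
CHR = "chr"  # Y1 layer (assumed): an ASCII character code
URL = "url"  # Y2/Z layers: an index into the current URL


def url_source(url: str) -> str:
    """Strip the scheme: "http://localhost/" -> "localhost/" (assumed rule)."""
    return url.split("://", 1)[-1]


def decode(node, page: str) -> str:
    """Recursively turn a nested cd-like structure into one string."""
    if isinstance(node, str):  # X layer: literal text, concatenated as-is
        return node
    if isinstance(node, tuple):  # tagged leaf (hypothetical encoding)
        tag, value = node
        if tag == CHR:
            return chr(value)
        if tag == URL:
            return page[value]  # assumed 0-based index into the URL text
        raise ValueError(f"unknown tag {tag!r}")
    if isinstance(node, list):  # a nested layer: decode children, then join
        return "".join(decode(child, page) for child in node)
    raise TypeError(f"unexpected node {node!r}")


if __name__ == "__main__":
    cd = ["ab", [(CHR, 61), [(URL, 4)]]]  # made-up example, not real site data
    print(decode(cd, url_source("http://localhost/")))  # ab=l
```

Because the URL leaves read from the page address, the same structure decodes to different strings on different hosts: decode([(URL, 0)], ...) yields l for http://localhost/ but e for http://example.com/.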