In my book Python3 Anti-Crawler Principles and Bypass in Practice, I argued that “crawling and anti-crawling are applications of comprehensive technology” and that “technology progresses through confrontation”. As technology has spread and advanced over time, Web applications have imposed more and more restrictions on crawlers. The most effective of these is code obfuscation.

Simple encryption algorithms and custom string-processing functions no longer meet defensive needs, so Web applications are turning to code obfuscation. Code obfuscation has several advantages:

  • Low barrier to entry: obfuscation tools are readily available, and many are free;
  • Strong effect: after obfuscation, even the code's own author can barely read it;
  • Browsers parse obfuscated code normally, and obfuscating a small script of fewer than 10,000 lines has little impact on performance;
  • Any performance cost of obfuscation can be offset by other optimizations, so there is no need to panic.

Combine an encryption algorithm or string-processing function with code obfuscation, and the defense becomes dramatically stronger. As a simple example, a basic string-processing program looks like this:

It has three functions. stringArray returns an array of characters. mergeArray concatenates the array's elements into a string and returns it. main calls stringArray and mergeArray and prints the resulting string. The output comment at the bottom shows the result of running it.
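
The original program survives only as an image. The following is therefore a hypothetical JavaScript reconstruction based on the description above. The characters in the array are placeholders, not the article's actual data.

```javascript
// Hypothetical reconstruction: the array contents are placeholders.
function stringArray() {
  // Return an array whose elements are single characters
  return ["v", "a", "n", "s", "e", "n", "b"];
}

function mergeArray(arr) {
  // Concatenate every element of the array into one string
  return arr.join("");
}

function main() {
  const result = mergeArray(stringArray());
  console.log(result);
  return result;
}

main(); // prints: vansenb
```

Each function does one obvious thing, and the call chain main → stringArray/mergeArray can be read at a glance.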

What crawler engineer could fail to understand such a clear chain of function calls?

Let’s see what happens when the three functions above are obfuscated:

The functionality and output are identical, but the code is completely different and unreadable. Remove the output comment at the bottom, and you would have no idea what the code does or what it prints. That is the protection code obfuscation gives Web applications.

As a crawler engineer, you now have two options:

  1. Start from an entry function and forcibly pull in every function it calls until the call chain is complete and produces the correct output.
  2. Undo part of the obfuscation and extract the logic from the jumble of code. Then, depending on its complexity, either reimplement it in another language or go back to option 1.
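
Option 1 can be sketched in Node.js with the built-in vm module. The idea is to load the code into a separate context, call its entry function, and keep only the output, without ever reading the logic. The obfuscated snippet and the names _0x1a2b, _0x3c4d, entry and hardClasp are invented for illustration. Note that vm contexts are not a security boundary, so code of unknown origin still deserves caution.

```javascript
const vm = require("vm");

// Hypothetical obfuscated snippet standing in for code copied from a page.
const obfuscatedSource = `
  var _0x1a2b = ["\\x76\\x61\\x6e", "\\x73\\x65\\x6e\\x62"];
  function _0x3c4d(i) { return _0x1a2b[i]; }
  function entry() { return _0x3c4d(0) + _0x3c4d(1); }
`;

// Run the source in its own context, then call the named entry function.
// Top-level var/function declarations become properties of the context object.
function hardClasp(source, entryName) {
  const sandbox = {};
  vm.createContext(sandbox);
  vm.runInContext(source, sandbox);
  return sandbox[entryName]();
}

console.log(hardClasp(obfuscatedSource, "entry")); // vansenb
```

This works as long as every function on the call chain is present in the source you load. When the chain spans many files, gathering all of them is exactly what makes this approach painful.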

The first method is what crawler engineers call the “hard clasp”. When the call chain spans several files and is long and complex, the “hard clasp” can really make you lose your hair.

The second method has a somewhat higher technical barrier. Crawler engineers need to understand AST theory and learn to write restoration code, which reduces the difficulty and cost of reading the code logic or tracing the call chain.

What is an AST?

Here is Baidu Baike’s explanation of AST:

In computer science, an Abstract Syntax Tree (AST), or syntax tree for short, is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language as a tree, with each node representing a construct in the source code. The syntax is “abstract” because it does not represent every detail of the real syntax. For example, nested parentheses are implicit in the shape of the tree and are not represented as nodes. A conditional statement such as if-condition-then can be represented by a node with two branches.

That sounds a little convoluted, so let's use an example. Here is a JavaScript variable declaration with an assignment:

var nick = "vansenb";

Parsing this one line produces a fairly long syntax tree, which you can inspect in AST Explorer. Here is the mapping between the JavaScript statement and its syntax tree:

The image is a little blurry. Paste the code into AST Explorer to see the structure clearly.
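
Because the image does not reproduce well, here is a hand-written sketch of roughly what an ESTree-compatible parser such as Esprima or Acorn returns for this line. Position fields (start, end, loc) are omitted. Babel's output differs in details: for example, it wraps the Program in a File node and uses StringLiteral instead of Literal.

```javascript
// ESTree-style syntax tree for: var nick = "vansenb";
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var", // the declaration keyword
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "nick" },
          init: { type: "Literal", value: "vansenb", raw: '"vansenb"' },
        },
      ],
    },
  ],
};

const declarator = ast.body[0].declarations[0];
console.log(ast.body[0].kind);      // var
console.log(declarator.id.name);    // nick
console.log(declarator.init.value); // vansenb
```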

What is an AST useful for?

The syntax tree shows the program body, the declaration type, the identifier, the literal and other information. From it we can read off:

  • var: VariableDeclaration, a variable declaration (containing one VariableDeclarator);
  • nick: Identifier, an identifier;
  • vansenb: Literal, a literal.

Read as a human would read it, this line declares a variable named nick with the value vansenb.

Suppose you want to change this line of code to:

var nick = "James";

You only need to change the value attribute of the Literal node in the syntax tree. The code then means: declare a variable named nick with the value James. With this in mind, we can turn to code obfuscation and its reversal.
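
Here is a minimal sketch of that edit on a hand-written ESTree-style tree. The generate function is a hypothetical, deliberately tiny code generator that handles only the node types in this example. Real projects would use a generator library instead, such as @babel/generator or escodegen.

```javascript
// Hypothetical minimal code generator: only the node types used here.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      // A declarator without an initializer is just its name, as in: var x;
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value); // adequate for plain strings and numbers
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// ESTree-style tree for: var nick = "vansenb";
const ast = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "nick" },
      init: { type: "Literal", value: "vansenb" },
    }],
  }],
};

console.log(generate(ast)); // var nick = "vansenb";

// Change only the Literal's value, then regenerate the code
ast.body[0].declarations[0].init.value = "James";
console.log(generate(ast)); // var nick = "James";
```

The tree's structure is untouched. Only one attribute changed, and the regenerated code changed its meaning accordingly.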

When you use a one-click obfuscation/restoration tool, you just paste code into the input box and click the button to get the obfuscated code, right? And code with the same structure always comes out with the same structure after obfuscation.

This shows that one-click obfuscation/restoration tools work by transforming the source code's abstract syntax tree. Examples include adding or removing nodes before and after a given node, or, during obfuscation, splitting a single function that directly produces its result into several functions that call one another.
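
To make this concrete, here is a sketch of one such tree rewrite, in the spirit of the "string array" transformation many obfuscators apply. Every string Literal is replaced by an indexed lookup into a new array, and a declaration for that array is inserted at the top of the program. The array name _0xarr and the helpers generate, obfuscateStrings and decl are hypothetical. Real tools add further layers, such as encoding or rotating the array.

```javascript
// Hypothetical minimal generator for the node types used in this sketch.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return `${generate(node.id)} = ${generate(node.init)}`;
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    case "ArrayExpression":
      return `[${node.elements.map(generate).join(", ")}]`;
    case "MemberExpression": // only computed access such as arr[0] is produced here
      return `${generate(node.object)}[${generate(node.property)}]`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const isNode = (v) => v !== null && typeof v === "object" && typeof v.type === "string";

// Replace each string Literal with arrayName[index], then prepend the array.
function obfuscateStrings(program, arrayName = "_0xarr") {
  const strings = [];
  function rewrite(node) {
    if (node.type === "Literal" && typeof node.value === "string") {
      strings.push(node.value);
      return {
        type: "MemberExpression",
        computed: true,
        object: { type: "Identifier", name: arrayName },
        property: { type: "Literal", value: strings.length - 1 },
      };
    }
    // Recurse into child nodes and arrays of child nodes
    for (const [key, child] of Object.entries(node)) {
      if (Array.isArray(child)) node[key] = child.map((c) => (isNode(c) ? rewrite(c) : c));
      else if (isNode(child)) node[key] = rewrite(child);
    }
    return node;
  }
  rewrite(program);
  program.body.unshift({
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: arrayName },
      init: { type: "ArrayExpression", elements: strings.map((value) => ({ type: "Literal", value })) },
    }],
  });
  return program;
}

// Tree for: var nick = "vansenb"; var team = "NightTeam";
const decl = (name, value) => ({
  type: "VariableDeclaration",
  kind: "var",
  declarations: [{ type: "VariableDeclarator", id: { type: "Identifier", name }, init: { type: "Literal", value } }],
});
const program = { type: "Program", body: [decl("nick", "vansenb"), decl("team", "NightTeam")] };

console.log(generate(obfuscateStrings(program)));
// var _0xarr = ["vansenb", "NightTeam"];
// var nick = _0xarr[0];
// var team = _0xarr[1];
```

Because the pass is a fixed set of rules applied to the tree, any two programs with the same structure come out with the same obfuscated structure. A restoration tool simply applies the inverse rules.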

Common JavaScript AST parsing libraries

Syntax trees are not unique to JavaScript. Almost every programming language has them, including Golang, Python and Java. JavaScript syntax trees come up especially often because of the gaps between JavaScript syntax versions and the compatibility work they require. In practice, ES6 syntax often has to be converted to ES5, and this is where syntax trees earn their keep.

The syntax tree acts like an adapter, converting code representation A into representation B.
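
Here is a naive sketch of that adapter role: walk the tree, turn every let/const declaration into var, and regenerate the code. It ignores block scoping, which real transpilers such as Babel handle carefully, so treat it purely as an illustration. generate, toVar and decl are hypothetical helpers.

```javascript
// Hypothetical minimal generator for declarations with literal initializers.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return `${generate(node.id)} = ${generate(node.init)}`;
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Naive "adapter": rewrite every VariableDeclaration's kind to var, in place.
function toVar(node) {
  if (node !== null && typeof node === "object") {
    if (node.type === "VariableDeclaration") node.kind = "var";
    Object.values(node).forEach((child) => toVar(child)); // also walks arrays
  }
  return node;
}

const decl = (kind, name, value) => ({
  type: "VariableDeclaration",
  kind,
  declarations: [{ type: "VariableDeclarator", id: { type: "Identifier", name }, init: { type: "Literal", value } }],
});

// Representation A (ES6): const nick = "vansenb"; let age = 18;
const es6 = { type: "Program", body: [decl("const", "nick", "vansenb"), decl("let", "age", 18)] };
console.log(generate(es6));
// const nick = "vansenb";
// let age = 18;

// Representation B (ES5-style)
console.log(generate(toVar(es6)));
// var nick = "vansenb";
// var age = 18;
```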

The AST parsing libraries commonly used in JavaScript include Babel, Esprima, Espree, and Acorn. Engineers can choose their own libraries according to their preferences and styles.

Front-end developers often use these libraries to write code transformation or obfuscation tools, or even to compile React and Vue projects into JavaScript that browsers can run. Crawler engineers are most likely to use them to help reverse-engineer JavaScript code.

AST node type names: the basics

Syntax trees take a while to learn (perhaps a month or two). If you are interested, the following hands-on articles show how they are used in practice:

Restoring obfuscator-obfuscated code with the AST

Operating on the AST to restore obfuscated code, Basics series 3: restoring hexadecimal strings

Restoring obfuscated code with the AST: making code analysis easy

AST in practice: automatically decrypting obfuscator-obfuscated code

Working with the AST to restore obfuscated code, lesson 9: restoring simple CallExpression nodes

The common AST parsing libraries are listed above. The structures they produce for the same code are not entirely identical, but the names they use for node types are almost the same. For example, VariableDeclaration denotes a variable declaration, and CallExpression denotes a function call expression.
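
Because the node type names are shared, a tool that only looks at each node's type field works on output from any ESTree-compatible parser. The sketch below tallies node types in a hand-written tree for console.log(1 + 2). The function name countNodeTypes is hypothetical.

```javascript
// Count how often each node type occurs anywhere in an ESTree-style tree.
function countNodeTypes(node, counts = {}) {
  if (node !== null && typeof node === "object") {
    // Arrays have no type field, so only real nodes are counted
    if (typeof node.type === "string") counts[node.type] = (counts[node.type] || 0) + 1;
    for (const value of Object.values(node)) countNodeTypes(value, counts);
  }
  return counts;
}

// Hand-written tree for: console.log(1 + 2);
const ast = {
  type: "Program",
  body: [{
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "MemberExpression",
        computed: false,
        object: { type: "Identifier", name: "console" },
        property: { type: "Identifier", name: "log" },
      },
      arguments: [{
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Literal", value: 1 },
        right: { type: "Literal", value: 2 },
      }],
    },
  }],
};

// Identifier and Literal are counted twice, every other type once
console.log(countNodeTypes(ast));
```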

Knowing the node type names gives us a clearer picture of each node's role and intent when reading a syntax tree. You could even say they are the only road to mastering code obfuscation or code reversing. They are very important!

Let’s look at the common AST node type names using the code in the figure below.

The code above contains statements commonly used in JavaScript, such as variable declarations, function declarations, ternary expressions, if statements, switch statements, function calls, assignments, array declarations and for loops.

Copy the code above into AST Explorer to get its syntax tree. By comparing the code on the left with the syntax tree on the right, we can compile the node type names and descriptions in the following table:

| No. | Type | Name | Description |
| --- | --- | --- | --- |
| 1 | Program | Program | The body of the entire code |
| 2 | VariableDeclaration | Variable declaration | Declares a variable, e.g. with var, let or const |
| 3 | FunctionDeclaration | Function declaration | Declares a function with the function keyword |
| 4 | ExpressionStatement | Expression statement | A statement made of an expression, usually a function call such as console.log() |
| 5 | BlockStatement | Block statement | Code wrapped in {}, such as the block in if (condition){var a = 1; } |
| 6 | BreakStatement | Break statement | A break |
| 7 | ContinueStatement | Continue statement | A continue |
| 8 | ReturnStatement | Return statement | A return |
| 9 | SwitchStatement | Switch statement | The switch in a switch ... case statement |
| 10 | IfStatement | If statement | Control flow, usually if(condition){}else{} |
| 11 | Identifier | Identifier | A name, such as identi in var identi = 5 |
| 12 | CallExpression | Call expression | A function call, such as console.log() |
| 13 | BinaryExpression | Binary expression | An operation with two operands, such as 1+2 |
| 14 | MemberExpression | Member expression | Access to an object member, such as the log member of the console object |
| 15 | ArrayExpression | Array expression | An array literal, such as [1, 3, 5] |
| 16 | NewExpression | New expression | Use of the new keyword |
| 17 | AssignmentExpression | Assignment expression | An assignment, usually of a function's return value to a variable |
| 18 | UpdateExpression | Update expression | Updating a value in place, such as i++ |
| 19 | Literal | Literal | A literal value |
| 20 | BooleanLiteral | Boolean literal | A Boolean value, such as true or false |
| 21 | NumericLiteral | Numeric literal | A number, such as 100 |
| 22 | StringLiteral | String literal | A string, such as vansenb |
| 23 | SwitchCase | Case clause | A case in a switch statement |
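
As a small taste of restoration code, the sketch below folds BinaryExpression nodes whose operands are both numeric literals. This is a common way to undo obfuscated constants such as 0x32 + 0x32 * 2. It assumes ESTree-style Literal nodes; with Babel you would match NumericLiteral instead. Negative or non-finite results are left alone, because parsers represent those as UnaryExpression or Identifier nodes rather than plain literals. The names fold and isNumber are hypothetical.

```javascript
const operators = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
  "%": (a, b) => a % b,
};

function isNumber(node) {
  return node !== null && typeof node === "object" && node.type === "Literal" && typeof node.value === "number";
}

// Post-order traversal: fold children first so nested expressions collapse bottom-up.
function fold(node) {
  if (Array.isArray(node)) return node.map(fold);
  if (node === null || typeof node !== "object") return node;
  for (const key of Object.keys(node)) node[key] = fold(node[key]);
  if (
    node.type === "BinaryExpression" &&
    Object.prototype.hasOwnProperty.call(operators, node.operator) &&
    isNumber(node.left) &&
    isNumber(node.right)
  ) {
    const value = operators[node.operator](node.left.value, node.right.value);
    // Only emit results that a parser would also represent as a plain Literal
    if (Number.isFinite(value) && value >= 0) return { type: "Literal", value, raw: String(value) };
  }
  return node;
}

// Tree for: var total = 0x32 + 0x32 * 2;
const ast = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "total" },
      init: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Literal", value: 0x32, raw: "0x32" },
        right: {
          type: "BinaryExpression",
          operator: "*",
          left: { type: "Literal", value: 0x32, raw: "0x32" },
          right: { type: "Literal", value: 2, raw: "2" },
        },
      },
    }],
  }],
};

fold(ast);
console.log(ast.body[0].declarations[0].init.value); // 150
```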

This is only the commonly used subset. You can add more node type names as you need them. I will keep updating the related material, and interested readers can follow it in the NightTeam GitHub repository at github.com/NightTeam/J…

With these names, the syntax tree structure is easy to read. For example, a statement such as if (condition){} appears in the tree as an IfStatement node.
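
In ESTree, an IfStatement node has three children: test (the condition), consequent (the block that runs when it is true) and alternate (the else branch, or null). The sketch below builds such a node. It also adds a hypothetical pruneIf pass that keeps only the reachable branch when the test is already a constant Literal, as happens once an obfuscator's opaque predicate has been folded. The helpers assign and makeIf are likewise hypothetical.

```javascript
// Hypothetical helper: ExpressionStatement for `name = value;`
function assign(name, value) {
  return {
    type: "ExpressionStatement",
    expression: {
      type: "AssignmentExpression",
      operator: "=",
      left: { type: "Identifier", name },
      right: { type: "Literal", value },
    },
  };
}

// IfStatement for: if (<test>) { b = 1; } else { b = 2; }
function makeIf(test) {
  return {
    type: "IfStatement",
    test,
    consequent: { type: "BlockStatement", body: [assign("b", 1)] },
    alternate: { type: "BlockStatement", body: [assign("b", 2)] },
  };
}

// Keep only the branch that can run when the test is a constant Literal.
function pruneIf(node) {
  if (node.type === "IfStatement" && node.test.type === "Literal") {
    return node.test.value ? node.consequent : node.alternate || { type: "EmptyStatement" };
  }
  return node;
}

// if (a > 1) { b = 1; } else { b = 2; }
const real = makeIf({
  type: "BinaryExpression",
  operator: ">",
  left: { type: "Identifier", name: "a" },
  right: { type: "Literal", value: 1 },
});
console.log(real.test.type, real.consequent.type, real.alternate.type);
// BinaryExpression BlockStatement BlockStatement

// if (true) { b = 1; } else { b = 2; }  becomes  { b = 1; }
const fake = makeIf({ type: "Literal", value: true });
console.log(pruneIf(fake).body[0].expression.right.value); // 1
console.log(pruneIf(real) === real); // true
```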

More on AST theory and practice is available from NightTeam, including the team repository at github.com/NightTeam.