First of all, why do we need to know AST
As we all know, JS is the most difficult in anti-crawler. In order to crack JS reverse, it is necessary to master AST grammar. At present, some front-end plug-ins or tools commonly used, such as javascript translation, code compression, CSS preprocessor, ELint, Pretiier and other functions are implemented on the basis of AST.
Javascript compiler execution process
The first step of JS execution is to read the character stream in the JS file, then generate the token through lexical analysis, then generate the AST (Abstract Syntax Tree) through syntactic analysis, and finally generate the machine code execution.
Syntax analysis
Syntax analysis is also called scanner. In simple terms, it calls the next () method to read characters one by one, and then compares them with the defined Javascript key characters to generate the corresponding Token. The smallest indivisible unit, such as the three characters var, can only be considered as a whole and can no longer be semantically decomposed, so it is a Token. In lexical analysis, each keyword is a Token, each identifier is a Token, each operator is a Token, and each punctuation mark is a Token. In addition, comments and whitespace characters (newlines, Spaces, tabs, and so on) in the source program are filtered out. Eventually, the entire code is split into a list of tokens (or a one-dimensional array).
n * n;
[
{ type: { ... }, value: "n", loc: { ... } },
{ type: { ... }, value: "*", loc: { ... } },
{ type: { ... }, value: "n", loc: { ... } },
...
]
Copy the code
Each type has a set of attributes that describe the token:
{
type: {
label: 'name',
keyword: undefined,
beforeExpr: false,
startsExpr: true,
rightAssociative: false,
isLoop: false,
isAssign: false,
prefix: false,
postfix: false,
binop: null,
updateContext: null
},
...
}
Copy the code
Syntax analysis
The syntax analysis will transform the tokens from lexical analysis into an abstract syntax tree structure with grammatical meaning. Also, validate the syntax and throw syntax errors if there are any.
What is an AST (Abstract Syntax Tree)
Abstract Syntax Tree (AST), or Syntax Tree for short, is an Abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language as a tree, with each node in the tree representing a structure in the source code.
function square(n) {
return n * n;
}
Copy the code
This code forms a tree:
FunctionDeclaration:
id:
Identifier:
name: square
params [1]
Identifier
name: n
body:
BlockStatement
body [1]
ReturnStatement
argument
BinaryExpression
operator: *
left
Identifier
name: n
right
Identifier
name: n
Copy the code
Or a JavaScript Object like this:
{
type: "FunctionDeclaration",
id: {
type: "Identifier",
name: "square"
},
params: [{
type: "Identifier",
name: "n"
}],
body: {
type: "BlockStatement",
body: [{
type: "ReturnStatement",
argument: {
type: "BinaryExpression",
operator: "*",
left: {
type: "Identifier",
name: "n"
},
right: {
type: "Identifier",
name: "n"
}
}
}]
}
}
Copy the code
You’ll notice that each layer of the AST has the same structure:
{ type: "FunctionDeclaration", id: {... }, params: [...] , body: {... } } { type: "Identifier", name: ... } { type: "BinaryExpression", operator: ... , left: {... }, right: {... }}Copy the code
Each of these layers is also called a Node. An AST can consist of a single node or hundreds or thousands of nodes. Together, they describe program syntax for static analysis.
Each node has the following Interface:
interface Node {
type: string;
}
Copy the code
The type field is a string representing the type of the node (for example, “FunctionDeclaration”, “Identifier”, or “BinaryExpression”). Each type of node defines additional attributes that further describe the node type.
AST Node Introduction
ldentifier
Identifiers are the names we define when we write JS, such as variable names, function names, and attribute names, all belong to identifiers. The corresponding interface looks like this:
interface Identifier <: Expression, Pattern {
type: "Identifier";
name: string;
}
Copy the code
An identifier may be an expression or a deconstruction pattern (deconstruction syntax in ES6). We will see Expression and Pattern later.
Literal
Literals, not [] or {}, but literals that semantically represent a value, such as 1, “hello”, true, and regular expressions (with an extended Node to represent regular expressions) such as /\d? /. Let’s look at the definition of the document:
interface Literal <: Expression {
type: "Literal";
value: string | boolean | null | number | RegExp;
}
Copy the code
RegExpLiteral
This for regular literal, in order to better to parse the regular expression content, add one more regex fields, which will include regular itself, as well as the regular flags.
interface RegExpLiteral <: Literal {
regex: {
pattern: string;
flags: string;
};
}
Copy the code
Programs
This is usually used as a trailing node, which represents a complete program code tree.
interface Program <: Node {
type: "Program";
body: [ Statement ];
}
Copy the code
The body property is an array containing multiple Statement nodes.
Functions
Function declaration or function expression node.
interface Function <: Node {
id: Identifier | null;
params: [ Pattern ];
body: BlockStatement;
}
Copy the code
The id is the function name, and the params property is an array representing the parameters of the function. Body is a block statement. It’s worth noting that you won’t find the type: “Function” node during testing, but you will find the type: “FunctionDeclaration” and the type: “FunctionExpression”, because functions appear either as declarations or function expressions, which are combination types of node types. FunctionDeclaration and FunctionExpression will be mentioned later. This gives the impression that the document is well laid out, with function names, arguments, and function blocks being part of the function, while declarations or expressions have their own needs.
Statement
A statement node is nothing special; it is just a node, a distinction, but there are many kinds of statements, which are described below.
interface Statement <: Node { }ExpressionStatement
Copy the code
ExpressionStatement
Expression statement nodes, where a = a+ 1 or a++ have an expression property that refers to an expression node object (we’ll talk about expressions later).
interface ExpressionStatement <: Statement {
type: "ExpressionStatement";
expression: Expression;
}
Copy the code
BlockStatement
Block statement nodes, for example: if (…) {// Here is the contents of a block}, a block can contain multiple other statements, so there is a body attribute, which is an array representing multiple statements in the block.
interface BlockStatement <: Statement {
type: "BlockStatement";
body: [ Statement ];
}
Copy the code
EmptyStatement
An empty statement node that does not execute any useful code, such as a separate semicolon;
interface EmptyStatement <: Statement {
type: "EmptyStatement";
}
Copy the code
DebuggerStatement
Debugger, that’s what it means. Nothing else.
interface DebuggerStatement <: Statement {
type: "DebuggerStatement";
}
Copy the code
WithStatement
The with statement node contains two special attributes. Object represents the object (which can be an expression) to be used with, and body represents the statement to be executed after with, which is usually a block statement.
interface WithStatement <: Statement {
type: "WithStatement";
object: Expression;
body: Statement;
}
Copy the code
Here is the control flow statement:
ReturnStatement
Returns the statement node. The argument property is an expression that represents what is returned.
interface ReturnStatement <: Statement {
type: "ReturnStatement";
argument: Expression | null;
}
Copy the code
LabeledStatement
The label statement, for example:
loop: for(let i = 0; i < len; i++) { // ... for (let j = 0; j < min; j++) { // ... break loop; }}Copy the code
We can use a break loop in the loop nesting to specify which loop to break. So the label statement refers to loop:… This one. A label statement node has two attributes, a label attribute indicating the name of the label and a body attribute pointing to the corresponding statement, usually a loop or switch statement.
interface LabeledStatement <: Statement {
type: "LabeledStatement";
label: Identifier;
body: Statement;
}
Copy the code
BreakStatement
The break statement node has a label attribute indicating the desired label name, or null when no label is needed (which is usually not required).
interface BreakStatement <: Statement { type: "BreakStatement"; label: Identifier | null; }Copy the code
ContinueStatement
A continue statement node, similar to a break.
interface ContinueStatement <: Statement {
type: "ContinueStatement";
label: Identifier | null;
}
Copy the code
Here are the conditional statements:
IfStatement
If statement nodes, typically, have three attributes, the test attribute representing if (…). Expressions in parentheses. Possession property is an execution statement that represents a condition true, which is usually a block statement. The alternate property is used to represent an else statement node, usually a block statement, but also an if statement node, such as if (a) {//… } else if (b) { // … }. Alternate can of course be null.
interface IfStatement <: Statement {
type: "IfStatement";
test: Expression;
consequent: Statement;
alternate: Statement | null;
}
Copy the code
SwitchStatement
A Switch statement node has two attributes. The discriminant attribute indicates the discriminant expression immediately following a switch statement, which is usually a variable. The Cases attribute is an array of case nodes, which represents each case statement.
interface SwitchStatement <: Statement {
type: "SwitchStatement";
discriminant: Expression;
cases: [ SwitchCase ];
}
Copy the code
SwitchCase
Case node of the switch. The test attribute represents the judgment expression for the case, and aggressively is the execution statement for the case. When the test property is null, it represents the default case node.
interface SwitchCase <: Node {
type: "SwitchCase";
test: Expression | null;
consequent: [ Statement ];
}
Copy the code
Here are the exception related statements:
ThrowStatement
The argument property is used to indicate the expression immediately following the throw.
interface ThrowStatement <: Statement {
type: "ThrowStatement";
argument: Expression;
}
Copy the code
TryStatement
A try statement node whose block property represents the execution statement of a try, usually a block statement. The Hanlder attribute refers to the catch node. Finalizer refers to the finally statement node. If the Hanlder is null, Finalizer must be a block statement node.
interface TryStatement <: Statement {
type: "TryStatement";
block: BlockStatement;
handler: CatchClause | null;
finalizer: BlockStatement | null;
}
Copy the code
CatchClause
The catch node, param, represents the argument after the catch, and body represents the execution statement after the catch, usually a block statement.
interface CatchClause <: Node {
type: "CatchClause";
param: Pattern;
body: BlockStatement;
}
Copy the code
Here are the loop statements:
WhileStatement
The while statement node, where test represents the expression in parentheses, and body represents the statement to loop through.
interface WhileStatement <: Statement {
type: "WhileStatement";
test: Expression;
body: Statement;
}
Copy the code
Do /while statement node, similar to the while statement.
interface DoWhileStatement <: Statement {
type: "DoWhileStatement";
body: Statement;
test: Expression;
}
Copy the code
ForStatement
The for loop node, init/test/update, represents the three expressions in the parentheses of the for statement, the initialization value, the loop judgment condition, and the variable update statement (init can be a variable declaration or expression) executed each time the loop executes. All three attributes can be null, for(;;) {}. The body attribute is used to indicate the statement to loop through.
interface ForStatement <: Statement {
type: "ForStatement";
init: VariableDeclaration | Expression | null;
test: Expression | null;
update: Expression | null;
body: Statement;
}
Copy the code
ForInStatement
For /in statement nodes, with the left and right attributes representing statements around the IN keyword (the left side can be a variable declaration or expression). The body is still the statement to loop through.
interface ForInStatement <: Statement {
type: "ForInStatement";
left: VariableDeclaration | Pattern;
right: Expression;
body: Statement;
}
Copy the code
Declarations
Declaration statement nodes, which are also statements, are just refinements of a type. The various declaration statement types are described below.
interface Declaration <: Statement { }
Copy the code
FunctionDeclaration
Function declarations, unlike Function declarations above, cannot have id null.
interface FunctionDeclaration <: Function, Declaration {
type: "FunctionDeclaration";
id: Identifier;
}
Copy the code
VariableDeclaration
Variable declarations. The kind attribute indicates what type of declaration it is, since ES6 introduced const/let. Declarations represent multiple descriptions of declarations, since we can do this: let a = 1, b = 2; .
interface VariableDeclaration <: Declaration {
type: "VariableDeclaration";
declarations: [ VariableDeclarator ];
kind: "var";
}
Copy the code
VariableDeclarator
Description of a variable declaration, where id represents the variable name node and init represents an expression for the initial value, which can be null.
interface VariableDeclarator <: Node {
type: "VariableDeclarator";
id: Pattern;
init: Expression | null;
}
Copy the code
Expressions
Expression node.
interface Expression <: Node { }
Copy the code
ThisExpression
According to this.
interface ThisExpression <: Expression {
type: "ThisExpression";
}
Copy the code
ArrayExpression
The elements property is an array representing multiple elements of the array, each of which is an expression node.
interface ArrayExpression <: Expression {
type: "ArrayExpression";
elements: [ Expression | null ];
}
Copy the code
ObjectExpression
Object expression node. The property property is an array representing each key-value pair of the object. Each element is an attribute node.
interface ObjectExpression <: Expression {
type: "ObjectExpression";
properties: [ Property ];
}
Copy the code
Property
Property node in an object expression. Key represents a key, value represents a value, and since ES5 syntax has get/set, there is a kind attribute that indicates a normal initialization, or get/set.
interface Property <: Node {
type: "Property";
key: Literal | Identifier;
value: Expression;
kind: "init" | "get" | "set";
}
Copy the code
FunctionExpression
Function expression node.
interface FunctionExpression <: Function, Expression {
type: "FunctionExpression";
}
Copy the code
Below is the expression section related to unary operators
UnaryExpression
Unary expression nodes (++/– is the update operator, not in this category), operator represents the operator, and prefix indicates whether or not it is a prefix operator. Argument is the expression to perform the operation.
interface UnaryExpression <: Expression {
type: "UnaryExpression";
operator: UnaryOperator;
prefix: boolean;
argument: Expression;
}
Copy the code
UnaryOperator
Unary operator, enumeration type, all values as follows:
enum UnaryOperator {
"-" | "+" | "!" | "~" | "typeof" | "void" | "delete"
}
Copy the code
UpdateExpression
The update expression node, ++/–, is similar to the unary operator, except that the type of node object that operator points to is the update operator.
interface UpdateExpression <: Expression {
type: "UpdateExpression";
operator: UpdateOperator;
argument: Expression;
prefix: boolean;
}
Copy the code
UpdateOperator
The update operator, with a value of ++ or –, is used with the prefix attribute of the UPDATE expression node to indicate before and after.
enum UpdateOperator {
"++" | "--"
}
Copy the code
Here is the part of the expression associated with binary operators:
BinaryExpression
Binary operation expression node, left and right represent two expressions left and right of the operator, and operator represents a binary operator.
interface BinaryExpression <: Expression {
type: "BinaryExpression";
operator: BinaryOperator;
left: Expression;
right: Expression;
}
Copy the code
BinaryOperator
Binary operator, all values are as follows:
enum BinaryOperator { "==" | "! = "|" = = = "|"! ==" | "<" | "<=" | ">" | ">=" | "<<" | ">>" | ">>>" | "+" | "-" | "*" | "/" | "%" | "|" | "^" | "&" | "in" | "instanceof" }Copy the code
AssignmentExpression
Assignment expression node, the operator property represents an assignment operator, left and right are expressions around the assignment operator.
interface AssignmentExpression <: Expression {
type: "AssignmentExpression";
operator: AssignmentOperator;
left: Pattern | Expression;
right: Expression;
}
Copy the code
AssignmentOperator
Assignment operator, all values as follows :(not many commonly used)
enum AssignmentOperator {
"=" | "+=" | "-=" | "*=" | "/=" | "%="
| "<<=" | ">>=" | ">>>="
| "|=" | "^=" | "&="
}
Copy the code
LogicalExpression
A logical operation expression node, and an assignment or binary operation type, except that operator is a logical operator type.
interface LogicalExpression <: Expression {
type: "LogicalExpression";
operator: LogicalOperator;
left: Expression;
right: Expression;
}
Copy the code
LogicalOperator
Logical operator, two values, namely and or.
enum LogicalOperator {
"||" | "&&"
}
Copy the code
MemberExpression
A member expression node is a statement that refers to an object member, object is an expression node that refers to an object, property is an attribute name, computed, if false, means. The property should be an Identifier node, or [] if the computed property is true, that is, the property is an Expression node whose name is the resulting value of the Expression.
interface MemberExpression <: Expression, Pattern {
type: "MemberExpression";
object: Expression;
property: Expression;
computed: boolean;
}
Copy the code
Here are some other expressions:
ConditionalExpression
Conditional expressions, often called ternary operands, Boolean? True, false. Attribute reference condition statement.
interface ConditionalExpression <: Expression {
type: "ConditionalExpression";
test: Expression;
alternate: Expression;
consequent: Expression;
}
Copy the code
CallExpression
Function call expressions that represent statements of type func(1, 2). Arguments is an array, and the element is an expression node, representing the function argument list.
interface CallExpression <: Expression {
type: "CallExpression";
callee: Expression;
arguments: [ Expression ];
}
Copy the code
NewExpression
New expressions
interface NewExpression <: CallExpression {
type: "NewExpression";
}
Copy the code
SequenceExpression
This is the expression (the exact name is unknown) constructed by the comma operator, and the expressions attribute is an array representing the multiple expressions that make up the entire expression, separated by commas.
interface SequenceExpression <: Expression {
type: "SequenceExpression";
expressions: [ Expression ];
}
Copy the code
Xiaobian himself is a Python development engineer. I spent three days to organize a set of Python learning tutorials from the most basic Python scripts to Web development, crawlers, data analysis, data visualization, machine learning, etc. These materials have the desired friendsClick on theCan receive