Introduction

“Code analysis and transformation” is a relatively niche branch of the front-end skill tree. The Alimama front-end technology team (MUX), where I work, ran into the problem of converting code in batches while migrating a large number of business projects to a new architecture, so we researched the underlying principles and the available tools. Recently, several community articles discussing this topic have attracted attention, so we would like to share more of our experience here. In fact, AST analysis is part of every developer’s daily work: everything from a small ESLint syntax check to a large framework upgrade involves it. Simple, isolated conversions can be spotted by eye and changed by hand, and simple batch conversions can be handled with regular-expression matching and string replacement, but for more complex conversions an AST-based approach is the most efficient solution.

AST-based code analysis and transformation

Overview of AST


An Abstract Syntax Tree (AST) is a tree representation of the syntactic structure of source code written in a programming language. Each node in the tree represents a construct in the source code. JavaScript engines start by parsing code into an AST, and tools such as Babel, ESLint, and Prettier are all built on ASTs.
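To make "each node represents a construct" concrete, here is a small sketch that prints the node types of an AST as an indented outline. The tree is written by hand in the ESTree shape that JavaScript parsers commonly produce for `var a = 1;`. The `outline` helper is a hypothetical name used only for this illustration.

```javascript
// Hand-written ESTree-style AST for the source text `var a = 1;`.
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: 1 },
        },
      ],
    },
  ],
};

// Hypothetical helper: returns one line per node, indented by depth.
function outline(node, depth = 0) {
  const lines = [`${"  ".repeat(depth)}${node.type}`];
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // Only objects carrying a string `type` are treated as AST nodes.
      if (child && typeof child.type === "string") {
        lines.push(...outline(child, depth + 1));
      }
    }
  }
  return lines;
}

console.log(outline(ast).join("\n"));
// Program
//   VariableDeclaration
//     VariableDeclarator
//       Identifier
//       Literal
```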



Parsing source code into an AST consists of two steps, lexical analysis and syntax analysis:

1. Lexical analysis: converting the character sequence into a sequence of tokens.

2. Syntax analysis: combining the token sequence into grammatical phrases such as Program, Statement, and Expression.
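The first of these steps can be sketched in a few lines. The tokenizer below is a minimal illustration, not the lexer of any real parser: it recognizes only identifiers, three declaration keywords, integers, and a handful of punctuators, and the token type names are assumptions chosen for readability.

```javascript
const KEYWORDS = new Set(["var", "let", "const"]);

// Converts a character sequence into a token sequence.
function tokenize(source) {
  const tokens = [];
  // Groups: 1 = word, 2 = integer, 3 = punctuator, 4 = whitespace.
  const pattern = /([A-Za-z_$][\w$]*)|(\d+)|([=;+\-*\/(){}])|(\s+)/y;
  let position = 0;
  while (position < source.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(source);
    if (match === null) {
      throw new SyntaxError(`Unexpected character at index ${position}`);
    }
    const [, word, number, punctuator] = match;
    if (word !== undefined) {
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
    } else if (number !== undefined) {
      tokens.push({ type: "Numeric", value: number });
    } else if (punctuator !== undefined) {
      tokens.push({ type: "Punctuator", value: punctuator });
    }
    // Whitespace (group 4) produces no token.
    position = pattern.lastIndex;
  }
  return tokens;
}

const tokens = tokenize("var a = 1;");
console.log(tokens.map((token) => `${token.type}:${token.value}`).join(" "));
// Keyword:var Identifier:a Punctuator:= Numeric:1 Punctuator:;
```

Syntax analysis would then consume this token list and group it into nodes such as a VariableDeclaration.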



Advantages of AST-based analysis and transformation

An AST ignores code style and reduces the code to its pure syntactic structure, so analysis and transformation based on the AST are more accurate and rigorous. Regular expressions, by contrast, cannot effectively analyze the context of a piece of code; even a simple rule has to account for many boundary conditions and stay compatible with every code style. As a simple example, converting all variables declared with var to let is easy with an AST, whereas a regex-based approach requires many special cases, and the resulting expressions are hard to debug and read.
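The var-to-let case can be used to illustrate the problem. The snippet below is a small sketch with made-up input strings: the first regex is too broad and also rewrites a string literal and a comment, while the second, anchored to the start of a line, is too narrow and misses a declaration inside a for statement.

```javascript
const source = 'var label = "var is legacy"; // replace var later';

// Too broad: every standalone word "var" is replaced, whatever its context.
const naive = source.replace(/\bvar\b/g, "let");
console.log(naive);
// let label = "let is legacy"; // replace let later

const loop = "for (var i = 0; i < 3; i++) {}";

// Too narrow: only a `var` at the start of a line is replaced.
const anchored = loop.replace(/^(\s*)var\b/gm, "$1let");
console.log(anchored === loop); // true, the declaration was missed
```

An AST-based rule avoids both failures because it targets declaration nodes rather than text.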

AST-based transformation process

  1. Parse: parse the source code into an AST through lexical analysis and syntax analysis
  2. Transform: analyze and transform the AST
  3. Generate: output the final AST as code
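The three steps can be sketched end to end as follows. This is an illustration under stated assumptions rather than a real toolchain: the parse step is stubbed with a hand-written ESTree-style tree for `var count = 0;`, and `transform` and `generate` are hypothetical helpers that handle only the node types used here.

```javascript
// Step 1 (parse), stubbed: the tree a parser would produce for `var count = 0;`.
const parsed = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "count" },
          init: { type: "Literal", value: 0 },
        },
      ],
    },
  ],
};

// Step 2 (transform): visit every node and turn `var` declarations into `let`.
function transform(node) {
  if (Array.isArray(node)) {
    node.forEach(transform);
  } else if (node !== null && typeof node === "object") {
    if (node.type === "VariableDeclaration" && node.kind === "var") {
      node.kind = "let";
    }
    Object.values(node).forEach(transform);
  }
  return node;
}

// Step 3 (generate): print the AST back as source code.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return node.init
        ? `${generate(node.id)} = ${generate(node.init)}`
        : generate(node.id);
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const output = generate(transform(parsed));
console.log(output); // let count = 0;
```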



Steps 1 and 3 are covered by mature community tools. Step 2 requires developers to manipulate the AST themselves; the community has popular tools for this as well, but they have problems.

Current state and problems of popular community solutions

Popular solutions in the community include Babel, jscodeshift, Esprima, Recast, Acorn, and estraverse. This article analyzes the two most representative ones, Babel and jscodeshift.

Analysis of the Babel plugin approach

Without Babel, the JS community would not be thriving on new language specifications the way it is today, and @babel/parser is an excellent parser. Many developers rely on Babel plugins for code analysis and transformation, but I personally see several problems with the way Babel plugins are currently written: 1. they are hard to get started with and costly to learn; 2. the logic for matching and constructing nodes is complex and takes a lot of code; 3. the code is hard to read, which hinders maintenance. Specifically:

1. Difficulty in getting started and high learning cost

Before you start developing a Babel plugin, you need to understand the AST specification and the types and properties of AST nodes; see babel-types and the Babel node type list, which covers 200+ node types. Configuring .babelrc and knowing how a Babel plugin is written are the basics. Beyond that, you need to master concepts such as visitor, scope, state, exit, enter, babel-types, babel-traverse, and builder.

2. The logic of matching and constructing nodes is complicated and the amount of code is large

Matching a node requires comparing node types and properties layer by layer, and it becomes even more complicated when context information has to be checked. Constructing nodes is also subject to strict type and structure requirements. As a result, a lot of time goes into operating on the AST rather than into the core logic of the analysis and transformation.

  • Matching self.doEdit('price')(this, '100') with Babel looks like this:
MemberExpression(path) {
  if (path.node.object.name == 'self' && path.node.property.name == 'doEdit') {
    // First call: self.doEdit('price')
    const firstCallExpression = path.findParent(path => path.isCallExpression());
    if (!firstCallExpression) {
      return;
    }
    if (!firstCallExpression.node.arguments[0]) {
      return;
    }
    let secondCallExpression = null
    if (firstCallExpression.node.arguments[0].type == 'StringLiteral'
        && firstCallExpression.node.arguments[0].value == 'price') {
      // Second call: (...)(this, '100')
      secondCallExpression = firstCallExpression.findParent(
        path => path.isCallExpression()
      )
    }
    if (!secondCallExpression) {
      return;
    }
    if (secondCallExpression.node.arguments.length != 2
        || secondCallExpression.node.arguments[0].type != 'ThisExpression') {
      return;
    }
    // arguments[0] is the ThisExpression and has no value;
    // the literal '100' is the second argument.
    const pId = secondCallExpression.node.arguments[1].value;
  }
}
  • Constructing var varName = require("moduleName") with Babel looks like this:
const types = require('@babel/types');

types.variableDeclaration('var', [
  types.variableDeclarator(
    // t.variableDeclarator(id, init)
    // id is the identifier
    // init must be an Expression
    types.identifier('varName'),
    // t.callExpression(callee, arguments)
    types.callExpression(
      types.identifier('require'),
      [types.stringLiteral('moduleName')]
    )
  )
]);

3. Poor code readability hinders maintenance

The two examples above show that the code is not only long but also hard to read: even a developer who is very familiar with ASTs and Babel has to work through it statement by statement.

Analysis of jscodeshift

Compared with Babel, jscodeshift makes matching nodes simpler and supports chained calls. Matching self.doEdit('price')(this, '100') looks like this:

const callExpressions = root.find(j.CallExpression, {
  callee: {
    callee: {
      object: {
        name: 'self'
      },
      property: {
        name: 'doEdit'
      }
    },
    arguments: [{
      value: 'price'
    }]
  },
  arguments: [{
    type: 'ThisExpression'
  }, {
    value: '100'
  }]
})

Nodes are transformed and constructed in much the same way as in Babel, so we will not repeat that here. As you can see, jscodeshift does not really solve the three problems above either. So, building on the community’s valuable experience, we developed a new tool called GoGoCode. Its goal is to let developers complete code analysis and transformation with maximum efficiency at minimum cost.

Another solution: GoGoCode

Overview

GoGoCode is a tool for manipulating ASTs. It lowers the barrier to using ASTs and helps developers focus on the logic of their code analysis and transformation. Simple replacements do not even require learning about ASTs, and more complex analysis and transformation can be done once you have learned the AST node structure (see the AST viewer).

Design ideas

GoGoCode borrows its ideas from jQuery, and our mission is to make code transformation as easy as using jQuery. jQuery makes DOM manipulation far more convenient than native JS, requires no complicated configuration, works out of the box, and embodies many excellent design ideas worth learning from: $() instantiation, selectors, chained calls, and so on. In addition, we applied the simple idea of replace to ASTs, and it works very well.

The $() instantiation method

With $(), both source code and AST nodes can be instantiated as AST objects, which can then chain-call any function mounted on the instance.

$(code: string)

$('var a = 1')

$(node: ASTNode)

$({ type: 'Identifier', name: 'a' }).generate()

Code selector

Both the DOM tree and the AST are tree structures. jQuery can match nodes with a variety of selectors, so can AST nodes also be matched with simple selectors? To that end we defined code selectors: whatever code you are looking for, you can match it directly with a code selector.

$(code).find('import a from "./a"')

$(code).find('function a(b, c) {}')

$(code).find('if (a && sth) { }')

If the code you want to match contains an uncertain part, replace that part with the wildcard, which is written $_$.

$(code).find('import $_$ from "./a"')

$(code).find('function $_$(b, c) {}')

$(code).find('if ($_$ && sth) { }')
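One way to picture how a wildcard selector could work is structural comparison between a pattern tree and a candidate node. The sketch below is hypothetical and is not GoGoCode's actual implementation: `matches`, `matchPattern`, and `defaultImport` are names invented for this illustration, and the trees are hand-written in ESTree style instead of being parsed from the selector string.

```javascript
const WILDCARD = "$_$";

// True when `node` has the same shape as `pattern`.
// Properties that the pattern does not mention are ignored.
function matches(pattern, node, captures) {
  if (Array.isArray(pattern)) {
    return (
      Array.isArray(node) &&
      pattern.length === node.length &&
      pattern.every((item, index) => matches(item, node[index], captures))
    );
  }
  if (pattern !== null && typeof pattern === "object") {
    if (pattern.type === "Identifier" && pattern.name === WILDCARD) {
      captures.push(node); // the wildcard accepts any node and records it
      return true;
    }
    if (node === null || typeof node !== "object") return false;
    return Object.keys(pattern).every((key) =>
      matches(pattern[key], node[key], captures)
    );
  }
  return pattern === node; // primitives must be equal
}

// Returns the captured nodes on success, or null when the node does not match.
function matchPattern(pattern, node) {
  const captures = [];
  return matches(pattern, node, captures) ? captures : null;
}

// Builds the ESTree-style node for: import <localName> from "<path>"
function defaultImport(localName, path) {
  return {
    type: "ImportDeclaration",
    specifiers: [
      { type: "ImportDefaultSpecifier", local: { type: "Identifier", name: localName } },
    ],
    source: { type: "Literal", value: path },
  };
}

const pattern = defaultImport(WILDCARD, "./a"); // import $_$ from "./a"

const hit = matchPattern(pattern, defaultImport("utils", "./a"));
console.log(hit[0].name); // utils

const miss = matchPattern(pattern, defaultImport("utils", "./b"));
console.log(miss); // null
```

A real selector engine would first parse the selector string into such a pattern tree and then test it against every node in the document.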

Chained calls

Most of the APIs provided by GoGoCode can be chained, which makes the code more concise and elegant and makes it convenient to apply several conversion rules to an entire piece of code.

$(sourceCode)
  .replace('const $_$1 = require($_$2)', 'import $_$1 from $_$2')
  .find('console.log()')
  .remove()
  .root()
  .generate()

Method overloading: .attr()

Node attributes can be read and modified, which is much friendlier than manually traversing the tree and checking layer after layer to reach an attribute or node.

$(code).attr('id.name')  // Returns the value of the name attribute in the id attribute of this node

$(code).attr('declarations.0.id.name', 'c') // Change the value of the name attribute
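The idea of one method that reads or writes depending on its arguments can be sketched on a plain object. The `attr` function below is a hypothetical stand-alone helper written for this illustration, not GoGoCode's implementation; it walks a dotted path through a hand-written ESTree-style node.

```javascript
// Hypothetical helper: attr(node, path) reads, attr(node, path, value) writes.
function attr(node, path, ...rest) {
  const keys = path.split(".");
  const lastKey = keys.pop();
  let target = node;
  for (const key of keys) {
    if (target === null || typeof target !== "object") return undefined;
    target = target[key]; // numeric segments such as "0" index into arrays
  }
  if (target === null || typeof target !== "object") return undefined;
  if (rest.length === 0) return target[lastKey]; // getter form
  target[lastKey] = rest[0]; // setter form
  return node; // returning the node keeps the call chainable
}

// Hand-written ESTree-style node for `var a = 1`.
const declaration = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "a" },
      init: { type: "Literal", value: 1 },
    },
  ],
};

console.log(attr(declaration, "declarations.0.id.name")); // a
attr(declaration, "declarations.0.id.name", "c");
console.log(attr(declaration, "declarations.0.id.name")); // c
```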

Simple replacement

Simpler, more powerful, and easier to use than regex-based replacement. $_$n is similar to a capture group in a regex, and $$$ is similar to a rest parameter.

$(code).replace('{ text: $_$1, value: $_$2, $$$ }', '{ name: $_$1, id: $_$2, $$$ }')

$(code).replace(`import { $$$ } from "@alifd/next"`, `import { $$$ } from "antd"`)

$(code).replace(`<View $$$1>$$$2</View>`, `<div $$$1>$$$2</div>`)

$(code).replace(`Page({ $$$1 })`, `Page({ init() { this.data = {} }, $$$1 })`)
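For readers less familiar with the two analogies, here they are in plain JavaScript, with no GoGoCode involved. The input values are made up for the illustration: numbered capture groups carry matched text into the replacement, and rest syntax collects whatever was not named explicitly.

```javascript
// Capture groups: $1 and $2 in the replacement refer to the parenthesized groups.
const requireLine = 'const fs = require("fs")';
const importLine = requireLine.replace(
  /^const (\w+) = require\((.+)\)$/,
  "import $1 from $2"
);
console.log(importLine); // import fs from "fs"

// Rest syntax: `...rest` collects every property that is not named explicitly.
const option = { text: "Apple", value: 1, disabled: true };
const { text, value, ...rest } = option;
const renamed = { name: text, id: value, ...rest };
console.log(renamed); // { name: 'Apple', id: 1, disabled: true }
```

The difference is that the regex version works on raw text, so it depends on exact spacing and quoting in the source.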

Core API

Basic API      Node retrieval API   Node operation API
$()            .find()              .attr()
$.loadFile     .parent()            .replace()
.generate()    .parents()           .replaceBy()
               .siblings()          .after()
               .next()              .before()
               .nextAll()           .append()
               .prev()              .prepend()
               .prevAll()           .empty()
               .root()              .remove()
               .eq()                .clone()
               .each()

Comparison with popular community solutions

To match self.doEdit('price')(this, '100') from the earlier example, the GoGoCode version is as follows:

$(code).find(`self.doEdit('price')(this, '100')`)

To construct var varName = require("moduleName"), the GoGoCode version is as follows:

$('var varName = require("moduleName")')

As a complete example, compare GoGoCode with a Babel plugin. Given code that uses console.log in several ways, we want to handle each use differently:

  1. Delete calls to console.log
  2. Replace console.log() with void 0 when it is used as the initial value of a variable
  3. Replace console.log with an empty function when it is used as the initial value of a variable

[Figure: the code before and after the transformation]



The code implemented using GoGoCode is as follows:

$(code)
  .replace(`var $_$ = console.log()`, `var $_$ = void 0`)
  .replace(`var $_$ = console.log`, `var $_$ = function(){}`)
  .find(`console.log()`)
  .remove()
  .generate();

The core code implemented using Babel is as follows:

// Code source: https://zhuanlan.zhihu.com/p/32189701
module.exports = function({ types: t }) {
  return {
    name: "transform-remove-console",
    visitor: {
      CallExpression(path, state) {
        const callee = path.get("callee");

        if (!callee.isMemberExpression()) return;

        if (isIncludedConsole(callee, state.opts.exclude)) {
          // console.log()
          if (path.parentPath.isExpressionStatement()) {
            path.remove();
          } else {
            // var a = console.log()
            path.replaceWith(createVoid0());
          }
        } else if (isIncludedConsoleBind(callee, state.opts.exclude)) {
          // console.log.bind()
          path.replaceWith(createNoop());
        }
      },
      MemberExpression: {
        exit(path, state) {
          if (
            isIncludedConsole(path, state.opts.exclude) &&
            !path.parentPath.isMemberExpression()
          ) {
            // console.log = func
            if (
              path.parentPath.isAssignmentExpression() &&
              path.parentKey === "left"
            ) {
              path.parentPath.get("right").replaceWith(createNoop());
            } else {
              // var a = console.log
              path.replaceWith(createNoop());
            }
          }
        }
      }
    }
  };
};

Helper methods such as isIncludedConsole, isIncludedConsoleBind, and createNoop also have to be written and imported separately.

As you can see, GoGoCode has the following advantages over community tools:

  1. Easy to get started: there is no need to know the full AST node specification, the stages of traversing and visiting an AST, or any additional tools; reading the short GoGoCode documentation is enough. Beyond the AST structure itself, GoGoCode is the only AST-processing tool a developer needs to learn.
  2. Very little code: you can focus on the core logic of analysis and transformation instead of spending a lot of time on AST operations. Matching, modifying, and constructing nodes are all simple and take only a few lines of code.
  3. Readable: by comparison, GoGoCode-based code is intuitive, easy to understand, and easier to maintain over time.
  4. Flexibility: whatever jscodeshift and Babel can do, GoGoCode can do as well, and more quickly. Besides JS, GoGoCode also supports HTML and Vue, which other popular community tools do not.

Results in practice

Based on an early version of GoGoCode, we developed an upgrade kit for Magix, Alimama’s in-house framework. It includes 78 simple conversion rules and 30 complex ones and automatically converts Magix1 code (left) into Magix3 code (right), which makes the framework upgrade more efficient.



For one piece of conversion logic that takes about 20 lines here, our attempt to write it with Babel needed nearly 200 lines of code.



As the saying goes, sharpening the axe does not delay the chopping of firewood. Here, writing automatic conversion rules is the sharpening, and performing the conversion is the chopping. If sharpening takes nearly as long as chopping the wood directly, people will give up on sharpening. Code transformation usually solves a specific problem within a team or system, and in most cases it is even a one-off (unlike ES6-to-ES5 conversion, the cost of developing a plugin cannot be amortized by applying a common set of rules at large scale), so the sharpening itself has to be efficient.



Recently we tried converting Alipay mini program code into PC framework code. Team members who knew little about ASTs learned to use GoGoCode within an hour, and about 80% of the JS logic conversion was completed in fewer than 200 lines of code. The reductions in difficulty and code volume and the gain in efficiency are significant.

Conclusion

GoGoCode has advantages in code size, readability, and flexibility, and we will keep refining it to make the tool more robust and easier to use. We hope everyone can understand and manipulate abstract syntax trees through GoGoCode, use it to implement code analysis and transformation logic, gain better control over their code, and achieve things such as one codebase for multiple platforms and smoother framework upgrades. We also hope that more developers can enter this field at the lowest possible cost and contribute, giving the industry ecosystem better solutions. Besides the syntax checks, multi-platform code, and framework upgrades mentioned earlier, there are many other scenarios that call for analyzing and transforming code:

  • Analyze the association of pages or views with asynchronous requests
  • Analyzing module complexity
  • Analyzing module dependencies
  • Clean up useless code
  • Automatically generating tracking (event-logging) code
  • Generating stub files for unit tests
  • Automatically fixing code problems
  • ...

If you need to analyze and transform code, or want to quickly implement a requirement that no existing Babel plugin covers, you are welcome to use GoGoCode and build it together with us. If you run into problems that are awkward to solve with GoGoCode, or into bugs, please let us know.

QQ group: 735216094 DingTalk group: 34266233

GitHub: github.com/thx/gogocod… Playground: play.gogocode.io/

Related articles: “A new tool from Alimama that eases the pain of batch-modifying project code”; “GoGoCode in practice: 30 tips for AST-based code replacement”