Writing a WXML formatter

Anyone who has developed mini programs (not only WeChat mini programs) knows that each vendor's platform uses its own XXML template syntax, such as WXML for WeChat mini programs and TTML for Toutiao mini programs. These template syntaxes are essentially the same ("inherited" from Vue), so they can all be parsed with one set of XML parsing rules and then formatted from the parse result.

The formatter described here is built on a modified version of the open source library htmlparser2.

This article uses XXML to refer collectively to wxml, ttml, swan, axml, and similar XML-based mini program DSLs, whose names differ by a few characters but mean the same thing.

htmlparser2

The library is a lightweight HTML parser with a simple job: it iterates through the input string and calls the onopentag hook we passed in when it encounters an opening tag, the onclosetag hook when it encounters a closing tag, the ontext hook when it encounters plain text, and so on.

Let’s get started:

Implement the basic Formatter

Determine the data structure

The XML node data structure needs little explanation. It has three parts: tag name, attributes, and child nodes.

type Attrs = {
  [attrName: string]: any;
};

type Comment = {
  type: string;
  value: string;
};

type Node = {
  name: string;
  attrs: Attrs;
  children: Array<Node | string | Comment>;
  indent: number;
};

Because we are formatting, we add an indent property to Node to hold its indentation.
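
As a rough illustration of how the three kinds of children can be told apart, here is a minimal sketch. The names XmlAttrs, XmlComment, XmlNode, and describeChild are hypothetical; the types mirror Attrs, Comment, and Node above and are renamed only so that the snippet does not clash with the DOM's global Node and Comment types.

```typescript
// Hypothetical names: these mirror the article's Attrs, Comment, and Node types.
type XmlAttrs = { [attrName: string]: any };
type XmlComment = { type: string; value: string };
type XmlNode = {
  name: string;
  attrs: XmlAttrs;
  children: Array<XmlNode | string | XmlComment>;
  indent: number;
};

// Narrow a child to one of its three possible shapes.
function describeChild(child: XmlNode | string | XmlComment): string {
  if (typeof child === "string") return `text(${child})`;
  if ("type" in child) return `comment(${child.value})`;
  return `element(${child.name}, indent=${child.indent})`;
}

const sampleView: XmlNode = {
  name: "view",
  attrs: { class: "container" },
  indent: 0,
  children: [
    { type: "comment", value: " greeting " },
    { name: "text", attrs: {}, indent: 2, children: ["Hello World"] },
  ],
};

const described = sampleView.children.map(describeChild);
```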

Initialize the parser

import { Parser } from 'htmlparser2';

this.parser = new Parser(
  {
    onopentag: this.onopentag,
    onclosetag: this.onclosetag,
    ontext: this.ontext,
    oncomment: this.oncomment,
  },
  { xmlMode: true }
);

xmlMode: true means that HTML-specific special tags receive no special treatment.

Because the Parser itself is simple and stateless and does not record relationships between tags, we need a stack to record those relationships.

Initialize the stack

type Stack = Node[] & {
  getLast: () => Node | void;
};

private initStack() {
  this.stack = new Array<Node>() as Stack;
  this.stack.getLast = () => {
    return this.stack[this.stack.length - 1];
  };
}

For convenience, we add a getLast method to the stack array that returns its last element.

First, let’s consider how to get a JSON object that describes the XXML code string passed in.

For example, take the following XXML code:

<view class="container">
  <text style="color:skyblue;">Hello World</text>
</view>

It maps to the following JSON:

[
  {
    "name": "view",
    "attrs": { "class": "container" },
    "children": [
      {
        "name": "text",
        "attrs": { "style": "color:skyblue;" },
        "children": ["Hello World"]
      }
    ]
  }
]

In the main code we only implement the JSON generation process, and we put the string-level formatting logic in methods whose names start with handle, so the code is decoupled and the logic stays clear.

onopentag hook

  private onopentag = (name: string, attrs: Attrs) => {
    const stack = this.stack;
    const { tabSize } = this.opt;

    const indent = stack.length * tabSize;
    const newNode: Node = { name, attrs, children: [], indent };

    this.handleOpentag(newNode);

    stack.push(newNode);
  };

When onopentag fires, the Parser has encountered a new tag. We create a node for the tag and push it onto the stack.

onclosetag hook

  private onclosetag = (name: string) => {
    const { result } = this;
    const stack = this.stack;
    const node = stack.pop();

    if (!node) {
      throw `Parse error: no open tag for close tag ${name}`;
    } else {
      if (node.name !== name) {
        throw `Parse error: close tag does not match open tag: ${name}`;
      } else {
        const lastNode = stack.getLast();

        if (lastNode) {
          // The parent node of the node is found
          lastNode.children.push(node);
        } else {
          // Node is the top-level node
          result.push(node);
        }

        this.handleClosetag(node);
      }
    }
  };

When onclosetag fires, the Parser has encountered the end of a tag. The current node is first popped from the stack. Then we look up lastNode, the element now at the top of the stack. If lastNode exists, it is the parent of the current node, so the current node is appended to lastNode's children. If lastNode does not exist, the current node is a top-level element and is appended directly to result.
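
To make the stack bookkeeping concrete, here is a minimal, self-contained sketch that replays a list of open, text, and close events and builds the nested tree. It does not use htmlparser2; the ParseEvent and TreeNode types and the buildTree function are hypothetical stand-ins for the hooks above, and attributes and indentation are left out for brevity.

```typescript
// Hypothetical simplified node: attributes and indent are omitted.
type TreeNode = { name: string; children: Array<TreeNode | string> };

// Hypothetical event shapes standing in for the parser callbacks.
type ParseEvent =
  | { kind: "open"; name: string }
  | { kind: "close"; name: string }
  | { kind: "text"; value: string };

function buildTree(events: ParseEvent[]): TreeNode[] {
  const result: TreeNode[] = [];
  const stack: TreeNode[] = [];
  for (const ev of events) {
    if (ev.kind === "open") {
      // A new tag: create its node and push it onto the stack.
      stack.push({ name: ev.name, children: [] });
    } else if (ev.kind === "text") {
      // Text belongs to whichever node is on top of the stack.
      const parentNode = stack[stack.length - 1];
      if (parentNode) parentNode.children.push(ev.value);
    } else {
      // A tag ended: pop it, then attach it to its parent or to the result.
      const node = stack.pop();
      if (!node || node.name !== ev.name) {
        throw new Error(`Parse error: unexpected close tag ${ev.name}`);
      }
      const parentNode = stack[stack.length - 1];
      if (parentNode) parentNode.children.push(node);
      else result.push(node);
    }
  }
  return result;
}

const sampleTree = buildTree([
  { kind: "open", name: "view" },
  { kind: "open", name: "text" },
  { kind: "text", value: "Hello World" },
  { kind: "close", name: "text" },
  { kind: "close", name: "view" },
]);
```

Serializing sampleTree with JSON.stringify gives the same nesting as the JSON shown earlier, minus the attrs fields.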

ontext hook

  const IGNORE_TAGS = ['text', 'inline-text'];

  private ontext = (text: string) => {
    const { stack } = this;
    const lastNode = stack.getLast();

    if (lastNode) {
      // Parent node of the text
      if (IGNORE_TAGS.includes(lastNode.name)) {
        lastNode.children.push(text); // Keep the text inside an ignored tag as-is, without trimming

        this.handleText(text);
      } else {
        const trimedText = text.trim();
        if (trimedText.length > 0) {
          lastNode.children.push(trimedText);
          this.handleText(trimedText);
        }
      }
    }
  };

The trickiest part of the formatter is that the rules for rendering text inside a text tag are uncertain, so the content of a text tag must look exactly the same after formatting as it does in the source.

When text is encountered, we fetch the last node on the stack. If the node exists, it is the parent of the text. If that node is one whose content must be left untouched, the text is appended to node.children as-is. Otherwise the text is trimmed to remove the newlines, spaces, and other whitespace from the source, and the trimmed text is appended to node.children if it is not empty.
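
The text rule can be isolated into a small pure function, which makes the two branches easy to check. This is only a sketch: normalizeText and PRESERVE_TEXT_TAGS are hypothetical names, and returning null to mean "drop this text" is an assumption made for the example.

```typescript
// Hypothetical name for the list the article calls IGNORE_TAGS.
const PRESERVE_TEXT_TAGS = ["text", "inline-text"];

// Returns the text to keep, or null when the text is only source whitespace.
function normalizeText(parentName: string, text: string): string | null {
  if (PRESERVE_TEXT_TAGS.indexOf(parentName) !== -1) {
    // Inside text-like tags every character may be rendered, so keep it all.
    return text;
  }
  const trimmed = text.trim();
  return trimmed.length > 0 ? trimmed : null;
}

const keptVerbatim = normalizeText("text", "  Hello World ");
const droppedWhitespace = normalizeText("view", "\n    ");
const trimmedText = normalizeText("view", "  caption \n");
```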

oncomment hook

  private oncomment = (comment: string) => {
    const { stack } = this;
    const lastNode = stack.getLast();

    if (lastNode) {
      // Parent node of the comment
      lastNode.children.push({
        type: 'comment',
        value: comment,
      });
    }

    this.handleComment(lastNode, comment);
  };

Comment handling is simple: if a parent element exists, the comment is appended to the parent's children.

handleOpentag method

  private handleOpentag(newNode: Node) {
    const {
      stack,
      opt: { tabSize, maxLength },
    } = this;

    let opentagStr = '';

    const attrsTextWithoutBreak = generateAttrsText(newNode, false, tabSize);
    const { name, attrs, indent } = newNode;
    const lastNode = stack.getLast();
    if (lastNode && IGNORE_TAGS.includes(lastNode.name)) {
      opentagStr += `<${name}${attrsTextWithoutBreak}>`;
    } else {
      opentagStr = generateBlankSpace(indent);

      const opentagLength = getOpentagLength(
        name,
        attrsTextWithoutBreak,
        indent
      );
      if (opentagLength > maxLength) {
        // If the maximum length is exceeded, each attribute goes on its own line
        if (Object.keys(attrs).length === 0) {
          opentagStr += `<${name}>`;
        } else {
          opentagStr += `<${name}`;
          opentagStr += generateAttrsText(newNode, true, tabSize);
          opentagStr += `${BR}${generateBlankSpace(indent)}>`;
        }
      } else {
        opentagStr += `<${name}${attrsTextWithoutBreak}>`;
      }
      // Non-ignored elements start on a new line
      if (this.resultStr.length > 0) {
        opentagStr = `${BR}${opentagStr}`;
      }
    }

    this.resultStr += opentagStr;
  }

An opening tag consists of <, the tag name, the attribute string, and >. Borrowing an idea from Prettier, if the tag written on a single line (including indentation) is longer than maxLength, each attribute is placed on its own line. First we get lastNode, the parent of newNode. If lastNode is an ignored tag, newNode needs no special treatment and is simply concatenated onto the current line. Otherwise we generate indent whitespace characters as indentation and check whether the line would be longer than maxLength. If it would, each attribute is written on a new line (the second argument of generateAttrsText says whether to break between attributes); if not, everything is written on one line. A non-ignored element starts on its own line, so a line break is added in front of it unless it is the very first output.
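
The helpers generateBlankSpace, generateAttrsText, and getOpentagLength are called above but their bodies are not shown in this article. The sketch below is one plausible way to write them, not the original implementation: the bodies, the TagLike type, the LINE_BREAK constant, the renderOpenTag wrapper, and the choice to indent wrapped attributes by indent + tabSize are all assumptions.

```typescript
const LINE_BREAK = "\n"; // assumed value of the article's BR constant

type TagLike = {
  name: string;
  attrs: { [attrName: string]: string };
  indent: number;
};

// Assumed behavior: a string of `count` spaces.
function generateBlankSpace(count: number): string {
  return new Array(count + 1).join(" ");
}

// Assumed behavior: every attribute is preceded by a space, or by a line
// break plus one extra indentation level when shouldBreak is true.
function generateAttrsText(node: TagLike, shouldBreak: boolean, tabSize: number): string {
  const separator = shouldBreak
    ? LINE_BREAK + generateBlankSpace(node.indent + tabSize)
    : " ";
  return Object.keys(node.attrs)
    .map((key) => `${separator}${key}="${node.attrs[key]}"`)
    .join("");
}

// Assumed behavior: indentation + "<" + tag name + attributes + ">".
function getOpentagLength(tagName: string, attrsText: string, indent: number): number {
  return indent + 1 + tagName.length + attrsText.length + 1;
}

// Pick the single-line or one-attribute-per-line form for an opening tag.
function renderOpenTag(node: TagLike, tabSize: number, maxLength: number): string {
  const pad = generateBlankSpace(node.indent);
  const inline = generateAttrsText(node, false, tabSize);
  const fits = getOpentagLength(node.name, inline, node.indent) <= maxLength;
  if (fits || Object.keys(node.attrs).length === 0) {
    return `${pad}<${node.name}${inline}>`;
  }
  const broken = generateAttrsText(node, true, tabSize);
  return `${pad}<${node.name}${broken}${LINE_BREAK}${pad}>`;
}

const sampleTag: TagLike = {
  name: "view",
  attrs: { class: "container", id: "main" },
  indent: 2,
};
const onOneLine = renderOpenTag(sampleTag, 2, 80);
const wrapped = renderOpenTag(sampleTag, 2, 20);
```

With a generous maxLength the tag stays on one line; with a small one, each attribute moves to its own line and the closing > lines up with the opening <.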

handleClosetag method

  private handleClosetag(node: Node) {
    const { name, indent } = node;
    let closetagStr = '';

    if (IGNORE_TAGS.includes(name)) {
      // Ignored element: close it in place
      closetagStr = `</${name}>`;
    } else {
      if (node.children.length === 0) {
        // No child elements, no line break
        closetagStr = `</${name}>`;
      } else {
        // There are child elements, so break the line
        closetagStr = `${BR}${generateBlankSpace(indent)}</${name}>`;
      }
    }

    this.resultStr += closetagStr;
  }

If node is an ignored tag, the closing tag is appended in place. Otherwise, we check whether the node has children; if it does, the closing tag goes on a new line.

handleText method

  private handleText(text: string) {
    // text does not break lines
    this.resultStr += text;
  }

Because text is only expected inside the ignored tags (it cannot be rendered anywhere else), it needs no processing and is appended directly to resultStr.

handleComment method

  private handleComment(parentNode: Node | void, comment: string) {
    if (parentNode && IGNORE_TAGS.includes(parentNode.name)) {
      this.resultStr += `<!--${comment}-->`;
    } else {
      // A comment occupies a line of its own
      const indent = this.stack.length * this.opt.tabSize;
      let shouldBreak = this.resultStr.length > 0;
      if (shouldBreak) {
        this.resultStr += `${BR}${generateBlankSpace(indent)}<!--${comment}-->`;
      } else {
        this.resultStr += `${generateBlankSpace(indent)}<!--${comment}-->`;
      }
    }
  }

If the parent node is an ignored tag, the comment is appended directly to resultStr. Otherwise the comment gets a line of its own and, as with an opening tag, no line break is needed if it is the very first output.
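
Putting the hooks together, the following self-contained sketch replays a list of events and produces the formatted string directly. It is a simplification made for illustration: FmtEvent and formatEvents are hypothetical, htmlparser2 is replaced by a hand-written event list, and comments and the maxLength wrapping are left out.

```typescript
type FmtEvent =
  | { kind: "open"; name: string; attrs: { [attrName: string]: string } }
  | { kind: "text"; value: string }
  | { kind: "close"; name: string };

function formatEvents(events: FmtEvent[], tabSize: number): string {
  const verbatimTags = ["text", "inline-text"];
  const openTags: { name: string; indent: number; childCount: number }[] = [];
  const pad = (count: number) => new Array(count + 1).join(" ");
  let out = "";

  // True when the innermost open tag keeps its content untouched.
  const inVerbatim = () => {
    const last = openTags[openTags.length - 1];
    return !!last && verbatimTags.indexOf(last.name) !== -1;
  };

  for (const ev of events) {
    const parentTag = openTags[openTags.length - 1];
    if (ev.kind === "open") {
      const attrsMap = ev.attrs;
      const attrs = Object.keys(attrsMap)
        .map((key) => ` ${key}="${attrsMap[key]}"`)
        .join("");
      const indent = openTags.length * tabSize;
      const tag = `<${ev.name}${attrs}>`;
      // Inside a verbatim tag stay on the same line; otherwise start a new one.
      if (inVerbatim()) out += tag;
      else out += (out.length > 0 ? "\n" : "") + pad(indent) + tag;
      if (parentTag) parentTag.childCount++;
      openTags.push({ name: ev.name, indent, childCount: 0 });
    } else if (ev.kind === "text") {
      const value = inVerbatim() ? ev.value : ev.value.trim();
      if (parentTag && value.length > 0) {
        out += value;
        parentTag.childCount++;
      }
    } else {
      const node = openTags.pop();
      if (!node || node.name !== ev.name) {
        throw new Error(`Parse error: unexpected close tag ${ev.name}`);
      }
      const closeInPlace =
        verbatimTags.indexOf(node.name) !== -1 || node.childCount === 0;
      out += closeInPlace
        ? `</${node.name}>`
        : `\n${pad(node.indent)}</${node.name}>`;
    }
  }
  return out;
}

const formattedSample = formatEvents(
  [
    { kind: "open", name: "view", attrs: { class: "container" } },
    { kind: "text", value: "\n      " },
    { kind: "open", name: "text", attrs: { style: "color:skyblue;" } },
    { kind: "text", value: "Hello World" },
    { kind: "close", name: "text" },
    { kind: "text", value: "\n" },
    { kind: "close", name: "view" },
  ],
  2
);
```

For this event list, which corresponds to the Hello World example with sloppy source indentation, formattedSample is the three-line layout shown earlier: the view tags at column zero and the text element indented by two spaces.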

Handling Mustache templates

At this point, the formatter can handle most cases. But XXML supports template expressions, and once Mustache-style {{}} is taken into account there is a bad case:

<text>{{ a<1 ? 1 : 0 }}</text>

When htmlparser2 reaches the < in a<1, it thinks it has found an opening tag named 1. A tag cannot be named 1, so this causes a bug. More generally, any < inside {{}} can cause a bug.

Modify htmlparser2

htmlparser2 maintains its own state. After a closing tag is parsed, it is in the Text state, which means any characters it encounters are treated as text. When the parser encounters < in the Text state, it enters the BeforeTagName state and waits for a tag name. So what we need is this: when the Parser passes {{, the state changes from Text to a custom state, InExpr. In the InExpr state the Parser does nothing special with < or >. When }} is encountered, the previous state is restored.
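
The idea can be tried in isolation with a tiny two-state scanner. This is a sketch written for this explanation, not htmlparser2's tokenizer: ScanState and findTagOpeners are hypothetical names, and the scanner only reports which < characters would be treated as the start of a tag.

```typescript
type ScanState = "Text" | "InExpr";

// Return the index of every "<" that lies outside a {{ ... }} expression.
function findTagOpeners(source: string): number[] {
  const openers: number[] = [];
  let state: ScanState = "Text";
  for (let i = 0; i < source.length; i++) {
    const two = source.slice(i, i + 2);
    if (state === "Text") {
      if (two === "{{") {
        state = "InExpr";
        i++; // skip the second "{"
      } else if (source.charAt(i) === "<") {
        openers.push(i);
      }
    } else if (two === "}}") {
      state = "Text";
      i++; // skip the second "}"
    }
    // Any other character inside an expression, including "<", is ignored.
  }
  return openers;
}

const mustacheSource = "<text>{{ a<1 ? 1 : 0 }}</text>";
const tagOpeners = findTagOpeners(mustacheSource);
```

For mustacheSource, only the < of the opening tag and the < of the closing tag are reported; the < inside the expression is skipped.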

  _stateText(c: string) {
    if (c === '<') {
      if (this._index > this._sectionStart) {
        this._cbs.ontext(this._getSection());
      }
      this._state = State.BeforeTagName;
      this._sectionStart = this._index;
    } else if (
      this._decodeEntities &&
      this._special === Special.None &&
      c === '&'
    ) {
      if (this._index > this._sectionStart) {
        this._cbs.ontext(this._getSection());
      }
      this._baseState = State.Text;
      this._state = State.BeforeEntity;
      this._sectionStart = this._index;
    } else if (c === '{' && this._buffer.charAt(this._index + 1) === '{') {
      this._stateBeforeEnterExpr = this._state;
      this._state = State.InExpr;
    }
  }

  _stateInExpr(c: string) {
    if (c === '}' && this._buffer.charAt(this._index + 1) === '}') {
      this._state = this._stateBeforeEnterExpr;
    }
  }

A final, optional modification

At this point, the formatter can handle almost any scenario. But experienced developers who look at the handling of {{ and }} above will immediately spot a problem. Suppose the source code looks like this:

<text>{{'}}<a>'}}</text> 

This is expected to render the five characters }}<a> on the page. But once the Parser passes the first }}, the state returns to Text, and the following <a> is treated as an opening tag, which causes the bug again.

In this case, the best practice is naturally to store the string in a variable and place the variable inside {{}}. But the formatter can also be made to handle this scenario.

The content of {{}} is a valid JavaScript expression, so we can take the string between {{ and a candidate }} and parse it with a JavaScript parser such as Acorn. If parsing succeeds, the string is a complete template expression, and only then do we restore the state to Text.

_stateBeforeEnterExpr: State = State.Text;
_exprStartIndex: number = 0;
_exprEndIndex: number = 0;

_getExpr() {
  return this._buffer.substring(this._exprStartIndex, this._exprEndIndex);
}

static checkExpr(expr: string) {
  try {
    parse(expr);
    return true;
  } catch (err) {
    return false;
  }
}

_stateInExpr(c: string) {
  if (c === '}' && this._buffer.charAt(this._index + 1) === '}') {
    this._exprEndIndex = this._index;
    const expr = this._getExpr();
    if (Tokenizer.checkExpr(expr)) {
      this._state = this._stateBeforeEnterExpr;
    }
  }
}

This is not recommended because of the performance cost of running a JavaScript parser. The best practice remains to keep such strings in variables and render the variables.
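
For experimentation, the candidate-by-candidate check can be sketched without any dependency. Note the substitution: instead of Acorn, this sketch uses the built-in Function constructor, which compiles the code without running it and throws a SyntaxError when the code is invalid. It is a rough check only, and isCompleteExpression and findExprEnd are hypothetical names.

```typescript
// Rough stand-in for a real JavaScript parser such as Acorn: compile the
// candidate as a parenthesized expression and see whether that throws.
function isCompleteExpression(expr: string): boolean {
  try {
    new Function(`return (${expr});`);
    return true;
  } catch (err) {
    return false;
  }
}

// Starting just after "{{", return the index of the first "}}" that closes
// a syntactically complete expression, or -1 if there is none.
function findExprEnd(source: string, exprStart: number): number {
  let candidate = source.indexOf("}}", exprStart);
  while (candidate !== -1) {
    if (isCompleteExpression(source.slice(exprStart, candidate))) {
      return candidate;
    }
    // The "}}" was part of the expression (for example inside a string).
    candidate = source.indexOf("}}", candidate + 1);
  }
  return -1;
}

const trickySource = "<text>{{'}}<a>'}}</text>";
const trickyStart = trickySource.indexOf("{{") + 2;
const trickyEnd = findExprEnd(trickySource, trickyStart);
const trickyExpr = trickySource.slice(trickyStart, trickyEnd);
```

Here the first }} is rejected because a lone quote is not a valid expression, and the second }} is accepted, so trickyExpr is the whole string literal. Every rejected candidate costs one compilation, which illustrates the performance concern mentioned above.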

Conclusion

To build an XXML formatter:

Parse the XXML, saving each node's name, attributes, and indentation, along with the parent-child relationships between nodes
For an opening tag, concatenate <, the node name, the attributes, and >, deciding from the line length how to join the attributes, that is, whether line breaks are needed
For a closing tag, concatenate </, the node name, and >, deciding whether to break the line according to the situation
For text, no processing is needed; append it directly
For a comment, decide whether to break the line according to the situation

Points to note:

The content of the special tags text and inline-text must not be processed in any way
An expression inside {{}} may contain <, >, and other special characters, which need special handling
{{ and }} can also appear inside a {{}} expression; handle this by storing the string in a variable rather than by modifying the formatter to support it
