Topic describes

Compile JSX into AST, name a bit spooky… But only the basic function is required, requires

ABC

or deep HTML into the abstract syntax tree, we define the AST data structure as

// Contains open and close tags. Close tags. Text Type children
const types = {
    openingElement: 'JSXOpeningElement'.closingElement: 'JSXClosingElement'.element: 'JSXElement'.textElement: 'JSXText',
    children
};
Copy the code

Input:

<div id="test" class="test1"><div><div>1-1</div><div>1-2</div></div><div>2</div><div>3</div></div>

The output is as follows: Topic link

Their thinking

The problem can be summarized as follows:

  1. Input a JSX output as an AST like JSON (HTML to tree), for example

    1

    AST, The input JSX output consists of openingElement and closingElement Children

  2. Chilren consists of JSXText and JSXElement, each of which represents a type, as in the problem

const types = {
    openingElement: 'JSXOpeningElement'.closingElement: 'JSXClosingElement'.element: 'JSXElement'.textElement: 'JSXText'};Copy the code
  • JSXTextNo open and close tags and subsets, output{type: JSXText, value: 1}
  • JSXElementThere are open and closed labels or there are subsets, output{type: JSXElement, openingElement: {type: JSXOpeningElement, tagName: div}, closingElement: {type: JSXClosingElement, tagName: div}, children: {... Repeat the previous process}}

In AstExplorer, you can see the output data structure is the best way to parse the string is the state machine. In order to save code, I do not consider the case of tag newline. Here is a simple example:

<div>
    <div>
        <div>1-1</div>
        <div>1 to 2</div>
    </div>
    <div>2</div>
    <div>3</div>
</div>
Copy the code

Parsing process from left to right:

  • found<Parsing begins, while appears>To get the start tag<div>
  • And found that<Start, while appears>To get the start tag<div>
  • And found that<Start, while appears>To get the start tag<div>
  • Found not<The beginning represents JSXText, while appears again<, get the string1-1
  • .

You can find several states < start tag > end tag >< end tag, start tag >1 End tag, start string 1< end string, Tag start

The desired result is {type: 'JSXOpeningElement'.name: 'div' } * 3
{ type: 'JSXClosingElement'.name: 'div' }
{ type: 'JSXOpeningElement'.name: 'div' }
{ type: 'JSXClosingElement'.name: 'div' } * 2
{ type: 'JSXOpeningElement'.name: 'div' }
{ type: 'JSXClosingElement'.name: 'div' }
{ type: 'JSXOpeningElement'.name: 'div' }
{ type: 'JSXClosingElement'.name: 'div' } * 2Since the core idea of the state machine is to parse STR [index] and generate a new index, so every time you enter the function, start from index to find the label -> label endreturn
Copy the code
/ / pseudo code
function tagParser() {
    while(index < str.length) { startIndex = index; . => index++if (str[index] === '>') {
            endIndex = index;
            break;
        }
    }
    name = str.slice(startIndex, endIndex);
    return { type: 'JSXOpeningElement or JSXClosingElement'.name: 'tagName' };
}
Copy the code

Full code comment on:

while (index < str.length) {
    if (str[index] === '<') { tagParser(index); }}// Set tagParser to STR [index] === '<'
function tagParser() {
    index++;
    let [startIndex, endIndex] = [null.null]; // Define the start and end ranges
    let isCloseTag = false; // Close or close the label

    // each STR character needs to be parsed. The index may be 5, 10, 15, 20, but all within str.length
    while (index < str.length) {
        // Find the start position of the open and close label => startIndex
        if(! startIndex && str[index] ! = =' '&& str[index] ! = ='/') {
            startIndex = index;
        }

        // Find that '/' represents the beginning of the closing tag, be sure to add a judgment condition that is not the opening and closing tag, may/appear in the previous JSXText
        if(! startIndex && str[index] ==='/') {
            isCloseTag = true;
        }

        // Find the closing tag as the closing tag
        // startIndex && STR [index] === "" => The opening and closing label && Is a space.
        Div id="test"> 
      
// If the tag is >, it must end if ((startIndex && str[index] === ' ') || str[index] === '>') { endIndex = index; break; } // if startIndex is present and endIndex is found, there will be no further evaluation, just go index++ index++; } // get startIndex and endIndex // Now we want to output the label information and where index goes const tag = { type: isClosing ? types.closingElement : types.openingElement, name: str.slice(startIndex, endIndex).trim(), }; // in order to be compatible with the entire JSX, it does not end with a closing tag, possibly with Spaces and so on, endIndex++ while(isClosing && endIndex < str.length && str[endIndex] ! = ='>') { endIndex++; } return tag; } Copy the code

We’ve done the tag extraction, parsing every index that passes through the tag, so we want to parse the attributes inside the tag as well, for example:

{
  type: 'JSXOpeningElement'.name: 'div'.attributes: {'class': 'test1'.'id': 'test'} Div (id="test", class="test1")
}
Copy the code

Tags are closed and open. Tags are closed and open. Functions to parse attributes are kept inside tagParser.

// Open and close tags and it is not the 
      
case that triggers attribute parsing
if(! isClosing && str[endIndex] ! = ='>') { attrParser(); } Copy the code
function attrParser() {
    let endIndex = null;
    while (index < str.length) {
        // Each iteration checks to see if it is finished.
        if (str[index] === '>') {
            endIndex = index;
            break;
        }

        index++;
    }

    let attributes = {};

    // There must be an end tag, which means there must be an endIndex
    if(endIndex ! =null) {
        Attr id="test" class="test1"
        // We just need to parse these strings into the data structure we want
        const attrStr = str.slice(index, endIndex).trim();

        attributes = attrStr.split(/\s/).reduce((memo, item) = > {
            const [key, value] = item.split('=');
            memo[key] = value;
            return memo;
        }, {});
    }

    return attributes;
}
Copy the code

In addition to opening and closing tags,

1-1
1-2

2
3

But with tags and attributes we can use the state machine to parse strings so it should be a small case

if{jsxTextParser(); jsxTextParser(); }function jsxTextParser() {
    const startIndex = index;
    while(index < str.length && str[index] ! = ='<') {
        index++;
    }

    return { type: 'JSXText'.value: str.slice(startIndex, index) };
};
Copy the code

< span style = “max-width: 100%; clear: both; min-width: 1pt;

{ type: '开'.name: 'div' } * 3
{ type: 'closed'.name: 'div' }
{ type: '开'.name: 'div' }
{ type: 'closed'.name: 'div' } * 2
{ type: '开'.name: 'div' }
{ type: 'closed'.name: 'div' }
{ type: '开'.name: 'div' }
{ type: 'closed'.name: 'div' } * 2
Copy the code

A1, B1 (C1, C2) (D1, D2) B2, A2 Maybe it’s better for us to find the pattern. Let’s look from left to right. First we open A, open B, open C, closed C, and now we have our first closed C, which needs to be combined with open C to minimize child:

{
    openingElement: {type: 'Open and close label'.name: 'div'},
    closeingElement: {type: 'Close tag'.name: 'div'},
    children: [{type: 'JSXText'.value: 'C tag copy '}}]Copy the code

Then open D close D, you can find the rule, each closed tag will be paired with an open and close tag, the idea into code, let open and close complete the pairing

const arr = [];
arr.push(Open A ' ');
arr.push('open B');
arr.push('open C');

constOpen closed C = {close: 'closed C'.open: arr.pop(), type: 'JSXElement'}; Open and close arr. Push (C);// arr=[open A open B open C]

arr.push('open D');

constOpen closed D = {close: 'D closed'.open: arr.pop(), type: 'JSXElement'}; Open and close arr. Push (D);// arr=[open A open B open C open D]

const children = [];
children.push(arr.pop()); / / D
children.push(arr.pop()); / / C
constOpen closed B = {close:'B' closed.open: arr.pop(),
    children,
    type: 'JSXElement'}; Open and close arr. Push (B);// arr=[open A open B]

const children = [];
children.push(arr.pop()); / / B
constOpen close A = {close: 'A' closed.open: arr.pop();
    children,
    type: 'JSXElement'}; Arr. Push (open and close A);// arr=[open closed A]

if (arr.length === 1) return arr[0]; / / open and close A
Copy the code

At this point, the core idea is pretty much there. Go to the code directly. I will fill in the comments of the core function Parser in the code

AC code

To save space, tagParser attrParser jsxTextParser will not be posted:

function parse(str) {
    // Define stack as arR above, for open and close label caching
    let stack = [];
    let index = 0;

    // State machine routine: Ensure that every character is parsed
    while (index < str.length) {
        // Start parsing tags
        if (str.charAt[index] === '<') {
            // This will output all the tag information
            const tag = parseTag();

            // As mentioned above, when a closed tag is found, it will match its open and close tags to fill in the text
            if (tag.type === types.closingElement) {
                let children = [];

                
      
C
D C D
// Open closed D open closed C text 1 text 2... Until we find open B while (stack.length && stack[stack.length - 1].type ! == types.openingElement) { children.push(stack.pop()); }// This is the core part. After each reorganization, it needs to be pushed to the stack so that it can be paired with the next closed tag check stack.push({ openingElement: stack.pop(), children, closingElement: elem, type: types.element, }); } // Open/close tag push else{ stack.push(elem); }}// If is finished, it means the tag is finished, either the tag or the text else { consttext = parseText(); stack.push(text); }}return stack[0]; } Copy the code

conclusion

For me, this question once again consolidated the use of state machine. State machine was also used in the second handwritten Json. parse. At the end of that article, I summarized several application scenarios of state machine, including AST boundary value judgment and so on. In the process of reading questions, we often encounter problems using state machines to solve strings. Similar to implementing indexOf to parse HTTP request information, these are relatively simple and pure cases of state machines. It also involves combinatorial algorithms and ast, which I find to be quite difficult, so it’s quite challenging.

This article is participating in the “Nuggets 2021 Spring Recruitment Campaign”, click to see the details of the campaign