Topic describes
Compile JSX into AST, name a bit spooky… But only the basic function is required, requires
or deep HTML into the abstract syntax tree, we define the AST data structure as
// Contains open and close tags. Close tags. Text Type children
const types = {
openingElement: 'JSXOpeningElement'.closingElement: 'JSXClosingElement'.element: 'JSXElement'.textElement: 'JSXText',
children
};
Copy the code
Input:
<div id="test" class="test1"><div><div>1-1</div><div>1-2</div></div><div>2</div><div>3</div></div>
The output is as follows: Topic link
Their thinking
The problem can be summarized as follows:
-
Input a JSX output as an AST like JSON (HTML to tree), for example
1AST, The input JSX output consists of openingElement and closingElement Children
-
Chilren consists of JSXText and JSXElement, each of which represents a type, as in the problem
const types = {
openingElement: 'JSXOpeningElement'.closingElement: 'JSXClosingElement'.element: 'JSXElement'.textElement: 'JSXText'};Copy the code
JSXText
No open and close tags and subsets, output{type: JSXText, value: 1}
JSXElement
There are open and closed labels or there are subsets, output{type: JSXElement, openingElement: {type: JSXOpeningElement, tagName: div}, closingElement: {type: JSXClosingElement, tagName: div}, children: {... Repeat the previous process}}
In AstExplorer, you can see the output data structure is the best way to parse the string is the state machine. In order to save code, I do not consider the case of tag newline. Here is a simple example:
<div>
<div>
<div>1-1</div>
<div>1 to 2</div>
</div>
<div>2</div>
<div>3</div>
</div>
Copy the code
Parsing process from left to right:
- found
<
Parsing begins, while appears>
To get the start tag<div>
- And found that
<
Start, while appears>
To get the start tag<div>
- And found that
<
Start, while appears>
To get the start tag<div>
- Found not
<
The beginning represents JSXText, while appears again<
, get the string1-1
- .
You can find several states < start tag > end tag >< end tag, start tag >1 End tag, start string 1< end string, Tag start
The desired result is {type: 'JSXOpeningElement'.name: 'div' } * 3
{ type: 'JSXClosingElement'.name: 'div' }
{ type: 'JSXOpeningElement'.name: 'div' }
{ type: 'JSXClosingElement'.name: 'div' } * 2
{ type: 'JSXOpeningElement'.name: 'div' }
{ type: 'JSXClosingElement'.name: 'div' }
{ type: 'JSXOpeningElement'.name: 'div' }
{ type: 'JSXClosingElement'.name: 'div' } * 2Since the core idea of the state machine is to parse STR [index] and generate a new index, so every time you enter the function, start from index to find the label -> label endreturn
Copy the code
/ / pseudo code
function tagParser() {
while(index < str.length) { startIndex = index; . => index++if (str[index] === '>') {
endIndex = index;
break;
}
}
name = str.slice(startIndex, endIndex);
return { type: 'JSXOpeningElement or JSXClosingElement'.name: 'tagName' };
}
Copy the code
Full code comment on:
while (index < str.length) {
if (str[index] === '<') { tagParser(index); }}// Set tagParser to STR [index] === '<'
function tagParser() {
index++;
let [startIndex, endIndex] = [null.null]; // Define the start and end ranges
let isCloseTag = false; // Close or close the label
// each STR character needs to be parsed. The index may be 5, 10, 15, 20, but all within str.length
while (index < str.length) {
// Find the start position of the open and close label => startIndex
if(! startIndex && str[index] ! = =' '&& str[index] ! = ='/') {
startIndex = index;
}
// Find that '/' represents the beginning of the closing tag, be sure to add a judgment condition that is not the opening and closing tag, may/appear in the previous JSXText
if(! startIndex && str[index] ==='/') {
isCloseTag = true;
}
// Find the closing tag as the closing tag
// startIndex && STR [index] === "" => The opening and closing label && Is a space.
Div id="test">
// If the tag is >, it must end
if ((startIndex && str[index] === ' ') || str[index] === '>') {
endIndex = index;
break;
}
// if startIndex is present and endIndex is found, there will be no further evaluation, just go index++
index++;
}
// get startIndex and endIndex
// Now we want to output the label information and where index goes
const tag = {
type: isClosing ? types.closingElement : types.openingElement,
name: str.slice(startIndex, endIndex).trim(),
};
// in order to be compatible with the entire JSX, it does not end with a closing tag, possibly with Spaces and so on, endIndex++
while(isClosing && endIndex < str.length && str[endIndex] ! = ='>') {
endIndex++;
}
return tag;
}
Copy the code
We’ve done the tag extraction, parsing every index that passes through the tag, so we want to parse the attributes inside the tag as well, for example:
{
type: 'JSXOpeningElement'.name: 'div'.attributes: {'class': 'test1'.'id': 'test'} Div (id="test", class="test1")
}
Copy the code
Tags are closed and open. Tags are closed and open. Functions to parse attributes are kept inside tagParser.
// Open and close tags and it is not the
case that triggers attribute parsing
if(! isClosing && str[endIndex] ! = ='>') {
attrParser();
}
Copy the code
function attrParser() {
let endIndex = null;
while (index < str.length) {
// Each iteration checks to see if it is finished.
if (str[index] === '>') {
endIndex = index;
break;
}
index++;
}
let attributes = {};
// There must be an end tag, which means there must be an endIndex
if(endIndex ! =null) {
Attr id="test" class="test1"
// We just need to parse these strings into the data structure we want
const attrStr = str.slice(index, endIndex).trim();
attributes = attrStr.split(/\s/).reduce((memo, item) = > {
const [key, value] = item.split('=');
memo[key] = value;
return memo;
}, {});
}
return attributes;
}
Copy the code
In addition to opening and closing tags,
But with tags and attributes we can use the state machine to parse strings so it should be a small case
if{jsxTextParser(); jsxTextParser(); }function jsxTextParser() {
const startIndex = index;
while(index < str.length && str[index] ! = ='<') {
index++;
}
return { type: 'JSXText'.value: str.slice(startIndex, index) };
};
Copy the code
< span style = “max-width: 100%; clear: both; min-width: 1pt;
{ type: '开'.name: 'div' } * 3
{ type: 'closed'.name: 'div' }
{ type: '开'.name: 'div' }
{ type: 'closed'.name: 'div' } * 2
{ type: '开'.name: 'div' }
{ type: 'closed'.name: 'div' }
{ type: '开'.name: 'div' }
{ type: 'closed'.name: 'div' } * 2
Copy the code
A1, B1 (C1, C2) (D1, D2) B2, A2 Maybe it’s better for us to find the pattern. Let’s look from left to right. First we open A, open B, open C, closed C, and now we have our first closed C, which needs to be combined with open C to minimize child:
{
openingElement: {type: 'Open and close label'.name: 'div'},
closeingElement: {type: 'Close tag'.name: 'div'},
children: [{type: 'JSXText'.value: 'C tag copy '}}]Copy the code
Then open D close D, you can find the rule, each closed tag will be paired with an open and close tag, the idea into code, let open and close complete the pairing
const arr = [];
arr.push(Open A ' ');
arr.push('open B');
arr.push('open C');
constOpen closed C = {close: 'closed C'.open: arr.pop(), type: 'JSXElement'}; Open and close arr. Push (C);// arr=[open A open B open C]
arr.push('open D');
constOpen closed D = {close: 'D closed'.open: arr.pop(), type: 'JSXElement'}; Open and close arr. Push (D);// arr=[open A open B open C open D]
const children = [];
children.push(arr.pop()); / / D
children.push(arr.pop()); / / C
constOpen closed B = {close:'B' closed.open: arr.pop(),
children,
type: 'JSXElement'}; Open and close arr. Push (B);// arr=[open A open B]
const children = [];
children.push(arr.pop()); / / B
constOpen close A = {close: 'A' closed.open: arr.pop();
children,
type: 'JSXElement'}; Arr. Push (open and close A);// arr=[open closed A]
if (arr.length === 1) return arr[0]; / / open and close A
Copy the code
At this point, the core idea is pretty much there. Go to the code directly. I will fill in the comments of the core function Parser in the code
AC code
To save space, tagParser attrParser jsxTextParser will not be posted:
function parse(str) {
// Define stack as arR above, for open and close label caching
let stack = [];
let index = 0;
// State machine routine: Ensure that every character is parsed
while (index < str.length) {
// Start parsing tags
if (str.charAt[index] === '<') {
// This will output all the tag information
const tag = parseTag();
// As mentioned above, when a closed tag is found, it will match its open and close tags to fill in the text
if (tag.type === types.closingElement) {
let children = [];
C
D C D
// Open closed D open closed C text 1 text 2... Until we find open B
while (stack.length && stack[stack.length - 1].type ! == types.openingElement) { children.push(stack.pop()); }// This is the core part. After each reorganization, it needs to be pushed to the stack so that it can be paired with the next closed tag check
stack.push({
openingElement: stack.pop(),
children,
closingElement: elem,
type: types.element,
});
}
// Open/close tag push
else{ stack.push(elem); }}// If is finished, it means the tag is finished, either the tag or the text
else {
consttext = parseText(); stack.push(text); }}return stack[0];
}
Copy the code
conclusion
For me, this question once again consolidated the use of state machine. State machine was also used in the second handwritten Json. parse. At the end of that article, I summarized several application scenarios of state machine, including AST boundary value judgment and so on. In the process of reading questions, we often encounter problems using state machines to solve strings. Similar to implementing indexOf to parse HTTP request information, these are relatively simple and pure cases of state machines. It also involves combinatorial algorithms and ast, which I find to be quite difficult, so it’s quite challenging.
This article is participating in the “Nuggets 2021 Spring Recruitment Campaign”, click to see the details of the campaign