A diagram illustrating how the EJS template engine works
The diagram above outlines the principles of a simple template engine (in this case, EJS). This article walks through how such an engine works, covering the key implementation steps and the reasoning behind them.
The engine built here is minimal, but the ideas are general. The same ideas and methods apply if you read the source code of Vue's template compiler.
Supported syntax
We will implement a simplified version of EJS that supports these tags:
-
<% script %> – Script execution. Generally used for control statements; it produces no output. For example:
<% if (user) { %> <div>some thing</div> <% } %>
-
<%= expression %> – Outputs the value of the expression, with HTML escaped:
<title><%= title %></title>
-
<%- expression %> – Same as <%= expression %>, except that HTML is not escaped
-
<%% and %%> – Tag escapes: <%% is output as a literal <% and %%> as a literal %>
-
<%# comment %> – Comment; produces no output
Here is a complete example template, which the rest of the article builds on:
<html>
<head><%= title %></head>
<body>
<%% escape %%>
<%# Here is the comment %>
<%- before %>
<% if (show) { %>
<div>root</div>
<% } %>
</body>
</html>
Basic API Design
We put the template parsing and rendering logic into a Template class with the following basic interface:
export default class Template {
  public template: string;
  private tokens: string[] = [];
  private source: string = "";
  private state?: State;
  private fn?: Function;
  public constructor(template: string) {
    this.template = template;
  }
  /** Compile the template */
  public compile() {
    this.parseTemplateText();
    this.transformTokens();
    this.wrapit();
  }
  /** Render: the caller passes a data object and gets the rendered string back */
  public render(local: object) {}
  /**
   * Token parsing.
   * Splits e.g. <% if (condition) { %> into a token array:
   * ['<%', ' if (condition) { ', '%>']
   */
  private parseTemplateText() {}
  /** Convert the tokens into JavaScript statements */
  private transformTokens() {}
  /** Wrap the JavaScript statements from the previous step into a render function */
  private wrapit() {}
}
Token parsing
The first step is to parse all the start tags and end tags. We expect the parsing result to look like this:
[
  "<html>\n<head>", "<%=", " title ", "%>",
  "</head>\n<body>\n", "<%%", " escape ", "%%>",
  "\n", "<%#", " Here is the comment ", "%>",
  "\n", "<%-", " before ", "%>",
  "\n", "<%", " if (show) { ", "%>",
  "\n<div>root</div>\n", "<%", " } ", "%>",
  "\n</body>\n</html>"
]
Because our template engine's syntax is very simple, there is no need to build an abstract syntax tree (AST). The tags can be extracted directly with a regular expression.
Start by defining a regular expression that matches all of the supported tags. The longer alternatives come first so that <%% is not matched as <% followed by a stray %:
// <%% and %%> are the tag escapes
// <% script
// <%= outputs the value of an expression (escaped)
// <%- outputs the value of an expression (unescaped)
// <%# comment
// %> end tag
const REGEXP = /(<%%|%%>|<%=|<%-|<%#|<%|%>)/;
Use the regular expression to match the string piece by piece and split it up. The code is simple:
parseTemplateText() {
  let str = this.template;
  const arr = this.tokens;
  // exec returns the match (including its position), or null if nothing matches
  let res = REGEXP.exec(str);
  let index;
  while (res) {
    index = res.index;
    // Text before the tag
    if (index !== 0) {
      arr.push(str.substring(0, index));
      str = str.slice(index);
    }
    arr.push(res[0]);
    // Drop the consumed tag and keep matching on the rest
    str = str.slice(res[0].length);
    res = REGEXP.exec(str);
  }
  // Whatever is left after the last tag
  if (str) {
    arr.push(str);
  }
}
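As a side note, the same token array can be produced more compactly with String.prototype.split, because split keeps the delimiters when the pattern has a capturing group. The following is only a sketch of that alternative, not the article's implementation; the names TAG_PATTERN and tokenize are made up for illustration.

```typescript
// Same tag pattern as the article's REGEXP; the capturing group makes
// split() keep the tags in the result instead of discarding them.
const TAG_PATTERN = /(<%%|%%>|<%=|<%-|<%#|<%|%>)/;

// Hypothetical helper, not part of the article's Template class.
function tokenize(template: string): string[] {
  // split() returns text and tags interleaved. Empty strings show up when
  // two tags are adjacent or a tag sits at the very start or end, so drop them.
  return template.split(TAG_PATTERN).filter((part) => part !== "");
}

console.log(tokenize("<p><%= name %></p>"));
// ["<p>", "<%=", " name ", "%>", "</p>"]
```

The explicit exec loop in the article does the same work step by step, which makes it easier to extend later, for example to track line numbers for error messages.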
Simple syntax check
Once the tags have been parsed out, we can start converting them into a render function.
First, do a simple syntax check to make sure every tag is closed:
const start = "<%"; // Start tag
const end = "%>"; // End tag
const escpStart = "<%%"; // Start tag escape
const escpEnd = "%%>"; // End tag escape
const escpoutStart = "<%="; // Escaped expression output
const unescpoutStart = "<%-"; // Unescaped expression output
const comtStart = "<%#"; // Comment
// Inside the token loop: tok is the current token, idx is its index
if (tok.includes(start) && !tok.includes(escpStart)) {
  // Tokens come in the order [start tag, body, end tag]
  const closing = this.tokens[idx + 2];
  if (closing == null || !closing.includes(end)) {
    throw new Error(`Could not find the closing tag matching "${tok}"`);
  }
}
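The check above is a fragment that runs inside the token loop. As a rough illustration, it could be pulled out into a standalone function like the one below. The names OPEN_TAGS and assertTagsClosed are hypothetical, and the sketch shares the article's assumption that every tag has a non-empty body, so that the closing tag sits exactly two tokens after the opening one.

```typescript
// Hypothetical standalone version of the closing-tag check.
// The escape tag <%% is deliberately absent: it needs no closing %>.
const OPEN_TAGS = new Set(["<%", "<%=", "<%-", "<%#"]);

function assertTagsClosed(tokens: string[]): void {
  tokens.forEach((tok, idx) => {
    if (!OPEN_TAGS.has(tok)) {
      return; // literal text, tag body, or closing tag
    }
    // Expected layout: [opening tag, body, closing tag]
    const closing = tokens[idx + 2];
    if (closing !== "%>") {
      throw new Error(`Could not find the closing tag matching "${tok}"`);
    }
  });
}

// Passes silently: the <%= tag is followed by a body and a closing %>.
assertTagsClosed(["<p>", "<%=", " name ", "%>", "</p>"]);
```

A tag with an empty body such as <%=%> would be reported as unclosed by this sketch, since its closing tag is only one token away.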
Conversion
Now we iterate over the tokens. A finite-state machine (FSM) is a good way to describe the conversion logic.
A state machine is a mathematical model of a finite number of states together with the transitions and actions between them. Put simply, a finite-state machine consists of a set of states, an initial state, inputs, and a transition function that maps the current state and an input to the next state. It has three characteristics:
- The total number of states is finite.
- It is in exactly one state at any given time.
- It moves from one state to another under certain conditions.
After a bit of analysis, the state transition diagram of our template engine looks like this:
The following states can be read off the diagram above:
enum State {
  EVAL, // Script execution
  ESCAPED, // Expression output, escaped
  RAW, // Expression output, not escaped
  COMMENT, // Comment
  LITERAL // Literal text, output as is
}
Now start iterating over the tokens:
this.tokens.forEach((tok, idx) => {
  // ...
  switch (tok) {
    /** Tag recognition */
    case start:
      // Script starts
      this.state = State.EVAL;
      break;
    case escpoutStart:
      // Escaped output
      this.state = State.ESCAPED;
      break;
    case unescpoutStart:
      // Unescaped output
      this.state = State.RAW;
      break;
    case comtStart:
      // Comment
      this.state = State.COMMENT;
      break;
    case escpStart:
      // Tag escape
      this.state = State.LITERAL;
      this.source += `;__append('<%');\n`;
      break;
    case escpEnd:
      this.state = State.LITERAL;
      this.source += `;__append('%>');\n`;
      break;
    case end:
      // Back to the initial state
      this.state = undefined;
      break;
    default:
      /** Convert to output */
      if (this.state != null) {
        switch (this.state) {
          case State.EVAL:
            // Code
            this.source += `;${tok}\n`;
            break;
          case State.ESCAPED:
            // stripSemi removes extra semicolons
            this.source += `;__append(escapeFn(${stripSemi(tok)}));\n`;
            break;
          case State.RAW:
            this.source += `;__append(${stripSemi(tok)});\n`;
            break;
          case State.LITERAL:
            // The string is wrapped in single quotes, so transformString
            // escapes single quotes, newlines, and backslashes in tok
            this.source += `;__append('${transformString(tok)}');\n`;
            break;
          case State.COMMENT:
            // Do nothing
            break;
        }
      } else {
        // Literal text
        this.source += `;__append('${transformString(tok)}');\n`;
      }
  }
});
After this transformation, we get a result like this:
;__append('<html>\n<head>');
;__append(escapeFn( title ));
;__append('</head>\n<body>\n');
;__append('<%');
;__append(' escape ');
;__append('%>');
;__append('\n');
;__append('\n');
;__append( before );
;__append('\n');
; if (show) { 
;__append('\n<div>root</div>\n');
; } 
;__append('\n</body>\n</html>');
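The conversion code calls two helpers, stripSemi and transformString, whose bodies are not shown in the article. The sketch below is one plausible way to write them, based only on the comments in the conversion code; the exact behavior of the author's versions is an assumption.

```typescript
// Assumed behavior: drop a single trailing semicolon (keeping trailing
// whitespace) so that `<%= title; %>` still becomes a valid expression
// inside __append(...).
function stripSemi(code: string): string {
  return code.replace(/;(\s*)$/, "$1");
}

// Assumed behavior: make arbitrary text safe to place between single
// quotes in generated JavaScript source.
function transformString(text: string): string {
  return text
    .replace(/\\/g, "\\\\") // backslashes first, so later escapes are not doubled
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/'/g, "\\'");
}

console.log(stripSemi(" title; ")); // " title "
console.log(transformString("it's")); // it\'s
```

Escaping the backslash before everything else matters: if it ran last, it would also double the backslashes introduced by the other replacements.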
The last step: generating the function
Now we wrap the transformation result in a function:
wrapit() {
  this.source = `\
const __out = [];
const __append = __out.push.bind(__out);
with (local || {}) {
${this.source}
}
return __out.join('');\
`;
  this.fn = new Function("local", "escapeFn", this.source);
}
The with statement wraps the code above so that the generated code can refer to the properties of the local object directly, without a local. prefix (title instead of local.title).
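To see the effect of with in isolation, here is a minimal sketch that is separate from the Template class; the names fnBody and readLocals are illustrative. The with statement is not allowed in strict mode, but a function body passed to new Function is non-strict unless it contains its own "use strict" directive, which is why this works even when the surrounding module is strict.

```typescript
// The body is plain JavaScript source text; `title` and `show` are bare
// identifiers that `with` resolves against the properties of `local`.
const fnBody = "with (local || {}) { return title + ':' + show; }";

// new Function(argName, body) compiles the text into a callable function.
const readLocals = new Function("local", fnBody) as (local: object) => string;

console.log(readLocals({ title: "hello", show: true })); // hello:true
```

If the template refers to a name that is missing from local and is not defined globally, the lookup fails with a ReferenceError at render time.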
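The render method is simple. It calls the function wrapped above directly:
render(local: object) {
  if (!this.fn) {
    throw new Error("compile() must be called before render()");
  }
  // escape is the HTML-escaping helper that the generated code sees as
  // escapeFn (assumed to be defined elsewhere in the module)
  return this.fn.call(null, local, escape);
}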
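The escape helper passed to the generated function is not shown in the article. A minimal HTML-escaping function might look like the sketch below. The name escapeHtml, the chosen entity table, and the decision to render null and undefined as an empty string are all assumptions, not the author's code.

```typescript
// Characters that are unsafe in HTML text and attribute values,
// mapped to their entity replacements.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical stand-in for the helper passed as escapeFn.
function escapeHtml(value: unknown): string {
  // Assumption: missing values render as an empty string.
  if (value === null || value === undefined) {
    return "";
  }
  // A single pass over the string, so "&" in the output is never re-escaped.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

console.log(escapeHtml("<div>a & 'b'</div>"));
// &lt;div&gt;a &amp; &#39;b&#39;&lt;/div&gt;
```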
Running it
const temp = new Template(`<html>
<head><%= title %></head>
<body>
<%% escape %%>
<%# Here is the comment %>
<%- before %>
<% if (show) { %>
<div>root</div>
<% } %>
</body>
</html>`);
temp.compile();
temp.render({ show: true, title: "hello", before: "<div>xx</div>" });
// <html>
// <head>hello</head>
// <body>
// <% escape %>
//
// <div>xx</div>
//
// <div>root</div>
//
// </body>
// </html>
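For readers who want to try the whole pipeline in one place, the three steps can also be condensed into a single function. This is a simplified sketch under the same assumptions as the article (no AST, no error handling), not the author's Template class; every name in it is illustrative.

```typescript
// Condensed sketch of the pipeline: parse, transform, generate.
const TAGS = /(<%%|%%>|<%=|<%-|<%#|<%|%>)/;

type Mode = "eval" | "escaped" | "raw" | "comment" | "literal";
const OPENERS = new Map<string, Mode>([
  ["<%", "eval"], ["<%=", "escaped"], ["<%-", "raw"], ["<%#", "comment"],
]);

const ENTITIES: Record<string, string> = {
  "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
};
const escapeHtml = (value: unknown): string =>
  String(value).replace(/[&<>"']/g, (ch) => ENTITIES[ch] ?? ch);

// Make text safe to embed in a single-quoted JavaScript string literal.
const quote = (text: string): string =>
  text.replace(/\\/g, "\\\\").replace(/\n/g, "\\n").replace(/\r/g, "\\r").replace(/'/g, "\\'");

// Drop one trailing semicolon so `<%= title; %>` still yields a valid expression.
const stripSemi = (code: string): string => code.replace(/;(\s*)$/, "$1");

function compile(template: string): (local: object) => string {
  let mode = "literal" as Mode;
  let source = "";
  // 1. Parse: split the template into tags and the text between them.
  for (const tok of template.split(TAGS).filter((t) => t !== "")) {
    const opener = OPENERS.get(tok);
    // 2. Transform: turn each token into a JavaScript statement.
    if (tok === "<%%" || tok === "%%>") {
      mode = "literal";
      source += `;__append('${tok === "<%%" ? "<%" : "%>"}');\n`;
    } else if (opener !== undefined) {
      mode = opener;
    } else if (tok === "%>") {
      mode = "literal";
    } else if (mode === "eval") {
      source += `;${tok}\n`;
    } else if (mode === "escaped") {
      source += `;__append(escapeFn(${stripSemi(tok)}));\n`;
    } else if (mode === "raw") {
      source += `;__append(${stripSemi(tok)});\n`;
    } else if (mode === "literal") {
      source += `;__append('${quote(tok)}');\n`;
    }
    // mode === "comment": emit nothing
  }
  // 3. Generate: wrap the statements in a function body.
  const body =
    "const __out = [];\nconst __append = (s) => __out.push(s);\n" +
    `with (local || {}) {\n${source}}\nreturn __out.join('');`;
  const fn = new Function("local", "escapeFn", body);
  return (local) => fn(local, escapeHtml);
}

const render = compile("<h1><%= title %></h1><% if (show) { %><p><%- body %></p><% } %>");
console.log(render({ title: "a < b", show: true, body: "<b>hi</b>" }));
// <h1>a &lt; b</h1><p><b>hi</b></p>
```

Returning a closure instead of storing the function on a class instance is only a stylistic choice for the sketch; it keeps compile and render from being called in the wrong order.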
You can run the complete code in CodeSandbox:
Conclusion
This article was inspired by the-super-tiny-compiler and implements a minimalist template engine. A template engine is essentially a compiler too. As shown above, compiling a template takes three steps:
-
Parsing: parses the template code into an abstract representation. Complex compilers split this into lexical analysis and syntactic analysis.
Lexical analysis: the process above of splitting the template content into tokens can be regarded as lexical analysis. It splits the source code into an array of tokens, where each token is a small unit representing an independent syntax fragment.
Syntactic analysis: the parser takes the token array and restructures it into an abstract syntax tree (AST), which describes the syntax units and the relationships between them. Syntax errors can be detected in this phase.
(Photo credit: ruslanspivak.com/lsbasi-part…)
The template engine described in this article does not need an AST as an intermediate representation because its syntax is so simple; it converts the tokens directly.
-
Transformation: transforms the abstract representation from the previous step into the form the compiler wants. A template engine, for example, converts it into statements of the target language. Sophisticated compilers transform the AST itself by adding, removing, and modifying nodes, and usually traverse it using the visitor pattern.
-
Code generation: turns the transformed representation into new code. A template engine, for example, wraps the output of the previous step into a render function. Sophisticated compilers generate target code from the AST.
Compilers are really interesting. I hope to cover how to write a Babel plugin in a later article.
Further reading
- EJS source code
- the-super-tiny-compiler
- Let’s Build A Simple Interpreter. Part 7: Abstract Syntax Trees