Author: Ji Zhi
Preface
In order to understand the source code for MarkdownIt, it is necessary to understand the two base classes, Ruler & Token.
Token
A token is commonly known as a lexical unit.
MarkdownIt receives a string, converts it into a sequence of tokens through a series of parsers, then calls the render rule that matches each token's type, taking the token as input, and finally outputs an HTML string.
Let’s start with the definition of Token, which is located in lib/token.js.
function Token(type, tag, nesting) {
  this.type = type;
  this.tag = tag;
  this.attrs = null;
  this.map = null;
  this.nesting = nesting;
  this.level = 0;
  this.children = null;
  this.content = '';
  this.markup = '';
  this.info = '';
  this.meta = null;
  this.block = false;
  this.hidden = false;
}
-
type
The token type. For example, paragraph_open, paragraph_close, and hr correspond to <p>, </p>, and <hr>, respectively.
-
tag
The HTML tag name, such as p, strong, or '' (an empty string, which stands for plain text and similar content).
-
attrs
The attributes of the HTML element. If present, this is an array of [name, value] pairs, such as [["href", "http://dev.nodeca.com"]].
-
map
Source position information. The array has exactly two elements: the first is the start line and the second is the end line.
-
nesting
Tag type: 1 indicates an opening tag, 0 indicates a self-closing tag, and -1 indicates a closing tag. For example, <p>, <hr>, and </p>, respectively.
-
level
The nesting level (depth) of the token.
-
children
An array of child tokens. Only tokens of type inline or image have children, because an inline token goes through a further parser that extracts finer-grained tokens, as in the following scenario.
const MarkdownIt = require('markdown-it')
const md = new MarkdownIt()

const src = '__Advertisement :)__'
const result = md.render(src)

// During parsing we first get an inline token like this:
// {
//   ...,
//   content: "__Advertisement :)__",
//   children: [Token, ...]
// }
// The content still needs to be parsed: the "__" markers have to be extracted
// and rendered as <strong> tags. The children of the inline token are used to
// store the resulting child tokens.
-
content
The content of the token, that is, the text placed between the tags.
-
markup
The markup characters of a particular syntax. For example, "```" marks a fenced code block, "**" is the emphasis syntax, and "-" or "+" marks a list item.
-
info
A token of type fence has an info attribute. A fence looks like this:
/**
```js
let md = new MarkdownIt()
```
**/
The content inside the comment above produces a fence token. Its info is js, and its markup is "```".
-
meta
A place where plugins can store arbitrary data.
-
block
block is true for block-level tokens (the top-level token stream produced by ParserCore) and false for inline tokens generated by ParserInline.
-
hidden
If true, the token is not rendered.
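To make these properties more concrete, here is a minimal sketch of how a flat token stream can turn into HTML. It uses a hypothetical makeToken helper and a hypothetical renderTokens function instead of the real MarkdownIt Token and Renderer classes, and the hand-built token list is only an assumption about what the parser would emit for the one-line input "hello". It shows how type, tag, nesting, content, children, and hidden work together.

```javascript
// Hypothetical stand-in for lib/token.js: only the fields this sketch needs.
function makeToken(type, tag, nesting, content) {
  return { type, tag, nesting, content: content || '', hidden: false, children: null };
}

// Hand-built token stream, assumed to resemble the parser output for "hello".
const text = makeToken('text', '', 0, 'hello');
const inline = makeToken('inline', '', 0, 'hello');
inline.children = [ text ];

const tokens = [
  makeToken('paragraph_open', 'p', 1),
  inline,
  makeToken('paragraph_close', 'p', -1)
];

// Simplified renderer (not the real Renderer): dispatch on type and nesting.
function renderTokens(list) {
  let html = '';
  for (const token of list) {
    if (token.hidden) { continue; }           // hidden tokens are skipped
    if (token.type === 'inline') {
      html += renderTokens(token.children);   // descend into the child tokens
    } else if (token.type === 'text') {
      html += token.content;                  // plain text, no tag (no escaping in this sketch)
    } else if (token.nesting === -1) {
      html += '</' + token.tag + '>';         // closing tag
    } else {
      html += '<' + token.tag + '>';          // opening (1) or self-closing (0) tag
    }
  }
  return html;
}

const html = renderTokens(tokens);
console.log(html); // <p>hello</p>
```

The real renderer looks up a render rule by token.type and handles attributes and escaping as well; this sketch only keeps the dispatch idea.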
Next, let's look at the prototype methods.
-
attrIndex()
Token.prototype.attrIndex = function attrIndex(name) {
  var attrs, i, len;

  if (!this.attrs) { return -1; }

  attrs = this.attrs;

  for (i = 0, len = attrs.length; i < len; i++) {
    if (attrs[i][0] === name) { return i; }
  }
  return -1;
};
Returns the index of the attribute with the given name, or -1 if it is not found.
-
attrPush()
Token.prototype.attrPush = function attrPush(attrData) {
  if (this.attrs) {
    this.attrs.push(attrData);
  } else {
    this.attrs = [ attrData ];
  }
};
Adds a [name, value] pair, creating the attrs array if it does not exist yet.
-
attrSet
Token.prototype.attrSet = function attrSet(name, value) {
  var idx = this.attrIndex(name),
      attrData = [ name, value ];

  if (idx < 0) {
    this.attrPush(attrData);
  } else {
    this.attrs[idx] = attrData;
  }
};
Sets a [name, value] pair: overrides the existing pair if name is already present, otherwise adds a new one.
-
attrGet
Token.prototype.attrGet = function attrGet(name) {
  var idx = this.attrIndex(name), value = null;

  if (idx >= 0) {
    value = this.attrs[idx][1];
  }
  return value;
};
Returns the attribute value for the given name, or null if the attribute does not exist.
-
attrJoin
Token.prototype.attrJoin = function attrJoin(name, value) {
  var idx = this.attrIndex(name);

  if (idx < 0) {
    this.attrPush([ name, value ]);
  } else {
    this.attrs[idx][1] = this.attrs[idx][1] + ' ' + value;
  }
};
Appends value to the existing value of the attribute name, separated by a space. If the attribute does not exist yet, it is added.
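Here is a short usage sketch of these five methods. To stay self-contained, it defines a hypothetical MiniToken class that mirrors only the attribute helpers shown above (it is not the real Token class), and the attribute names and values are made up for illustration.

```javascript
// Hypothetical stand-in that mirrors only the attribute helpers shown above.
class MiniToken {
  constructor(type, tag, nesting) {
    this.type = type;
    this.tag = tag;
    this.nesting = nesting;
    this.attrs = null;
  }
  attrIndex(name) {
    if (!this.attrs) { return -1; }
    return this.attrs.findIndex(function (attr) { return attr[0] === name; });
  }
  attrPush(attrData) {
    if (this.attrs) { this.attrs.push(attrData); } else { this.attrs = [ attrData ]; }
  }
  attrSet(name, value) {
    const idx = this.attrIndex(name);
    if (idx < 0) { this.attrPush([ name, value ]); } else { this.attrs[idx] = [ name, value ]; }
  }
  attrGet(name) {
    const idx = this.attrIndex(name);
    return idx >= 0 ? this.attrs[idx][1] : null;
  }
  attrJoin(name, value) {
    const idx = this.attrIndex(name);
    if (idx < 0) { this.attrPush([ name, value ]); } else { this.attrs[idx][1] += ' ' + value; }
  }
}

const link = new MiniToken('link_open', 'a', 1);
link.attrPush([ 'href', 'http://dev.nodeca.com' ]);  // attrs was null, so the array is created
link.attrSet('target', '_blank');                    // new name: appended
link.attrSet('target', '_self');                     // existing name: overwritten
link.attrJoin('class', 'external');                  // no class yet: added
link.attrJoin('class', 'nofollow');                  // class exists: joined with a space

console.log(link.attrIndex('target')); // 1
console.log(link.attrGet('class'));    // external nofollow
console.log(link.attrGet('title'));    // null
```

attrJoin is the natural choice for space-separated attributes such as class, while attrSet fits single-valued attributes such as href or target.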
Token summary
Token is the most basic class in MarkdownIt and the smallest unit that the source is divided into. Tokens are the product of parsing and the input for rendering.
Ruler
Now let's look at MarkdownIt's other base class, Ruler. It can be thought of as the manager of a chain of responsibility: internally it stores many rule functions. Rules fall into two kinds. Parse rules parse the string passed in by the user and generate tokens. Render rules come into play after the tokens have been generated: a different render rule is called depending on each token's type, and the final result is the HTML string.
Let's start with the constructor.
function Ruler() {
  this.__rules__ = [];
  this.__cache__ = null;
}
-
__rules__
Holds all the rule objects. Each rule has the following structure:
{
  name: XXX,
  enabled: Boolean,       // whether the rule is enabled
  fn: Function(),         // the handler function of the rule
  alt: [ name2, name3 ]   // names of the responsibility chains the rule belongs to
}
The alt field may be confusing at first. I'll leave it open for now and come back to it in detail when we look at the __compile__ method.
-
__cache__
Stores the compiled rule chains. Its structure is as follows:
{
  <responsibility chain name>: [rule1.fn, rule2.fn, ...]
}
Note: there is a default rule chain whose name is the empty string (''). Its value is an array containing the fn of every enabled rule.
Let's look at what each method on the prototype does.
-
__find__
Ruler.prototype.__find__ = function (name) {
  for (var i = 0; i < this.__rules__.length; i++) {
    if (this.__rules__[i].name === name) {
      return i;
    }
  }
  return -1;
};
Finds the index of a rule in __rules__ by its name, or returns -1 if there is no such rule.
-
__compile__
Ruler.prototype.__compile__ = function () {
  var self = this;
  var chains = [ '' ];

  // collect unique names
  self.__rules__.forEach(function (rule) {
    if (!rule.enabled) { return; }

    rule.alt.forEach(function (altName) {
      if (chains.indexOf(altName) < 0) {
        chains.push(altName);
      }
    });
  });

  self.__cache__ = {};

  chains.forEach(function (chain) {
    self.__cache__[chain] = [];
    self.__rules__.forEach(function (rule) {
      if (!rule.enabled) { return; }

      // the default chain '' accepts every rule; a named chain only accepts rules listing it in alt
      if (chain && rule.alt.indexOf(chain) < 0) { return; }

      self.__cache__[chain].push(rule.fn);
    });
  });
};
Generates the responsibility chain information.
- First, it walks through __rules__ to collect the key names of all rule chains. This is where the alt attribute of a rule becomes important: it says that, in addition to the default chain, the rule also belongs to the chains named in alt. The default chain has the empty string ('') as its key, and the fn of every enabled rule belongs to it.
- Then, each rule's fn is pushed into the array under every chain key it belongs to, and the result is cached on the __cache__ property.
Here’s an example:
let ruler = new Ruler()
// alt must be an array, because __compile__ calls rule.alt.forEach
ruler.push('rule1', rule1Fn, { alt: [ 'chainA' ] })
ruler.push('rule2', rule2Fn, { alt: [ 'chainB' ] })
ruler.push('rule3', rule3Fn, { alt: [ 'chainB' ] })
ruler.__compile__()

// We get the following structure:
// ruler.__cache__ = {
//   '': [rule1Fn, rule2Fn, rule3Fn],
//   'chainA': [rule1Fn],
//   'chainB': [rule2Fn, rule3Fn],
// }
// That is, three rule chains: '', 'chainA', 'chainB'.
-
at
Ruler.prototype.at = function (name, fn, options) {
  var index = this.__find__(name);
  var opt = options || {};

  if (index === -1) { throw new Error('Parser rule not found: ' + name); }

  this.__rules__[index].fn = fn;
  this.__rules__[index].alt = opt.alt || [];
  this.__cache__ = null;
};
Replaces the fn of an existing rule and resets the chain names it belongs to (alt becomes an empty array if options.alt is not given). Throws if no rule has the given name.
-
before
Ruler.prototype.before = function (beforeName, ruleName, fn, options) {
  var index = this.__find__(beforeName);
  var opt = options || {};

  if (index === -1) { throw new Error('Parser rule not found: ' + beforeName); }

  this.__rules__.splice(index, 0, {
    name: ruleName,
    enabled: true,
    fn: fn,
    alt: opt.alt || []
  });

  this.__cache__ = null;
};
Inserts a new rule immediately before the rule named beforeName. Throws if that rule does not exist.
-
after
Ruler.prototype.after = function (afterName, ruleName, fn, options) {
  var index = this.__find__(afterName);
  var opt = options || {};

  if (index === -1) { throw new Error('Parser rule not found: ' + afterName); }

  this.__rules__.splice(index + 1, 0, {
    name: ruleName,
    enabled: true,
    fn: fn,
    alt: opt.alt || []
  });

  this.__cache__ = null;
};
Inserts a new rule immediately after the rule named afterName. Throws if that rule does not exist.
-
push
Ruler.prototype.push = function (ruleName, fn, options) {
  var opt = options || {};

  this.__rules__.push({
    name: ruleName,
    enabled: true,
    fn: fn,
    alt: opt.alt || []
  });

  this.__cache__ = null;
};
Appends a new rule to the end of the rule list.
-
enable
Ruler.prototype.enable = function (list, ignoreInvalid) {
  if (!Array.isArray(list)) { list = [ list ]; }

  var result = [];

  // Search by name and enable
  list.forEach(function (name) {
    var idx = this.__find__(name);

    if (idx < 0) {
      if (ignoreInvalid) { return; }
      throw new Error('Rules manager: invalid rule name ' + name);
    }
    this.__rules__[idx].enabled = true;
    result.push(name);
  }, this);

  this.__cache__ = null;
  return result;
};
Enables the rules named in list without affecting other rules, and returns the names of the rules that were found and enabled.
-
enableOnly
Ruler.prototype.enableOnly = function (list, ignoreInvalid) {
  if (!Array.isArray(list)) { list = [ list ]; }

  this.__rules__.forEach(function (rule) { rule.enabled = false; });

  this.enable(list, ignoreInvalid);
};
Disables all rules first, then enables only the rules named in list.
-
getRules
Ruler.prototype.getRules = function (chainName) {
  if (this.__cache__ === null) {
    this.__compile__();
  }

  return this.__cache__[chainName] || [];
};
Returns the array of fn functions for the given chain name, compiling the cache first if necessary. An unknown chain name yields an empty array.
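The methods above can be exercised together in a small sketch. It assumes a hypothetical MiniRuler class that re-implements only push, before, enable, enableOnly, and getRules in condensed form (it is not the real lib/ruler.js), and the rule names, the chain name, and the runChain helper are made up for illustration. runChain shows one plausible way for a caller to walk a chain: try each rule in order and stop at the first one that reports success.

```javascript
// Hypothetical condensed re-implementation of the Ruler methods discussed above.
class MiniRuler {
  constructor() {
    this.rules = [];
    this.cache = null;
  }
  find(name) {
    return this.rules.findIndex(function (rule) { return rule.name === name; });
  }
  push(name, fn, options) {
    this.rules.push({ name, enabled: true, fn, alt: (options || {}).alt || [] });
    this.cache = null;
  }
  before(beforeName, name, fn, options) {
    const index = this.find(beforeName);
    if (index === -1) { throw new Error('Parser rule not found: ' + beforeName); }
    this.rules.splice(index, 0, { name, enabled: true, fn, alt: (options || {}).alt || [] });
    this.cache = null;
  }
  enable(list) {
    [].concat(list).forEach((name) => {
      const index = this.find(name);
      if (index < 0) { throw new Error('Rules manager: invalid rule name ' + name); }
      this.rules[index].enabled = true;
    });
    this.cache = null;
  }
  enableOnly(list) {
    this.rules.forEach(function (rule) { rule.enabled = false; });
    this.enable(list);
  }
  getRules(chainName) {
    if (this.cache === null) {
      // Lazy compile: the default chain '' plus one chain per alt name.
      this.cache = { '': [] };
      for (const rule of this.rules) {
        if (!rule.enabled) { continue; }
        this.cache[''].push(rule.fn);
        for (const alt of rule.alt) {
          (this.cache[alt] = this.cache[alt] || []).push(rule.fn);
        }
      }
    }
    return this.cache[chainName] || [];
  }
}

// Made-up rules: each returns true when it handles the input.
const heading = (src) => src.startsWith('#');
const hr = (src) => src === '---';
const paragraph = () => true;

// Walk a chain and return the index of the first rule that succeeds.
function runChain(fns, src) {
  return fns.findIndex(function (fn) { return fn(src); });
}

const ruler = new MiniRuler();
ruler.push('heading', heading, { alt: [ 'paragraph' ] });
ruler.push('paragraph', paragraph);
ruler.before('paragraph', 'hr', hr, { alt: [ 'paragraph' ] });

console.log(ruler.getRules('').length);           // 3
console.log(ruler.getRules('paragraph').length);  // 2
console.log(runChain(ruler.getRules(''), '---')); // 1

ruler.enableOnly([ 'paragraph' ]);
console.log(ruler.getRules('').length);           // 1
console.log(ruler.getRules('paragraph').length);  // 0
```

Every mutating method resets the cache to null, so the chains are rebuilt lazily on the next getRules call. That keeps rule registration cheap and still guarantees that callers always see an up-to-date chain.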
Ruler summary
As we can see, Ruler is quite flexible. Methods such as at, before, after, and enable give it a great deal of flexibility and extensibility, and users can take advantage of this design to meet their own specific requirements.
Conclusion
Having analyzed the base classes Token and Ruler, we can dig further into the MarkdownIt source code. In future articles, we'll look at how tokens are generated by parsing the src string, and how the tokens are passed to renderer.render and output as the final string. In the next article, we will analyze MarkdownIt's entry-point parser, ParserCore.