Preface

In the first article of this series, we explained how VuePress makes Markdown support Vue components, but we did not cover how the rest of the Markdown, the non-Vue part, is parsed.

Today, we'll look at how VuePress uses markdown-it to parse Markdown.

Introduction to markdown-it

markdown-it is a library for parsing Markdown. It can, for example, convert # test into <h1>test</h1>.

It runs in both the browser and Node.js, and is similar in nature to Babel; the difference is that Babel parses JavaScript, whereas markdown-it parses Markdown.

Speaking of parsing, Markdown-it has an online example that allows you to visualize the results of markdown parsing. For example, using # test again, you get the following result:

[
  {
    "type": "heading_open",
    "tag": "h1",
    "attrs": null,
    "map": [0, 1],
    "nesting": 1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "#",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "inline",
    "tag": "",
    "attrs": null,
    "map": [0, 1],
    "nesting": 0,
    "level": 1,
    "children": [
      {
        "type": "text",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "test",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      }
    ],
    "content": "test",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "heading_close",
    "tag": "h1",
    "attrs": null,
    "map": null,
    "nesting": -1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "#",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  }
]

In other words, tokenization turns the Markdown source into an array of tokens.

We could also execute the following code manually to get the same result:

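const MarkdownIt = require('markdown-it')
const md = new MarkdownIt()
// parse(src, env): env is a sandbox object shared by the rules; an empty one is enough here
const tokens = md.parse('# test', {})
console.log(tokens)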

Main API Introduction

Modes

markdown-it provides three presets: commonmark, default, and zero. They correspond to strict CommonMark parsing, GFM-like parsing, and a mode in which all rules are disabled so that you can enable only the ones you need.

Parsing

markdown-it's parsing rules fall roughly into two groups: block and inline. MarkdownIt.block is an instance of ParserBlock and MarkdownIt.inline is an instance of ParserInline. MarkdownIt.renderer.render and MarkdownIt.renderer.renderInline generate the HTML from the tokens produced by the block and inline rules, respectively.

Rules

Renderer has a special attribute: rules, which represents the rendering rules for tokens and can be updated or extended by the user:

var md = require('markdown-it')();
md.renderer.rules.strong_open = function () { return '<b>'; };
md.renderer.rules.strong_close = function () { return '</b>'; };
var result = md.renderInline(...);

For example, this code replaces the rules used to render strong_open and strong_close tokens, so that bold text is emitted as <b> instead of <strong>.
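To make the role of rules more concrete, here is a toy renderer written from scratch. It is not markdown-it's implementation: renderTokens, defaultRender, and the hand-written token list are hypothetical. Only the idea is taken from the text above: look up a rule by token type and, if none is registered, fall back to the token's tag.

```javascript
// Toy model of a rule-driven renderer; not markdown-it's real code.
// Tokens are hand-written and carry only the fields this sketch needs.
const sampleTokens = [
  { type: 'strong_open', tag: 'strong', nesting: 1, content: '' },
  { type: 'text', tag: '', nesting: 0, content: 'bold' },
  { type: 'strong_close', tag: 'strong', nesting: -1, content: '' }
]

// Fallback used when no rule is registered for a token type.
function defaultRender (token) {
  if (token.type === 'text') return token.content
  return token.nesting === -1 ? `</${token.tag}>` : `<${token.tag}>`
}

// Render each token with its rule if one exists, else with the fallback.
function renderTokens (tokens, rules) {
  return tokens
    .map((token, idx) => {
      const rule = rules[token.type]
      return rule ? rule(tokens, idx) : defaultRender(token)
    })
    .join('')
}

// Without custom rules the tag stored on the token is used.
const plain = renderTokens(sampleTokens, {})

// With custom rules for strong_open / strong_close, <b> is emitted instead.
const custom = renderTokens(sampleTokens, {
  strong_open: () => '<b>',
  strong_close: () => '</b>'
})
```

Because rules are looked up by token type, overriding two entries is enough to change the output without touching the parser.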

Plug-in system

The markdown-it maintainers put it this way:

We do a markdown parser. It should keep the “markdown spirit”. Other things should be kept separate, in plugins, for example. We have no clear criteria, sorry. Probably, you will find CommonMark forum a useful read to understand us better.

In a nutshell, Markdown-it does pure Markdown parsing, and if you want more functionality you’ll have to write your own plugins.

To support this, markdown-it provides an API: MarkdownIt.use

It can load the specified plug-in into the current parser instance:

var iterator = require('markdown-it-for-inline');
var md = require('markdown-it')()
            .use(iterator, 'foo_replace', 'text', function (tokens, idx) {
              tokens[idx].content = tokens[idx].content.replace(/foo/g, 'bar');
            });

This example code replaces all foo in the Markdown code with bar.
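The mechanics behind use can be sketched with a toy parser. The assumption here is only that use calls the plugin with the parser instance followed by any extra parameters and then returns the instance, which is what makes chaining possible. ToyParser, textRules, process, and replacePlugin are hypothetical names invented for this sketch.

```javascript
// Toy parser object; only the `use` mechanics are modelled here.
class ToyParser {
  constructor () {
    this.textRules = []
  }

  // Call the plugin with the instance plus any extra parameters,
  // then return the instance so that calls can be chained.
  use (plugin, ...params) {
    plugin(this, ...params)
    return this
  }

  // Run every registered text rule over the input, in order.
  process (text) {
    return this.textRules.reduce((out, rule) => rule(out), text)
  }
}

// Hypothetical plugin: registers a rule replacing `from` with `to`.
function replacePlugin (parser, from, to) {
  parser.textRules.push(text => text.split(from).join(to))
}

const parser = new ToyParser()
  .use(replacePlugin, 'foo', 'bar')
  .use(replacePlugin, 'baz', 'qux')

const replaced = parser.process('foo and baz, foo again')
```

A plugin is therefore just a function that receives the instance and mutates it, for example by registering or replacing rules.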

For more information

You can consult the Chinese translation of the documentation, produced during the National Day holiday, or the official API documentation.

Applications in VuePress

VuePress uses community markdown-it plugins for features such as code highlighting, code block wrapping, and emoji. It also ships its own markdown-it plugins, for example to recognize Vue components and to render internal and external links differently.

Related source code

This article was written during the 2018 National Day holiday; the corresponding VuePress version is v1.0.0-alpha.4.

The entry point

The source code mainly does the following five things:

  1. Use community plugins, such as emoji recognition, anchors, and TOC.
  2. Use custom plugins, more on that later.
  3. Use markdown-it-chain to support chained configuration of markdown-it, similar to the webpack-chain mentioned in the second article.
  4. Accept beforeInstantiate and afterInstantiate hooks as options, which exposes the markdown-it instance to outside code.
  5. Customize render with dataReturnable:
module.exports.dataReturnable = function dataReturnable (md) {
  // override render to allow custom plugins return data
  const render = md.render
  md.render = (...args) => {
    md.__data = {}
    const html = render.call(md, ...args)
    return {
      html,
      data: md.__data
    }
  }
}

This allows __data to be used as a global variable, storing the data used by each plug-in.
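The effect of this wrapper can be tried without markdown-it. In the sketch below, fakeMd is a stub standing in for a markdown-it instance (an assumption made only for this demo); its render method records headings in __data the way a plugin might.

```javascript
// Same wrapper as in the article, reproduced so the sketch runs on its own.
function dataReturnable (md) {
  const render = md.render
  md.render = (...args) => {
    md.__data = {} // reset the shared store before every render
    const html = render.call(md, ...args)
    return { html, data: md.__data }
  }
}

// Stub standing in for a markdown-it instance (hypothetical).
// Its render() records every line starting with "# " in __data.titles.
const fakeMd = {
  render (src) {
    const titles = this.__data.titles || (this.__data.titles = [])
    return src
      .split('\n')
      .map(line => {
        if (line.startsWith('# ')) {
          titles.push(line.slice(2))
          return `<h1>${line.slice(2)}</h1>`
        }
        return `<p>${line}</p>`
      })
      .join('\n')
  }
}

dataReturnable(fakeMd)
const first = fakeMd.render('# Intro\nhello')
const second = fakeMd.render('plain text')
```

Each call returns its own data object, so data collected while rendering one file does not leak into the next.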

Identifying Vue components

The source code

It does just one thing: it replaces the default html_block rule so that custom Vue components can be used at the root level.

module.exports = md => {
  md.block.ruler.at('html_block', htmlBlock)
}

What are the key differences between this htmlBlock function and html_block in native Markdown-it?

The answer is that it adds two entries to the HTML_SEQUENCES array of regular expressions:

// PascalCase Components
[/^<[A-Z]/, />/, true],
// custom elements with hyphens
[/^<\w+\-/, />/, true]

These entries match components whose tag names are written in PascalCase and components whose tag names contain a hyphen.
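A quick way to see what the two opening patterns accept is to run them against a few lines. The tag names below are made up for illustration, and looksLikeVueComponent is a hypothetical helper, not part of VuePress.

```javascript
// The two opening patterns quoted above.
const PASCAL_CASE = /^<[A-Z]/
const HYPHENATED = /^<\w+\-/

// Hypothetical helper: does this line start a Vue component tag?
function looksLikeVueComponent (line) {
  return PASCAL_CASE.test(line) || HYPHENATED.test(line)
}

// Sample inputs are invented for this sketch.
const checks = {
  pascal: looksLikeVueComponent('<MyButton/>'),
  hyphen: looksLikeVueComponent('<my-button/>'),
  nativeTag: looksLikeVueComponent('<div>'),
  text: looksLikeVueComponent('plain text')
}
```

Lowercase tags without a hyphen, such as div, are left to the rules that markdown-it already has for native HTML.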

Content blocks

The source code

This plugin uses the community markdown-it-container plugin to define render functions for the tip, warning, danger, and v-pre containers:

render (tokens, idx) {
  const token = tokens[idx]
  const info = token.info.trim().slice(klass.length).trim()
  if (token.nesting === 1) {
    return `<div class="${klass} custom-block"><p class="custom-block-title">${info || defaultTitle}</p>\n`
  } else {
    return `</div>\n`
  }
}

Two properties of the token need explaining here.

  1. info: the string that follows the three backticks of a fence.

  2. nesting:

  • 1 means the tag is opening.
  • 0 means the tag is self-closing.
  • -1 means the tag is closing.
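The way info and nesting drive the render function can be tried on hand-written tokens. In this sketch createContainerRender is a hypothetical factory that takes klass and defaultTitle as parameters, and the token shapes are assumptions; for a container, info is taken to hold the text after the opening marker.

```javascript
// Factory returning a render function like the one shown above.
// `klass` and `defaultTitle` are parameters so the sketch is standalone.
function createContainerRender (klass, defaultTitle) {
  return function render (tokens, idx) {
    const token = tokens[idx]
    // drop the container name from info; what remains is the custom title
    const info = token.info.trim().slice(klass.length).trim()
    if (token.nesting === 1) {
      // opening token: emit the wrapper and the title
      return `<div class="${klass} custom-block"><p class="custom-block-title">${info || defaultTitle}</p>\n`
    }
    // closing token (nesting === -1): close the wrapper
    return `</div>\n`
  }
}

// Hand-written tokens for a tip container titled "Heads up" (assumed shape).
const containerTokens = [
  { type: 'container_tip_open', nesting: 1, info: ' tip Heads up' },
  { type: 'container_tip_close', nesting: -1, info: '' }
]

const renderTip = createContainerRender('tip', 'TIP')
const opened = renderTip(containerTokens, 0)
const closed = renderTip(containerTokens, 1)
```

The same function handles both tokens; nesting alone decides whether the wrapper is opened or closed.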

Highlighting code

The source code

  1. It relies on the PrismJS library.
  2. It treats vue and html as the same language:
if (lang === 'vue' || lang === 'html') {
	lang = 'markup'
}
  3. It accepts language abbreviations such as md, ts, and py.
  4. It passes the highlighted code through a wrap function to add an outer layer:
function wrap (code, lang) {
  if (lang === 'text') {
    code = escapeHtml(code)
  }
  return `<pre v-pre class="language-${lang}"><code>${code}</code></pre>`
}
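The language normalization in steps 2 and 3 can be sketched as a small lookup. The alias table and the normalizeLang helper are hypothetical, as is the fallback to text for an empty language; the real list of abbreviations may differ. escapeHtml is a minimal stand-in for the helper that wrap relies on.

```javascript
// Hypothetical alias table; the real list of abbreviations may differ.
const LANG_ALIASES = new Map([
  ['md', 'markdown'],
  ['ts', 'typescript'],
  ['py', 'python']
])

// Normalize the language named on the fence (assumed behaviour).
function normalizeLang (lang) {
  const lower = (lang || 'text').toLowerCase()
  if (lower === 'vue' || lower === 'html') return 'markup'
  return LANG_ALIASES.get(lower) || lower
}

// Minimal stand-in for the escapeHtml helper used by wrap().
function escapeHtml (str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// wrap() as shown in the article.
function wrap (code, lang) {
  if (lang === 'text') {
    code = escapeHtml(code)
  }
  return `<pre v-pre class="language-${lang}"><code>${code}</code></pre>`
}

// With no language given, the code is escaped rather than highlighted.
const wrapped = wrap('<b>hi</b>', normalizeLang(''))
```

Only plain text is escaped inside wrap, presumably because highlighted code has already been turned into safe markup by the highlighter.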

Highlight lines of code

The source code

  1. Based on someone else’s code.
  2. It overrides the md.renderer.rules.fence method. The key step is a regular expression that extracts the line numbers to highlight:
const RE = /{([\d,-]+)}/

const lineNumbers = RE.exec(rawInfo)[1]
  .split(',')
  .map(v => v.split('-').map(v => parseInt(v, 10)))

Then it renders conditionally:

if (inRange) {
  return `<div class="highlighted">&nbsp;</div>`
}
return '<br>'

Finally, it returns the highlighted-line layer together with the normal code.
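The range parsing and the in-range test can be sketched as follows. parseLineRanges and isHighlighted are hypothetical helper names, and the info string js{1,3-5} is an invented example; only the regular expression and the split logic come from the snippet above.

```javascript
const RE = /{([\d,-]+)}/

// Parse "js{1,3-5}" into [[1], [3, 5]]; returns [] when no braces are present.
function parseLineRanges (rawInfo) {
  const match = RE.exec(rawInfo)
  if (!match) return []
  return match[1]
    .split(',')
    .map(v => v.split('-').map(n => parseInt(n, 10)))
}

// A line is highlighted if it equals a single number or falls inside a range.
function isHighlighted (lineNumber, ranges) {
  return ranges.some(([start, end]) =>
    end === undefined
      ? lineNumber === start
      : lineNumber >= start && lineNumber <= end
  )
}

const ranges = parseLineRanges('js{1,3-5}')
const highlighted = [1, 2, 3, 4, 5, 6].filter(n => isHighlighted(n, ranges))
```

Unlike the snippet in the article, this sketch guards against an info string without braces, where RE.exec would return null.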

Script hoisting

The source code

It overrides the md.renderer.rules.html_block rule:

const RE = /^<(script|style)(?=(\s|>|$))/i

md.renderer.rules.html_block = (tokens, idx) => {
	const content = tokens[idx].content
	const hoistedTags = md.__data.hoistedTags || (md.__data.hoistedTags = [])
	if (RE.test(content.trim())) {
	  hoistedTags.push(content)
	  return ''
	} else {
	  return content
	}
}

Store style and script tags in a pseudo-global variable called __data. This data will be used in the markdownLoader.
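The behaviour of the hoisting regular expression can be checked on a few HTML blocks. partitionBlocks is a hypothetical helper that separates the blocks instead of writing to md.__data, and the sample blocks are invented.

```javascript
const HOIST_RE = /^<(script|style)(?=(\s|>|$))/i

// Split html_block contents into hoisted tags and content rendered in place.
function partitionBlocks (blocks) {
  const hoistedTags = []
  const inPlace = []
  for (const content of blocks) {
    if (HOIST_RE.test(content.trim())) {
      hoistedTags.push(content)
    } else {
      inPlace.push(content)
    }
  }
  return { hoistedTags, inPlace }
}

// Sample blocks are made up for this sketch.
const result = partitionBlocks([
  '<script>export default {}</script>\n',
  '<div class="note">hi</div>\n',
  '<style scoped>p { color: red }</style>\n',
  '<scripted-widget/>\n'
])
```

The lookahead requires whitespace, a closing bracket, or the end of input after the tag name, so a tag that merely starts with "script" is not hoisted.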

Line numbers

The source code

It overrides the md.renderer.rules.fence rule, counts the lines of code from the number of line breaks, and wraps the result in another layer:

const lines = code.split('\n')
const lineNumbersCode = [...Array(lines.length - 1)]
  .map((line, index) => `<span class="line-number">${index + 1}</span><br>`).join('')

const lineNumbersWrapperCode =
  `<div class="line-numbers-wrapper">${lineNumbersCode}</div>`

Finally, it assembles the final code:

const finalCode = rawCode
  .replace('<!--beforeend-->', `${lineNumbersWrapperCode}<!--beforeend-->`)
  .replace('extra-class', 'line-numbers-mode')

return finalCode
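Put together, the two steps look roughly like this. buildLineNumbers and addLineNumbers are hypothetical wrappers around the snippets above, and the rawCode string is an invented example of already wrapped output containing the beforeend marker.

```javascript
// Build the line-number gutter for a highlighted block.
// `code` is expected to end with a newline, hence `lines.length - 1`.
function buildLineNumbers (code) {
  const lines = code.split('\n')
  const lineNumbersCode = [...Array(lines.length - 1)]
    .map((line, index) => `<span class="line-number">${index + 1}</span><br>`)
    .join('')
  return `<div class="line-numbers-wrapper">${lineNumbersCode}</div>`
}

// Inject the gutter before the <!--beforeend--> marker and swap the class.
function addLineNumbers (rawCode, code) {
  return rawCode
    .replace('<!--beforeend-->', `${buildLineNumbers(code)}<!--beforeend-->`)
    .replace('extra-class', 'line-numbers-mode')
}

// Hypothetical wrapper output for a two-line block.
const rawCode =
  '<div class="language-js extra-class"><pre><code>a\nb\n</code></pre><!--beforeend--></div>'
const withNumbers = addLineNumbers(rawCode, 'a\nb\n')
```

The comment marker acts as an insertion point, so this plugin does not need to know how the wrapper was produced.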

Distinguishing internal and external links

The source code

A link can point either inside the site or outside it. VuePress distinguishes the two, and an external link ends up rendering one more icon than an internal link.

To do this, VuePress overrides the md.renderer.rules.link_open and md.renderer.rules.link_close rules.

First, md.renderer.rules.link_open:

if (isExternal) {
	Object.entries(externalAttrs).forEach(([key, val]) => {
		token.attrSet(key, val)
	})
	if (/_blank/i.test(externalAttrs['target'])) {
		hasOpenExternalLink = true
	}
} else if (isSourceLink) {
	hasOpenRouterLink = true
	tokens[idx] = toRouterLink(token, link)
}

If isSourceLink is true, the link is internal, and the whole token is replaced with the result of toRouterLink:

function toRouterLink (token, link) {
	link[0] = 'to'
	let to = link[1]

	// convert link to filename and export it for existence check
	const links = md.__data.links || (md.__data.links = [])
	links.push(to)

	const indexMatch = to.match(indexRE)
	if (indexMatch) {
	  const [, path, , hash] = indexMatch
	  to = path + hash
	} else {
	  to = to
	    .replace(/\.md$/, '.html')
	    .replace(/\.md(#.*)$/, '.html$1')
	}

	// relative path usage.
	if (!to.startsWith('/')) {
	  to = ensureBeginningDotSlash(to)
	}

	// markdown-it encodes the uri
	link[1] = decodeURI(to)

	// export the router links for testing
	const routerLinks = md.__data.routerLinks || (md.__data.routerLinks = [])
	routerLinks.push(to)

	return Object.assign({}, token, {
		tag: 'router-link'
	})
}

The href attribute is renamed to to, and its value is rewritten into a valid link ending in .html.
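The path rewriting on its own can be sketched as a pure function. The article uses indexRE and ensureBeginningDotSlash without showing them, so the definitions below are guesses made for illustration, and toRouterPath is a hypothetical name; the sample links are invented.

```javascript
// Assumed helpers: not shown in the article, defined here as guesses.
const indexRE = /(^|.*\/)(index|readme)\.md(#?.*)$/i

function ensureBeginningDotSlash (path) {
  return /^\.\//.test(path) ? path : './' + path
}

// Turn a Markdown link target into the value of router-link's `to` prop.
function toRouterPath (href) {
  let to = href
  const indexMatch = to.match(indexRE)
  if (indexMatch) {
    // README.md / index.md map to the directory itself
    const [, path, , hash] = indexMatch
    to = path + hash
  } else {
    to = to
      .replace(/\.md$/, '.html')
      .replace(/\.md(#.*)$/, '.html$1')
  }
  // relative links get an explicit "./" prefix
  if (!to.startsWith('/')) {
    to = ensureBeginningDotSlash(to)
  }
  // markdown-it encodes the URI, so decode it again
  return decodeURI(to)
}

const examples = [
  '/guide/README.md',
  '/guide/markdown.md#emoji',
  'config.md'
].map(toRouterPath)
```

Under these assumptions an index file resolves to its directory, other files get an .html extension, and any hash is preserved.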

Next, md.renderer.rules.link_close:

if (hasOpenRouterLink) {
  token.tag = 'router-link'
  hasOpenRouterLink = false
}
if (hasOpenExternalLink) {
  hasOpenExternalLink = false
  // add OutBoundLink to the beforeend of this link if it opens in _blank.
  return '<OutboundLink/>' + self.renderToken(tokens, idx, options)
}
return self.renderToken(tokens, idx, options)

As you can see, internal links are rendered as router-link tags, while external links get an additional OutboundLink tag, which is the component that adds the small icon.

Code block wrapping

The source code

This plugin overrides the md.renderer.rules.fence method to wrap the <pre> tag in another layer:
// `fence` is assumed to hold the original md.renderer.rules.fence, saved beforehand
md.renderer.rules.fence = (...args) => {
	const [tokens, idx] = args
	const token = tokens[idx]
	const rawCode = fence(...args)
	return `<!--beforebegin--><div class="language-${token.info.trim()} extra-class">` +
	`<!--afterbegin-->${rawCode}<!--beforeend--></div><!--afterend-->`
}

The fence output is divided by four markers: beforebegin, afterbegin, beforeend, and afterend. These serve as hooks for users who write their own markdown-it plugins.

Handling non-ASCII characters in anchors

The source code

This code was originally designed to solve the problem of anchors with Chinese or special characters not jumping correctly.

Non-ASCII characters are processed in this order: diacritics -> C0 control characters -> special characters -> runs of two or more consecutive hyphens (-) -> hyphens at the beginning or end.

Finally, a leading digit is prefixed with an underscore and everything is converted to lowercase.
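The pipeline above can be sketched as a chain of replacements. This is not the VuePress source: the character classes are assumptions, and diacritics are stripped here with String.prototype.normalize('NFD'), which is a substitute technique (it also decomposes some scripts, such as Hangul) rather than whatever helper the original code uses.

```javascript
// Assumed character classes for this sketch.
// eslint-disable-next-line no-control-regex
const rControl = /[\u0000-\u001f]/g
const rSpecial = /[\s~`!@#$%^&*()\-_+=[\]{}|\\;:"'<>,.?/]+/g
const rCombining = /[\u0300-\u036f]/g

function slugify (str) {
  return str
    // 1. strip diacritics: decompose, then drop the combining marks
    .normalize('NFD')
    .replace(rCombining, '')
    // 2. remove C0 control characters
    .replace(rControl, '')
    // 3. replace runs of special characters with a hyphen
    .replace(rSpecial, '-')
    // 4. collapse consecutive hyphens
    .replace(/-{2,}/g, '-')
    // 5. trim hyphens at the start and end
    .replace(/^-+|-+$/g, '')
    // 6. prefix a leading digit with an underscore
    .replace(/^(\d)/, '_$1')
    // 7. lowercase everything
    .toLowerCase()
}

// Sample headings are invented for this sketch.
const latinSlug = slugify('Héllo Wörld!')
const chineseSlug = slugify('2. 安装 & 配置')
```

Chinese characters pass through untouched, which is what lets anchors containing them resolve correctly.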

Importing code snippets

The source code

It adds a snippet rule before the fence rule in md.block.ruler to parse lines such as <<< @/filepath:

const start = pos + 3
const end = state.skipSpacesBack(max, pos)
const rawPath = state.src.slice(start, end).trim().replace(/^@/, root)
const filename = rawPath.split(/[{:\s]/).shift()
const content = fs.existsSync(filename) ? fs.readFileSync(filename).toString() : 'Not found: ' + filename

It extracts the file path, joins it with the root path, and reads the file's contents. Because the rule can also parse a line such as <<< @/test/markdown/fragments/snippet.js{2}, which imports a snippet with highlighted lines, it uses split to recover the real file name.
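The path handling can be isolated from the file access. parseSnippetLine is a hypothetical helper, root stands in for the project root that replaces the leading @, and the sample path is invented; the split expression is the one from the snippet above.

```javascript
// Extract the file path from a "<<< @/path{lines}" line.
// `root` stands in for the project root that replaces the leading "@".
function parseSnippetLine (line, root) {
  // skip the three "<" characters, then resolve "@" against the root
  const rawPath = line.slice(3).trim().replace(/^@/, root)
  // cut at the first "{", ":" or whitespace to drop the highlight suffix
  const filename = rawPath.split(/[{:\s]/).shift()
  // keep the highlight spec, if any, for the later highlighting step
  const highlight = /{([\d,-]+)}/.exec(rawPath)
  return { filename, highlight: highlight ? highlight[1] : null }
}

const parsed = parseSnippetLine('<<< @/docs/snippet.js{2}', '/project')
const plainPath = parseSnippetLine('<<< @/docs/snippet.js', '/project')
```

One consequence of splitting on a colon is that a path containing one would be cut short, so this sketch is only meant for paths like the ones shown.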

Conclusion

As a markup language, Markdown helps people describe things clearly. It also serves as a bridge to HTML, producing clean, minimalist pages.

markdown-it's parser, renderer, and plugin system let developers add even more to Markdown, limited only by their imagination.