Author: imyzf

Overview

"Content as structured data." (tagline on the unified website)

unified is a text-processing ecosystem: together with its plugins, it can process Markdown, HTML, natural language, and more. The unified library itself provides a single execution interface. It acts as the executor that calls plugins from the ecosystem to carry out processing tasks.

As the unified website shows, unified is widely used: projects such as Prettier, the Node.js website, and Gatsby rely on it for some of their functionality.

Figure: Projects using unified, as listed on its official website

Common usage scenarios include:

  • Generate HTML pages and sites based on Markdown
  • Markdown/HTML content processing
  • Markdown linting and formatting
  • Serving as the underlying library for tools built around a particular scenario

Because few Chinese-language articles cover unified, this article introduces the unified plugin ecosystem and how it works, and analyzes some usage examples to help readers understand what unified can do, how it works, and how to use it.

Plugin ecosystem

Figure: Plugins in the unified ecosystem

remark

remark is a collection of Markdown-related plugins that provide Markdown parsing, transformation, and conversion to HTML.

Some common plugins currently available:

  • remark-parse: parses Markdown
  • remark-gfm: adds support for GitHub Flavored Markdown (GFM)
  • remark-lint: checks Markdown code style
  • remark-toc: generates a table of contents for Markdown documents
  • remark-html: compiles Markdown into HTML

The complete list, with more than 150 plugins to choose from, can be found in remark's plugin list.

remark can be called in a project in this convenient way:

import { remark } from 'remark'

remark() // A processor with the Markdown parser and serializer already attached
  .processSync('# Hello, world!') // Process the text synchronously

This is equivalent to:

import { unified } from 'unified'
import remarkParse from 'remark-parse'
import remarkStringify from 'remark-stringify'

unified() // Use the unified interface
  .use(remarkParse)     // Use the Markdown parser plugin
  .use(remarkStringify) // Use the Markdown serializer plugin
  .processSync('# Hello, world!')

Figure: Example of remark usage and conversion
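To make the "parse, then stringify" round trip more concrete, here is roughly the mdast tree that remark-parse produces for `# Hello, world!` (position information omitted), together with a hypothetical, dependency-free `stringifyMdast` function that turns such a tree back into Markdown. It is a toy stand-in for remark-stringify that assumes only root, heading, paragraph, and text nodes; it is not remark's implementation.

```javascript
// Roughly what remark-parse yields for '# Hello, world!' (positions omitted).
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'Hello, world!' }]
    }
  ]
}

// Hypothetical toy serializer: handles only root, heading, paragraph, and text.
function stringifyMdast(node) {
  switch (node.type) {
    case 'root':
      // Separate block-level children with a blank line and end with a newline.
      return node.children.map(stringifyMdast).join('\n\n') + '\n'
    case 'heading':
      // depth 1 becomes "#", depth 2 becomes "##", and so on.
      return '#'.repeat(node.depth) + ' ' + node.children.map(stringifyMdast).join('')
    case 'paragraph':
      return node.children.map(stringifyMdast).join('')
    case 'text':
      return node.value
    default:
      throw new Error('Unsupported node type: ' + node.type)
  }
}

console.log(stringifyMdast(tree)) // prints "# Hello, world!"
```

The real serializer handles every mdast node type plus escaping and formatting options; the point here is only that the plugins on both sides of the pipeline agree on one tree format.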

Note that gnab/remark is a different GitHub project with the same name, whose website is remarkjs.com. Although it is also a Markdown-related tool, it has nothing to do with the remark in the unified ecosystem, whose website is remark.js.org. Keep this in mind to avoid confusion when searching for information.

rehype

Like remark, rehype is a collection of plugins, but for HTML; they provide HTML formatting, minification, document generation, and more.

rehype has comparatively few plugins, just over 40. A detailed list can be found in its plugin list documentation.

In addition, rehype-remark and remark-rehype convert between the two plugin systems. For example, we can convert HTML read from stdin into Markdown:

import { unified } from 'unified'
import { stream } from 'unified-stream'
import rehypeParse from 'rehype-parse'
import rehypeRemark from 'rehype-remark'
import remarkStringify from 'remark-stringify'

const processor = unified()
  .use(rehypeParse)     // Parse HTML into a syntax tree (hast)
  .use(rehypeRemark)    // Convert the HTML tree (hast) to a Markdown tree (mdast)
  .use(remarkStringify) // Convert the syntax tree to a Markdown string

process.stdin.pipe(stream(processor)).pipe(process.stdout)
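Conceptually, rehype-remark walks the hast tree and builds the matching mdast nodes. The sketch below illustrates that idea for a tiny subset (headings h1 to h6, `p`, and text). The function name `hastToMdast` is hypothetical; the real plugin is built on hast-util-to-mdast and handles far more cases, including whitespace and inline formatting.

```javascript
// Hypothetical toy converter from hast (HTML AST) to mdast (Markdown AST).
// Supports only root, h1-h6, p, and text nodes.
function hastToMdast(node) {
  if (node.type === 'root') {
    return { type: 'root', children: node.children.map(hastToMdast) }
  }
  if (node.type === 'text') {
    return { type: 'text', value: node.value }
  }
  if (node.type === 'element') {
    const match = /^h([1-6])$/.exec(node.tagName)
    if (match) {
      // <h2> becomes a heading node with depth 2, and so on.
      return {
        type: 'heading',
        depth: Number(match[1]),
        children: node.children.map(hastToMdast)
      }
    }
    if (node.tagName === 'p') {
      return { type: 'paragraph', children: node.children.map(hastToMdast) }
    }
  }
  throw new Error('Unsupported node: ' + (node.tagName || node.type))
}

// hast for: <h2>Title</h2><p>Body</p>
const hast = {
  type: 'root',
  children: [
    { type: 'element', tagName: 'h2', properties: {}, children: [{ type: 'text', value: 'Title' }] },
    { type: 'element', tagName: 'p', properties: {}, children: [{ type: 'text', value: 'Body' }] }
  ]
}

console.log(JSON.stringify(hastToMdast(hast)))
```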

Others

retext and redot are two relatively niche systems with fewer users and less active development. Their uses are:

  • retext: natural language processing, including spell checking, error correction, readability checking, and more
  • redot: parsing for Graphviz (DOT)

In addition, the Markdown domain has two systems whose names do not start with "re", MDX and micromark, each aimed at specific Markdown scenarios:

  • MDX: lets you write JSX in Markdown documents, import components into them, and create interactive documents
  • micromark: a minimal Markdown parser that supports a limited set of extensions, suited to simple Markdown-to-HTML scenarios. remark also relies on micromark for parsing

Details can be found in each project's documentation and are not covered here.

How it works

At the core of unified is the AST (abstract syntax tree). The AST is passed to each plugin when it runs, and plugins can process it in many ways. The AST can also be used to convert between languages: for example, a Markdown document can be parsed, converted to HTML for processing, and then converted back to Markdown.

Figure: Unified workflow
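The workflow can be sketched as three phases: parse text into a tree, run each plugin's transformer over the tree, then stringify the tree back into text. The toy `createProcessor` below is a hypothetical, dependency-free illustration of that idea, not unified's actual implementation. It follows unified's convention that a plugin is a function returning a transformer, and its parser (also hypothetical) only recognizes `#` headings and plain paragraphs.

```javascript
// Hypothetical toy version of a unified-style pipeline: parse -> run -> stringify.
function createProcessor({ parse, stringify }) {
  const transformers = []
  const processor = {
    // Like unified's use(): call the plugin to obtain its transformer.
    use(plugin, options) {
      const transformer = plugin(options)
      if (typeof transformer === 'function') transformers.push(transformer)
      return processor // allow chaining
    },
    processSync(text) {
      let tree = parse(text)
      for (const transform of transformers) {
        // A transformer may mutate the tree in place or return a new one.
        tree = transform(tree) || tree
      }
      return stringify(tree)
    }
  }
  return processor
}

// Toy parser: each non-empty line is a heading ("# ...") or a paragraph.
function parse(text) {
  const children = text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const match = /^(#{1,6}) (.*)$/.exec(line)
      return match
        ? { type: 'heading', depth: match[1].length, children: [{ type: 'text', value: match[2] }] }
        : { type: 'paragraph', children: [{ type: 'text', value: line }] }
    })
  return { type: 'root', children }
}

// Toy serializer matching the parser above.
function stringify(tree) {
  return tree.children
    .map((node) => (node.type === 'heading' ? '#'.repeat(node.depth) + ' ' : '') + node.children[0].value)
    .join('\n\n') + '\n'
}

// Example plugin: demote every heading by one level (capped at depth 6).
function demoteHeadings() {
  return (tree) => {
    for (const node of tree.children) {
      if (node.type === 'heading') node.depth = Math.min(node.depth + 1, 6)
    }
  }
}

const output = createProcessor({ parse, stringify })
  .use(demoteHeadings)
  .processSync('# Title\nSome text')
console.log(output)
```

Because plugins only see the tree, the same `demoteHeadings` transformer would keep working if the parser were swapped for a more complete one that produces the same node types.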

For example, a plugin can traverse the AST and print every heading node:

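import { visit } from 'unist-util-visit'

// A plugin returns a transformer; the transformer receives the syntax tree
export default function () {
  return (tree) => {
    visit(tree, 'heading', (node) => {
      console.log(node)
    })
  }
}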

The visit function in the example above comes from unist-util-visit, a utility for traversing nodes. unified's syntax trees follow a common standard called unist (Universal Syntax Tree), so the same utilities work across languages. For example, because the Markdown and HTML ASTs are based on the same standard, the same visit API can achieve the same functionality on both:

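// mdast: images are nodes of type "image"
visit(markdownAST, 'image', transformImages)
// hast: images are element nodes whose tagName is "img"
visit(htmlAST, { type: 'element', tagName: 'img' }, transformImgs)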
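To show why a shared unist format pays off, here is a minimal, hypothetical `walk` function. It is a simplified stand-in for unist-util-visit (no skip/exit controls, no string or object tests) that relies only on conventions every unist tree shares: nodes have a `type`, and parents have `children`. The same function then finds image nodes in an mdast tree and `<img>` elements in a hast tree.

```javascript
// Hypothetical minimal depth-first (preorder) walker over any unist tree.
// `test` is a predicate; `visitor` receives each matching node and its parent.
function walk(node, test, visitor, parent = null) {
  if (test(node)) visitor(node, parent)
  if (Array.isArray(node.children)) {
    for (const child of node.children) walk(child, test, visitor, node)
  }
}

// mdast for: ![Logo](logo.png) inside a paragraph
const markdownAST = {
  type: 'root',
  children: [
    { type: 'paragraph', children: [{ type: 'image', url: 'logo.png', alt: 'Logo', title: null }] }
  ]
}

// hast for: <p><img src="logo.png" alt="Logo"></p>
const htmlAST = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [{ type: 'element', tagName: 'img', properties: { src: 'logo.png', alt: 'Logo' }, children: [] }]
    }
  ]
}

const found = []
// In mdast, images are nodes of type "image" with a `url` field.
walk(markdownAST, (node) => node.type === 'image', (node) => found.push(node.url))
// In hast, images are "element" nodes whose tagName is "img".
walk(htmlAST, (node) => node.type === 'element' && node.tagName === 'img', (node) => found.push(node.properties.src))

console.log(found) // [ 'logo.png', 'logo.png' ]
```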

Usage examples

Next, I will walk through some scenarios built on the unified ecosystem to help you understand how it is used.

The Node.js website

The Node.js website uses unified for linting and for building its documentation:

  • Markdown documents are checked with remark-cli; see the scripts in its package.json
  • The documentation is built with unified; see the code in generate.mjs

dumi

dumi is a documentation tool tailored to component development, whose core function is converting Markdown documents into HTML pages. Its source code shows that it uses unified as the converter: unified is imported in remark/index.ts, which calls a series of custom and community plugins to do the processing.

Because it uses many custom plugins, dumi's source code is an excellent reference for unified plugin development. For example, link.ts shows how to handle external links in Markdown: by modifying the AST, it adds a small link icon in the generated page to tell users that the link points to an external site.

Markdown source:

[Cloud Music official website](https://music.163.com/)

Rendered HTML:

<a target="_blank" rel="noopener noreferrer" href="https://music.163.com/">Cloud Music official website<svg class="__dumi-default-external-link-icon">...</svg>
</a>
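A transformer with a similar goal can be sketched without any dependencies. The `externalLinks` plugin below is hypothetical and is not dumi's link.ts: it walks an mdast tree and, for every `link` node whose URL starts with `http://` or `https://`, sets `data.hProperties`, a field that mdast-util-to-hast (used by remark-rehype) merges into the properties of the generated `<a>` element. Adding the icon itself would require extra hast children and is left out.

```javascript
// Hypothetical remark-style plugin: mark external links so that, after
// remark-rehype, the resulting <a> elements open in a new tab.
function externalLinks() {
  return (tree) => {
    const stack = [tree]
    while (stack.length > 0) {
      const node = stack.pop()
      if (node.type === 'link' && /^https?:\/\//.test(node.url)) {
        node.data = node.data || {}
        // hProperties are merged into the properties of the resulting hast element.
        node.data.hProperties = {
          ...node.data.hProperties,
          target: '_blank',
          rel: 'noopener noreferrer'
        }
      }
      if (Array.isArray(node.children)) stack.push(...node.children)
    }
  }
}

// mdast for: [Cloud Music official website](https://music.163.com/) and [docs](/docs)
const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'link', url: 'https://music.163.com/', title: null, children: [{ type: 'text', value: 'Cloud Music official website' }] },
        { type: 'text', value: ' and ' },
        { type: 'link', url: '/docs', title: null, children: [{ type: 'text', value: 'docs' }] }
      ]
    }
  ]
}

externalLinks()(tree)
console.log(tree.children[0].children[0].data.hProperties)
// { target: '_blank', rel: 'noopener noreferrer' }
```

The relative link `/docs` is left untouched, since only absolute http(s) URLs are treated as external in this sketch.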

react-markdown

As part of the remark ecosystem, react-markdown is a higher-level package built on unified that provides a React component for rendering Markdown. In React, it is safer, more reliable, and easier to use than converting Markdown to HTML with remark and rendering the result through dangerouslySetInnerHTML, because it builds React elements from the syntax tree instead of injecting an HTML string.

Figure: How react-markdown works

The diagram above shows how react-markdown works:

  1. remark parses the Markdown into its AST, mdast
  2. remark plugins process the mdast
  3. remark-rehype converts the mdast into the HTML AST, hast
  4. rehype plugins process the hast
  5. A React component renders the hast as React elements

This whole pipeline is in fact a general approach to rendering Markdown as HTML, and it can serve as a reference when implementing similar libraries.
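Step 5 is essentially a recursive mapping from hast nodes to React elements. The sketch below assumes a function with the same shape as React's `React.createElement(type, props, ...children)`; to keep it runnable without React, it uses a hypothetical `createElement` stand-in that returns plain objects. `hastToReact` is also hypothetical, and react-markdown's real implementation handles component overrides, URL sanitizing, and many details omitted here.

```javascript
// Hypothetical stand-in for React.createElement, returning plain objects so
// the example runs without React.
function createElement(type, props, ...children) {
  return { type, props: { ...props, children } }
}

// Recursively turn a hast node into elements; text nodes become strings.
function hastToReact(node, h = createElement) {
  if (node.type === 'text') return node.value
  // Comments, doctypes, and other node types are dropped in this sketch.
  if (node.type !== 'root' && node.type !== 'element') return null
  const children = node.children.map((child) => hastToReact(child, h))
  if (node.type === 'root') return h('div', null, ...children) // wrap the root in a div
  return h(node.tagName, node.properties, ...children)
}

// hast for: <p>Hello <strong>world</strong></p>
const hast = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'text', value: 'Hello ' },
        { type: 'element', tagName: 'strong', properties: {}, children: [{ type: 'text', value: 'world' }] }
      ]
    }
  ]
}

console.log(JSON.stringify(hastToReact(hast)))
```

With React available, `hastToReact(tree, React.createElement)` would produce real elements, although real code must also adapt hast property values to what React expects (for example, hast stores `className` as an array of strings).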

About the author

There are currently 333 open source projects in the unified ecosystem (as of 2022-01-05), and Titus Wormer is the core developer. According to his website, Wormer is from the Netherlands and is a graduate and former lecturer of the University of Applied Sciences in Amsterdam. As a full-time open source contributor, he has maintained more than 535 projects and devotes 50% of his time and effort to unified. It is admirable that one person can contribute so much to the open source community. See the unified collective documentation for more on how he manages the unified organization.

This article was published by the NetEase Cloud Music front-end team; reproduction in any form without authorization is prohibited. We recruit front-end, iOS, and Android engineers all year round. If you are thinking about changing jobs and you like Cloud Music, join us: grp.music-fe (at) corp.netease.com