How to build a static site building tool for open source projects

All code covered in this article can be found in docsite's open source repository, github.com/txd-team/do… If you find it helpful, you are welcome to star the repository and follow us.

Background

With the rise of static hosting services such as GitHub Pages, the static generation plus hosting approach has shown a series of advantages: low requirements on the hosting environment, simple maintenance, and easy integration with version control, all without giving up flexibility. These advantages have driven rapid growth of static site generators in recent years, and a number of excellent ones have emerged.

The author is responsible for building the open source sites of the whole department, and development efficiency cannot improve without a suitable tool. A tool for building these sites must meet the following requirements:

  • Simple and easy to learn
  • Both PC and mobile terminals are supported
  • Support internationalization of Chinese and English
  • Support SEO
  • Support markdown documents
  • Support the pages common to open source sites: home page, documentation page, blog list page, blog details page, community page
  • Support site style customization, including site theme style, document code highlighting style customization
  • Support custom pages

After surveying a range of open source static site building tools, the author found that each lacked one feature or another, and so set about building a new one. Because it is mainly used for building static sites and supports Markdown documents, the author named the tool docsite.

Choosing the technical approach

The docsite tool

Overall, docsite needs to support site project initialization, local development, and local builds. For front-end developers, implementing a command line tool with Node.js is an effective way to do this. To that end, docsite needs to implement at least three commands: docsite init, docsite start, and docsite build.

  • docsite init: initializes the project, copies the built-in template to the current working directory, and installs the dependencies.
  • docsite start: provides a local development environment that recompiles when the related code and Markdown files change.
  • docsite build: builds the resources and generates the final deployable code.
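The three commands above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the `commands` table, the `runCli` function, and the handler bodies are hypothetical placeholders, not docsite's actual implementation.

```javascript
// Hypothetical sketch: the handlers only describe what each command would do.
const commands = {
  init: (cwd) => `copy built-in template into ${cwd} and install dependencies`,
  start: (cwd) => `watch and rebuild the site in ${cwd}`,
  build: (cwd) => `generate production files for ${cwd}`,
};

// Resolve the sub-command from an argv array shaped like process.argv.
function runCli(argv, cwd = process.cwd()) {
  const name = argv[2];
  const known = Object.prototype.hasOwnProperty.call(commands, name);
  if (!known) {
    return `Usage: docsite <${Object.keys(commands).join('|')}>`;
  }
  return commands[name](cwd);
}

console.log(runCli(['node', 'docsite', 'init'], '/tmp/my-site'));
```

Keeping the commands in a lookup table means a new sub-command only needs a new entry, and unknown input falls through to the usage message.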

Built-in template

Initially, the solution was pure JS rendering with React + HashRouter. Its advantage is simplicity: in actual development, the interaction between docsite and the site project stays simple. The disadvantage is just as obvious. HashRouter uses the hash value to distinguish pages, and Google ignores everything after #. A hashbang (#!) route is something the Google crawler can recognize: for an address such as www.example.com/ajax.html#!key=value, the crawler requests www.example.com/ajax.html?_escaped_fragment_=key=value instead. However, for the crawler to index the address, the server must return specific content for the latter URL, which is unrealistic for a static site without a back end.
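To make the hashbang mapping concrete, here is a minimal sketch of the URL rewrite described above. The function name `toEscapedFragmentUrl` is hypothetical, and escaping of special characters is deliberately left out.

```javascript
// Illustrative only: maps a hashbang URL to its "_escaped_fragment_" form.
// Escaping of special characters in the fragment is omitted for brevity.
function toEscapedFragmentUrl(url) {
  const index = url.indexOf('#!');
  if (index === -1) return url; // not a hashbang URL, leave unchanged
  const base = url.slice(0, index);
  const fragment = url.slice(index + 2);
  const separator = base.includes('?') ? '&' : '?';
  return `${base}${separator}_escaped_fragment_=${fragment}`;
}

console.log(toEscapedFragmentUrl('www.example.com/ajax.html#!key=value'));
```

The rewritten URL carries a query string, so something must answer it dynamically, which is exactly what a static host cannot do.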

How about BrowserRouter? Its URLs look like regular URLs; the only problem is a 404 when the page is refreshed after the URL has changed. Mainstream static hosting services currently provide a custom 404 page feature: when an address on the site yields a 404 response code, the custom 404 page is returned to the client as the response.

This looks like a silver lining, but reality is harsh. Although this mechanism can work around the refresh problem, the 404 response code is unfriendly to search engines and directly hurts page indexing.

So front-end routing is a dead end, and the only option is a multi-page site. In addition, static sites are mostly hosted on GitHub Pages, which is still relatively slow to access from within China. A site rendered purely by JS must load its JS resources before rendering the page, and the whole page stays blank while the JS loads, which hurts the user experience. SEO support is also particularly important so that others can find the site easily, and the Chinese search engine Baidu is very weak at crawling JS-rendered content. Since most Chinese developers cannot use Google search smoothly, supporting Baidu is essential.

React has a number of advantages:

  • Rich lifecycle methods
  • Unified event binding
  • Manipulate the DOM by manipulating data
  • ...

But do we have to give up the convenience of React in order to support SEO and reduce blank-screen time?

No. React offers server-side rendering through ReactDOMServer, which can turn React components into HTML strings. To generate the HTML files we also need a template engine; this project uses EJS.

The technical implementation

Project directory

With the technical solution settled, the next step is to plan the directory structure of the site. Using ES6 + React while supporting SEO and internationalization, the final template directory structure is as follows:

.
├── .babelrc
├── .docsite
├── .eslintrc
├── .gitignore
├── README.md
├── blog
│   ├── en-us
│   └── zh-cn
├── docs
│   ├── en-us
│   └── zh-cn
├── gulpfile.js
├── img
├── package-lock.json
├── package.json
├── redirect.ejs
├── site_config
│   ├── blog.js
│   ├── community.jsx
│   ├── docs.js
│   ├── home.jsx
│   └── site.js
├── src
│   ├── components
│   ├── markdown.scss
│   ├── pages
│   │   ├── blog
│   │   ├── blogDetail
│   │   ├── community
│   │   ├── documentation
│   │   └── home
│   ├── reset.scss
│   └── variables.scss
├── template.ejs
├── utils
│   └── index.js
└── webpack.config.js

The main files and folders are explained below, from top to bottom.

.docsite

Empty file used to determine whether the current project has been initialized.
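A marker-file check like this can be sketched with the Node.js standard library alone. The function names `isInitialized` and `markInitialized` are hypothetical; the article only states that the empty .docsite file signals an initialized project.

```javascript
const fs = require('fs');
const path = require('path');

// Returns true when the .docsite marker file exists in the project directory.
function isInitialized(projectDir) {
  return fs.existsSync(path.join(projectDir, '.docsite'));
}

// Creates the empty marker file. A real init step would also copy the
// built-in template and install dependencies (not shown here).
function markInitialized(projectDir) {
  fs.writeFileSync(path.join(projectDir, '.docsite'), '');
}
```

Commands such as start and build could call `isInitialized(process.cwd())` first and refuse to run in a directory that was never initialized.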

template.ejs

Template for generating all HTML pages; changes to it affect every page (except the redirect pages).

redirect.ejs

Template for the redirect pages, in which you can configure the redirect logic. By default, index.html and 404.html (for the custom 404 page feature of some static hosting platforms) are generated from this template in the project root directory.

blog

Stores the blog's Markdown documents and related image resources, in separate directories for Chinese and English.

docs

Stores the documentation's Markdown documents and related image resources, in separate directories for Chinese and English.

img

Stores site images that are not used by Markdown documents; its system subdirectory holds images unrelated to the business content.

site_config

site.js configures global data, and the other files configure the language packs for the different pages in the pages directory.

src

markdown.scss is the style file for markdown documents, variables.scss holds shared SCSS variables, components holds shared components, pages holds the pages of the site, and utils stores shared utility methods.

Internationalization

Internationalization is divided into two parts: internationalization of the Markdown documents and internationalization of the rest of the site.

  • Internationalization of Markdown documents

Markdown documents are divided into documentation and blog posts, stored in the zh-cn and en-us directories according to language version.

  • Internationalization of the rest of the site

By configuring the language packs for the different pages in the site_config directory, the copy for the current language version can be read, which achieves internationalization.
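As a rough illustration of the language pack idea, the sketch below keys the copy by language code. The shape of `homeConfig`, its field names, and the `getCopy` helper are assumptions made for this example; the real files in site_config may be organized differently.

```javascript
// Hypothetical language pack, shaped like a file in site_config might be.
const homeConfig = {
  'zh-cn': { title: '首页', button: '快速开始' },
  'en-us': { title: 'Home', button: 'Get started' },
};

// Pick the copy for a language, falling back to a default language.
function getCopy(config, language, defaultLanguage = 'en-us') {
  return config[language] || config[defaultLanguage];
}

console.log(getCopy(homeConfig, 'zh-cn').title);
```

A page component can then render `getCopy(homeConfig, language)` without knowing which languages exist.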

File change monitor

Webpack's monitoring of JSX and SCSS changes already occupies one process. What about changes to Markdown files and EJS templates: must a separate process be started by hand? No, Node.js can start a child process that watches the Markdown documents and templates. So how is file watching implemented?

The Node.js standard library provides fs.watch and fs.watchFile for file monitoring, but they have the following problems:

  • File names are not reported on OS X
  • No events are reported when editors such as Sublime are used on OS X
  • Events are often reported twice
  • Most changes are reported as rename
  • You cannot simply recursively monitor a file tree
  • The CPU usage is high
  • There are plenty of other problems

A library dedicated to file monitoring was needed to address these shortcomings, and Chokidar was the perfect candidate for the task. The way to use it is simple. We just need to listen for files to be added, modified, and deleted.


const chokidar = require('chokidar');
const log = console.log.bind(console);

// Watch files, directories, or globs; dotfiles are ignored
const watcher = chokidar.watch('file, dir, glob, or array', {
  ignored: /(^|[\/\\])\../,
  persistent: true
});

watcher
  .on('add', path => log(`File ${path} has been added`))
  .on('change', path => log(`File ${path} has been changed`))
  .on('unlink', path => log(`File ${path} has been removed`));

When adding, modifying, and deleting files, execute the corresponding commands.
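One way to picture "execute the corresponding commands" is a small function that maps a watcher event and file path to a rebuild action. This is a sketch under assumptions: the `planRebuild` name and the action labels are hypothetical and only illustrate the decision, not docsite's real commands.

```javascript
const path = require('path');

// Decide what to rebuild for a changed file. The returned action names are
// hypothetical labels used for illustration.
function planRebuild(event, filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === '.ejs') {
    return 'regenerate-all-html'; // a template change affects every page
  }
  if (ext === '.md' || ext === '.markdown') {
    return event === 'unlink' ? 'remove-generated-page' : 'rebuild-page';
  }
  return 'ignore'; // JSX and SCSS are already handled by webpack
}

console.log(planRebuild('change', 'docs/zh-cn/demo.md'));
```

The watcher callbacks for add, change, and unlink could each pass their event name and path to such a function and then run the matching build step.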

Markdown file parsing

Metadata

For markdown files, in addition to the basic syntax, we also want to attach some extra data describing the file's content, such as title, keywords, and description. When HTML pages are generated, this data can be injected into the page to help search engines index it. This requires a convention.

The data between two lines of hyphens (at least three each) at the top of the markdown document is treated as metadata, with each key on its own line. The basic form is as follows:

---
title: demo title
keywords: keywords1,keywords2,keywords3
description: some description
---

This metadata can be easily retrieved through simple string matching.
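The string matching mentioned above might look like the following minimal sketch. The function name `splitMetadata` and the returned `{ meta, body }` shape are assumptions for illustration; docsite's own parser may differ.

```javascript
// Split a markdown source into metadata and body using plain string matching.
// The metadata block is delimited by two lines of at least three hyphens.
function splitMetadata(source) {
  const pattern = /^\s*-{3,}\r?\n([\s\S]*?)\r?\n-{3,}[ \t]*(?:\r?\n|$)/;
  const match = pattern.exec(source);
  if (!match) return { meta: {}, body: source };

  const meta = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip lines that are not "key: value"
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: source.slice(match[0].length) };
}

const sample = '---\ntitle: demo title\nkeywords: keywords1,keywords2,keywords3\n---\n\n## the title\n';
console.log(splitMetadata(sample).meta);
```

Only the first colon on a line separates key from value, so values that contain colons (such as URLs) survive intact.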

Convert to HTML string

Once you have the contents of the Markdown file, how do you convert the Markdown syntax into an HTML string? This is where markdown-it comes in. It is one of the most extensible and actively maintained Markdown parsers available, and it is easy to use:

const Mkit = require('markdown-it');
const hljs = require('highlight.js'); // For code highlighting
const md = new Mkit({
  html: true,
  linkify: true,
  highlight: function (str, lang) {
    if (lang && hljs.getLanguage(lang)) {
      try {
        // Older highlight.js signature; newer versions take (str, { language: lang })
        return hljs.highlight(lang, str).value;
      } catch (err) {
        console.log(err);
      }
    }
    return ''; // use external default escaping
  }
})
  .use(plugin1) // plugin1 and plugin2 stand for any markdown-it-* plug-ins
  .use(plugin2);

If basic syntax parsing is not sufficient, plug-ins from the ecosystem, whose names begin with markdown-it-, can further extend markdown-it.

In the end, a markdown file is parsed into a JSON file, such as /blog/zh-cn/demo.md:

---
title: demo title
keywords: keywords1,keywords2,keywords3
description: some description
---

## the title

A demo.json file is then generated in the /zh-cn/blog/ directory with the following contents:

{
  "title": "demo title",
  "keywords": "keywords1,keywords2,keywords3",
  "description": "some description",
  "__html": "<h2>the title</h2>",
  "filename": "demo.md"
}

Markdown document display styles and code highlighting

The HTML string parsed from markdown carries some classes by default. The next step is to style these classes, which others have already done for us: github.com/sindresorhu… provides GitHub-style rendering. For code highlighting, highlightjs.org/static/demo… offers a rich set of color themes to choose from.

Converting React to HTML

As mentioned earlier, to use React and still support SEO, the React code must be converted to HTML strings. The server-side rendering provided by react-dom/server makes this conversion easy, but there are a few caveats.

The front-end code uses a lot of ES6/7 syntax, JSX syntax, CSS resources, and image resources, all of which are finally bundled by webpack and its loaders to run in the browser. Node.js, however, does not support import or JSX syntax, and does not recognize module references to CSS or image files. So how are these static resources handled? We need tools and plug-ins that let Node.js load and execute this kind of code, which requires the following environment configuration.

  1. Introduce babel-polyfill, which provides the regenerator runtime and core-js to emulate a full ES6 environment.
  2. Introduce babel-register, a require hook that automatically transpiles JS files loaded via require on the fly.
  3. Introduce css-modules-require-hook, also a require hook, which handles style files only.
  4. Introduce asset-require-hook to handle image resources. Images smaller than 8K are converted to base64 strings, and larger images are converted to path references.

// Provide custom regenerator runtime and core-js
require('babel-polyfill');

// JavaScript require hook
require('babel-register')({
    extensions: ['.es6', '.es', '.jsx', '.js'],
    presets: ['es2015', 'react', 'stage-0'],
    plugins: ['transform-decorators-legacy']
});

// CSS require hook
require('css-modules-require-hook')({
    extensions: ['.scss', '.css'],
    preprocessCss: (data, filename) =>
        require('node-sass').renderSync({
            data,
            file: filename
        }).css,
    camelCase: true,
    generateScopedName: '[name]__[local]__[hash:base64:8]'
});

// Image require hook
require('asset-require-hook')({
    extensions: ['jpeg', 'jpg', 'png', 'gif', 'webp'],
    limit: 8000
});

Emulating the browser environment

The code uses some browser-specific objects, so in the Node environment these objects must be simulated or errors will occur. jsdom exists for exactly this purpose, and it can be used as follows:

const jsdom = require('jsdom');
const { JSDOM } = jsdom;
// Minimal placeholder document (an assumption; any valid HTML works here)
const dom = new JSDOM('<!doctype html><html><body></body></html>');
const { window } = dom;

// Copy every property of src that target does not define yet
const copyProps = (src, target) => {
    const props = {};
    Object.getOwnPropertyNames(src)
        .filter(prop => typeof target[prop] === 'undefined')
        .forEach(prop => {
            props[prop] = Object.getOwnPropertyDescriptor(src, prop);
        });
    Object.defineProperties(target, props);
};

global.window = window;
global.document = window.document;
global.HTMLElement = window.HTMLElement;
global.navigator = {
    userAgent: 'node.js'
};
copyProps(window, global);

All properties of window are copied to Node's global object, which simulates the browser environment under Node.

Other notes

Lifecycle methods called during server-side rendering, such as constructor, componentWillMount, and render, must not reference undefined or unrecognized variables and methods, including in dependent components; otherwise errors will occur.

HTML file generation

Each individual page needs to generate an HTML file, so we need a template engine. Docsite uses EJS as a template engine for rendering. The contents of this template look like this:



      
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <meta name="keywords" content="<%= keywords %>" />
    <meta name="description" content="<%= description %>" />
    <!-- Page tag title -->
    <title><%= title %></title>
    <link rel="shortcut icon" href="<%= rootPath %>/img/docsite.ico"/>
    <link rel="stylesheet" href="<%= rootPath %>/build/<%= page %>.css" />
</head>
<body>
    <div id="root"><%- __html %></div>
    <script src="https://f.alicdn.com/react/15.4.1/react-with-addons.min.js"></script>
    <script src="https://f.alicdn.com/react/15.4.1/react-dom.min.js"></script>
    <script>
        window.rootPath = '<%= rootPath %>';
    </script>
    <script src="<%= rootPath %>/build/<%= page %>.js"></script>
</body>
</html>

docsite injects variables into the template during the build. keywords, description, and title are the metadata defined in the Markdown file. rootPath is the root path of the site, described later. page identifies the resources for each page and has the same name as the corresponding first-level folder in the pages directory. __html is the injected HTML string, including both the React conversion and the markdown conversion.

__html injection

  • HTML pages corresponding to Markdown files

The HTML page for a Markdown file contains the content of the page component and the HTML string converted from the Markdown file. The page component first uses the HTML string injected through props (docsite injects it at build time to produce the concrete HTML file). At the same time, so that different Markdown files can share one React page component, in the actual browser environment a request tool loads the JSON file generated by the build to obtain the HTML string for the Markdown file.

  • HTML pages corresponding to the other page components

These are generated directly with ReactDOMServer's server-side rendering.
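The "props first, JSON otherwise" rule for Markdown pages can be sketched as a small helper. Everything here is an assumption made for illustration: the `resolveMarkdownHtml` name, the `fetchJson` parameter (any function that resolves to the parsed JSON file), and the idea that the component receives the JSON URL explicitly.

```javascript
// Hypothetical helper: prefer HTML injected at build time, otherwise load
// the JSON file generated by the build. `fetchJson` is an assumed function
// that takes a URL and resolves to the parsed JSON object.
async function resolveMarkdownHtml(props, jsonUrl, fetchJson) {
  if (props && typeof props.__html === 'string') {
    return props.__html; // build time: the HTML string was injected via props
  }
  const data = await fetchJson(jsonUrl); // browser: shared page component
  return data.__html;
}

// Usage with a stubbed loader instead of a real network request.
resolveMarkdownHtml({}, '/zh-cn/blog/demo.json', async () => ({ __html: '<h2>the title</h2>' }))
  .then((html) => console.log(html));
```

Passing the loader in as a parameter keeps the page component free of any particular request library and makes the fallback easy to exercise without a network.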

SEO and performance

An HTML file is generated for each page, including each markdown file. This not only lets search engines index the pages, but also lets pages display before any JS files load, which solves the long blank screen caused by slow JS loading.

Path handling

Path rules

Because the whole site supports internationalization, every accessible path starts with /zh-cn or /en-us, so the HTML files for all accessible pages live in those two folders.

Path prefix

When the site is deployed on some static hosting platforms, the root path is not /. On GitHub Pages, for example, the root path is usually /repository_name/. This becomes a nightmare when resources must be deployed to multiple platforms. To solve this, docsite extracts the root path into the rootPath field of site_config/site.js. The configuration rules are as follows:

  • When the deployment root path is /, set rootPath to the empty string ''.
  • When the deployment root path is not /, set rootPath to the specific root path, starting with / but not ending with /.

Reference addresses within the site all start with /, and in final processing they are concatenated with the globally injected window.rootPath from the template to get the final access address.
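The rootPath rules and the concatenation step can be illustrated with two small functions. The names `isValidRootPath` and `withRootPath` are hypothetical, and treating non-site-internal links as pass-through is an assumption of this sketch.

```javascript
// Check the configuration rule: '' for a "/" deployment, otherwise a path
// that starts with "/" and does not end with "/".
function isValidRootPath(rootPath) {
  return rootPath === '' || (rootPath.startsWith('/') && !rootPath.endsWith('/'));
}

// Prefix a site-internal address (starting with a single "/") with rootPath.
// External, protocol-relative, and relative links are returned unchanged.
function withRootPath(link, rootPath) {
  if (!link.startsWith('/') || link.startsWith('//')) return link;
  return `${rootPath}${link}`;
}

console.log(withRootPath('/zh-cn/docs/installation.html', '/docsite-doc-v1'));
```

Because rootPath never ends with / and internal links always start with /, plain concatenation cannot produce a doubled or missing slash.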

Cross-references within markdown files

Sometimes a Markdown file needs to reference another Markdown file. It is impractical for users to specify the actual online address the page will have once the site goes live; it is far more common to specify the relative directory relationship between the files directly. All of these path conversions need to be handled while markdown is converted to HTML strings. The mapping between the markdown file path and the page path is as follows:

/docs/zh-cn/dir/demo.md <=> /zh-cn/docs/dir/demo.html

Therefore, it is easy to infer the actual access path corresponding to the Markdown file from this transformation rule. Combined with rootPath, the actual page access address is finally obtained.
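The transformation rule can be sketched as follows. The function names are hypothetical, and applying the same rule to the blog directory is an assumption made by analogy with the docs mapping shown above.

```javascript
const path = require('path');

// Convert "/docs/zh-cn/dir/demo.md" into "/zh-cn/docs/dir/demo.html",
// optionally prefixed with rootPath. Returns null for other paths.
function markdownPathToPagePath(mdPath, rootPath = '') {
  const match = /^\/(docs|blog)\/([^/]+)\/(.+)\.md$/.exec(mdPath);
  if (!match) return null;
  const [, section, language, rest] = match;
  return `${rootPath}/${language}/${section}/${rest}.html`;
}

// Resolve a relative link written inside one markdown file to a page path.
function resolveMarkdownLink(currentMdPath, relativeLink, rootPath = '') {
  const target = path.posix.join(path.posix.dirname(currentMdPath), relativeLink);
  return markdownPathToPagePath(target, rootPath);
}

console.log(resolveMarkdownLink('/docs/zh-cn/dir/demo.md', '../intro.md', '/docsite-doc-v1'));
```

Resolving the relative link against the source file first, and only then applying the directory swap, keeps the two concerns separate.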

Redirects

When sharing a site address with others, you may need a jump to a language version, for example from https://txd-team.github.io/docsite-doc-v1/ to https://txd-team.github.io/docsite-doc-v1/zh-cn/. Likewise, when a user visits a page that does not exist on the site, a 404.html page is needed to redirect to a normal page.

By default, docsite generates index.html and 404.html (used by some static hosting platforms as the custom 404 page) in the project root directory from the redirect.ejs template. redirect.ejs contains the redirect logic for visits to the root directory, as follows:

<script>
  window.rootPath = '<%= rootPath %>';
  window.defaultLanguage = '<%= defaultLanguage %>';
  // Cookies is assumed to come from a cookie helper library loaded earlier in the page
  var lang = Cookies.get('docsite_language');
  if (!lang) {
    lang = '<%= defaultLanguage %>';
  }
  window.location = window.rootPath + '/' + lang + '/docs/installation.html';
</script>

Custom pages

The built-in docsite template contains the home page, documentation page, blog list page, blog details page, and community page by default, corresponding to home, documentation, blog, blogDetail, and community in the src/pages directory. For JS and CSS resources, docsite uses each folder name in src/pages as the resource name at build time, generates the corresponding JS and CSS files in the build directory, and injects them into the page when generating the HTML through EJS.
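One way to derive resource names from folder names is to scan src/pages and build a webpack-style entry map, as in the sketch below. The `collectPageEntries` name and the assumption that each page folder contains an index.jsx entry file are hypothetical; the article does not state the entry file name.

```javascript
const fs = require('fs');
const path = require('path');

// Build an entry map with one entry per folder under pagesDir.
// Assumes (hypothetically) that each page folder has an index.jsx file.
function collectPageEntries(pagesDir) {
  const entries = {};
  for (const item of fs.readdirSync(pagesDir, { withFileTypes: true })) {
    if (!item.isDirectory()) continue; // only folders define pages
    entries[item.name] = path.join(pagesDir, item.name, 'index.jsx');
  }
  return entries;
}
```

With this approach, adding a custom page is a matter of creating a new folder: its name becomes both the bundle name in the build directory and the page value injected into the template.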

Conclusion

The official version of docsite has now been released. It serves several of the department's open source sites and has received good feedback. See txd-team.github.io/docsite-doc… for details.

You are welcome to follow the Alibaba TXD team's WeChat official account for more content.