Why am I doing this

Recently I have been taking part in the Nuggets Translation Project and producing a large number of Markdown articles, so I wanted to set up a personal website to host some articles of my own.

After looking at some ready-made solutions, I felt they were all too restrictive and would get in the way of later customization. Besides, I am a huge fan of reinventing the wheel, so 👀

I built my own processing system.

Don't ask me why; I think it's fun, and it's endlessly customizable: I can add whatever I want!!

Required features

  • Render Markdown markup
  • Custom pages
  • Easily regenerate all article pages after publishing a post
  • Easy to port

The tools used this time are:

The frontend

The basic pages are built with vanilla JavaScript + CSS + HTML to form a static article presentation system.

SCSS is used for styling, with some tweaks on top of GitHub's @primer theme library.

The backend

Article processing (tag extraction, image crawling) uses a combination of Java and Kotlin, mainly because I am quite familiar with Kotlin; after all, I have been working on a Kotlin project for a year.

Building it

Backend processing

Metadata processing

The backend mainly deals with metadata: reading each article's tags (categories), parsing and crawling the article's images, and extracting the title, introduction, and other information. The overall flow is as follows:

[Sequence diagram: for each Markdown file, images are downloaded or skipped depending on the flags, then the tag data is read and loaded from GitHub or the local copy.]

Let's look at the templates first. The default template:

> * Tag: tag A、tag B

# Title

Translation template:

> * []()
> * []()
> * From: [Nuggets Translation Project](https://github.com/xitu/gold-miner)
> * Permanent link to this article: [https://github.com/xitu/gold-miner/blob/master/article/2021/.md](https://github.com/xitu/gold-miner/blob/master/article/2021/.md)
> * Translator:
> * Proofreader:

# Title

All of the code is stored in PassionPenguin/PageGenerator (package io.hoarfroster). The first step is getting hold of all the files:

fun main(args: Array<String>) {
    var downloadImage = false
    var inputDir: String? = null

    for (name in args) {
        if (name.contains(Regex("--input=(.+)")))
            inputDir = name.substring(8)
        if (name == "--downloadImage") // toggled by the --downloadImage flag
            downloadImage = true
    }
    if (inputDir == null)
        return

    val dir = File("${inputDir}/documents/")
    val files = dir.listFiles { _, name -> name.endsWith(".md") }
}

Then use mapIndexed to iterate over all the Markdown files and read each one:

files?.mapIndexed { index, it ->
    var sourceMarkdown = it.readText()
    val document = Jsoup.parse(HtmlRenderer.builder().build().render(Parser.builder().build().parse(sourceMarkdown)))
}

Here, CommonMark and Jsoup are used to turn each Markdown file into an HTML document (this is not strictly necessary; regex matching over the full text would also work, but that felt a bit cumbersome).

The information we need is:

  • Title
  • Tags
  • Introduction
  • File name
  • Last modified time
  • (Remote link)
  • (Translator)

Title
document.selectFirst("h1").text()
Tags
  1. Labels on GitHub

This is done with a URLConnection: open the article's commits page on GitHub, find the linked pull request, and read its labels:

package io.hoarfroster

import org.jsoup.Jsoup
import java.net.URL
import javax.net.ssl.HttpsURLConnection

class RetrieveResult(val tags: MutableList<Tag>)

fun retrieveResult(repoUrl: String): RetrieveResult {
    println(" - Processing tags data")
    val tags: MutableList<Tag> = mutableListOf()
    val connection = URL(repoUrl.replace("blob", "commits").replace(" ", "%20")).openConnection() as HttpsURLConnection
    val document = Jsoup.parse(connection.inputStream.bufferedReader().readText())
    document.select("[data-hovercard-type=\"pull_request\"][data-url].issue-link.js-issue-link")
        .filter { e ->
            Regex("#([0-9]+)?$").matches(e.html())
        }.map { e ->
            Regex("#([0-9]+)?$").find(e.html())?.groupValues?.get(1)
        }.forEach { it ->
            Thread.sleep(1000)
            val conn = URL("https://github.com/xitu/gold-miner/pull/$it").openConnection() as HttpsURLConnection
            val doc = Jsoup.parse(conn.inputStream.bufferedReader().readText())
            if (doc.select(".js-issue-labels > *").size > 0
                && doc.selectFirst(".js-issue-labels").text().contains("Translation completed")
            ) {
                doc.select(".js-issue-labels > *").forEach {
                    if (!it.text().contains("Translation completed"))
                        tags.add(Tag(it.text()))
                }
            }
        }
    return RetrieveResult(tags = tags)
}
  2. Tags in the article

Even simpler: a direct regex on the Markdown source:

val tags = mutableListOf<Tag>()
Regex("Tag: (.+?)\n").find(sourceMarkdown)?.groupValues?.get(1)?.split("、")?.forEach { tags.add(Tag(it)) }
Introduction

Read the first paragraph directly as the introduction:

var description = ""
for (e in document.select("p")) {
    if (e.text().isNotBlank()) {
        description = e.text()
        break
    }
}
File name
it.path.replace("${inputDir}/documents/", "")
Last modified time
Date(it.lastModified()).toString()
(Remote link)
Regex("Permalink: \\[.+?]\\((.+?)\\)").find(sourceMarkdown)?.groupValues?.get(1) ?: ""
(translator)
Regex("Translator: \\[(.+?)]").find(sourceMarkdown)?.groupValues?.get(1) ?: ""

Downloading images

Pretty simple:

/* Assumed helper, not shown in the original article: take the last path segment of a URL */
fun String.getLastSegment() = substringAfterLast('/')

println(" - Processing image")
document.select("img").forEach { img ->
    /* Download external resources */
    val alt = img.attr("alt")
    val urlString = img.attr("src")

    with(File("${inputDir}/images/${it.path.replace("${inputDir}/documents/", "")}-${urlString.getLastSegment()}")) {
        /* Only download the image if the file does not already exist */
        if ((!this.isFile || !this.exists()) && !urlString.startsWith("../images/")) {
            Thread.sleep(1000)
            println("   - Processing image $urlString")
            if (!this.parentFile.isDirectory || !this.parentFile.exists())
                this.parentFile.mkdirs()
            this.createNewFile()
            val imageUrlConn = URL(urlString).openConnection()
            imageUrlConn.setRequestProperty("referer", URL(urlString).host)
            imageUrlConn.setRequestProperty(
                "user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
            )
            imageUrlConn.setRequestProperty("origin", "https://www.medium.com/")

            val `in`: InputStream = BufferedInputStream(imageUrlConn.getInputStream())
            val out = ByteArrayOutputStream()
            val buf = ByteArray(1024)
            var n: Int
            while (-1 != `in`.read(buf).also { n = it }) {
                out.write(buf, 0, n)
            }
            out.close()
            `in`.close()
            val response = out.toByteArray()
            this.outputStream().write(response)
            sourceMarkdown = sourceMarkdown.replace(
                """![$alt]($urlString""",
                """![$alt](../images/${it.path.replace("${inputDir}/documents/", "")}-${urlString.getLastSegment()}"""
            )
            it.writeText(sourceMarkdown)
        }
    }
}

Then, each time, just run:

  • /Users/penguin/Desktop/PageGenerator/build/libs/PageGenerator.jar: the jar file produced by Gradle
  • /Library/WebServer/Documents: the repository directory

java -jar /Users/penguin/Desktop/PageGenerator/build/libs/PageGenerator.jar --input=/Library/WebServer/Documents/ --downloadImage

Generating the HTML files

The key code, built on the markdown-it library, is pretty simple:

exports.render = async (config) => {
  const mdFilePath = path.resolve(config.cwd, config['mdFile'])
  let renderContent = md.render(fse.readFileSync(mdFilePath, 'utf-8'))
  let parser = new DOMParser()
  let document = parser.parseFromString(renderContent, 'text/html')
  let title = document.getElementsByTagName('h1')[0].textContent
  let html = `<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>${title} - Hoarfroster</title>
    <link rel="stylesheet" href="/assets/styles/post.css">
    <link rel="stylesheet" href="//cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/styles/atom-one-light.min.css">
</head>
<body>
<div class="container post markdown-body">${renderContent}</div>
<div class="footer"></div>
</body>
<script src="/assets/scripts/index.js"></script>
<script>init()</script>
</html>
`
  config.out = path.resolve(config.cwd, config.out)
  const fileReg = /([^/\\]*)\.[^/\\]+$/
  if (!config.out.match(fileReg)) {
    // if there is no file suffix, reuse the markdown file's name
    config.out = path.resolve(
      config.out,
      mdFilePath.match(fileReg)[1] + '.html'
    )
  }
  fse.writeFileSync(config.out, html)
}
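To make the intent of fileReg concrete, here is a small standalone illustration. The pattern is my reconstruction from context (it should capture a path's base file name), so treat it as an assumption rather than the exact regex from the project:

```javascript
// Reconstructed pattern: capture the base name of a path that ends
// in "name.ext"; a bare directory path should not match at all.
const fileReg = /([^/\\]*)\.[^/\\]+$/

// A path ending in a file name matches, capturing the base name:
console.log('documents/intro.md'.match(fileReg)[1]) // → intro

// A bare directory path has no suffix and does not match, so render()
// falls back to the markdown file's own name plus '.html':
console.log('archive'.match(fileReg)) // → null
```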

The frontend

Here's the big catch: GitHub's actual styling is completely different from what the @primer library ships! I was forced to dump thousands of lines of CSS color variables from the console:
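The article doesn't show how the variables were dumped; one possible approach (a sketch, not the author's actual snippet, with a hypothetical helper name) is to grab the raw stylesheet text and grep out the custom properties:

```javascript
// Hypothetical helper: extract CSS custom property declarations
// (--name: value) from raw stylesheet text with a regular expression.
function extractCssVars(cssText) {
  const vars = {};
  for (const m of cssText.matchAll(/(--[\w-]+)\s*:\s*([^;}]+)/g)) {
    vars[m[1]] = m[2].trim();
  }
  return vars;
}

// In a browser console, one could feed it a fetched stylesheet:
//   fetch(document.styleSheets[0].href)
//     .then(r => r.text())
//     .then(css => console.log(extractCssVars(css)));
console.log(extractCssVars(':root { --color-bg-canvas: #ffffff; --color-text-primary: #24292e; }'))
```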

And then, as if to squeeze this penguin dry, GitHub recently launched a brand-new theme.

Finally, I used JavaScript to add headers and footers to each post:
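The actual init() script isn't shown in the article; a minimal string-based sketch of the idea (helper names and markup are my assumptions, not the site's real code) might look like:

```javascript
// Hypothetical sketch: build the shared chrome every generated post gets.
// The real init() in /assets/scripts/index.js presumably injects
// something like this into the page at load time.
function buildHeader(siteName) {
  return `<div class="header"><a href="/">${siteName}</a></div>`;
}

function buildFooter(siteName) {
  return `<div class="footer">© ${siteName}</div>`;
}

// At runtime one could inject them with standard DOM calls, e.g.:
//   document.body.insertAdjacentHTML('afterbegin', buildHeader('Hoarfroster'));
//   document.querySelector('.footer').innerHTML = buildFooter('Hoarfroster');
console.log(buildHeader('Hoarfroster'));
```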

The current result

The home page

404 Not Found

About

Body content

GitHub CI

name: Page Automator

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Set up JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8

      - name: Use Node.js 15.x
        uses: actions/setup-node@v1
        with:
          node-version: 15.x

      - name: setup git config
        run: |
          git config --global user.name "Hoarfroster Bot"
          git config --global user.email "…@qq.com"
      - uses: actions/checkout@master
        with:
          repository: PassionPenguin/PageGenerator
          path: ./page-generator

      - uses: actions/checkout@master
        with:
          path: ./documents

      - name: Build with Gradle
        run: |
          cd ./page-generator
          ./gradlew build
      - name: Generate Structure
        run: |
          echo "Processing Markdown Files"
          java -jar page-generator/build/libs/PageGenerator.jar --input=./documents
          cd ./documents
          git add *
          if [[ -n $(git status -uno --porcelain) ]]
          then
            git commit -m "Generate Structure"
            git push origin master
          fi

      - name: NPM Install
        run: |
          npm i -g markdown-html-gen
      - name: Generate HTML Pages with Markdown Files
        run: |
          echo "Generating HTML Files"
          cd ./documents
          for f in documents/*.md
          do
            htmlpath=${f/documents/archive}
            htmlpath=${htmlpath/md/html}
            md2html "$f" -o ./archive
            echo " - Generated $htmlpath"
          done
          git add *
          if [[ -n $(git status -uno --porcelain) ]]
          then
            git commit -m "Build Pages"
            git push origin master
          fi

Reflection and summary

It was fun to try, but there is still plenty to improve:

  • Tag following (a prototype is already built)
  • Search
  • Better CI/CD (right now it still runs manually on my local machine; I plan to use a WebHook so the server listens to GitHub, crawls the content, and pushes the result back to GitHub)

And one thing that really needs cleaning up:

  • The Kotlin code in PageGenerator is a mess

In these three months, my translations + proofreads + original articles have reached 100. Confetti! Woo-hoo~ 🎉🎉

Finally, let me plug the Nuggets Translation Project once more: a community that translates high-quality English technical articles and shares them on Nuggets. The content covers blockchain, artificial intelligence, Android, iOS, frontend, backend, design, product, algorithms, and other fields, as well as many high-quality official documents and manuals, aimed at cutting-edge developers who love new technology.

So far, the project has translated more than 2,345 articles and 13 official documents and manuals, with more than 1,000 translators contributing translations and proofreads.

Come join us, ow!

This article is taking part in the "Nuggets 2021 Spring Recruitment Campaign"; click to see the campaign details.