Why I'm doing this
Recently I took part in the Nuggets Translation Project and produced a large number of Markdown articles, so I wanted to set up a personal website to host my own articles.
After looking at some ready-made solutions, I felt they were too restrictive and would get in the way of customizing things later. Besides, I'm a huge fan of reinventing the wheel, so 👀
I built my own processing system.
Don't ask me why; I just think it's fun, and it's extremely customizable: I can add any control I want!
Requirements
- Render Markdown
- Custom pages
- Publishing a post should update all article pages with minimal effort
- Easy to port
The tools used this time:
Front end
The front-end pages are built with plain (Vanilla) JavaScript + CSS + HTML, forming a static article presentation system.
Styles are written in SCSS, on top of a tweaked version of GitHub's @primer theme library.
Back end
Article processing (tag fetching, image crawling) uses a combination of Java + Kotlin, mainly because I'm quite familiar with Kotlin; after all, I've been working on a Kotlin project for a year.
Building it
Back-end processing
Metadata processing
The JVM (Java + Kotlin) side mainly handles metadata: reading the article's tags (categories), parsing and downloading the article's images, and reading the title, introduction, and other information. The general approach is as follows:
(Sequence diagram: for each Markdown file, its images are either downloaded or skipped, and its tag data is loaded either from GitHub or from the local file.)
First, the templates. The default template:
> * Tag: Tag A、Tag B
# Title
Translation template:
> * []()
> * []()
> * From: [Nuggets translation project](https://github.com/xitu/gold-miner)
> * Permalink: [https://github.com/xitu/gold-miner/blob/master/article/2021/.md](https://github.com/xitu/gold-miner/blob/master/article/2021/.md)
> * Translator:
> * Reviewer:
# Title
All of the code lives in the PassionPenguin/PageGenerator repository (package io.hoarfroster). The first step is to collect all the files:
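import java.io.File

fun main(args: Array<String>) {
    var downloadImage = false
    var inputDir: String? = null
    // Parse "--input=<dir>" and the optional "--downloadImage" flag
    for (arg in args) {
        when {
            arg.startsWith("--input=") -> inputDir = arg.removePrefix("--input=")
            arg == "--downloadImage" -> downloadImage = true
        }
    }
    if (inputDir == null) return
    // Collect every Markdown file under <inputDir>/documents/
    val dir = File("${inputDir}/documents/")
    val files = dir.listFiles { _, name -> name.endsWith(".md") }
    // ... each file is then processed as described below
}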
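The flag parsing above can also be pulled into a small, testable function. A minimal sketch, assuming only the two flags shown in the article (`--input=` and `--downloadImage`); `Options` and `parseOptions` are hypothetical names, not part of PageGenerator:

```kotlin
// Hypothetical holder for the two command-line flags used in this article.
data class Options(val inputDir: String, val downloadImage: Boolean)

// Returns null when --input=<dir> is missing, mirroring the early return in main().
fun parseOptions(args: Array<String>): Options? {
    var inputDir: String? = null
    var downloadImage = false
    for (arg in args) {
        when {
            // Drop a trailing slash so "${inputDir}/documents/" has no "//"
            arg.startsWith("--input=") -> inputDir = arg.removePrefix("--input=").trimEnd('/')
            arg == "--downloadImage" -> downloadImage = true
        }
    }
    return inputDir?.let { Options(it, downloadImage) }
}
```

Trimming the trailing slash is a deliberate choice: the run command later passes `--input=/Library/WebServer/Documents/`, and because `java.io.File` collapses duplicate separators, a string built as `"${inputDir}/documents/"` would then no longer match `it.path` in the file-name step.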
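Then iterate over all the Markdown files with mapIndexed and read each one:
// Needs: org.commonmark.parser.Parser, org.commonmark.renderer.html.HtmlRenderer, org.jsoup.Jsoup
files?.mapIndexed { index, it ->
    var sourceMarkdown = it.readText()
    // Markdown -> HTML (commonmark-java) -> queryable DOM (Jsoup)
    val document = Jsoup.parse(HtmlRenderer.builder().build().render(Parser.builder().build().parse(sourceMarkdown)))
    // ... metadata extraction and image downloading happen here
}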
Here commonmark-java and Jsoup turn each Markdown file into an HTML document that can be queried. (This isn't strictly necessary: you could run regular expressions over the full text instead, but that feels a bit cumbersome.)
The information we need is:
- Title
- Tags
- Introduction
- File name
- Last modified time
- (Remote link)
- (Translator)
Title
document.selectFirst("h1").text()
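As mentioned earlier, the HTML step is optional and regular expressions over the raw text also work. For the title, a stdlib-only sketch might look like this; it assumes the title is the first ATX `# ` heading and, unlike the Jsoup version, it knows nothing about code blocks, so a `# ` line inside a fenced block could be picked up. `titleOf` is a hypothetical name:

```kotlin
// Matches a level-1 ATX heading ("# Title") at the start of any line.
val h1Regex = Regex("^# +(.+)$", RegexOption.MULTILINE)

// Returns the first level-1 heading text, or null if there is none.
fun titleOf(markdown: String): String? =
    h1Regex.find(markdown)?.groupValues?.get(1)?.trim()
```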
Tags
- Tags from GitHub
For translated articles, fetch the file's commit history page over HttpsURLConnection, find the pull requests that touched it, and scrape their labels with Jsoup:
package io.hoarfroster

import org.jsoup.Jsoup
import java.net.URL
import javax.net.ssl.HttpsURLConnection

// Tag is the project's own data class (not shown here)
class RetrieveResult(val tags: MutableList<Tag>)

fun retrieveResult(repoUrl: String): RetrieveResult {
    println(" - Processing tags data")
    val tags: MutableList<Tag> = mutableListOf()
    // Turn the file's blob URL into its commit-history URL
    val connection = URL(repoUrl.replace("blob", "commits").replace(" ", "%20")).openConnection() as HttpsURLConnection
    val document = Jsoup.parse(connection.inputStream.bufferedReader().readText())
    // Links to the pull requests behind those commits, e.g. "#1234"
    document.select("[data-hovercard-type=\"pull_request\"][data-url].issue-link.js-issue-link")
        .mapNotNull { e -> Regex("#([0-9]+)$").find(e.text().trim())?.groupValues?.get(1) }
        .forEach { prNumber ->
            Thread.sleep(1000) // throttle requests to GitHub
            val conn = URL("https://github.com/xitu/gold-miner/pull/$prNumber").openConnection() as HttpsURLConnection
            val doc = Jsoup.parse(conn.inputStream.bufferedReader().readText())
            val labels = doc.select(".js-issue-labels > *")
            // Only use labels from PRs marked as a finished translation
            if (labels.size > 0 && doc.selectFirst(".js-issue-labels")?.text()?.contains("Translation completed") == true) {
                labels.forEach {
                    if (!it.text().contains("Translation completed"))
                        tags.add(Tag(it.text()))
                }
            }
        }
    return RetrieveResult(tags = tags)
}
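The network calls make `retrieveResult` hard to test. One option is to pull the pure decisions out into helpers. A sketch under that assumption; `prNumbers` and `tagsFromLabels` are hypothetical, and the `distinct()` call is an addition so a PR referenced by several commits is only fetched once:

```kotlin
// Matches link texts such as "#1234" and captures the number.
val prRef = Regex("#([0-9]+)$")

// "#1234" -> "1234"; link texts that are not PR references are dropped.
fun prNumbers(linkTexts: List<String>): List<String> =
    linkTexts.mapNotNull { prRef.find(it.trim())?.groupValues?.get(1) }.distinct()

// Keep a PR's labels only if one of them marks the translation as done,
// and leave that marker label out of the result.
fun tagsFromLabels(labels: List<String>, doneLabel: String = "Translation completed"): List<String> =
    if (labels.any { it.contains(doneLabel) }) labels.filterNot { it.contains(doneLabel) }
    else emptyList()
```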
- Tags in the article
Even simpler: a plain regex:
val tags = mutableListOf<Tag>()
Regex("Tag:(.+?)\n").find(sourceMarkdown)?.groupValues?.get(1)?.split("、")?.forEach { tags.add(Tag(it.trim())) }
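A slightly more forgiving version of the tag regex, as a sketch: it trims the spaces around each tag, drops empty entries and, as an assumption beyond the original, also accepts ASCII commas as separators. `parseTags` is a hypothetical name:

```kotlin
// Captures everything after "Tag:" up to the end of that line.
val tagLine = Regex("Tag:\\s*(.+)")

// "> * Tag: Tag A、Tag B" -> ["Tag A", "Tag B"]; no tag line -> empty list.
fun parseTags(markdown: String): List<String> =
    tagLine.find(markdown)?.groupValues?.get(1)
        ?.split('、', ',')
        ?.map { it.trim() }
        ?.filter { it.isNotEmpty() }
        ?: emptyList()
```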
Introduction
Use the first non-empty paragraph as the introduction:
var description = ""
for (e in document.select("p")) {
    if (e.text().isNotBlank()) {
        description = e.text()
        break
    }
}
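If you skip the HTML conversion, the introduction can be approximated on the raw Markdown. A rough sketch, assuming paragraphs are separated by blank lines and that blocks starting with `#`, `>`, `![`, `*`, `-` or a code fence are not the introduction; it is not a real Markdown parser (a fenced block containing blank lines, for instance, will confuse it). `descriptionOf` is a hypothetical name:

```kotlin
// First blank-line-separated block that does not look like a heading,
// blockquote, image, list, or code fence; its lines are joined with spaces.
fun descriptionOf(markdown: String): String {
    val skipPrefixes = listOf("#", ">", "![", "*", "-", "`".repeat(3))
    return markdown.split(Regex("\\n\\s*\\n"))
        .map { block -> block.trim() }
        .firstOrNull { block -> block.isNotEmpty() && skipPrefixes.none { block.startsWith(it) } }
        ?.lines()
        ?.joinToString(" ") { it.trim() }
        ?: ""
}
```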
File name
it.path.replace("${inputDir}/documents/", "")
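String replacement works as long as the prefix matches `it.path` exactly. A sketch of an alternative based on `java.nio.file.Path.relativize`, which copes with trailing slashes and `./` prefixes; `relativeName` is a hypothetical helper, and the sample paths assume a Unix-style file system:

```kotlin
import java.nio.file.Paths

// Path of a Markdown file relative to <inputDir>/documents,
// regardless of how inputDir was spelled on the command line.
fun relativeName(inputDir: String, filePath: String): String {
    val base = Paths.get(inputDir, "documents").toAbsolutePath().normalize()
    val file = Paths.get(filePath).toAbsolutePath().normalize()
    return base.relativize(file).toString()
}
```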
Last modified time
Date(it.lastModified()).toString()
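`Date.toString()` uses the JVM's default time zone and a fixed format such as `Mon Mar 01 12:00:00 CST 2021`, which is awkward to sort or parse on the front end. A sketch that stores an ISO-8601 UTC timestamp instead; `isoModified` is a hypothetical name:

```kotlin
import java.io.File
import java.time.Instant

// Epoch milliseconds -> ISO-8601 UTC string, e.g. "2021-03-01T00:00:00Z".
fun isoModified(epochMillis: Long): String = Instant.ofEpochMilli(epochMillis).toString()

// Convenience overload for the File objects returned by listFiles().
fun isoModified(file: File): String = isoModified(file.lastModified())
```

One caveat: in the GitHub Actions workflow the files come from a fresh `git checkout`, and Git does not preserve modification times, so `lastModified()` there reflects the checkout time rather than when the article was last edited.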
(Remote link)
Regex("Permalink: \\[.+?]\\((.+?)\\)").find(sourceMarkdown)?.groupValues?.get(1) ?: ""
(Translator)
Regex("Translator: \\[(.+?)]").find(sourceMarkdown)?.groupValues?.get(1) ?: ""
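The two translation-only fields can be read in one place. A sketch, assuming the template labels are `Permalink:` and `Translator:` and that the translator is written as a Markdown link `[name](...)`; `TranslationInfo` and `translationInfo` are hypothetical:

```kotlin
// Hypothetical holder for the fields only translated articles have.
data class TranslationInfo(val permalink: String, val translator: String)

// "Permalink: [text](url)" -> url
val permalinkRegex = Regex("Permalink:\\s*\\[.+?]\\((.+?)\\)")
// "Translator: [name](...)" -> name
val translatorRegex = Regex("Translator:\\s*\\[(.+?)]")

// Missing fields become empty strings, as in the original snippets.
fun translationInfo(markdown: String) = TranslationInfo(
    permalink = permalinkRegex.find(markdown)?.groupValues?.get(1) ?: "",
    translator = translatorRegex.find(markdown)?.groupValues?.get(1) ?: ""
)
```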
Downloading images
Pretty simple:
println(" - Processing image")
document.select("img").forEach { img ->
    /* Download external resources */
    val alt = img.attr("alt")
    val urlString = img.attr("src")
    // getLastSegment() is a project helper (not shown) returning the last URL path segment
    val localName = "${it.path.replace("${inputDir}/documents/", "")}-${urlString.getLastSegment()}"
    with(File("${inputDir}/images/$localName")) {
        /* Only download the image if it does not exist yet and is not already a local image */
        if (!this.exists() && !urlString.startsWith("../images/")) {
            Thread.sleep(1000)
            println(" - Processing image $urlString")
            this.parentFile.mkdirs()
            val imageUrlConn = URL(urlString).openConnection()
            imageUrlConn.setRequestProperty("referer", URL(urlString).host)
            imageUrlConn.setRequestProperty(
                "user-agent",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
            )
            imageUrlConn.setRequestProperty("origin", "https://www.medium.com/")
            // Stream the image straight to disk; use {} closes both streams
            imageUrlConn.getInputStream().use { input ->
                this.outputStream().use { output -> input.copyTo(output) }
            }
            // Point the Markdown at the local copy and save the file
            sourceMarkdown = sourceMarkdown.replace("![$alt]($urlString", "![$alt](../images/$localName")
            it.writeText(sourceMarkdown)
        }
    }
}
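The naming and link-rewriting logic of the download step can be isolated from the network code. A sketch: the article calls a project helper `getLastSegment()` whose code is not shown, so `lastSegment()` below is a hypothetical stand-in based on `java.net.URI` (it drops query strings, and `URI` rejects URLs containing unescaped spaces); `localImageName` and `rewriteImageLink` are hypothetical too:

```kotlin
import java.net.URI

// Hypothetical stand-in for getLastSegment(): last path segment of a URL,
// without query string or fragment.
fun String.lastSegment(): String =
    URI(this).path.orEmpty().trimEnd('/').substringAfterLast('/')

// Local file name used in the article: "<markdown name>-<image name>".
fun localImageName(markdownName: String, imageUrl: String): String =
    "$markdownName-${imageUrl.lastSegment()}"

// Point every "![alt](imageUrl" occurrence at the downloaded copy under ../images/.
fun rewriteImageLink(markdown: String, alt: String, imageUrl: String, localName: String): String =
    markdown.replace("![$alt]($imageUrl", "![$alt](../images/$localName")
```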
Then, on every run, simply execute the command below, where:
- /Users/penguin/Desktop/PageGenerator/build/libs/PageGenerator.jar is the jar file generated by Gradle
- /Library/WebServer/Documents is the repository folder holding the articles
java -jar /Users/penguin/Desktop/PageGenerator/build/libs/PageGenerator.jar --input=/Library/WebServer/Documents/ --downloadImage
Generating the HTML files
The key code, built on the markdown-it library, is pretty simple:
const path = require('path')
const fse = require('fs-extra')
const md = require('markdown-it')()
// DOMParser is not built into Node.js; it has to come from a DOM library (import not shown in the original)

exports.render = async (config) => {
  const mdFilePath = path.resolve(config.cwd, config['mdFile'])
  let renderContent = md.render(fse.readFileSync(mdFilePath, 'utf-8'))
  let parser = new DOMParser()
  let document = parser.parseFromString(renderContent, 'text/html')
  let title = document.getElementsByTagName('h1')[0].textContent
  let html = `<!doctype html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>${title} - Hoarfroster</title>
  <link rel="stylesheet" href="/assets/styles/post.css">
  <link rel="stylesheet" href="//cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/styles/atom-one-light.min.css">
</head>
<body>
  <div class="container post markdown-body">${renderContent}</div>
  <div class="footer"></div>
  <script src="/assets/scripts/index.js"></script>
  <script>init()</script>
</body>
</html>
`
  config.out = path.resolve(config.cwd, config.out)
  // Group 1 captures the base name of a path ending in "name.ext"
  const fileReg = /([^/\\]*)\.[^/\\]+$/
  if (!config.out.match(fileReg)) { // if no file suffix, use the same name as the Markdown file
    config.out = path.resolve(
      config.out,
      mdFilePath.match(fileReg)[1] + '.html'
    )
  }
  fse.writeFileSync(config.out, html)
}
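The output-path rule at the end of `render` (an output path without a file extension is treated as a directory) can be mirrored in Kotlin if the JVM side ever needs to predict where pages land. A sketch using the same regex; `resolveOutPath` is a hypothetical name, and like the JavaScript it treats a last path segment containing a dot as a file name:

```kotlin
import java.io.File

// Same pattern as fileReg: group 1 is the base name of a path ending in "name.ext".
val fileRegex = Regex("([^/\\\\]*)\\.[^/\\\\]+$")

// If `out` has no extension, write <out>/<markdown base name>.html; otherwise keep it.
fun resolveOutPath(out: String, mdFilePath: String): String {
    if (fileRegex.containsMatchIn(out)) return out
    val baseName = fileRegex.find(mdFilePath)?.groupValues?.get(1)
        ?: error("Markdown path has no extension: $mdFilePath")
    return File(out, "$baseName.html").path
}
```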
Front end
Here's the big catch: GitHub's actual styles are completely different from what the @primer library provides! I was forced to dump thousands of lines of CSS color variables from the console.
And then GitHub recently rolled out a new theme anyway, squeezing this penguin dry.
Finally, I used JavaScript to add a header and footer to every post.
Current results
Home page
404 Not Found
About
Post body
GitHub CI
name: Page Automator

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Set up JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8
      - name: Use Node.js 15.x
        uses: actions/setup-node@v1
        with:
          node-version: 15.x
      - name: setup git config
        run: |
          git config --global user.name "Hoarfroster Bot"
          git config --global user.email "…@qq.com"
      - uses: actions/checkout@master
        with:
          repository: PassionPenguin/PageGenerator
          path: ./page-generator
      - uses: actions/checkout@master
        with:
          path: ./documents
      - name: Build with Gradle
        run: |
          cd ./page-generator
          ./gradlew build
      - name: Generate Structure
        run: |
          echo "Processing Markdown Files"
          java -jar page-generator/build/libs/PageGenerator.jar --input=./documents
          cd ./documents
          git add *
          if [[ -n $(git status -uno --porcelain) ]]
          then
            git commit -m "Generate Structure"
            git push origin master
          fi
      - name: NPM Install
        run: |
          npm i -g markdown-html-gen
      - name: Generate HTML Pages with Markdown Files
        run: |
          echo "Generating HTML Files"
          cd ./documents
          for f in documents/*.md
          do
            htmlpath=${f/documents/archive}
            htmlpath=${htmlpath%.md}.html
            md2html "$f" -o ./archive
            echo " - Generated $htmlpath"
          done
          git add *
          if [[ -n $(git status -uno --porcelain) ]]
          then
            git commit -m "Build Pages"
            git push origin master
          fi
Reflection and summary
It was worth trying, but there is still plenty to improve:
- Browsing by tag (a prototype is already built)
- Search
- Better CI/CD (the processing still runs manually on my local machine; the plan is to have a server receive GitHub webhooks, do the crawling, and push the results back to GitHub)
And one problem that really should be fixed:
- The Kotlin PageGenerator code is a mess
Over these three months, my translations, proofreads, and original articles have added up to 100. Confetti time, woo-hoo ~ 🎉🎉
Finally, one more plug for the Nuggets Translation Project, a community that translates high-quality English technical articles from around the Internet and shares them on Nuggets. The content covers blockchain, artificial intelligence, Android, iOS, front end, back end, design, product, algorithms, and more, as well as many high-quality official documents and manuals, for cutting-edge developers who love new technology.
So far the project has translated more than 2,345 articles and 13 official documents and manuals, with more than 1,000 translators contributing translations and proofreading.
You're welcome to join us!
This article is part of the "Nuggets 2021 Spring Recruitment Campaign".