Hello everyone, I'm Han Cao 😈, a grassroots code monkey 🐒. Occasionally fired up 🔥, perpetually goofy 🌟. If you like my articles, follow me ➕ and leave a like, and grow with me ~ WeChat: Hancao97. Add me on WeChat to watch the Front-end Early Chat Vue special livestream on 10.23 for free.
This article is taking part in the “Digitalstar Project” to win a creator gift pack and compete for the creator incentive award.
Background
How was everyone's National Day holiday 🌟? The holiday is over 😮‍💨, so it is time to settle back in and read one of Han Cao's articles.
Have you ever been put off by a document whose formatting is messy and uncomfortable to read? Here is a typesetting comparison:
- Bad case: These eggs look good, give me 10kg.
- Good case: These eggs look good, give me 10 kg.
Punctuation errors and missing spaces make readers uncomfortable 😭. This example shows that consistent, tidy typesetting can greatly improve the reading experience of an article or document 🔥.
For someone who shares articles, tidy typesetting matters even more, because it noticeably improves the reader's experience. So I planned to improve the quality of my own articles, but checking the typesetting by hand has several problems:
- High labor cost
- There is no guarantee of 100% verification
So I wanted to improve the reading experience of my documents through automation, and the approach I came up with is a VS Code plug-in 🌟. That led to my latest plug-in: Markdown format. It can be found in the VS Code plug-in marketplace ~ there are bound to be bugs, and everyone is welcome to help test it.
The typesetting rules used in the plug-in come from the “Translation Typesetting Rules Guide”.
Implementation
Create a VS Code extension project
Start with the familiar process of scaffolding a project:
npm install -g yo generator-code
Then run:
yo code
Initialization code
Let’s first configure the package.json file:
"activationEvents": [
  "onCommand:markdown-format.format"
],
"main": "./extension.js",
"contributes": {
  "commands": [
    {
      "command": "markdown-format.format",
      "title": "Format markdown"
    }
  ],
  "menus": {
    "explorer/context": [
      {
        "command": "markdown-format.format",
        "when": "filesExplorerFocus",
        "group": "navigation@1"
      }
    ]
  }
}
This configures the format command, which is triggered by right-clicking a file in the Explorer.
Check that the file is markdown
The next step is to check whether the file is a markdown file. If it is not, nothing happens.
const { extname } = require('path');

// isDir is a helper (not shown here) that returns true for directories
const isMarkdown = (path) => {
  if (isDir(path)) {
    return false;
  }
  if (extname(path) !== '.md') {
    return false;
  }
  return true;
}
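The check above relies on an isDir helper that the article does not show. Here is a minimal self-contained sketch, assuming isDir is built on Node's fs.statSync (that implementation is my assumption, not necessarily the plug-in's code):

```javascript
const fs = require('fs');
const { extname } = require('path');

// Hypothetical helper: true when the path exists and is a directory.
const isDir = (path) => {
  try {
    return fs.statSync(path).isDirectory();
  } catch (err) {
    // Missing or unreadable paths are treated as "not a directory".
    return false;
  }
};

// A path counts as markdown when it is not a directory and ends in ".md".
const isMarkdown = (path) => !isDir(path) && extname(path) === '.md';
```

Note that extname is case-sensitive, so a file named README.MD would be skipped by this sketch.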
Lexical analysis
The next step is lexical analysis: parsing the markdown text into a token sequence. Tokens fall into the following types:
- Space: SPACE
- Newline: ENTER
- Half-width symbols: HALF
- Full-width symbols: FULL
- Chinese: CHINESE
- Digits: NUMBER
- English: ENGLISH
If you are not familiar with compiler principles, you can read my earlier article: Principles of front-end compilation (1): Introduction to compilation
In essence, this is just a loop over the characters of the string:
function lexicalAnalysis(content) {
  const tokenList = [];
  let currentStr = '';
  let currentType = '';
  // Extend the current token while the type repeats; otherwise flush it and start a new one
  const handleChar = (char, type) => {
    if (currentType == type) {
      currentStr += char;
    } else {
      tokenList.push({
        type: currentType,
        content: currentStr
      });
      currentStr = char;
      currentType = type;
    }
  };
  for (const char of content) {
    if (char == ' ') {
      handleChar(char, 'SPACE');
    } else if (char == '\n') {
      handleChar(char, 'ENTER');
    } else if (char.match(/[\x21-\x2f\x3a-\x40\x5b-\x60\x7B-\x7F]/) || char.charCodeAt() === 8212) {
      // Half-width symbols, plus the dash (char code 8212)
      handleChar(char, 'HALF');
    } else if (char.charCodeAt() === 12290 || (char.charCodeAt() > 65280 && char.charCodeAt() < 65375)) {
      // Full-width symbols. TODO: why is the char code of the full-width period so odd?
      handleChar(char, 'FULL');
    } else if (char.match(/[\u4e00-\u9fa5]/)) {
      // Chinese
      handleChar(char, 'CHINESE');
    } else if (char.match(/[0-9]/)) {
      handleChar(char, 'NUMBER');
    } else {
      handleChar(char, 'ENGLISH');
    }
  }
  // Flush the last token
  tokenList.push({
    type: currentType,
    content: currentStr
  });
  return tokenList;
}
What the token sequence looks like
The markdown content (mixed Chinese and English; the Chinese parts are rendered in English here) is:
you are`asdas`cold--grass.w 
mostly123ni.D... 
ah...cannot get ongithub
The token sequence is:
0: {type: '', content: ''}
1: {type: 'CHINESE', content: 'you are'}
2: {type: 'HALF', content: '`'}
3: {type: 'ENGLISH', content: 'asdas'}
4: {type: 'HALF', content: '`'}
5: {type: 'CHINESE', content: 'cold'}
6: {type: 'HALF', content: '--'}
7: {type: 'CHINESE', content: 'grass'}
8: {type: 'HALF', content: '.'}
9: {type: 'ENGLISH', content: 'w'}
10: {type: 'SPACE', content: ' '}
11: {type: 'ENTER', content: '\n'}
12: {type: 'CHINESE', content: 'mostly'}
13: {type: 'NUMBER', content: '123'}
14: {type: 'ENGLISH', content: 'ni'}
15: {type: 'HALF', content: '.'}
16: {type: 'ENGLISH', content: 'D'}
17: {type: 'HALF', content: '...'}
18: {type: 'SPACE', content: ' '}
19: {type: 'ENTER', content: '\n'}
20: {type: 'CHINESE', content: 'ah'}
21: {type: 'FULL', content: '...'}
22: {type: 'CHINESE', content: 'cannot get on'}
23: {type: 'ENGLISH', content: 'github'}
Processing the token sequence
Here I operate directly on the token sequence and then join the pieces back together. Each rule is covered separately below.
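The rule snippets in the following sections all read from tokenList[i] and write to resList, but the loop around them is not shown. One way it might be wired together, as a sketch: the names formatTokens and spaceAfterChinese are hypothetical, and SPACE_TOKEN is assumed to be a plain SPACE token.

```javascript
// Assumed shape of the shared space token used by the rules.
const SPACE_TOKEN = { type: 'SPACE', content: ' ' };

// Hypothetical driver: copy each token into resList, let every rule
// inspect or amend resList, then join the contents back into a string.
function formatTokens(tokenList, rules) {
  const resList = [];
  for (let i = 0; i < tokenList.length; i++) {
    resList.push(tokenList[i]);
    for (const rule of rules) {
      rule(tokenList, i, resList);
    }
  }
  return resList.map((token) => token.content).join('');
}

// Example rule: Chinese followed by English or a number gets a space.
const spaceAfterChinese = (tokenList, i, resList) => {
  const next = tokenList[i + 1];
  if (tokenList[i].type === 'CHINESE' && next && ['ENGLISH', 'NUMBER'].includes(next.type)) {
    resList.push(SPACE_TOKEN);
  }
};

const tokens = [
  { type: 'CHINESE', content: '我喜欢' },
  { type: 'ENGLISH', content: 'javascript' },
];
console.log(formatTokens(tokens, [spaceAfterChinese])); // prints "我喜欢 javascript"
```

Pushing the current token before the rules run matches how the snippets use resList[resList.length - 2] for the previous token and resList.pop() to replace the current one.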
Add spaces between Chinese and English
- Wrong: I like the flexibility of javascript
- Correct: I like the flexibility of javascript
The idea here is:
- If the current token is Chinese and the next token is a number or English, add a space.
- If the current token is Chinese and what follows is markdown inline code, bold, or italic markup that is in turn followed by a number or English, add a space as well.
- If the current token is a number or English and the next token is Chinese, add a space.
- If the current token is markdown inline code, bold, or italic markup, the token before it is English or a number, and the token after it is Chinese, insert a space.
// Chinese followed by English or a number (directly, or through markup)
if (tokenList[i].type == 'CHINESE') {
  if (tokenList[i+1] && ['ENGLISH', 'NUMBER'].includes(tokenList[i+1].type)) {
    resList.push(SPACE_TOKEN);
  }
  if (tokenList[i+2] && ['`', '**', '*'].includes(tokenList[i+1].content) && ['ENGLISH', 'NUMBER'].includes(tokenList[i+2].type)) {
    resList.push(SPACE_TOKEN);
  }
}
// English or a number followed by Chinese (directly, or through markup)
if (['ENGLISH', 'NUMBER'].includes(tokenList[i].type)) {
  if (tokenList[i+1] && tokenList[i+1].type === 'CHINESE') {
    resList.push(SPACE_TOKEN);
  }
}
if (['`', '**', '*'].includes(tokenList[i].content)) {
  if (resList[resList.length - 2] && ['ENGLISH', 'NUMBER'].includes(resList[resList.length - 2].type) && tokenList[i+1] && tokenList[i+1].type === 'CHINESE') {
    resList.push(SPACE_TOKEN);
  }
}
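To see the four conditions in action, here is a condensed restatement that runs on a hand-written token list. The function name addSpaces, the constants LATIN and MARKUP, and the sample Chinese strings are my own illustrative choices; SPACE_TOKEN is assumed to be a plain SPACE token.

```javascript
const SPACE_TOKEN = { type: 'SPACE', content: ' ' };
const LATIN = ['ENGLISH', 'NUMBER']; // token types that need a space next to Chinese
const MARKUP = ['`', '**', '*'];     // inline code, bold, and italic delimiters

function addSpaces(tokenList) {
  const resList = [];
  for (let i = 0; i < tokenList.length; i++) {
    const cur = tokenList[i];
    const next = tokenList[i + 1];
    const afterNext = tokenList[i + 2];
    const prev = resList[resList.length - 1]; // last token emitted before cur
    resList.push(cur);

    if (cur.type === 'CHINESE' && next) {
      // Chinese, then English/number directly or behind an opening delimiter
      const direct = LATIN.includes(next.type);
      const viaMarkup = MARKUP.includes(next.content) && afterNext && LATIN.includes(afterNext.type);
      if (direct || viaMarkup) resList.push(SPACE_TOKEN);
    } else if (next && next.type === 'CHINESE') {
      // English/number, or a closing delimiter after English/number, then Chinese
      const direct = LATIN.includes(cur.type);
      const viaMarkup = MARKUP.includes(cur.content) && prev && LATIN.includes(prev.type);
      if (direct || viaMarkup) resList.push(SPACE_TOKEN);
    }
  }
  return resList.map((t) => t.content).join('');
}

const sample = [
  { type: 'CHINESE', content: '使用' },
  { type: 'HALF', content: '`' },
  { type: 'ENGLISH', content: 'npm' },
  { type: 'HALF', content: '`' },
  { type: 'CHINESE', content: '安装' },
];
console.log(addSpaces(sample)); // prints "使用 `npm` 安装"
```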
Add a space between numbers and units
- Wrong: This basket of eggs weighs 5kg.
- Correct: This basket of eggs weighs 5 kg.
There is a problem here, though.
TODO: this rule can misfire, and I am wondering whether to maintain a list of units
if(tokenList[i].type === 'NUMBER' && tokenList[i+1] && tokenList[i+1].type === 'ENGLISH') {
resList.push(SPACE_TOKEN);
}
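The TODO above mentions keeping a list of units. A sketch of what that could look like follows; the UNITS set and its entries are illustrative assumptions, not the plug-in's actual list, and spaceBeforeUnit is a hypothetical name.

```javascript
const SPACE_TOKEN = { type: 'SPACE', content: ' ' };

// Illustrative unit list (an assumption); extend as needed.
const UNITS = new Set(['kg', 'g', 'km', 'm', 'cm', 'mm', 'GB', 'MB', 'KB', 'ms', 's']);

// Insert a space only when the English token after a number is a known unit.
function spaceBeforeUnit(tokenList) {
  const resList = [];
  for (let i = 0; i < tokenList.length; i++) {
    resList.push(tokenList[i]);
    const next = tokenList[i + 1];
    if (
      tokenList[i].type === 'NUMBER' &&
      next && next.type === 'ENGLISH' &&
      UNITS.has(next.content)
    ) {
      resList.push(SPACE_TOKEN);
    }
  }
  return resList.map((t) => t.content).join('');
}

const weight = [{ type: 'NUMBER', content: '5' }, { type: 'ENGLISH', content: 'kg' }];
const notAUnit = [{ type: 'NUMBER', content: '3' }, { type: 'ENGLISH', content: 'D' }];
console.log(spaceBeforeUnit(weight));   // prints "5 kg"
console.log(spaceBeforeUnit(notAUnit)); // prints "3D"
```

With the list in place, strings such as 3D or the 123ni from the earlier sample are left alone, at the cost of having to maintain the list.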
No spaces between full-width punctuation and other characters
- Wrong: I like you, you are a fairy.
- Correct: I like you, you are a fairy.
The idea here is:
- If the current token is a space and the next token is full-width punctuation, the space is deleted
- If the current token is a space and the previous token is full-width punctuation, the space is deleted
// A space right before full-width punctuation
if (resList[resList.length - 1].type == 'SPACE' && tokenList[i+1] && tokenList[i+1].type === 'FULL') {
  resList.pop();
}
// The opposite order: a space right after full-width punctuation
if (resList[resList.length - 1] && resList[resList.length - 1].type == 'SPACE' && resList[resList.length - 2] && resList[resList.length - 2].type === 'FULL') {
  resList.pop();
}
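The same two conditions can be tried out on a small token list. This sketch skips the space instead of pushing and popping it, which is a variation on the snippet above rather than the plug-in's exact code; stripSpacesAroundFull and the sample sentence are hypothetical.

```javascript
// Drop any SPACE token that touches full-width punctuation on either side.
function stripSpacesAroundFull(tokenList) {
  const resList = [];
  for (let i = 0; i < tokenList.length; i++) {
    const token = tokenList[i];
    const prev = resList[resList.length - 1]; // last token kept so far
    const next = tokenList[i + 1];
    const touchesFull =
      (prev && prev.type === 'FULL') || (next && next.type === 'FULL');
    if (token.type === 'SPACE' && touchesFull) {
      continue; // skip the space rather than pushing and popping it
    }
    resList.push(token);
  }
  return resList.map((t) => t.content).join('');
}

const sentence = [
  { type: 'CHINESE', content: '我喜欢你' },
  { type: 'SPACE', content: ' ' },
  { type: 'FULL', content: ',' },
  { type: 'SPACE', content: ' ' },
  { type: 'CHINESE', content: '你是小仙女' },
  { type: 'FULL', content: '。' },
];
console.log(stripSpacesAroundFull(sentence)); // prints "我喜欢你,你是小仙女。"
```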
Don’t repeat punctuation marks
- Wrong: Are you undercover?!!!!!!!!!
- Correct: Are you undercover?!
Only the full-width marks ?, !, 【 and 】 are handled here.
if (tokenList[i].type === 'FULL') {
  resList.pop();
  let end = '';
  let str = '';
  for (const char of tokenList[i].content) {
    if (end == char && ['?', '!', '【', '】'].includes(char)) {
      continue;
    } else {
      str += char;
      end = char; // remember the last character kept so repeats can be skipped
    }
  }
  resList.push({
    type: 'FULL',
    content: str
  });
}
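The de-duplication loop can be pulled out into a small function and tested on its own. This is a sketch: collapseRepeats and NO_REPEAT are hypothetical names, and the marks are assumed to be the full-width forms.

```javascript
// Marks that should never repeat back to back (full-width ? and !, plus 【 】).
const NO_REPEAT = ['?', '!', '【', '】'];

// Keep the first of each run of a no-repeat mark and drop the rest.
function collapseRepeats(content) {
  let last = '';
  let str = '';
  for (const char of content) {
    if (char === last && NO_REPEAT.includes(char)) {
      continue;
    }
    str += char;
    last = char;
  }
  return str;
}

console.log(collapseRepeats('?!!!!')); // prints "?!"
console.log(collapseRepeats('。。。'));   // prints "。。。" (periods are not in the list)
```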
Ellipsis
- Wrong: There are tomatoes, potatoes, celery…
- Correct: There are tomatoes, potatoes, celery…
The ellipsis should be written as: …
if (resList[resList.length - 2] && ['CHINESE', 'ENGLISH'].includes(resList[resList.length - 2].type) && tokenList[i].content.match(/^[\.]{2}$/)) {
  resList.pop();
  resList.push({
    type: 'HALF',
    content: '...'
  });
}
Chinese is followed by full-width punctuation
- Wrong: Hello, my name is Han Cao.
- Correct: Hello, my name is Han Cao.
Punctuation currently converted (half-width to full-width):
- . 。
- .
- ? ?
- ! !
- ; ;
- : :
Paired punctuation has some special cases, so it is left unprocessed for now.
if (['.', '?', '!', ';', ':'].includes(tokenList[i].content)) {
  if (resList[resList.length - 2] && resList[resList.length - 2].type == 'CHINESE') {
    resList.pop();
    resList.push({
      type: 'FULL',
      content: punctuationMap.get(tokenList[i].content)
    });
  }
}
English is followed by half-width punctuation
- Wrong: I am hancao。
- Correct: I am hancao.
The approach for English mirrors the one for Chinese:
// Full-width marks that follow English are converted to their half-width forms
if (['。', '?', '!', ';', ':'].includes(tokenList[i].content)) {
  if (resList[resList.length - 2] && resList[resList.length - 2].type == 'ENGLISH') {
    resList.pop();
    resList.push({
      type: 'HALF',
      content: punctuationMap.get(tokenList[i].content)
    });
  }
}
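Both rules call punctuationMap.get, but the map itself is not shown in the article. A plausible definition, offered as an assumption: one Map from each half-width mark to its full-width counterpart, plus the reverse map for the English case. The names halfToFull, fullToHalf, and fixTrailingPunctuation are hypothetical.

```javascript
// Assumed contents: half-width mark -> full-width counterpart.
const halfToFull = new Map([
  ['.', '。'],
  ['?', '?'],
  ['!', '!'],
  [';', ';'],
  [':', ':'],
]);

// Reverse lookup for the English case (full-width -> half-width).
const fullToHalf = new Map([...halfToFull].map(([half, full]) => [full, half]));

// Hypothetical helper: pick the right form of a mark from the type of the
// token that precedes it; anything else is returned unchanged.
function fixTrailingPunctuation(prevType, mark) {
  if (prevType === 'CHINESE' && halfToFull.has(mark)) {
    return halfToFull.get(mark);
  }
  if (prevType === 'ENGLISH' && fullToHalf.has(mark)) {
    return fullToHalf.get(mark);
  }
  return mark;
}

console.log(fixTrailingPunctuation('CHINESE', '.')); // prints "。"
console.log(fixTrailingPunctuation('ENGLISH', '。')); // prints "."
```

A single Map holding both directions would also satisfy the punctuationMap.get calls in the snippets; two maps simply make the direction explicit.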
Function demonstration
Before formatting:
After formatting:
Conclusion
There are bound to be some bugs at the moment, and I will keep improving the plug-in as I use it myself.
If you like my article, a like and a follow are the biggest support 🌿, thank you ~
Final words
You make my world colorful
class World {
  constructor(name) {
    this.isColorful = false;
    this.owner = name;
  }
  add(somebody) {
    if (somebody === 'YOU' && this.owner === 'Han Cao') {
      this.isColorful = true;
    }
  }
}