Hello everyone, I am Han Cao 😈, a grassroots code monkey 🐒. Intermittently fired up 🔥, perpetually goofy 🌟. If you like my articles, feel free to follow ➕ and like, and grow together with me ~ WeChat: Hancao97. Add me on WeChat to watch the Front-End Early Chat Vue special livestream on 10.23 for free.

This article is taking part in the “Digitalstar Project”, which offers a creator gift package and a chance at the creative incentive prize.

Background

How was everyone's National Day holiday 🌟? Now that it is over 😮💨, it is time to refocus and get back to reading Han Cao's articles.

I don’t know whether you have ever grumbled about how uncomfortable and messy a document's formatting can be. For example, compare these two versions of the same sentence:

  • Bad case: These eggs look good,give me 10kg.
  • Good case: These eggs look good, give me 10 kg.

Wrong punctuation and missing spaces make for uncomfortable reading 😭. This example shows that consistent, tidy typesetting can greatly improve the reading experience of an article or document 🔥.

For someone who shares their writing, neat typesetting matters even more, because it noticeably improves the reader's experience. So I decided to improve the quality of my articles, but checking the typesetting by hand has problems:

  • High labor cost
  • No guarantee that every problem gets caught

So I wanted to improve the reading experience of my documents through automation, and the approach I settled on was a VS Code plug-in 🌟. The result is my latest plug-in, Markdown format. It can be found in the VS Code plug-in marketplace ~ there are bound to be bugs, and everyone is welcome to help test it.

The typesetting rules used in the plug-in come from a guide to typesetting rules for translated text.

Implementation

Create a VS Code extension project

Start with the familiar process of creating a project from a scaffold:

npm install -g yo generator-code

Then run yo code to generate the initial project code.

Let’s first configure the package.json file:

"activationEvents": [
    "onCommand:markdown-format.format"
],
"main": "./extension.js",
"contributes": {
    "commands": [
        {
            "command": "markdown-format.format",
            "title": "Format markdown"
        }
    ],
    "menus": {
        "explorer/context": [
            {
                "command": "markdown-format.format",
                "when": "filesExplorerFocus",
                "group": "navigation@1"
            }
        ]
    }
},

The format command is now configured; it is triggered by right-clicking a file in the explorer.
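
The command declared in package.json still has to be registered in extension.js. The article does not show that file, so the following is only a minimal sketch. `createActivate` and `formatFile` are hypothetical names, and the `vscode` module is passed in as a parameter purely so the sketch can be exercised outside the editor.

```javascript
// Sketch only: createActivate and formatFile are hypothetical names,
// not taken from the plug-in's source.
function createActivate(vscodeApi, formatFile) {
  return function activate(context) {
    // The id must match the "command" value declared in package.json.
    const disposable = vscodeApi.commands.registerCommand(
      'markdown-format.format',
      (uri) => {
        // When invoked from the explorer context menu, the handler
        // receives the Uri of the clicked resource.
        if (uri && uri.fsPath) {
          formatFile(uri.fsPath);
        }
      }
    );
    // Let VS Code dispose of the command when the extension is deactivated.
    context.subscriptions.push(disposable);
  };
}
```

Inside a real extension this would be wired up with something like `createActivate(require('vscode'), formatFile)` and exported as `activate`.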

Determine whether the file type is compliant

The next step is to check whether the file type is markdown. If the file type is not markdown, no operation is performed.

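const { extname } = require('path');

// isDir is assumed to be a helper defined elsewhere in the plug-in
// that reports whether the path points to a directory.
const isMarkdown = (path) => {
  if (isDir(path)) {
    return false;
  }
  if (extname(path) !== '.md') {
    return false;
  }
  return true;
}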
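
The check above relies on an isDir helper that the article does not show. Here is one way it could look, assuming Node's built-in fs and path modules; the implementation of `isDir` and the parameter name `filePath` are my assumptions, not the plug-in's actual code.

```javascript
const fs = require('fs');
const { extname } = require('path');

// Assumed implementation: true only when the path exists and is a directory.
function isDir(filePath) {
  return fs.existsSync(filePath) && fs.statSync(filePath).isDirectory();
}

// Same logic as the article's check, written as a single expression.
function isMarkdown(filePath) {
  return !isDir(filePath) && extname(filePath) === '.md';
}
```

Note that a directory whose name ends in .md is rejected, which is why the directory test comes first.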

Lexical analysis

The next step is lexical analysis: parsing the markdown text into a token sequence. Tokens fall into the following types:

  • Space: SPACE
  • Newline: ENTER
  • Half-width symbols: HALF
  • Full-width symbols: FULL
  • Chinese: CHINESE
  • Digits: NUMBER
  • English: ENGLISH

If you are not familiar with compiler principles, you can read my earlier article: Principles of Front-End Compilation (1): Introduction to Compilation

In essence, this is just a pass over the string, character by character.

function lexicalAnalysis(content) {
  const tokenList = [];
  let currentStr = '';
  let currentType = '';
  // Extend the current token while the type stays the same;
  // otherwise flush it and start a new token.
  const handleChar = (char, type) => {
    if (currentType === type) {
      currentStr += char;
    } else {
      // The very first flush pushes an empty token ({type: '', content: ''}).
      tokenList.push({
        type: currentType,
        content: currentStr
      });
      currentStr = char;
      currentType = type;
    }
  };
  for (const char of content) {
    const code = char.charCodeAt(0);
    if (char === ' ') {
      handleChar(char, 'SPACE');
    } else if (char === '\n') {
      handleChar(char, 'ENTER');
    } else if (char.match(/[\x21-\x2f\x3a-\x40\x5b-\x60\x7B-\x7F]/) || code === 8212) {
      // Half-width symbols, plus the dash character (char code 8212)
      handleChar(char, 'HALF');
    } else if (code === 12290 || (code > 65280 && code < 65375)) {
      // Full-width symbols. TODO: why is the char code of the full-width period (12290) so odd?
      handleChar(char, 'FULL');
    } else if (char.match(/[\u4e00-\u9fa5]/)) {
      // Chinese
      handleChar(char, 'CHINESE');
    } else if (char.match(/[0-9]/)) {
      handleChar(char, 'NUMBER');
    } else {
      handleChar(char, 'ENGLISH');
    }
  }
  // Flush the last token
  tokenList.push({
    type: currentType,
    content: currentStr
  });
  return tokenList;
}

What the token sequence looks like

The markdown content (Chinese words are shown here in English translation) is:

you are`asdas`cold--grass.w 
mainly123ni.D... 
ah。。。can't get ongithub

The token sequence is:

0: {type: '', content: ''}
1: {type: 'CHINESE', content: 'you are'}
2: {type: 'HALF', content: '`'}
3: {type: 'ENGLISH', content: 'asdas'}
4: {type: 'HALF', content: '`'}
5: {type: 'CHINESE', content: 'cold'}
6: {type: 'HALF', content: '--'}
7: {type: 'CHINESE', content: 'grass'}
8: {type: 'HALF', content: '.'}
9: {type: 'ENGLISH', content: 'w'}
10: {type: 'SPACE', content: ' '}
11: {type: 'ENTER', content: '\n'}
12: {type: 'CHINESE', content: 'mainly'}
13: {type: 'NUMBER', content: '123'}
14: {type: 'ENGLISH', content: 'ni'}
15: {type: 'HALF', content: '.'}
16: {type: 'ENGLISH', content: 'D'}
17: {type: 'HALF', content: '...'}
18: {type: 'SPACE', content: ' '}
19: {type: 'ENTER', content: '\n'}
20: {type: 'CHINESE', content: 'ah'}
21: {type: 'FULL', content: '。。。'}
22: {type: 'CHINESE', content: 'can\'t get on'}
23: {type: 'ENGLISH', content: 'github'}

Processing the token sequence

Here I operate directly on the token sequence and then join it back together into text. Each rule is handled separately below.
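
The rule snippets in this article read from tokenList[i] and write to resList, but the surrounding loop is never shown. The sketch below is my guess at its shape, assuming the current token is pushed to resList before the rules run (so it is always the last element). `processTokens`, the rule signature, and this definition of `SPACE_TOKEN` are hypothetical.

```javascript
// Assumed definition of the shared token used for inserted spaces.
const SPACE_TOKEN = { type: 'SPACE', content: ' ' };

// Runs every rule on every token, then joins the result back into text.
// Each rule gets the input tokens, the current index and the output list.
function processTokens(tokenList, rules) {
  const resList = [];
  for (let i = 0; i < tokenList.length; i++) {
    // After this push the current token is resList[resList.length - 1]
    // and the previously emitted token is resList[resList.length - 2].
    resList.push(tokenList[i]);
    for (const rule of rules) {
      rule(tokenList, i, resList);
    }
  }
  return resList.map((token) => token.content).join('');
}

// Example rule: a number followed by English gets a space in between.
const spaceAfterNumber = (tokenList, i, resList) => {
  const next = tokenList[i + 1];
  if (tokenList[i].type === 'NUMBER' && next && next.type === 'ENGLISH') {
    resList.push(SPACE_TOKEN);
  }
};
```

With this shape, a rule that wants to rewrite the current token pops it from resList and pushes a replacement, which matches the pop/push pairs in the snippets.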

Add spaces between Chinese and English

  • Wrong: I like the flexibility of javascript (in the Chinese original, "javascript" touches the Chinese characters on both sides)
  • Correct: I like the flexibility of javascript (a space separates "javascript" from the Chinese on each side)

The idea here is:

  • If the current token is Chinese and the next token is a number or English, add a space.
  • If the current token is Chinese and the next token is a markdown marker for inline code, bold, or italics, followed by a number or English, also add a space.
  • If the current token is a number or English and the next token is Chinese, add a space.
  • If the current token is a markdown marker for inline code, bold, or italics, the token before it is English or a number, and the next token is Chinese, add a space.

// Current token is Chinese
if (tokenList[i].type === 'CHINESE') {
  if (tokenList[i+1] && ['ENGLISH', 'NUMBER'].includes(tokenList[i+1].type)) {
    resList.push(SPACE_TOKEN);
  }
  // Chinese, then a markdown marker, then English or a number
  if (tokenList[i+2] && ['`', '**', '*'].includes(tokenList[i+1].content) && ['ENGLISH', 'NUMBER'].includes(tokenList[i+2].type)) {
    resList.push(SPACE_TOKEN);
  }
}
// After: English or a number followed by Chinese
if (['ENGLISH', 'NUMBER'].includes(tokenList[i].type)) {
  if (tokenList[i+1] && tokenList[i+1].type === 'CHINESE') {
    resList.push(SPACE_TOKEN);
  }
}
// Closing markdown marker preceded by English or a number and followed by Chinese
if (['`', '**', '*'].includes(tokenList[i].content)) {
  if (resList[resList.length - 2] && ['ENGLISH', 'NUMBER'].includes(resList[resList.length - 2].type) && tokenList[i+1] && tokenList[i+1].type === 'CHINESE') {
    resList.push(SPACE_TOKEN);
  }
}
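
For comparison, the core of this rule can also be written with two regular expressions instead of tokens. This is a different technique from the plug-in's token-based approach, shown only to make the rule concrete. The function name is hypothetical, and the sketch ignores the markdown marker cases (inline code, bold, italics).

```javascript
// Regex-based sketch (not the plug-in's implementation): insert a space
// wherever a Chinese character directly touches a Latin letter or a digit.
function spaceBetweenChineseAndLatin(text) {
  return text
    // Chinese followed by a letter or digit
    .replace(/([\u4e00-\u9fa5])([A-Za-z0-9])/g, '$1 $2')
    // Letter or digit followed by Chinese
    .replace(/([A-Za-z0-9])([\u4e00-\u9fa5])/g, '$1 $2');
}
```

Text that is already spaced correctly passes through unchanged, because neither pattern matches across a space.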

Add a space between a number and its unit

  • Wrong: This basket of eggs weighs 5kg.
  • Correct: This basket of eggs weighs 5 kg.

This rule has a known problem, though.

TODO: it can produce false positives, and I am considering whether to maintain a list of units.

if (tokenList[i].type === 'NUMBER' && tokenList[i+1] && tokenList[i+1].type === 'ENGLISH') {
  resList.push(SPACE_TOKEN);
}
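
The unit list mentioned in the TODO could look roughly like this. It is only a sketch of the idea: the contents of `KNOWN_UNITS` and the function name are my own illustration, and a real list would need to be much longer.

```javascript
// Hypothetical unit list; the entries are illustrative, not exhaustive.
const KNOWN_UNITS = new Set(['kg', 'g', 'km', 'm', 'cm', 'mm', 's', 'ms', 'KB', 'MB', 'GB', 'px']);

// True only when a NUMBER token is followed by an ENGLISH token
// whose whole content is a known unit.
function needsSpaceBeforeUnit(currentToken, nextToken) {
  return (
    currentToken.type === 'NUMBER' &&
    nextToken !== undefined &&
    nextToken.type === 'ENGLISH' &&
    KNOWN_UNITS.has(nextToken.content)
  );
}
```

With a check like this, identifiers such as "123ni" would no longer be split, at the cost of missing any unit that is not on the list.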

No spaces between full-width punctuation and other characters

  • Wrong: I like you , you are a fairy。
  • Correct: I like you,you are a fairy。

The idea here is:

  • If the current token is a space and the next token is full-width punctuation, delete the space.
  • If the current token is a space and the previous token is full-width punctuation, delete the space.

if (resList.length > 0 && resList[resList.length - 1].type === 'SPACE' && tokenList[i+1] && tokenList[i+1].type === 'FULL') {
  resList.pop();
}
// Re-read the last token from resList to avoid inconsistent states after the pop above
if (resList.length > 0 && resList[resList.length - 1].type === 'SPACE' && resList[resList.length - 2] && resList[resList.length - 2].type === 'FULL') {
  resList.pop();
}

Do not repeat punctuation marks

  • Wrong: Are you undercover?!!!!!!!!!
  • Correct: Are you undercover?!

Only ?, !, 【 and 】 are handled here.

if (tokenList[i].type === 'FULL') {
  resList.pop();
  let end = '';
  let str = '';
  for (const char of tokenList[i].content) {
    // Skip a handled mark that repeats the previous character
    if (end === char && ['?', '!', '【', '】'].includes(char)) {
      continue;
    }
    str += char;
    // Remember the last character so the next repeat can be detected
    end = char;
  }
  resList.push({
    type: 'FULL',
    content: str
  });
}

Ellipsis

  • Wrong: There are tomatoes, potatoes, celery..
  • Correct: There are tomatoes, potatoes, celery...

The ellipsis should be written as: ...

// Previous token is Chinese or English and the current token is exactly two dots
if (resList[resList.length - 2] && ['CHINESE', 'ENGLISH'].includes(resList[resList.length - 2].type) && tokenList[i].content.match(/^[\.]{2}$/)) {
  resList.pop();
  resList.push({
    type: 'HALF',
    content: '...'
  });
}
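
A string-level sketch of the same idea, assuming the rule only targets a run of exactly two dots after Chinese or English text. The function name is hypothetical, and this is a regex variant rather than the plug-in's token code.

```javascript
// Regex-based sketch: turn ".." into "..." when it follows a Chinese
// character or a Latin letter and is not already part of a longer run.
function normalizeShortEllipsis(text) {
  // (?!\.) makes sure the two dots are not followed by a third one.
  return text.replace(/([\u4e00-\u9fa5A-Za-z])\.{2}(?!\.)/g, '$1...');
}
```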

Chinese is followed by full-width punctuation

  • Wrong: Hello,my name is Han Cao.
  • Correct: Hello,my name is Han Cao。

Punctuation currently supported for conversion:

  • . → 。
  • .
  • ? → ?
  • ! → !
  • ; → ;
  • : → :

Because of some special cases involving closing tags, those are not handled for now.

// punctuationMap maps each half-width mark to its full-width counterpart (and back)
if (['.', '?', '!', ';', ':'].includes(tokenList[i].content)) {
  if (resList[resList.length - 2] && resList[resList.length - 2].type === 'CHINESE') {
    resList.pop();
    resList.push({
      type: 'FULL',
      content: punctuationMap.get(tokenList[i].content)
    });
  }
}

English is followed by half-width punctuation

  • Wrong: I am hancao。
  • Correct: I am hancao.

The English case works the same way as the Chinese one, in the opposite direction.

// Full-width marks after English are converted back to half-width
if (['。', '?', '!', ';', ':'].includes(tokenList[i].content)) {
  if (resList[resList.length - 2] && resList[resList.length - 2].type === 'ENGLISH') {
    resList.pop();
    resList.push({
      type: 'HALF',
      content: punctuationMap.get(tokenList[i].content)
    });
  }
}
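
Both punctuation snippets call punctuationMap.get, but the map itself is not shown in the article. One plausible shape is a two-way Map built from a list of pairs. The sketch below assumes exactly that; `PAIRS` and `convertPunctuation` are hypothetical names, and the pair list only covers the marks that appear in the snippets.

```javascript
// Assumed half-width / full-width pairs, taken from the marks in the snippets.
const PAIRS = [['.', '。'], ['?', '?'], ['!', '!'], [';', ';'], [':', ':']];
const HALF_MARKS = new Set(PAIRS.map(([half]) => half));
const FULL_MARKS = new Set(PAIRS.map(([, full]) => full));

// Two-way lookup: half-width to full-width and back.
const punctuationMap = new Map();
for (const [half, full] of PAIRS) {
  punctuationMap.set(half, full);
  punctuationMap.set(full, half);
}

// Picks the mark that fits the type of the preceding token.
function convertPunctuation(prevType, mark) {
  if (prevType === 'CHINESE' && HALF_MARKS.has(mark)) {
    return punctuationMap.get(mark);
  }
  if (prevType === 'ENGLISH' && FULL_MARKS.has(mark)) {
    return punctuationMap.get(mark);
  }
  return mark; // anything else is left as it is
}
```

Keeping both directions in one Map means the Chinese and English rules can share the same lookup, which is consistent with how the two snippets use it.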

Function demonstration

Before formatting:

After formatting:

Conclusion

There are certainly still some bugs, and I will keep improving the plug-in as I use it myself.

If you like my article, a like and a follow are the biggest support you can give 🌿, thank you ~

Final words

You make my world colorful

class World {
  constructor(name) {
    this.isColorful = false;
    this.owner = name;
  }

  add(somebody) {
    if (somebody === 'YOU' && this.owner === 'cold grass') {
      this.isColorful = true;
    }
  }
}

Add me on WeChat: Hancao97, and I will invite you to the group so we can learn and discuss front-end development together and become better engineers ~ You can also watch the Front-End Early Chat Vue special livestream on October 23 for free, with a talk from Evan You ~