Recently, my day-to-day work brought a requirement to implement a code editor in a web page. The editor must support govaluate syntax (see the govaluate syntax introduction) and offer the most basic interactions of a code editor, such as code completion, keyword highlighting, error detection, hover tooltips, and automatic formatting. Code editors on the web usually use the Monaco Editor, which has built-in support for major programming languages such as JS, Java, and Go. However, the language in this requirement comes from a Go library (govaluate), which Monaco does not support, so we need Monaco's custom-language capability to meet the requirement.

After researching how similar code editors implement their own languages, I came across ANTLR, a library I can use to build a custom Monaco language.

About ANTLR

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator written in Java. It was introduced in 1989 by Dr. Terence Parr and colleagues at the University of San Francisco and is now in its fourth generation, hence the name ANTLR4. The tool itself runs on Java, but the parsers it generates can target mainstream programming languages including JS and TS, which is why ANTLR4 is one of the most widely used parser generators.

For a more detailed article on ANTLR and its usage, interested readers can refer to "Compiler Technology in Front-End Practice (II): ANTLR and Its Application".

Note

This article won't say much about how to use the Monaco Editor itself (see the Monaco Editor website for that). Instead, it shows how to implement a custom Monaco language with ANTLR.

Technology selection

We use React + TypeScript + antlr4ts + react-monaco-editor.

Initialize the project and install dependencies

npm i react-monaco-editor
npm i antlr4ts
npm i antlr4ts-cli -D

Using the Monaco Editor

import React from 'react';
import './App.css';
import MonacoEditor from 'react-monaco-editor';

function App(): JSX.Element {
  return (
    <div className="App">
      <MonacoEditor
        width={800}
        height={600}
        options={{ fontSize: 20 }}
        language="javascript"
        theme="vs-dark"
      />
    </div>
  );
}

export default App;

To use the Monaco Editor with webpack, you also need to install monaco-editor-webpack-plugin.

Write a G4 file to generate the parser

Because this is a small demo, we use the simplest possible syntax: addition, subtraction, multiplication, and division. The G4 file looks something like this:

See the ANTLR article above for details on writing a G4 file. In short, I defined the lexical rules and the grammar rules. The lexical rules cover addition, subtraction, multiplication, division, the equals sign, parentheses, and numbers; the grammar rules cover parenthesized expressions, addition and subtraction, and multiplication and division. After writing the G4 file, use antlr4ts-cli to generate the parser:

npx antlr4ts -visitor src/parser/calc.g4

After running this command, you can see that several files have been generated

You can look through the contents of these files to get a sense of what each one does.
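To get a feel for what the generated calcLexer does with the lexical rules described above, here is a hand-written TypeScript sketch of a tokenizer for the same token set. It is not ANTLR output. The token names (ADD, SUB, MUL, DIV, EQUAL, OpenParen, CloseParen, NUMBER) are assumed to match the ones used in TokenMap later in this article, NUMBER is assumed to be a run of digits, and skipping whitespace is an assumption because the original grammar is not shown.

```typescript
// Token names assumed to mirror the calc.g4 lexer rules (not the generated code).
type TokenType =
  | 'ADD' | 'SUB' | 'MUL' | 'DIV' | 'EQUAL'
  | 'OpenParen' | 'CloseParen' | 'NUMBER';

interface SimpleToken {
  type: TokenType;
  text: string;
  start: number; // 0-based offset within the line, like charPositionInLine
}

const SINGLE_CHAR: Record<string, TokenType> = {
  '+': 'ADD',
  '-': 'SUB',
  '*': 'MUL',
  '/': 'DIV',
  '=': 'EQUAL',
  '(': 'OpenParen',
  ')': 'CloseParen',
};

// Splits a line into tokens. Offsets of characters that match no rule are
// collected in `errors`, similar to what an ANTLR lexer error listener reports.
function tokenize(input: string): { tokens: SimpleToken[]; errors: number[] } {
  const tokens: SimpleToken[] = [];
  const errors: number[] = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === ' ' || ch === '\t') {
      i++; // whitespace is skipped (assumption)
    } else if (ch in SINGLE_CHAR) {
      tokens.push({ type: SINGLE_CHAR[ch], text: ch, start: i });
      i++;
    } else if (ch >= '0' && ch <= '9') {
      // Consume the whole run of digits as one NUMBER token
      let j = i;
      while (j < input.length && input[j] >= '0' && input[j] <= '9') j++;
      tokens.push({ type: 'NUMBER', text: input.slice(i, j), start: i });
      i = j;
    } else {
      errors.push(i); // unexpected character
      i++;
    }
  }
  return { tokens, errors };
}
```

For example, `tokenize('1+1=2')` yields NUMBER, ADD, NUMBER, EQUAL, NUMBER, while `tokenize('12 # 3')` reports an error at offset 3.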

Implement keyword highlighting

Monaco implements highlighting through the setTokensProvider API. We only need to find the position of each keyword in the text and assemble those positions into the data format Monaco expects, so a lexer is all we need. For our arithmetic expression syntax, we only highlight numbers and operators.
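Monaco's IToken carries only a `startIndex` and `scopes`, so each token implicitly extends until the next token's `startIndex` (or the end of the line). The sketch below expands such a list into explicit spans to show how the array will be read. `LineToken` is a local stand-in for `monaco.languages.IToken`, and `toSpans` is a hypothetical helper for illustration, not part of Monaco.

```typescript
// Local mirror of the two IToken fields used in this article.
interface LineToken {
  startIndex: number;
  scopes: string;
}

interface Span {
  scopes: string;
  start: number; // inclusive
  end: number;   // exclusive
  text: string;
}

// Each token runs from its startIndex to the next token's startIndex
// (or to the end of the line), which only makes sense once the list is sorted.
function toSpans(line: string, tokens: LineToken[]): Span[] {
  const sorted = [...tokens].sort((a, b) => a.startIndex - b.startIndex);
  return sorted.map((tok, i) => {
    const end = i + 1 < sorted.length ? sorted[i + 1].startIndex : line.length;
    return { scopes: tok.scopes, start: tok.startIndex, end, text: line.slice(tok.startIndex, end) };
  });
}
```

One consequence: characters the lexer skips, such as spaces, end up inside the preceding token's span, so they take that token's color unless you emit a token for them.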

Implement the TokensProvider class

Start by declaring the class Monaco needs for highlighting:

import * as monaco from 'monaco-editor/esm/vs/editor/editor.api';

// Placeholder for now; the complete getTokens is given later in this article
function getTokens(input: string): monaco.languages.IToken[] {
  return [];
}

function tokenForLine(input: string): monaco.languages.ILineTokens {
  const tokens = getTokens(input);

  return { tokens, endState: new State() };
}

// Expressions are single-line, so no state needs to carry over between lines
class State implements monaco.languages.IState {
  clone(): monaco.languages.IState {
    return new State();
  }
  equals(other: monaco.languages.IState): boolean {
    return true;
  }
}

export class TokensProviders implements monaco.languages.TokensProvider {
  tokenize(line: string, state: monaco.languages.IState): monaco.languages.ILineTokens {
    return tokenForLine(line);
  }

  getInitialState(): monaco.languages.IState {
    return new State();
  }
}

The main analysis logic lives in the getTokens function; for the format it must return, see the IToken documentation. First, we need to find the positions in the incoming text that match our lexical rules. We use the generated calcLexer class to get the token stream of the text:

import { CharStreams } from 'antlr4ts';
import { calcLexer } from '../parser/calcLexer';

// Initialize the lexer
const chars = CharStreams.fromString(input);
const lexer = new calcLexer(chars);
lexer.removeErrorListeners();
// Get the token stream
const tokens = lexer.getAllTokens();

console.log(tokens)

Type 1+1=2 into the editor and see what it prints:

It prints an array of tokens. Let's expand the first token and see what's inside:

The lexer has worked out the position and type of every token in our input. The type is a numeric index, so it needs to be converted to a name:

const type = lexer.ruleNames[token.type - 1];

This gives the first token the type NUMBER. Since addition, subtraction, and the rest are all operators, we map them all to the same type before passing them to Monaco:

export const TokenMap: Record<string, string> = {
  ADD: 'operator',
  SUB: 'operator',
  DIV: 'operator',
  MUL: 'operator',
  EQUAL: 'operator',
  OpenParen: 'operator',
  CloseParen: 'operator',
  NUMBER: 'keyword',
  UnexpectedCharacter: '',
};
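A caveat on the `lexer.ruleNames[token.type - 1]` lookup: ANTLR numbers token types from 1, but `ruleNames` lists every lexer rule, including `fragment` rules that never produce tokens. The offset therefore only holds when every lexer rule is a token rule, which appears to be true for this small grammar. The sketch below uses a hypothetical grammar with a fragment rule to show how the lookup drifts. In antlr4ts, the generated lexer's `vocabulary.getSymbolicName(token.type)` indexes by token type directly and avoids this; the arrays here are hand-written stand-ins, not generated code.

```typescript
// Hypothetical grammar excerpt, in declaration order:
//   fragment DIGIT : [0-9] ;
//   NUMBER : DIGIT+ ;
//   ADD : '+' ;
// DIGIT appears in ruleNames but never becomes a token,
// so NUMBER gets token type 1 and ADD gets token type 2.
const ruleNames: string[] = ['DIGIT', 'NUMBER', 'ADD'];

// Stand-in for the symbolic-name table: index = token type, index 0 unused.
const symbolicNames: Array<string | undefined> = [undefined, 'NUMBER', 'ADD'];

// The lookup used in this article: correct only when every lexer rule is a token rule.
function nameFromRuleNames(tokenType: number): string | undefined {
  return ruleNames[tokenType - 1];
}

// Indexing by token type does not depend on fragments or rule order.
function nameFromSymbolicNames(tokenType: number): string | undefined {
  return symbolicNames[tokenType];
}
```

With this hypothetical grammar, a NUMBER token (type 1) would be misread as `DIGIT` by the first lookup, while the second returns `NUMBER`.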

We can also catch characters that match none of our lexical rules and color them red:

const errors: number[] = [];
lexer.addErrorListener({
  syntaxError(_1, _2, _3, charPositionInLine: number) {
    errors.push(charPositionInLine);
  },
});
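Error offsets are collected separately from the token list, so appending them at the end produces an array that is no longer ordered by position. The sketch below merges recognized tokens and error offsets into IToken-shaped objects and sorts them by `startIndex`. `buildLineTokens` and the plain `{ name, start }` input are hypothetical simplifications of what getTokens works with; the scope names follow TokenMap above.

```typescript
// Local mirror of the IToken fields Monaco reads.
interface LineToken {
  startIndex: number;
  scopes: string;
}

// Same mapping as TokenMap in the article; unknown names fall back to ''.
const TOKEN_SCOPES: Record<string, string> = {
  ADD: 'operator', SUB: 'operator', MUL: 'operator', DIV: 'operator', EQUAL: 'operator',
  OpenParen: 'operator', CloseParen: 'operator', NUMBER: 'keyword',
};

// Combines (rule name, offset) pairs with offsets reported by the lexer error
// listener, then sorts so each error token sits where the bad character is.
function buildLineTokens(
  tokens: Array<{ name: string; start: number }>,
  errorOffsets: number[],
): LineToken[] {
  const res: LineToken[] = tokens.map((t) => ({
    startIndex: t.start,
    scopes: TOKEN_SCOPES[t.name] ?? '',
  }));
  for (const offset of errorOffsets) res.push({ startIndex: offset, scopes: 'error' });
  return res.sort((a, b) => a.startIndex - b.startIndex);
}
```

For the input `1+#2`, the `#` at offset 2 becomes its own `error` token between the `+` and the `2`.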

Finally, we configure Monaco theme colors for these token types to see the highlighting.

Complete getTokens code

function getTokens(input: string): monaco.languages.IToken[] {
  const lexer = createLexer(input); // createLexer wraps the lexer initialization shown above

  // Catch lexical errors
  const errors: number[] = [];
  lexer.removeErrorListeners();
  lexer.addErrorListener({
    syntaxError(_1, _2, _3, charPositionInLine: number) {
      errors.push(charPositionInLine);
    },
  });

  // Get the token stream
  const tokens = lexer.getAllTokens();

  const res: monaco.languages.IToken[] = tokens.map((token) => {
    // Convert the numeric token type to its rule name
    const type = lexer.ruleNames[token.type - 1];

    const typeName = TokenMap[type] || TokenMap.UnexpectedCharacter;
    return {
      scopes: typeName,
      startIndex: token.charPositionInLine,
    };
  });

  // Add the caught errors to res
  errors.forEach((point) => res.push({ scopes: 'error', startIndex: point }));

  // Each token runs until the next startIndex, so keep the list sorted by position
  res.sort((a, b) => a.startIndex - b.startIndex);

  return res;
}

That completes keyword highlighting with the lexer. Real requirements can of course be more flexible, for example treating a word followed by parentheses as a function.
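The "word followed by parentheses is a function" rule can be sketched as a lookahead over the token list. The calc demo has no identifiers, so this assumes a hypothetical `IDENTIFIER` token (as a govaluate-style grammar would need); the `function` and `variable` scope names are also assumptions and would need matching theme rules.

```typescript
interface NamedToken {
  name: string;  // lexer rule name, e.g. 'IDENTIFIER' (hypothetical) or 'OpenParen'
  start: number; // 0-based offset in the line
}

interface LineToken {
  startIndex: number;
  scopes: string;
}

// An IDENTIFIER whose next token is '(' is classified as a function name;
// other identifiers are treated as variables.
function classify(tokens: NamedToken[]): LineToken[] {
  return tokens.map((tok, i) => {
    const next = tokens[i + 1];
    if (tok.name === 'IDENTIFIER') {
      const isCall = next !== undefined && next.name === 'OpenParen';
      return { startIndex: tok.start, scopes: isCall ? 'function' : 'variable' };
    }
    return { startIndex: tok.start, scopes: tok.name === 'NUMBER' ? 'keyword' : 'operator' };
  });
}
```

Because the check looks at the next token rather than the next character, `max (a)` is also treated as a call when whitespace is skipped by the lexer.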

Implement hover tooltips

We implement hover tooltips with the parser. First, implement the HoverProvider class as before:

Implement the HoverProvider class

export class HoverProvider implements monaco.languages.HoverProvider {
  provideHover(model: monaco.editor.IModel, position: monaco.Position) {
    return {
      contents: [],
    };
  }
}

For the format provideHover must return, see its documentation. Here we use the parser to turn the text into an AST, then use the corresponding callbacks to find out which keyword the mouse is over. First, generate the AST:

import { CommonTokenStream } from 'antlr4ts';
import { calcParser } from '../parser/calcParser';

export const getParser = (input: string) => {
  const lexer = createLexer(input); // Initializes the lexer (helper shown earlier)
  const tokenStream = new CommonTokenStream(lexer);
  const parser = new calcParser(tokenStream);
  parser.removeErrorListeners();
  lexer.removeErrorListeners();
  return parser;
};

export const getAST = (input: string) => {
  const parser = getParser(input);
  // `start` is the entry rule of calc.g4
  const ast = parser.start();
  return ast;
};

How do we analyze the generated AST? We use the ParseTreeWalker provided by antlr4ts:

import { ParseTreeWalker } from 'antlr4ts/tree/ParseTreeWalker';

ParseTreeWalker.DEFAULT.walk(finder, AST); // Walk (analyze) the AST
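The walker does a depth-first traversal and calls the listener's `enter` callback for a rule before visiting its children and the `exit` callback after. The toy version below shows that order on a hand-built tree. It is a simplified illustration, not antlr4ts's ParseTreeWalker: the real walker also calls `enterEveryRule`, `visitTerminal`, and `visitErrorNode`. The rule names `AddSub` and `Number` are hypothetical labels.

```typescript
// A toy parse tree: inner nodes carry a rule name, leaves also carry token text.
interface TreeNode {
  rule: string;
  text?: string;
  children: TreeNode[];
}

// Optional callbacks keyed by rule name, standing in for calcListener's enterX/exitX methods.
interface Listener {
  enter?: Record<string, (node: TreeNode) => void>;
  exit?: Record<string, (node: TreeNode) => void>;
}

// Depth-first: enter the node, walk its children, then exit the node.
function walk(listener: Listener, node: TreeNode): void {
  listener.enter?.[node.rule]?.(node);
  for (const child of node.children) walk(listener, child);
  listener.exit?.[node.rule]?.(node);
}
```

Walking a tree for `1+2` with an `AddSub` root and two `Number` leaves calls the callbacks in the order: enter AddSub, enter Number (1), enter Number (2), exit AddSub.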

Here finder is just a callback object: an instance of a class that implements the generated calcListener interface. Whenever the walker reaches a piece of syntax, it calls the corresponding callback.

class HoverFinder implements calcListener {
  result?: {
    range: monaco.Range;
    type: string;
    name?: string;
  };
  private position: monaco.Position;
  constructor(position: monaco.Position) {
    this.position = position;
  }

  enterNumber(ctx: NumberContext) {
    console.log(ctx);
  }
}

Let's print ctx and see what it contains:

We can get the token through the start property, which also gives us the keyword's position. Then monaco.Range.containsPosition tells us whether that range contains the mouse position:

const getRangeFromToken = (input: Token) => {
  const startLineNumber = input.line;
  // ANTLR columns are 0-based, Monaco columns are 1-based
  const startColumn = input.charPositionInLine + 1;
  const length = input.text?.length || 1;
  return new monaco.Range(startLineNumber, startColumn, startLineNumber, startColumn + length);
};

// Inside HoverFinder
enterNumber(ctx: NumberContext) {
  if (!this.result) {
    const range = getRangeFromToken(ctx.start);
    const matched = monaco.Range.containsPosition(range, this.position);
    if (matched) {
      this.result = {
        range,
        type: 'number',
        name: ctx.start.text,
      };
    }
  }
}
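The range check mixes two conventions: ANTLR lines are 1-based but `charPositionInLine` is 0-based, while Monaco lines and columns are both 1-based. The sketch below reproduces getRangeFromToken and the containment rule on plain objects so the arithmetic can be checked without Monaco. The interfaces are local stand-ins, and `containsPosition` is written to follow what I understand `monaco.Range.containsPosition` does (both ends inclusive); treat that as an assumption.

```typescript
interface TokenLike {
  line: number;               // 1-based (ANTLR)
  charPositionInLine: number; // 0-based (ANTLR)
  text?: string;
}

interface SimpleRange {
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
}

interface SimplePosition {
  lineNumber: number;
  column: number;
}

// Same arithmetic as getRangeFromToken: shift to 1-based columns, width = text length (min 1).
function rangeFromToken(token: TokenLike): SimpleRange {
  const startColumn = token.charPositionInLine + 1;
  const length = token.text?.length || 1;
  return { startLineNumber: token.line, startColumn, endLineNumber: token.line, endColumn: startColumn + length };
}

// Containment with inclusive start and end columns.
function containsPosition(range: SimpleRange, pos: SimplePosition): boolean {
  if (pos.lineNumber < range.startLineNumber || pos.lineNumber > range.endLineNumber) return false;
  if (pos.lineNumber === range.startLineNumber && pos.column < range.startColumn) return false;
  if (pos.lineNumber === range.endLineNumber && pos.column > range.endColumn) return false;
  return true;
}
```

For `1+42`, the token `42` has `charPositionInLine` 2, so its range covers columns 3 to 5.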

The finder's result tells us whether to show the hover popup. The complete code is:

import { Token } from 'antlr4ts';
import { ParseTreeWalker } from 'antlr4ts/tree/ParseTreeWalker';
import * as monaco from 'monaco-editor/esm/vs/editor/editor.api';
import { getAST } from '../common';
import { calcListener } from '../parser/calcListener';
import { NumberContext } from '../parser/calcParser';

export class HoverProvider implements monaco.languages.HoverProvider {
  provideHover(model: monaco.editor.IModel, position: monaco.Position) {
    const content = model.getValue();
    const AST = getAST(content || '');
    const finder = new HoverFinder(position);
    ParseTreeWalker.DEFAULT.walk(finder, AST); // Traverse the AST
    const { result } = finder;
    // result stays undefined when the mouse is not over a number
    if (result?.type === 'number') {
      return {
        contents: [{ value: `Number ${result.name}` }],
        range: result.range,
      };
    }
    return {
      contents: [],
    };
  }
}

const getRangeFromToken = (input: Token) => {
  const startLineNumber = input.line;
  const startColumn = input.charPositionInLine + 1;
  const length = input.text?.length || 1;
  return new monaco.Range(startLineNumber, startColumn, startLineNumber, startColumn + length);
};

class HoverFinder implements calcListener {
  result?: {
    range: monaco.Range;
    type: string;
    name?: string;
  };
  private position: monaco.Position;
  constructor(position: monaco.Position) {
    this.position = position;
  }

  enterNumber(ctx: NumberContext) {
    if (!this.result) {
      const range = getRangeFromToken(ctx.start);
      const matched = monaco.Range.containsPosition(range, this.position);
      if (matched) {
        this.result = {
          range,
          type: 'number',
          name: ctx.start.text,
        };
      }
    }
  }

  visitErrorNode() {
    // Intentionally empty; declared so the TypeScript types line up
  }
}

The result:

Implement error capture

Code error detection uses the monaco.editor.setModelMarkers API, and errors must be re-detected in real time as the text changes. We implement a validate function that is called whenever the text changes. It returns an array describing the location and message of each error, and we pass that array to setModelMarkers to mark the errors in the editor. Both lexical and syntax errors are detected. The code:

import { CommonTokenStream, Token } from 'antlr4ts';
import * as monaco from 'monaco-editor/esm/vs/editor/editor.api';
import { createLexer } from '../common';
import { calcParser } from '../parser/calcParser';

// Convert an ANTLR token into a Monaco range (Monaco columns are 1-based)
const getPositionByToken = (token: Token) => ({
  startLineNumber: token.line,
  startColumn: token.charPositionInLine + 1,
  endLineNumber: token.line,
  endColumn: token.charPositionInLine + (token.text?.length || 0) + 1,
});

export const validate = async (model: monaco.editor.IModel) => {
  let content = '';
  try {
    content = model.getValue();
  } catch {
    monaco.editor.setModelMarkers(model, 'ruleLint', []);
    return;
  }

  // Clear the markers when the editor is empty
  if (!content.trim()) {
    monaco.editor.setModelMarkers(model, 'ruleLint', []);
    return;
  }

  const lexer = createLexer(content);
  const tokenStream = new CommonTokenStream(lexer);
  const parser = new calcParser(tokenStream);
  lexer.removeErrorListeners();
  parser.removeErrorListeners();

  const errors: monaco.editor.IMarkerData[] = [];

  // Lexical errors: mark the single offending character
  lexer.addErrorListener({
    syntaxError(_1, _2, line, charPositionInLine, msg, _6) {
      errors.push({
        message: msg,
        severity: monaco.MarkerSeverity.Error,
        source: 'validator',
        startLineNumber: line,
        startColumn: charPositionInLine + 1,
        endLineNumber: line,
        endColumn: charPositionInLine + 2,
        code: 'lexer',
      });
    },
  });

  // Syntax errors: mark the whole offending token
  parser.addErrorListener({
    syntaxError(_1, offendingSymbol, _3, _4, msg, _6) {
      if (offendingSymbol) {
        errors.push({
          message: msg,
          severity: monaco.MarkerSeverity.Error,
          source: 'validator',
          code: 'parser',
          ...getPositionByToken(offendingSymbol),
        });
      }
    },
  });

  // Parsing from the entry rule triggers both listeners
  parser.start();
  return errors;
};
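The column arithmetic in validate is easy to get off by one, so the sketch below isolates it: lexer errors only report a position and get a one-character marker, while parser errors report a token and get a marker as wide as its text. `MarkerLike` is a local stand-in for the subset of `monaco.editor.IMarkerData` fields used here, and `lexerMarker`/`parserMarker` are hypothetical helper names.

```typescript
// Subset of IMarkerData fields used by validate (severity/source omitted).
interface MarkerLike {
  message: string;
  code: 'lexer' | 'parser';
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
}

interface TokenLike {
  line: number;               // 1-based (ANTLR)
  charPositionInLine: number; // 0-based (ANTLR)
  text?: string;
}

// Lexer errors carry only a position: mark exactly one character.
function lexerMarker(line: number, charPositionInLine: number, msg: string): MarkerLike {
  return {
    message: msg,
    code: 'lexer',
    startLineNumber: line,
    startColumn: charPositionInLine + 1,
    endLineNumber: line,
    endColumn: charPositionInLine + 2,
  };
}

// Parser errors carry the offending token: mark the token's full width.
function parserMarker(token: TokenLike, msg: string): MarkerLike {
  return {
    message: msg,
    code: 'parser',
    startLineNumber: token.line,
    startColumn: token.charPositionInLine + 1,
    endLineNumber: token.line,
    endColumn: token.charPositionInLine + (token.text?.length || 0) + 1,
  };
}
```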

Of course, you can also use the parser, as in the hover implementation above, to report errors specific to your language, such as undefined variables or a wrong number of function arguments.
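As a sketch of the "undefined variable" check, assume a listener has already collected identifier tokens while walking the AST. The calc demo has no identifiers, so the `IdentToken` shape, the set of defined variables, and `findUndefinedVariables` are all hypothetical. The function turns every unknown identifier into a marker-shaped object that validate could add to its errors array.

```typescript
interface IdentToken {
  text: string;
  line: number;               // 1-based (ANTLR)
  charPositionInLine: number; // 0-based (ANTLR)
}

interface MarkerLike {
  message: string;
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
}

// Reports every identifier that is not in `defined`, spanning the identifier's text.
function findUndefinedVariables(identifiers: IdentToken[], defined: ReadonlySet<string>): MarkerLike[] {
  return identifiers
    .filter((id) => !defined.has(id.text))
    .map((id) => ({
      message: `Undefined variable: ${id.text}`,
      startLineNumber: id.line,
      startColumn: id.charPositionInLine + 1,
      endLineNumber: id.line,
      endColumn: id.charPositionInLine + 1 + id.text.length,
    }));
}
```

For example, with `a` defined, the expression `a + b` yields one marker on `b` at columns 5 to 6.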

Conclusion

With the parser, we can do much more than assemble arrays into the format Monaco wants. I won't demonstrate the other features here; if you're interested, you can explore them yourself. I believe ANTLR can play a big role on the front end. The ByteDance TikTok community security front-end team in Hangzhou is hiring; the team atmosphere is great, and referrals are welcome.