In the previous article we covered the Monaco Editor project; this time, let's look at how to configure custom language highlighting. The Monaco Editor supports custom languages through Monarch, its own syntax highlighting library, which lets you create declarative syntax highlighting using JSON. Monarch also provides an online playground for debugging a custom language. A language definition is basically just a JSON value that describes various attributes of the language.
Getting started
Monaco uses the setMonarchTokensProvider(languageId, monarchConfig) function to define a language's highlighting; it takes the ID of the custom language and the JSON configuration described above. Let's start with an example:
```js
{
  tokenizer: {
    root: [
      [/\d+/, { token: "keyword" }],
      [/[a-z]+/, { token: "string" }]
    ]
  }
}
```
We define a root property inside tokenizer; root is a state of the tokenizer, and it is where we write parsing rules. Each rule pairs a regular expression that matches text with an action for the matched text. In the action we can set the token class applied to the match.
In our example, we set two rules in root, one matching numbers and one matching letters, and then execute the corresponding action. The actions assign the token classes keyword and string to the matched text. The final result is shown as follows:
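To try the configuration in an actual page, it is registered roughly as follows (a sketch: "myLang" is a made-up language ID, and the monaco global is assumed to already be loaded; the guard lets the snippet also run outside the browser):

```js
// The two-rule configuration from above, as a plain object.
const monarchConfig = {
  tokenizer: {
    root: [
      [/\d+/, { token: "keyword" }],
      [/[a-z]+/, { token: "string" }],
    ],
  },
};

// In the browser, with the monaco loader already set up:
if (typeof monaco !== "undefined") {
  monaco.languages.register({ id: "myLang" }); // "myLang" is a made-up ID
  monaco.languages.setMonarchTokensProvider("myLang", monarchConfig);
}
```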
The following token classes are built into Monarch:
```
identifier         entity           constructor
operators          tag              namespace
keyword            info-token       type
string             warn-token       predefined
string.escape      error-token      invalid
comment            debug-token      comment.doc
regexp             constant         attribute
delimiter .[curly,square,parenthesis,angle,array,bracket]
number    .[hex,octal,binary,float]
variable  .[name,value]
meta      .[content]
```
As you can see from the highlight above, the uppercase TEST is not recognized. In this case, we can refine the regular expression that matches strings.
```js
{
  tokenizer: {
    root: [
      [/\d+/, { token: "keyword" }],
      [/[a-zA-Z]+/, { token: "string" }]
    ]
  }
}
```
If our language is case-insensitive, we can simply add an ignoreCase attribute.
```js
{
  ignoreCase: true,
  tokenizer: {
    root: [
      [/\d+/, { token: "keyword" }],
      [/[a-z]+/, { token: "string" }]
    ]
  }
}
```
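The effect of ignoreCase corresponds to compiling each rule's regular expression case-insensitively. We can mimic that in plain JavaScript with the RegExp "i" flag to check that /[a-z]+/ then also matches "TEST" (an illustration only, not Monaco's implementation):

```js
// The rule's pattern as written, and a case-insensitive variant of it.
const caseSensitive = /^[a-z]+/;
const caseInsensitive = new RegExp(caseSensitive.source, "i");

const matchesLower = caseSensitive.test("test");           // true
const matchesUpperNoFlag = caseSensitive.test("TEST");     // false
const matchesUpperWithFlag = caseInsensitive.test("TEST"); // true
```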
Let’s look at the highlights again:
Monarch: creating declarative highlighters in JSON
Now that we've seen the general configuration, let's look at Monarch's usage in more detail. The library lets you specify efficient syntax highlighters using JSON values. The specification is expressive enough to cover highlighting with complex state transitions, dynamic bracket matching, auto-completion, embedded languages, and more.
Creating a language definition
A language definition is basically just a JSON value describing various attributes of the language. Some of the default attributes are:
ignoreCase: Is the language case insensitive? The default is false.
defaultToken: The token returned when nothing matches in the tokenizer.
brackets: Defines the bracket pairs the tokenizer uses for matching. Each bracket definition is an array of three elements describing the open bracket, the close bracket, and the token class. The default definition is:
```js
[ ['{', '}', 'delimiter.curly'],
  ['[', ']', 'delimiter.square'],
  ['(', ')', 'delimiter.parenthesis'],
  ['<', '>', 'delimiter.angle'] ]
```
tokenizer: Mandatory; this defines the tokenization rules.
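Putting these attributes together, a minimal definition might look like this (a sketch; the bracket token classes simply repeat the defaults quoted above, and the two tokenizer rules are illustrative):

```js
// Sketch of a language definition combining the attributes above.
const languageDef = {
  ignoreCase: false,       // case sensitive (the default)
  defaultToken: "invalid", // unmatched text is flagged as invalid
  brackets: [
    ["{", "}", "delimiter.curly"],
    ["[", "]", "delimiter.square"],
    ["(", ")", "delimiter.parenthesis"],
    ["<", ">", "delimiter.angle"],
  ],
  tokenizer: {
    root: [
      [/[{}()\[\]<>]/, "@brackets"], // classify via the brackets table
      [/\d+/, "number"],
    ],
  },
};
```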
Creating a tokenizer
The tokenizer property describes how lexical analysis is performed and how the input is divided into tokens. Each token is given a CSS class name used to render it in the editor. The standard CSS token classes include:
```
identifier         entity           constructor
operators          tag              namespace
keyword            info-token       type
string             warn-token       predefined
string.escape      error-token      invalid
comment            debug-token      comment.doc
regexp             constant         attribute
delimiter .[curly,square,parenthesis,angle,array,bracket]
number    .[hex,octal,binary,float]
variable  .[name,value]
meta      .[content]
```
States
A tokenizer consists of an object that defines states. The initial state of the tokenizer is the first state defined. When the tokenizer is in a particular state, only the rules of that state are applied. Rules are matched in order, and as soon as the first one matches, its action determines the token class; no further rules are tried. It is therefore important to order rules efficiently, for example matching whitespace and identifiers first.
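The first-match-wins behaviour described above can be sketched with a tiny hand-rolled matcher (an illustration of the rule order, not Monaco's actual implementation):

```js
// Minimal sketch of how a Monarch-like state applies its rules in order:
// the first regex that matches at the current position wins.
const rootRules = [
  [/^\d+/, "keyword"],   // numbers
  [/^[a-z]+/, "string"], // lowercase words
];

function firstToken(input) {
  for (const [regex, token] of rootRules) {
    const m = regex.exec(input);
    if (m) return { text: m[0], token }; // no further rules are tried
  }
  return null; // real Monarch would fall back to defaultToken
}

// firstToken("123abc") → { text: "123", token: "keyword" }
// firstToken("abc123") → { text: "abc", token: "string" }
```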
Rules
Each state is defined as an array of rules that match the input. Rules can take the following forms:

```js
[regex, action]          // shorthand for { regex: regex, action: action }
[regex, action, next]    // shorthand for { regex: regex, action: { ...action, next: next } }
{ regex: regex, action: action }
{ include: state }
```
When the regex matches the current input, the token class set by action is applied to that input. The regex can be a regular expression (using /…/) or a string representing one. If it starts with ^, the expression matches only at the beginning of a source line. Note that the tokenizer is not called once the end of the line is reached, so the empty pattern /$/ will never match. include exists to organize rules better: it pulls in all the rules of the referenced state.
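As a sketch of include, a shared whitespace state can be pulled into root like this (the state name and rules here are illustrative):

```js
// include lets several states reuse one set of rules.
const def = {
  tokenizer: {
    root: [
      { include: "@whitespace" }, // reuse the rules defined below
      [/\d+/, "number"],
    ],
    whitespace: [
      [/[ \t\r\n]+/, "white"],
    ],
  },
};
```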
Actions
An action determines the resulting token class. It can have the following forms:

```js
string                   // shorthand for { token: string }
[action1, ..., actionN]  // an array of actions, applied to grouped regex matches
{ token: tokenclass }
```

The token class can also be '@brackets' or '@brackets.tokenclass', in which case the brackets attribute determines the token class and the editor matches the bracket pairs.
An action object can contain more fields that affect the lexer state. The following attributes are recognized:
next: state (string). If defined, pushes the current state onto the tokenizer stack and makes state the current state. For example, this can be used to mark the start of a block comment:
```js
['/\\*', 'comment', '@comment']
```
Please note that this is shorthand for the following:
```js
{ regex: '/\\*', action: { token: 'comment', next: '@comment' } }
```
There are some special states available for the next property:
@pop: Pops the tokenizer stack and returns to the previous state. For example, this is used to return from a block comment after seeing its end marker:
```js
['\\*/', 'comment', '@pop']
```
@push: Pushes the current state and continues in the current state. This handles nested block comments: when we see a comment start marker while already in the @comment state, we can do the following:
```js
['/\\*', 'comment', '@push']
```
@popall: Pops everything off the tokenizer stack and returns to the initial state. It can be used during error recovery to "jump back" from a deeply nested level to the top state.
log: message. Used for debugging; logs the message to the console window of your browser (press F12 to view).
```js
[/\d+/, { token: 'number', log: 'found number $0 in state $S0' }]
```
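Taken together, the special states support a complete nested block-comment tokenizer, along these lines (a sketch of the state wiring; the root rules are illustrative):

```js
// Sketch: nested block comments using @push / @pop.
const commentDef = {
  tokenizer: {
    root: [
      [/\/\*/, "comment", "@comment"], // enter the comment state
      [/./, "text"],
    ],
    comment: [
      [/[^\/*]+/, "comment"],          // comment body
      [/\/\*/, "comment", "@push"],    // nested comment: push @comment again
      [/\*\//, "comment", "@pop"],     // close: return to the previous state
      [/[\/*]/, "comment"],            // lone / or *
    ],
  },
};
```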
cases
{ cases: { guard1: action1, …, guardN: actionN } } The last kind of action object is the cases statement. A cases object contains an object in which each field serves as a guard. Each guard is applied to the matched input, and as soon as one matches, the corresponding action is applied. Note that since these are actions themselves, cases can be nested. Cases are used for efficiency: for example, we match identifiers and then test whether the identifier is a keyword or a built-in function:
```js
[/[a-z_\$][a-zA-Z0-9_\$]*/,
  { cases: { '@typeKeywords': 'keyword.type',
             '@keywords': 'keyword',
             '@default': 'identifier' }
  }
]
```
A guard may be:
- @keywords: The attribute keywords must be defined in advance in the language object as an array of strings; the guard succeeds if the input matches any of those strings.
- @default (or @ or ''): Always succeeds.
- @eos: Succeeds if the matched input has reached the end of the line.
- A regex: Anything not beginning with an @ (or $) character is interpreted as a regular expression tested against the matched input. This can be used to test for specific inputs; here is an example from the Koka language:
```js
[/[a-z](\w|\-[a-zA-Z])*/,
  { cases: { '@keywords': {
               cases: { 'alias': { token: 'keyword', next: '@alias-type' },
                        'struct': { token: 'keyword', next: '@struct-type' },
                        'type|cotype|rectype': { token: 'keyword', next: '@type' },
                        'module|as|import': { token: 'keyword', next: '@module' },
                        '@default': 'keyword' }
             },
             '@builtins': 'predefined',
             '@default': 'identifier' }
  }
]
```
Note that you can use nested cases to improve efficiency. In addition, the library recognizes the simple regular expressions described above and compiles them efficiently.
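The guard evaluation order can be mimicked in plain code (a sketch of the semantics, not Monaco's implementation; the keyword lists are made up):

```js
// Sketch of how cases guards are tried in order for a matched word.
const keywords = ["if", "else", "while"];
const typeKeywords = ["int", "bool"];

function classify(word) {
  if (typeKeywords.includes(word)) return "keyword.type"; // '@typeKeywords'
  if (keywords.includes(word)) return "keyword";          // '@keywords'
  return "identifier";                                    // '@default'
}

// classify("int") → "keyword.type"
// classify("if")  → "keyword"
// classify("foo") → "identifier"
```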
Now that you have a basic understanding of how to configure custom language highlighting, let’s take a look at the official example
Monaco Official Example
```js
{
  // For content that has not been matched by any token rule, set defaultToken to invalid
  defaultToken: 'invalid',

  // Keyword definitions
  keywords: [
    'abstract', 'continue', 'for', 'new', 'switch', 'assert', 'goto', 'do',
    'if', 'private', 'this', 'break', 'protected', 'throw', 'else', 'public',
    'enum', 'return', 'catch', 'try', 'interface', 'static', 'class',
    'finally', 'const', 'super', 'while', 'true', 'false'
  ],

  // Type definitions
  typeKeywords: [
    'boolean', 'double', 'byte', 'int', 'short', 'char', 'void', 'long', 'float'
  ],

  // Operator definitions
  operators: [
    '=', '>', '<', '!', '~', '?', ':', '==', '<=', '>=', '!=',
    '&&', '||', '++', '--', '+', '-', '*', '/', '&', '|', '^', '%',
    '<<', '>>', '>>>', '+=', '-=', '*=', '/=', '&=', '|=', '^=',
    '%=', '<<=', '>>=', '>>>='
  ],

  // Common regular expressions
  symbols: /[=><!~?:&|+\-*\/\^%]+/,

  // C# style string escapes
  escapes: /\\(?:[abfnrtv\\"']|x[0-9A-Fa-f]{1,4}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})/,

  // The main tokenizer for the language
  tokenizer: {
    root: [
      // Identifiers and keywords
      [/[a-z_$][\w$]*/, { cases: { '@typeKeywords': 'keyword',
                                   '@keywords': 'keyword',
                                   '@default': 'identifier' } }],
      [/[A-Z][\w\$]*/, 'type.identifier'],  // to show class names nicely

      // Whitespace
      { include: '@whitespace' },

      // Brackets and operators
      [/[{}()\[\]]/, '@brackets'],
      [/[<>](?!@symbols)/, '@brackets'],
      [/@symbols/, { cases: { '@operators': 'operator',
                              '@default': '' } }],

      // @ annotations; as an example, we emit a debug log message on these tokens
      [/@\s*[a-zA-Z_\$][\w\$]*/, { token: 'annotation', log: 'annotation token: $0' }],

      // Various number definitions
      [/\d*\.\d+([eE][\-+]?\d+)?/, 'number.float'],
      [/0[xX][0-9a-fA-F]+/, 'number.hex'],
      [/\d+/, 'number'],

      // Delimiters
      [/[;,.]/, 'delimiter'],

      // String definitions
      [/"([^"\\]|\\.)*$/, 'string.invalid'],  // non-terminated string
      [/"/, { token: 'string.quote', bracket: '@open', next: '@string' }],

      // Characters
      [/'[^\\']'/, 'string'],
      [/(')(@escapes)(')/, ['string', 'string.escape', 'string']],
      [/'/, 'string.invalid']
    ],

    // Custom rules - comments
    comment: [
      [/[^\/*]+/, 'comment'],
      [/\/\*/, 'comment', '@push'],  // nested comment
      ["\\*/", 'comment', '@pop'],
      [/[\/*]/, 'comment']
    ],

    // Custom rules - strings
    string: [
      [/[^\\"]+/, 'string'],
      [/@escapes/, 'string.escape'],
      [/\\./, 'string.escape.invalid'],
      [/"/, { token: 'string.quote', bracket: '@close', next: '@pop' }]
    ],

    // Custom rules - whitespace
    whitespace: [
      [/[ \t\r\n]+/, 'white'],
      [/\/\*/, 'comment', '@comment'],
      [/\/\/.*$/, 'comment']
    ]
  }
}
```
With the above in hand, we are now able to configure highlighting for a custom language.
References
Monarch playground: microsoft.github.io/monaco-edit…