Preface
This article is a summary of the chapters on understanding TypeScript compilation.
The compiler
The TypeScript compiler is divided into five key parts:
- Scanner (scanner.ts)
- Parser (parser.ts)
- Binder (binder.ts)
- Checker (checker.ts)
- Emitter (emitter.ts)
The compiler code for each part lives in src/compiler, and this article walks through each of them in detail. Before we begin, here is a diagram I found on the web that helps show how the compiler fits the key pieces together.
From the figure above, we can see that the compiler has three main lines:
- Source code -> Scanner -> Token Stream -> Parser -> AST -> Binder -> Symbols
- AST + Symbols -> Checker -> Type checking
- AST + Checker -> Emitter -> JS code
I'll start by explaining how each part works, and I'll finish with an overview of each route.
The scanner
The source code for the TypeScript scanner is in scanner.ts. From the flowchart above, the scanner's job is to turn source code into a token stream. Let's go straight to the createScanner function in scanner.ts and read it step by step. I've cut out some of the code so the overall flow is easier to follow.
export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean, languageVariant = LanguageVariant.Standard, text?: string, onError?: ErrorCallback, start?: number, length?: number): Scanner {
    let pos: number;
    let end: number;
    let startPos: number;
    let tokenPos: number;
    let token: SyntaxKind;
    let tokenValue: string;
    setText(text, start, length);
    // ...
    return {
        getStartPos: () => startPos,
        getTextPos: () => pos,
        getToken: () => token,
        getTokenPos: () => tokenPos,
        getTokenText: () => text.substring(tokenPos, pos),
        getTokenValue: () => tokenValue,
        // ...
        scan,
        // ...
    };
}
After creating the scanner with createScanner, we need to scan the source code, which corresponds to the scan function. createScanner itself only defines functions and runs no real logic, so let's look at scan next.
function scan(): SyntaxKind {
    startPos = pos;
    hasExtendedUnicodeEscape = false;
    precedingLineBreak = false;
    tokenIsUnterminated = false;
    numericLiteralFlags = 0;
    while (true) {
        tokenPos = pos;
        if (pos >= end) {
            return token = SyntaxKind.EndOfFileToken;
        }
        let ch = text.charCodeAt(pos);
        // Special handling for shebang
        if (ch === CharacterCodes.hash && pos === 0 && isShebangTrivia(text, pos)) {
            pos = scanShebangTrivia(text, pos);
            if (skipTrivia) {
                continue;
            }
            else {
                return token = SyntaxKind.ShebangTrivia;
            }
        }
        switch (ch) {
            case CharacterCodes.lineFeed:
            case CharacterCodes.carriageReturn:
                precedingLineBreak = true;
                if (skipTrivia) {
                    pos++;
                    continue;
                }
                else {
                    if (ch === CharacterCodes.carriageReturn && pos + 1 < end && text.charCodeAt(pos + 1) === CharacterCodes.lineFeed) {
                        // consume both CR and LF
                        pos += 2;
                    }
                    else {
                        pos++;
                    }
                    return token = SyntaxKind.NewLineTrivia;
                }
            case CharacterCodes.tab:
            // ...
The scan function returns a value of type SyntaxKind. A comment in the source, `token > SyntaxKind.Identifier => token is a keyword`, shows that tokens above Identifier are keywords. SyntaxKind also defines the various keywords such as return, super, switch… Let's treat it as an enumeration of lexical token kinds for now.
// token > SyntaxKind.Identifier => token is a keyword
// Also, If you add a new SyntaxKind be sure to keep the `Markers` section at the bottom in sync
export const enum SyntaxKind {
Unknown,
EndOfFileToken,
SingleLineCommentTrivia,
MultiLineCommentTrivia,
NewLineTrivia,
WhitespaceTrivia,
// We detect and preserve #! on the first line
ShebangTrivia,
// We detect and provide better error recovery when we encounter a git merge marker. This
// allows us to edit files with git-conflict markers in them in a much more pleasant manner.
ConflictMarkerTrivia,
// Literals
NumericLiteral,
StringLiteral,
JsxText,
JsxTextAllWhiteSpaces,
RegularExpressionLiteral,
NoSubstitutionTemplateLiteral,
// Pseudo-literals
TemplateHead,
TemplateMiddle,
TemplateTail,
// Punctuation
OpenBraceToken,
// ...
ReturnKeyword,
SuperKeyword,
SwitchKeyword,
ThisKeyword,
ThrowKeyword,
TrueKeyword,
TryKeyword,
TypeOfKeyword,
VarKeyword,
VoidKeyword,
WhileKeyword,
WithKeyword,
// ...
}
Continuing through the logic inside the scan function, the line `let ch = text.charCodeAt(pos);` is the one to focus on: scanning works off each character's Unicode code point. So we can draw a simple conclusion: the scanner performs lexical analysis on the input source code and yields the corresponding SyntaxKind for each lexeme, i.e. a "token".
To verify this conclusion, we can create an example to test it simply:
Before starting the scan, we need to initialize some configuration, such as the string to scan and the target JS language version. We then create a scanner with createScanner and retrieve tokens by calling scan. As long as the token is not the end-of-file token, the scanner keeps scanning the input string.
import * as ts from 'ntypescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);

function initializeState(text: string) {
    scanner.setText(text);
    scanner.setScriptTarget(ts.ScriptTarget.ES5);
    scanner.setLanguageVariant(ts.LanguageVariant.Standard);
}

const str = 'const foo = 123;';
initializeState(str);
let token = scanner.scan();
while (token !== ts.SyntaxKind.EndOfFileToken) {
    console.log(token);
    console.log(ts.formatSyntaxKind(token));
    token = scanner.scan();
}
Run the code above:
76
ConstKeyword
71
Identifier
58
EqualsToken
8
NumericLiteral
25
SemicolonToken
Each part of `const foo = 123;` generates a token. For a more intuitive view of the scan result, we use formatSyntaxKind to print the enum name corresponding to each SyntaxKind value, giving:
- const -> ConstKeyword
- foo -> Identifier
- = -> EqualsToken
- 123 -> NumericLiteral
- ; -> SemicolonToken
Each token corresponds exactly to its enum name, which confirms our inference: the process is essentially lexical analysis. I've drawn a simple diagram summarizing the main steps:
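To make the conclusion concrete, here is a minimal hand-rolled scanner in the same spirit: it walks the input with charCodeAt and classifies runs of characters into token kinds. The kinds, helpers, and keyword list below are simplifications made up for illustration, not the real SyntaxKind machinery.

```typescript
type TokenKind = "Keyword" | "Identifier" | "EqualsToken" | "NumericLiteral" | "SemicolonToken" | "EndOfFileToken";

interface Token { kind: TokenKind; text: string; }

const KEYWORDS = new Set(["const", "var", "let", "return"]);

function tokenize(text: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  const isDigit = (c: number) => c >= 0x30 && c <= 0x39;                                  // 0-9
  const isLetter = (c: number) => (c >= 0x61 && c <= 0x7a) || (c >= 0x41 && c <= 0x5a);   // a-z A-Z
  while (pos < text.length) {
    const ch = text.charCodeAt(pos);
    if (ch === 0x20 /* space */) { pos++; continue; }                                     // skip trivia
    if (ch === 0x3d /* = */) { tokens.push({ kind: "EqualsToken", text: "=" }); pos++; continue; }
    if (ch === 0x3b /* ; */) { tokens.push({ kind: "SemicolonToken", text: ";" }); pos++; continue; }
    if (isDigit(ch)) {
      const start = pos;
      while (pos < text.length && isDigit(text.charCodeAt(pos))) pos++;
      tokens.push({ kind: "NumericLiteral", text: text.substring(start, pos) });
      continue;
    }
    if (isLetter(ch)) {
      const start = pos;
      while (pos < text.length && isLetter(text.charCodeAt(pos))) pos++;
      const word = text.substring(start, pos);
      tokens.push({ kind: KEYWORDS.has(word) ? "Keyword" : "Identifier", text: word });
      continue;
    }
    throw new Error(`Unexpected character at ${pos}`);
  }
  tokens.push({ kind: "EndOfFileToken", text: "" });
  return tokens;
}

console.log(tokenize("const foo = 123;").map(t => t.kind).join(" "));
// Keyword Identifier EqualsToken NumericLiteral SemicolonToken EndOfFileToken
```

The real scanner performs the same kind of dispatch, just over the full set of SyntaxKind values and with complete trivia handling.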
The parser
The token generated by the scanner in the first step provides the necessary conditions for the parser to generate the AST.
So in this section, we need to figure out how the tokens generated in the first step are converted to AST nodes. Let’s start with an example of generating an AST:
import * as ts from 'ntypescript';

function printAllChildren(node: ts.Node, depth = 0) {
    console.log(new Array(depth + 1).join('----'), ts.formatSyntaxKind(node.kind), node.pos, node.end);
    depth++;
    node.getChildren().forEach(c => printAllChildren(c, depth));
}

const sourceCode = `const foo = 123;`;
const sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true);
printAllChildren(sourceFile);
Run the above code to get:
SourceFile 0 16
---- SyntaxList 0 16
-------- VariableStatement 0 16
------------ VariableDeclarationList 0 15
---------------- ConstKeyword 0 5
---------------- SyntaxList 5 15
-------------------- VariableDeclaration 5 15
------------------------ Identifier 5 9
------------------------ EqualsToken 9 11
------------------------ NumericLiteral 11 15
------------ SemicolonToken 15 16
---- EndOfFileToken 16 16
For those of you who have read about ASTs before, you can see at a glance that we have actually printed out an AST. It contains two key pieces of information for each node: 1. the node's kind, and 2. the node's start and end positions. The kind corresponds to the formatSyntaxKind output, and the positions correspond to node.pos and node.end. If anything is unclear, you can cross-check each node against the printed output above.
We saw that generating the AST is really a call to the createSourceFile function, so let's start with createSourceFile in parser.ts:
export function createSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, setParentNodes = false, scriptKind?: ScriptKind): SourceFile {
    performance.mark("beforeParse");
    const result = Parser.parseSourceFile(fileName, sourceText, languageVersion, /*syntaxCursor*/ undefined, setParentNodes, scriptKind);
    performance.mark("afterParse");
    performance.measure("Parse", "beforeParse", "afterParse");
    return result;
}
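As an aside, the performance.mark/performance.measure API the compiler uses here is the standard Web Performance API, which Node exposes through the built-in perf_hooks module. A small sketch of the same pattern; the loop is a stand-in workload, and reading entries back with getEntriesByName assumes a reasonably recent Node (16+):

```typescript
import { performance } from "perf_hooks";

function parsePhase(): number {
  performance.mark("beforeParse");
  let n = 0;
  for (let i = 0; i < 1e6; i++) n += i;  // stand-in for the real parse work
  performance.mark("afterParse");
  performance.measure("Parse", "beforeParse", "afterParse");
  return n;
}

parsePhase();
// Read back the recorded measurement.
const entry = performance.getEntriesByName("Parse")[0];
console.log(entry.name, entry.duration >= 0);
```

This is exactly how the compiler brackets each phase (Parse, Bind, Check) so that the time spent in each can be reported.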
Two lines in createSourceFile stand out: performance.mark("beforeParse") and performance.mark("afterParse"). They mark the points before and after parsing, so the real parsing work must happen in between. Let's dig into Parser.parseSourceFile.
export function parseSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, syntaxCursor: IncrementalParser.SyntaxCursor, setParentNodes?: boolean, scriptKind?: ScriptKind): SourceFile {
    scriptKind = ensureScriptKind(fileName, scriptKind);
    initializeState(sourceText, languageVersion, syntaxCursor, scriptKind);
    const result = parseSourceFileWorker(fileName, languageVersion, setParentNodes, scriptKind);
    clearState();
    return result;
}
First it initializes state, which brings us back to the scanner from the previous section. To make sure we understand each step, let's check that initializeState really is doing the preparatory work for scanning.
function initializeState(_sourceText: string, languageVersion: ScriptTarget, _syntaxCursor: IncrementalParser.SyntaxCursor, scriptKind: ScriptKind) {
    // ...
    // Initialize and prime the scanner before parsing the source elements.
    scanner.setText(sourceText);
    scanner.setOnError(scanError);
    scanner.setScriptTarget(languageVersion);
    scanner.setLanguageVariant(getLanguageVariant(scriptKind));
}
Ok, we have verified that it is indeed preparing for the scan. So let’s move on to parseSourceFileWorker.
function parseSourceFileWorker(fileName: string, languageVersion: ScriptTarget, setParentNodes: boolean, scriptKind: ScriptKind): SourceFile {
    sourceFile = createSourceFile(fileName, languageVersion, scriptKind);
    sourceFile.flags = contextFlags;
    // Prime the scanner.
    nextToken();
    processReferenceComments(sourceFile);
    sourceFile.statements = parseList(ParsingContext.SourceElements, parseStatement);
    Debug.assert(token() === SyntaxKind.EndOfFileToken);
    sourceFile.endOfFileToken = addJSDocComment(parseTokenNode() as EndOfFileToken);
    setExternalModuleIndicator(sourceFile);
    sourceFile.nodeCount = nodeCount;
    sourceFile.identifierCount = identifierCount;
    sourceFile.identifiers = identifiers;
    sourceFile.parseDiagnostics = parseDiagnostics;
    if (setParentNodes) {
        fixupParentReferences(sourceFile);
    }
    return sourceFile;
}
Inside parseSourceFileWorker:

- createSourceFile creates the parse target (the SourceFile node) for us.
- nextToken() replaces the current token (currentToken) with the next token produced by the scanner.
- processReferenceComments scans the leading comments and records information for each range (including its start and end positions).
function processReferenceComments(sourceFile: SourceFile): void {
    const triviaScanner = createScanner(sourceFile.languageVersion, /*skipTrivia*/ false, LanguageVariant.Standard, sourceText);
    // ...
    while (true) {
        const kind = triviaScanner.scan();
        if (kind !== SyntaxKind.SingleLineCommentTrivia) {
            if (isTrivia(kind)) {
                continue;
            }
            else {
                break;
            }
        }
        const range = {
            kind: <SyntaxKind.SingleLineCommentTrivia | SyntaxKind.MultiLineCommentTrivia>triviaScanner.getToken(),
            pos: triviaScanner.getTokenPos(),
            end: triviaScanner.getTextPos(),
        };
        const comment = sourceText.substring(range.pos, range.end);
        const referencePathMatchResult = getFileReferenceFromReferencePath(comment, range);
        if (referencePathMatchResult) {
            // ...
        }
        else {
            const amdModuleNameRegEx = /^\/\/\/\s*<amd-module\s+name\s*=\s*('|")(.+?)\1/gim;
            const amdModuleNameMatchResult = amdModuleNameRegEx.exec(comment);
            if (amdModuleNameMatchResult) {
                if (amdModuleName) {
                    parseDiagnostics.push(createFileDiagnostic(sourceFile, range.pos, range.end - range.pos, Diagnostics.An_AMD_module_cannot_have_multiple_name_assignments));
                }
                amdModuleName = amdModuleNameMatchResult[2];
            }
            const amdDependencyRegEx = /^\/\/\/\s*<amd-dependency\s/gim;
            const pathRegex = /\spath\s*=\s*('|")(.+?)\1/gim;
            const nameRegex = /\sname\s*=\s*('|")(.+?)\1/gim;
            const amdDependencyMatchResult = amdDependencyRegEx.exec(comment);
            if (amdDependencyMatchResult) {
                const pathMatchResult = pathRegex.exec(comment);
                const nameMatchResult = nameRegex.exec(comment);
                if (pathMatchResult) {
                    const amdDependency = { path: pathMatchResult[2], name: nameMatchResult ? nameMatchResult[2] : undefined };
                    amdDependencies.push(amdDependency);
                }
            }
            const checkJsDirectiveRegEx = /^\/\/\/?\s*(@ts-check|@ts-nocheck)\s*$/gim;
            const checkJsDirectiveMatchResult = checkJsDirectiveRegEx.exec(comment);
            if (checkJsDirectiveMatchResult) {
                checkJsDirective = {
                    enabled: compareStrings(checkJsDirectiveMatchResult[1], "@ts-check", /*ignoreCase*/ true) === Comparison.EqualTo,
                    end: range.end,
                    pos: range.pos
                };
            }
        }
    }
    sourceFile.referencedFiles = referencedFiles;
    sourceFile.typeReferenceDirectives = typeReferenceDirectives;
    sourceFile.amdDependencies = amdDependencies;
    sourceFile.moduleName = amdModuleName;
    sourceFile.checkJsDirective = checkJsDirective;
}
- parseList: following the returned result, we find it is ultimately produced by parseListElement, so let's keep going.
function parseList<T extends Node>(kind: ParsingContext, parseElement: () => T): NodeArray<T> {
    const saveParsingContext = parsingContext;
    parsingContext |= 1 << kind;
    const result = createNodeArray<T>();
    while (!isListTerminator(kind)) {
        if (isListElement(kind, /*inErrorRecovery*/ false)) {
            const element = parseListElement(kind, parseElement);
            result.push(element);
            continue;
        }
        if (abortParsingListOrMoveToNextToken(kind)) {
            break;
        }
    }
    result.end = getNodeEnd();
    parsingContext = saveParsingContext;
    return result;
}
- parseListElement: following it to the end, we find the final result comes from the parseElement callback that was passed in.
function parseListElement<T extends Node>(parsingContext: ParsingContext, parseElement: () => T): T {
    const node = currentNode(parsingContext);
    if (node) {
        return <T>consumeNode(node);
    }
    return parseElement();
}
- parseElement: in our walk-through this is parseStatement, which parseSourceFileWorker passed in.
function parseStatement(): Statement {
    switch (token()) {
        case SyntaxKind.SemicolonToken:
            return parseEmptyStatement();
        case SyntaxKind.OpenBraceToken:
            return parseBlock(/*ignoreMissingOpenBrace*/ false);
        case SyntaxKind.VarKeyword:
            return parseVariableStatement(scanner.getStartPos(), /*decorators*/ undefined, /*modifiers*/ undefined);
        // ...
    }
    return parseExpressionOrLabeledStatement();
}
We seem to be getting there! parseStatement switches on the current token and produces a different node for each kind of token. Take the trailing `;` from our example: its token is SyntaxKind.SemicolonToken, which matches the first case. So let's look at what parseEmptyStatement actually does.
- parseEmptyStatement
function parseEmptyStatement(): Statement {
    const node = <Statement>createNode(SyntaxKind.EmptyStatement);
    parseExpected(SyntaxKind.SemicolonToken);
    return finishNode(node);
}
We can observe that the createNode function is where the node is actually created for us.
function createNode<TKind extends SyntaxKind>(kind: TKind, pos?: number): Node | Token<TKind> | Identifier {
    nodeCount++;
    if (!(pos >= 0)) {
        pos = scanner.getStartPos();
    }
    return isNodeKind(kind) ? new NodeConstructor(kind, pos, pos) :
        kind === SyntaxKind.Identifier ? new IdentifierConstructor(kind, pos, pos) :
            new TokenConstructor(kind, pos, pos);
}
createNode is responsible for creating the node: it sets the SyntaxKind passed in and the start position (by default, the position from the current scanner state). parseExpected then checks whether the current token in the parser state matches the specified SyntaxKind, and reports an error if it does not.
function parseExpected(kind: SyntaxKind, diagnosticMessage?: DiagnosticMessage, shouldAdvance = true): boolean {
    if (token() === kind) {
        if (shouldAdvance) {
            nextToken();
        }
        return true;
    }
    // Report specific message if provided with one. Otherwise, report generic fallback message.
    if (diagnosticMessage) {
        parseErrorAtCurrentToken(diagnosticMessage);
    }
    else {
        parseErrorAtCurrentToken(Diagnostics._0_expected, tokenToString(kind));
    }
    return false;
}
The final step, finishNode, sets the node's end position. It also applies the current contextFlags and records whether any parse error occurred before the node finished (if so, the AST node cannot be reused in incremental parsing).
function finishNode<T extends Node>(node: T, end?: number): T {
    node.end = end === undefined ? scanner.getStartPos() : end;
    if (contextFlags) {
        node.flags |= contextFlags;
    }
    // Keep track on the node if we encountered an error while parsing it. If we did, then
    // we cannot reuse the node incrementally. Once we've marked this node, clear out the
    // flag so that we don't mark any subsequent nodes.
    if (parseErrorBeforeNextFinishedNode) {
        parseErrorBeforeNextFinishedNode = false;
        node.flags |= NodeFlags.ThisNodeHasError;
    }
    return node;
}
We have now walked through the whole parser flow. Following our usual routine, let's draw a simple flow chart to replay the process:
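The choreography above (createNode records the start, parseExpected consumes a matching token or reports an error, finishNode records the end) can also be replayed in miniature with a toy recursive-descent routine. All token shapes and names below are simplified stand-ins, not the real parser API:

```typescript
interface Tok { kind: string; text: string; pos: number; end: number; }
interface AstNode { kind: string; pos: number; end: number; name?: string; initializer?: string; }

function parseVariableStatement(tokens: Tok[]): AstNode {
  let i = 0;
  const token = () => tokens[i];
  // parseExpected: advance only when the current token matches, otherwise error out
  const parseExpected = (kind: string): Tok => {
    if (token().kind !== kind) throw new Error(`${kind} expected at ${token().pos}`);
    return tokens[i++];
  };
  const constKw = parseExpected("ConstKeyword");  // createNode would record the start here
  const name = parseExpected("Identifier");
  parseExpected("EqualsToken");
  const init = parseExpected("NumericLiteral");
  const semi = parseExpected("SemicolonToken");   // finishNode would record the end here
  return { kind: "VariableStatement", pos: constKw.pos, end: semi.end, name: name.text, initializer: init.text };
}

// Tokens for `const foo = 123;` with their offsets:
const toks: Tok[] = [
  { kind: "ConstKeyword", text: "const", pos: 0, end: 5 },
  { kind: "Identifier", text: "foo", pos: 6, end: 9 },
  { kind: "EqualsToken", text: "=", pos: 10, end: 11 },
  { kind: "NumericLiteral", text: "123", pos: 12, end: 15 },
  { kind: "SemicolonToken", text: ";", pos: 15, end: 16 },
];
console.log(parseVariableStatement(toks)); // node spanning pos 0 to end 16
```

Note how the resulting node's pos/end span exactly matches the VariableStatement 0 16 we saw in the printed AST earlier.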
The binder
Most transpilers are simpler than TypeScript because they provide few means of code analysis. A typical JavaScript converter has only the following flow:
Source code -> Scanner -> Tokens -> Parser -> AST -> Emitter -> JavaScript
While this architecture does help simplify the picture of how TypeScript generates JavaScript, one key feature is missing: TypeScript's semantic system. To support type checking, the binder connects the pieces of the source code into a coherent type system that the checker can then use. The main responsibility of the binder is to create Symbols.
Symbols
A symbol connects the declaration nodes in the AST that refer to the same entity. Symbols are the basic building blocks of the semantic system. So what does a symbol look like?
function Symbol(flags: SymbolFlags, name: string) {
this.flags = flags;
this.name = name;
this.declarations = undefined;
}
The flags enumeration identifies the symbol's category (variable, class, interface, and so on). For details, see the SymbolFlags enum in compiler/types.ts.
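SymbolFlags works as a bit mask: each category is a distinct power of two, so one symbol can carry several categories at once and membership is a bitwise & test. A sketch with made-up flag names:

```typescript
// Illustrative flags only; the real SymbolFlags has many more members and different values.
enum Flags {
  None = 0,
  BlockScopedVariable = 1 << 0,
  Function            = 1 << 1,
  Class               = 1 << 2,
  Interface           = 1 << 3,
}

let flags: number = Flags.None;
flags |= Flags.BlockScopedVariable;  // the binder ORs flags in, e.g. target.flags |= source.flags
flags |= Flags.Interface;

console.log((flags & Flags.BlockScopedVariable) !== 0); // true
console.log((flags & Flags.Class) !== 0);               // false
```

This is why a single symbol can simultaneously be, say, a class and an interface (declaration merging), and why exclusion checks can be expressed as masks.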
Create symbols and bind nodes
First we go to bindSourceFile in binder.ts. Once again, the source brackets the operation with performance.mark() calls ("beforeBind" and "afterBind"), just as it did around parsing. Let's look at the binder function between them.
- bindSourceFile
export function bindSourceFile(file: SourceFile, options: CompilerOptions) {
    performance.mark("beforeBind");
    binder(file, options);
    performance.mark("afterBind");
    performance.measure("Bind", "beforeBind", "afterBind");
}
- binder
const binder = createBinder();
- createBinder
function createBinder(): (file: SourceFile, options: CompilerOptions) => void {
    function bindSourceFile(f: SourceFile, opts: CompilerOptions) {
        file = f;
        options = opts;
        languageVersion = getEmitScriptTarget(options);
        inStrictMode = bindInStrictMode(file, opts);
        classifiableNames = createMap<string>();
        symbolCount = 0;
        skipTransformFlagAggregation = file.isDeclarationFile;
        Symbol = objectAllocator.getSymbolConstructor();
        if (!file.locals) {
            bind(file);
            file.symbolCount = symbolCount;
            file.classifiableNames = classifiableNames;
        }
        // ...
    }
    return bindSourceFile;
    // ...
}
The createBinder function body is long, so I've cut it down to the most important part. The actual binding still happens in the bind function: bindSourceFile checks whether file.locals is already defined and, if not, hands the file over to bind. Since locals is not defined on the first pass, we follow the logic into bind.
- bind
function bind(node: Node): void {
if (!node) {
return;
}
node.parent = parent;
const saveInStrictMode = inStrictMode;
// Even though in the AST the jsdoc @typedef node belongs to the current node,
// its symbol might be in the same scope with the current node's symbol. Consider:
//
// /** @typedef {string | number} MyType */
// function foo();
//
// Here the current node is "foo", which is a container, but the scope of "MyType" should
// not be inside "foo". Therefore we always bind @typedef before bind the parent node,
// and skip binding this tag later when binding all the other jsdoc tags.
if (isInJavaScriptFile(node)) bindJSDocTypedefTagIfAny(node);
// First we bind declaration nodes to a symbol if possible. We'll both create a symbol
// and then potentially add the symbol to an appropriate symbol table. Possible
// destination symbol tables are:
//
// 1) The 'exports' table of the current container's symbol.
// 2) The 'members' table of the current container's symbol.
// 3) The 'locals' table of the current container.
//
// However, not all symbols will end up in any of these tables. 'Anonymous' symbols
// (like TypeLiterals for example) will not be put in any table.
bindWorker(node);
// Then we recurse into the children of the node to bind them as well. For certain
// symbols we do specialized work when we recurse. For example, we'll keep track of
// the current 'container' node when it changes. This helps us know which symbol table
// a local should go into for example. Since terminal nodes are known not to have
// children, as an optimization we don't process those.
if (node.kind > SyntaxKind.LastToken) {
const saveParent = parent;
parent = node;
const containerFlags = getContainerFlags(node);
if (containerFlags === ContainerFlags.None) {
bindChildren(node);
}
else {
bindContainer(node, containerFlags);
}
parent = saveParent;
}
else if (!skipTransformFlagAggregation && (node.transformFlags & TransformFlags.HasComputedFlags) === 0) {
subtreeTransformFlags |= computeTransformFlagsForNode(node, 0);
}
inStrictMode = saveInStrictMode;
}
Experience reading source code says that a function with this many comments must be important. First, bind attaches a parent to the current node; then bindWorker dispatches to the appropriate binding function for the node; finally bindChildren binds each child of the current node, recursively calling bind on each one. As for bindContainer, the comments tell us it does the analogous binding for special container nodes, whose symbols land in tables such as exports, members, and locals.
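The shape of bind (set node.parent, do per-node work, then recurse into the children) can be reproduced on a toy tree. The node shape here is my own simplification, not the compiler's Node type:

```typescript
interface TNode { kind: string; children: TNode[]; parent?: TNode; }

function bind(node: TNode, parent?: TNode): void {
  node.parent = parent;    // same as the real bind's `node.parent = parent;`
  // bindWorker(node) would do per-kind symbol work here
  for (const child of node.children) {
    bind(child, node);     // bindChildren: recurse with the current node as parent
  }
}

const root: TNode = {
  kind: "SourceFile",
  children: [{ kind: "VariableStatement", children: [{ kind: "Identifier", children: [] }] }],
};
bind(root);
console.log(root.children[0].children[0].parent!.kind); // VariableStatement
```

After this walk, every node can reach its enclosing scope by following parent pointers, which is exactly what the container/locals bookkeeping in the real binder relies on.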
- bindWorker
function bindWorker(node: Node) {
switch (node.kind) {
case SyntaxKind.Identifier:
if ((<Identifier>node).isInJSDocNamespace) {
let parentNode = node.parent;
while (parentNode && parentNode.kind !== SyntaxKind.JSDocTypedefTag) {
parentNode = parentNode.parent;
}
bindBlockScopedDeclaration(<Declaration>parentNode, SymbolFlags.TypeAlias, SymbolFlags.TypeAliasExcludes);
break;
}
case SyntaxKind.ThisKeyword:
if (currentFlow && (isExpression(node) || parent.kind === SyntaxKind.ShorthandPropertyAssignment)) {
node.flowNode = currentFlow;
}
return checkStrictModeIdentifier(<Identifier>node);
// ...
}
Copy the code
I've truncated part of the code, because what bindWorker does is dispatch on node.kind (a SyntaxKind) and delegate to the corresponding bindXXX function for the actual binding. Taking Identifier as our example: what does bindBlockScopedDeclaration do?
- bindBlockScopedDeclaration
function bindBlockScopedDeclaration(node: Declaration, symbolFlags: SymbolFlags, symbolExcludes: SymbolFlags) {
switch (blockScopeContainer.kind) {
case SyntaxKind.ModuleDeclaration:
declareModuleMember(node, symbolFlags, symbolExcludes);
break;
case SyntaxKind.SourceFile:
if (isExternalModule(<SourceFile>container)) {
declareModuleMember(node, symbolFlags, symbolExcludes);
break;
}
// falls through
default:
if (!blockScopeContainer.locals) {
blockScopeContainer.locals = createMap<Symbol>();
addToContainerChain(blockScopeContainer);
}
declareSymbol(blockScopeContainer.locals, /*parent*/ undefined, node, symbolFlags, symbolExcludes);
}
}
Whether through declareModuleMember or the default branch, both paths end up calling declareSymbol, so declareSymbol is where we look next.
- declareSymbol
function declareSymbol(symbolTable: SymbolTable, parent: Symbol, node: Declaration, includes: SymbolFlags, excludes: SymbolFlags): Symbol {
Debug.assert(!hasDynamicName(node));
const isDefaultExport = hasModifier(node, ModifierFlags.Default);
// The exported symbol for an export default function/class node is always named "default"
const name = isDefaultExport && parent ? "default" : getDeclarationName(node);
let symbol: Symbol;
if (name === undefined) {
symbol = createSymbol(SymbolFlags.None, "__missing");
}
else {
symbol = symbolTable.get(name);
addDeclarationToSymbol(symbol, node, includes);
symbol.parent = parent;
// ..
return symbol;
}
declareSymbol relies on two key functions: 1. createSymbol 2. addDeclarationToSymbol. Let's look at each.
- createSymbol
function createSymbol(flags: SymbolFlags, name: string): Symbol {
symbolCount++;
return new Symbol(flags, name);
}
createSymbol simply increments symbolCount (a local variable of bindSourceFile) and constructs a Symbol with the given parameters. Once the symbol is created, the node needs to be bound to it.
- addDeclarationToSymbol
function addDeclarationToSymbol(symbol: Symbol, node: Declaration, symbolFlags: SymbolFlags) {
symbol.flags |= symbolFlags;
node.symbol = symbol;
if (!symbol.declarations) {
symbol.declarations = [];
}
symbol.declarations.push(node);
// ...
}
The addDeclarationToSymbol function does two things: 1. creates a link between the AST node and the symbol (node.symbol = symbol) 2. adds the declaration to the symbol (symbol.declarations.push(node)).
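The two steps can be sketched with plain objects and a Map standing in for the SymbolTable; the shapes and flag values are simplified stand-ins, not the binder's real code:

```typescript
interface Decl { name: string; symbol?: Sym; }
interface Sym { flags: number; name: string; declarations: Decl[]; }

let symbolCount = 0;

function createSymbol(flags: number, name: string): Sym {
  symbolCount++;
  return { flags, name, declarations: [] };
}

function addDeclarationToSymbol(symbol: Sym, node: Decl, flags: number): void {
  symbol.flags |= flags;
  node.symbol = symbol;            // 1) link the AST node to the symbol
  symbol.declarations.push(node);  // 2) record the declaration on the symbol
}

function declareSymbol(table: Map<string, Sym>, node: Decl, flags: number): Sym {
  let symbol = table.get(node.name);
  if (!symbol) {
    symbol = createSymbol(0, node.name);
    table.set(node.name, symbol);
  }
  addDeclarationToSymbol(symbol, node, flags);
  return symbol;
}

const locals = new Map<string, Sym>();
const fooDecl: Decl = { name: "foo" };
const sym = declareSymbol(locals, fooDecl, 1);
console.log(sym === fooDecl.symbol, sym.declarations.length); // true 1
```

Declaring a second node with the same name would reuse the existing symbol and simply append another entry to declarations, which is the mechanism behind declaration merging.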
Now that the most important work of the binder is done, let’s draw a simple flow chart to walk through the symbol creation process:
With that, the first route we described is complete:
Source code -> Scanner -> Token Stream -> Parser -> AST -> Binder -> Symbols
Now let's look at the remaining two routes: type checking and code emitting.
The checker
How the program uses the checker
Before starting on the source, we need to know that the checker is initialized by the program, and that bindSourceFile in the binder is kicked off by the checker. To simplify, here is the relevant call stack:
program.getTypeChecker ->
    ts.createTypeChecker (in the checker) ->
        initializeTypeChecker (in the checker) ->
            for each SourceFile: ts.bindSourceFile (in the binder)
            // then
            for each SourceFile: ts.mergeSymbolTable (in the checker)
We can see that initializeTypeChecker calls the binder's bindSourceFile as well as the checker's own mergeSymbolTable.
Verify that the call stack is correct
function initializeTypeChecker() {
// Bind all source files and propagate errors
for (const file of host.getSourceFiles()) {
bindSourceFile(file, compilerOptions);
}
// Initialize global symbol table
let augmentations: LiteralExpression[][];
for (const file of host.getSourceFiles()) {
if (!isExternalOrCommonJsModule(file)) {
mergeSymbolTable(globals, file.locals);
}
// ...
}
// ...
}
Reading this source in the checker, we have indeed verified the call sequence described above: bindSourceFile is called first, then mergeSymbolTable.
Analyzing mergeSymbolTable
In the last section we analyzed bindSourceFile, which creates a symbol for each declaration and connects the nodes into a related type system. What mergeSymbolTable does is merge every file's global symbols into the SymbolTable declared as let globals: SymbolTable. All subsequent type checks can then be resolved against globals.
function mergeSymbolTable(target: SymbolTable, source: SymbolTable) {
source.forEach((sourceSymbol, id) = > {
let targetSymbol = target.get(id);
if (!targetSymbol) {
target.set(id, sourceSymbol);
}
else {
if (!(targetSymbol.flags & SymbolFlags.Transient)) {
targetSymbol = cloneSymbol(targetSymbol);
target.set(id, targetSymbol);
}
mergeSymbol(targetSymbol, sourceSymbol);
}
});
}
function mergeSymbol(target: Symbol, source: Symbol) {
    if (!(target.flags & getExcludedSymbolFlags(source.flags))) {
        if (source.flags & SymbolFlags.ValueModule && target.flags & SymbolFlags.ValueModule && target.constEnumOnlyModule && !source.constEnumOnlyModule) {
            // reset flag when merging instantiated module into value module that has only const enums
            target.constEnumOnlyModule = false;
        }
        target.flags |= source.flags;
        if (source.valueDeclaration &&
            (!target.valueDeclaration ||
                (target.valueDeclaration.kind === SyntaxKind.ModuleDeclaration && source.valueDeclaration.kind !== SyntaxKind.ModuleDeclaration))) {
            // other kinds of value declarations take precedence over modules
            target.valueDeclaration = source.valueDeclaration;
        }
        addRange(target.declarations, source.declarations);
        if (source.members) {
            if (!target.members) target.members = createMap<Symbol>();
            mergeSymbolTable(target.members, source.members);
        }
        if (source.exports) {
            if (!target.exports) target.exports = createMap<Symbol>();
            mergeSymbolTable(target.exports, source.exports);
        }
        recordMergedSymbol(target, source);
    }
    else if (target.flags & SymbolFlags.NamespaceModule) {
        error(getNameOfDeclaration(source.declarations[0]), Diagnostics.Cannot_augment_module_0_with_value_exports_because_it_resolves_to_a_non_module_entity, symbolToString(target));
    }
    else {
        const message = target.flags & SymbolFlags.BlockScopedVariable || source.flags & SymbolFlags.BlockScopedVariable
            ? Diagnostics.Cannot_redeclare_block_scoped_variable_0
            : Diagnostics.Duplicate_identifier_0;
        forEach(source.declarations, node => {
            error(getNameOfDeclaration(node) || node, message, symbolToString(source));
        });
        forEach(target.declarations, node => {
            error(getNameOfDeclaration(node) || node, message, symbolToString(source));
        });
    }
}
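The merge semantics can be sketched with plain Maps: a name not yet in the target is copied over, while a name present in both has its declaration lists combined (a simplification that ignores the flag checks of the real mergeSymbol):

```typescript
interface MSym { name: string; declarations: string[]; }
type SymTable = Map<string, MSym>;

function mergeSymbolTable(target: SymTable, source: SymTable): void {
  source.forEach((sourceSymbol, id) => {
    const targetSymbol = target.get(id);
    if (!targetSymbol) {
      target.set(id, sourceSymbol);                               // new name: copy it over
    } else {
      targetSymbol.declarations.push(...sourceSymbol.declarations); // same entity: merge declarations
    }
  });
}

const globals: SymTable = new Map<string, MSym>();
globals.set("Foo", { name: "Foo", declarations: ["a.ts"] });

const fileLocals: SymTable = new Map<string, MSym>();
fileLocals.set("Foo", { name: "Foo", declarations: ["b.ts"] });  // e.g. the same interface declared again
fileLocals.set("bar", { name: "bar", declarations: ["b.ts"] });

mergeSymbolTable(globals, fileLocals);
console.log(globals.size);                      // 2
console.log(globals.get("Foo")!.declarations);  // [ 'a.ts', 'b.ts' ]
```

The real mergeSymbol additionally validates flags before merging, which is how it detects cases like redeclaring a block-scoped variable.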
Type checking
The real type checking happens when getDiagnostics is called. When it is (for example via a program.emit request), the checker returns an EmitResolver (the program obtains it through the checker's getEmitResolver function); EmitResolver is a collection of createTypeChecker's local functions. Let's step through how the getDiagnostics function performs type checking.
getDiagnostics
function getDiagnostics(sourceFile: SourceFile, ct: CancellationToken): Diagnostic[] {
    try {
        cancellationToken = ct;
        return getDiagnosticsWorker(sourceFile);
    }
    finally {
        cancellationToken = undefined;
    }
}
There's no time to explain; let's go straight into getDiagnosticsWorker.
getDiagnosticsWorker
function getDiagnosticsWorker(sourceFile: SourceFile): Diagnostic[] {
throwIfNonDiagnosticsProducing();
if (sourceFile) {
// ..
checkSourceFile(sourceFile);
// ..
const semanticDiagnostics = diagnostics.getDiagnostics(sourceFile.fileName);
// ..
return semanticDiagnostics;
}
forEach(host.getSourceFiles(), checkSourceFile);
return diagnostics.getDiagnostics();
}
Stripping out the irrelevant parts, we find a simple branch: if a sourceFile is passed in, checkSourceFile runs on it and its semantic diagnostics are returned; otherwise checkSourceFile runs on every source file and diagnostics.getDiagnostics() returns them all.
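The overall pattern, where check functions push into a shared diagnostics collection and getDiagnostics reads it back (optionally filtered by file), can be sketched as follows; the names and the hard-coded error are illustrative, not the checker's real API:

```typescript
interface Diagnostic { fileName: string; message: string; }

function createDiagnosticCollection() {
  const diagnostics: Diagnostic[] = [];
  return {
    add: (d: Diagnostic) => { diagnostics.push(d); },
    // With a fileName, return only that file's diagnostics; otherwise return all of them.
    getDiagnostics: (fileName?: string) =>
      fileName ? diagnostics.filter(d => d.fileName === fileName) : diagnostics.slice(),
  };
}

const diagnostics = createDiagnosticCollection();

function checkSourceFile(fileName: string): void {
  // a stand-in "check": flag a hard-coded error for one file
  if (fileName === "foo.ts") {
    diagnostics.add({ fileName, message: "Cannot redeclare block-scoped variable 'foo'." });
  }
}

["foo.ts", "bar.ts"].forEach(checkSourceFile);
console.log(diagnostics.getDiagnostics("foo.ts").length); // 1
console.log(diagnostics.getDiagnostics("bar.ts").length); // 0
```

This mirrors the two branches of getDiagnosticsWorker: check one file and filter by its name, or check every file and drain the whole collection.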
checkSourceFile
function checkSourceFile(node: SourceFile) {
    performance.mark("beforeCheck");
    checkSourceFileWorker(node);
    performance.mark("afterCheck");
    performance.measure("Check", "beforeCheck", "afterCheck");
}
Once again, performance marks bracket the real work, which happens in checkSourceFileWorker.
checkSourceFileWorker
function checkSourceFileWorker(node: SourceFile) {
    const links = getNodeLinks(node);
    if (!(links.flags & NodeCheckFlags.TypeChecked)) {
        if (compilerOptions.skipLibCheck && node.isDeclarationFile || compilerOptions.skipDefaultLibCheck && node.hasNoDefaultLib) {
            return;
        }
        // Grammar checking
        checkGrammarSourceFile(node);
        forEach(node.statements, checkSourceElement);
        checkDeferredNodes();
        if (isExternalModule(node)) {
            registerForUnusedIdentifiersCheck(node);
        }
        if (!node.isDeclarationFile) {
            checkUnusedIdentifiers();
        }
        if (isExternalOrCommonJsModule(node)) {
            checkExternalModuleExports(node);
        }
        // ...
        links.flags |= NodeCheckFlags.TypeChecked;
    }
}
The checkSourceFileWorker function contains the various check operations, such as checkGrammarSourceFile, checkDeferredNodes, registerForUnusedIdentifiersCheck… Isn't that exactly what we're looking for? Let's pick one of them and keep digging.
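Before moving on, note the `links.flags & NodeCheckFlags.TypeChecked` guard: it is a classic bit-flag cache, ensuring each source file is checked at most once and recording completion in a bitmask. A minimal sketch of the pattern, with made-up flag names and values (not the compiler's):

```typescript
// Toy flags in the spirit of NodeCheckFlags (values are illustrative).
enum ToyCheckFlags {
    None = 0,
    TypeChecked = 1 << 0,
}

interface ToyNodeLinks {
    flags: ToyCheckFlags;
}

let checkCount = 0;

// Run the expensive check at most once per node, recording completion in the bitmask.
function checkOnce(links: ToyNodeLinks): void {
    if (!(links.flags & ToyCheckFlags.TypeChecked)) {
        checkCount++; // stands in for the real checking work
        links.flags |= ToyCheckFlags.TypeChecked;
    }
}
```

Calling checkOnce twice on the same links object does the work only once, exactly as checkSourceFileWorker does for an already-checked file.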
checkGrammarSourceFile
function checkGrammarSourceFile(node: SourceFile): boolean {
    return isInAmbientContext(node) && checkGrammarTopLevelElementsForRequiredDeclareModifier(node);
}
So this first checks whether the node is in an ambient context; if it is, checking continues into checkGrammarTopLevelElementsForRequiredDeclareModifier. We don't need to dwell on it, so let's move on.
checkGrammarTopLevelElementsForRequiredDeclareModifier
function checkGrammarTopLevelElementsForRequiredDeclareModifier(file: SourceFile): boolean {
    for (const decl of file.statements) {
        if (isDeclaration(decl) || decl.kind === SyntaxKind.VariableStatement) {
            if (checkGrammarTopLevelElementForRequiredDeclareModifier(decl)) {
                return true;
            }
        }
    }
}
checkGrammarTopLevelElementForRequiredDeclareModifier
function checkGrammarTopLevelElementForRequiredDeclareModifier(node: Node): boolean {
    if (node.kind === SyntaxKind.InterfaceDeclaration ||
        node.kind === SyntaxKind.TypeAliasDeclaration ||
        node.kind === SyntaxKind.ImportDeclaration ||
        node.kind === SyntaxKind.ImportEqualsDeclaration ||
        node.kind === SyntaxKind.ExportDeclaration ||
        node.kind === SyntaxKind.ExportAssignment ||
        node.kind === SyntaxKind.NamespaceExportDeclaration ||
        getModifierFlags(node) & (ModifierFlags.Ambient | ModifierFlags.Export | ModifierFlags.Default)) {
        return false;
    }
    return grammarErrorOnFirstToken(node, Diagnostics.A_declare_modifier_is_required_for_a_top_level_declaration_in_a_d_ts_file);
}
grammarErrorOnFirstToken
function grammarErrorOnFirstToken(node: Node, message: DiagnosticMessage, arg0?: any, arg1?: any, arg2?: any): boolean {
    const sourceFile = getSourceFileOfNode(node);
    if (!hasParseDiagnostics(sourceFile)) {
        const span = getSpanOfTokenAtPosition(sourceFile, node.pos);
        diagnostics.add(createFileDiagnostic(sourceFile, span.start, span.length, message, arg0, arg1, arg2));
        return true;
    }
}
createFileDiagnostic
export function createFileDiagnostic(file: SourceFile, start: number, length: number, message: DiagnosticMessage): Diagnostic {
    const end = start + length;
    Debug.assert(start >= 0, "start must be non-negative, is " + start);
    Debug.assert(length >= 0, "length must be non-negative, is " + length);
    if (file) {
        Debug.assert(start <= file.text.length, `start must be within the bounds of the file. ${start} > ${file.text.length}`);
        Debug.assert(end <= file.text.length, `end must be the bounds of the file. ${end} > ${file.text.length}`);
    }
    let text = getLocaleSpecificMessage(message);
    if (arguments.length > 4) {
        text = formatStringFromArgs(text, arguments, 4);
    }
    return {
        file,
        start,
        length,
        messageText: text,
        category: message.category,
        code: message.code,
    };
}
Finally, we see that createFileDiagnostic guards its inputs with Debug.assert and builds the Diagnostic object that gets added to the diagnostics collection. I won't go into what assert does here; interested readers can look at its source. Checker summary: starting from the AST nodes we generated, it runs grammar, type, and position checks against each declaration and reports any violations as diagnostics.
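The `formatStringFromArgs(text, arguments, 4)` call substitutes the trailing arguments into `{0}`, `{1}`, … placeholders in the localized message template. Here is a simplified sketch of that substitution in the same spirit; the function name and signature are my own, not the compiler's exact code:

```typescript
// Simplified placeholder substitution: replaces {0}, {1}, ... with the given args.
// The real formatStringFromArgs reads from an `arguments` object with a base index;
// this toy version just takes a plain array.
function toyFormatString(template: string, args: Array<string | number>): string {
    return template.replace(/{(\d+)}/g, (match: string, index: string) => {
        const value = args[Number(index)];
        return value === undefined ? match : String(value);
    });
}
```

So a diagnostic template like `"Cannot find name '{0}'."` plus the argument `"foo"` becomes the final message text.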
To wrap up the checker, here is a simple diagram:
That’s the end of our second leg of the route:
AST -> Checker ~~ Symbol -> Type checking
The emitter
The TypeScript compiler provides two emitters:
- emitter.ts: the TS -> JavaScript emitter
- declarationEmitter.ts: used to create declaration files (.d.ts) for TypeScript source files (.ts)
How Program uses the emitter: Program exposes an emit function, which mostly delegates to emitFiles in emitter.ts. Here is the call stack:
Program.emit ->
    emitWorker (created in createProgram in program.ts) ->
        emitFiles (a function in emitter.ts)
emitFiles
export function emitFiles(resolver: EmitResolver, host: EmitHost, targetSourceFile: SourceFile, emitOnlyDtsFiles?: boolean, transformers?: TransformerFactory<SourceFile>[]): EmitResult {
    const compilerOptions = host.getCompilerOptions();
    const moduleKind = getEmitModuleKind(compilerOptions);
    const sourceMapDataList: SourceMapData[] = compilerOptions.sourceMap || compilerOptions.inlineSourceMap ? [] : undefined;
    const emittedFilesList: string[] = compilerOptions.listEmittedFiles ? [] : undefined;
    const emitterDiagnostics = createDiagnosticCollection();
    const newLine = host.getNewLine();
    const writer = createTextWriter(newLine);
    const sourceMap = createSourceMapWriter(host, writer);
    let currentSourceFile: SourceFile;
    let bundledHelpers: Map<boolean>;
    let isOwnFileEmit: boolean;
    let emitSkipped = false;

    const sourceFiles = getSourceFilesToEmit(host, targetSourceFile);

    // Transform the source files
    const transform = transformNodes(resolver, host, compilerOptions, sourceFiles, transformers, /*allowDtsFiles*/ false);

    // Create a printer to print the nodes
    const printer = createPrinter();

    // Emit each output file
    performance.mark("beforePrint");
    forEachEmittedFile(host, emitSourceFileOrBundle, transform.transformed, emitOnlyDtsFiles);
    performance.measure("printTime", "beforePrint");

    // Clean up emit nodes on parse tree
    transform.dispose();

    return {
        emitSkipped,
        diagnostics: emitterDiagnostics.getDiagnostics(),
        emittedFiles: emittedFilesList,
        sourceMaps: sourceMapDataList
    };

    function emitSourceFileOrBundle({ jsFilePath, sourceMapFilePath, declarationFilePath }: EmitFileNames, sourceFileOrBundle: SourceFile | Bundle) { }

    function printSourceFileOrBundle(jsFilePath: string, sourceMapFilePath: string, sourceFileOrBundle: SourceFile | Bundle) { }

    function setSourceFile(node: SourceFile) { }

    function emitHelpers(node: Node, writeLines: (text: string) => void) { }
}
It mostly sets up a collection of local variables and functions (which make up the bulk of the emitter), and then hands the source file to the local emitSourceFileOrBundle function, which sets currentSourceFile and passes the node to the local emit function for output.
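One of those locals, createTextWriter, is essentially a string accumulator with indentation and newline handling. A stripped-down sketch of the idea (illustrative only, with a hard-coded four-space indent that the real writer does not use):

```typescript
// A minimal text writer in the spirit of the emitter's createTextWriter.
function createToyTextWriter(newLine: string) {
    let output = "";
    let indent = 0;
    let lineStart = true;

    function write(s: string): void {
        if (lineStart) {
            // Emit pending indentation only when a line actually starts.
            for (let i = 0; i < indent; i++) {
                output += "    ";
            }
            lineStart = false;
        }
        output += s;
    }

    return {
        write,
        writeLine(): void { output += newLine; lineStart = true; },
        increaseIndent(): void { indent++; },
        decreaseIndent(): void { indent--; },
        getText: (): string => output,
    };
}
```

The printer drives a writer like this while walking the AST, so indentation falls out of increase/decrease calls rather than being tracked by every emit function.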
emit
function emit(node: Node) {
    pipelineEmitWithNotification(EmitHint.Unspecified, node);
}
pipelineEmitWithHint
The functions triggered by emit wrap one another layer by layer; after peeling back several layers we finally land in pipelineEmitWithHint, which emits different code depending on the hint.
function pipelineEmitWithNotification(hint: EmitHint, node: Node) {
    if (onEmitNode) {
        onEmitNode(hint, node, pipelineEmitWithComments);
    }
    else {
        pipelineEmitWithComments(hint, node);
    }
}

function pipelineEmitWithComments(hint: EmitHint, node: Node) {
    node = trySubstituteNode(hint, node);
    if (emitNodeWithComments && hint !== EmitHint.SourceFile) {
        emitNodeWithComments(hint, node, pipelineEmitWithSourceMap);
    }
    else {
        pipelineEmitWithSourceMap(hint, node);
    }
}

function pipelineEmitWithSourceMap(hint: EmitHint, node: Node) {
    if (onEmitSourceMapOfNode && hint !== EmitHint.SourceFile && hint !== EmitHint.IdentifierName) {
        onEmitSourceMapOfNode(hint, node, pipelineEmitWithHint);
    }
    else {
        pipelineEmitWithHint(hint, node);
    }
}

function pipelineEmitWithHint(hint: EmitHint, node: Node): void {
    switch (hint) {
        case EmitHint.SourceFile: return pipelineEmitSourceFile(node);
        case EmitHint.IdentifierName: return pipelineEmitIdentifierName(node);
        case EmitHint.Expression: return pipelineEmitExpression(node);
        case EmitHint.Unspecified: return pipelineEmitUnspecified(node);
    }
}
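Each stage above has the same shape: if an optional hook is installed, call the hook and hand it the next stage as a callback; otherwise fall straight through to the next stage. That pattern can be sketched in isolation with hypothetical stage names (these are not the compiler's functions):

```typescript
// The emit pipeline's "optional hook, else fall through" pattern, in miniature.
type Stage = (value: string) => string;
type Hook = (value: string, next: Stage) => string;

// Build a stage that defers to `hook` if present, passing it the next stage,
// and otherwise calls the next stage directly.
function makeStage(hook: Hook | undefined, next: Stage): Stage {
    return value => (hook ? hook(value, next) : next(value));
}

// Final stage: the actual "emit".
const emitStage: Stage = value => value + "!";

// A hook that decorates the value and then continues down the pipeline.
const commentHook: Hook = (value, next) => next(`/*c*/${value}`);

const withHook = makeStage(commentHook, emitStage);
const withoutHook = makeStage(undefined, emitStage);
```

This is why comments and source maps can be bolted on or skipped without the core printing logic ever knowing about them.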
pipelineEmitUnspecified
If the hint passed in initially is Unspecified, pipelineEmitUnspecified decides what to do based on the node's kind.
function pipelineEmitUnspecified(node: Node): void {
    const kind = node.kind;
    // Reserved words
    // Strict mode reserved words
    // Contextual keywords
    if (isKeyword(kind)) {
        writeTokenNode(node);
        return;
    }
    switch (kind) {
        // Pseudo-literals
        case SyntaxKind.TemplateHead:
        case SyntaxKind.TemplateMiddle:
        case SyntaxKind.TemplateTail:
            return emitLiteral(<LiteralExpression>node);
    }
}
emitLiteral
For example, if our node's kind is TemplateHead, the emitLiteral function is executed to emit the code.
function emitLiteral(node: LiteralLikeNode) {
    const text = getLiteralTextOfNode(node);
    if ((printerOptions.sourceMap || printerOptions.inlineSourceMap)
        && (node.kind === SyntaxKind.StringLiteral || isTemplateLiteralKind(node.kind))) {
        writer.writeLiteral(text);
    }
    else {
        write(text);
    }
}
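Printing, then, ultimately boils down to a big dispatch on node.kind, with a dedicated emit function per kind that writes text and recurses into child nodes. A toy version of that dispatch over a tiny made-up AST (not the compiler's node types):

```typescript
// A toy kind-based dispatch in the spirit of the printer's emit functions.
enum ToyKind { NumericLiteral, StringLiteral, BinaryExpression }

type ToyNode =
    | { kind: ToyKind.NumericLiteral; value: number }
    | { kind: ToyKind.StringLiteral; value: string }
    | { kind: ToyKind.BinaryExpression; left: ToyNode; operator: string; right: ToyNode };

// Switch on the node's kind and recurse into children, exactly like the real printer.
function toyEmit(node: ToyNode): string {
    switch (node.kind) {
        case ToyKind.NumericLiteral:
            return String(node.value);
        case ToyKind.StringLiteral:
            return JSON.stringify(node.value);
        case ToyKind.BinaryExpression:
            return `${toyEmit(node.left)} ${node.operator} ${toyEmit(node.right)}`;
    }
}
```

The real printer differs only in scale: hundreds of SyntaxKind cases, a writer instead of string concatenation, and comment/source-map hooks around each call.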
Everything else is pretty much the same. Let’s sum it up:
Conclusion
While debugging the source code, I recommend the VS Code extension Bookmarks; it helps you mark and jump back to key code.
This article captures some of my own thinking while reading the source; I believe its biggest value is in laying out the main flow of each compiler stage.
References
TypeScript Deep Dive
TypeScript Compilation Principles (1)