First, the GitHub address for TS: github.com/Microsoft/T… . You can download it in advance. The compiler source is in the src/compiler directory,
which is divided into the following key parts:
- Scanner (scanner.ts)
- Parser (parser.ts)
- Binder (binder.ts)
- Checker (checker.ts)
- Emitter (emitter.ts)
Each part has its own file in the source tree, and its role in the compilation process will be explained later.
Overview
The figure above briefly illustrates how the TypeScript compiler puts these key pieces together:
- Source code -> Scanner -> token stream -> Parser -> AST (abstract syntax tree)
- AST -> Binder -> Symbols
- AST + Symbols -> Checker -> type checking
- AST + Checker -> Emitter -> JS code
Flow 1: Source => AST
Source code -> Scanner -> token stream -> Parser -> AST
TypeScript's scanner is located in scanner.ts, and its parser is located in parser.ts. Internally, the scanner is controlled by the parser, which converts source code into an abstract syntax tree (AST).
By analogy with the usual AST generation process, the scanner stage corresponds to lexical analysis and the parser stage to syntactic analysis.
For details about abstract syntax trees, see the article "AST Abstract Syntax Trees".
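The relationship between the two stages can be sketched with a toy example. This is not the TypeScript compiler's code: createToyScanner, parseSum, and the ToyToken and ToyExpr types are hypothetical names invented for illustration. The sketch only shows the idea that the parser owns the scanner and asks it for one token at a time.

```typescript
// Toy illustration only: hypothetical names, not the TypeScript compiler's API.
type ToyTokenKind = "number" | "plus" | "eof";

interface ToyToken {
    kind: ToyTokenKind;
    text: string;
    pos: number; // offset of the token's first character
}

// The scanner does lexical analysis lazily: one token per scan() call.
function createToyScanner(text: string) {
    let pos = 0;
    function scan(): ToyToken {
        while (pos < text.length && text[pos] === " ") pos++; // skip whitespace
        if (pos >= text.length) return { kind: "eof", text: "", pos };
        const start = pos;
        if (text[pos] === "+") {
            pos++;
            return { kind: "plus", text: "+", pos: start };
        }
        if (/[0-9]/.test(text[pos])) {
            while (pos < text.length && /[0-9]/.test(text[pos])) pos++;
            return { kind: "number", text: text.slice(start, pos), pos: start };
        }
        throw new Error(`Unexpected character '${text[pos]}' at ${pos}`);
    }
    return { scan };
}

type ToyExpr =
    | { kind: "NumericLiteral"; value: number }
    | { kind: "BinaryExpression"; left: ToyExpr; right: ToyExpr };

// The parser does syntactic analysis and decides when the scanner advances.
function parseSum(text: string): ToyExpr {
    const scanner = createToyScanner(text);
    let current = scanner.scan(); // prime the scanner with the first token
    const nextToken = () => (current = scanner.scan());

    function parseNumber(): ToyExpr {
        if (current.kind !== "number") {
            throw new Error(`Number expected at ${current.pos}`);
        }
        const node: ToyExpr = { kind: "NumericLiteral", value: Number(current.text) };
        nextToken();
        return node;
    }

    let left = parseNumber();
    while (current.kind === "plus") {
        nextToken(); // consume '+'
        left = { kind: "BinaryExpression", left, right: parseNumber() };
    }
    if (current.kind !== "eof") throw new Error(`Unexpected token at ${current.pos}`);
    return left;
}

const toyAst = parseSum("1 + 2 + 3");
```

Here toyAst is a BinaryExpression whose left child is itself a BinaryExpression (1 + 2) and whose right child is the literal 3. The scanner never runs ahead on its own; every token is produced because the parser asked for it.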
How the Parser uses the Scanner
The initial state is set up in parseSourceFile, which then hands the work to the parseSourceFileWorker function.
parseSourceFile
export function parseSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, syntaxCursor: IncrementalParser.SyntaxCursor | undefined, setParentNodes = false, scriptKind?: ScriptKind): SourceFile {
    scriptKind = ensureScriptKind(fileName, scriptKind);
    // JSON files are parsed separately and returned early
    if (scriptKind === ScriptKind.JSON) {
        const result = parseJsonText(fileName, sourceText, languageVersion, syntaxCursor, setParentNodes);
        convertToObjectWorker(result, result.parseDiagnostics, /*returnValue*/ false, /*knownRootOptions*/ undefined, /*jsonConversionNotifier*/ undefined);
        result.referencedFiles = emptyArray;
        result.typeReferenceDirectives = emptyArray;
        result.libReferenceDirectives = emptyArray;
        result.amdDependencies = emptyArray;
        result.hasNoDefaultLib = false;
        result.pragmas = emptyMap;
        return result;
    }
    // Initialize the parser state and prepare the scanner
    initializeState(sourceText, languageVersion, syntaxCursor, scriptKind);
    // Pass the work to parseSourceFileWorker
    const result = parseSourceFileWorker(fileName, languageVersion, setParentNodes, scriptKind);
    clearState();
    return result;
}
parseSourceFileWorker
The function creates a SourceFile AST node and then parses the source code, starting from the parseStatement function. Once the statements have been parsed, it completes the SourceFile node with additional information (such as nodeCount, identifierCount, and so on).
function parseSourceFileWorker(fileName: string, languageVersion: ScriptTarget, setParentNodes: boolean, scriptKind: ScriptKind): SourceFile {
    const isDeclarationFile = isDeclarationFileName(fileName);
    if (isDeclarationFile) {
        contextFlags |= NodeFlags.Ambient;
    }
    // Create a SourceFile AST node
    sourceFile = createSourceFile(fileName, languageVersion, scriptKind, isDeclarationFile);
    sourceFile.flags = contextFlags;
    // Prime the scanner.
    nextToken();
    // A member of ReadonlyArray<T> isn't assignable to a member of T[] (and prevents a direct cast) - but this is where we set up those members so they can be readonly in the future
    processCommentPragmas(sourceFile as {} as PragmaContext, sourceText);
    processPragmasIntoFields(sourceFile as {} as PragmaContext, reportPragmaDiagnostic);
    // Call parseStatement repeatedly to parse the source code
    sourceFile.statements = parseList(ParsingContext.SourceElements, parseStatement);
    Debug.assert(token() === SyntaxKind.EndOfFileToken);
    // The following lines complete the SourceFile AST node
    sourceFile.endOfFileToken = addJSDocComment(parseTokenNode());
    setExternalModuleIndicator(sourceFile);
    sourceFile.nodeCount = nodeCount;
    sourceFile.identifierCount = identifierCount;
    sourceFile.identifiers = identifiers;
    sourceFile.parseDiagnostics = parseDiagnostics;
    if (setParentNodes) {
        fixupParentReferences(sourceFile);
    }
    return sourceFile;
    function reportPragmaDiagnostic(pos: number, end: number, diagnostic: DiagnosticMessage) {
        parseDiagnostics.push(createFileDiagnostic(sourceFile, pos, end, diagnostic));
    }
}
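The division of labor between the entry point and the worker can be sketched in miniature. The code below is a hypothetical toy, not the compiler's implementation: every name starting with "toy" is invented, and statements are simply split on semicolons. It assumes only the shape described above: initialize shared state, parse a list of statements until the end of the input, complete the root node with counters, then clear the state.

```typescript
// Toy illustration only: a hypothetical mini-parser, not the TypeScript implementation.
interface ToySourceFile {
    fileName: string;
    statements: string[];
    nodeCount: number;
}

// State shared by the parse functions; it lives outside them, as in the pattern above.
let toyTokens: string[] = [];
let toyIndex = 0;
let toyNodeCount = 0;

function toyInitializeState(sourceText: string): void {
    toyTokens = sourceText.split(";").map(s => s.trim()).filter(s => s.length > 0);
    toyIndex = 0;
    toyNodeCount = 0;
}

function toyClearState(): void {
    toyTokens = [];
    toyIndex = 0;
    toyNodeCount = 0;
}

const toyIsEndOfFile = (): boolean => toyIndex >= toyTokens.length;

function toyParseStatement(): string {
    toyNodeCount++;
    return toyTokens[toyIndex++];
}

// Keep calling parseElement until the end of the input is reached.
function toyParseList<T>(parseElement: () => T): T[] {
    const list: T[] = [];
    while (!toyIsEndOfFile()) {
        list.push(parseElement());
    }
    return list;
}

function toyParseSourceFileWorker(fileName: string): ToySourceFile {
    const statements = toyParseList(toyParseStatement);
    // Complete the root node with bookkeeping gathered while parsing.
    return { fileName, statements, nodeCount: toyNodeCount };
}

function toyParseSourceFile(fileName: string, sourceText: string): ToySourceFile {
    toyInitializeState(sourceText);
    const result = toyParseSourceFileWorker(fileName);
    toyClearState(); // release the state so the next parse starts clean
    return result;
}

const firstFile = toyParseSourceFile("a.ts", "let x = 1; x++;");
const secondFile = toyParseSourceFile("b.ts", "return;");
```

firstFile holds two statements with a nodeCount of 2, and secondFile holds one statement with a nodeCount of 1. Because the state is reset on entry and cleared on exit, the second parse is unaffected by the first.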
Node creation: parseStatement, parseXXX, etc.
The parseStatement function switches on the current token returned by the scanner and calls the corresponding parseXXX function to generate the AST node.
function parseStatement(): Statement {
    // token() returns the current token produced by the scanner. SyntaxKind is the const enum of token and AST node kinds; a different node is created for each kind
    switch (token()) {
        // If the token is a SemicolonToken, call parseEmptyStatement
        case SyntaxKind.SemicolonToken:
            return parseEmptyStatement();
        case SyntaxKind.OpenBraceToken:
            return parseBlock(/*ignoreMissingOpenBrace*/ false);
        case SyntaxKind.VarKeyword:
            return parseVariableStatement(<VariableStatement>createNodeWithJSDoc(SyntaxKind.VariableDeclaration));
        case SyntaxKind.LetKeyword:
            if (isLetDeclaration()) {
                return parseVariableStatement(<VariableStatement>createNodeWithJSDoc(SyntaxKind.VariableDeclaration));
            }
            break;
        case SyntaxKind.FunctionKeyword:
            return parseFunctionDeclaration(<FunctionDeclaration>createNodeWithJSDoc(SyntaxKind.FunctionDeclaration));
        case SyntaxKind.ClassKeyword:
            return parseClassDeclaration(<ClassDeclaration>createNodeWithJSDoc(SyntaxKind.ClassDeclaration));
        case SyntaxKind.IfKeyword:
            return parseIfStatement();
        case SyntaxKind.DoKeyword:
            return parseDoStatement();
        case SyntaxKind.WhileKeyword:
            return parseWhileStatement();
        case SyntaxKind.ForKeyword:
            return parseForOrForInOrForOfStatement();
        case SyntaxKind.ContinueKeyword:
            return parseBreakOrContinueStatement(SyntaxKind.ContinueStatement);
        case SyntaxKind.BreakKeyword:
            return parseBreakOrContinueStatement(SyntaxKind.BreakStatement);
        case SyntaxKind.ReturnKeyword:
            return parseReturnStatement();
        case SyntaxKind.WithKeyword:
            return parseWithStatement();
        case SyntaxKind.SwitchKeyword:
            return parseSwitchStatement();
        case SyntaxKind.ThrowKeyword:
            return parseThrowStatement();
        case SyntaxKind.TryKeyword:
        // Include 'catch' and 'finally' for error recovery.
        case SyntaxKind.CatchKeyword:
        case SyntaxKind.FinallyKeyword:
            return parseTryStatement();
        case SyntaxKind.DebuggerKeyword:
            return parseDebuggerStatement();
        case SyntaxKind.AtToken:
            return parseDeclaration();
        case SyntaxKind.AsyncKeyword:
        case SyntaxKind.InterfaceKeyword:
        case SyntaxKind.TypeKeyword:
        case SyntaxKind.ModuleKeyword:
        case SyntaxKind.NamespaceKeyword:
        case SyntaxKind.DeclareKeyword:
        case SyntaxKind.ConstKeyword:
        case SyntaxKind.EnumKeyword:
        case SyntaxKind.ExportKeyword:
        case SyntaxKind.ImportKeyword:
        case SyntaxKind.PrivateKeyword:
        case SyntaxKind.ProtectedKeyword:
        case SyntaxKind.PublicKeyword:
        case SyntaxKind.AbstractKeyword:
        case SyntaxKind.StaticKeyword:
        case SyntaxKind.ReadonlyKeyword:
        case SyntaxKind.GlobalKeyword:
            if (isStartOfDeclaration()) {
                return parseDeclaration();
            }
            break;
    }
    return parseExpressionOrLabeledStatement();
}
For example, if the current token is a semicolon, parseEmptyStatement is called to create an AST node for the empty statement.
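The dispatch idea can be reduced to a few lines. The sketch below is a hypothetical miniature: ToyKind, ToyStatementKind, and chooseStatementKind are invented names, and the nextIsIdentifier parameter merely stands in for a lookahead check in the spirit of isLetDeclaration(). It shows two things from the switch above: most tokens select a statement form directly, while a contextual keyword needs a lookahead and otherwise falls through to the default.

```typescript
// Toy illustration only: hypothetical names, far smaller than the real SyntaxKind enum.
enum ToyKind {
    SemicolonToken,
    IfKeyword,
    LetKeyword,
    Identifier,
}

type ToyStatementKind = "EmptyStatement" | "IfStatement" | "VariableStatement" | "ExpressionStatement";

// `nextIsIdentifier` is an assumed stand-in for a lookahead performed by the parser.
function chooseStatementKind(current: ToyKind, nextIsIdentifier: boolean): ToyStatementKind {
    switch (current) {
        case ToyKind.SemicolonToken:
            return "EmptyStatement";
        case ToyKind.IfKeyword:
            return "IfStatement";
        case ToyKind.LetKeyword:
            // `let` is only treated as a declaration in some positions, so look ahead first.
            if (nextIsIdentifier) {
                return "VariableStatement";
            }
            break;
    }
    // No case committed to a statement form: fall back to an expression statement.
    return "ExpressionStatement";
}

const emptyKind = chooseStatementKind(ToyKind.SemicolonToken, false);
const letDeclarationKind = chooseStatementKind(ToyKind.LetKeyword, true);
const letExpressionKind = chooseStatementKind(ToyKind.LetKeyword, false);
```

emptyKind is "EmptyStatement", letDeclarationKind is "VariableStatement", and letExpressionKind is "ExpressionStatement", which mirrors how the real function ends with parseExpressionOrLabeledStatement when no case returns.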
parseEmptyStatement, parseIfStatement, etc.
function parseEmptyStatement(): Statement {
    const node = <Statement>createNode(SyntaxKind.EmptyStatement);
    parseExpected(SyntaxKind.SemicolonToken);
    return finishNode(node);
}
function parseIfStatement(): IfStatement {
    const node = <IfStatement>createNode(SyntaxKind.IfStatement);
    parseExpected(SyntaxKind.IfKeyword);
    parseExpected(SyntaxKind.OpenParenToken);
    node.expression = allowInAnd(parseExpression);
    parseExpected(SyntaxKind.CloseParenToken);
    node.thenStatement = parseStatement();
    node.elseStatement = parseOptional(SyntaxKind.ElseKeyword) ? parseStatement() : undefined;
    return finishNode(node);
}
If you look at the parseXXX functions, they rely on three key functions: createNode, parseExpected, and finishNode.
createNode
function createNode(kind: SyntaxKind, pos?: number): Node {
    nodeCount++;
    // Use the given pos, or else the scanner's startPos (the start position of whitespace before the current token)
    const p = pos! >= 0 ? pos! : scanner.getStartPos();
    // Construct the node with the constructor that matches its kind
    return isNodeKind(kind) || kind === SyntaxKind.Unknown ? new NodeConstructor(kind, p, p) :
        kind === SyntaxKind.Identifier ? new IdentifierConstructor(kind, p, p) :
            new TokenConstructor(kind, p, p);
}
parseExpected
function parseExpected(kind: SyntaxKind, diagnosticMessage?: DiagnosticMessage, shouldAdvance = true): boolean {
    // Check whether the current token matches the kind passed in
    if (token() === kind) {
        if (shouldAdvance) {
            nextToken();
        }
        return true;
    }
    // The token does not match: report the given diagnosticMessage, or a generic "'{0}' expected" error if none was passed
    if (diagnosticMessage) {
        parseErrorAtCurrentToken(diagnosticMessage);
    }
    else {
        parseErrorAtCurrentToken(Diagnostics._0_expected, tokenToString(kind));
    }
    return false;
}
finishNode
function finishNode<T extends Node>(node: T, end?: number): T {
    // Set the end position
    node.end = end === undefined ? scanner.getStartPos() : end;
    // Add the context flags
    if (contextFlags) {
        node.flags |= contextFlags;
    }
    // If an error occurred while parsing this node, mark it, then clear the flag so that subsequent nodes are not marked.
    if (parseErrorBeforeNextFinishedNode) {
        parseErrorBeforeNextFinishedNode = false;
        node.flags |= NodeFlags.ThisNodeHasError;
    }
    return node;
}
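How the three functions cooperate can be sketched with a toy version of parsing an empty statement. Everything here is hypothetical and simplified: ToyNode, parseToyEmptyStatement, and the helpers inside it are invented names, a plain string offset replaces the scanner, and a boolean replaces the error flag in node.flags. The sketch assumes only the pattern described above: record the start, consume what is expected (reporting an error but continuing if it is missing), then record the end and attach any pending error to the finished node.

```typescript
// Toy illustration only: hypothetical names that mimic the pattern, not the compiler's code.
interface ToyNode {
    kind: string;
    pos: number;       // where the node starts
    end: number;       // where the node ends, filled in by finishToyNode
    hasError: boolean; // stands in for an error flag on the node
}

function parseToyEmptyStatement(text: string): { node: ToyNode; errors: string[] } {
    let offset = 0; // current position in the text, standing in for the scanner
    let errorBeforeNextFinishedNode = false;
    const errors: string[] = [];

    // Step 1: allocate the node and record the start position.
    function createToyNode(kind: string): ToyNode {
        return { kind, pos: offset, end: offset, hasError: false };
    }

    // Step 2: consume the expected text, or report an error and keep going.
    function parseExpectedText(expected: string): boolean {
        if (text.slice(offset, offset + expected.length) === expected) {
            offset += expected.length;
            return true;
        }
        errors.push(`'${expected}' expected at ${offset}`);
        errorBeforeNextFinishedNode = true;
        return false;
    }

    // Step 3: record the end position and attach any pending error to this node.
    function finishToyNode(node: ToyNode): ToyNode {
        node.end = offset;
        if (errorBeforeNextFinishedNode) {
            errorBeforeNextFinishedNode = false; // later nodes are not marked
            node.hasError = true;
        }
        return node;
    }

    const node = createToyNode("EmptyStatement");
    parseExpectedText(";");
    return { node: finishToyNode(node), errors };
}

const goodParse = parseToyEmptyStatement(";");
const badParse = parseToyEmptyStatement("x");
```

For goodParse the node spans positions 0 to 1 with no error. For badParse the parser still returns a node, but it has zero width, is marked with hasError, and one message is collected, which illustrates why a missing token does not abort parsing.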
At this point, the AST is built.
To be continued…