Preface
This article is a summary of the chapters on understanding TypeScript compilation.
The compiler
The TypeScript compiler is divided into five key parts:
- Scanner (scanner.ts)
- Parser (parser.ts)
- Binder (binder.ts)
- Checker (checker.ts)
- Emitter (emitter.ts)
The compiler code for each part lives in src/compiler, and this article walks through each of them in detail. Before we begin, here is a diagram I found on the web that helps show how the compiler fits the key pieces together.
From the figure above, we can see that the compiler has three main lines:
- Source code -> Scanner -> Token Stream -> Parser -> AST -> Binder -> Symbols
- AST + Symbols -> Checker -> Type checking
- AST + Checker -> Emitter -> JS code
I'll start by explaining how each part works, and I'll finish with an overview of each route.
The scanner
The source code for the TypeScript scanner is in scanner.ts. From the flowchart above, the scanner's job is to turn source code into a token stream. Let's go straight to the createScanner function in scanner.ts and read it step by step. I've cut out some of the code so the overall flow is easier to follow.
export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean, languageVariant = LanguageVariant.Standard, text?: string, onError?: ErrorCallback, start?: number, length?: number): Scanner {
    let pos: number;
    let end: number;
    let startPos: number;
    let tokenPos: number;
    let token: SyntaxKind;
    let tokenValue: string;
    setText(text, start, length);
    // ...
    return {
        getStartPos: () => startPos,
        getTextPos: () => pos,
        getToken: () => token,
        getTokenPos: () => tokenPos,
        getTokenText: () => text.substring(tokenPos, pos),
        getTokenValue: () => tokenValue,
        // ...
        scan,
        // ...
    };
}
After creating the scanner with createScanner, we need to scan the source code, which corresponds to the scan function. createScanner itself only defines functions and runs no real logic, so let's look at scan next.
function scan(): SyntaxKind {
    startPos = pos;
    hasExtendedUnicodeEscape = false;
    precedingLineBreak = false;
    tokenIsUnterminated = false;
    numericLiteralFlags = 0;
    while (true) {
        tokenPos = pos;
        if (pos >= end) {
            return token = SyntaxKind.EndOfFileToken;
        }
        let ch = text.charCodeAt(pos);
        // Special handling for shebang
        if (ch === CharacterCodes.hash && pos === 0 && isShebangTrivia(text, pos)) {
            pos = scanShebangTrivia(text, pos);
            if (skipTrivia) {
                continue;
            }
            else {
                return token = SyntaxKind.ShebangTrivia;
            }
        }
        switch (ch) {
            case CharacterCodes.lineFeed:
            case CharacterCodes.carriageReturn:
                precedingLineBreak = true;
                if (skipTrivia) {
                    pos++;
                    continue;
                }
                else {
                    if (ch === CharacterCodes.carriageReturn && pos + 1 < end && text.charCodeAt(pos + 1) === CharacterCodes.lineFeed) {
                        // consume both CR and LF
                        pos += 2;
                    }
                    else {
                        pos++;
                    }
                    return token = SyntaxKind.NewLineTrivia;
                }
            case CharacterCodes.tab:
            // ...
The scan function returns a value of type SyntaxKind. A comment in the source, `token > SyntaxKind.Identifier => token is a keyword`, shows that tokens above Identifier are keywords. SyntaxKind also defines the various keywords such as return, super, switch… Let's treat it as an enumeration of lexical token kinds for now.
// token > SyntaxKind.Identifier => token is a keyword
// Also, If you add a new SyntaxKind be sure to keep the `Markers` section at the bottom in sync
export const enum SyntaxKind {
Unknown,
EndOfFileToken,
SingleLineCommentTrivia,
MultiLineCommentTrivia,
NewLineTrivia,
WhitespaceTrivia,
// We detect and preserve #! on the first line
ShebangTrivia,
// We detect and provide better error recovery when we encounter a git merge marker. This
// allows us to edit files with git-conflict markers in them in a much more pleasant manner.
ConflictMarkerTrivia,
// Literals
NumericLiteral,
StringLiteral,
JsxText,
JsxTextAllWhiteSpaces,
RegularExpressionLiteral,
NoSubstitutionTemplateLiteral,
// Pseudo-literals
TemplateHead,
TemplateMiddle,
TemplateTail,
// Punctuation
OpenBraceToken,
// ...
ReturnKeyword,
SuperKeyword,
SwitchKeyword,
ThisKeyword,
ThrowKeyword,
TrueKeyword,
TryKeyword,
TypeOfKeyword,
VarKeyword,
VoidKeyword,
WhileKeyword,
WithKeyword,
// ...
}
Continuing through the logic inside the scan function, the line `let ch = text.charCodeAt(pos);` is the one to focus on: scanning works off each character's Unicode code point. So we can draw a simple conclusion: the scanner performs lexical analysis on the input source code and yields the corresponding SyntaxKind for each lexeme, i.e. a "token".
To verify this conclusion, we can create an example to test it simply:
Before starting the scan, we need to initialize some configuration, such as the string to scan and the target JS language version. We then create a scanner with createScanner and retrieve tokens by calling scan. As long as the token is not the end-of-file token, the scanner keeps scanning the input string.
import * as ts from 'ntypescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);

function initializeState(text: string) {
    scanner.setText(text);
    scanner.setScriptTarget(ts.ScriptTarget.ES5);
    scanner.setLanguageVariant(ts.LanguageVariant.Standard);
}

const str = 'const foo = 123;';
initializeState(str);
let token = scanner.scan();
while (token !== ts.SyntaxKind.EndOfFileToken) {
    console.log(token);
    console.log(ts.formatSyntaxKind(token));
    token = scanner.scan();
}
Run the code above:
76
ConstKeyword
71
Identifier
58
EqualsToken
8
NumericLiteral
25
SemicolonToken
Each part of `const foo = 123;` generates a token. For a more intuitive view of the scan result, we use formatSyntaxKind to print the enum name corresponding to each SyntaxKind value, giving:
- const -> ConstKeyword
- foo -> Identifier
- = -> EqualsToken
- 123 -> NumericLiteral
- ; -> SemicolonToken
Each token corresponds exactly to its enum name, which confirms our inference: the process is essentially lexical analysis. I've drawn a simple diagram summarizing the main steps:
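To make the conclusion concrete, here is a minimal hand-rolled scanner in the same spirit: it walks the input with charCodeAt and classifies runs of characters into token kinds. The kinds, helpers, and keyword list below are simplifications made up for illustration, not the real SyntaxKind machinery.

```typescript
type TokenKind = "Keyword" | "Identifier" | "EqualsToken" | "NumericLiteral" | "SemicolonToken" | "EndOfFileToken";

interface Token { kind: TokenKind; text: string; }

const KEYWORDS = new Set(["const", "var", "let", "return"]);

function tokenize(text: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  const isDigit = (c: number) => c >= 0x30 && c <= 0x39;                                  // 0-9
  const isLetter = (c: number) => (c >= 0x61 && c <= 0x7a) || (c >= 0x41 && c <= 0x5a);   // a-z A-Z
  while (pos < text.length) {
    const ch = text.charCodeAt(pos);
    if (ch === 0x20 /* space */) { pos++; continue; }                                     // skip trivia
    if (ch === 0x3d /* = */) { tokens.push({ kind: "EqualsToken", text: "=" }); pos++; continue; }
    if (ch === 0x3b /* ; */) { tokens.push({ kind: "SemicolonToken", text: ";" }); pos++; continue; }
    if (isDigit(ch)) {
      const start = pos;
      while (pos < text.length && isDigit(text.charCodeAt(pos))) pos++;
      tokens.push({ kind: "NumericLiteral", text: text.substring(start, pos) });
      continue;
    }
    if (isLetter(ch)) {
      const start = pos;
      while (pos < text.length && isLetter(text.charCodeAt(pos))) pos++;
      const word = text.substring(start, pos);
      tokens.push({ kind: KEYWORDS.has(word) ? "Keyword" : "Identifier", text: word });
      continue;
    }
    throw new Error(`Unexpected character at ${pos}`);
  }
  tokens.push({ kind: "EndOfFileToken", text: "" });
  return tokens;
}

console.log(tokenize("const foo = 123;").map(t => t.kind).join(" "));
// Keyword Identifier EqualsToken NumericLiteral SemicolonToken EndOfFileToken
```

The real scanner performs the same kind of dispatch, just over the full set of SyntaxKind values and with complete trivia handling.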
The parser
The token generated by the scanner in the first step provides the necessary conditions for the parser to generate the AST.
So in this section, we need to figure out how the tokens generated in the first step are converted to AST nodes. Let’s start with an example of generating an AST:
import * as ts from 'ntypescript';

function printAllChildren(node: ts.Node, depth = 0) {
    console.log(new Array(depth + 1).join('----'), ts.formatSyntaxKind(node.kind), node.pos, node.end);
    depth++;
    node.getChildren().forEach(c => printAllChildren(c, depth));
}

const sourceCode = `const foo = 123;`;
const sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true);
printAllChildren(sourceFile);
Run the above code to get:
SourceFile 0 16
---- SyntaxList 0 16
-------- VariableStatement 0 16
------------ VariableDeclarationList 0 15
---------------- ConstKeyword 0 5
---------------- SyntaxList 5 15
-------------------- VariableDeclaration 5 15
------------------------ Identifier 5 9
------------------------ EqualsToken 9 11
------------------------ NumericLiteral 11 15
------------ SemicolonToken 15 16
---- EndOfFileToken 16 16
For those of you who have read about ASTs before, you can see at a glance that we have actually printed out an AST. It contains two key pieces of information for each node: 1. the node's kind, and 2. the node's start and end positions. The kind corresponds to the formatSyntaxKind output, and the positions correspond to node.pos and node.end. If anything is unclear, you can cross-check each node against the printed output above.
We saw that generating the AST is really a call to the createSourceFile function, so let's start with createSourceFile in parser.ts:
export function createSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, setParentNodes = false, scriptKind?: ScriptKind): SourceFile {
    performance.mark("beforeParse");
    const result = Parser.parseSourceFile(fileName, sourceText, languageVersion, /*syntaxCursor*/ undefined, setParentNodes, scriptKind);
    performance.mark("afterParse");
    performance.measure("Parse", "beforeParse", "afterParse");
    return result;
}
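As an aside, the performance.mark/performance.measure API the compiler uses here is the standard Web Performance API, which Node exposes through the built-in perf_hooks module. A small sketch of the same pattern; the loop is a stand-in workload, and reading entries back with getEntriesByName assumes a reasonably recent Node (16+):

```typescript
import { performance } from "perf_hooks";

function parsePhase(): number {
  performance.mark("beforeParse");
  let n = 0;
  for (let i = 0; i < 1e6; i++) n += i;  // stand-in for the real parse work
  performance.mark("afterParse");
  performance.measure("Parse", "beforeParse", "afterParse");
  return n;
}

parsePhase();
// Read back the recorded measurement.
const entry = performance.getEntriesByName("Parse")[0];
console.log(entry.name, entry.duration >= 0);
```

This is exactly how the compiler brackets each phase (Parse, Bind, Check) so that the time spent in each can be reported.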
Two lines in createSourceFile stand out: performance.mark("beforeParse") and performance.mark("afterParse"). They mark the points before and after parsing, so the real parsing work must happen in between. Let's dig into Parser.parseSourceFile.
export function parseSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, syntaxCursor: IncrementalParser.SyntaxCursor, setParentNodes?: boolean, scriptKind?: ScriptKind): SourceFile {
    scriptKind = ensureScriptKind(fileName, scriptKind);
    initializeState(sourceText, languageVersion, syntaxCursor, scriptKind);
    const result = parseSourceFileWorker(fileName, languageVersion, setParentNodes, scriptKind);
    clearState();
    return result;
}
First it initializes state, which brings us back to the scanner from the previous section. To make sure we understand each step, let's check that initializeState really is doing the preparatory work for scanning.
function initializeState(_sourceText: string, languageVersion: ScriptTarget, _syntaxCursor: IncrementalParser.SyntaxCursor, scriptKind: ScriptKind) {
    // ...
    // Initialize and prime the scanner before parsing the source elements.
    scanner.setText(sourceText);
    scanner.setOnError(scanError);
    scanner.setScriptTarget(languageVersion);
    scanner.setLanguageVariant(getLanguageVariant(scriptKind));
}
Ok, we have verified that it is indeed preparing for the scan. So let’s move on to parseSourceFileWorker.
function parseSourceFileWorker(fileName: string, languageVersion: ScriptTarget, setParentNodes: boolean, scriptKind: ScriptKind): SourceFile {
    sourceFile = createSourceFile(fileName, languageVersion, scriptKind);
    sourceFile.flags = contextFlags;
    // Prime the scanner.
    nextToken();
    processReferenceComments(sourceFile);
    sourceFile.statements = parseList(ParsingContext.SourceElements, parseStatement);
    Debug.assert(token() === SyntaxKind.EndOfFileToken);
    sourceFile.endOfFileToken = addJSDocComment(parseTokenNode() as EndOfFileToken);
    setExternalModuleIndicator(sourceFile);
    sourceFile.nodeCount = nodeCount;
    sourceFile.identifierCount = identifierCount;
    sourceFile.identifiers = identifiers;
    sourceFile.parseDiagnostics = parseDiagnostics;
    if (setParentNodes) {
        fixupParentReferences(sourceFile);
    }
    return sourceFile;
}
Inside parseSourceFileWorker:

- createSourceFile creates the parse target (the SourceFile node) for us.
- nextToken() replaces the current token (currentToken) with the next token produced by the scanner.
- processReferenceComments scans the leading comments and records information for each range (including its start and end positions).
function processReferenceComments(sourceFile: SourceFile): void {
    const triviaScanner = createScanner(sourceFile.languageVersion, /*skipTrivia*/ false, LanguageVariant.Standard, sourceText);
    // ...
    while (true) {
        const kind = triviaScanner.scan();
        if (kind !== SyntaxKind.SingleLineCommentTrivia) {
            if (isTrivia(kind)) {
                continue;
            }
            else {
                break;
            }
        }
        const range = {
            kind: <SyntaxKind.SingleLineCommentTrivia | SyntaxKind.MultiLineCommentTrivia>triviaScanner.getToken(),
            pos: triviaScanner.getTokenPos(),
            end: triviaScanner.getTextPos(),
        };
        const comment = sourceText.substring(range.pos, range.end);
        const referencePathMatchResult = getFileReferenceFromReferencePath(comment, range);
        if (referencePathMatchResult) {
            // ...
        }
        else {
            const amdModuleNameRegEx = /^\/\/\/\s*<amd-module\s+name\s*=\s*('|")(.+?)\1/gim;
            const amdModuleNameMatchResult = amdModuleNameRegEx.exec(comment);
            if (amdModuleNameMatchResult) {
                if (amdModuleName) {
                    parseDiagnostics.push(createFileDiagnostic(sourceFile, range.pos, range.end - range.pos, Diagnostics.An_AMD_module_cannot_have_multiple_name_assignments));
                }
                amdModuleName = amdModuleNameMatchResult[2];
            }
            const amdDependencyRegEx = /^\/\/\/\s*<amd-dependency\s/gim;
            const pathRegex = /\spath\s*=\s*('|")(.+?)\1/gim;
            const nameRegex = /\sname\s*=\s*('|")(.+?)\1/gim;
            const amdDependencyMatchResult = amdDependencyRegEx.exec(comment);
            if (amdDependencyMatchResult) {
                const pathMatchResult = pathRegex.exec(comment);
                const nameMatchResult = nameRegex.exec(comment);
                if (pathMatchResult) {
                    const amdDependency = { path: pathMatchResult[2], name: nameMatchResult ? nameMatchResult[2] : undefined };
                    amdDependencies.push(amdDependency);
                }
            }
            const checkJsDirectiveRegEx = /^\/\/\/?\s*(@ts-check|@ts-nocheck)\s*$/gim;
            const checkJsDirectiveMatchResult = checkJsDirectiveRegEx.exec(comment);
            if (checkJsDirectiveMatchResult) {
                checkJsDirective = {
                    enabled: compareStrings(checkJsDirectiveMatchResult[1], "@ts-check", /*ignoreCase*/ true) === Comparison.EqualTo,
                    end: range.end,
                    pos: range.pos
                };
            }
        }
    }
    sourceFile.referencedFiles = referencedFiles;
    sourceFile.typeReferenceDirectives = typeReferenceDirectives;
    sourceFile.amdDependencies = amdDependencies;
    sourceFile.moduleName = amdModuleName;
    sourceFile.checkJsDirective = checkJsDirective;
}
- parseList: following the returned result, we find it is ultimately produced by parseListElement, so let's keep going.
function parseList<T extends Node>(kind: ParsingContext, parseElement: () => T): NodeArray<T> {
    const saveParsingContext = parsingContext;
    parsingContext |= 1 << kind;
    const result = createNodeArray<T>();
    while (!isListTerminator(kind)) {
        if (isListElement(kind, /*inErrorRecovery*/ false)) {
            const element = parseListElement(kind, parseElement);
            result.push(element);
            continue;
        }
        if (abortParsingListOrMoveToNextToken(kind)) {
            break;
        }
    }
    result.end = getNodeEnd();
    parsingContext = saveParsingContext;
    return result;
}
- parseListElement: following it to the end, we find the final result comes from the parseElement callback that was passed in.
function parseListElement<T extends Node>(parsingContext: ParsingContext, parseElement: () => T): T {
    const node = currentNode(parsingContext);
    if (node) {
        return <T>consumeNode(node);
    }
    return parseElement();
}
- parseElement: in our walk-through this is parseStatement, which parseSourceFileWorker passed in.
function parseStatement(): Statement {
    switch (token()) {
        case SyntaxKind.SemicolonToken:
            return parseEmptyStatement();
        case SyntaxKind.OpenBraceToken:
            return parseBlock(/*ignoreMissingOpenBrace*/ false);
        case SyntaxKind.VarKeyword:
            return parseVariableStatement(scanner.getStartPos(), /*decorators*/ undefined, /*modifiers*/ undefined);
        // ...
    }
    return parseExpressionOrLabeledStatement();
}
We seem to be getting there! parseStatement switches on the current token and produces a different node for each kind of token. Take the trailing `;` from our example: its token is SyntaxKind.SemicolonToken, which matches the first case. So let's look at what parseEmptyStatement actually does.
- parseEmptyStatement
function parseEmptyStatement(): Statement {
    const node = <Statement>createNode(SyntaxKind.EmptyStatement);
    parseExpected(SyntaxKind.SemicolonToken);
    return finishNode(node);
}
We can observe that the createNode function is where the node is actually created for us.
function createNode<TKind extends SyntaxKind>(kind: TKind, pos?: number): Node | Token<TKind> | Identifier {
    nodeCount++;
    if (!(pos >= 0)) {
        pos = scanner.getStartPos();
    }
    return isNodeKind(kind) ? new NodeConstructor(kind, pos, pos) :
        kind === SyntaxKind.Identifier ? new IdentifierConstructor(kind, pos, pos) :
            new TokenConstructor(kind, pos, pos);
}
createNode is responsible for creating the node: it sets the SyntaxKind passed in and the start position (by default, the position from the current scanner state). parseExpected then checks whether the current token in the parser state matches the specified SyntaxKind, and reports an error if it does not.
function parseExpected(kind: SyntaxKind, diagnosticMessage?: DiagnosticMessage, shouldAdvance = true): boolean {
    if (token() === kind) {
        if (shouldAdvance) {
            nextToken();
        }
        return true;
    }
    // Report specific message if provided with one. Otherwise, report generic fallback message.
    if (diagnosticMessage) {
        parseErrorAtCurrentToken(diagnosticMessage);
    }
    else {
        parseErrorAtCurrentToken(Diagnostics._0_expected, tokenToString(kind));
    }
    return false;
}
The final step, finishNode, sets the node's end position. It also applies the current contextFlags and records whether any parse error occurred before the node finished (if so, the AST node cannot be reused in incremental parsing).
function finishNode<T extends Node>(node: T, end?: number): T {
    node.end = end === undefined ? scanner.getStartPos() : end;
    if (contextFlags) {
        node.flags |= contextFlags;
    }
    // Keep track on the node if we encountered an error while parsing it. If we did, then
    // we cannot reuse the node incrementally. Once we've marked this node, clear out the
    // flag so that we don't mark any subsequent nodes.
    if (parseErrorBeforeNextFinishedNode) {
        parseErrorBeforeNextFinishedNode = false;
        node.flags |= NodeFlags.ThisNodeHasError;
    }
    return node;
}
We have now walked through the whole parser flow. Following our usual routine, let's draw a simple flow chart to replay the process:
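The choreography above (createNode records the start, parseExpected consumes a matching token or reports an error, finishNode records the end) can also be replayed in miniature with a toy recursive-descent routine. All token shapes and names below are simplified stand-ins, not the real parser API:

```typescript
interface Tok { kind: string; text: string; pos: number; end: number; }
interface AstNode { kind: string; pos: number; end: number; name?: string; initializer?: string; }

function parseVariableStatement(tokens: Tok[]): AstNode {
  let i = 0;
  const token = () => tokens[i];
  // parseExpected: advance only when the current token matches, otherwise error out
  const parseExpected = (kind: string): Tok => {
    if (token().kind !== kind) throw new Error(`${kind} expected at ${token().pos}`);
    return tokens[i++];
  };
  const constKw = parseExpected("ConstKeyword");  // createNode would record the start here
  const name = parseExpected("Identifier");
  parseExpected("EqualsToken");
  const init = parseExpected("NumericLiteral");
  const semi = parseExpected("SemicolonToken");   // finishNode would record the end here
  return { kind: "VariableStatement", pos: constKw.pos, end: semi.end, name: name.text, initializer: init.text };
}

// Tokens for `const foo = 123;` with their offsets:
const toks: Tok[] = [
  { kind: "ConstKeyword", text: "const", pos: 0, end: 5 },
  { kind: "Identifier", text: "foo", pos: 6, end: 9 },
  { kind: "EqualsToken", text: "=", pos: 10, end: 11 },
  { kind: "NumericLiteral", text: "123", pos: 12, end: 15 },
  { kind: "SemicolonToken", text: ";", pos: 15, end: 16 },
];
console.log(parseVariableStatement(toks)); // node spanning pos 0 to end 16
```

Note how the resulting node's pos/end span exactly matches the VariableStatement 0 16 we saw in the printed AST earlier.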
The binder
Most transpilers are simpler than TypeScript because they provide few means of code analysis. A typical JavaScript converter has only the following flow:
Source code -> Scanner -> Tokens -> Parser -> AST -> Emitter -> JavaScript
While this architecture does help simplify the picture of how TypeScript generates JavaScript, one key feature is missing: TypeScript's semantic system. To support type checking, the binder connects the pieces of the source code into a coherent type system that the checker can then use. The main responsibility of the binder is to create Symbols.
Symbols
A symbol connects the declaration nodes in the AST that refer to the same entity. Symbols are the basic building blocks of the semantic system. So what does a symbol look like?
function Symbol(flags: SymbolFlags, name: string) {
this.flags = flags;
this.name = name;
this.declarations = undefined;
}
The flags enumeration identifies the symbol's category (variable, class, interface, and so on). For details, see the SymbolFlags enum in compiler/types.ts.
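SymbolFlags works as a bit mask: each category is a distinct power of two, so one symbol can carry several categories at once and membership is a bitwise & test. A sketch with made-up flag names:

```typescript
// Illustrative flags only; the real SymbolFlags has many more members and different values.
enum Flags {
  None = 0,
  BlockScopedVariable = 1 << 0,
  Function            = 1 << 1,
  Class               = 1 << 2,
  Interface           = 1 << 3,
}

let flags: number = Flags.None;
flags |= Flags.BlockScopedVariable;  // the binder ORs flags in, e.g. target.flags |= source.flags
flags |= Flags.Interface;

console.log((flags & Flags.BlockScopedVariable) !== 0); // true
console.log((flags & Flags.Class) !== 0);               // false
```

This is why a single symbol can simultaneously be, say, a class and an interface (declaration merging), and why exclusion checks can be expressed as masks.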
Create symbols and bind nodes
First we go to bindSourceFile in binder.ts. Once again, the source brackets the operation with performance.mark() calls ("beforeBind" and "afterBind"), just as it did around parsing. Let's look at the binder function between them.
- bindSourceFile
export function bindSourceFile(file: SourceFile, options: CompilerOptions) {
    performance.mark("beforeBind");
    binder(file, options);
    performance.mark("afterBind");
    performance.measure("Bind", "beforeBind", "afterBind");
}
- binder
const binder = createBinder();
- createBinder
function createBinder(): (file: SourceFile, options: CompilerOptions) => void {
    function bindSourceFile(f: SourceFile, opts: CompilerOptions) {
        file = f;
        options = opts;
        languageVersion = getEmitScriptTarget(options);
        inStrictMode = bindInStrictMode(file, opts);
        classifiableNames = createMap<string>();
        symbolCount = 0;
        skipTransformFlagAggregation = file.isDeclarationFile;
        Symbol = objectAllocator.getSymbolConstructor();
        if (!file.locals) {
            bind(file);
            file.symbolCount = symbolCount;
            file.classifiableNames = classifiableNames;
        }
        // ...
    }
    return bindSourceFile;
    // ...
}
The createBinder function body is long, so I've cut it down to the most important part. The actual binding still happens in the bind function: bindSourceFile checks whether file.locals is already defined and, if not, hands the file over to bind. Since locals is not defined on the first pass, we follow the logic into bind.
- bind
function bind(node: Node): void {
if (!node) {
return;
}
node.parent = parent;
const saveInStrictMode = inStrictMode;
// Even though in the AST the jsdoc @typedef node belongs to the current node,
// its symbol might be in the same scope with the current node's symbol. Consider:
//
// /** @typedef {string | number} MyType */
// function foo();
//
// Here the current node is "foo", which is a container, but the scope of "MyType" should
// not be inside "foo". Therefore we always bind @typedef before bind the parent node,
// and skip binding this tag later when binding all the other jsdoc tags.
if (isInJavaScriptFile(node)) bindJSDocTypedefTagIfAny(node);
// First we bind declaration nodes to a symbol if possible. We'll both create a symbol
// and then potentially add the symbol to an appropriate symbol table. Possible
// destination symbol tables are:
//
// 1) The 'exports' table of the current container's symbol.
// 2) The 'members' table of the current container's symbol.
// 3) The 'locals' table of the current container.
//
// However, not all symbols will end up in any of these tables. 'Anonymous' symbols
// (like TypeLiterals for example) will not be put in any table.
bindWorker(node);
// Then we recurse into the children of the node to bind them as well. For certain
// symbols we do specialized work when we recurse. For example, we'll keep track of
// the current 'container' node when it changes. This helps us know which symbol table
// a local should go into for example. Since terminal nodes are known not to have
// children, as an optimization we don't process those.
if (node.kind > SyntaxKind.LastToken) {
const saveParent = parent;
parent = node;
const containerFlags = getContainerFlags(node);
if (containerFlags === ContainerFlags.None) {
bindChildren(node);
}
else {
bindContainer(node, containerFlags);
}
parent = saveParent;
}
else if (!skipTransformFlagAggregation && (node.transformFlags & TransformFlags.HasComputedFlags) === 0) {
subtreeTransformFlags |= computeTransformFlagsForNode(node, 0);
}
inStrictMode = saveInStrictMode;
}
Experience reading source code says that a function with this many comments must be important. First, bind attaches a parent to the current node; then bindWorker dispatches to the appropriate binding function for the node; finally bindChildren binds each child of the current node, recursively calling bind on each one. As for bindContainer, the comments tell us it does the analogous binding for special container nodes, whose symbols land in tables such as exports, members, and locals.
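The shape of bind (set node.parent, do per-node work, then recurse into the children) can be reproduced on a toy tree. The node shape here is my own simplification, not the compiler's Node type:

```typescript
interface TNode { kind: string; children: TNode[]; parent?: TNode; }

function bind(node: TNode, parent?: TNode): void {
  node.parent = parent;    // same as the real bind's `node.parent = parent;`
  // bindWorker(node) would do per-kind symbol work here
  for (const child of node.children) {
    bind(child, node);     // bindChildren: recurse with the current node as parent
  }
}

const root: TNode = {
  kind: "SourceFile",
  children: [{ kind: "VariableStatement", children: [{ kind: "Identifier", children: [] }] }],
};
bind(root);
console.log(root.children[0].children[0].parent!.kind); // VariableStatement
```

After this walk, every node can reach its enclosing scope by following parent pointers, which is exactly what the container/locals bookkeeping in the real binder relies on.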
- bindWorker
function bindWorker(node: Node) {
switch (node.kind) {
case SyntaxKind.Identifier:
if ((<Identifier>node).isInJSDocNamespace) {
let parentNode = node.parent;
while (parentNode && parentNode.kind !== SyntaxKind.JSDocTypedefTag) {
parentNode = parentNode.parent;
}
bindBlockScopedDeclaration(<Declaration>parentNode, SymbolFlags.TypeAlias, SymbolFlags.TypeAliasExcludes);
break;
}
case SyntaxKind.ThisKeyword:
if (currentFlow && (isExpression(node) || parent.kind === SyntaxKind.ShorthandPropertyAssignment)) {
node.flowNode = currentFlow;
}
return checkStrictModeIdentifier(<Identifier>node);
// ...
}
Copy the code
I've truncated part of the code, because what bindWorker does is dispatch on node.kind (a SyntaxKind) and delegate to the corresponding bindXXX function for the actual binding. Taking Identifier as our example: what does bindBlockScopedDeclaration do?
- bindBlockScopedDeclaration
function bindBlockScopedDeclaration(node: Declaration, symbolFlags: SymbolFlags, symbolExcludes: SymbolFlags) {
switch (blockScopeContainer.kind) {
case SyntaxKind.ModuleDeclaration:
declareModuleMember(node, symbolFlags, symbolExcludes);
break;
case SyntaxKind.SourceFile:
if (isExternalModule(<SourceFile>container)) {
declareModuleMember(node, symbolFlags, symbolExcludes);
break;
}
// falls through
default:
if (!blockScopeContainer.locals) {
blockScopeContainer.locals = createMap<Symbol>();
addToContainerChain(blockScopeContainer);
}
declareSymbol(blockScopeContainer.locals, /*parent*/ undefined, node, symbolFlags, symbolExcludes);
}
}
Whether through declareModuleMember or the default branch, both paths end up calling declareSymbol, so declareSymbol is where we look next.
- declareSymbol
function declareSymbol(symbolTable: SymbolTable, parent: Symbol, node: Declaration, includes: SymbolFlags, excludes: SymbolFlags): Symbol {
Debug.assert(!hasDynamicName(node));
const isDefaultExport = hasModifier(node, ModifierFlags.Default);
// The exported symbol for an export default function/class node is always named "default"
const name = isDefaultExport && parent ? "default" : getDeclarationName(node);
let symbol: Symbol;
if (name === undefined) {
symbol = createSymbol(SymbolFlags.None, "__missing");
}
else {
symbol = symbolTable.get(name);
addDeclarationToSymbol(symbol, node, includes);
symbol.parent = parent;
// ..
return symbol;
}
declareSymbol relies on two key functions: 1. createSymbol 2. addDeclarationToSymbol. Let's look at each.
- createSymbol
function createSymbol(flags: SymbolFlags, name: string): Symbol {
symbolCount++;
return new Symbol(flags, name);
}
createSymbol simply increments symbolCount (a local variable of bindSourceFile) and constructs a Symbol with the given parameters. Once the symbol is created, the node needs to be bound to it.
- addDeclarationToSymbol
function addDeclarationToSymbol(symbol: Symbol, node: Declaration, symbolFlags: SymbolFlags) {
symbol.flags |= symbolFlags;
node.symbol = symbol;
if (!symbol.declarations) {
symbol.declarations = [];
}
symbol.declarations.push(node);
// ...
}
The addDeclarationToSymbol function does two things: 1. creates a link between the AST node and the symbol (node.symbol = symbol) 2. adds the declaration to the symbol (symbol.declarations.push(node)).
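The two steps can be sketched with plain objects and a Map standing in for the SymbolTable; the shapes and flag values are simplified stand-ins, not the binder's real code:

```typescript
interface Decl { name: string; symbol?: Sym; }
interface Sym { flags: number; name: string; declarations: Decl[]; }

let symbolCount = 0;

function createSymbol(flags: number, name: string): Sym {
  symbolCount++;
  return { flags, name, declarations: [] };
}

function addDeclarationToSymbol(symbol: Sym, node: Decl, flags: number): void {
  symbol.flags |= flags;
  node.symbol = symbol;            // 1) link the AST node to the symbol
  symbol.declarations.push(node);  // 2) record the declaration on the symbol
}

function declareSymbol(table: Map<string, Sym>, node: Decl, flags: number): Sym {
  let symbol = table.get(node.name);
  if (!symbol) {
    symbol = createSymbol(0, node.name);
    table.set(node.name, symbol);
  }
  addDeclarationToSymbol(symbol, node, flags);
  return symbol;
}

const locals = new Map<string, Sym>();
const fooDecl: Decl = { name: "foo" };
const sym = declareSymbol(locals, fooDecl, 1);
console.log(sym === fooDecl.symbol, sym.declarations.length); // true 1
```

Declaring a second node with the same name would reuse the existing symbol and simply append another entry to declarations, which is the mechanism behind declaration merging.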
Now that the most important work of the binder is done, let’s draw a simple flow chart to walk through the symbol creation process:
With that, the first route we described is complete:
Source code -> Scanner -> Token Stream -> Parser -> AST -> Binder -> Symbols
Now let's look at the remaining two routes: type checking and code emitting.
The checker
How the program uses the checker
Before starting on the source, we need to know that the checker is initialized by the program, and that bindSourceFile in the binder is kicked off by the checker. To simplify, here is the relevant call stack:
program.getTypeChecker ->
    ts.createTypeChecker (in the checker) ->
        initializeTypeChecker (in the checker) ->
            for each SourceFile: ts.bindSourceFile (in the binder)
            // then
            for each SourceFile: ts.mergeSymbolTable (in the checker)
We can see that initializeTypeChecker calls the binder's bindSourceFile as well as the checker's own mergeSymbolTable.
Verify that the call stack is correct
function initializeTypeChecker() {
// Bind all source files and propagate errors
for (const file of host.getSourceFiles()) {
bindSourceFile(file, compilerOptions);
}
// Initialize global symbol table
let augmentations: LiteralExpression[][];
for (const file of host.getSourceFiles()) {
if (!isExternalOrCommonJsModule(file)) {
mergeSymbolTable(globals, file.locals);
}
// ...
}
// ...
}
Reading this source in the checker, we have indeed verified the call sequence described above: bindSourceFile is called first, then mergeSymbolTable.
Analyzing mergeSymbolTable
In the last section we analyzed bindSourceFile, which creates a symbol for each declaration and connects the nodes into a related type system. What mergeSymbolTable does is merge every file's global symbols into the SymbolTable declared as let globals: SymbolTable. All subsequent type checks can then be resolved against globals.
function mergeSymbolTable(target: SymbolTable, source: SymbolTable) {
source.forEach((sourceSymbol, id) = > {
let targetSymbol = target.get(id);
if (!targetSymbol) {
target.set(id, sourceSymbol);
}
else {
if (!(targetSymbol.flags & SymbolFlags.Transient)) {
targetSymbol = cloneSymbol(targetSymbol);
target.set(id, targetSymbol);
}
mergeSymbol(targetSymbol, sourceSymbol);
}
});
}
function mergeSymbol(target: Symbol, source: Symbol) {
    if (!(target.flags & getExcludedSymbolFlags(source.flags))) {
        if (source.flags & SymbolFlags.ValueModule && target.flags & SymbolFlags.ValueModule && target.constEnumOnlyModule && !source.constEnumOnlyModule) {
            // reset flag when merging instantiated module into value module that has only const enums
            target.constEnumOnlyModule = false;
        }
        target.flags |= source.flags;
        if (source.valueDeclaration &&
            (!target.valueDeclaration ||
                (target.valueDeclaration.kind === SyntaxKind.ModuleDeclaration && source.valueDeclaration.kind !== SyntaxKind.ModuleDeclaration))) {
            // other kinds of value declarations take precedence over modules
            target.valueDeclaration = source.valueDeclaration;
        }
        addRange(target.declarations, source.declarations);
        if (source.members) {
            if (!target.members) target.members = createMap<Symbol>();
            mergeSymbolTable(target.members, source.members);
        }
        if (source.exports) {
            if (!target.exports) target.exports = createMap<Symbol>();
            mergeSymbolTable(target.exports, source.exports);
        }
        recordMergedSymbol(target, source);
    }
    else if (target.flags & SymbolFlags.NamespaceModule) {
        error(getNameOfDeclaration(source.declarations[0]), Diagnostics.Cannot_augment_module_0_with_value_exports_because_it_resolves_to_a_non_module_entity, symbolToString(target));
    }
    else {
        const message = target.flags & SymbolFlags.BlockScopedVariable || source.flags & SymbolFlags.BlockScopedVariable
            ? Diagnostics.Cannot_redeclare_block_scoped_variable_0
            : Diagnostics.Duplicate_identifier_0;
        forEach(source.declarations, node => {
            error(getNameOfDeclaration(node) || node, message, symbolToString(source));
        });
        forEach(target.declarations, node => {
            error(getNameOfDeclaration(node) || node, message, symbolToString(source));
        });
    }
}
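The merge semantics can be sketched with plain Maps: a name not yet in the target is copied over, while a name present in both has its declaration lists combined (a simplification that ignores the flag checks of the real mergeSymbol):

```typescript
interface MSym { name: string; declarations: string[]; }
type SymTable = Map<string, MSym>;

function mergeSymbolTable(target: SymTable, source: SymTable): void {
  source.forEach((sourceSymbol, id) => {
    const targetSymbol = target.get(id);
    if (!targetSymbol) {
      target.set(id, sourceSymbol);                               // new name: copy it over
    } else {
      targetSymbol.declarations.push(...sourceSymbol.declarations); // same entity: merge declarations
    }
  });
}

const globals: SymTable = new Map<string, MSym>();
globals.set("Foo", { name: "Foo", declarations: ["a.ts"] });

const fileLocals: SymTable = new Map<string, MSym>();
fileLocals.set("Foo", { name: "Foo", declarations: ["b.ts"] });  // e.g. the same interface declared again
fileLocals.set("bar", { name: "bar", declarations: ["b.ts"] });

mergeSymbolTable(globals, fileLocals);
console.log(globals.size);                      // 2
console.log(globals.get("Foo")!.declarations);  // [ 'a.ts', 'b.ts' ]
```

The real mergeSymbol additionally validates flags before merging, which is how it detects cases like redeclaring a block-scoped variable.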
Type checking
The real type checking happens when getDiagnostics is called. When it is (for example via a program.emit request), the checker returns an EmitResolver (the program obtains it through the checker's getEmitResolver function); EmitResolver is a collection of createTypeChecker's local functions. Let's step through how the getDiagnostics function performs type checking.
getDiagnostics
function getDiagnostics(sourceFile: SourceFile, ct: CancellationToken): Diagnostic[] {
    try {
        cancellationToken = ct;
        return getDiagnosticsWorker(sourceFile);
    }
    finally {
        cancellationToken = undefined;
    }
}
There's no time to explain; let's go straight into getDiagnosticsWorker.
getDiagnosticsWorker
function getDiagnosticsWorker(sourceFile: SourceFile): Diagnostic[] {
throwIfNonDiagnosticsProducing();
if (sourceFile) {
// ..
checkSourceFile(sourceFile);
// ..
const semanticDiagnostics = diagnostics.getDiagnostics(sourceFile.fileName);
// ..
return semanticDiagnostics;
}
forEach(host.getSourceFiles(), checkSourceFile);
return diagnostics.getDiagnostics();
}
Stripping out the irrelevant parts, we find a simple branch: if a sourceFile is passed in, checkSourceFile runs on it and its semantic diagnostics are returned; otherwise checkSourceFile runs on every source file and diagnostics.getDiagnostics() returns them all.
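The overall pattern, where check functions push into a shared diagnostics collection and getDiagnostics reads it back (optionally filtered by file), can be sketched as follows; the names and the hard-coded error are illustrative, not the checker's real API:

```typescript
interface Diagnostic { fileName: string; message: string; }

function createDiagnosticCollection() {
  const diagnostics: Diagnostic[] = [];
  return {
    add: (d: Diagnostic) => { diagnostics.push(d); },
    // With a fileName, return only that file's diagnostics; otherwise return all of them.
    getDiagnostics: (fileName?: string) =>
      fileName ? diagnostics.filter(d => d.fileName === fileName) : diagnostics.slice(),
  };
}

const diagnostics = createDiagnosticCollection();

function checkSourceFile(fileName: string): void {
  // a stand-in "check": flag a hard-coded error for one file
  if (fileName === "foo.ts") {
    diagnostics.add({ fileName, message: "Cannot redeclare block-scoped variable 'foo'." });
  }
}

["foo.ts", "bar.ts"].forEach(checkSourceFile);
console.log(diagnostics.getDiagnostics("foo.ts").length); // 1
console.log(diagnostics.getDiagnostics("bar.ts").length); // 0
```

This mirrors the two branches of getDiagnosticsWorker: check one file and filter by its name, or check every file and drain the whole collection.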
checkSourceFile
function checkSourceFile(node: SourceFile) {
    performance.mark("beforeCheck");
    checkSourceFileWorker(node);
    performance.mark("afterCheck");
    performance.measure("Check", "beforeCheck", "afterCheck");
}
Once again, performance marks bracket the real work, which happens in checkSourceFileWorker.
checkSourceFileWorker
function checkSourceFileWorker(node: SourceFile) {
    const links = getNodeLinks(node);
    if (!(links.flags & NodeCheckFlags.TypeChecked)) {
        if (compilerOptions.skipLibCheck && node.isDeclarationFile || compilerOptions.skipDefaultLibCheck && node.hasNoDefaultLib) {
            return;
        }
        // Grammar checking
        checkGrammarSourceFile(node);
        forEach(node.statements, checkSourceElement);
        checkDeferredNodes();
        if (isExternalModule(node)) {
            registerForUnusedIdentifiersCheck(node);
        }
        if (!node.isDeclarationFile) {
            checkUnusedIdentifiers();
        }
        if (isExternalOrCommonJsModule(node)) {
            checkExternalModuleExports(node);
        }
        // ...
        links.flags |= NodeCheckFlags.TypeChecked;
    }
}
The checkSourceFileWorker function contains the various check operations, such as checkGrammarSourceFile, checkDeferredNodes, registerForUnusedIdentifiersCheck… Isn't that exactly what we're looking for? Let's pick one of them and keep digging.
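Before moving on, note the `links.flags & NodeCheckFlags.TypeChecked` guard: it is a classic bit-flag cache, ensuring each source file is checked at most once and recording completion in a bitmask. A minimal sketch of the pattern, with made-up flag names and values (not the compiler's):

```typescript
// Toy flags in the spirit of NodeCheckFlags (values are illustrative).
enum ToyCheckFlags {
    None = 0,
    TypeChecked = 1 << 0,
}

interface ToyNodeLinks {
    flags: ToyCheckFlags;
}

let checkCount = 0;

// Run the expensive check at most once per node, recording completion in the bitmask.
function checkOnce(links: ToyNodeLinks): void {
    if (!(links.flags & ToyCheckFlags.TypeChecked)) {
        checkCount++; // stands in for the real checking work
        links.flags |= ToyCheckFlags.TypeChecked;
    }
}
```

Calling checkOnce twice on the same links object does the work only once, exactly as checkSourceFileWorker does for an already-checked file.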
checkGrammarSourceFile
function checkGrammarSourceFile(node: SourceFile): boolean {
    return isInAmbientContext(node) && checkGrammarTopLevelElementsForRequiredDeclareModifier(node);
}
So this first checks whether the node is in an ambient context; if it is, checking continues into checkGrammarTopLevelElementsForRequiredDeclareModifier. We don't need to dwell on it, so let's move on.
checkGrammarTopLevelElementsForRequiredDeclareModifier
function checkGrammarTopLevelElementsForRequiredDeclareModifier(file: SourceFile): boolean {
    for (const decl of file.statements) {
        if (isDeclaration(decl) || decl.kind === SyntaxKind.VariableStatement) {
            if (checkGrammarTopLevelElementForRequiredDeclareModifier(decl)) {
                return true;
            }
        }
    }
}
checkGrammarTopLevelElementForRequiredDeclareModifier
function checkGrammarTopLevelElementForRequiredDeclareModifier(node: Node): boolean {
    if (node.kind === SyntaxKind.InterfaceDeclaration ||
        node.kind === SyntaxKind.TypeAliasDeclaration ||
        node.kind === SyntaxKind.ImportDeclaration ||
        node.kind === SyntaxKind.ImportEqualsDeclaration ||
        node.kind === SyntaxKind.ExportDeclaration ||
        node.kind === SyntaxKind.ExportAssignment ||
        node.kind === SyntaxKind.NamespaceExportDeclaration ||
        getModifierFlags(node) & (ModifierFlags.Ambient | ModifierFlags.Export | ModifierFlags.Default)) {
        return false;
    }
    return grammarErrorOnFirstToken(node, Diagnostics.A_declare_modifier_is_required_for_a_top_level_declaration_in_a_d_ts_file);
}
grammarErrorOnFirstToken
function grammarErrorOnFirstToken(node: Node, message: DiagnosticMessage, arg0?: any, arg1?: any, arg2?: any): boolean {
    const sourceFile = getSourceFileOfNode(node);
    if (!hasParseDiagnostics(sourceFile)) {
        const span = getSpanOfTokenAtPosition(sourceFile, node.pos);
        diagnostics.add(createFileDiagnostic(sourceFile, span.start, span.length, message, arg0, arg1, arg2));
        return true;
    }
}
createFileDiagnostic
export function createFileDiagnostic(file: SourceFile, start: number, length: number, message: DiagnosticMessage): Diagnostic {
    const end = start + length;
    Debug.assert(start >= 0, "start must be non-negative, is " + start);
    Debug.assert(length >= 0, "length must be non-negative, is " + length);
    if (file) {
        Debug.assert(start <= file.text.length, `start must be within the bounds of the file. ${start} > ${file.text.length}`);
        Debug.assert(end <= file.text.length, `end must be the bounds of the file. ${end} > ${file.text.length}`);
    }
    let text = getLocaleSpecificMessage(message);
    if (arguments.length > 4) {
        text = formatStringFromArgs(text, arguments, 4);
    }
    return {
        file,
        start,
        length,
        messageText: text,
        category: message.category,
        code: message.code,
    };
}
Finally, we see that createFileDiagnostic guards its inputs with Debug.assert and builds the Diagnostic object that gets added to the diagnostics collection. I won't go into what assert does here; interested readers can look at its source. Checker summary: starting from the AST nodes we generated, it runs grammar, type, and position checks against each declaration and reports any violations as diagnostics.
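The `formatStringFromArgs(text, arguments, 4)` call substitutes the trailing arguments into `{0}`, `{1}`, … placeholders in the localized message template. Here is a simplified sketch of that substitution in the same spirit; the function name and signature are my own, not the compiler's exact code:

```typescript
// Simplified placeholder substitution: replaces {0}, {1}, ... with the given args.
// The real formatStringFromArgs reads from an `arguments` object with a base index;
// this toy version just takes a plain array.
function toyFormatString(template: string, args: Array<string | number>): string {
    return template.replace(/{(\d+)}/g, (match: string, index: string) => {
        const value = args[Number(index)];
        return value === undefined ? match : String(value);
    });
}
```

So a diagnostic template like `"Cannot find name '{0}'."` plus the argument `"foo"` becomes the final message text.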
To wrap up the checker, here is a simple diagram:
That’s the end of our second leg of the route:
AST -> Checker ~~ Symbol -> Type checking
The emitter
The TypeScript compiler provides two emitters:
- emitter.ts: the TS -> JavaScript emitter
- declarationEmitter.ts: used to create declaration files (.d.ts) for TypeScript source files (.ts)
How Program uses the emitter: Program exposes an emit function, which mostly delegates to emitFiles in emitter.ts. Here is the call stack:
Program.emit ->
    emitWorker (created in createProgram in program.ts) ->
        emitFiles (a function in emitter.ts)
emitFiles
export function emitFiles(resolver: EmitResolver, host: EmitHost, targetSourceFile: SourceFile, emitOnlyDtsFiles?: boolean, transformers?: TransformerFactory<SourceFile>[]): EmitResult {
    const compilerOptions = host.getCompilerOptions();
    const moduleKind = getEmitModuleKind(compilerOptions);
    const sourceMapDataList: SourceMapData[] = compilerOptions.sourceMap || compilerOptions.inlineSourceMap ? [] : undefined;
    const emittedFilesList: string[] = compilerOptions.listEmittedFiles ? [] : undefined;
    const emitterDiagnostics = createDiagnosticCollection();
    const newLine = host.getNewLine();
    const writer = createTextWriter(newLine);
    const sourceMap = createSourceMapWriter(host, writer);
    let currentSourceFile: SourceFile;
    let bundledHelpers: Map<boolean>;
    let isOwnFileEmit: boolean;
    let emitSkipped = false;

    const sourceFiles = getSourceFilesToEmit(host, targetSourceFile);

    // Transform the source files
    const transform = transformNodes(resolver, host, compilerOptions, sourceFiles, transformers, /*allowDtsFiles*/ false);

    // Create a printer to print the nodes
    const printer = createPrinter();

    // Emit each output file
    performance.mark("beforePrint");
    forEachEmittedFile(host, emitSourceFileOrBundle, transform.transformed, emitOnlyDtsFiles);
    performance.measure("printTime", "beforePrint");

    // Clean up emit nodes on parse tree
    transform.dispose();

    return {
        emitSkipped,
        diagnostics: emitterDiagnostics.getDiagnostics(),
        emittedFiles: emittedFilesList,
        sourceMaps: sourceMapDataList
    };

    function emitSourceFileOrBundle({ jsFilePath, sourceMapFilePath, declarationFilePath }: EmitFileNames, sourceFileOrBundle: SourceFile | Bundle) { }

    function printSourceFileOrBundle(jsFilePath: string, sourceMapFilePath: string, sourceFileOrBundle: SourceFile | Bundle) { }

    function setSourceFile(node: SourceFile) { }

    function emitHelpers(node: Node, writeLines: (text: string) => void) { }
}
It mostly sets up a collection of local variables and functions (which make up the bulk of the emitter), and then hands the source file to the local emitSourceFileOrBundle function, which sets currentSourceFile and passes the node to the local emit function for output.
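One of those locals, createTextWriter, is essentially a string accumulator with indentation and newline handling. A stripped-down sketch of the idea (illustrative only, with a hard-coded four-space indent that the real writer does not use):

```typescript
// A minimal text writer in the spirit of the emitter's createTextWriter.
function createToyTextWriter(newLine: string) {
    let output = "";
    let indent = 0;
    let lineStart = true;

    function write(s: string): void {
        if (lineStart) {
            // Emit pending indentation only when a line actually starts.
            for (let i = 0; i < indent; i++) {
                output += "    ";
            }
            lineStart = false;
        }
        output += s;
    }

    return {
        write,
        writeLine(): void { output += newLine; lineStart = true; },
        increaseIndent(): void { indent++; },
        decreaseIndent(): void { indent--; },
        getText: (): string => output,
    };
}
```

The printer drives a writer like this while walking the AST, so indentation falls out of increase/decrease calls rather than being tracked by every emit function.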
emit
function emit(node: Node) {
    pipelineEmitWithNotification(EmitHint.Unspecified, node);
}
pipelineEmitWithHint
The functions triggered by emit wrap one another layer by layer; after peeling back several layers we finally land in pipelineEmitWithHint, which emits different code depending on the hint.
function pipelineEmitWithNotification(hint: EmitHint, node: Node) {
    if (onEmitNode) {
        onEmitNode(hint, node, pipelineEmitWithComments);
    }
    else {
        pipelineEmitWithComments(hint, node);
    }
}

function pipelineEmitWithComments(hint: EmitHint, node: Node) {
    node = trySubstituteNode(hint, node);
    if (emitNodeWithComments && hint !== EmitHint.SourceFile) {
        emitNodeWithComments(hint, node, pipelineEmitWithSourceMap);
    }
    else {
        pipelineEmitWithSourceMap(hint, node);
    }
}

function pipelineEmitWithSourceMap(hint: EmitHint, node: Node) {
    if (onEmitSourceMapOfNode && hint !== EmitHint.SourceFile && hint !== EmitHint.IdentifierName) {
        onEmitSourceMapOfNode(hint, node, pipelineEmitWithHint);
    }
    else {
        pipelineEmitWithHint(hint, node);
    }
}

function pipelineEmitWithHint(hint: EmitHint, node: Node): void {
    switch (hint) {
        case EmitHint.SourceFile: return pipelineEmitSourceFile(node);
        case EmitHint.IdentifierName: return pipelineEmitIdentifierName(node);
        case EmitHint.Expression: return pipelineEmitExpression(node);
        case EmitHint.Unspecified: return pipelineEmitUnspecified(node);
    }
}
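Each stage above has the same shape: if an optional hook is installed, call the hook and hand it the next stage as a callback; otherwise fall straight through to the next stage. That pattern can be sketched in isolation with hypothetical stage names (these are not the compiler's functions):

```typescript
// The emit pipeline's "optional hook, else fall through" pattern, in miniature.
type Stage = (value: string) => string;
type Hook = (value: string, next: Stage) => string;

// Build a stage that defers to `hook` if present, passing it the next stage,
// and otherwise calls the next stage directly.
function makeStage(hook: Hook | undefined, next: Stage): Stage {
    return value => (hook ? hook(value, next) : next(value));
}

// Final stage: the actual "emit".
const emitStage: Stage = value => value + "!";

// A hook that decorates the value and then continues down the pipeline.
const commentHook: Hook = (value, next) => next(`/*c*/${value}`);

const withHook = makeStage(commentHook, emitStage);
const withoutHook = makeStage(undefined, emitStage);
```

This is why comments and source maps can be bolted on or skipped without the core printing logic ever knowing about them.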
pipelineEmitUnspecified
If the hint passed in initially is Unspecified, pipelineEmitUnspecified decides what to do based on the node's kind.
function pipelineEmitUnspecified(node: Node): void {
    const kind = node.kind;
    // Reserved words
    // Strict mode reserved words
    // Contextual keywords
    if (isKeyword(kind)) {
        writeTokenNode(node);
        return;
    }
    switch (kind) {
        // Pseudo-literals
        case SyntaxKind.TemplateHead:
        case SyntaxKind.TemplateMiddle:
        case SyntaxKind.TemplateTail:
            return emitLiteral(<LiteralExpression>node);
    }
}
emitLiteral
For example, if our node's kind is TemplateHead, the emitLiteral function is executed to emit the code.
function emitLiteral(node: LiteralLikeNode) {
    const text = getLiteralTextOfNode(node);
    if ((printerOptions.sourceMap || printerOptions.inlineSourceMap)
        && (node.kind === SyntaxKind.StringLiteral || isTemplateLiteralKind(node.kind))) {
        writer.writeLiteral(text);
    }
    else {
        write(text);
    }
}
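Printing, then, ultimately boils down to a big dispatch on node.kind, with a dedicated emit function per kind that writes text and recurses into child nodes. A toy version of that dispatch over a tiny made-up AST (not the compiler's node types):

```typescript
// A toy kind-based dispatch in the spirit of the printer's emit functions.
enum ToyKind { NumericLiteral, StringLiteral, BinaryExpression }

type ToyNode =
    | { kind: ToyKind.NumericLiteral; value: number }
    | { kind: ToyKind.StringLiteral; value: string }
    | { kind: ToyKind.BinaryExpression; left: ToyNode; operator: string; right: ToyNode };

// Switch on the node's kind and recurse into children, exactly like the real printer.
function toyEmit(node: ToyNode): string {
    switch (node.kind) {
        case ToyKind.NumericLiteral:
            return String(node.value);
        case ToyKind.StringLiteral:
            return JSON.stringify(node.value);
        case ToyKind.BinaryExpression:
            return `${toyEmit(node.left)} ${node.operator} ${toyEmit(node.right)}`;
    }
}
```

The real printer differs only in scale: hundreds of SyntaxKind cases, a writer instead of string concatenation, and comment/source-map hooks around each call.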
Everything else is pretty much the same. Let’s sum it up:
Conclusion
While debugging the source code, I recommend the VS Code extension Bookmarks; it helps you mark and jump back to key code.
This article captures some of my own thinking while reading the source; I believe its biggest value is in laying out the main flow of each compiler stage.
References
TypeScript Deep Dive
TypeScript Compilation Principles (1)