(Again, to keep the explanation of AST conversion quick and blunt, the code being compiled throughout is simply 'Hello' + 'World'.)
The Init method belongs to the Stream class, but this section goes back to the Scanner level.
/**
 * Scan
 * Only the next_ pointer is involved.
 */
void Scanner::Scan() { Scan(next_); }
void Scanner::Scan(TokenDesc* next_desc) {
  next_desc->token = ScanSingleToken();
  /** Record the end position of the current token. */
  next_desc->location.end_pos = source_pos();
}
Although there are only two simple steps here (all CHECK and DEBUG content has been cut), ScanSingleToken alone gives us plenty to look at. As the name suggests, it scans a single token. The source is as follows.
/** ScanSingleToken is long, so this listing is abridged. */
V8_INLINE Token::Value Scanner::ScanSingleToken() {
  Token::Value token;
  do {
    /** Record the start position of the current token. */
    next().location.beg_pos = source_pos();
    /** ASCII range: code values 0 to 127. */
    if (V8_LIKELY(static_cast<unsigned>(c0_) <= kMaxAscii)) {
      /** Lookup table that maps each ASCII code value to its Token type. */
      token = one_char_tokens[c0_];
      /** Many cases follow; each Token type gets its own handling. */
      switch (token) {
        case Token::LPAREN:
        case Token::RPAREN:
        // Other single symbols...
          // One character tokens.
          return Select(token);
        case Token::STRING:
          return ScanString();
        // more...
        default:
          UNREACHABLE();
      }
    }
    /** Handles special cases such as terminators, whitespace, and illegal characters. */
    // ...
  } while (token == Token::WHITESPACE);
  return token;
}
For a lexical scanning method this length is acceptable once most of the case branches are removed. Only STRING is kept, because this series focuses on compiling 'Hello' + 'World'.
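To make the loop above concrete, here is a minimal, self-contained sketch of the same shape: Scan records the end position, ScanSingleToken resets the start position on every iteration and keeps looping while it only sees whitespace. Everything in it (Tok, TokenDesc as a flat struct, MiniScanner, the naive ScanString without escape handling) is a hypothetical stand-in written for illustration, not V8's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for V8's Token::Value and TokenDesc.
enum class Tok { kWhitespace, kLParen, kRParen, kAdd, kString, kEos, kIllegal };

struct TokenDesc {
  Tok token = Tok::kIllegal;
  int beg_pos = 0;
  int end_pos = 0;
};

class MiniScanner {
 public:
  explicit MiniScanner(std::string source) : source_(std::move(source)) {}

  // Same two steps as Scanner::Scan: scan one token, then record where it ended.
  TokenDesc Scan() {
    TokenDesc desc;
    desc.token = ScanSingleToken(&desc);
    desc.end_pos = static_cast<int>(pos_);
    return desc;
  }

 private:
  Tok ScanSingleToken(TokenDesc* desc) {
    Tok token = Tok::kIllegal;
    do {
      // Reset on every iteration, so skipped whitespace never counts
      // towards the start of the token that is finally returned.
      desc->beg_pos = static_cast<int>(pos_);
      if (pos_ >= source_.size()) return Tok::kEos;
      const char c = source_[pos_];
      switch (c) {
        case '(': ++pos_; return Tok::kLParen;
        case ')': ++pos_; return Tok::kRParen;
        case '+': ++pos_; return Tok::kAdd;
        case '\'':
        case '"':
          return ScanString(c);
        case ' ':
        case '\t':
        case '\n':
          ++pos_;
          token = Tok::kWhitespace;
          break;
        default:
          ++pos_;
          return Tok::kIllegal;
      }
    } while (token == Tok::kWhitespace);
    return token;
  }

  // Naive placeholder: no escape handling, unlike a real string scanner.
  Tok ScanString(char quote) {
    ++pos_;  // Skip the opening quote.
    while (pos_ < source_.size() && source_[pos_] != quote) ++pos_;
    if (pos_ >= source_.size()) return Tok::kIllegal;  // Unterminated string.
    ++pos_;  // Skip the closing quote.
    return Tok::kString;
  }

  std::string source_;
  std::size_t pos_ = 0;
};
```

Calling Scan() repeatedly on a MiniScanner built from 'Hello' + 'World' yields a string token, an add token, a string token, and then the end-of-source token, with the whitespace between them skipped inside the loop.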
Two points are worth explaining. The first is source_pos. The scanner has a lot of position-related properties and methods, but this one is fairly simple, so a quick look is enough.
/**
 * pos() has already moved one character ahead of the current one,
 * while a location must start at the current character, so an offset is subtracted.
 */
static const int kCharacterLookaheadBufferSize = 1;
int source_pos() {
  return static_cast<int>(source_->pos()) - kCharacterLookaheadBufferSize;
}
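The offset is easier to see in a toy model. The sketch below assumes a stream whose cursor always sits one character past the character held in c0_. LookaheadScanner, Advance, and c0() are hypothetical names for this illustration, and only the subtraction in source_pos is taken from the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

constexpr int kCharacterLookaheadBufferSize = 1;
constexpr int kEndOfInput = -1;

// Hypothetical miniature of a scanner that keeps one character of lookahead.
class LookaheadScanner {
 public:
  explicit LookaheadScanner(std::string source) : source_(std::move(source)) {
    Advance();  // Load the first character into c0_; the cursor is now at 1.
  }

  // Loads the next character into c0_ and moves the cursor past it.
  void Advance() {
    c0_ = pos_ < source_.size()
              ? static_cast<int>(static_cast<unsigned char>(source_[pos_]))
              : kEndOfInput;
    ++pos_;
  }

  // Position of the character currently held in c0_, not of the cursor.
  int source_pos() const {
    return static_cast<int>(pos_) - kCharacterLookaheadBufferSize;
  }

  int c0() const { return c0_; }

 private:
  std::string source_;
  std::size_t pos_ = 0;
  int c0_ = kEndOfInput;
};
```

Right after construction the cursor is already at 1 while c0_ holds the character at index 0, which is why the subtraction is needed to report position 0.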
The second is the mapping array. Here is a bit of the relevant source.
/**
 * GetOneCharToken(0), GetOneCharToken(1), ..., GetOneCharToken(127) are called in turn.
 * IsDecimalDigit checks whether the character is a digit.
 * IsAsciiIdentifier checks whether it is an identifier character ($, _, a-z, and so on).
 * The resulting one_char_tokens array is indexed by the character's code value,
 * and each element is the corresponding Token type.
 */
#define INT_0_TO_127_LIST(V) \
  V(0) V(1) V(2) V(3) V(4) V(5) V(6) V(7) V(8) V(9) \
  // ...
  V(120) V(121) V(122) V(123) V(124) V(125) V(126) V(127)
static const constexpr Token::Value one_char_tokens[128] = {
#define CALL_GET_SCAN_FLAGS(N) GetOneCharToken(N),
  INT_0_TO_127_LIST(CALL_GET_SCAN_FLAGS)
#undef CALL_GET_SCAN_FLAGS
};
constexpr Token::Value GetOneCharToken(char c) {
  // clang-format off
  return
      c == '(' ? Token::LPAREN :
      c == ')' ? Token::RPAREN :
      // Other characters...
      IsDecimalDigit(c) ? Token::NUMBER :
      IsAsciiIdentifier(c) ? Token::IDENTIFIER :
      Token::ILLEGAL;
}
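The same compile-time table idea can be sketched without the 128-entry macro list. The version below swaps the macro for std::index_sequence (C++14) to expand the indices 0 to 127. That is a substitution made here for brevity, not how the listing above does it. The Tok enum, MakeTable, and Lookup are hypothetical names, the character set is cut down to a handful of cases, and the two Is... helpers are simple guesses at the behaviour described in the comment above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical, reduced token set for illustration.
enum class Tok : unsigned char {
  kIllegal, kLParen, kRParen, kString, kTemplateSpan, kNumber, kIdentifier
};

constexpr bool IsDecimalDigit(char c) { return c >= '0' && c <= '9'; }

constexpr bool IsAsciiIdentifier(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         IsDecimalDigit(c) || c == '$' || c == '_';
}

// One ternary chain, evaluated at compile time for every ASCII code value.
// The digit test comes before the identifier test, so '7' maps to kNumber.
constexpr Tok GetOneCharToken(char c) {
  return c == '(' ? Tok::kLParen :
         c == ')' ? Tok::kRParen :
         c == '"' ? Tok::kString :
         c == '\'' ? Tok::kString :
         c == '`' ? Tok::kTemplateSpan :
         IsDecimalDigit(c) ? Tok::kNumber :
         IsAsciiIdentifier(c) ? Tok::kIdentifier :
         Tok::kIllegal;
}

// Expands to {GetOneCharToken(0), GetOneCharToken(1), ..., GetOneCharToken(127)}.
template <std::size_t... I>
constexpr std::array<Tok, sizeof...(I)> MakeTable(std::index_sequence<I...>) {
  return {{GetOneCharToken(static_cast<char>(I))...}};
}

constexpr std::array<Tok, 128> one_char_tokens =
    MakeTable(std::make_index_sequence<128>{});

// The table is a compile-time constant, so it can be checked statically.
static_assert(one_char_tokens['\''] == Tok::kString, "quote maps to string");

// Index lookup in the style of `token = one_char_tokens[c0_]`.
// Simplification: anything outside ASCII is reported as kIllegal here.
constexpr Tok Lookup(int c0) {
  return (c0 >= 0 && c0 < 128) ? one_char_tokens[static_cast<std::size_t>(c0)]
                               : Tok::kIllegal;
}
```

Because the table is built once at compile time, classifying a character at run time costs a single array read instead of a walk through the ternary chain.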
As mentioned earlier, c0_ holds the Unicode code value of the character currently being parsed, so the corresponding token type is found directly by array index. In this example the character is a single quote, and the quote characters are mapped as follows.
/**
 * Both single and double quotation marks are recognized as string tokens.
 * ES6 template strings are a special case, so they are not covered here.
 */
c == '"' ? Token::STRING :
c == '\'' ? Token::STRING :
c == '`' ? Token::TEMPLATE_SPAN :
Therefore the current token is set to Token::STRING, and the case branch goes into ScanString. That method has more to it, so I'll cover it in the next part. Lunch break.