The full text is 12,003 words; the expected reading time is 24 minutes.

When the client app's main process creates a WKWebView object, two child processes are also created: the renderer process and the network process. When the WKWebView in the main process initiates a request, the request is first forwarded to the renderer process, which forwards it to the network process, which in turn sends it to the server. If a web page is requested, the network process streams the HTML bytes of the server's response to the renderer process. The renderer process first parses this stream into a DOM tree, then performs rendering on the basis of the DOM tree, that is, layout and painting. Finally, the renderer process tells the WKWebView in the main process to create the corresponding views to display the page. The whole process is shown in the figure below:

What is a DOM tree

The renderer process receives the HTML character stream and converts it into a DOM tree. The image below shows an HTML file on the left and the resulting DOM tree on the right.

You can see that the root node of the DOM tree is HTMLDocument, which represents the entire document. The child nodes below the root correspond one to one to the tags in the HTML file; for example, the <head> tag in the HTML corresponds to the head node in the DOM tree. Text in the HTML file also becomes a node in the DOM tree: the text 'Hello, World!', for instance, becomes a child of the div node.

Each node in the DOM tree is an object with its own methods and attributes, created from a corresponding class. For example, the HTMLDocument node is an instance of class HTMLDocument. Here is part of its source:

```cpp
class HTMLDocument : public Document { // inherits from Document
    // ...
    WEBCORE_EXPORT int width();
    WEBCORE_EXPORT int height();
    // ...
};
```

The source shows that HTMLDocument inherits from Document. Here is part of the Document class source:

```cpp
// Document inherits from ContainerNode, and ContainerNode inherits from Node
class Document
    : public ContainerNode
    , public TreeScope
    , public ScriptExecutionContext
    , public FontSelectorClient
    , public FrameDestructionObserver
    , public Supplementable<Document>
    , public Logger::Observer
    , public CanvasObserver {
    // ...
    WEBCORE_EXPORT ExceptionOr<Ref<Element>> createElementForBindings(const AtomString& tagName);
    WEBCORE_EXPORT Ref<Text> createTextNode(const String& data);
    WEBCORE_EXPORT Ref<Comment> createComment(const String& data);
    WEBCORE_EXPORT Ref<Element> createElement(const QualifiedName&, bool createdByParser); // creates an Element
    // ...
};
```

The code above shows that Document ultimately inherits from Node (through ContainerNode). It also declares methods that front-end developers know well, such as createElement and createTextNode: when JavaScript calls these methods, the calls are ultimately converted into calls to the corresponding C++ methods.

The Document class does not have these methods by accident: they are required by the W3C DOM (Document Object Model) standard, which defines the interfaces and attributes every node in the DOM tree must implement. The interfaces of HTMLDocument, Document, HTMLDivElement and so on are written in IDL (Interface Definition Language, which is platform- and language-independent); the complete IDL can be found on the W3C website.

In the DOM tree, every node derives from the Node class, and Node has a subclass, Element. Some nodes, such as text nodes, derive directly from Node, while others, such as div nodes, derive from Element. So for the DOM tree in the figure above, the following two JavaScript expressions return different results:

```js
document.childNodes; // returns the DocumentType and html nodes, both of which inherit from Node
document.children;   // returns only child Elements: just the html node, because DocumentType does not inherit from Element
```
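To make the childNodes/children distinction concrete, here is a minimal C++ sketch. The classes below (Node, DocumentType, Text, Element) and the children() helper are hypothetical simplifications, not WebKit's real classes: childNodes keeps every child, while children() filters out anything that is not an Element.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of the Node/Element split.
struct Node {
    virtual ~Node() = default;
    virtual bool isElement() const { return false; }
    std::vector<std::unique_ptr<Node>> childNodes;  // children of any node type
};

struct DocumentType : Node {};  // inherits from Node only, like a doctype node

struct Text : Node {            // inherits from Node only, like a text node
    explicit Text(std::string d) : data(std::move(d)) {}
    std::string data;
};

struct Element : Node {         // element nodes such as <html> or <div>
    explicit Element(std::string name) : tagName(std::move(name)) {}
    bool isElement() const override { return true; }
    std::string tagName;
};

// Counterpart of `children`: filters childNodes down to Elements.
std::vector<const Element*> children(const Node& parent) {
    std::vector<const Element*> elements;
    for (const auto& child : parent.childNodes)
        if (child->isElement())
            elements.push_back(static_cast<const Element*>(child.get()));
    return elements;
}
```

With a document holding a doctype and an html element, childNodes has two entries but children() returns only the html element, matching the two JavaScript results above.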

The following figure shows the inheritance diagram of some nodes:

DOM tree construction

Building the DOM tree can be divided into four steps: decoding, tokenization (splitting the character stream into tokens), node creation, and node insertion.

2.1 Decoding

The renderer process receives the HTML byte stream from the network process, but the next step, tokenization, works on characters. Because of the many encodings in use, such as ISO-8859-1 and UTF-8, a single character may be encoded as one byte or as several. The purpose of decoding is to convert the HTML byte stream into an HTML character stream, in other words, to turn the raw HTML bytes into a string.
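As a sketch of why decoding must come first, the hypothetical decodeUTF8 function below turns UTF-8 bytes into Unicode code points. It assumes well-formed input and skips the validation a real decoder performs; its point is that one character can span several bytes, and that a chunk boundary can cut a character in half.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal UTF-8 decoder sketch (hypothetical helper, not a WebKit API).
// Continuation bytes, overlong forms and surrogates are NOT validated.
std::vector<char32_t> decodeUTF8(const std::vector<std::uint8_t>& bytes) {
    std::vector<char32_t> codePoints;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint8_t lead = bytes[i];
        std::size_t extra;  // number of continuation bytes after the lead byte
        char32_t cp;
        if (lead < 0x80)      { cp = lead;        extra = 0; }  // 0xxxxxxx: ASCII
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; }  // 11110xxx
        if (i + extra >= bytes.size())
            break;  // sequence cut off by the chunk end; a streaming decoder would keep these bytes
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (bytes[i + k] & 0x3F);  // 6 payload bits per continuation byte
        codePoints.push_back(cp);
        i += extra + 1;
    }
    return codePoints;
}
```

For example, the three bytes E4 B8 AD decode to the single code point U+4E2D (中), while the first two of those bytes alone decode to nothing.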

2.1.1 Decoding class diagram

As the class diagram shows, HTMLDocumentParser is at the core of decoding. It calls the decoder to decode the HTML byte stream into a character stream, which is stored in HTMLInputStream.

2.1.2 Decoding process

The most important part of decoding is finding the correct encoding: only then can the matching decoder be used. Decoding takes place in the method below, which appears in the third frame of the call stack above:

```cpp
// HTMLDocumentParser is a subclass of DecodedDataDocumentParser
void DecodedDataDocumentParser::appendBytes(DocumentWriter& writer, const uint8_t* data, size_t length)
{
    if (!length)
        return;

    String decoded = writer.decoder().decode(data, length);
    if (decoded.isEmpty())
        return;

    writer.reportDataReceived();
    append(decoded.releaseImpl());
}
```

In the code above, writer.decoder() returns a TextResourceDecoder object, and the actual decoding is done by TextResourceDecoder::decode. Here is the source of TextResourceDecoder::decode:

```cpp
// Only the most important parts are kept
String TextResourceDecoder::decode(const char* data, size_t length)
{
    // ...
    // For HTML (and XML) files, look for the character set in the <head> tag
    if ((m_contentType == HTML || m_contentType == XML) && !m_checkedForHeadCharset) // HTML and XML
        if (!checkForHeadCharset(data, length, movedDataToBuffer))
            return emptyString();
    // ...
    // m_encoding is the encoding name found in the HTML file
    if (!m_codec)
        m_codec = newTextCodec(m_encoding); // create the concrete decoder
    // ...
    // Decode
    String result = m_codec->decode(m_buffer.data() + lengthOfBOM, m_buffer.size() - lengthOfBOM, false, m_contentType == XML && !m_useLenientXMLDecoding, m_sawError);
    m_buffer.clear(); // empty the buffer of raw, undecoded HTML bytes
    return result;
}
```

TextResourceDecoder first looks for the encoding in the <head> tag, because <head> can contain a <meta> tag, and a <meta> tag can set the character set of the HTML file:

```html
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <!-- specifies the character set -->
    <title>DOM Tree</title>
    <script>window.name = 'Lucy';</script>
</head>
```

If the character set is found, TextResourceDecoder stores it in the member variable m_encoding, creates the matching decoder and stores it in the member variable m_codec, and finally uses m_codec to decode the byte stream and return the decoded string. If no <meta> tag specifying a character set is found, m_encoding keeps its default value, windows-1252 (which browsers also use for content labeled ISO-8859-1).

In the decode source above, TextResourceDecoder calls checkForHeadCharset:

```cpp
// Only the key code is kept
bool TextResourceDecoder::checkForHeadCharset(const char* data, size_t len, bool& movedDataToBuffer)
{
    // ...
    // This is not completely efficient, since the function might go
    // through the HTML head several times.

    size_t oldSize = m_buffer.size();
    m_buffer.grow(oldSize + len);
    memcpy(m_buffer.data() + oldSize, data, len); // copy the byte stream into the m_buffer cache
    movedDataToBuffer = true;

    // Continue with checking for an HTML meta tag if we were already doing so.
    if (m_charsetParser)
        return checkForMetaCharset(data, len); // a <meta> parser already exists: keep parsing
    // ...
    m_charsetParser = makeUnique<HTMLMetaCharsetParser>(); // the parser is created lazily
    return checkForMetaCharset(data, len);
}
```

In the code above, the memcpy into m_buffer means TextResourceDecoder keeps its own copy of the HTML bytes still to be decoded; this important step is revisited later. First, look at the last few statements: they find the character set with a <meta> tag parser, which is created lazily. Here is checkForMetaCharset:

```cpp
bool TextResourceDecoder::checkForMetaCharset(const char* data, size_t length)
{
    if (!m_charsetParser->checkForMetaCharset(data, length))
        return false;

    setEncoding(m_charsetParser->encoding(), EncodingFromMetaTag);
    m_charsetParser = nullptr;
    m_checkedForHeadCharset = true;
    return true;
}
```

As the code above shows, the actual parsing is done by HTMLMetaCharsetParser::checkForMetaCharset:

```cpp
// Only the key code is kept
bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
{
    if (m_doneChecking)
        return true;

    // We still don't have an encoding, and are in the head.
    // The following tags are allowed in <head>:
    // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
    //
    // We stop scanning when a tag that is not permitted in <head>
    // is seen, rather when </head> is seen, because that more closely
    // matches behavior in other browsers; more details in
    // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
    //
    // Additionally, we ignore things that looks like tags in <title>, <script>
    // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
    // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
    // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
    //
    // Since many sites have charset declarations after <body> or other tags
    // that are disallowed in <head>, we don't bail out until we've checked at
    // least bytesToCheckUnconditionally bytes of input.

    constexpr int bytesToCheckUnconditionally = 1024;

    bool ignoredSawErrorFlag;
    m_input.append(m_codec->decode(data, length, false, false, ignoredSawErrorFlag)); // decode with Latin-1 before looking for <meta>

    while (auto token = m_tokenizer.nextToken(m_input)) { // tokenize the decoded characters
        bool isEnd = token->type() == HTMLToken::EndTag;
        if (isEnd || token->type() == HTMLToken::StartTag) {
            AtomString tagName(token->name());
            if (!isEnd) {
                m_tokenizer.updateStateFor(tagName);
                if (tagName == metaTag && processMeta(*token)) { // found a <meta> tag: process it
                    m_doneChecking = true;
                    return true;
                }
            }

            if (tagName != scriptTag && tagName != noscriptTag
                && tagName != styleTag && tagName != linkTag
                && tagName != metaTag && tagName != objectTag
                && tagName != titleTag && tagName != baseTag
                && (isEnd || tagName != htmlTag)
                && (isEnd || tagName != headTag)) {
                m_inHeadSection = false;
            }
        }

        if (!m_inHeadSection && m_input.numberOfCharactersConsumed() >= bytesToCheckUnconditionally) {
            // Tokenization has left <head> and enough input has been checked: stop looking
            m_doneChecking = true;
            return true;
        }
    }

    return false;
}
```

HTMLMetaCharsetParser also has its own decoder, m_codec, created together with the HTMLMetaCharsetParser object. Its actual type is TextCodecLatin1 (Latin-1 is another name for ISO-8859-1, which is nearly identical to windows-1252). A Latin-1 decoder can be used directly because a correctly written <meta> tag consists only of ASCII characters, which Latin-1 decodes correctly. This sidesteps a chicken-and-egg problem: decoding the byte stream requires the charset from the <meta> tag, but finding the <meta> tag requires decoding the byte stream.
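The property the Latin-1 trick relies on is easy to show. In the hypothetical decodeLatin1 helper below, each byte simply becomes the code point with the same value, so ASCII markup is always recovered intact, while non-ASCII bytes come out as unrelated Latin-1 characters. (windows-1252 differs from ISO-8859-1 only in the byte range 0x80 to 0x9F, which this sketch does not special-case.)

```cpp
#include <cassert>
#include <string>

// Latin-1 (ISO-8859-1) decoding sketch: byte value == code point value.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)  // unsigned char avoids sign extension of bytes >= 0x80
        out.push_back(static_cast<char32_t>(b));
    return out;
}
```

Decoding the ASCII text `<meta charset="utf-8">` this way reproduces it exactly, whereas the UTF-8 bytes of 中 (E4 B8 AD) come out as ä, ¸ and a soft hyphen: exactly the kind of garbling described later for pages without a declared charset.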

When a <meta> tag is found, processMeta handles it. It is a fairly simple function: it collects the tag's attributes and then looks for a charset among them.

```cpp
bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
{
    AttributeList attributes;
    for (auto& attribute : token.attributes()) { // attributes of the <meta> tag
        String attributeName = StringImpl::create8BitIfPossible(attribute.name);
        String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
        attributes.append(std::make_pair(attributeName, attributeValue));
    }

    m_encoding = encodingFromMetaAttributes(attributes); // look for charset
    return m_encoding.isValid();
}
```
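processMeta delegates the actual search to encodingFromMetaAttributes. The function below is a hypothetical, simplified stand-in for that step, not WebKit's code: it accepts either `<meta charset="...">` or an `http-equiv="Content-Type"` pragma whose content attribute contains `charset=...`, and it ignores quoting subtleties that the real algorithm handles.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using AttributeList = std::vector<std::pair<std::string, std::string>>;

static std::string toLowerASCII(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical simplification of charset extraction from <meta> attributes.
std::optional<std::string> charsetFromMetaAttributes(const AttributeList& attributes) {
    bool isContentTypePragma = false;
    std::string content;
    for (const auto& [name, value] : attributes) {
        std::string lowerName = toLowerASCII(name);
        if (lowerName == "charset")
            return toLowerASCII(value);              // <meta charset="..."> answers directly
        if (lowerName == "http-equiv" && toLowerASCII(value) == "content-type")
            isContentTypePragma = true;
        else if (lowerName == "content")
            content = toLowerASCII(value);
    }
    if (!isContentTypePragma)
        return std::nullopt;                         // a content attribute alone is ignored
    std::size_t pos = content.find("charset=");
    if (pos == std::string::npos)
        return std::nullopt;
    std::string charset = content.substr(pos + 8);   // 8 == length of "charset="
    charset = charset.substr(0, charset.find_first_of("; \"'"));
    if (charset.empty())
        return std::nullopt;
    return charset;
}
```

Both forms used in this article, `<meta charset="UTF-8">` and the http-equiv pragma from the <head> example above, yield "utf-8"; an unrelated <meta name="viewport"> yields nothing.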

When analyzing TextResourceDecoder::checkForHeadCharset above, we noted that copying the HTML bytes into TextResourceDecoder's own buffer is important. The reason is that the HTML byte stream may contain no <meta> tag that sets a charset at all. In that case checkForHeadCharset returns false, so TextResourceDecoder::decode() returns an empty string and performs no decoding. Does that mean the page is never decoded? No. While the HTML byte stream was being received, no tag with a charset attribute was found and nothing was decoded, but every byte received was stored in TextResourceDecoder's member m_buffer. Once the entire HTML byte stream has been received, the following call stack occurs:

As the call stack shows, once all the HTML bytes have been received, TextResourceDecoder::flush is eventually called to decode the bytes stored in m_buffer. Since no encoding was found while the bytes were arriving, m_buffer holds all of the HTML bytes, and they are decoded with the default encoding, windows-1252. This is why a page containing Chinese characters is displayed as garbled text if it does not specify a character set. After decoding, the character stream is stored in HTMLDocumentParser.

```cpp
void DecodedDataDocumentParser::flush(DocumentWriter& writer)
{
    String remainingData = writer.decoder().flush();
    if (remainingData.isEmpty())
        return;

    writer.reportDataReceived();
    append(remainingData.releaseImpl()); // the decoded characters are stored in HTMLDocumentParser
}
```

2.1.3 Decoding summary

The whole decoding process falls into two scenarios. In the first, a <meta> tag with a charset attribute is found in the HTML byte stream, which yields the encoding; every chunk of HTML bytes received from then on is decoded with that encoding, and the resulting characters are appended to HTMLInputStream. In the second, no <meta> tag with a charset attribute is found, so every chunk of HTML bytes received is cached in TextResourceDecoder's m_buffer, and the bytes are finally decoded with the default encoding, windows-1252.
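The two scenarios can be modeled with a small class. BufferingDecoder and DecodeJob are hypothetical names, detection is a naive search for "charset=", and no real byte-to-character conversion happens; the sketch only shows when bytes are released and which encoding they are tagged with, following the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Raw bytes that may be decoded now, tagged with the encoding to use.
struct DecodeJob {
    std::string encoding;  // empty while the encoding is still unknown
    std::string bytes;
};

// Hypothetical model of the two scenarios (not WebKit's TextResourceDecoder).
class BufferingDecoder {
public:
    // Called for every chunk received from the network process.
    DecodeJob append(const std::string& chunk) {
        if (!encoding_.empty())
            return {encoding_, chunk};                 // scenario 1: decode right away
        buffer_ += chunk;                              // encoding unknown: keep raw bytes
        std::size_t pos = buffer_.find("charset=");
        if (pos == std::string::npos)
            return {};                                 // nothing can be decoded yet
        std::size_t start = pos + 8;                   // skip "charset="
        if (start < buffer_.size() && (buffer_[start] == '"' || buffer_[start] == '\''))
            ++start;                                   // skip an opening quote
        std::size_t end = buffer_.find_first_of("\"'; />", start);
        if (end == std::string::npos || end == start)
            return {};                                 // declaration incomplete (or empty)
        encoding_ = buffer_.substr(start, end - start);
        return {encoding_, std::exchange(buffer_, std::string())};  // release everything buffered
    }

    // Called once the whole response has arrived (compare TextResourceDecoder::flush).
    DecodeJob flush() {
        if (encoding_.empty())
            encoding_ = "windows-1252";                // scenario 2: fall back to the default
        return {encoding_, std::exchange(buffer_, std::string())};
    }

private:
    std::string encoding_;
    std::string buffer_;
};
```

In scenario 1 the first chunk containing the declaration releases everything buffered so far, and later chunks pass straight through; in scenario 2 nothing is released until flush(), which tags all the buffered bytes with windows-1252.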

2.2 Tokenization

The received HTML byte stream is decoded into a character stream stored in HTMLInputStream. Tokenization takes characters from HTMLInputStream one by one and checks whether each is a special HTML character such as '<', '/', '>', or '='. Splitting on these characters yields tag names and attribute lists, and each result is stored in an HTMLToken.

2.2.1 Tokenization class diagram

As the class diagram shows, the most important classes for tokenization are HTMLTokenizer and HTMLToken. Here are the main members of HTMLToken:

```cpp
class HTMLToken {
public:
    enum Type {        // token type
        Uninitialized, // initial type of a token
        DOCTYPE,       // the token is a DOCTYPE tag
        StartTag,      // the token is a start tag
        EndTag,        // the token is an end tag
        Comment,       // the token is a comment
        Character,     // the token is text
        EndOfFile,     // the token marks the end of the file
    };

    struct Attribute { // an attribute
        Vector<UChar, 32> name;
        Vector<UChar, 64> value;

        // Used by HTMLSourceTracker.
        unsigned startOffset;
        unsigned endOffset;
    };

    typedef Vector<Attribute, 10> AttributeList; // attribute list
    typedef Vector<UChar, 256> DataVector;       // stores the token name
    // ...
private:
    Type m_type;
    DataVector m_data;

    // For StartTag and EndTag
    bool m_selfClosing;            // whether the token is a self-closing tag, like <img />
    AttributeList m_attributes;
    Attribute* m_currentAttribute; // the attribute currently being parsed
};
```

2.2.2 Tokenization process

The most important method in tokenization is HTMLDocumentParser::pumpTokenizerLoop; as its name suggests, it contains a loop:

```cpp
// Only the key code is kept
bool HTMLDocumentParser::pumpTokenizerLoop(SynchronousMode mode, bool parsingFragment, PumpSession& session)
{
    do { // start of the tokenization loop
        // ...
        if (UNLIKELY(mode == AllowYield && m_parserScheduler->shouldYieldBeforeToken(session)))
            return true;

        if (!parsingFragment)
            m_sourceTracker.startToken(m_input.current(), m_tokenizer);

        auto token = m_tokenizer.nextToken(m_input.current()); // tokenize: get the next token
        if (!token)
            return false;

        if (!parsingFragment)
            m_sourceTracker.endToken(m_input.current(), m_tokenizer);

        constructTreeFromHTMLToken(token); // build the DOM tree from the token
    } while (!isStopped());

    return false;
}
```

The yield check at the top of the loop lets the parser leave the tokenization loop so that it does not occupy the main thread for too long. When the yield condition is true, the loop exits with the return value true. Here is the code that decides whether to yield:

```cpp
// Only the key code is kept
bool HTMLParserScheduler::shouldYieldBeforeToken(PumpSession& session)
{
    // ...
    // numberOfTokensBeforeCheckingForYield is a static variable defined as 4096.
    // session.processedTokensOnLastCheck is the value of processedTokens at the last check.
    // session.didSeeScript indicates whether a <script> tag was seen during tokenization.
    if (UNLIKELY(session.processedTokens > session.processedTokensOnLastCheck + numberOfTokensBeforeCheckingForYield || session.didSeeScript))
        return checkForYield(session);

    ++session.processedTokens;
    return false;
}

bool HTMLParserScheduler::checkForYield(PumpSession& session)
{
    session.processedTokensOnLastCheck = session.processedTokens;
    session.didSeeScript = false;

    Seconds elapsedTime = MonotonicTime::now() - session.startTime;
    return elapsedTime > m_parserTimeLimit; // m_parserTimeLimit defaults to 500 ms
}
```
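The yield policy can be sketched as standalone C++. The 4096-token interval and 500 ms limit are the values quoted above; PumpSession, shouldYieldBeforeToken and the injected `now` parameter are simplified, hypothetical stand-ins (the real code reads MonotonicTime::now() itself), chosen so the logic can be tested without sleeping.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Simplified session state, mirroring the fields used above.
struct PumpSession {
    Clock::time_point startTime;
    unsigned processedTokens = 0;
    unsigned processedTokensOnLastCheck = 0;
    bool didSeeScript = false;
};

constexpr unsigned numberOfTokensBeforeCheckingForYield = 4096;
constexpr std::chrono::milliseconds parserTimeLimit{500};

// Reading the clock is comparatively costly, so time is only compared every
// 4096 tokens or right after a <script> was seen.
bool shouldYieldBeforeToken(PumpSession& session, Clock::time_point now) {
    if (session.processedTokens > session.processedTokensOnLastCheck + numberOfTokensBeforeCheckingForYield
        || session.didSeeScript) {
        session.processedTokensOnLastCheck = session.processedTokens;
        session.didSeeScript = false;
        return now - session.startTime > parserTimeLimit;  // yield once 500 ms have elapsed
    }
    ++session.processedTokens;
    return false;
}
```

Note the consequence of this design: between checks, the parser never yields, however long it has been running, because the elapsed time is simply not looked at.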

Once the yield condition has been hit, when does tokenization resume? The following code shows how tokenization is re-entered:

```cpp
// Only the key code is kept
void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
{
    // ...
    if (shouldResume) // true when pumpTokenizerLoop exited because of a yield
        m_parserScheduler->scheduleForResume();
}

void HTMLParserScheduler::scheduleForResume()
{
    ASSERT(!m_suspended);
    m_continueNextChunkTimer.startOneShot(0_s); // start a 0 s timer whose callback is HTMLParserScheduler::continueNextChunkTimerFired
}

// Only the key code is kept
void HTMLParserScheduler::continueNextChunkTimerFired()
{
    // ...
    m_parser.resumeParsingAfterYield(); // resume tokenization
}

void HTMLDocumentParser::resumeParsingAfterYield()
{
    // pumpTokenizer can cause this parser to be detached from the Document,
    // but we need to ensure it isn't deleted yet.
    Ref<HTMLDocumentParser> protectedThis(*this);

    // We should never be here unless we can pump immediately.
    // Call pumpTokenizer() directly so that ASSERTS will fire if we're wrong.
    pumpTokenizer(AllowYield); // re-enter tokenization, which calls pumpTokenizerLoop
    endIfDelayed();
}
```

As the code above shows, tokenization is resumed by firing a timer. Although the timer has a 0-second delay, its callback does not run immediately: any other tasks already queued on the main thread get a chance to run first.
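Why a 0-second timer still waits can be seen with a toy FIFO run loop. RunLoop below is hypothetical and far simpler than WebKit's real run loop: a task already queued while parsing is in progress (here, a click handler) runs before the resume callback that the parser posts when it yields.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded run loop: tasks run strictly in FIFO order, so a
// zero-delay "timer" is simply appended behind everything already queued.
class RunLoop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();  // a task may post further tasks
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};
```

If the first task parses a chunk and then posts "parse chunk 2" (the yield), while a click handler was queued in the meantime, the run order is chunk 1, click, chunk 2.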

Back in HTMLDocumentParser::pumpTokenizerLoop, the call to m_tokenizer.nextToken performs the actual tokenization, extracting one token from the decoded character stream. The work is done in HTMLTokenizer::processToken:

```cpp
// Only the key code is kept
bool HTMLTokenizer::processToken(SegmentedString& source)
{
    // ...
    if (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state))) // read the character source points to into m_nextInputCharacter
        return haveBufferedCharacterToken();
    UChar character = m_preprocessor.nextInputCharacter(); // get the character

    // https://html.spec.whatwg.org/#tokenization
    switch (m_state) { // state transitions; m_state starts as DataState
        // ...
    }

    return false;
}
```

This method is more than 1,200 lines long because of its many state transitions. Four examples below explain the state-transition logic.

First, look at InputStreamPreprocessor::peek:

```cpp
// Returns whether we succeeded in peeking at the next character.
// The only way we can fail to peek is if there are no more
// characters in |source| (after collapsing \r\n, etc).
ALWAYS_INLINE bool InputStreamPreprocessor::peek(SegmentedString& source, bool skipNullCharacters = false)
{
    if (UNLIKELY(source.isEmpty()))
        return false;

    m_nextInputCharacter = source.currentCharacter();

    // Every branch in this function is expensive, so we have a
    // fast-reject branch for characters that don't require special
    // handling. Please run the parser benchmark whenever you touch
    // this function. It's very hot.
    constexpr UChar specialCharacterMask = '\n' | '\r' | '\0';
    if (LIKELY(m_nextInputCharacter & ~specialCharacterMask)) {
        m_skipNextNewLine = false;
        return true;
    }

    return processNextInputCharacter(source, skipNullCharacters); // skip null characters and collapse \r\n into a single \n
}

bool InputStreamPreprocessor::processNextInputCharacter(SegmentedString& source, bool skipNullCharacters)
{
ProcessAgain:
    ASSERT(m_nextInputCharacter == source.currentCharacter());

    // In a \r\n pair, the \r was already handled below (which set m_skipNextNewLine = true),
    // so this if statement skips the \n that follows it
    if (m_nextInputCharacter == '\n' && m_skipNextNewLine) {
        m_skipNextNewLine = false;
        source.advancePastNewline(); // move to the next character
        if (source.isEmpty())
            return false;
        m_nextInputCharacter = source.currentCharacter();
    }

    // On a \r, replace it with \n and set m_skipNextNewLine = true
    // so that an immediately following \n is skipped
    if (m_nextInputCharacter == '\r') {
        m_nextInputCharacter = '\n';
        m_skipNextNewLine = true;
        return true;
    }

    m_skipNextNewLine = false;
    if (m_nextInputCharacter || isAtEndOfFile(source))
        return true;

    // Skip the null character
    if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
        source.advancePastNonNewline();
        if (source.isEmpty())
            return false;
        m_nextInputCharacter = source.currentCharacter();
        goto ProcessAgain;
    }

    m_nextInputCharacter = replacementCharacter;
    return true;
}
```
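The net effect of peek and processNextInputCharacter can be approximated by a small stateful filter. NewlineNormalizer is a hypothetical name; like m_skipNextNewLine, its flag survives between chunks, so a \r\n pair split across two chunks still becomes a single \n. It always replaces NUL with U+FFFD and does not model the states in which WebKit skips NUL instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical streaming filter approximating InputStreamPreprocessor.
class NewlineNormalizer {
public:
    std::u16string process(const std::u16string& chunk) {
        std::u16string out;
        for (char16_t c : chunk) {
            if (c == u'\n' && skipNextNewLine_) {  // second half of a "\r\n" pair
                skipNextNewLine_ = false;
                continue;
            }
            skipNextNewLine_ = false;
            if (c == u'\r') {
                out.push_back(u'\n');              // '\r' becomes '\n' ...
                skipNextNewLine_ = true;           // ... and a directly following '\n' is dropped
            } else if (c == u'\0') {
                out.push_back(u'\uFFFD');          // NUL becomes the replacement character
            } else {
                out.push_back(c);
            }
        }
        return out;
    }

private:
    bool skipNextNewLine_ = false;  // persists across chunks, like m_skipNextNewLine
};
```

Feeding "c\r" and then "\nd" as two chunks produces "c\n" and "d": one newline, even though the pair was split.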

Because peek skips null characters (in the states that request it) and collapses \r\n into a single \n, a character stream source containing \r\n line breaks is actually processed as follows:

HTMLTokenizer::processToken implements a state machine; the following four cases explain how it works.

Case 1: the <!DOCTYPE> tag

```cpp
BEGIN_STATE(DataState)
    if (character == '&')
        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInDataState);
    if (character == '<') { // the character stream starts with '<', which marks the beginning of a tag
        if (haveBufferedCharacterToken())
            RETURN_IN_CURRENT_STATE(true);
        ADVANCE_PAST_NON_NEWLINE_TO(TagOpenState); // jump to TagOpenState and fetch the next character, '!'
    }
    if (character == kEndOfFileMarker)
        return emitEndOfFile(source);
    bufferCharacter(character);
    ADVANCE_TO(DataState);
END_STATE()

// ADVANCE_PAST_NON_NEWLINE_TO is defined as follows
#define ADVANCE_PAST_NON_NEWLINE_TO(newState)                   \
    do {                                                        \
        if (!m_preprocessor.advancePastNonNewline(source, isNullCharacterSkippingState(newState))) { \
            m_state = newState; /* no next character: save the state */ \
            return haveBufferedCharacterToken(); /* and return */ \
        }                                                       \
        character = m_preprocessor.nextInputCharacter(); /* get the next character */ \
        goto newState; /* jump to the new state */              \
    } while (false)

BEGIN_STATE(TagOpenState)
    if (character == '!') // this condition is met
        ADVANCE_PAST_NON_NEWLINE_TO(MarkupDeclarationOpenState); // likewise, jump to MarkupDeclarationOpenState and fetch the next character, 'D'
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(EndTagOpenState);
    if (isASCIIAlpha(character)) {
        m_token.beginStartTag(convertASCIIAlphaToLower(character));
        ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
    }
    if (character == '?') {
        parseError();
        // The spec consumes the current character before switching
        // to the bogus comment state, but it's easier to implement
        // if we reconsume the current character.
        RECONSUME_IN(BogusCommentState);
    }
    parseError();
    bufferASCIICharacter('<');
    RECONSUME_IN(DataState);
END_STATE()

BEGIN_STATE(MarkupDeclarationOpenState)
    if (character == '-') {
        auto result = source.advancePast("--");
        if (result == SegmentedString::DidMatch) {
            m_token.beginComment();
            SWITCH_TO(CommentStartState);
        }
        if (result == SegmentedString::NotEnoughCharacters)
            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
    } else if (isASCIIAlphaCaselessEqual(character, 'd')) { // character == 'D' satisfies this condition
        auto result = source.advancePastLettersIgnoringASCIICase("doctype"); // check whether the decoded stream contains the complete word "doctype"
        if (result == SegmentedString::DidMatch)
            SWITCH_TO(DOCTYPEState); // on a match, jump to DOCTYPEState and fetch the character source now points to, here '>'
        if (result == SegmentedString::NotEnoughCharacters) // not enough characters yet to decide
            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
    } else if (character == '[' && shouldAllowCDATA()) {
        auto result = source.advancePast("[CDATA[");
        if (result == SegmentedString::DidMatch)
            SWITCH_TO(CDATASectionState);
        if (result == SegmentedString::NotEnoughCharacters)
            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
    }
    parseError();
    RECONSUME_IN(BogusCommentState);
END_STATE()

// SWITCH_TO is defined as follows
#define SWITCH_TO(newState)                                     \
    do {                                                        \
        if (!m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \
            m_state = newState;                                 \
            return haveBufferedCharacterToken();                \
        }                                                       \
        character = m_preprocessor.nextInputCharacter(); /* get the current character */ \
        goto newState; /* jump to the specified state */        \
    } while (false)

// RETURN_IN_CURRENT_STATE is defined as follows
#define RETURN_IN_CURRENT_STATE(expression)                     \
    do {                                                        \
        m_state = currentState; /* save the current state */    \
        return expression;                                      \
    } while (false)

BEGIN_STATE(DOCTYPEState)
    if (isTokenizerWhitespace(character))
        ADVANCE_TO(BeforeDOCTYPENameState);
    if (character == kEndOfFileMarker) {
        parseError();
        m_token.beginDOCTYPE();
        m_token.setForceQuirks();
        return emitAndReconsumeInDataState();
    }
    parseError();
    RECONSUME_IN(BeforeDOCTYPENameState);
END_STATE()

// RECONSUME_IN is defined as follows
#define RECONSUME_IN(newState)                                  \
    do {                                                        \
        /* ... */                                               \
    } while (false)

BEGIN_STATE(BeforeDOCTYPENameState)
    if (isTokenizerWhitespace(character))
        ADVANCE_TO(BeforeDOCTYPENameState);
    if (character == '>') { // character == '>' matches here, completing this DOCTYPE tag
        parseError();
        m_token.beginDOCTYPE();
        m_token.setForceQuirks();
        return emitAndResumeInDataState(source);
    }
    if (character == kEndOfFileMarker) {
        parseError();
        m_token.beginDOCTYPE();
        m_token.setForceQuirks();
        return emitAndReconsumeInDataState();
    }
    m_token.beginDOCTYPE(toASCIILower(character));
    ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPENameState);
END_STATE()

inline bool HTMLTokenizer::emitAndResumeInDataState(SegmentedString& source)
{
    saveEndTagNameIfNeeded();
    m_state = DataState;            // reset the state to the initial DataState
    source.advancePastNonNewline(); // move to the next character
    return true;
}
```

The DOCTYPE token passes through six states before it is fully parsed, as shown in the figure below:

After the token is parsed, the tokenizer state is reset to DataState. Note that the character stream source now points to the next character, '<'.

In MarkupDeclarationOpenState above, advancePastLettersIgnoringASCIICase("doctype") may find that the character stream source does not yet contain the complete string "doctype". Why? Because building the DOM tree does not wait until decoding has finished and the complete character stream is available before tokenizing. As described in the decoding section, decoding can happen while the byte stream is still arriving, and tokenization works the same way: as soon as some characters have been decoded, they are tokenized immediately. The whole process looks like the figure below:

For this reason, the character stream being tokenized may be incomplete. Tokenizing an incomplete DOCTYPE proceeds as shown in the figure below:

When the DOCTYPE tag is processed by alternating decoding and tokenization, the logic is the same as when tokenizing a fully decoded stream. The rest of this article therefore covers only the fully decoded case; for interleaved decoding and tokenization, you only need to track how the pointer inside the source character stream moves, which is not difficult to analyze.
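The three-way outcome of advancePastLettersIgnoringASCIICase is what makes tokenizing partial input possible. The hypothetical helper below mimics it, with result names taken from the DidMatch/NotEnoughCharacters values used above: it reports NotEnoughCharacters when the input ends while still matching, which is the tokenizer's signal to save its state (RETURN_IN_CURRENT_STATE) and retry once more bytes have been decoded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class MatchResult { DidMatch, DidNotMatch, NotEnoughCharacters };

// Hypothetical helper: compares `keyword` (given in lowercase) with the input
// starting at `pos`, ignoring ASCII case.
MatchResult matchIgnoringASCIICase(const std::string& input, std::size_t pos, const std::string& keyword) {
    for (std::size_t i = 0; i < keyword.size(); ++i) {
        if (pos + i >= input.size())
            return MatchResult::NotEnoughCharacters;  // ran out of input: decide later
        char c = static_cast<char>(std::tolower(static_cast<unsigned char>(input[pos + i])));
        if (c != keyword[i])
            return MatchResult::DidNotMatch;          // a mismatch is final, even on partial input
    }
    return MatchResult::DidMatch;
}
```

With only "<!DOC" decoded so far, matching "doctype" at offset 2 cannot succeed or fail yet; with "<!DOG" it fails immediately.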

Case 2: the <html> tag

```cpp
BEGIN_STATE(TagOpenState)
    if (character == '!')
        ADVANCE_PAST_NON_NEWLINE_TO(MarkupDeclarationOpenState);
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(EndTagOpenState);
    if (isASCIIAlpha(character)) { // in TagOpenState the current character is 'h'
        m_token.beginStartTag(convertASCIIAlphaToLower(character)); // add 'h' to the token name
        ADVANCE_PAST_NON_NEWLINE_TO(TagNameState); // jump to TagNameState and move to the next character, 't'
    }
    if (character == '?') {
        parseError();
        // The spec consumes the current character before switching
        // to the bogus comment state, but it's easier to implement
        // if we reconsume the current character.
        RECONSUME_IN(BogusCommentState);
    }
    parseError();
    bufferASCIICharacter('<');
    RECONSUME_IN(DataState);
END_STATE()

BEGIN_STATE(TagNameState)
    if (isTokenizerWhitespace(character))
        ADVANCE_TO(BeforeAttributeNameState);
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
    if (character == '>') // '>' ends the tag
        return emitAndResumeInDataState(source); // the current token is complete; reset the tokenizer state to DataState
    if (m_options.usePreHTML5ParserQuirks && character == '<')
        return emitAndReconsumeInDataState();
    if (character == kEndOfFileMarker) {
        parseError();
        RECONSUME_IN(DataState);
    }
    m_token.appendToName(toASCIILower(character)); // append the current character to the token name
    ADVANCE_PAST_NON_NEWLINE_TO(TagNameState); // stay in the current state and move to the next character
END_STATE()
```
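Stripped of attributes, error handling and the remaining states, the TagOpenState to TagNameState flow reduces to a few lines. tokenizeTags and TagToken are hypothetical names; the sketch lowercases tag names as appendToName(toASCIILower(...)) does and emits a token at '>', but only handles tags without attributes and ignores the text between tags.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct TagToken {
    bool isEndTag;
    std::string name;  // lowercased tag name
};

// Hypothetical reduction of DataState / TagOpenState / EndTagOpenState / TagNameState.
std::vector<TagToken> tokenizeTags(const std::string& input) {
    enum class State { Data, TagOpen, EndTagOpen, TagName };
    State state = State::Data;
    std::vector<TagToken> tokens;
    TagToken current{false, ""};
    for (char ch : input) {
        unsigned char c = static_cast<unsigned char>(ch);
        switch (state) {
        case State::Data:
            if (c == '<')
                state = State::TagOpen;               // '<' opens a tag
            break;
        case State::TagOpen:
            if (c == '/') {
                state = State::EndTagOpen;            // "</" opens an end tag
                break;
            }
            [[fallthrough]];
        case State::EndTagOpen:
            if (std::isalpha(c)) {                    // first letter starts the tag name
                current = {state == State::EndTagOpen, std::string(1, static_cast<char>(std::tolower(c)))};
                state = State::TagName;
            } else {
                state = State::Data;                  // not a tag (e.g. "<!"): back to text
            }
            break;
        case State::TagName:
            if (c == '>') {                           // '>' completes the token
                tokens.push_back(current);
                state = State::Data;
            } else {
                current.name.push_back(static_cast<char>(std::tolower(c)));
            }
            break;
        }
    }
    return tokens;
}
```

For "<HTML><body>Hi</body></html>" this yields four tokens: start html, start body, end body, end html.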

Case 3: a tag with attributes

HTML tags can have attributes. Each attribute consists of a name and a value, and attributes are separated by whitespace:

```html
<!-- attribute values in quotes -->
<div class="news" align="center">Hello,World!</div>
<!-- attribute values without quotes -->
<div class=news align=center>Hello,World!</div>
```
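Before walking through the states, here is the behavior they implement, as a compact hypothetical sketch: parseAttributes takes the text between the tag name and '>' and returns name/value pairs. It handles quoted values (which may contain spaces) and unquoted values (which end at whitespace), but assumes no whitespace around '=' and ignores character references and a self-closing '/'.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Attributes = std::vector<std::pair<std::string, std::string>>;

// Hypothetical sketch of the attribute states; not WebKit code.
Attributes parseAttributes(const std::string& s) {
    Attributes attrs;
    std::size_t i = 0;
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    while (i < s.size()) {
        while (i < s.size() && isSpace(s[i])) ++i;              // BeforeAttributeNameState
        if (i == s.size()) break;
        std::string name;
        while (i < s.size() && !isSpace(s[i]) && s[i] != '=')   // AttributeNameState
            name.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(s[i++]))));
        std::string value;
        if (i < s.size() && s[i] == '=') {                      // BeforeAttributeValueState
            ++i;
            if (i < s.size() && (s[i] == '"' || s[i] == '\'')) {
                char quote = s[i++];                            // quoted: may contain spaces
                while (i < s.size() && s[i] != quote) value.push_back(s[i++]);
                if (i < s.size()) ++i;                          // skip the closing quote
            } else {
                while (i < s.size() && !isSpace(s[i])) value.push_back(s[i++]);  // unquoted: ends at whitespace
            }
        }
        attrs.emplace_back(std::move(name), std::move(value));
    }
    return attrs;
}
```

Both <div> examples above produce the same pairs, class=news and align=center.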

The whole <div> tag with attributes is tokenized through the following states:

BEGIN_STATE(TagNameState)
    if (isTokenizerWhitespace(character)) // Whitespace while parsing the tag name marks the start of the attributes
        ADVANCE_TO(BeforeAttributeNameState);
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
    if (character == '>')
        return emitAndResumeInDataState(source);
    if (m_options.usePreHTML5ParserQuirks && character == '<')
        return emitAndReconsumeInDataState();
    if (character == kEndOfFileMarker) {
        parseError();
        RECONSUME_IN(DataState);
    }
    m_token.appendToName(toASCIILower(character));
    ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
END_STATE()

// ADVANCE_TO moves to the next input character and jumps (goto) to newState.
// If no input is left, it records newState and returns.
#define ADVANCE_TO(newState)                                    \
    do {                                                        \
        if (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { \
            m_state = newState;                                 \
            return haveBufferedCharacterToken();                \
        }                                                       \
        character = m_preprocessor.nextInputCharacter();        \
        goto newState;                                          \
    } while (false)

BEGIN_STATE(BeforeAttributeNameState)
    if (isTokenizerWhitespace(character)) // If several spaces follow the tag name, keep skipping them by looping in this state
        ADVANCE_TO(BeforeAttributeNameState);
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
    if (character == '>')
        return emitAndResumeInDataState(source);
    if (m_options.usePreHTML5ParserQuirks && character == '<')
        return emitAndReconsumeInDataState();
    if (character == kEndOfFileMarker) {
        parseError();
        RECONSUME_IN(DataState);
    }
    if (character == '"' || character == '\'' || character == '<' || character == '=')
        parseError();
    m_token.beginAttribute(source.numberOfCharactersConsumed()); // Add an entry to the token's attribute list for the new name and value
    m_token.appendToAttributeName(toASCIILower(character)); // Append to the attribute name
    ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState); // Jump to AttributeNameState and move to the next character
END_STATE()

BEGIN_STATE(AttributeNameState)
    if (isTokenizerWhitespace(character))
        ADVANCE_TO(AfterAttributeNameState);
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
    if (character == '=')
        ADVANCE_PAST_NON_NEWLINE_TO(BeforeAttributeValueState); // '=' ends the attribute name; the value starts next
    if (character == '>')
        return emitAndResumeInDataState(source);
    if (m_options.usePreHTML5ParserQuirks && character == '<')
        return emitAndReconsumeInDataState();
    if (character == kEndOfFileMarker) {
        parseError();
        RECONSUME_IN(DataState);
    }
    if (character == '"' || character == '\'' || character == '<' || character == '=')
        parseError();
    m_token.appendToAttributeName(toASCIILower(character));
    ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState);
END_STATE()

BEGIN_STATE(BeforeAttributeValueState)
    if (isTokenizerWhitespace(character))
        ADVANCE_TO(BeforeAttributeValueState);
    if (character == '"')
        ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueDoubleQuotedState); // Quoted value: jump to AttributeValueDoubleQuotedState and move to the next character
    if (character == '&')
        RECONSUME_IN(AttributeValueUnquotedState);
    if (character == '\'')
        ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueSingleQuotedState);
    if (character == '>') {
        parseError();
        return emitAndResumeInDataState(source);
    }
    if (character == kEndOfFileMarker) {
        parseError();
        RECONSUME_IN(DataState);
    }
    if (character == '<' || character == '=' || character == '`')
        parseError();
    m_token.appendToAttributeValue(character); // Unquoted value: append the value character to the token
    ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueUnquotedState); // Jump to AttributeValueUnquotedState and move to the next character
END_STATE()

BEGIN_STATE(AttributeValueDoubleQuotedState)
    if (character == '"') { // A quote in this state ends the attribute value
        m_token.endAttribute(source.numberOfCharactersConsumed()); // Finish parsing the attribute
        ADVANCE_PAST_NON_NEWLINE_TO(AfterAttributeValueQuotedState); // Jump to AfterAttributeValueQuotedState and move to the next character
    }
    if (character == '&') {
        m_additionalAllowedCharacter = '"';
        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
    }
    if (character == kEndOfFileMarker) {
        parseError();
        m_token.endAttribute(source.numberOfCharactersConsumed());
        RECONSUME_IN(DataState);
    }
    m_token.appendToAttributeValue(character); // Append the value character to the token
    ADVANCE_TO(AttributeValueDoubleQuotedState); // Loop in the current state
END_STATE()

BEGIN_STATE(AfterAttributeValueQuotedState)
    if (isTokenizerWhitespace(character))
        ADVANCE_TO(BeforeAttributeNameState); // Value done; whitespace after it means more attributes may follow, so go back to BeforeAttributeNameState
    if (character == '/')
        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
    if (character == '>')
        return emitAndResumeInDataState(source); // Value done; '>' ends the whole tag: emit it, reset the tokenizer to DataState, and move to the next character
    if (m_options.usePreHTML5ParserQuirks && character == '<')
        return emitAndReconsumeInDataState();
    if (character == kEndOfFileMarker) {
        parseError();
        RECONSUME_IN(DataState);
    }
    parseError();
    RECONSUME_IN(BeforeAttributeNameState);
END_STATE()

BEGIN_STATE(AttributeValueUnquotedState)
    if (isTokenizerWhitespace(character)) { // Whitespace ends an unquoted value (a quoted value may contain whitespace); jump to BeforeAttributeNameState for the next attribute
        m_token.endAttribute(source.numberOfCharactersConsumed());
        ADVANCE_TO(BeforeAttributeNameState);
    }
    if (character == '&') {
        m_additionalAllowedCharacter = '>';
        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
    }
    if (character == '>') { // '>' ends the whole tag: emit it, reset the tokenizer to DataState, and move to the next character
        m_token.endAttribute(source.numberOfCharactersConsumed());
        return emitAndResumeInDataState(source);
    }
    if (character == kEndOfFileMarker) {
        parseError();
        m_token.endAttribute(source.numberOfCharactersConsumed());
        RECONSUME_IN(DataState);
    }
    if (character == '"' || character == '\'' || character == '<' || character == '=' || character == '`')
        parseError();
    m_token.appendToAttributeValue(character); // Append the value character to the token
    ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueUnquotedState); // Stay in the current state and move to the next character
END_STATE()

As the code shows, attribute values are parsed differently depending on whether they are quoted. A quoted value may contain whitespace characters. In an unquoted value, a whitespace character ends the current attribute, and the tokenizer moves on to the next attribute.
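To make the quoted/unquoted difference concrete, here is a minimal sketch under stated assumptions: it is not WebKit code, and the names `AttrState`, `Attribute` and `tokenizeAttributes` are hypothetical. It models only the attribute states above, takes the text between the tag name and `>`, ignores character references, `/` and `>`, and silently tolerates parse errors.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of the attribute states shown above.
enum class AttrState { BeforeName, Name, AfterName, BeforeValue, DoubleQuoted, SingleQuoted, Unquoted, AfterQuoted };

using Attribute = std::pair<std::string, std::string>; // (name, value)

static char toLowerAscii(char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }

std::vector<Attribute> tokenizeAttributes(const std::string& input) {
    std::vector<Attribute> attrs;
    AttrState state = AttrState::BeforeName;
    for (char c : input) {
        const bool ws = std::isspace(static_cast<unsigned char>(c)) != 0;
        switch (state) {
        case AttrState::BeforeName:
        case AttrState::AfterQuoted: // without whitespace, the real tokenizer reports an error and reconsumes here
            if (ws) { state = AttrState::BeforeName; break; } // skip whitespace between attributes
            attrs.push_back({std::string(1, toLowerAscii(c)), ""}); // like beginAttribute + first name char
            state = AttrState::Name;
            break;
        case AttrState::Name:
            if (ws) state = AttrState::AfterName;
            else if (c == '=') state = AttrState::BeforeValue; // '=' ends the name
            else attrs.back().first += toLowerAscii(c);
            break;
        case AttrState::AfterName:
            if (ws) break;
            if (c == '=') { state = AttrState::BeforeValue; break; }
            attrs.push_back({std::string(1, toLowerAscii(c)), ""}); // previous attribute had no value
            state = AttrState::Name;
            break;
        case AttrState::BeforeValue:
            if (ws) break;
            if (c == '"') state = AttrState::DoubleQuoted;
            else if (c == '\'') state = AttrState::SingleQuoted;
            else { attrs.back().second += c; state = AttrState::Unquoted; }
            break;
        case AttrState::DoubleQuoted:
            if (c == '"') state = AttrState::AfterQuoted; // closing quote ends the value
            else attrs.back().second += c;               // whitespace is kept inside quotes
            break;
        case AttrState::SingleQuoted:
            if (c == '\'') state = AttrState::AfterQuoted;
            else attrs.back().second += c;
            break;
        case AttrState::Unquoted:
            if (ws) state = AttrState::BeforeName; // whitespace ends an unquoted value
            else attrs.back().second += c;
            break;
        }
    }
    return attrs;
}
```

With this sketch, `title="Hello World"` yields one attribute whose value contains a space, while `title=Hello World` yields two attributes: `title` with value `Hello`, and a value-less `world`.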

Case4: Plain text parsing

Plain text here means any plain text between a start tag and its end tag, including script text, CSS text, and so on, as follows:

<!-- Plain text in the div tag: Hello,World! -->
<div class=news align=center>Hello,World!</div>
<!-- Plain text in the script tag: window.name = 'Lucy'; -->
<script>window.name = 'Lucy';</script>

Parsing plain text is relatively simple: the tokenizer keeps looping in DataState, buffering each character it meets, until it reaches the '<' of the end tag. The code is as follows:

BEGIN_STATE(DataState)
    if (character == '&')
        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInDataState);
    if (character == '<') { // A tag starts while parsing text; there are two cases
        if (haveBufferedCharacterToken())
            // Case 1: text characters are buffered. Return them in the current DataState without
            // consuming '<', so the next tokenizer call still sees '<'.
            RETURN_IN_CURRENT_STATE(true);
        // Case 2: nothing is buffered. Go straight to TagOpenState to start parsing the tag, and move to the next character.
        ADVANCE_PAST_NON_NEWLINE_TO(TagOpenState);
    }
    if (character == kEndOfFileMarker)
        return emitEndOfFile(source);
    bufferCharacter(character); // Buffer the character
    ADVANCE_TO(DataState); // Loop in DataState and move to the next character
END_STATE()
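The two cases around '<' can be illustrated with a minimal sketch. It is not WebKit's tokenizer: `MiniTokenizer`, `Token` and `TokenKind` are hypothetical, tags are read naively up to the next '>', and character references are ignored. What it keeps from `DataState` is the rule that buffered text is returned first, without consuming '<'.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical mini tokenizer illustrating only the DataState logic above.
enum class TokenKind { Character, Tag };

struct Token {
    TokenKind kind;
    std::string data; // text for Character tokens, raw tag contents for Tag tokens
};

class MiniTokenizer {
public:
    explicit MiniTokenizer(std::string input) : m_input(std::move(input)) {}

    std::optional<Token> nextToken() {
        std::string buffer; // plays the role of the buffered character token
        while (m_pos < m_input.size()) {
            char c = m_input[m_pos];
            if (c == '<') {
                if (!buffer.empty())
                    return Token{TokenKind::Character, buffer}; // '<' is NOT consumed: the next call starts at it
                return readTag(); // nothing buffered: parse the tag
            }
            buffer += c; // like bufferCharacter()
            ++m_pos;     // advance and stay in "DataState"
        }
        if (!buffer.empty())
            return Token{TokenKind::Character, buffer};
        return std::nullopt; // end of input
    }

private:
    Token readTag() {
        std::size_t close = m_input.find('>', m_pos);
        if (close == std::string::npos)
            close = m_input.size(); // unterminated tag: take the rest of the input
        std::string body = m_input.substr(m_pos + 1, close - m_pos - 1);
        m_pos = close < m_input.size() ? close + 1 : close;
        return Token{TokenKind::Tag, body};
    }

    std::string m_input;
    std::size_t m_pos = 0;
};
```

For `<div>Hello,World!</div>` this sketch yields a tag token `div`, a single character token `Hello,World!`, a tag token `/div`, and then no more tokens.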

Because the process is simple, we go straight to the result of parsing the plain text in the div tag, shown below:

2.3 Creating and Adding nodes

2.3.1 Related class diagram

2.3.2 Create and Add process

In the tokenization loop described above, each time the tokenizer produces a Token, a corresponding Node is created from it and then added to the DOM tree (the HTMLDocumentParser::pumpTokenizerLoop method is described in the tokenization section above).

This happens in HTMLTreeBuilder::constructTree; the code is as follows:

// Key code only
void HTMLTreeBuilder::constructTree(AtomHTMLToken&& token)
{
    ...
    if (shouldProcessTokenInForeignContent(token))
        processTokenInForeignContent(WTFMove(token));
    else
        processToken(WTFMove(token)); // The HTMLToken is processed here
    ...
    m_tree.executeQueuedTasks(); // Queued HTMLConstructionSiteTasks run here (some tasks are also executed directly during creation)
    // The tree builder might have been destroyed as an indirect result of executing the queued tasks.
}

void HTMLConstructionSite::executeQueuedTasks()
{
    if (m_taskQueue.isEmpty()) // The queue is empty: simply return
        return;

    // Copy the task queue into a local variable in case executeTask
    // re-enters the parser.
    TaskQueue queue = WTFMove(m_taskQueue);

    for (auto& task : queue) // Run each HTMLConstructionSiteTask
        executeTask(task);

    // We might be detached now.
}
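The "copy the queue into a local variable" step matters because a task may queue more work while the loop is running. The sketch below is a simplified stand-in, assuming a hypothetical `ConstructionSite` class whose tasks are plain `std::function` callbacks rather than `HTMLConstructionSiteTask` objects. Moving the queue out first means a re-entrant `queueTask` call lands in a fresh queue instead of invalidating the vector being iterated.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for HTMLConstructionSite's task queue.
class ConstructionSite {
public:
    using Task = std::function<void()>;

    void queueTask(Task task) { m_taskQueue.push_back(std::move(task)); }

    void executeQueuedTasks() {
        if (m_taskQueue.empty())
            return;
        // Move the queue into a local first: a task may call queueTask() (re-entry),
        // which would otherwise invalidate the iterators of the loop below.
        std::vector<Task> queue = std::move(m_taskQueue);
        m_taskQueue.clear(); // a moved-from vector is valid but unspecified; make it explicitly empty
        for (auto& task : queue)
            task();
    }

    bool hasPendingTasks() const { return !m_taskQueue.empty(); }

private:
    std::vector<Task> m_taskQueue;
};
```

In this sketch, a task queued from inside another task is not run in the same pass; it waits for the next `executeQueuedTasks` call.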

In the code above, HTMLTreeBuilder::processToken is where a Token is processed to create the corresponding Node. The code is as follows:

void HTMLTreeBuilder::processToken(AtomHTMLToken&& token)
{
    switch (token.type()) {
    case HTMLToken::Uninitialized:
        ASSERT_NOT_REACHED();
        break;
    case HTMLToken::DOCTYPE: // HTML DOCTYPE tag
        m_shouldSkipLeadingNewline = false;
        processDoctypeToken(WTFMove(token));
        break;
    case HTMLToken::StartTag: // HTML start tag
        m_shouldSkipLeadingNewline = false;
        processStartTag(WTFMove(token));
        break;
    case HTMLToken::EndTag: // HTML end tag
        m_shouldSkipLeadingNewline = false;
        processEndTag(WTFMove(token));
        break;
    case HTMLToken::Comment: // HTML comment
        m_shouldSkipLeadingNewline = false;
        processComment(WTFMove(token));
        return;
    case HTMLToken::Character: // Plain text in HTML
        processCharacter(WTFMove(token));
        break;
    case HTMLToken::EndOfFile: // End of the HTML input
        m_shouldSkipLeadingNewline = false;
        processEndOfFile(WTFMove(token));
        break;
    }
}

As the code shows, 7 token types are handled. Because their handling is similar, only 5 node cases are analyzed here: the <!DOCTYPE> tag, the <html> start tag, the <title> start tag, the text of the <title> tag, and the </title> end tag. The rest of the process is shown in figures.

Case1: <!DOCTYPE> tag

// Key code only
void HTMLTreeBuilder::processDoctypeToken(AtomHTMLToken&& token)
{
    ASSERT(token.type() == HTMLToken::DOCTYPE);
    if (m_insertionMode == InsertionMode::Initial) { // m_insertionMode starts as InsertionMode::Initial
        m_tree.insertDoctype(WTFMove(token)); // Insert the DOCTYPE tag
        m_insertionMode = InsertionMode::BeforeHTML; // After the DOCTYPE, switch to BeforeHTML: the <html> tag is inserted next
        return;
    }
    ...
}

// Key code only
void HTMLConstructionSite::insertDoctype(AtomHTMLToken&& token)
{
    ...
    // m_attachmentRoot is the Document object; attachLater creates an HTMLConstructionSiteTask
    attachLater(m_attachmentRoot, DocumentType::create(m_document, token.name(), publicId, systemId));
    ...
}

// Key code only
void HTMLConstructionSite::attachLater(ContainerNode& parent, Ref<Node>&& child, bool selfClosing)
{
    ...
    HTMLConstructionSiteTask task(HTMLConstructionSiteTask::Insert); // Create an HTMLConstructionSiteTask
    task.parent = &parent; // The task holds the parent of the current node
    task.child = WTFMove(child); // The task holds the node to operate on
    task.selfClosing = selfClosing;

    // Add as a sibling of the parent if we have reached the maximum depth allowed.
    // m_openElements is the HTMLElementStack; it is not used here and is covered later.
    // The depth is limited by m_maximumDOMTreeDepth (512), so if an HTML tag nests too many
    // child tags, the excess ones are attached as siblings of their parent instead.
    if (m_openElements.stackDepth() > m_maximumDOMTreeDepth && task.parent->parentNode())
        task.parent = task.parent->parentNode();

    ASSERT(task.parent);
    m_taskQueue.append(WTFMove(task)); // Add the task to the queue
}
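The depth check at the end of attachLater can be illustrated with a small model. This is a sketch, not WebKit's Node API: `MiniNode`, `chooseParent`, `attach` and `depthOf` are hypothetical, and the stack depth is passed in explicitly. Once the open-elements stack is deeper than the limit, each new node is attached to its intended parent's parent, so the tree stops growing deeper and the extra nodes become siblings.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal node; not WebKit's Node/ContainerNode.
struct MiniNode {
    std::string name;
    MiniNode* parent = nullptr;
    std::vector<std::unique_ptr<MiniNode>> children;
};

// Mirrors the depth check in attachLater: past maxDepth, attach to the grandparent,
// i.e. the new node becomes a sibling of the intended parent.
MiniNode* chooseParent(MiniNode* intendedParent, std::size_t stackDepth, std::size_t maxDepth) {
    if (stackDepth > maxDepth && intendedParent->parent)
        return intendedParent->parent;
    return intendedParent;
}

MiniNode* attach(MiniNode* intendedParent, std::string name, std::size_t stackDepth, std::size_t maxDepth) {
    MiniNode* parent = chooseParent(intendedParent, stackDepth, maxDepth);
    auto child = std::make_unique<MiniNode>();
    child->name = std::move(name);
    child->parent = parent;
    parent->children.push_back(std::move(child));
    return parent->children.back().get();
}

std::size_t depthOf(const MiniNode* node) {
    std::size_t depth = 0;
    for (; node->parent; node = node->parent)
        ++depth;
    return depth;
}
```

For example, nesting six `div`s with a limit of 3 (instead of 512) in this model produces a tree only 4 levels deep, with the last three `div`s as siblings under the third one.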

The code shows that the DOCTYPE node is only created here; it is not actually added to the tree yet. The insertion really happens when HTMLConstructionSite::executeQueuedTasks runs, the method listed at the beginning of this section. Let’s look at how each task is executed.

// Located in HTMLConstructionSite.cpp
static inline void executeTask(HTMLConstructionSiteTask& task)
{
    switch (task.operation) { // The task stores its own operation; building a DOM tree usually uses Insert
    case HTMLConstructionSiteTask::Insert:
        executeInsertTask(task); // Insert
        return;
    // All the cases below this point are only used by the adoption agency.
    case HTMLConstructionSiteTask::InsertAlreadyParsedChild:
        executeInsertAlreadyParsedChildTask(task);
        return;
    case HTMLConstructionSiteTask::Reparent:
        executeReparentTask(task);
        return;
    case HTMLConstructionSiteTask::TakeAllChildrenAndReparent:
        executeTakeAllChildrenAndReparentTask(task);
        return;
    }
    ASSERT_NOT_REACHED();
}

// Key code only; located in HTMLConstructionSite.cpp
static inline void executeInsertTask(HTMLConstructionSiteTask& task)
{
    ASSERT(task.operation == HTMLConstructionSiteTask::Insert);
    insert(task); // Continue by calling insert
    ...
}

// Key code only; located in HTMLConstructionSite.cpp
static inline void insert(HTMLConstructionSiteTask& task)
{
    ...
    ASSERT(!task.child->parentNode());
    if (task.nextChild)
        task.parent->parserInsertBefore(*task.child, *task.nextChild);
    else
        task.parent->parserAppendChild(*task.child); // Call the parent node's method to insert the child
}

// Key code only
void ContainerNode::parserAppendChild(Node& newChild)
{
    ...
    executeNodeInsertionWithScriptAssertion(*this, newChild, ChildChange::Source::Parser, ReplacedAllChildren::No, [&] {
        if (&document() != &newChild.document())
            document().adoptNode(newChild);
        appendChildCommon(newChild); // Called inside the lambda to continue the insertion
        ...
    });
}

// The insertion finally ends up in this method
void ContainerNode::appendChildCommon(Node& child)
{
    ScriptDisallowedScope::InMainThread scriptDisallowedScope;
    child.setParentNode(this);
    if (m_lastChild) { // The parent already has children: link the new child after the current last child
        child.setPreviousSibling(m_lastChild);
        m_lastChild->setNextSibling(&child);
    } else
        m_firstChild = &child; // This is the parent's first child
    m_lastChild = &child; // Update m_lastChild
}
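The linking steps in appendChildCommon can be reproduced on a toy node type. This is a minimal sketch, assuming a hypothetical `LinkedNode` struct that keeps only the fields the method touches (parent, first/last child, previous/next sibling). It is not WebKit's `ContainerNode`, and it does no memory management.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node with the same linking fields as ContainerNode.
struct LinkedNode {
    std::string name;
    LinkedNode* parent = nullptr;
    LinkedNode* firstChild = nullptr;
    LinkedNode* lastChild = nullptr;
    LinkedNode* previousSibling = nullptr;
    LinkedNode* nextSibling = nullptr;

    explicit LinkedNode(std::string n) : name(std::move(n)) {}

    // Same steps as appendChildCommon: set the parent, link after the old last child, update lastChild.
    void appendChild(LinkedNode& child) {
        child.parent = this;
        if (lastChild) {
            child.previousSibling = lastChild;
            lastChild->nextSibling = &child;
        } else {
            firstChild = &child; // first child of this node
        }
        lastChild = &child;
    }

    // Walk the children from firstChild through the nextSibling links.
    std::vector<std::string> childNames() const {
        std::vector<std::string> names;
        for (LinkedNode* c = firstChild; c; c = c->nextSibling)
            names.push_back(c->name);
        return names;
    }
};
```

Appending `head` and then `body` to `html` in this sketch leaves `html.firstChild == &head`, `html.lastChild == &body`, and the two siblings linked to each other.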

After executing the above method, the DOM tree that used to have a single root node looks like this:

Case2: <html> start tag

// processStartTag handles many states internally; only the key code is kept here
void HTMLTreeBuilder::processStartTag(AtomHTMLToken&& token)
{
    ASSERT(token.type() == HTMLToken::StartTag);
    switch (m_insertionMode) {
    case InsertionMode::Initial:
        defaultForInitial();
        ASSERT(m_insertionMode == InsertionMode::BeforeHTML);
        FALLTHROUGH;
    case InsertionMode::BeforeHTML:
        if (token.name() == htmlTag) { // The <html> tag is handled here
            m_tree.insertHTMLHtmlStartTagBeforeHTML(WTFMove(token));
            m_insertionMode = InsertionMode::BeforeHead; // After <html> is inserted, switch to BeforeHead: the <head> tag is handled next
            return;
        }
        ...
    }
}

// Key code only
void HTMLConstructionSite::insertHTMLHtmlStartTagBeforeHTML(AtomHTMLToken&& token)
{
    auto element = HTMLHtmlElement::create(m_document); // Create the html node
    setAttributes(element, token, m_parserContentPolicy);
    attachLater(m_attachmentRoot, element.copyRef()); // attachLater is called here too, as for the DOCTYPE
    m_openElements.pushHTMLHtmlElement(HTMLStackItem::create(element.copyRef(), WTFMove(token))); // Note: the html start tag is pushed onto the HTMLElementStack
    executeQueuedTasks(); // The task runs right away here, so the later executeQueuedTasks call in HTMLTreeBuilder::constructTree returns immediately
    ...
}
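The insertion-mode transitions in Case1 and Case2 can be summarized with a toy state machine. This is only a sketch: `MiniTreeBuilder` and its members are hypothetical, it covers just the DOCTYPE token and the `html` start tag, and it ignores what the real tree builder does for any other token in these modes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, tiny subset of the tree builder's insertion modes.
enum class InsertionMode { Initial, BeforeHTML, BeforeHead };

struct MiniTreeBuilder {
    InsertionMode mode = InsertionMode::Initial;
    std::vector<std::string> documentChildren; // what gets attached to the Document
    std::vector<std::string> openElements;     // stand-in for the HTMLElementStack

    void processDoctype() {
        if (mode == InsertionMode::Initial) {
            documentChildren.push_back("!DOCTYPE");
            mode = InsertionMode::BeforeHTML; // the <html> tag comes next
        }
    }

    void processStartTag(const std::string& name) {
        if (mode == InsertionMode::Initial)
            mode = InsertionMode::BeforeHTML; // like defaultForInitial(), then fall through
        if (mode == InsertionMode::BeforeHTML && name == "html") {
            documentChildren.push_back("html");
            openElements.push_back("html");   // the html element is pushed onto the stack
            mode = InsertionMode::BeforeHead; // the <head> tag is handled next
        }
    }
};
```

Feeding this sketch a DOCTYPE and then `<html>` leaves it in `BeforeHead`, with two children on the document and `html` on the stack.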

After executing the code above, the DOM tree should look like the following:

Case3: <title> start tag

After the <title> start tag is inserted, the DOM tree and the HTMLElementStack m_openElements look like this:

Case4: Text of the <title> tag

The text of the <title> tag is inserted as a text node. The code that generates the text node is as follows:

// Key code only
void HTMLConstructionSite::insertTextNode(const String& characters, WhitespaceMode whitespaceMode)
{
    HTMLConstructionSiteTask task(HTMLConstructionSiteTask::Insert);
    task.parent = &currentNode(); // Take the top element of the HTMLElementStack m_openElements as the parent

    unsigned currentPosition = 0;
    unsigned lengthLimit = shouldUseLengthLimit(*task.parent) ? Text::defaultLengthLimit : std::numeric_limits<unsigned>::max(); // Limits the number of characters a text node can hold to 65536

    // If the text is too long, it is split across several text nodes
    while (currentPosition < characters.length()) {
        AtomString charactersAtom = m_whitespaceCache.lookup(characters, whitespaceMode);
        auto textNode = Text::createWithLengthLimit(task.parent->document(), charactersAtom.isNull() ? characters : charactersAtom.string(), currentPosition, lengthLimit);
        // If we have a whole string of unbreakable characters the above could lead to an infinite loop. Exceeding the length limit is the lesser evil.
        if (!textNode->length()) {
            String substring = characters.substring(currentPosition);
            AtomString substringAtom = m_whitespaceCache.lookup(substring, whitespaceMode);
            textNode = Text::create(task.parent->document(), substringAtom.isNull() ? substring : substringAtom.string()); // Generate the text node
        }

        currentPosition += textNode->length(); // Start position of the next text node
        ASSERT(currentPosition <= characters.length());
        task.child = WTFMove(textNode);

        executeTask(task); // Execute the insert task directly
    }
}
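The splitting loop in insertTextNode can be sketched on a plain `std::string`. This is a simplified model under stated assumptions: `splitIntoTextNodes` is a hypothetical helper, chunks are counted in bytes, and any rule WebKit's `Text::createWithLengthLimit` may apply about where a split is allowed is ignored. Like the `!textNode->length()` fallback, it never produces an empty chunk, so the loop always advances.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `characters` into chunks of at most `lengthLimit` characters,
// as the while loop in insertTextNode does.
std::vector<std::string> splitIntoTextNodes(const std::string& characters, std::size_t lengthLimit) {
    std::vector<std::string> nodes;
    std::size_t currentPosition = 0;
    while (currentPosition < characters.size()) {
        std::size_t length = std::min(lengthLimit, characters.size() - currentPosition);
        if (length == 0) // a limit of 0 would never advance: take the rest (exceeding the limit is the lesser evil)
            length = characters.size() - currentPosition;
        nodes.push_back(characters.substr(currentPosition, length));
        currentPosition += length; // start of the next text node
    }
    return nodes;
}
```

With the 65536-character limit, 85248 characters of text become two chunks of 65536 and 19712 characters, which matches the two text nodes Safari produced in the example.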

As the code shows, if a node is followed by too many text characters, the text is inserted as multiple text nodes. In the following example, the text after the <title> node is 85248 characters long, and Safari does generate 2 text nodes:

![images](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/584362db5ed04e2fb9d761f8d6f2d83c~tplv-k3u1fbpfcp-zoom-1.image)

Case5: </title> end tag

When the </title> end tag is encountered, it is handled by the following code:

// There is a lot of state handling inside; only the key code is kept here
void HTMLTreeBuilder::processEndTag(AtomHTMLToken&& token)
{
    ASSERT(token.type() == HTMLToken::EndTag);
    switch (m_insertionMode) {
    ...
    case InsertionMode::Text: // The title text was inserted before the </title> end tag was reached, so the insertion mode is InsertionMode::Text
        m_tree.openElements().pop(); // Pop the top of the HTMLElementStack, i.e. title
        m_insertionMode = m_originalInsertionMode; // Restore the previous insertion mode
        break;
    ...
    }
}

Whenever an end tag is encountered, the top element of the HTMLElementStack m_openElements is popped as shown above. After this code runs, the DOM tree and the HTMLElementStack look like this:
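The push/pop behaviour of the open-elements stack can be modelled in a few lines. This is a sketch with a hypothetical `OpenElements` struct that assumes well-nested input. One detail comes from the HTML specification rather than the code above: the `</body>` and `</html>` end tags only switch the insertion mode ("after body", then "after after body") and do not pop. That is why html and body are still on the stack when parsing finishes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the open-elements stack around start and end tags.
struct OpenElements {
    std::vector<std::string> stack;

    void startTag(const std::string& name) { stack.push_back(name); } // push on a start tag

    void endTag(const std::string& name) {
        if (name == "body" || name == "html")
            return; // in the real tree builder this only changes the insertion mode
        if (!stack.empty() && stack.back() == name)
            stack.pop_back(); // an ordinary end tag pops the matching top element
    }
};
```

Running a small document through this model ends with exactly `html` and `body` left on the stack.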

DOM tree in memory

When the entire DOM tree is built, the DOM tree and HTMLElementStack m_openElements look like the following:

As you can see from the figure above, when the DOM is built, HTMLElementStack m_openElements does not empty the stack completely, but keeps two nodes: the HTML node and the body node. This can be seen from Xcode’s console output:

You can also see that the in-memory DOM tree differs from the logical DOM tree drawn at the beginning of this article. In the logical DOM tree, a parent node has one pointer per child. In the in-memory DOM tree, a parent node always has exactly two child pointers, m_firstChild and m_lastChild, no matter how many children it has. In addition, sibling nodes in the in-memory DOM tree point to each other, which is not the case in the logical DOM tree.

For example, suppose a DOM tree has one parent node with 100 children. In the logical structure, the parent needs 100 pointers to its children; at 8 bytes per pointer, that is 800 bytes. In the in-memory representation, the parent needs 2 pointers to its children, and the siblings need 198 pointers to each other (99 adjacent pairs, each linked by a next and a previous pointer), for a total of 200 pointers, or 1600 bytes. So in terms of memory, the in-memory structure has no advantage over the logical one. Its advantage is that the parent needs only two child pointers regardless of how many children it has, so adding children never requires frequently reallocating memory for new child pointers.
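The pointer arithmetic above can be checked with two small functions. This is a sketch: the helper names are hypothetical, the 8-byte pointer size is the assumption stated in the text, and, like the text, it counts only the non-null links between nodes.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPointerSize = 8; // bytes per pointer, as assumed in the text

// Logical model: the parent keeps one pointer per child.
constexpr std::size_t arrayModelChildPointers(std::size_t childCount) { return childCount; }

// In-memory model: the parent keeps firstChild + lastChild, and each of the
// (childCount - 1) adjacent sibling pairs is linked by nextSibling + previousSibling.
constexpr std::size_t linkedModelChildPointers(std::size_t childCount) {
    return childCount == 0 ? 0 : 2 + 2 * (childCount - 1);
}

static_assert(arrayModelChildPointers(100) * kPointerSize == 800, "100 children: 800 bytes");
static_assert(linkedModelChildPointers(100) * kPointerSize == 1600, "100 children: 1600 bytes");
```

For n ≥ 1 children, the in-memory model uses 2n links, twice the logical model. The trade-off is that the parent's own footprint stays fixed at two pointers.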

———- END ———-

Baidu Geek Talk



In-depth understanding of WKWebView (Render) – DOM tree construction

What is a DOM tree

DOM tree construction

2.1 Decoding

2.1.1 Decoding class diagram

2.1.2 Decoding process

2.1.3 Decoding summary

2.2 Tokenization

2.2.1 Tokenization class diagram

2.2.2 Tokenization process

2.3 Creating and Adding nodes

2.3.1 Related class diagram

2.3.2 Create and Add process

DOM tree in memory
