Document Object Model (DOM)
The Document Object Model (DOM) links Web pages to scripts or programming languages. The DOM represents a document as a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods give programmatic access to the tree, so you can change the structure, style, and content of the document. Nodes can also have event handlers attached; a handler runs once its event is triggered.
The rendering engine cannot directly understand the HTML byte stream it receives from the network, so it has to convert that stream into an internal structure it can work with: the DOM. The DOM provides a structured representation of the HTML document. Within the rendering engine, the DOM plays three roles:
- From the page's perspective, the DOM is the underlying data structure from which the page is generated.
- From the JavaScript perspective, the DOM provides an interface through which scripts can access the DOM structure and change the document's structure, style, and content.
- From a security perspective, the DOM is a line of defense: some unsafe content is filtered out during the DOM parsing phase.
In short, the DOM is an internal data structure that represents HTML, connects Web pages to JavaScript, and filters out unsafe content.
DOM and JavaScript
The DOM is not a programming language, but without it, JavaScript would have no concept or model of web pages, XML pages, or the elements they contain. Every element in a document (the document as a whole, its head, the tables in it, their headers, the text in their cells) is part of that document's DOM, so all of it can be accessed and manipulated through the DOM with a scripting language such as JavaScript.
In the beginning, JavaScript and the DOM were tightly intertwined, but they eventually evolved into two separate entities. Because JavaScript can access and manipulate the content stored in the DOM, we can write this approximate equation:
API (Web or XML pages) = DOM + JS (scripting language)
How is a DOM tree generated
Inside the rendering engine, a module called the HTML parser (HTMLParser) is responsible for converting the HTML byte stream into the DOM structure.
The HTML parser does not wait for the entire document to finish loading. Instead, it parses incrementally: whatever data the network process has loaded so far is parsed right away.
Here is how the data reaches the parser. After receiving the response headers, the network process determines the file type from the Content-Type field. For example, if Content-Type is “text/html”, the browser concludes that the file is an HTML document and picks the corresponding parser. It then selects or creates a renderer process for the request. Once the renderer process is ready, a data pipe is set up between the network process and the renderer process: the network process writes the data it receives into one end of the pipe, and the renderer process reads from the other end and passes the data to the HTML parser.
Think of this pipe as a water pipe: the byte stream received by the network process flows in like water, and at the other end sits the renderer process's HTML parser, which receives the byte stream as it arrives and parses it into the DOM.
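To make the “parse as it arrives” idea concrete, here is a minimal JavaScript sketch. It is not Chrome's code: the chunk boundaries are made up, the `StreamingTokenizer` class is hypothetical, and it only understands bare tags and text (no attributes, comments, or scripts). The point is that a tag split across two chunks is buffered until the rest of it arrives.

```javascript
// Hypothetical sketch: bytes arrive from the "pipe" in arbitrary chunks,
// and the tokenizer emits every token it can complete, keeping any
// unfinished tag or text in a buffer until the next chunk arrives.
// Only bare tags and text are understood (no attributes or comments).
class StreamingTokenizer {
  constructor(onToken) {
    this.buffer = "";
    this.onToken = onToken;
  }

  // Called each time the network side pushes more data into the pipe.
  write(chunk) {
    this.buffer += chunk;
    this.drain(false);
  }

  // Called once the network side closes the pipe.
  end() {
    this.drain(true);
  }

  drain(isFinal) {
    while (this.buffer.length > 0) {
      if (this.buffer[0] === "<") {
        const close = this.buffer.indexOf(">");
        if (close === -1) return; // tag not complete yet: wait for more data
        const tag = this.buffer.slice(1, close);
        this.buffer = this.buffer.slice(close + 1);
        this.onToken(
          tag.startsWith("/")
            ? { type: "EndTag", name: tag.slice(1) }
            : { type: "StartTag", name: tag }
        );
      } else {
        const next = this.buffer.indexOf("<");
        // Text may continue in the next chunk, so only emit it once a "<"
        // or the end of the stream shows where it stops.
        if (next === -1 && !isFinal) return;
        const end = next === -1 ? this.buffer.length : next;
        this.onToken({ type: "Text", text: this.buffer.slice(0, end) });
        this.buffer = this.buffer.slice(end);
      }
    }
  }
}

// The same small page, cut at awkward places as network chunks might be.
const tokens = [];
const tokenizer = new StreamingTokenizer((token) => tokens.push(token));
for (const chunk of ["<ht", "ml><body><di", "v>he", "llo</div></bo", "dy></html>"]) {
  tokenizer.write(chunk);
}
tokenizer.end();
console.log(tokens.map((t) => `${t.type}:${t.name ?? t.text}`).join(" "));
```

Although `<ht` and `ml>` arrive in different chunks, the tokenizer still emits a single StartTag html token, and the text "hello" is emitted only once the following `<` shows where it ends.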
As the figure shows, converting the byte stream into the DOM takes three stages.
The three stages of parsing HTML
In the first stage, the tokenizer converts the byte stream into tokens.
This works much like the lexical analysis step of a compiler. The tokens come in two kinds, Tag Tokens and Text Tokens. The tokens produced by lexically analyzing the HTML code are shown in the figure below:
Tag Tokens are further divided into StartTag and EndTag tokens: `<div>` produces a StartTag div token, and `</div>` produces an EndTag div token.
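A simplified tokenizer makes the two token kinds easier to see. The sketch below uses a hypothetical `tokenize` function built on a regular expression, not the state machine the HTML specification defines; it drops attributes and does not handle comments, doctypes, or self-closing tags, so treat it as an illustration only.

```javascript
// Simplified tokenizer sketch: splits an HTML string into StartTag,
// EndTag and Text tokens with a regular expression. Attributes are
// dropped; comments, doctypes and self-closing tags are not handled.
function tokenize(html) {
  const tokens = [];
  // Group 1: "/" for an end tag, group 2: tag name, group 3: text.
  const pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  for (const [, slash, name, text] of html.matchAll(pattern)) {
    if (name !== undefined) {
      tokens.push({ type: slash ? "EndTag" : "StartTag", name: name.toLowerCase() });
    } else if (text.trim() !== "") {
      // Whitespace-only text between tags is skipped to keep output short.
      tokens.push({ type: "Text", text });
    }
  }
  return tokens;
}

console.log(tokenize('<div class="box">hello</div>'));
```

For `<div class="box">hello</div>` it returns three tokens: StartTag div, a Text Token "hello", and EndTag div; the class attribute is simply discarded in this sketch.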
In the second stage, tokens are turned into DOM nodes.
The HTML parser maintains a Token stack, which it uses to work out the parent-child relationships between nodes. The tokens produced in the first stage are processed in order according to the following rules:
- If the token is a StartTag, the HTML parser pushes it onto the stack, creates a DOM node for it, and adds that node to the DOM tree. The node's parent is the node created for the token just below it in the stack.
- If the token is a Text Token, the parser creates a text node and adds it to the DOM tree. Text Tokens are not pushed onto the stack; the text node's parent is the DOM node corresponding to the token at the top of the stack.
- If the token is an EndTag, such as EndTag div, the HTML parser checks whether the token at the top of the stack is StartTag div. If it is, the parser pops StartTag div off the stack, which means the div element has been fully parsed.
In this way, the tokens produced by the tokenizer are continually pushed onto and popped off the stack, and parsing continues until the tokenizer has processed the entire byte stream.
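The three rules above translate almost directly into code. In this sketch, `buildTree` and the plain-object node shape are hypothetical, the tokens are written out by hand so the example stands alone, and a mismatched EndTag simply throws, whereas a real HTML parser has detailed error-recovery rules.

```javascript
// Sketch of the Token-stack rules. Tokens are written out by hand so the
// example stands alone; real parsers recover from mismatched end tags,
// whereas this sketch simply throws.
function buildTree(tokens) {
  const documentNode = { name: "#document", children: [] };
  // The bottom entry of the stack stands for the document itself.
  const stack = [{ name: "#document", node: documentNode }];

  for (const token of tokens) {
    const top = stack[stack.length - 1];
    if (token.type === "StartTag") {
      // New element node; its parent is the node for the current top entry.
      const node = { name: token.name, children: [] };
      top.node.children.push(node);
      stack.push({ name: token.name, node });
    } else if (token.type === "Text") {
      // Text nodes are attached under the top entry but never pushed.
      top.node.children.push({ name: "#text", text: token.text });
    } else if (token.type === "EndTag") {
      // An EndTag must match the StartTag on top of the stack; pop it.
      if (top.name !== token.name) {
        throw new Error(`Unexpected </${token.name}>, expected </${top.name}>`);
      }
      stack.pop();
    }
  }
  return documentNode;
}

const tree = buildTree([
  { type: "StartTag", name: "html" },
  { type: "StartTag", name: "body" },
  { type: "StartTag", name: "div" },
  { type: "Text", text: "1" },
  { type: "EndTag", name: "div" },
  { type: "EndTag", name: "body" },
  { type: "EndTag", name: "html" },
]);
console.log(JSON.stringify(tree, null, 2));
```

Keeping the document itself at the bottom of the stack means every StartTag always has a parent to attach to, which is why the parser described below pushes a StartTag document token before anything else.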
In the third stage, the DOM nodes are added to the DOM tree.
The created DOM nodes are attached to the document, which forms the DOM tree.
The HTML parsing process in detail
When the HTML parser starts, it creates an empty DOM structure whose root is document, and pushes a StartTag document token onto the bottom of the stack. Next, the first token produced by the tokenizer, StartTag html, is pushed onto the stack, and an html DOM node is created and added to document, as shown in the figure below:
StartTag body and StartTag div are then parsed the same way, leaving the Token stack and the DOM in the state shown in the figure below:
Next comes the Text Token for the content of the first div. The rendering engine creates a text node for it and adds it to the DOM; the node's parent is the node corresponding to the token at the top of the stack, as shown in the figure below:
The parser then reaches the first EndTag div and checks whether the token at the top of the stack is StartTag div. It is, so StartTag div is popped off the stack, as shown in the figure below:
Parsing continues by the same rules until the end of the document, and the final result is shown in the figure below:
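Since the figures for this walkthrough are not reproduced here, the sketch below prints the Token stack after each token instead. It assumes a hypothetical demo page, `<html><body><div>1</div><div>test</div></body></html>`, chosen to match the steps described above; both that markup and the `traceStack` helper are stand-ins, not the original demo.

```javascript
// Print the Token stack after each token of a hypothetical demo page.
function traceStack(tokens) {
  const stack = ["document"]; // StartTag document sits at the bottom
  const lines = [];
  for (const token of tokens) {
    if (token.type === "StartTag") {
      stack.push(token.name);
    } else if (token.type === "EndTag") {
      if (stack[stack.length - 1] !== token.name) {
        throw new Error(`mismatched </${token.name}>`);
      }
      stack.pop();
    }
    // Text tokens leave the stack unchanged.
    const label = token.type === "Text" ? `Text "${token.text}"` : `${token.type} ${token.name}`;
    lines.push(`${label.padEnd(16)} stack: [${stack.join(", ")}]`);
  }
  return lines;
}

// Tokens for <html><body><div>1</div><div>test</div></body></html>
const demoTokens = [
  ["StartTag", "html"], ["StartTag", "body"], ["StartTag", "div"], ["Text", "1"],
  ["EndTag", "div"], ["StartTag", "div"], ["Text", "test"], ["EndTag", "div"],
  ["EndTag", "body"], ["EndTag", "html"],
].map(([type, value]) => (type === "Text" ? { type, text: value } : { type, name: value }));

const trace = traceStack(demoTokens);
console.log(trace.join("\n"));
```

The stack grows to `[document, html, body, div]`, stays the same for the Text Token, shrinks back when EndTag div arrives, and ends with only `document` left once EndTag html has been processed.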
You should now have a good idea of how the DOM is generated. In a real production environment, however, an HTML file references not only CSS and JavaScript but also images, audio, and video, so processing is far more complicated than in the demo above. Now that we understand how the simple demo is processed, we can move on to more complex scenarios.
How does JavaScript affect DOM generation
If the page contains an inline script or references an external script file, parsing proceeds a little differently.
Up to the script tag, parsing works just as before. When the parser reaches the script tag, however, the rendering engine recognizes it as a script: the HTML parser pauses DOM parsing and the JavaScript engine takes over, because the script might modify the DOM structure generated so far.
If the script is loaded from an external file, the JavaScript code must be downloaded first. Pay close attention to this download step: downloading a JavaScript file blocks DOM parsing, and downloads are often slow because of network conditions, file size, and similar factors.
If the script is embedded directly in the page, it is executed immediately.
For example, if the script modifies the content of a div that has already been parsed, the corresponding div node in the DOM reflects that change once the script has run. After the script finishes, the HTML parser resumes and parses the rest of the content until the final DOM is generated.
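The pause, run, resume behavior can be modeled in a few lines. In this hypothetical sketch, a `Script` token stands in for a `<script>` element and its `run` function plays the role of the script: it sees only the nodes created so far, edits the first div's text, and then parsing resumes. A real script would use DOM APIs such as `document.querySelector`; `findAll` and `parseWithScripts` are hand-written stand-ins.

```javascript
// Sketch: parsing pauses at a Script token, the (hypothetical) script runs
// against the partial tree, then parsing resumes. EndTags are not checked
// for mismatches here, to keep the focus on the pause.
function parseWithScripts(tokens) {
  const root = { name: "#document", children: [] };
  const stack = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    switch (token.type) {
      case "StartTag": {
        const node = { name: token.name, children: [] };
        parent.children.push(node);
        stack.push(node);
        break;
      }
      case "Text":
        parent.children.push({ name: "#text", text: token.text });
        break;
      case "EndTag":
        stack.pop();
        break;
      case "Script":
        // Parsing is suspended here: nothing after this token exists yet.
        token.run(root);
        break;
    }
  }
  return root;
}

// Hypothetical stand-in for a DOM query: collect nodes with a given name.
function findAll(node, name, out = []) {
  if (node.name === name) out.push(node);
  for (const child of node.children ?? []) findAll(child, name, out);
  return out;
}

const seenByScript = [];
const result = parseWithScripts([
  { type: "StartTag", name: "body" },
  { type: "StartTag", name: "div" },
  { type: "Text", text: "1" },
  { type: "EndTag", name: "div" },
  {
    type: "Script",
    run(root) {
      const divs = findAll(root, "div");
      seenByScript.push(divs.length); // only the first div exists so far
      divs[0].children[0].text = "changed by script";
    },
  },
  { type: "StartTag", name: "div" },
  { type: "Text", text: "test" },
  { type: "EndTag", name: "div" },
  { type: "EndTag", name: "body" },
]);
console.log(JSON.stringify(result, null, 2));
```

The script finds exactly one div, because the second one has not been parsed yet; its edit survives into the finished tree, and the second div is added normally once parsing resumes.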
There is another case to consider. JavaScript can also change the page's CSS styles, which it does by manipulating the CSSOM. So before a script executes, all the CSS that precedes it in the document must be parsed. If the page references an external CSS file before the script, the script cannot execute until that file has been downloaded and parsed into the CSSOM.
The JavaScript engine cannot know whether a script manipulates the CSSOM until it has parsed the script. So whenever the rendering engine encounters a script, it first downloads and parses the preceding CSS files, whether or not the script actually touches the CSSOM, and only then executes the JavaScript. In other words, JavaScript execution depends on style sheets.
From this analysis we can see that JavaScript blocks DOM generation and style sheets block JavaScript execution. In real projects, you should therefore pay close attention to how JavaScript files and style sheets are loaded, because using them carelessly hurts page performance.
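To see how these dependencies add up, here is a toy timing model with made-up numbers. Its rules are assumptions that simplify the behavior described above: a stylesheet does not pause the parser but must be ready before any later script runs, and a script pauses the parser until it has downloaded, every earlier stylesheet is ready, and it has executed. Real browsers overlap work in more ways (for example through pre-parsing), so only the shape of the result matters; `simulate`, `style.css`, and `app.js` are hypothetical.

```javascript
// Toy timing model; every number is made up and in milliseconds.
// Assumed rules:
//  - the parser spends `step` ms reaching each resource;
//  - a stylesheet starts downloading when reached, does not pause the
//    parser, and is ready (CSSOM built) after download + parse;
//  - a script pauses the parser until it is downloaded (inline: 0) and
//    every earlier stylesheet is ready, then runs for `execute` ms.
function simulate(resources, step = 1) {
  let now = 0;
  let cssReadyAt = 0; // when all stylesheets seen so far are ready
  const log = [];
  for (const r of resources) {
    now += step;
    if (r.kind === "css") {
      const readyAt = now + r.download + r.parse;
      cssReadyAt = Math.max(cssReadyAt, readyAt);
      log.push(`${r.name} ready at ${readyAt}`);
    } else if (r.kind === "script") {
      const downloadedAt = now + (r.download ?? 0);
      const start = Math.max(downloadedAt, cssReadyAt);
      now = start + r.execute;
      log.push(`${r.name} runs ${start}-${now}`);
    }
  }
  return { parserDoneAt: now, log };
}

const css = { kind: "css", name: "style.css", download: 100, parse: 20 };
const script = { kind: "script", name: "app.js", download: 50, execute: 30 };
const inline = { kind: "script", name: "inline script", execute: 5 };

console.log(simulate([css, script])); // the script waits for the stylesheet
console.log(simulate([script])); // no stylesheet to wait for
console.log(simulate([css, inline])); // even an inline script waits
```

With these numbers, `app.js` finishes downloading at 52 ms but cannot start until `style.css` is ready at 121 ms, so the parser is held up until 151 ms; without the stylesheet it would finish at 81 ms. The inline script has nothing to download, yet it still waits for the stylesheet and finishes at 126 ms.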
Optimization in the parsing process
One of the main optimizations Chrome makes to keep pages from blocking is pre-parsing. When the rendering engine receives the byte stream, it starts a pre-parsing thread that scans the HTML for references to JavaScript, CSS, and other related files. As soon as it finds them, the pre-parsing thread starts downloading those files in advance.
Back to DOM parsing: we know that JavaScript blocks DOM parsing, but there are strategies to reduce the impact, such as serving JavaScript files from a CDN to speed up loading and compressing them to reduce their size. In addition, if a JavaScript file contains no code that manipulates the DOM, you can load it asynchronously by marking the script tag with async or defer, as follows:
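A rough idea of what the pre-parsing thread looks for is sketched below. The `preloadScan` function is hypothetical and uses regular expressions where a real scanner runs a proper tokenizer; it only collects external scripts and stylesheets, and the file names in the sample page are made up.

```javascript
// Hypothetical sketch of a pre-parse scan: look ahead through the markup
// for external scripts and stylesheets so their downloads can start
// before the main parser reaches them. Regexes are a simplification.
function preloadScan(html) {
  const found = [];
  const tagPattern = /<(script|link)\b([^>]*)>/gi;
  for (const [, tagName, attrs] of html.matchAll(tagPattern)) {
    // Read a quoted attribute value, e.g. src="foo.js" or src='foo.js'.
    const attr = (name) => {
      const m = attrs.match(new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, "i"));
      return m ? m[1] : null;
    };
    const tag = tagName.toLowerCase();
    if (tag === "script" && attr("src")) {
      found.push({ type: "script", url: attr("src") });
    } else if (tag === "link" && (attr("rel") ?? "").toLowerCase() === "stylesheet" && attr("href")) {
      found.push({ type: "style", url: attr("href") });
    }
  }
  return found;
}

const page = `
<html><head>
  <link rel="stylesheet" href="theme.css">
  <script type="text/javascript" src='foo.js'></script>
</head><body><script>console.log("inline")</script></body></html>`;

console.log(preloadScan(page));
```

The inline script at the end has no `src`, so there is nothing to fetch for it; only `theme.css` and `foo.js` are reported, in document order.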
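```html
<!-- async: downloaded in parallel with parsing, run as soon as it arrives -->
<script async type="text/javascript" src='foo.js'></script>

<!-- defer: downloaded in parallel with parsing, run after parsing finishes -->
<script defer type="text/javascript" src='foo.js'></script>
```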
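Both async and defer load the script asynchronously, but they differ in when the script runs. A script marked async executes as soon as it finishes downloading, so several async scripts may run in any order. A script marked defer executes after the document has been parsed and before the DOMContentLoaded event fires, and several defer scripts run in document order.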
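The difference is easiest to see on a timeline. This toy model uses made-up download times and assumes that executing a script takes no time and that parsing ends at a fixed moment; `executionOrder` is a hypothetical helper, not a browser API, and the file names are placeholders.

```javascript
// Toy model of when async and defer scripts run; times are made up (ms).
// Assumptions: running a script takes no time, and HTML parsing ends at
// `parseEnd` because neither attribute makes the parser wait.
function executionOrder(scripts, parseEnd) {
  const events = [];
  // async: each script runs as soon as its own download finishes.
  for (const s of scripts.filter((x) => x.mode === "async")) {
    events.push({ at: s.loadedAt, what: `run ${s.src} (async)` });
  }
  // defer: after parsing, in document order, each once it is downloaded.
  let t = parseEnd;
  for (const s of scripts.filter((x) => x.mode === "defer")) {
    t = Math.max(t, s.loadedAt);
    events.push({ at: t, what: `run ${s.src} (defer)` });
  }
  // DOMContentLoaded fires once the deferred scripts have run.
  events.push({ at: t, what: "DOMContentLoaded" });
  // Array.prototype.sort is stable, so equal times keep insertion order.
  return events.sort((a, b) => a.at - b.at).map((e) => `${e.at}: ${e.what}`);
}

const order = executionOrder(
  [
    { src: "a.js", mode: "defer", loadedAt: 300 },
    { src: "b.js", mode: "defer", loadedAt: 50 },
    { src: "c.js", mode: "async", loadedAt: 80 },
    { src: "d.js", mode: "async", loadedAt: 20 },
  ],
  200
);
console.log(order.join("\n"));
```

Here `b.js` finishes downloading first but still runs after `a.js`, because defer scripts keep document order, while the two async scripts run in the order their downloads complete (`d.js` before `c.js`).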
Conclusion
We first looked at how the DOM is generated, and then, building on that, analyzed how JavaScript affects DOM generation. We also discussed how CSS and JavaScript together can affect it.
To build the DOM, the HTML parser first converts the byte stream into tokens with the tokenizer.
StartTag tokens are pushed onto the stack, and the HTML parser creates a DOM node for each one and adds it to the DOM tree. A Text Token produces a text node that is added to the DOM tree without being pushed. When the parser encounters an EndTag, it checks whether the token at the top of the stack is the matching StartTag (for example, StartTag div for EndTag div); if so, it pops that StartTag off the stack, indicating that the element has been fully parsed.
Tokens are pushed onto and popped off the stack in this way until the tokenizer has processed the entire byte stream.
If the parser encounters JavaScript, HTML parsing pauses. An external script must be downloaded before it runs, and before any script runs, the CSS that precedes it must be parsed into the CSSOM. Once the script has executed, parsing resumes, and the process repeats until the entire DOM is built.