The goal of this article is to explain, in very simple terms, how browsers transform HTML, CSS, and JavaScript into websites that we can interact with. Understanding this process can help you make your Web application faster and more performant.

How does the browser render the site? I’ll deconstruct this process in a moment, but first, it’s worth reviewing some basics.

A Web browser is software that loads files from a remote server (or local disk) and displays them so that you can interact with them.

I know you know what a browser is. Inside every browser, however, is a piece of software that decides what to display based on the files it receives. This is called the browser engine. The browser engine is a core component of every major browser, and different browser vendors give their engines different names. Firefox’s engine is called Gecko, and Chrome’s is called Blink, which happens to be a fork of WebKit. Don’t let the names confuse you. They’re just names. There’s nothing fancy about them.

For the rest of this article, picture a generic browser engine rather than any particular one.

If you are interested, take a look at the comparison of various browser engines.

Don’t be confused by the fact that I’ll use “browser” and “browser engine” interchangeably throughout this article. It’s important to know that the browser engine is the key software for what we’re talking about.

This is not a computer science networking course, but you might want to remember that data is sent across the Internet in “packets,” whose sizes are measured in bytes.

What I’m saying is that when you write some HTML, CSS, and JS and open the HTML file in a browser, the browser reads the raw bytes of HTML from your hard drive (or the network).

Is that clear? The browser reads the raw data bytes, not the actual characters of the code you write.

Let’s move on. The browser receives bytes of data, but it can’t do anything with them yet. The raw bytes must first be converted into a form it understands. This is the first step.

The browser needs to work with Document Object Model (DOM) objects. So where do DOM objects come from? That’s easy. First, the raw data bytes are converted to characters.
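To make the byte-to-character step concrete, here is a minimal sketch using the standard TextDecoder API. The byte values and the choice of UTF-8 are my own assumptions for illustration; a real browser works out the encoding from things like HTTP headers or a meta charset declaration.

```javascript
// Raw bytes as they might arrive from disk or the network (UTF-8 encoded).
const bytes = new Uint8Array([60, 112, 62, 72, 105, 60, 47, 112, 62]);

// Decode the bytes into characters. "utf-8" is an assumption for this
// sketch; the browser normally detects the document's encoding itself.
const characters = new TextDecoder("utf-8").decode(bytes);

console.log(characters); // <p>Hi</p>
```

The same nine bytes decoded with a different encoding could produce different characters, which is why the encoding of the file matters.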

These are the characters of the code you wrote. The conversion is done based on the character encoding of the HTML file. At this point, the browser has gone from raw data bytes to the actual characters in the file. But that’s not the end of the story. These characters are further parsed into something called tokens.

So, what are these tokens? A bunch of characters in a text file is useless to a browser engine. Without tokenization, the characters would be nothing more than a pile of meaningless text, that is, HTML source code rather than a real website.

When you save a file with an .html extension, you signal the browser engine to parse the file as an HTML document. The browser “interprets” the file by first parsing it. During parsing, and specifically during tokenization, the browser accounts for every opening and closing tag in the HTML file. The parser recognizes each string in angle brackets, such as “<html>” or “<p>,” and knows the set of rules that applies to each one. For example, a token that represents an anchor tag has different properties than a token that represents a paragraph tag.

Conceptually, you can think of a token as a data structure that contains information about a particular HTML tag. Essentially, the HTML file is broken down into small units of parsing called tokens. That’s how the browser begins to recognize what you’ve written.

But tokenization is not the end of the story. Once tokenization is complete, the tokens are converted into nodes. You can think of nodes as distinct objects with specific properties. In fact, a better way to put it is that a node is a separate entity within the document object tree. But nodes are still not the final result.

Now for the final step. Once created, these nodes are linked into a tree data structure called the DOM. The DOM establishes parent-child relationships, sibling relationships, and so on. The relationship between every pair of nodes is captured in this DOM object. Now, this is something we can work with.
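If it helps to see tokenization in code, here is a toy sketch. The tokenize function and the shape of its token objects are hypothetical names of my own, and the regular expression is a drastic simplification: the real HTML tokenizer is a state machine that also handles attributes, comments, the doctype, and malformed markup.

```javascript
// Toy tokenizer: splits markup into start-tag, end-tag, and text tokens.
// Attributes are skipped and there is no error recovery.
function tokenize(html) {
  const tokens = [];
  const pattern = /<\/([a-zA-Z][a-zA-Z0-9]*)\s*>|<([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    if (match[1] !== undefined) {
      tokens.push({ type: "endTag", name: match[1].toLowerCase() });
    } else if (match[2] !== undefined) {
      tokens.push({ type: "startTag", name: match[2].toLowerCase() });
    } else if (match[3].trim() !== "") {
      // Whitespace-only text between tags is dropped in this sketch.
      tokens.push({ type: "text", data: match[3].trim() });
    }
  }
  return tokens;
}

const tokens = tokenize("<html><body><p>Hello</p></body></html>");
console.log(tokens.map((t) => t.type + ":" + (t.name || t.data)).join(" "));
// startTag:html startTag:body startTag:p text:Hello endTag:p endTag:body endTag:html
```

Each token is just a small object that records what kind of thing the parser saw, which is all the next stage needs.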
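Here is a rough sketch of how tokens could be turned into linked nodes. The buildTree function, the token shapes, and the node fields are hypothetical and assume well-formed input; the real tree construction algorithm has many more rules.

```javascript
// Tokens as a tokenizer might emit them for:
// <html><body><p>Hello</p></body></html>
const tokenStream = [
  { type: "startTag", name: "html" },
  { type: "startTag", name: "body" },
  { type: "startTag", name: "p" },
  { type: "text", data: "Hello" },
  { type: "endTag", name: "p" },
  { type: "endTag", name: "body" },
  { type: "endTag", name: "html" },
];

// Build a tree by keeping a stack of the elements that are currently open.
function buildTree(tokens) {
  const root = { name: "#document", parent: null, children: [] };
  const openElements = [root];
  for (const token of tokens) {
    const current = openElements[openElements.length - 1];
    if (token.type === "startTag") {
      const node = { name: token.name, parent: current, children: [] };
      current.children.push(node);
      openElements.push(node);
    } else if (token.type === "endTag") {
      // Assumes well-formed input; real parsers recover from mismatched tags.
      if (openElements.length > 1) openElements.pop();
    } else if (token.type === "text") {
      current.children.push({ name: "#text", data: token.data, parent: current, children: [] });
    }
  }
  return root;
}

const documentNode = buildTree(tokenStream);
const paragraph = documentNode.children[0].children[0].children[0];
console.log(paragraph.name, "is a child of", paragraph.parent.name); // p is a child of body
```

The parent and children references are what turn a flat list of nodes into a tree.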

If you remember Web Design 101, you don’t open CSS or JS files in your browser to view a Web page. You open an HTML file, in most cases index.html.

You do this because the browser has to turn the raw bytes of HTML into the DOM before anything else can happen.

Depending on the size of the HTML file, the DOM build process can take some time. No matter how small the file, it takes some time.

The DOM has been created. A typical HTML file with some CSS would contain links to stylesheets like this:

<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" type="text/css" media="screen" href="main.css" />
  </head>
  <body>
  </body>
</html>

While the browser receives the raw bytes and kicks off DOM construction, it also makes a request to fetch the linked main.css stylesheet. As the browser parses the HTML, it issues that request as soon as it finds the link tag pointing to the CSS file. As you might have guessed, the browser again receives raw bytes of CSS data, from either the Internet or your local disk.

But what does the browser do with these raw bytes of CSS data?

When the browser receives the raw bytes of CSS, it starts a process similar to the one for HTML. That is, the raw bytes are converted into characters, then tokenized, then turned into nodes, and finally linked into a tree structure.

What is this tree structure? Most people know about the DOM. CSS has a tree structure of its own, called the CSS Object Model, or CSSOM for short.

As you know, the browser can’t work with raw bytes of HTML or CSS. They have to be converted into something it recognizes, and that happens to be these tree structures.
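As a very loose illustration of turning CSS text into structured objects, here is a toy sketch. The parseRules function and its output shape are hypothetical; it only understands flat "selector { property: value; }" rules and is nowhere near a real CSS parser or the actual CSSOM interfaces.

```javascript
// Toy parser for flat rules such as "selector { property: value; }".
// It does not handle comments, at-rules, nesting, or braces inside strings.
function parseRules(css) {
  const rules = [];
  const rulePattern = /([^{}]+)\{([^{}]*)\}/g;
  let match;
  while ((match = rulePattern.exec(css)) !== null) {
    const declarations = {};
    for (const part of match[2].split(";")) {
      const colon = part.indexOf(":");
      if (colon === -1) continue; // skip empty or malformed pieces
      declarations[part.slice(0, colon).trim()] = part.slice(colon + 1).trim();
    }
    rules.push({ selector: match[1].trim(), declarations });
  }
  return rules;
}

const rules = parseRules("body { background: #8cacea; } p { color: navy; font-size: 20px; }");
console.log(rules[0].selector, rules[0].declarations.background); // body #8cacea
```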

CSS has something called the cascade. The cascade is how the browser determines which styles apply to an element.

Because the styles that affect an element may come from a parent element (through inheritance) or may be set on the element itself, the CSSOM tree structure becomes important. Why? Because the browser has to recursively walk the CSS tree structure to determine the styles that affect a particular element.
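The recursive walk can be sketched like this. The node objects and the resolveInherited function are hypothetical, and the sketch covers only inheritance of properties that do inherit (such as color and font-size); the real cascade also weighs specificity, origin, and source order.

```javascript
// Toy element tree: each node lists only the styles declared directly on it.
const body = { name: "body", parent: null, styles: { color: "navy", "font-size": "16px" } };
const div = { name: "div", parent: body, styles: {} };
const p = { name: "p", parent: div, styles: { "font-size": "20px" } };

// Resolve an inherited property: use the element's own value if declared,
// otherwise walk up through the ancestors.
function resolveInherited(node, property) {
  if (node === null) return undefined; // no ancestor declared it
  if (property in node.styles) return node.styles[property];
  return resolveInherited(node.parent, property);
}

console.log(resolveInherited(p, "color"));     // navy (inherited from body)
console.log(resolveInherited(p, "font-size")); // 20px (set on the element itself)
```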

Everything is going well. Browsers now have DOM and CSSOM objects. Now, can we put something on the screen?

What we have here are two independent tree structures that seem to have no common goal.

The DOM and CSSOM are independent of each other. The DOM contains all the information about the relationships between the HTML elements on a page, while the CSSOM contains information about how the elements are styled.

Ok, the browser now combines the DOM and CSSOM trees into a render tree.

The render tree contains all the information about the visible DOM content on the page, along with all the CSSOM information required for the different nodes. Note that if an element is hidden by CSS, for example with display: none, its node is not included in the render tree. Hidden elements are present in the DOM but not in the render tree. Because the render tree combines information from both the DOM and the CSSOM, it knows not to include hidden elements.
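A minimal sketch of that filtering step might look like this. The node shape and the buildRenderTree function are hypothetical, and the display value is assumed to have been computed from the CSSOM already.

```javascript
// Toy DOM nodes annotated with their computed display value.
const dom = {
  name: "body", display: "block", children: [
    { name: "p", display: "block", children: [] },
    { name: "div", display: "none", children: [
      { name: "img", display: "inline", children: [] },
    ] },
    { name: "span", display: "inline", children: [] },
  ],
};

// Copy the tree, skipping any node with display: none and its descendants.
function buildRenderTree(node) {
  if (node.display === "none") return null;
  const children = node.children
    .map((child) => buildRenderTree(child))
    .filter((child) => child !== null);
  return { name: node.name, display: node.display, children };
}

const renderTree = buildRenderTree(dom);
console.log(renderTree.children.map((child) => child.name).join(", ")); // p, span
```

The hidden div and the image inside it are still in the DOM object, but neither makes it into the render tree.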

Once the render tree is built, the browser moves on to the next step: layout!

We now know which content is visible and which styles apply to it, but we haven’t actually rendered anything on the screen.

First, the browser has to calculate the exact size and position of each object on the page. It’s like handing the content and style information of every element to be rendered to a talented mathematician. The mathematician then works out the exact position and size of each element based on the size of the browser viewport.

This layout step takes into account the content and styles received from the DOM and CSSOM and performs all the necessary layout calculations. Sometimes you’ll hear people refer to this “layout” phase as “reflow.”
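To give a feel for the kind of arithmetic involved, here is a toy layout pass. The layoutBlocks function and the box heights are invented for illustration; it only stacks full-width blocks vertically, whereas real layout deals with inline content, margins, floats, flexbox, and much more.

```javascript
// Toy block layout: each box fills the viewport width and stacks vertically.
// Heights are given up front here; a real engine derives them from content.
function layoutBlocks(boxes, viewportWidth) {
  let y = 0;
  return boxes.map((box) => {
    const placed = { name: box.name, x: 0, y, width: viewportWidth, height: box.height };
    y += box.height;
    return placed;
  });
}

const boxes = [
  { name: "header", height: 40 },
  { name: "image", height: 300 },
  { name: "footer", height: 60 },
];

const layout = layoutBlocks(boxes, 800);
console.log(layout[2].y, layout[2].width); // 340 800
```

Calling layoutBlocks again with a different viewport width recomputes every box, which is the idea behind a reflow when the window is resized.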

Now that the exact position of each element has been calculated, all that remains is to “paint” the elements onto the screen.

Think about it. We have all the information needed to display the elements on the screen, so let’s just show it to the user. That’s what this stage is all about. With the content (DOM), the styles (CSSOM), and the exact layout of the elements computed, the browser now “paints” each node onto the screen. Finally, the elements are rendered on the screen!

What comes to mind when you hear “render-blocking”? My guess is something like, “Something that prevents the nodes from actually being painted on the screen.” If that’s what you said, you’re absolutely right!

The first rule of website optimization is to get the most important HTML and CSS to the client as quickly as possible. The DOM and CSSOM must both be constructed before anything can be painted, so HTML and CSS are both render-blocking resources. The point is, you should deliver the HTML and CSS to the client as soon as possible to optimize your application’s time to first render.

A decent Web application will certainly use some JavaScript. That’s a given. The “problem” with JavaScript is that it can change the content and styling of a page. With it, you can add elements to and remove elements from the DOM tree, and you can modify an element’s CSSOM properties as well.

This is great! But there’s a price to pay. Consider the following HTML document:

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Medium Article Demo</title>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <p id="header">How Browser Rendering Works</p>
    <div><img src="https://i.imgur.com/jDq3k3r.jpg"></div>
  </body>
</html>

This is a very simple document. The stylesheet style.css has only one declaration:

body {
  background: #8cacea;
}

Some simple text and an image are rendered on the screen.

As explained earlier, the browser reads the raw bytes of the HTML file from disk (or the network) and converts them into characters. The characters are further parsed into tokens. As soon as the parser reaches the line with <link rel="stylesheet" href="style.css">, a request is made to fetch style.css. DOM construction continues, and once the CSS file comes back with some content, CSSOM construction begins.

How does this process change with the introduction of JavaScript? One of the most important things to remember is that DOM construction pauses every time the browser encounters a script tag! The entire DOM construction process is halted until the script has finished executing.

This is because JavaScript can modify both the DOM and the CSSOM. Since the browser can’t be sure what a particular script will do, it takes the precaution of halting DOM construction altogether.

Let’s see how bad that can be. Into the basic HTML document shared earlier, let’s introduce a script tag with some basic JavaScript:

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Medium Article Demo</title>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <p id="header">How Browser Rendering Works</p>
    <div><img src="https://i.imgur.com/jDq3k3r.jpg"></div>
    <script>
      let header = document.getElementById("header");
      console.log("header is: ", header);
    </script>
  </body>
</html>

In the script tag, I access the DOM node with the id header and log it to the console. This works fine, and the element is printed to the console.

But did you notice that the script tag is at the bottom of the body tag? Let’s put it in head and see what happens:

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Medium Article Demo</title>
    <link rel="stylesheet" href="style.css">
    <script>
      let header = document.getElementById("header");
      console.log("header is: ", header);
    </script>
  </head>
  <body>
    <p id="header">How Browser Rendering Works</p>
    <div><img src="https://i.imgur.com/jDq3k3r.jpg"></div>
  </body>
</html>

Once I do that, the header variable resolves to null.

Why is that? It’s pretty simple. While the HTML parser was building the DOM, it found a script tag. At that point, the body tag and all of its content had not been parsed yet. DOM construction is halted until the script has finished executing.

When the script tried to access a DOM node with the id header, the node didn’t exist yet because the parser hadn’t finished with the document. This brings us to another important point: the location of your script matters.

And that’s not all. If you extract the inline script into an external local file, app.js, the behavior is the same. DOM construction is still halted:
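The effect of script placement can be imitated with a toy parser that runs outside the browser. Everything here (the parse function, the item shapes, the stand-in getElementById) is hypothetical and only mimics the ordering behavior described above.

```javascript
// Toy parser: processes document items in order and runs each script
// immediately, before any later items have been added to the "DOM".
function parse(items) {
  const dom = new Map(); // id -> element, standing in for the real DOM
  const results = [];
  const getElementById = (id) => (dom.has(id) ? dom.get(id) : null);
  for (const item of items) {
    if (item.type === "script") {
      // Parsing is paused here until the script returns.
      results.push(item.run(getElementById));
    } else {
      dom.set(item.id, item);
    }
  }
  return results;
}

const lookupHeader = (getElementById) => getElementById("header");
const headerElement = { type: "element", id: "header", tag: "p" };

// Script placed before the element (like a script in the head).
const scriptFirst = parse([{ type: "script", run: lookupHeader }, headerElement]);
// Script placed after the element (like a script at the end of the body).
const scriptLast = parse([headerElement, { type: "script", run: lookupHeader }]);

console.log(scriptFirst[0]);    // null
console.log(scriptLast[0].tag); // p
```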

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Medium Article Demo</title>
    <link rel="stylesheet" href="style.css">
    <script src="app.js"></script>
  </head>
  <body>
    <p id="header">How Browser Rendering Works</p>
    <div><img src="https://i.imgur.com/jDq3k3r.jpg"></div>
  </body>
</html>

So what if app.js isn’t local and has to be fetched over the Internet?

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Medium Article Demo</title>
    <link rel="stylesheet" href="style.css">
    <script src="https://some-link-to-app.js"></script>
  </head>
  <body>
    <p id="header">How Browser Rendering Works</p>
    <div><img src="https://i.imgur.com/jDq3k3r.jpg"></div>
  </body>
</html>

If the network is slow and it takes thousands of milliseconds to fetch app.js, DOM construction is halted for those thousands of milliseconds too! That’s a big performance concern, and it doesn’t stop there. JavaScript can also access the CSSOM and make changes to it. For example, this is a valid JavaScript statement:

document.getElementsByTagName("body")[0].style.backgroundColor = "red";

So what happens when the parser encounters a script tag but the CSSOM isn’t ready yet? The answer turns out to be simple: JavaScript execution is halted until the CSSOM is ready.

So, even though DOM construction stops when a script tag is encountered, the same is not true of the CSSOM. With the CSSOM, it is JavaScript execution that waits. No CSSOM, no JS execution.

By default, every script is a parser blocker! DOM construction is always halted. There is a way to change this default behavior, though. If you add the async attribute to the script tag, DOM construction is not halted. It continues, and the script is executed once it has finished downloading and is ready.
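To get a rough feel for the difference, here is a back-of-the-envelope timing model. The domReadyTime function and all of the millisecond values are made up for illustration, and the model ignores things real browsers do, such as speculative preloading and waiting on the CSSOM.

```javascript
// Rough timing model. All inputs are in milliseconds and are invented.
// parseMs: time to parse the HTML with no scripts at all
// fetchMs: time to download the script
// execMs:  time to run the script
function domReadyTime({ parseMs, fetchMs, execMs, isAsync }) {
  if (isAsync) {
    // The download overlaps with parsing. If the script arrives before
    // parsing ends, running it still occupies the main thread for execMs.
    const interruptsParsing = fetchMs < parseMs;
    return interruptsParsing ? parseMs + execMs : parseMs;
  }
  // Parser-blocking script: parsing waits for both download and execution.
  return parseMs + fetchMs + execMs;
}

const timings = { parseMs: 50, fetchMs: 2000, execMs: 10 };
console.log(domReadyTime({ ...timings, isAsync: false })); // 2060
console.log(domReadyTime({ ...timings, isAsync: true }));  // 50
```

With a slow network, the blocking script dominates the time it takes to finish the DOM, while the async version barely affects it in this model.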

Here’s an example:

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Medium Article Demo</title>
    <link rel="stylesheet" href="style.css">
    <script src="https://some-link-to-app.js" async></script>
  </head>
  <body>
    <p id="header">How Browser Rendering Works</p>
    <div><img src="https://i.imgur.com/jDq3k3r.jpg"></div>
  </body>
</html>

So far, we’ve covered all the steps from receiving HTML, CSS, and JS bytes to converting them to pixels on the screen. This entire process is called the critical rendering path. Optimizing a site for performance is all about optimizing the critical rendering path.

A well-optimized site should undergo progressive rendering rather than have the entire process blocked.

This is the difference between a Web application perceived as slow and one perceived as fast. A well-thought-out critical rendering path (CRP) optimization strategy enables the browser to load a page as quickly as possible by prioritizing which resources get loaded and the order in which they are loaded.

Having learned the basics of how browsers render HTML, CSS, and JS, I recommend that you spend some time researching how you can use this knowledge to optimize your pages and speed them up.

The performance section of Google’s Web Fundamentals documentation (https://developers.google.com/web/fundamentals/performance/why-performance-matters/) is a good place to start.

https://blog.logrocket.com/how-browser-rendering-works-behind-the-scenes-6782b0e8fb10
