The Web browser is undoubtedly the most common entry point for users to access the Internet. Browsers, with their installation-free and cross-platform advantages, have gradually replaced many traditional rich clients.
A Web browser retrieves resources from Web servers by sending requests to URLs, then presents those resources to the user in an interactive form. Its basic operations are fetching, processing, displaying, and storing content. Common browsers include Internet Explorer, Firefox, Google Chrome, Safari, and Opera.
Architecture diagram
The browser mainly consists of the following parts:
- The user interface
- Browser engine
- Rendering engine
- Data storage layer
- UI Back End
- JavaScript parser (script engine)
- The network layer
The user interface
This is the area where the user interacts with the browser. No standard dictates how a browser should look. The HTML5 specification does not define the UI elements a browser must have, but it does list some common ones: address bar, profile bar, scroll bar, status bar, toolbar, etc.
Browser engine
The browser engine provides the interface between the user interface and the underlying rendering engine. It queries and manipulates the rendering engine in response to user interaction, provides methods to start loading a URL, and handles the reload, back, and forward operations.
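The back, forward, and reload operations can be pictured as a list of visited URLs plus a cursor. The following is only a minimal sketch: the History class and its method names are hypothetical and do not correspond to any real browser's API.

```python
class History:
    """Hypothetical session history: visited URLs plus a cursor."""

    def __init__(self):
        self.entries = []
        self.index = -1  # -1 means nothing has been loaded yet

    def current(self):
        return self.entries[self.index] if self.index >= 0 else None

    def load(self, url):
        # Loading a new URL discards any entries "ahead" of the cursor.
        self.entries = self.entries[: self.index + 1]
        self.entries.append(url)
        self.index += 1
        return url

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.current()

    def reload(self):
        # Reload re-requests the current entry without moving the cursor.
        return self.current()
```

Note the design choice in load: navigating to a new page after going back drops the forward entries, which matches the behavior users see in common browsers.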
Rendering engine
The rendering engine is responsible for displaying web content on the screen. Its main job is to parse HTML and render the result. By default it can display HTML, XML, and images, and it can support other data types through plug-ins or extensions.
Different browsers use different rendering engines:
- Gecko: Firefox
- WebKit: Safari
- Blink: Chrome, Opera (from version 15 onward)
Web content is displayed through a series of processes:
HTML data is converted into DOM
The rendering engine receives the requested content from the network layer, typically in chunks of 8 KB. The raw bytes are first converted into the characters of the HTML file according to its character encoding. A lexical analyzer (tokenizer) then breaks the input into tokens, recording each start tag and end tag in the file. The tokenizer also knows how to discard irrelevant characters, such as spaces and newlines.
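The bytes-to-characters-to-tokens step can be sketched as below. This is a deliberately simplified illustration, not how a real HTML tokenizer works: the tokenize function and its token format are assumptions, and the regular expression ignores attributes, comments, and doctype declarations.

```python
import re

# One alternative matches a start or end tag, the other a run of text.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


def tokenize(raw: bytes, encoding: str = "utf-8"):
    """Decode raw bytes, then split the characters into simple tokens."""
    text = raw.decode(encoding)  # bytes -> characters
    tokens = []
    for match in TOKEN_RE.finditer(text):
        closing, tag, chars = match.groups()
        if tag:
            tokens.append(("end" if closing else "start", tag.lower()))
        elif chars.strip():
            # Whitespace-only runs (spaces, newlines) are dropped.
            tokens.append(("text", chars.strip()))
    return tokens
```

For example, tokenize(b"<p>Hello</p>") yields a start token for p, a text token, and an end token for p.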
Next, the parser works out the document structure by applying the grammar rules of the language to build a parse tree. Parsing is iterative: the parser requests a new token from the lexical analyzer and, if a grammar rule matches, adds a node for that token to the parse tree and requests the next one. If no rule matches, the parser stores the token internally and keeps requesting tokens until it finds a rule that matches all of the stored tokens. If no such rule is ever found, a conventional parser raises an exception indicating that the document is invalid and contains syntax errors. HTML parsers in browsers are more forgiving and try to recover from malformed markup instead of rejecting it.
The tokens are turned into nodes, and the nodes are linked to each other in the DOM (Document Object Model) tree, a data structure that records parent-child and adjacent-sibling relationships.
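As a rough illustration of how start and end tags become a tree of linked nodes, the sketch below uses HTMLParser from Python's standard library as the tokenizer. The Node and TreeBuilder classes and the build_dom helper are hypothetical names, and the sketch assumes well-formed markup without void elements such as br.

```python
from html.parser import HTMLParser


class Node:
    """A hypothetical DOM-like node with parent and child links."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # A start tag opens a new child of the current node.
        self.current = Node(tag, self.current)

    def handle_endtag(self, tag):
        # An end tag closes the current node and returns to its parent.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Sibling relationships are implicit in the order of each node's children list.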
CSS data is converted to CSSOM
The raw bytes of CSS data are converted into characters, then tokens, then nodes, and finally into the CSSOM (CSS Object Model). The hierarchical nature of CSS determines which styles apply to an element: its style data can be inherited from the parent element or set directly on the element itself. To determine the style of a particular element, the browser therefore has to traverse the CSS tree structure recursively.
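The recursive lookup can be sketched as follows. This is a simplification under stated assumptions: rules are matched by tag name only, nodes are plain dictionaries, and INHERITED lists just a few of the CSS properties that inherit by default. The computed_style function is a hypothetical name.

```python
# A small subset of the CSS properties that are inherited by default.
INHERITED = {"color", "font-family", "font-size"}


def computed_style(node, rules):
    """Resolve a node's style from its ancestors and its own rules.

    node:  dict with 'tag' and 'parent' (another node dict, or None)
    rules: dict mapping a tag name to its declarations
    """
    style = {}
    if node["parent"] is not None:
        parent_style = computed_style(node["parent"], rules)
        # Only inheritable properties flow down from the parent.
        style.update({k: v for k, v in parent_style.items() if k in INHERITED})
    # Declarations set directly on the element override inherited values.
    style.update(rules.get(node["tag"], {}))
    return style
```

With a body rule that sets color and margin, a p element inside body would receive the color but not the margin, because margin is not an inherited property.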
DOM and CSSOM form the rendering tree
The DOM tree contains information about the relationships between HTML elements, and the CSSOM tree contains information about the styles of those elements. Starting at the root node, the browser traverses every visible node. Some nodes are hidden through CSS (for example, with display: none) and do not appear in the rendered result. For each visible node, the browser finds and applies the matching rules defined in the CSSOM, so that the node ends up in the render tree with both its content and its style.
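Combining the two trees might look like the sketch below. It assumes that "hidden" means a display value of none, that styles are looked up by tag name, and that nodes are plain dictionaries; build_render_tree is a hypothetical helper.

```python
def build_render_tree(node, styles):
    """Return a render node for a DOM node, or None if it is hidden.

    node:   dict with 'tag' and 'children'
    styles: dict mapping a tag name to its resolved declarations
    """
    style = styles.get(node["tag"], {})
    if style.get("display") == "none":
        return None  # a hidden node and its whole subtree are skipped
    children = [build_render_tree(c, styles) for c in node.get("children", [])]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [c for c in children if c is not None],
    }
```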
Layout
Next comes layout. Before anything is rendered to the page (the browser viewport), the exact size and position of each piece of content must be computed. This process is also called reflow. HTML uses a flow-based layout model, which means that most of the time the geometry can be computed in a single pass; it is recalculated when the size or position of content changes. Layout starts at the document root and proceeds recursively.
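A single recursive pass from the root can be sketched like this. The model is an assumption made for illustration: every box is a block that takes the full available width, children stack vertically, and each leaf carries a hypothetical content_height.

```python
def layout(box, x=0, y=0, width=800):
    """Assign x, y, width, and height to a box and its descendants.

    Returns the height of the box so the caller can place the next sibling.
    """
    box["x"], box["y"], box["width"] = x, y, width
    cursor = y
    for child in box.get("children", []):
        # Each child is placed directly below the previous one.
        cursor += layout(child, x, cursor, width)
    if box.get("children"):
        box["height"] = cursor - y  # a container wraps its children
    else:
        box["height"] = box.get("content_height", 0)
    return box["height"]
```

Because each box is visited once and positions flow from top to bottom, the whole geometry comes out of one traversal, which is the property the flow-based model is meant to provide.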
Draw
Content is displayed on the screen by traversing the render tree and calling each renderer’s paint method. Drawing can be global (the entire tree is painted) or incremental. In incremental drawing, a renderer that has changed invalidates its rectangular region on the screen, the operating system generates a paint event for that region, and only the affected nodes are repainted while the rest of the tree is left untouched. Drawing is also gradual: the engine renders the portion of the content that has already been parsed while it continues to process the rest.
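The difference between global and incremental drawing can be sketched with a rectangle intersection test. The paint function, the (x, y, width, height) rectangle format, and the renderer names are all assumptions for illustration; a real engine would call each renderer's paint method rather than collect names.

```python
def intersects(a, b):
    """Return True if rectangles a and b, given as (x, y, w, h), overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def paint(renderers, dirty=None):
    """Return the names of the renderers that get painted.

    renderers: list of (name, rect) pairs in paint order
    dirty:     invalidated rectangle, or None for a global paint
    """
    painted = []
    for name, rect in renderers:
        if dirty is None or intersects(rect, dirty):
            painted.append(name)
    return painted
```

With a header, an article, and a footer stacked vertically, a dirty rectangle inside the article causes only the article to be repainted.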
JavaScript parser (JS engine)
JavaScript is a scripting language that can dynamically update Web content, control multimedia and animation, and more; it is executed by the browser’s JS engine. Both the DOM and the CSSOM expose interfaces to JS, so both can be modified by scripts. Because the browser cannot know in advance what a script will do, it pauses construction of the DOM tree as soon as it encounters a script tag, unless the script is marked async or defer.
The JS engine starts parsing code as soon as it is received from the server. The parser turns the source into an object representation that the machine can work with, called an abstract syntax tree (AST), and the AST is then translated into bytecode. This approach is called just-in-time (JIT) compilation: the JavaScript is downloaded from the server and compiled on the client at run time. The parser and compiler work together: the parser processes the source code immediately, the compiler generates machine code, and that machine code runs directly on the client machine.
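The source-to-AST-to-bytecode pipeline can be illustrated on a toy scale. The sketch below borrows Python's own ast module to parse an arithmetic expression (the syntax happens to be valid in both languages), flattens the AST into instructions for a hypothetical stack machine, and interprets them. It is an analogy only: it is neither a JavaScript engine nor a JIT compiler, and the instruction names are made up.

```python
import ast

OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(node):
    """Flatten an AST node into a list of stack-machine instructions."""
    if isinstance(node, ast.Constant):
        return [("PUSH", node.value)]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return (
            compile_expr(node.left)
            + compile_expr(node.right)
            + [(OPS[type(node.op)], None)]
        )
    raise SyntaxError("unsupported expression")


def run(bytecode):
    """Execute the instructions on a simple operand stack."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
            continue
        right, left = stack.pop(), stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        else:  # "MUL"
            stack.append(left * right)
    return stack.pop()


def evaluate(source):
    tree = ast.parse(source, mode="eval")  # source -> AST
    return run(compile_expr(tree.body))    # AST -> bytecode -> result
```

A JIT compiler would go one step further and translate the bytecode into native machine code while the program is running.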
JS engines for different browsers
- Chrome: V8 (Node.js is also built on it)
- Firefox: SpiderMonkey
- Microsoft Edge: Chakra (legacy versions; Chromium-based Edge uses V8)
- Safari: JavaScriptCore, marketed as Nitro (formerly code-named SquirrelFish)
UI Back End
The UI back end is used to draw basic controls, such as check boxes and windows. It exposes a generic, platform-independent interface and uses the operating system’s user interface methods underneath.
Data storage layer
This is the persistence layer, which helps the browser store data locally (cookies, session storage, IndexedDB, Web SQL, bookmarks, user preferences, etc.). The HTML5 specification proposes full database capabilities on the browser side.
The network layer
This layer handles the browser’s network traffic. Browsers obtain network resources over various communication protocols, such as HTTP, HTTPS, and FTP.
Browsers use DNS to resolve the host name in a URL to an IP address. Resolution records may be cached by the browser, the operating system, the router, or the ISP. If the requested host name is not in any cache, the ISP’s DNS server initiates a DNS query to find the server’s IP address. Once the IP address is found, the browser establishes a connection with the server using the appropriate protocol. For TCP, the browser sends a SYN packet to the server to request a connection, and the server answers with a SYN/ACK packet.
After receiving that response, the browser sends an ACK packet to the server. This three-way handshake establishes the TCP connection, after which data can be transferred. The transfer must follow the rules of the HTTP protocol, including its request and response formats.
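The sequence of cached name resolution, handshake, and HTTP request can be sketched without touching the network. Everything here is illustrative: resolve, handshake, and build_request are hypothetical helpers, the lookup callback stands in for a real DNS query, and the handshake only lists the three segments with their sequence and acknowledgment numbers rather than opening a socket.

```python
from urllib.parse import urlsplit


def resolve(host, cache, lookup):
    """Return the IP address for host, consulting the cache first."""
    if host not in cache:
        cache[host] = lookup(host)  # stands in for a query to a DNS server
    return cache[host]


def handshake(client_seq, server_seq):
    """List the three handshake segments as (sender, flags, seq, ack)."""
    return [
        ("client", "SYN", client_seq, None),
        ("server", "SYN/ACK", server_seq, client_seq + 1),
        ("client", "ACK", client_seq + 1, server_seq + 1),
    ]


def build_request(url):
    """Format a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
```

Passing the lookup function in as a parameter keeps the cache logic separate from the actual resolver, so a repeated request for the same host never reaches the resolver a second time.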
Browser comparison
There are many different browsers on the market today, and while their core functions are the same, they differ in many respects. These include supported platforms (Linux, Windows, Mac, BSD and other Unix systems), protocols, user interfaces, HTML5 support, open source status, ownership, etc. See Wikipedia for details.
This is a general description of how a browser works; the reality is more complicated than a few pictures and one article can explain. Interested readers can study the source code of a browser for a deeper understanding.