I'm TriDiamond, your companion in “Technology Galaxy” for a lifetime of drifting learning.
Foreword
How browsers work is an important topic. Discussions of repaint and reflow, and explanations of individual CSS properties, often rely on knowledge of how browsers work. Studying that theory on its own tends to be dry and not very effective, so here we implement a browser from scratch in JavaScript.
By implementing a simple browser ourselves, we will gain a deeper understanding of the fundamentals of the browser.
Basic browser rendering process
- The browser completes the overall rendering in 5 steps
- We access a web page by URL; the browser parses and renders the page into a bitmap
- Finally the graphics card driver puts the image on screen, and we see the finished page
- This is the browser's rendering flow
- We only implement a simple basic flow here; a real browser includes many more features, such as history
The main thing we need to complete is the whole process from the URL request to displaying the page as a bitmap.
Browser flow:
- Starting from the URL, we send an HTTP request, parse the returned content, and extract the HTML
- Once we have the HTML, we parse the HTML text into a DOM tree
- At this point the DOM tree is bare, so next we do CSS computing and attach the computed CSS to the DOM tree
- After the computation we have a styled DOM tree, which is ready for layout (typesetting)
- Through layout calculation, each DOM node gets a computed box (in a real browser, boxes are generated according to CSS, but to simplify we make only one box per DOM node)
- Finally we render the DOM tree: background images, background colors, and the other styles are painted onto an image, which is then shown to the user through the APIs provided by the operating system and the hardware drivers
Processing strings with finite state machines
Handling strings with state machines is a technique used throughout the browser. Without state machines, the browser's code would be very hard to implement and to read. So let's first talk about what a finite state machine is.
- Each state is a machine
- The machines are decoupled from each other, which makes this a powerful abstraction mechanism
- Inside each machine we can do computation, storage, output and so on
- All of these machines accept the same input
- Each machine in a state machine has no state of its own. Expressed as a function, it should be a pure function
- “No side effects” means the machine should not be controlled by anything outside itself other than its input; taking input is fine
- Each machine knows the next state
- Every machine has a definite next state (Moore)
- Each machine decides the next state based on the input (Mealy)
How to implement it in JavaScript
Mealy state machine:
// Each function is a state
function state(input) { // The function argument is the input
  // Inside the function, you are free to write the logic for this state
  return next; // The return value is the next state
}
/** ========= The calling code ========= */
while (input) {
  // Get input
  state = state(input); // Use the state function's return value as the next state
}
- As the code above shows, each function is a state
- The function's argument is the input
- The function's return value is the next state, which means the return value must itself be a state function
- The ideal state machine implementation is a set of state functions that return state functions
- To drive the machine, we usually read the input in a loop and call state = state(input), so that the state machine takes the input and completes the state transition
- In a Mealy state machine, the returned next state must depend on the input
- In a Moore state machine, the return value has nothing to do with the input; each state returns a fixed next state
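The difference between the two styles can be sketched in a few lines. This is only an illustration: the traffic-light states (red, green, yellow), the quote-tracking states (outside, inside), and the run driver are all made-up names, not part of the browser we build later.

```javascript
// Moore style: each state has a fixed successor, the input is ignored.
const red = () => green;
const green = () => yellow;
const yellow = () => red;

// Mealy style: the successor depends on the input character.
// These two states track whether we are inside a double-quoted section.
const outside = (char) => (char === '"' ? inside : outside);
const inside = (char) => (char === '"' ? outside : inside);

// Shared driver: feed every input to the current state function.
function run(initialState, inputs) {
  let state = initialState;
  for (const input of inputs) {
    state = state(input);
  }
  return state;
}

console.log(run(red, ['tick', 'tick']) === yellow); // true
console.log(run(outside, 'say "hi') === inside); // true
```

Both machines share the same driver loop; only the way each state picks its successor differs.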
Processing strings without a state machine
Let's start by looking at some string problems that can be handled without a state machine:
Problem 1: find the character “a” in a string
function match(string) {
  for (let letter of string) {
    if (letter == 'a') return true;
  }
  return false;
}
console.log(match('I am TriDiamond'));
Problem 2: no regular expressions allowed, pure JavaScript logic only: find the substring “ab” in a string
“Look directly for ‘a’ followed by ‘b’, and return as soon as both are found.”
/**
 * Look for 'a' directly followed by 'b'; return true once both are found.
 * @param {string} string The string to search
 */
function matchAB(string) {
  let hasA = false;
  for (let letter of string) {
    if (letter == 'a') {
      hasA = true;
    } else if (hasA && letter == 'b') {
      return true;
    } else {
      hasA = false;
    }
  }
  return false;
}
console.log(matchAB('hello abert'));
Problem 3: no regular expressions allowed, pure JavaScript logic only: find the substring “abcdef” in a string
Method 1: “Use temporary storage and a moving pointer to detect the match”
/**
 * Use temporary storage and a moving pointer to detect the match.
 * @param {string} match The characters to look for
 * @param {string} string The string to search
 */
function matchString(match, string) {
  const resultLetters = match.split(''); // Split the pattern into an array of characters
  const stringArray = string.split(''); // Split the searched string into an array of characters
  let index = 0; // Pointer into the pattern
  for (let i = 0; i < stringArray.length; i++) {
    // The characters must match consecutively:
    // for "ab", an interrupted sequence such as "a b" does not count
    if (stringArray[i] === resultLetters[index]) {
      // If a character matches, move on to the next pattern character
      index++;
    } else {
      // On a mismatch, start over; the current character may itself begin a new match
      index = stringArray[i] === resultLetters[0] ? 1 : 0;
    }
    // Once every pattern character has been matched,
    // the string contains the pattern
    if (index > resultLetters.length - 1) return true;
  }
  return false;
}
console.log('Method 1', matchString('abcdef', 'hello abert abcdef'));
Method 2: “Use substring with the length of the pattern to cut out a slice and check whether it equals the pattern”
/**
 * Generic string matching, method 2 (using substring).
 * @param {string} match The characters to look for
 * @param {string} string The string to search
 */
function matchWithSubstring(match, string) {
  // Only start positions that leave room for the whole pattern need checking
  for (let i = 0; i <= string.length - match.length; i++) {
    if (string.substring(i, i + match.length) === match) {
      return true;
    }
  }
  return false;
}
console.log('Method 2', matchWithSubstring('abcdef', 'hello abert abcdef'));
Method 3: “Search character by character until the final result is found”
/**
 * Search character by character until the final result is found.
 * @param {string} string The string to search
 */
function match(string) {
  let matchStatus = [false, false, false, false, false, false];
  let matchLetters = ['a', 'b', 'c', 'd', 'e', 'f'];
  let statusIndex = 0;
  for (let letter of string) {
    if (letter == matchLetters[0]) {
      // An 'a' always (re)starts the match, even in the middle of a partial match
      matchStatus = [true, false, false, false, false, false];
      statusIndex = 1;
    } else if (matchStatus[statusIndex - 1] && letter == matchLetters[statusIndex]) {
      matchStatus[statusIndex] = true;
      statusIndex++;
    } else {
      matchStatus = [false, false, false, false, false, false];
      statusIndex = 0;
    }
    if (statusIndex > matchLetters.length - 1) return true;
  }
  return false;
}
console.log('Method 3', match('hello abert abcdef'));
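Hand-written matchers like these are easy to get subtly wrong (for example on an input such as “aabcdef”), so it helps to check them against a reference answer. The sketch below uses the built-in String.prototype.includes as that reference. The helper names referenceMatch, agreesWithReference and naiveMatch are hypothetical and exist only for this check.

```javascript
// Reference answer using the built-in (outside the exercise's own constraints).
function referenceMatch(pattern, string) {
  return string.includes(pattern);
}

// Hypothetical helper: does a hand-written matcher agree with the reference on every sample?
function agreesWithReference(matcher, pattern, samples) {
  return samples.every((s) => matcher(pattern, s) === referenceMatch(pattern, s));
}

// Example matcher under test: a plain nested-loop search.
function naiveMatch(pattern, string) {
  for (let i = 0; i + pattern.length <= string.length; i++) {
    let j = 0;
    while (j < pattern.length && string[i + j] === pattern[j]) j++;
    if (j === pattern.length) return true;
  }
  return false;
}

const samples = ['hello abert abcdef', 'abcde', 'aabcdef', '', 'abcdef'];
console.log(agreesWithReference(naiveMatch, 'abcdef', samples)); // true
```

Any matcher with the signature (pattern, string) can be passed in place of naiveMatch.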
Processing strings with a state machine
Here we use the state machine approach to find the substring “abcdef” in a string:
- First of all, each state is a state function
- We need a start state and an end state, implemented as the state functions start and end
- Each state function's name describes the current state: matchedA means the character a has already been matched, and so on
- The logic in each state is to match the next character
- If the match succeeds, return the next state function
- If the match fails, hand the character back to the start state start, so that a character that begins a new match is not lost
- Because the last character of the pattern is f, once matchedE succeeds it can return the end state end directly
- The end state is also known as a trap: once the transitions are finished, the machine stays in end until the loop ends
/**
 * State machine string matching
 * @param {string} string The string to search
 */
function match(string) {
  let state = start;
  for (let letter of string) {
    state = state(letter); // State transition
  }
  return state === end; // Return true if the final state function is `end`
}
function start(letter) {
  if (letter === 'a') return matchedA;
  return start;
}
function end(letter) {
  return end;
}
function matchedA(letter) {
  if (letter === 'b') return matchedB;
  return start(letter);
}
function matchedB(letter) {
  if (letter === 'c') return matchedC;
  return start(letter);
}
function matchedC(letter) {
  if (letter === 'd') return matchedD;
  return start(letter);
}
function matchedD(letter) {
  if (letter === 'e') return matchedE;
  return start(letter);
}
function matchedE(letter) {
  if (letter === 'f') return end;
  return start(letter);
}
console.log(match('I am abcdef'));
A harder problem: use a state machine to match the string “abcabx”
- The difference from the previous problem is that “ab” repeats
- So our analysis should be:
- The first “b” should be followed by a “c”, and the second “b” should be followed by an “x”
- If the second “b” is not followed by an “x”, fall back to the state after the first “b” (matchedB) and check the character again there
/**
 * State machine string matching
 * @param {string} string The string to search
 */
function match(string) {
  let state = start;
  for (let letter of string) {
    state = state(letter);
  }
  return state === end;
}
function start(letter) {
  if (letter === 'a') return matchedA;
  return start;
}
function end(letter) {
  return end;
}
function matchedA(letter) {
  if (letter === 'b') return matchedB;
  return start(letter);
}
function matchedB(letter) {
  if (letter === 'c') return matchedC;
  return start(letter);
}
function matchedC(letter) {
  if (letter === 'a') return matchedA2;
  return start(letter);
}
function matchedA2(letter) {
  if (letter === 'b') return matchedB2;
  return start(letter);
}
function matchedB2(letter) {
  if (letter === 'x') return end;
  // "ab" has just been seen again, so continue from the state after the first "ab"
  return matchedB(letter);
}
console.log('result: ', match('abcabcabx'));
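To see why matchedB2 falls back to matchedB rather than to start, the following sketch builds the same “abcabx” machine twice. The makeMatcher factory and its fallbackToB flag are hypothetical, introduced only to compare the two fallback choices side by side.

```javascript
// Builds an "abcabx" matcher. `fallbackToB` selects what matchedB2 does on a mismatch:
// true  -> re-check the character in matchedB (the approach used above)
// false -> re-check the character in start (loses the "ab" already seen)
function makeMatcher(fallbackToB) {
  const start = (c) => (c === 'a' ? matchedA : start);
  const end = () => end;
  const matchedA = (c) => (c === 'b' ? matchedB : start(c));
  const matchedB = (c) => (c === 'c' ? matchedC : start(c));
  const matchedC = (c) => (c === 'a' ? matchedA2 : start(c));
  const matchedA2 = (c) => (c === 'b' ? matchedB2 : start(c));
  const matchedB2 = (c) => {
    if (c === 'x') return end;
    return fallbackToB ? matchedB(c) : start(c);
  };
  return (string) => {
    let state = start;
    for (const c of string) state = state(c);
    return state === end;
  };
}

console.log(makeMatcher(true)('abcabcabx')); // true
console.log(makeMatcher(false)('abcabcabx')); // false: the match is missed
```

In “abcabcabx”, the second “abc” overlaps with the beginning of the real match, so throwing away the “ab” that was just seen makes the machine miss it.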
Basic knowledge for HTTP protocol parsing
ISO-OSI seven-layer network model
HTTP
- Consists of:
- Application layer
- Presentation layer
- Session layer
- In Node.js, this corresponds to the familiar require('http')
TCP
- Consists of:
- Transport layer
- Because web pages need reliable delivery, we only care about TCP
Internet
- Consists of:
- Network layer
- “Going online” can mean two different things
- The application-layer protocols that web pages live on (the public network), where the Internet is responsible for transmitting the data
- A company's internal network, which is called an intranet
4G/5G/Wi-Fi
- Consists of:
- Data link layer
- Physical layer
- Their job is to transmit data accurately
- All transmission is point-to-point
- There must be a direct connection in order to transmit
Basic knowledge of TCP and IP
- Stream
- Data in the TCP layer is transferred as a “stream”
- A stream has no obvious unit of division
- It only guarantees that the order is correct
- Port
- TCP is used by the software running on a computer
- Every piece of software fetches data from the network card
- The port determines which data goes to which software
- In Node.js, the corresponding library is require('net')
- Packet
- TCP data is transmitted packet by packet
- Each packet can be large or small
- The size depends on the transmission capacity of the intermediate devices across the network
- IP address
- The IP address decides where an IP packet should go
- Connections on the Internet are very complex, with some large routing nodes in the middle
- When we access an IP address, we first connect to the address of our local residential network, and then to the telecom backbone
- When visiting a foreign site, the traffic goes on to the international backbone
- An IP address is the unique identifier of each device connected to the Internet
- So an IP packet finds where it needs to be delivered through its IP address
- libnet/libpcap
- To work at the IP layer, we call these two libraries from C++
- libnet is responsible for constructing IP packets and sending them
- libpcap is responsible for capturing all IP packets flowing through the network card
- If we network with a switch instead of a router, we can use the low-level libpcap to capture many IP packets that are not addressed to us
HTTP
- Composition
- Request
- Response
- TCP is a full-duplex channel: either side can send or receive at any time, with no order of precedence
- HTTP is different: the client must first initiate a request
- Then the server returns a response
- So every request must have a corresponding response
- If there is one request or one response too many, the protocol has been violated
HTTP request: preparing the server environment
Before we write our own browser, we need a server to send requests to, so let's first write a simple Node.js server:
const http = require('http');

http
  .createServer((request, response) => {
    let body = [];
    request
      .on('error', err => {
        console.error(err);
      })
      .on('data', chunk => {
        // Keep the raw Buffer chunks so that Buffer.concat can join them later
        body.push(chunk);
      })
      .on('end', () => {
        body = Buffer.concat(body).toString();
        console.log('body', body);
        response.writeHead(200, { 'Content-Type': 'text/html' });
        response.end(' Hello World\n');
      });
  })
  .listen(8080);

console.log('server started');
Understanding the HTTP request protocol
Before writing our client code, we need to understand the HTTP protocol.
Let's take a look at an HTTP request first:
POST / HTTP/1.1
Host: 127.0.0.1
Content-Type: application/x-www-form-urlencoded

field1=aaa&code=x%3D1
- HTTP is a text protocol. Text protocols are usually contrasted with binary protocols: everything in the protocol is a string, and every byte is part of a string
- The first line of an HTTP request is called the request line. It contains three parts:
- Method: for example POST or GET
- Path: defaults to “/”
- Protocol and version: HTTP/1.1
- Next come the headers
- The number of header lines is not fixed
- Each line is in key: value format, separated by a colon
- The headers end with a blank line
- The last part is the body
- The format of this part is determined by Content-Type
- Whatever format Content-Type specifies is the format the body must be in
Now we can start writing code!
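As a warm-up, the request structure described above (request line, headers, blank line, body) can be taken apart with a few string operations. This is only a sketch: parseRequestText is a hypothetical helper that is not part of the browser we build, and it assumes well-formed text that contains the blank line.

```javascript
// Splits raw HTTP request text into request line, headers, and body.
// Assumes the text is well-formed and contains the CRLF CRLF separator.
function parseRequestText(text) {
  // The first blank line (CRLF CRLF) separates the head from the body.
  const separator = text.indexOf('\r\n\r\n');
  const head = text.slice(0, separator);
  const body = text.slice(separator + 4);

  // First line is the request line; the rest are "key: value" header lines.
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, path, version] = requestLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    headers[line.slice(0, colon)] = line.slice(colon + 1).trim();
  }
  return { method, path, version, headers, body };
}

const parsed = parseRequestText(
  'POST / HTTP/1.1\r\n' +
    'Host: 127.0.0.1\r\n' +
    'Content-Type: application/x-www-form-urlencoded\r\n' +
    '\r\n' +
    'field1=aaa&code=x%3D1'
);
console.log(parsed.method, parsed.path, parsed.version); // POST / HTTP/1.1
console.log(parsed.headers.Host); // 127.0.0.1
console.log(decodeURIComponent(parsed.body)); // field1=aaa&code=x=1
```

Note that on the wire each line ends with \r\n, which is why the sketch splits on '\r\n' rather than on '\n'.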
Implementing HTTP requests
- Design an HTTP request class
- Content-Type is a required field and needs a default value
- The body is in key-value format
- Different Content-Types affect how the body is formatted
The Request class
class Request {
  constructor(options) {
    // First, fill in defaults for the properties we need
    this.method = options.method || 'GET';
    this.host = options.host;
    this.port = options.port || 80;
    this.path = options.path || '/';
    this.body = options.body || {};
    this.headers = options.headers || {};
    if (!this.headers['Content-Type']) {
      this.headers['Content-Type'] = 'application/x-www-form-urlencoded';
    }
    // Encode the body according to Content-Type
    if (this.headers['Content-Type'] === 'application/json') {
      this.bodyText = JSON.stringify(this.body);
    } else if (this.headers['Content-Type'] === 'application/x-www-form-urlencoded') {
      this.bodyText = Object.keys(this.body)
        .map(key => `${key}=${encodeURIComponent(this.body[key])}`)
        .join('&');
    }
    // Compute the body length automatically; a wrong length makes the request invalid
    this.headers['Content-Length'] = this.bodyText.length;
  }
  // Sends the request and returns a Promise
  send() {
    return new Promise((resolve, reject) => {
      // ...
    });
  }
}
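One detail worth keeping in mind: Content-Length counts bytes, not characters. For a form-encoded body the two coincide, because encodeURIComponent produces ASCII only, but a JSON body may contain non-ASCII characters. The sketch below shows the difference using Node's Buffer.byteLength; the sample strings are made up for illustration.

```javascript
const ascii = 'name=tridiamond';
const nonAscii = 'name=你好';

// For ASCII-only text, the character count and the byte count are the same.
console.log(ascii.length, Buffer.byteLength(ascii, 'utf8')); // 15 15

// For non-ASCII text they differ: each of these two characters takes 3 bytes in UTF-8.
console.log(nonAscii.length, Buffer.byteLength(nonAscii, 'utf8')); // 7 11
```

If the class needs to send such bodies, computing the header with Buffer.byteLength(this.bodyText) instead of this.bodyText.length would be the safer choice.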
Calling the request
/**
 * Calling the request
 */
void (async function () {
  let request = new Request({
    method: 'POST',
    host: '127.0.0.1',
    port: '8080',
    path: '/',
    headers: {
      ['X-Foo2']: 'custom',
    },
    body: {
      name: 'tridiamond',
    },
  });
  let response = await request.send();
  console.log(response);
})();
Writing the send function in the Request class
- The send function returns a Promise
- The response is received gradually while send is in progress
- Once the response has been fully constructed, the Promise is resolved
- Because the data arrives progressively, we need to design a ResponseParser
- This way, the parser can construct the different parts of the response object as it gradually receives the response data
// Sends the request and returns a Promise
send() {
  return new Promise((resolve, reject) => {
    const parser = new ResponseParser();
    resolve('');
  });
}
Designing ResponseParser
- The receive function accepts a string
- The string is then processed character by character with a state machine
- So we loop over each character of the string and add a receiveChar function to process each one
class ResponseParser {
  constructor() {}
  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.receiveChar(string.charAt(i));
    }
  }
  receiveChar(char) {}
}
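Processing one character at a time has a practical benefit: since TCP delivers a stream, the response text may be split at any position, and a character-level state machine does not care where the splits fall. The sketch below demonstrates this with a deliberately tiny parser; LineCollector is a hypothetical class for illustration, not the ResponseParser itself.

```javascript
// Hypothetical mini parser: collects CRLF-terminated lines one character at a time.
class LineCollector {
  constructor() {
    this.lines = [];
    this.current = '';
    this.state = this.readingLine;
  }
  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.state = this.state(string.charAt(i));
    }
  }
  // Accumulate characters until '\r' announces the end of the line.
  readingLine(char) {
    if (char === '\r') return this.waitingLineFeed;
    this.current += char;
    return this.readingLine;
  }
  // After '\r', wait for '\n' and then store the finished line.
  waitingLineFeed(char) {
    if (char === '\n') {
      this.lines.push(this.current);
      this.current = '';
      return this.readingLine;
    }
    return this.waitingLineFeed;
  }
}

const text = 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n';

const whole = new LineCollector();
whole.receive(text);

// Same text, but split at awkward places (even between '\r' and '\n').
const pieces = new LineCollector();
for (const piece of ['HTTP/1.1 2', '00 OK\r', '\nContent-Type: text/', 'html\r\n']) {
  pieces.receive(piece);
}

console.log(whole.lines); // [ 'HTTP/1.1 200 OK', 'Content-Type: text/html' ]
console.log(JSON.stringify(whole.lines) === JSON.stringify(pieces.lines)); // true
```

Because the current state lives on the object between calls to receive, it makes no difference how the incoming text was cut up.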
Understanding the HTTP response protocol
In the next section we parse the HTTP response in code, so let's first take a look at what an HTTP response contains.
HTTP/1.1 200 OK
Content-Type: text/html
Date: Mon, 23 Dec 2019 06:46:19 GMT
Connection: keep-alive
Transfer-Encoding: chunked

26
Hello World
0
- Let's start with the first line, the status line, which is the counterpart of the request line
- The first part is the HTTP protocol version: HTTP/1.1
- The second part is the HTTP status code: 200 (to keep our browser simple, we can treat any status other than 200 as an error)
- The third part is the HTTP status text: OK
- Next comes the headers section
- Both HTTP requests and HTTP responses contain headers
- The format is exactly the same as in the request
- Finally, a blank line separates the headers from the body
- The last part is the body
- Here the body format is again determined by the headers: Content-Type describes the content, and Transfer-Encoding describes how it is framed
- A typical format is the chunked body (the format Node returns by default)
- In a chunked body, each chunk starts with a line containing a single hexadecimal number, the length of the chunk
- That line is followed by the chunk's content
- The body ends with a chunk whose length is hexadecimal 0
- These length lines are what divide the body into chunks
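The chunked format described above is easier to picture when we produce it ourselves. Here is a minimal sketch of the encoding direction; encodeChunked is a hypothetical helper written for illustration, not something the server or our browser uses.

```javascript
// Encodes an array of string pieces as a chunked body.
function encodeChunked(pieces) {
  let out = '';
  for (const piece of pieces) {
    // An empty piece would look like the terminating chunk, so skip it.
    if (piece.length === 0) continue;
    // The chunk size is the byte length of the chunk data, written in hexadecimal.
    const size = Buffer.byteLength(piece, 'utf8').toString(16);
    out += `${size}\r\n${piece}\r\n`;
  }
  // A zero-length chunk followed by a blank line terminates the body.
  return out + '0\r\n\r\n';
}

console.log(JSON.stringify(encodeChunked(['Hello ', 'World\n'])));
// "6\r\nHello \r\n6\r\nWorld\n\r\n0\r\n\r\n"
```

The length line is hexadecimal, so a line reading 26 announces a chunk of 38 bytes, not 26.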
Implementing send
Here we get down to business: implementing the logic in the send function that actually sends the request to the server.
- Support an existing connection, or create a new one
- Pass the received data to the parser
- Resolve the Promise based on the parser's state
With these ideas in place, let's implement the code:
// Note: this method needs `const net = require('net');` at the top of the file
// Sends the request and returns a Promise
send(connection) {
  return new Promise((resolve, reject) => {
    const parser = new ResponseParser();
    // If a connection was passed in, send the request over it
    // Otherwise create a TCP connection from host and port
    // `toString` assembles the request in HTTP request format
    if (connection) {
      connection.write(this.toString());
    } else {
      connection = net.createConnection(
        {
          host: this.host,
          port: this.port,
        },
        () => {
          connection.write(this.toString());
        }
      );
    }
    // Listen for data on the connection
    // and pass it to the parser as it arrives
    // Resolve once the parser has finished
    // and finally close the connection
    connection.on('data', data => {
      console.log(data.toString());
      parser.receive(data.toString());
      if (parser.isFinished) {
        resolve(parser.response);
        connection.end();
      }
    });
    // Listen for connection errors
    // If the request fails, reject the Promise first
    // then close the connection so it is not left occupied
    connection.on('error', err => {
      reject(err);
      connection.end();
    });
  });
}
/**
 * Assemble the HTTP request text
 */
toString() {
  // Request line, header lines, a blank line, then the body, all joined by CRLF
  return [
    `${this.method} ${this.path} HTTP/1.1`,
    ...Object.keys(this.headers).map(key => `${key}: ${this.headers[key]}`),
    '',
    this.bodyText,
  ].join('\r\n');
}
Implementing the ResponseParser class
Now let's implement the code for the ResponseParser class.
- The response must be constructed piece by piece, so we assemble it with a ResponseParser
- ResponseParser processes the response text section by section, and we use a state machine to analyze the text's structure
/**
 * Response parser
 */
class ResponseParser {
  constructor() {
    this.state = this.waitingStatusLine;
    this.statusLine = '';
    this.headers = {};
    this.headerName = '';
    this.headerValue = '';
    this.bodyParser = null;
  }
  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.state = this.state(string.charAt(i));
    }
  }
  receiveEnd(char) {
    return this.receiveEnd;
  }
  /**
   * Wait for the status line content
   * @param {string} char The current character
   */
  waitingStatusLine(char) {
    if (char === '\r') return this.waitingStatusLineEnd;
    this.statusLine += char;
    return this.waitingStatusLine;
  }
  /**
   * Wait for the status line to end
   * @param {string} char The current character
   */
  waitingStatusLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingStatusLineEnd;
  }
  /**
   * Wait for the header name
   * @param {string} char The current character
   */
  waitingHeaderName(char) {
    if (char === ':') return this.waitingHeaderSpace;
    if (char === '\r') return this.waitingHeaderBlockEnd;
    this.headerName += char;
    return this.waitingHeaderName;
  }
  /**
   * Wait for the space after the header name
   * @param {string} char The current character
   */
  waitingHeaderSpace(char) {
    if (char === ' ') return this.waitingHeaderValue;
    return this.waitingHeaderSpace;
  }
  /**
   * Wait for the header value
   * @param {string} char The current character
   */
  waitingHeaderValue(char) {
    if (char === '\r') {
      this.headers[this.headerName] = this.headerValue;
      this.headerName = '';
      this.headerValue = '';
      return this.waitingHeaderLineEnd;
    }
    this.headerValue += char;
    return this.waitingHeaderValue;
  }
  /**
   * Wait for the header line to end
   * @param {string} char The current character
   */
  waitingHeaderLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingHeaderLineEnd;
  }
  /**
   * Wait for the header block to end
   * @param {string} char The current character
   */
  waitingHeaderBlockEnd(char) {
    if (char === '\n') return this.waitingBody;
    return this.waitingHeaderBlockEnd;
  }
  /**
   * Wait for the body content
   * @param {string} char The current character
   */
  waitingBody(char) {
    console.log(char);
    return this.waitingBody;
  }
}
Implementing the body parser
Finally, let's implement the parsing logic for the body.
- The body of a response may have a different structure depending on the response headers (the code below checks Transfer-Encoding), so we use a sub-parser structure to solve the problem
- Taking ChunkedBodyParser as an example, we again use a state machine to process the body format
/**
 * Response parser
 */
class ResponseParser {
  constructor() {
    this.state = this.waitingStatusLine;
    this.statusLine = '';
    this.headers = {};
    this.headerName = '';
    this.headerValue = '';
    this.bodyParser = null;
  }
  get isFinished() {
    return this.bodyParser && this.bodyParser.isFinished;
  }
  get response() {
    // Pull the status code and status text out of the status line
    const [, statusCode, statusText] = this.statusLine.match(/HTTP\/1.1 ([0-9]+) ([\s\S]+)/);
    return {
      statusCode,
      statusText,
      headers: this.headers,
      body: this.bodyParser.content.join(''),
    };
  }
  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.state = this.state(string.charAt(i));
    }
  }
  receiveEnd(char) {
    return this.receiveEnd;
  }
  /**
   * Wait for the status line content
   * @param {string} char The current character
   */
  waitingStatusLine(char) {
    if (char === '\r') return this.waitingStatusLineEnd;
    this.statusLine += char;
    return this.waitingStatusLine;
  }
  /**
   * Wait for the status line to end
   * @param {string} char The current character
   */
  waitingStatusLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingStatusLineEnd;
  }
  /**
   * Wait for the header name
   * @param {string} char The current character
   */
  waitingHeaderName(char) {
    if (char === ':') return this.waitingHeaderSpace;
    if (char === '\r') {
      // All headers are in: pick the body parser based on Transfer-Encoding
      if (this.headers['Transfer-Encoding'] === 'chunked') {
        this.bodyParser = new ChunkedBodyParser();
      }
      return this.waitingHeaderBlockEnd;
    }
    this.headerName += char;
    return this.waitingHeaderName;
  }
  /**
   * Wait for the space after the header name
   * @param {string} char The current character
   */
  waitingHeaderSpace(char) {
    if (char === ' ') return this.waitingHeaderValue;
    return this.waitingHeaderSpace;
  }
  /**
   * Wait for the header value
   * @param {string} char The current character
   */
  waitingHeaderValue(char) {
    if (char === '\r') {
      this.headers[this.headerName] = this.headerValue;
      this.headerName = '';
      this.headerValue = '';
      return this.waitingHeaderLineEnd;
    }
    this.headerValue += char;
    return this.waitingHeaderValue;
  }
  /**
   * Wait for the header line to end
   * @param {string} char The current character
   */
  waitingHeaderLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingHeaderLineEnd;
  }
  /**
   * Wait for the header block to end
   * @param {string} char The current character
   */
  waitingHeaderBlockEnd(char) {
    if (char === '\n') return this.waitingBody;
    return this.waitingHeaderBlockEnd;
  }
  /**
   * Wait for the body content
   * @param {string} char The current character
   */
  waitingBody(char) {
    this.bodyParser.receiveChar(char);
    return this.waitingBody;
  }
}
/**
 * Chunked body parser
 */
class ChunkedBodyParser {
  constructor() {
    this.state = this.waitingLength;
    this.length = 0;
    this.content = [];
    this.isFinished = false;
  }
  receiveChar(char) {
    this.state = this.state(char);
  }
  /**
   * Wait for the chunk length
   * @param {string} char The current character
   */
  waitingLength(char) {
    if (char === '\r') {
      // A zero-length chunk marks the end of the body
      if (this.length === 0) this.isFinished = true;
      return this.waitingLengthLineEnd;
    } else {
      // Accumulate the hexadecimal length digit by digit
      this.length *= 16;
      this.length += parseInt(char, 16);
    }
    return this.waitingLength;
  }
  /**
   * Wait for the length line to end
   * @param {string} char The current character
   */
  waitingLengthLineEnd(char) {
    if (char === '\n') return this.readingTrunk;
    return this.waitingLengthLineEnd;
  }
  /**
   * Read the chunk content
   * @param {string} char The current character
   */
  readingTrunk(char) {
    // Ignore the trailing CRLF that follows the final zero-length chunk
    if (this.isFinished) return this.readingTrunk;
    this.content.push(char);
    this.length--;
    if (this.length === 0) return this.waitingNewLine;
    return this.readingTrunk;
  }
  /**
   * Wait for the new line after a chunk
   * @param {string} char The current character
   */
  waitingNewLine(char) {
    if (char === '\r') return this.waitingNewLineEnd;
    return this.waitingNewLine;
  }
  /**
   * Wait for the new line to end
   * @param {string} char The current character
   */
  waitingNewLineEnd(char) {
    if (char === '\n') return this.waitingLength;
    return this.waitingNewLineEnd;
  }
}
Final words
At this point we have implemented the code for the browser's HTTP request and for parsing the HTTP response.
In the next article, we will implement HTML parsing and build the DOM tree, and then perform CSS computing.