I am TriDiamond, your fellow traveler in the “Technology Galaxy”, here to drift along with you on a lifelong learning journey.

Preface

How browsers work is an important topic. Explanations of repaint and reflow, or of how certain CSS properties behave, usually rely on some knowledge of browser internals. Studying that theory on its own tends to be dry and not very effective, so here we use JavaScript to implement a browser from scratch.

By implementing a simple browser ourselves, we will gain a deeper understanding of the fundamentals of the browser.

Basic browser rendering process

  • The browser does its overall rendering in five main steps
  • We visit a web page through a URL, and the browser parses and renders it into a bitmap
  • Finally, the graphics driver puts the picture on the screen so that we see the finished page
  • This is the browser's rendering flow
  • We only implement a simplified basic flow here; a real browser includes many more features, such as history

What we need to complete is the whole process from the URL request to displaying the page as a bitmap.

Browser flow:

  1. Take the URL, send an HTTP request, parse the returned content, and extract the HTML
  2. With the HTML in hand, parse the text into a DOM tree
  3. At this point the DOM tree is bare, so next we do CSS computing and attach the resulting CSS to the DOM tree
  4. After the computation we have a styled DOM tree, which is ready for layout (typesetting)
  5. Through layout computation, each DOM node gets a computed box (in a real browser, boxes are generated by CSS and one element may produce several, but to keep things simple we create one box per DOM node)
  6. Finally we render the DOM tree, with its background images, background colors and other styles, by painting it onto an image. That image is then shown to the user through the APIs provided by the operating system and the hardware drivers.

Processing strings with finite state machines

String processing of this kind is used throughout the browser, and without state machines the browser's implementation code would be very hard to write and to read. So let’s first talk about what a finite state machine is.

  • Each state is a machine
    • Each machine is decoupled from the others, which makes this a powerful abstraction mechanism
    • Inside each machine we can do computation, storage, output and so on
    • All of these machines accept the same input
    • Each machine in a state machine has no state of its own. Expressed as a function, it should be a pure function
    • “No side effects” here means that a machine should not be controlled by anything external other than its input; receiving input is fine
  • Each machine knows the next state
    • Every machine has a definite next state (Moore)
    • Each machine decides the next state based on the input (Mealy)

How to implement this in JavaScript

Mealy state machine:

// Each function is a state
function state(input) { // The function argument is the input
  // Inside the function, write whatever logic this state needs
  return next; // The return value is the next state (`next` is a placeholder)
}

/** ========= Calling code (schematic) ========= */
while (input) {
  // Get the next input
  state = state(input); // Use the return value of the state function as the next state
}

  • As the code above shows, each function is a state
  • The function's argument is the input
  • The function's return value is the next state, so the return value must itself be a state function
  • The ideal implementation of a state machine is a set of state functions that each return a state function
  • State functions are usually called in a loop that fetches the input and runs state = state(input), so that the machine consumes the input and switches state
  • In a Mealy machine, the returned next state must be chosen based on the input
  • In a Moore machine, the return value has nothing to do with the input; it always returns a fixed state
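As a small illustration of the two flavors described above, here is a minimal sketch. The traffic light and the turnstile are hypothetical examples, not part of the browser we are building, and the helper name run is also an assumption made for this sketch. It follows this article's informal definitions: the Moore-style states ignore their input, while the Mealy-style states pick the next state from it.

```javascript
// Moore-style (in this article's sense): the next state is fixed,
// no matter what the input is.
function red() { return green; }
function green() { return yellow; }
function yellow() { return red; }

// Mealy-style: the next state depends on the input.
function locked(input) { return input === 'coin' ? unlocked : locked; }
function unlocked(input) { return input === 'push' ? locked : unlocked; }

// Hypothetical helper: feed every input to the current state function
// and keep whatever state function it returns.
function run(start, inputs) {
  let state = start;
  for (const input of inputs) {
    state = state(input);
  }
  return state;
}

console.log(run(red, ['tick', 'tick']).name); // two steps from red: green, then yellow
console.log(run(locked, ['push', 'coin']).name); // pushing a locked turnstile does nothing; the coin unlocks it
```

Both machines are driven by the same loop; only the bodies of the state functions differ.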

Processing strings without a state machine

Let’s start with some string problems that can be handled without a state machine:

Problem 1: find the character “a” in a string

function match(string) {
  for (let letter of string) {
    if (letter == 'a') return true;
  }
  return false;
}

console.log(match('I am TriDiamond'));

Problem 2: without regular expressions, using only plain JavaScript logic, find the substring “ab” in a string

“Look for a and b directly, and return once both are found one after the other.”

/**
 * Look for 'a' immediately followed by 'b' and return true once found
 * @param {string} string The string to search
 */
function matchAB(string) {
  let hasA = false;
  for (let letter of string) {
    if (letter == 'a') {
      hasA = true;
    } else if (hasA && letter == 'b') {
      return true;
    } else {
      hasA = false;
    }
  }
  return false;
}

console.log(matchAB('hello abert'));

Problem 3: without regular expressions, using only plain JavaScript logic, find the substring “abcdef” in a string

Method 1: “Use temporary storage and a moving pointer to detect the match”

/**
 * Use temporary storage and a moving pointer to detect the match
 * @param {string} match The characters to look for
 * @param {string} string The string to search
 */
function matchString(match, string) {
  const resultLetters = match.split(''); // Split the pattern into an array of characters
  const stringArray = string.split(''); // Split the searched string into an array of characters
  let index = 0; // Pointer into the pattern

  for (let i = 0; i < stringArray.length; i++) {
    // The characters must be adjacent and in order:
    // "ab" must not match "a b"
    if (stringArray[i] == resultLetters[index]) {
      // This character matches, so move on to the next pattern character
      index++;
    } else {
      // Mismatch: reset, but let the current character start a new match
      index = stringArray[i] == resultLetters[0] ? 1 : 0;
    }
    // Once every pattern character has been matched,
    // the string contains the pattern
    if (index > resultLetters.length - 1) return true;
  }
  return false;
}

console.log('Method 1', matchString('abcdef', 'hello abert abcdef'));

Method 2: “Use substring with the length of the pattern to cut out a slice and check whether it equals the pattern”

/**
 * Generic string matching using substring
 * @param {string} match The characters to look for
 * @param {string} string The string to search
 */
function matchWithSubstring(match, string) {
  // Stop at the last position where the whole pattern still fits
  for (let i = 0; i <= string.length - match.length; i++) {
    if (string.substring(i, i + match.length) === match) {
      return true;
    }
  }
  return false;
}

console.log('Method 2', matchWithSubstring('abcdef', 'hello abert abcdef'));

Method 3: “Search character by character until the final result is found”

/**
 * Search character by character until the final result is found
 * @param {string} string The string to search
 */
function match(string) {
  let matchStatus = [false, false, false, false, false, false];
  let matchLetters = ['a', 'b', 'c', 'd', 'e', 'f'];
  let statusIndex = 0;

  for (let letter of string) {
    if (letter == matchLetters[0]) {
      // An 'a' always starts a fresh match
      matchStatus = [true, false, false, false, false, false];
      statusIndex = 1;
    } else if (matchStatus[statusIndex - 1] && letter == matchLetters[statusIndex]) {
      matchStatus[statusIndex] = true;
      statusIndex++;
    } else {
      matchStatus = [false, false, false, false, false, false];
      statusIndex = 0;
    }

    if (statusIndex > matchLetters.length - 1) return true;
  }
  return false;
}

console.log('Method 3', match('hello abert abcdef'));

Processing strings with a state machine

Here we use the state machine approach to find the substring “abcdef” in a string

  • First of all, each state is a state function
  • We need a start state and an end state function, named start and end
  • Each state function's name describes the current state: matchedA means the character a has already been matched, and so on
  • The logic in each state is to match the next character
  • If the match succeeds, return the next state function
  • If the match fails, go back to the start state by handing the current character to start, so that an a which breaks one match can begin the next
  • Because f is the last character of the pattern, once matchedE succeeds it can go straight to the end state end
  • The end state is also known as a trap: the transitions are over, so the machine stays there until the loop finishes

/**
 * Match a string with a state machine
 * @param {string} string The string to search
 */
function match(string) {
  let state = start;

  for (let letter of string) {
    state = state(letter); // State switchover
  }

  return state === end; // Return true if the final state function is end
}

function start(letter) {
  if (letter === 'a') return matchedA;
  return start;
}

function end(letter) {
  return end;
}

function matchedA(letter) {
  if (letter === 'b') return matchedB;
  return start(letter);
}

function matchedB(letter) {
  if (letter === 'c') return matchedC;
  return start(letter);
}

function matchedC(letter) {
  if (letter === 'd') return matchedD;
  return start(letter);
}

function matchedD(letter) {
  if (letter === 'e') return matchedE;
  return start(letter);
}

function matchedE(letter) {
  if (letter === 'f') return end(letter);
  return start(letter);
}

console.log(match('I am abcdef'));

A harder problem: use a state machine to match the string “abcabx”

  • The difference from the previous problem is that “ab” appears twice
  • So our analysis goes like this:
    • The first “b” should be followed by a “c”, and the second “b” should be followed by an “x”
    • If the second “b” is not followed by an “x”, fall back to the state after the first “b” and retry the current character there

/**
 * Match a string with a state machine
 * @param {string} string The string to search
 */
function match(string) {
  let state = start;

  for (let letter of string) {
    state = state(letter);
  }

  return state === end;
}

function start(letter) {
  if (letter === 'a') return matchedA;
  return start;
}

function end(letter) {
  return end;
}

function matchedA(letter) {
  if (letter === 'b') return matchedB;
  return start(letter);
}

function matchedB(letter) {
  if (letter === 'c') return matchedC;
  return start(letter);
}

function matchedC(letter) {
  if (letter === 'a') return matchedA2;
  return start(letter);
}

function matchedA2(letter) {
  if (letter === 'b') return matchedB2;
  return start(letter);
}

function matchedB2(letter) {
  if (letter === 'x') return end;
  return matchedB(letter);
}

console.log('result: ', match('abcabcabx'));
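Writing one function per state by hand gets tedious as patterns grow. One possible generalization, sketched below, derives the fallback states from the pattern itself using the failure table of the Knuth-Morris-Pratt (KMP) algorithm. This is a swapped-in technique rather than the hand-written approach used in this article, and the names buildFailureTable and matchPattern are hypothetical.

```javascript
// fail[i] is the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of it. It tells the matcher which state to fall
// back to after a mismatch, like matchedB2 falling back to matchedB.
function buildFailureTable(pattern) {
  const fail = new Array(pattern.length).fill(0);
  let k = 0;
  for (let i = 1; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) k = fail[k - 1];
    if (pattern[i] === pattern[k]) k++;
    fail[i] = k;
  }
  return fail;
}

// The "state" is simply the number of pattern characters matched so far.
function matchPattern(pattern, string) {
  if (pattern.length === 0) return true;
  const fail = buildFailureTable(pattern);
  let state = 0;
  for (let i = 0; i < string.length; i++) {
    const letter = string[i];
    // On a mismatch, fall back to shorter partial matches
    while (state > 0 && letter !== pattern[state]) state = fail[state - 1];
    if (letter === pattern[state]) state++;
    if (state === pattern.length) return true; // reached the end (trap) state
  }
  return false;
}

console.log(buildFailureTable('abcabx')); // [ 0, 0, 0, 1, 2, 0 ]
console.log(matchPattern('abcabx', 'abcabcabx')); // true
```

The entry 2 at index 4 of the table corresponds to the hand-written rule: after “abcab”, a mismatch falls back to the state in which “ab” has already been matched.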

Basic knowledge for parsing the HTTP protocol

ISO-OSI seven-layer network model

HTTP

  • Layers covered:
    • Application
    • Presentation
    • Session
  • In Node.js code, this is the familiar require('http')

TCP

  • Layers covered:
    • Transport
  • Because web pages need reliable delivery, we only care about TCP

Internet

  • Layers covered:
    • Network
  • “Going online” can sometimes mean two things
    • The external network on which the application-layer protocol of web pages runs: the Internet, which is responsible for data transmission
    • The internal network of a company, called an intranet

4G/5G/Wi-Fi

  • Layers covered:
    • Data link
    • Physical
  • Their job is to transmit data accurately
  • All transmission at these layers is point-to-point
  • There must be a direct connection for data to be transmitted

Basic knowledge of TCP and IP

  • Stream
    • In the TCP layer, data is transferred as a “stream”
    • A stream has no obvious internal boundaries
    • It only guarantees that the order is correct
  • Port
    • TCP is used by the software running on a computer
    • Every piece of software receives its data through the network card
    • Ports determine which data goes to which software
    • In Node.js, the corresponding module is require('net')
  • Packet
    • TCP data is transported packet by packet
    • Each packet can be large or small
    • The size depends on the transmission capacity of the intermediate devices along the network path
  • IP address
    • The IP address determines where an IP packet should go
    • Connections on the Internet are very complex, with some large routing nodes in the middle
    • When we access an IP address, we first connect to the network of our residential community and then to the telecom carrier's backbone
    • If the destination is abroad, the traffic goes on to the international backbone
    • An IP address is the unique identifier of each device connected to the Internet
    • So an IP packet relies on its IP address to find where it needs to be sent
  • libnet/libpcap
    • Working at the IP level requires calling these two C/C++ libraries
    • libnet is responsible for constructing IP packets and sending them
    • libpcap is responsible for capturing all IP packets that flow through the network card
    • If the network is built with a switch instead of a router, the low-level libpcap library can capture many IP packets that were not addressed to us
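Because a TCP stream has no built-in boundaries, a receiver cannot assume that one chunk of data equals one message or even one line. A minimal sketch of how a receiver can cope with this is shown below; the function name createLineSplitter and the sample chunks are assumptions made for illustration. It buffers whatever arrives and emits a line only when the CRLF terminator has been seen.

```javascript
// Hypothetical helper: turns arbitrarily split chunks back into CRLF-terminated lines.
function createLineSplitter(onLine) {
  let buffer = ''; // data received so far that does not yet form a complete line
  return function receive(chunk) {
    buffer += chunk;
    let index;
    while ((index = buffer.indexOf('\r\n')) !== -1) {
      onLine(buffer.slice(0, index)); // emit one complete line without its CRLF
      buffer = buffer.slice(index + 2); // keep the remainder for later
    }
  };
}

const lines = [];
const receive = createLineSplitter(line => lines.push(line));

// The same bytes could arrive split at any position, even inside "\r\n".
for (const chunk of ['HTTP/1.1 2', '00 OK\r\nContent-', 'Type: text/html\r', '\n']) {
  receive(chunk);
}

console.log(lines); // [ 'HTTP/1.1 200 OK', 'Content-Type: text/html' ]
```

The character-by-character state machine used later in this article solves the same problem in a different way: since it consumes one character at a time, it does not care where the chunk boundaries fall.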

HTTP

  • Composition
    • Request
    • Response
  • TCP is a full-duplex channel: either side can send or receive, with no ordering between the two
  • What is special about HTTP is that the client must initiate a request first
  • The server then returns a response
  • So each request corresponds to exactly one response
  • If there are extra requests or extra responses, the protocol has been violated

HTTP request: preparing the server environment

Before we write our own browser, we first need a server to talk to, so let’s write a simple Node.js server:

const http = require('http');

http
  .createServer((request, response) => {
    let body = [];
    request
      .on('error', err => {
        console.error(err);
      })
      .on('data', chunk => {
        // Keep the raw Buffer chunks so that Buffer.concat can join them
        body.push(chunk);
      })
      .on('end', () => {
        body = Buffer.concat(body).toString();
        console.log('body', body);
        response.writeHead(200, { 'Content-Type': 'text/html' });
        response.end(' Hello World\n');
      });
  })
  .listen(8080);

console.log('server started');

Understand the HTTP request protocol

Before writing our client code, we need to understand the HTTP protocol.

Let’s take a look at an HTTP request first:

POST / HTTP/1.1
Host: 127.0.0.1
Content-Type: application/x-www-form-urlencoded

field1=aaa&code=x%3D1

  • HTTP is a text protocol. Text protocols are usually contrasted with binary protocols: everything in the protocol is a string, and each byte is part of that string.
  • The first line of an HTTP request is called the request line and has three parts:
    • Method: for example POST or GET
    • Path: defaults to “/”
    • Protocol and version: HTTP/1.1
  • Next come the headers
    • The number of header lines is not fixed
    • Each line has the form key: value, separated by a colon
    • The headers end with a blank line
  • The last part is the body:
    • Its content is determined by Content-Type
    • Whatever format Content-Type specifies is the format the body must use
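To make the three parts concrete, the following sketch splits the sample request shown earlier into request line, headers and body. It is only an illustration under simplifying assumptions: the function name parseRequestText is hypothetical, the whole request is assumed to be available as one string, and lines are assumed to end with CRLF as on the wire.

```javascript
// Hypothetical helper: splits a complete HTTP request text into its parts.
function parseRequestText(text) {
  // A blank line (two consecutive CRLFs) separates the headers from the body
  const headerEnd = text.indexOf('\r\n\r\n');
  const head = headerEnd === -1 ? text : text.slice(0, headerEnd);
  const body = headerEnd === -1 ? '' : text.slice(headerEnd + 4);

  // The first line is the request line; the rest are "key: value" headers
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, path, version] = requestLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    headers[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { method, path, version, headers, body };
}

const sample =
  'POST / HTTP/1.1\r\n' +
  'Host: 127.0.0.1\r\n' +
  'Content-Type: application/x-www-form-urlencoded\r\n' +
  '\r\n' +
  'field1=aaa&code=x%3D1';

const parsed = parseRequestText(sample);
console.log(parsed.method, parsed.path, parsed.version); // POST / HTTP/1.1
console.log(parsed.headers['Content-Type']); // application/x-www-form-urlencoded
console.log(parsed.body); // field1=aaa&code=x%3D1
```

In the urlencoded body, the value x%3D1 is the percent-encoded form of x=1, which is why the = inside the value does not get confused with the separator between key and value.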

Now we can start writing code!

Implementing HTTP requests

  • Design an HTTP request class
  • Content-Type is a required field and needs a default value
  • The body is in key-value (KV) format
  • Different Content-Types affect how the body is formatted

The Request class

class Request {
  constructor(options) {
    // Take the values we need from the options, falling back to defaults
    this.method = options.method || 'GET';
    this.host = options.host;
    this.port = options.port || 80;
    this.path = options.path || '/';
    this.body = options.body || {};
    this.headers = options.headers || {};

    if (!this.headers['Content-Type']) {
      this.headers['Content-Type'] = 'application/x-www-form-urlencoded';
    }
    // Encode the body according to the Content-Type
    if (this.headers['Content-Type'] === 'application/json') {
      this.bodyText = JSON.stringify(this.body);
    } else if (this.headers['Content-Type'] === 'application/x-www-form-urlencoded') {
      this.bodyText = Object.keys(this.body)
        .map(key => `${key}=${encodeURIComponent(this.body[key])}`)
        .join('&');
    }
    // Compute the body length automatically; a wrong length makes the request invalid.
    // Content-Length counts bytes, so use Buffer.byteLength rather than the string length
    this.headers['Content-Length'] = Buffer.byteLength(this.bodyText);
  }
  // Sends the request and returns a Promise
  send() {
    return new Promise((resolve, reject) => {
      // ...
    });
  }
}
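One detail worth checking when computing the length of the body is that Content-Length counts bytes, not characters. The sketch below isolates the body encoding into a hypothetical helper named encodeBody (not part of the Request class) and compares the two measures for a non-ASCII value.

```javascript
// Hypothetical helper mirroring the two Content-Types handled by the Request class.
function encodeBody(contentType, body) {
  if (contentType === 'application/json') {
    return JSON.stringify(body);
  }
  if (contentType === 'application/x-www-form-urlencoded') {
    return Object.keys(body)
      .map(key => `${encodeURIComponent(key)}=${encodeURIComponent(body[key])}`)
      .join('&');
  }
  throw new Error(`Unsupported Content-Type: ${contentType}`);
}

const form = encodeBody('application/x-www-form-urlencoded', { field1: 'aaa', code: 'x=1' });
console.log(form); // field1=aaa&code=x%3D1

const json = encodeBody('application/json', { name: 'é' });
console.log(json.length); // 12 characters
console.log(Buffer.byteLength(json)); // 13 bytes, because "é" takes two bytes in UTF-8
```

For urlencoded bodies the two numbers always agree, since encodeURIComponent produces ASCII only. For JSON bodies they can differ, and the byte count is the one the server expects.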

Using the Request class

/**
 * Send a request
 */
void (async function () {
  let request = new Request({
    method: 'POST',
    host: '127.0.0.1',
    port: '8080',
    path: '/',
    headers: {
      ['X-Foo2']: 'custom',
    },
    body: {
      name: 'tridiamond',
    },
  });

  let response = await request.send();

  console.log(response);
})();

Write the send function in the Request class

  • The send function returns a Promise
  • The response is received gradually while send is in progress
  • Once the response has been fully constructed, the Promise is resolved
  • Because the information arrives piece by piece, we need to design a ResponseParser
  • The parser can then build the different parts of the response object as the data gradually arrives

  // Sends the request and returns a Promise
  send() {
    return new Promise((resolve, reject) => {
      const parser = new ResponseParser();
      resolve('');
    });
  }

Design ResponseParser

  • The receive function accepts a string
  • The string is then processed character by character with a state machine
  • So we loop over each character of the string and add a receiveChar function to process a single character

class ResponseParser {
  constructor() {}
  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.receiveChar(string.charAt(i));
    }
  }
  receiveChar(char) {}
}

Understand the HTTP response protocol

In the next section we need to parse the HTTP response in code, so let’s first look at what an HTTP response contains.

HTTP/1.1 200 OK
Content-Type: text/html
Date: Mon, 23 Dec 2019 06:46:19 GMT
Connection: keep-alive
Transfer-Encoding: chunked

26
Hello World
0

  • The first line is the status line, the counterpart of the request line
    • The first part is the HTTP version: HTTP/1.1
    • The second part is the HTTP status code: 200 (to keep our browser simple, we can treat every status other than 200 as an error)
    • The third part is the HTTP status text: OK
  • Next comes the header section
    • Both HTTP requests and responses contain headers
    • The format is exactly the same as in the request
    • A blank line separates the headers from the body
  • Finally comes the body
    • The body format is again described by the headers: Content-Type for the content itself, and Transfer-Encoding for how it is framed
    • A typical framing here is the chunked body (the format Node returns by default)
    • In a chunked body, each chunk starts with a line that contains a single hexadecimal number, the length of the chunk
    • That line is followed by the content of the chunk
    • A chunk whose hexadecimal length is 0 marks the end of the body
    • These length lines are also what separates the pieces of the body from each other
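The chunked framing described above can be sketched as a small decoder. This is an illustration under simplifying assumptions, not the parser built later in this article: the hypothetical function decodeChunked receives the complete body as one string, treats one character as one byte (true for ASCII content only), and ignores chunk extensions and trailers.

```javascript
// Hypothetical helper: decodes a complete chunked body held in a string.
function decodeChunked(raw) {
  let pos = 0;
  let content = '';
  while (true) {
    // Each chunk starts with its length in hexadecimal, terminated by CRLF
    const lineEnd = raw.indexOf('\r\n', pos);
    if (lineEnd === -1) throw new Error('Incomplete chunk size line');
    const size = parseInt(raw.slice(pos, lineEnd), 16);
    if (Number.isNaN(size)) throw new Error('Invalid chunk size');
    // A zero-length chunk marks the end of the body
    if (size === 0) return content;
    const start = lineEnd + 2;
    content += raw.slice(start, start + size);
    // Skip the chunk data and the CRLF that follows it
    pos = start + size + 2;
  }
}

// ' Hello World\n' is 13 characters long, which is "d" in hexadecimal
console.log(JSON.stringify(decodeChunked('d\r\n Hello World\n\r\n0\r\n\r\n'))); // " Hello World\n"
console.log(decodeChunked('5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n')); // Hello World
```

The second call shows why the length lines matter: the same content can arrive as any number of chunks, and the receiver only knows where each one ends by reading its length first.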

Implement sending the request

Now we get to the real work: implementing the logic of the send function so that it actually sends the request to the server.

  • Support both an existing connection and creating a new one
  • Pass the received data to the parser
  • Resolve the Promise according to the parser's state

With these ideas in mind, let’s implement the code:

  // Requires `const net = require('net');` at the top of the file
  // Sends the request and returns a Promise
  send(connection) {
    return new Promise((resolve, reject) => {
      const parser = new ResponseParser();
      // If a connection was passed in, reuse it;
      // otherwise create a TCP connection from the host and port.
      // toString assembles the request in HTTP request format
      if (connection) {
        connection.write(this.toString());
      } else {
        connection = net.createConnection(
          {
            host: this.host,
            port: this.port,
          },
          () => {
            connection.write(this.toString());
          }
        );
      }
      // Listen for data on the connection
      // and pass it to the parser as it arrives.
      // Resolve once the parser has finished,
      // then close the connection
      connection.on('data', data => {
        console.log(data.toString());
        parser.receive(data.toString());

        if (parser.isFinished) {
          resolve(parser.response);
          connection.end();
        }
      });
      // Listen for connection errors.
      // If the request fails, reject the Promise first,
      // then close the connection so that it is not left occupied
      connection.on('error', err => {
        reject(err);
        connection.end();
      });
    });
  }
  /**
   * Assemble the HTTP request text
   */
  toString() {
    const headerLines = Object.keys(this.headers)
      .map(key => `${key}: ${this.headers[key]}`)
      .join('\r\n');
    // Request line, headers, a blank line, then the body; every line ends with CRLF
    return `${this.method} ${this.path} HTTP/1.1\r\n${headerLines}\r\n\r\n${this.bodyText}`;
  }

Implement the ResponseParser class

Now let’s implement the code for the ResponseParser class.

  • The response has to be constructed piece by piece, so we assemble it with a ResponseParser
  • The ResponseParser processes the response text section by section, and we use a state machine to analyze the structure of the text

/**
 * Response parser
 */
class ResponseParser {
  constructor() {
    this.state = this.waitingStatusLine;
    this.statusLine = '';
    this.headers = {};
    this.headerName = '';
    this.headerValue = '';
    this.bodyParser = null;
  }

  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.state = this.state(string.charAt(i));
    }
  }

  receiveEnd(char) {
    return this.receiveEnd;
  }

  /** Waiting for the status line content. @param {string} char */
  waitingStatusLine(char) {
    if (char === '\r') return this.waitingStatusLineEnd;
    this.statusLine += char;
    return this.waitingStatusLine;
  }

  /** Waiting for the status line to end. @param {string} char */
  waitingStatusLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingStatusLineEnd;
  }

  /** Waiting for the header name. @param {string} char */
  waitingHeaderName(char) {
    if (char === ':') return this.waitingHeaderSpace;
    if (char === '\r') return this.waitingHeaderBlockEnd;
    this.headerName += char;
    return this.waitingHeaderName;
  }

  /** Waiting for the space after the colon. @param {string} char */
  waitingHeaderSpace(char) {
    if (char === ' ') return this.waitingHeaderValue;
    return this.waitingHeaderSpace;
  }

  /** Waiting for the header value. @param {string} char */
  waitingHeaderValue(char) {
    if (char === '\r') {
      this.headers[this.headerName] = this.headerValue;
      this.headerName = '';
      this.headerValue = '';
      return this.waitingHeaderLineEnd;
    }
    this.headerValue += char;
    return this.waitingHeaderValue;
  }

  /** Waiting for the header line to end. @param {string} char */
  waitingHeaderLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingHeaderLineEnd;
  }

  /** Waiting for the header block to end. @param {string} char */
  waitingHeaderBlockEnd(char) {
    if (char === '\n') return this.waitingBody;
    return this.waitingHeaderBlockEnd;
  }

  /** Waiting for the body content. @param {string} char */
  waitingBody(char) {
    console.log(char);
    return this.waitingBody;
  }
}

Implement the body content parser

Finally, let’s implement the parsing logic for the body content.

  • The body of a response may have a different structure depending on headers such as Content-Type and Transfer-Encoding, so we use a sub-parser structure to handle it
  • Taking ChunkedBodyParser as an example, we again use a state machine to process the body format

/**
 * Response parser
 */
class ResponseParser {
  constructor() {
    this.state = this.waitingStatusLine;
    this.statusLine = '';
    this.headers = {};
    this.headerName = '';
    this.headerValue = '';
    this.bodyParser = null;
  }

  get isFinished() {
    return this.bodyParser && this.bodyParser.isFinished;
  }

  get response() {
    // Capture the status code and the status text from the status line
    const matched = this.statusLine.match(/HTTP\/1.1 ([0-9]+) ([\s\S]+)/);
    return {
      statusCode: matched[1],
      statusText: matched[2],
      headers: this.headers,
      body: this.bodyParser.content.join(''),
    };
  }

  receive(string) {
    for (let i = 0; i < string.length; i++) {
      this.state = this.state(string.charAt(i));
    }
  }

  receiveEnd(char) {
    return this.receiveEnd;
  }

  /** Waiting for the status line content. @param {string} char */
  waitingStatusLine(char) {
    if (char === '\r') return this.waitingStatusLineEnd;
    this.statusLine += char;
    return this.waitingStatusLine;
  }

  /** Waiting for the status line to end. @param {string} char */
  waitingStatusLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingStatusLineEnd;
  }

  /** Waiting for the header name. @param {string} char */
  waitingHeaderName(char) {
    if (char === ':') return this.waitingHeaderSpace;
    if (char === '\r') {
      // The header block is over: choose the body parser from the headers
      if (this.headers['Transfer-Encoding'] === 'chunked') {
        this.bodyParser = new ChunkedBodyParser();
      }
      return this.waitingHeaderBlockEnd;
    }
    this.headerName += char;
    return this.waitingHeaderName;
  }

  /** Waiting for the space after the colon. @param {string} char */
  waitingHeaderSpace(char) {
    if (char === ' ') return this.waitingHeaderValue;
    return this.waitingHeaderSpace;
  }

  /** Waiting for the header value. @param {string} char */
  waitingHeaderValue(char) {
    if (char === '\r') {
      this.headers[this.headerName] = this.headerValue;
      this.headerName = '';
      this.headerValue = '';
      return this.waitingHeaderLineEnd;
    }
    this.headerValue += char;
    return this.waitingHeaderValue;
  }

  /** Waiting for the header line to end. @param {string} char */
  waitingHeaderLineEnd(char) {
    if (char === '\n') return this.waitingHeaderName;
    return this.waitingHeaderLineEnd;
  }

  /** Waiting for the header block to end. @param {string} char */
  waitingHeaderBlockEnd(char) {
    if (char === '\n') return this.waitingBody;
    return this.waitingHeaderBlockEnd;
  }

  /** Waiting for the body content. @param {string} char */
  waitingBody(char) {
    this.bodyParser.receiveChar(char);
    return this.waitingBody;
  }
}

/**
 * Chunked body parser
 */
class ChunkedBodyParser {
  constructor() {
    this.state = this.waitingLength;
    this.length = 0;
    this.content = [];
    this.isFinished = false;
  }

  receiveChar(char) {
    this.state = this.state(char);
  }

  /** Waiting for the chunk length. @param {string} char */
  waitingLength(char) {
    if (char === '\r') {
      if (this.length === 0) this.isFinished = true;
      return this.waitingLengthLineEnd;
    } else {
      // Accumulate the hexadecimal length digit by digit
      this.length *= 16;
      this.length += parseInt(char, 16);
    }
    return this.waitingLength;
  }

  /** Waiting for the length line to end. @param {string} char */
  waitingLengthLineEnd(char) {
    // After the terminating zero-length chunk, move to the trap state
    // so that the trailing CRLF is not added to the content
    if (char === '\n') return this.isFinished ? this.finished : this.readingTrunk;
    return this.waitingLengthLineEnd;
  }

  /** Trap state: the body is complete. @param {string} char */
  finished(char) {
    return this.finished;
  }

  /** Reading the chunk content. @param {string} char */
  readingTrunk(char) {
    // Note: this counts characters, which equals bytes only for ASCII content
    this.content.push(char);
    this.length--;
    if (this.length === 0) return this.waitingNewLine;
    return this.readingTrunk;
  }

  /** Waiting for the line break after the chunk. @param {string} char */
  waitingNewLine(char) {
    if (char === '\r') return this.waitingNewLineEnd;
    return this.waitingNewLine;
  }

  /** Waiting for the line break after the chunk to end. @param {string} char */
  waitingNewLineEnd(char) {
    if (char === '\n') return this.waitingLength;
    return this.waitingNewLineEnd;
  }
}
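The status line is the one part of the response that is pulled apart with a regular expression rather than with a state. As a standalone sketch of that step, the hypothetical function parseStatusLine below reads the groups from the match result and fails loudly on malformed input. The stricter pattern and the numeric status code are choices made for this sketch, not something the parser above requires.

```javascript
// Hypothetical helper: splits a status line into version, code and text.
function parseStatusLine(statusLine) {
  // HTTP/<major>.<minor>, a space, a three-digit code, then optional text
  const matched = statusLine.match(/^HTTP\/(\d\.\d) (\d{3}) ?(.*)$/);
  if (!matched) {
    throw new Error(`Malformed status line: ${statusLine}`);
  }
  return {
    version: matched[1],
    statusCode: Number(matched[2]),
    statusText: matched[3],
  };
}

console.log(parseStatusLine('HTTP/1.1 200 OK'));
// { version: '1.1', statusCode: 200, statusText: 'OK' }
console.log(parseStatusLine('HTTP/1.1 404 Not Found').statusText); // Not Found
```

Reading the groups from the value returned by match keeps the result local to the call, whereas the legacy RegExp.$1 properties are shared global state that any later regular expression can overwrite.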

Conclusion

At this point we have implemented the code for sending the browser's HTTP request and parsing the HTTP response.

In the next article, we will implement HTML parsing and build the DOM tree, and then perform CSS computing.