In this article, Brother Po takes you through WebSocket from several angles. After reading it, you will:
- Understand the background of WebSocket, what WebSocket is, and its advantages;
- Know which APIs WebSocket provides and how to use the WebSocket API to send plain text and binary data;
- Understand the WebSocket handshake protocol, data frame format, masking algorithm, and related topics;
- Know how to implement a WebSocket server that supports sending plain text.
In the last part, Brother Po covers the relationship between WebSocket and HTTP, the difference between WebSocket and long polling, what a WebSocket heartbeat is, and what a socket is.
Recommended reading:
- Web Workers You Don't Know (Part 1) [7.8K words | many images]
- Blob You Don't Know
- WeakMap You Don't Know
- Playing with Front-End Video Players | many images
- Playing with Front-End Binary
Now let's get to the topic. To help you understand and master WebSocket, we first introduce what WebSocket is.
1. What is WebSocket
1.1 Background of WebSocket
In the early days, many websites used polling to implement push technology. With polling, the browser sends HTTP requests to the server at regular intervals, and the server returns the latest data to the client. The two common variants are (short) polling and long polling; the following figure shows how they differ:
To get a better sense of the difference between polling and long polling, let’s look at the code:
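The difference can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not production code: `fetchLatest`, `fetchWhenReady`, `onData`, and `shouldContinue` are hypothetical callbacks supplied by the caller (for example, thin wrappers around `fetch`), and the retry delay is an arbitrary choice.

```javascript
// Short polling: ask the server at a fixed interval, whether or not
// anything has changed. Returns a function that stops the polling.
function startPolling(fetchLatest, onData, intervalMs) {
  const timer = setInterval(async () => {
    try {
      onData(await fetchLatest());
    } catch (err) {
      console.error("poll failed:", err);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Long polling: the server holds each request open until it has data
// (or times out); the client re-issues the request as soon as one completes.
async function startLongPolling(fetchWhenReady, onData, shouldContinue, retryDelayMs = 1000) {
  while (shouldContinue()) {
    try {
      onData(await fetchWhenReady());
    } catch (err) {
      // Back off briefly so a failing server is not hit in a tight loop.
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
}
```

In both cases every update costs a full HTTP request and response, which is the overhead discussed next.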
The obvious disadvantage of this traditional pattern is that the browser has to keep sending requests to the server. HTTP requests and responses may carry long headers, of which only a small part is useful data, so a lot of bandwidth is wasted.
A relatively new polling technique is Comet. It allows two-way communication but still requires repeated requests. In addition, the long-lived HTTP connections that are common in Comet consume server resources.
Against this background, HTML5 defined the WebSocket protocol, which saves server resources and bandwidth and enables more real-time communication. WebSocket URIs use the ws or wss scheme, where wss denotes WebSocket over TLS. For example:
ws://echo.websocket.org
wss://echo.websocket.org
WebSocket uses the same TCP port as HTTP and HTTPS and can bypass most firewall restrictions. By default, WebSocket uses port 80. Port 443 is used by default when running over TLS.
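The default ports can be checked with the standard `URL` class, which treats `ws:` and `wss:` as schemes with default ports 80 and 443. The helper name `effectivePort` below is hypothetical and only serves as an illustration.

```javascript
// Resolve the TCP port a WebSocket URL will connect to.
function effectivePort(wsUrl) {
  const url = new URL(wsUrl);
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new TypeError(`Not a WebSocket URL: ${wsUrl}`);
  }
  // URL.port is an empty string when the URL uses the scheme's default port.
  if (url.port !== "") return Number(url.port);
  return url.protocol === "wss:" ? 443 : 80;
}

console.log(effectivePort("ws://echo.websocket.org"));  // 80
console.log(effectivePort("wss://echo.websocket.org")); // 443
console.log(effectivePort("ws://localhost:8888"));      // 8888
```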
1.2 Introduction to WebSocket
WebSocket is a network transport protocol that provides full-duplex communication over a single TCP connection and sits at the application layer of the OSI model. The IETF standardized the WebSocket protocol as RFC 6455 in 2011, later supplemented by RFC 7936.
WebSocket makes it easier to exchange data between the client and the server, allowing the server to actively push data to the client. In the WebSocket API, the browser and server only need to complete a handshake to create a persistent connection and two-way data transfer between the two.
After introducing Polling and WebSocket, let’s take a look at the differences between XHR Polling and WebSocket:
1.3 Advantages of WebSocket
- Less control overhead. When data is exchanged between the server and client after the connection is created, the packet headers used for protocol control are relatively small.
- More real-time. Because the protocol is full-duplex, the server can proactively send data to the client at any time. Compared with HTTP, where the client must send a request before the server can respond, latency is significantly lower.
- Stateful connection. Unlike HTTP, WebSocket must establish a connection first, which makes it a stateful protocol; subsequent messages can then omit some state information.
- Better binary support. WebSocket defines binary frames, making it easier to process binary content than HTTP.
- Extensible. WebSocket defines extensions, so users can extend the protocol and implement custom subprotocols.
Thanks to these advantages, WebSocket is widely used in instant messaging, real-time audio and video, online education, games, and similar fields. Front-end developers who want to use what WebSocket offers must first master the WebSocket API, so let's look at it next.
2. The WebSocket API
Before introducing the WebSocket API, let’s take a look at its compatibility:
(Photo credit: https://caniuse.com/#search=WebSocket)
As the figure above shows, all major web browsers now support WebSocket, so it can be used with confidence in most projects.
To use WebSocket capabilities in a browser, we must first create a WebSocket object that provides an API for creating and managing WebSocket connections and for sending and receiving data over that connection.
Using the WebSocket constructor, we can easily create a WebSocket object. Next, we introduce the WebSocket API from four angles: the constructor, the properties and methods of the WebSocket object, and WebSocket-related events. We start with the constructor:
2.1 Constructor
The syntax for the WebSocket constructor is:
const myWebSocket = new WebSocket(url [, protocols]);
Related parameters are described as follows:
- url: The URL to connect to, that is, the URL on which the WebSocket server responds.
- protocols (optional): A single protocol string or an array of protocol strings. These strings indicate subprotocols, so that a single server can implement multiple WebSocket subprotocols (for example, you might want one server to handle different types of interactions depending on the specified protocol). If no protocol string is specified, an empty string is assumed.
A SECURITY_ERR exception is thrown if the port being connected to is blocked.
2.2 Properties
The WebSocket object contains the following properties:
The meaning of each property is as follows:
- binaryType: The type of binary data delivered on the connection ("blob" or "arraybuffer").
- bufferedAmount (read-only): The number of bytes queued but not yet sent to the server.
- extensions (read-only): The extensions selected by the server.
- onclose: The callback invoked after the connection is closed.
- onerror: The callback invoked when the connection fails.
- onmessage: The callback invoked when a message is received from the server.
- onopen: The callback invoked when the connection is established.
- protocol (read-only): The name of the subprotocol selected by the server.
- readyState (read-only): The current state of the WebSocket connection. There are four states:
  - CONNECTING (0): the connection is being established;
  - OPEN (1): the connection is established and communication is possible;
  - CLOSING (2): the connection is closing;
  - CLOSED (3): the connection is closed or could not be opened.
- url (read-only): The absolute URL that was resolved when the constructor created the WebSocket instance.
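As a small illustration of the readyState values listed above, the following sketch maps the numeric state to its name. `describeReadyState` and `canSend` are hypothetical helper names; the plain objects stand in for real WebSocket instances so that the snippet runs outside a browser.

```javascript
// Names of the four readyState values, indexed by their numeric value.
const READY_STATE_NAMES = ["CONNECTING", "OPEN", "CLOSING", "CLOSED"];

// Works with any object that exposes a numeric readyState property.
function describeReadyState(socket) {
  return READY_STATE_NAMES[socket.readyState] ?? "UNKNOWN";
}

// Data can only be sent while the connection is OPEN (value 1).
function canSend(socket) {
  return socket.readyState === 1;
}

console.log(describeReadyState({ readyState: 0 })); // CONNECTING
console.log(describeReadyState({ readyState: 3 })); // CLOSED
console.log(canSend({ readyState: 1 }));            // true
```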
2.3 Methods
- close([code[, reason]]): Closes the WebSocket connection. If the connection is already closed, this method does nothing.
- send(data): Queues the data to be sent to the server over the WebSocket connection and increases bufferedAmount by the size of the data. If the data cannot be sent (for example, because it has to be buffered and the buffer is full), the socket closes itself.
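Because send() only queues data, a caller that produces data quickly may want to look at bufferedAmount before sending more. The sketch below shows one possible guard; `trySend`, `closeNormally`, and the 1 MiB threshold are assumptions made for illustration, not part of the WebSocket API.

```javascript
// Hand `data` to send() only if the socket is open and its outgoing
// buffer holds at most `maxBuffered` bytes. Returns true if data was queued.
function trySend(socket, data, maxBuffered = 1024 * 1024) {
  const OPEN = 1; // same value as WebSocket.OPEN
  if (socket.readyState !== OPEN) return false;
  if (socket.bufferedAmount > maxBuffered) return false;
  socket.send(data);
  return true;
}

// Close with status code 1000 ("normal closure") and a short reason.
function closeNormally(socket) {
  socket.close(1000, "done");
}
```

In a browser, `trySend(socket, "hello")` would be called with a real WebSocket instance; a `false` result tells the caller to retry later instead of growing the buffer further.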
2.4 Events
Use addEventListener(), or assign an event handler to the on<eventname> property of the WebSocket object, to listen for the following events:
- close: Fired when a WebSocket connection is closed. It can also be handled through the onclose property.
- error: Fired when a WebSocket connection is closed due to an error. It can also be handled through the onerror property.
- message: Fired when data is received over the WebSocket. It can also be handled through the onmessage property.
- open: Fired when a WebSocket connection is established. It can also be handled through the onopen property.
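The four events can be wired up in one place. The following is a minimal sketch: `attachLogging` and its `log` parameter are hypothetical names, and the function accepts any EventTarget, so it can be tried without a live connection.

```javascript
// Register a listener for each of the four WebSocket events.
function attachLogging(socket, log = console.log) {
  socket.addEventListener("open", () => log("open"));
  socket.addEventListener("message", (event) => log("message", event.data));
  socket.addEventListener("error", () => log("error"));
  socket.addEventListener("close", (event) => log("close", event.code, event.reason));
}
```

With a real connection this would be used as `attachLogging(new WebSocket(url))`. The close event carries the status code and reason passed to close(), which is why they are logged here.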
After introducing the WebSocket API, let’s take an example of sending plain text using WebSocket.
2.5 Sending Plain Text
In the example above, we created two textareas on the page: one for the data to be sent and one for the data returned by the server. When the user has typed the text and clicks the send button, the text is sent to the server, which receives the message and sends it back to the client unchanged.
// const socket = new WebSocket("ws://echo.websocket.org");
// const sendMsgContainer = document.querySelector("#sendMessage");
function send() {
  const message = sendMsgContainer.value;
  if (socket.readyState !== WebSocket.OPEN) {
    console.log("Connection not established, cannot send message yet");
    return;
  }
  if (message) socket.send(message);
}
After the client receives the message returned by the server, it writes the text into the textarea for received data.
// const socket = new WebSocket("ws://echo.websocket.org");
// const receivedMsgContainer = document.querySelector("#receivedMessage");
socket.addEventListener("message", function (event) {
  console.log("Message from server ", event.data);
  receivedMsgContainer.value = event.data;
});
For a more intuitive understanding of the above data interaction process, let’s use Chrome developer tools to take a look at the corresponding process:
The complete code for the above example is as follows:
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Example of WebSocket sending plain text</title>
    <style>
      .block {
        flex: 1;
      }
    </style>
  </head>
  <body>
    <h3>Po: WebSocket sends plain text example</h3>
    <div style="display: flex;">
      <div class="block">
        <p>Data to be sent:<button onclick="send()">send</button></p>
        <textarea id="sendMessage" rows="5" cols="15"></textarea>
      </div>
      <div class="block">
        <p>Received data:</p>
        <textarea id="receivedMessage" rows="5" cols="15"></textarea>
      </div>
    </div>
    <script>
      const sendMsgContainer = document.querySelector("#sendMessage");
      const receivedMsgContainer = document.querySelector("#receivedMessage");
      const socket = new WebSocket("ws://echo.websocket.org");

      // Listen for the connection success event
      socket.addEventListener("open", function (event) {
        console.log("Connection successful. You can start communication.");
      });

      // Listen for messages
      socket.addEventListener("message", function (event) {
        console.log("Message from server ", event.data);
        receivedMsgContainer.value = event.data;
      });

      function send() {
        const message = sendMsgContainer.value;
        if (socket.readyState !== WebSocket.OPEN) {
          console.log("Connection not established, cannot send message yet");
          return;
        }
        if (message) socket.send(message);
      }
    </script>
  </body>
</html>
In addition to sending plain text, WebSocket also supports sending binary data, such as ArrayBuffer, Blob, or ArrayBufferView:
const socket = new WebSocket("ws://echo.websocket.org");
socket.onopen = function () {
  // Send a UTF-8 encoded text message
  socket.send("Hello Echo Server!");
  // Send UTF-8 encoded JSON data
  socket.send(JSON.stringify({ msg: "I am Po Brother." }));

  // Send a binary ArrayBuffer
  const buffer = new ArrayBuffer(128);
  socket.send(buffer);

  // Send a binary ArrayBufferView
  const intview = new Uint32Array(buffer);
  socket.send(intview);

  // Send a binary Blob
  const blob = new Blob([buffer]);
  socket.send(blob);
};
After the above code runs successfully, we can see the corresponding data interaction process through Chrome Developer Tools:
Here’s how to send binary data using Blob objects as an example.
Blob (Binary Large Object) represents a large object of binary type. In database management systems, a blob is a collection of binary data stored as a single entity. Blobs are usually video, sound, or other multimedia files. In JavaScript, a Blob object represents immutable, file-like raw data.
If you are interested in Blob, you can read "Blob You Don't Know".
2.6 Sending Binary Data
In the example above, we created two textareas on the page: one for the data to be sent and one for the data returned by the server. When the user has typed the text and clicks the send button, we take the input text, wrap it in a Blob object, and send it to the server, which receives the message and sends it back to the client unchanged.
When the browser receives a new message, it automatically converts it to a DOMString if it is text data. If it is binary data or a Blob object, the browser passes it directly to the application, which handles it according to the type of the returned data.
Sending data:
// const socket = new WebSocket("ws://echo.websocket.org");
// const sendMsgContainer = document.querySelector("#sendMessage");
function send() {
  const message = sendMsgContainer.value;
  if (socket.readyState !== WebSocket.OPEN) {
    console.log("Connection not established, cannot send message yet");
    return;
  }
  const blob = new Blob([message], { type: "text/plain" });
  if (message) socket.send(blob);
  console.log(`Number of bytes not sent to the server: ${socket.bufferedAmount}`);
}
If the received data is a Blob, the client calls the Blob object's text() method to read its content as UTF-8 text, and then writes the text into the textarea for received data.
Receiving data:
// const socket = new WebSocket("ws://echo.websocket.org");
// const receivedMsgContainer = document.querySelector("#receivedMessage");
socket.addEventListener("message", async function (event) {
  console.log("Message from server ", event.data);
  const receivedData = event.data;
  if (receivedData instanceof Blob) {
    receivedMsgContainer.value = await receivedData.text();
  } else {
    receivedMsgContainer.value = receivedData;
  }
});
Again, let’s take a look at the process using Chrome’s developer tools:
It is clear from the above figure that when sending a Blob object, the Data field displays the Binary Message, whereas when sending plain text, the Data field displays the text Message sent directly.
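As an alternative to reading a Blob asynchronously, the binaryType property can be set to "arraybuffer" so that binary messages arrive as ArrayBuffer and can be decoded synchronously. The following is a sketch of that variant; `payloadToText` is a hypothetical helper name, and the commented usage lines assume the `socket` and `receivedMsgContainer` variables from the example.

```javascript
// Convert a received WebSocket payload into a string.
// Text frames arrive as strings; with binaryType = "arraybuffer",
// binary frames arrive as ArrayBuffer instead of Blob.
function payloadToText(data) {
  if (typeof data === "string") return data;
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    return new TextDecoder("utf-8").decode(data);
  }
  throw new TypeError("Unsupported payload type");
}

// Usage in the browser (assumed variables from the example above):
// socket.binaryType = "arraybuffer";
// socket.addEventListener("message", (event) => {
//   receivedMsgContainer.value = payloadToText(event.data);
// });
```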
The complete code for the above example is as follows:
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Example of WebSocket sending binary data</title>
    <style>
      .block {
        flex: 1;
      }
    </style>
  </head>
  <body>
    <h3>Po: WebSocket sends binary data example</h3>
    <div style="display: flex;">
      <div class="block">
        <p>Data to be sent:<button onclick="send()">send</button></p>
        <textarea id="sendMessage" rows="5" cols="15"></textarea>
      </div>
      <div class="block">
        <p>Received data:</p>
        <textarea id="receivedMessage" rows="5" cols="15"></textarea>
      </div>
    </div>
    <script>
      const sendMsgContainer = document.querySelector("#sendMessage");
      const receivedMsgContainer = document.querySelector("#receivedMessage");
      const socket = new WebSocket("ws://echo.websocket.org");

      // Listen for the connection success event
      socket.addEventListener("open", function (event) {
        console.log("Connection successful. You can start communication.");
      });

      // Listen for messages
      socket.addEventListener("message", async function (event) {
        console.log("Message from server ", event.data);
        const receivedData = event.data;
        if (receivedData instanceof Blob) {
          receivedMsgContainer.value = await receivedData.text();
        } else {
          receivedMsgContainer.value = receivedData;
        }
      });

      function send() {
        const message = sendMsgContainer.value;
        if (socket.readyState !== WebSocket.OPEN) {
          console.log("Connection not established, cannot send message yet");
          return;
        }
        const blob = new Blob([message], { type: "text/plain" });
        if (message) socket.send(blob);
        console.log(`Number of bytes not sent to the server: ${socket.bufferedAmount}`);
      }
    </script>
  </body>
</html>
Some readers may want to go further after learning the WebSocket API. Next, Brother Po shows how to implement a WebSocket server that supports sending plain text.
3. Writing a WebSocket Server by Hand
Before we introduce how to write a WebSocket server, we need to understand the life cycle of WebSocket connections.
As the figure shows, the client and server must complete a handshake before full-duplex communication is possible; two-way data transfer begins only after the handshake succeeds.
A handshake takes place after the communication channel is established and before information transfer begins. It is used to agree on parameters such as information transfer rate, alphabet, parity, interrupt procedure, and other protocol features. Handshaking lets heterogeneous systems or devices connect over a communication channel without manual parameter setup.
Since the handshake is the first part of the WebSocket connection life cycle, let’s first examine the WebSocket handshake protocol.
3.1 Handshake Protocol
WebSocket is an application-layer protocol that relies on TCP at the transport layer. WebSocket uses the 101 status code of the HTTP/1.1 protocol for handshake. To create a WebSocket connection, a request is made through the browser, and the server responds, a process often referred to as “Handshaking.”
Completing the handshake over HTTP has several benefits. First, it makes WebSocket compatible with existing HTTP infrastructure: WebSocket servers can run on ports 80 and 443, which are often the only ports open to clients. Second, it lets us reuse and extend the HTTP Upgrade flow, adding custom WebSocket headers for negotiation.
Let’s take a closer look at the handshake process using the example of sending plain text shown earlier.
3.1.1 Client Requests
Host: echo.websocket.org
Origin: file://
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: Zx8rNEkBE4xnwifpuh8DHQ==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Note: some HTTP request headers are omitted.
Field descriptions:
- Connection must be set to Upgrade, indicating that the client wants to upgrade the connection.
- Upgrade must be set to websocket, indicating that the client wants to upgrade to the WebSocket protocol.
- Sec-WebSocket-Version indicates the WebSocket protocol version. RFC 6455 requires version 13; versions from earlier drafts should no longer be used.
- Sec-WebSocket-Key is a random string from which the server constructs a SHA-1 digest. The server appends the special string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to the Sec-WebSocket-Key value, computes the SHA-1 digest, Base64-encodes it, and returns the result to the client as the value of the Sec-WebSocket-Accept header. This prevents ordinary HTTP requests from being mistaken for WebSocket handshakes.
- Sec-WebSocket-Extensions is used to negotiate the WebSocket extensions for this connection: the client sends the extensions it supports, and the server confirms one or more of them by returning the same header.
- Origin is optional and usually indicates the page that initiated the WebSocket connection in the browser, similar to Referer. Unlike Referer, however, Origin contains only the scheme, host name, and port, not the path.
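For a non-browser client, the headers above have to be produced by hand. The sketch below uses Node.js and assumes, per RFC 6455, that Sec-WebSocket-Key is a Base64-encoded 16-byte random nonce; `generateWebSocketKey` and `buildHandshakeHeaders` are hypothetical helper names.

```javascript
const crypto = require("crypto");

// RFC 6455: the key is a randomly selected 16-byte value, Base64-encoded.
function generateWebSocketKey() {
  return crypto.randomBytes(16).toString("base64");
}

// Minimal set of request headers for the opening handshake.
function buildHandshakeHeaders(host) {
  return {
    Host: host,
    Connection: "Upgrade",
    Upgrade: "websocket",
    "Sec-WebSocket-Version": "13",
    "Sec-WebSocket-Key": generateWebSocketKey(),
  };
}

console.log(buildHandshakeHeaders("echo.websocket.org"));
```

A fresh key is generated for every connection, so the Sec-WebSocket-Key value differs on each run.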
3.1.2 Server Response
HTTP/1.1 101 WebSocket Protocol Handshake ①
Connection: Upgrade ②
Upgrade: websocket ③
Sec-WebSocket-Accept: 52rg3vw4jq1ywpkvflstsiezlqw= ④
Note: some HTTP response headers are omitted.
- ① Status code 101 confirms the upgrade to the WebSocket protocol.
- ② The Connection header is set to "Upgrade" to indicate that this is an upgrade. HTTP provides a special mechanism that allows an established connection to be upgraded to a new, incompatible protocol.
- ③ The Upgrade header names one or more protocols, sorted by priority and separated by commas. Here it indicates an upgrade to the WebSocket protocol.
- ④ The signed value derived from the client's key, which confirms that the server supports the protocol.
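On the client side, the response can be checked against the key that was sent. The following Node.js sketch assumes header names have already been lower-cased (as Node.js does for incoming headers); `expectedAccept` and `isValidHandshakeResponse` are hypothetical names, and the Connection check is deliberately loose. The sample key and accept value are the ones given as an example in RFC 6455.

```javascript
const crypto = require("crypto");

const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Accept value the server is expected to return for a given key.
function expectedAccept(secWebSocketKey) {
  return crypto.createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// Check the parts of the handshake response that the client validates.
function isValidHandshakeResponse(statusCode, headers, sentKey) {
  return (
    statusCode === 101 &&
    String(headers["upgrade"]).toLowerCase() === "websocket" &&
    String(headers["connection"]).toLowerCase().includes("upgrade") &&
    headers["sec-websocket-accept"] === expectedAccept(sentKey)
  );
}

// Sample values from RFC 6455.
console.log(expectedAccept("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```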
After introducing the WebSocket handshake protocol, Brother Po will use Node.js to develop our WebSocket server.
3.2 Implementing the Handshake
To develop a WebSocket server, we first need to implement the handshake. Here, Brother Po uses the built-in http module of Node.js to create an HTTP server, as shown in the following code:
const http = require("http");
const port = 8888;
const { generateAcceptValue } = require("./util");

const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
  res.end("Hello, I'm Po Brother. Thank you for reading \"WebSocket You Didn't Know\".");
});

server.on("upgrade", function (req, socket) {
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
    return;
  }
  // Read the Sec-WebSocket-Key provided by the client
  const secWsKey = req.headers["sec-websocket-key"];
  // Generate the Sec-WebSocket-Accept value using the SHA-1 algorithm
  const hash = generateAcceptValue(secWsKey);
  // Set the HTTP response headers
  const responseHeaders = [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${hash}`,
  ];
  // Return the response to the handshake request
  socket.write(responseHeaders.join("\r\n") + "\r\n\r\n");
});

server.listen(port, () =>
  console.log(`Server running at http://localhost:${port}`)
);
In the code above, we first import the http module and create an HTTP server with its createServer() method. We then listen for the upgrade event, which is emitted each time a client requests an HTTP upgrade. Since our server only supports upgrading to WebSocket, it returns "400 Bad Request" if the client asks to upgrade to any other protocol.
When the server receives a WebSocket handshake request, it reads the Sec-WebSocket-Key value from the request headers and appends the special string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to it. It then computes the SHA-1 digest, Base64-encodes it, and returns the result to the client as the value of the Sec-WebSocket-Accept header.
The above process may seem a bit tedious, but with the built-in crypto module of Node.js, it can be done in a few lines of code:
// util.js
const crypto = require("crypto");

const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

// Export the helper so that the server file can require it
module.exports = { generateAcceptValue };
With the handshake implemented, we can test it using the earlier example. Once the server is running, we only need to change the URL in the "send plain text" example to ws://localhost:8888.
Interested readers can try it themselves. Here is the result of a local run:
As the figure above shows, our handshake works. Can a handshake fail? Yes: for example, because of network problems, a server error, or an incorrect Sec-WebSocket-Accept value.
Let's change the rule for generating Sec-WebSocket-Accept, for example by modifying the MAGIC_KEY value, and test the handshake again. The browser console then prints the following error:
WebSocket connection to 'ws://localhost:8888/' failed: Error during WebSocket handshake: Incorrect 'Sec-WebSocket-Accept' header value
If your WebSocket server supports subprotocols, you can refer to the following code to handle subprotocols, without further elaboration.
// Read the subprotocols from the request header
const protocol = req.headers["sec-websocket-protocol"];
// If subprotocols are present, parse them
const protocols = !protocol ? [] : protocol.split(",").map((s) => s.trim());

// For simplicity, we only check whether the JSON subprotocol is requested
if (protocols.includes("json")) {
  responseHeaders.push(`Sec-WebSocket-Protocol: json`);
}
That covers most of the WebSocket handshake protocol. Next, let's go over the basics needed to implement message communication.
3.3 Fundamentals of Message Communication
In the WebSocket protocol, data is transmitted as a sequence of data frames. To avoid confusing network intermediaries (such as intercepting proxies) and for security reasons, the client must mask every frame it sends to the server. When a server receives an unmasked frame, it must immediately close the connection.
3.3.1 Data frame format
To implement message communication, we need to understand the format of WebSocket data frames:
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+
Some of you may be a little confused after reading the above. Let’s further analyze it in combination with actual data frames:
In the figure above, Brother Po briefly analyzes the data frame format for the "send plain text" example. Here we take a closer look at the Payload length field.
Payload length indicates the length of the payload data in bytes. There are three cases:
- A value of 0-125 is the length of the payload data itself.
- A value of 126 means the next two bytes are interpreted as a 16-bit unsigned integer giving the length of the payload data.
- A value of 127 means the next eight bytes are interpreted as a 64-bit unsigned integer (the most significant bit must be 0) giving the length of the payload data.
Multi-byte length values are expressed in network byte order. The payload length is the length of the "extension data" plus the length of the "application data". The extension data may be 0 bytes long, in which case the payload length equals the length of the application data.
Unless an extension has been negotiated, the extension data is 0 bytes long. Any extension must specify the length of its extension data, or how that length is calculated, and how the extension is negotiated during the opening handshake. If present, the extension data is included in the total payload length.
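The three payload length cases translate directly into code. The following Node.js sketch reads the length from a frame held in a Buffer; `readPayloadLength` is a hypothetical helper name, and lengths beyond Number.MAX_SAFE_INTEGER are rejected as a simplification.

```javascript
// Read the payload length of a WebSocket frame held in a Node.js Buffer.
// Returns the length and the offset of the first byte after the length field.
function readPayloadLength(frame) {
  const indicator = frame.readUInt8(1) & 0x7f; // low 7 bits of the second byte
  if (indicator <= 125) {
    return { length: indicator, offset: 2 };
  }
  if (indicator === 126) {
    // Next 2 bytes: 16-bit unsigned integer in network byte order
    return { length: frame.readUInt16BE(2), offset: 4 };
  }
  // indicator === 127, next 8 bytes: 64-bit unsigned integer in network byte order
  const length = frame.readBigUInt64BE(2);
  if (length > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError("Payload too large for this sketch");
  }
  return { length: Number(length), offset: 10 };
}

console.log(readPayloadLength(Buffer.from([0x81, 0x85])));             // { length: 5, offset: 2 }
console.log(readPayloadLength(Buffer.from([0x81, 0xfe, 0x01, 0x00]))); // { length: 256, offset: 4 }
```

For a masked frame, the 4-byte masking key starts at the returned offset and the payload follows it.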
3.3.2 Mask algorithm
The masking key is a 32-bit value chosen at random by the client. It must be unpredictable, so it has to be derived from a strong source of entropy, and the key for a given frame must not make it simple for a server or proxy to predict the key for a subsequent frame. This unpredictability is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire.
Masking does not affect the length of the payload data, and masking and unmasking apply the same steps. The following algorithm is used for both:
j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
- original-octet-i: the i-th byte of the original data.
- transformed-octet-i: the i-th byte of the transformed data.
- masking-key-octet-j: the j-th byte of the masking key.
To make the calculation easier to follow, let's mask the string "我是阿宝哥" ("I am Po Brother") from the example. Its UTF-8 encoding is as follows:
E6 88 91 E6 98 AF E9 98 BF E5 AE 9D E5 93 A5
The corresponding Masking Key is 0x08F6EFB1. According to the above algorithm, we can perform the mask operation as follows:
let uint8 = new Uint8Array([0xE6, 0x88, 0x91, 0xE6, 0x98, 0xAF, 0xE9, 0x98,
  0xBF, 0xE5, 0xAE, 0x9D, 0xE5, 0x93, 0xA5]);
let maskingKey = new Uint8Array([0x08, 0xf6, 0xef, 0xb1]);
let maskedUint8 = new Uint8Array(uint8.length);

for (let i = 0, j = 0; i < uint8.length; i++, j = i % 4) {
  maskedUint8[i] = uint8[i] ^ maskingKey[j];
}

console.log(Array.from(maskedUint8).map(num => Number(num).toString(16)).join(' '));
After the above code runs successfully, the console prints the following:
ee 7e 7e 57 90 59 6 29 b7 13 41 2c ed 65 4a
The above results are consistent with the values corresponding to Masked payload in WireShark, as shown in the following figure:
In WebSocket protocol, the data mask is used to enhance the security of the protocol. However, the data mask is not to protect the data itself, because the algorithm itself is public and the operation is not complicated. So why introduce a data mask? Data masks were introduced to prevent problems such as proxy cache contamination attacks that existed in earlier versions of the protocol.
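Because XOR is its own inverse, the same routine both masks and unmasks, which is what a server relies on when it decodes client frames. The sketch below demonstrates the round trip with the payload bytes and masking key from the example; `applyMask` is a hypothetical helper name.

```javascript
// XOR every payload byte with the masking-key byte at position i mod 4.
// Applying the function twice with the same key restores the original bytes.
function applyMask(payload, maskingKey) {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

const key = Uint8Array.of(0x08, 0xf6, 0xef, 0xb1);
const original = Uint8Array.of(
  0xe6, 0x88, 0x91, 0xe6, 0x98, 0xaf, 0xe9, 0x98,
  0xbf, 0xe5, 0xae, 0x9d, 0xe5, 0x93, 0xa5
);

const masked = applyMask(original, key);   // what the client puts on the wire
const unmasked = applyMask(masked, key);   // what the server recovers

console.log(unmasked.every((byte, i) => byte === original[i])); // true
```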
Now that we understand the WebSocket masking algorithm and the purpose of masking, let's introduce data fragmentation.
3.3.3 Data Fragmentation
Each WebSocket message may be split into multiple data frames. When the receiver gets a data frame, it uses the FIN value to determine whether this is the last frame of the message.
With FIN and opcode, we can send a message across several frames. The opcode says what a frame carries: 0x1 means the payload is text, 0x2 means binary data, and 0x0 marks a continuation frame, meaning the server should append the frame's payload to the last frame it received from the client.
To help you understand the above, let's take a look at an example from MDN:
Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
In the example above, the client sends two messages to the server. The first message is sent in a single frame, while the second message is sent across three frames.
The first message is complete in one frame (FIN=1 and opcode != 0x0), so the server can process or respond to it as needed. The second message is a text message (opcode=0x1) with FIN=0, indicating that the message is not yet complete and more frames follow. The remaining parts of the message are sent in continuation frames (opcode=0x0), and the final frame is marked with FIN=1.
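The reassembly rules can be sketched as a small state machine. This is an illustration under simplifying assumptions: frames are already parsed and unmasked into `{ fin, opcode, payload }` objects (a hypothetical shape), only data frames are handled, and control frames interleaved between fragments are ignored. `createAssembler` is a hypothetical name.

```javascript
// Reassemble messages from a sequence of already-unmasked data frames.
// Each frame is { fin: boolean, opcode: number, payload: Buffer }.
function createAssembler(onMessage) {
  let parts = [];      // payloads of the message in progress
  let firstOpcode = 0; // opcode of the first frame (0x1 text, 0x2 binary)

  return function pushFrame({ fin, opcode, payload }) {
    if (opcode === 0x0) {
      if (parts.length === 0) throw new Error("continuation frame without a start frame");
    } else {
      if (parts.length !== 0) throw new Error("new message started before the previous one finished");
      firstOpcode = opcode;
    }
    parts.push(payload);
    if (fin) {
      const data = Buffer.concat(parts);
      parts = [];
      // Text messages are delivered as strings, binary messages as Buffers.
      onMessage(firstOpcode === 0x1 ? data.toString("utf8") : data);
    }
  };
}

const pushFrame = createAssembler((message) => console.log("message:", message));
pushFrame({ fin: true, opcode: 0x1, payload: Buffer.from("hello") });
pushFrame({ fin: false, opcode: 0x1, payload: Buffer.from("and a ") });
pushFrame({ fin: false, opcode: 0x0, payload: Buffer.from("happy new ") });
pushFrame({ fin: true, opcode: 0x0, payload: Buffer.from("year!") });
```

The text is decoded only once the final frame arrives, because a multi-byte UTF-8 character may be split across two fragments.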
That concludes the brief introduction to data fragmentation. Next, let's implement message communication.
3.4 Implementing Message Communication
Brother Po splits message communication into two sub-functions, message parsing and message response; the following sections describe how to implement each.
3.4.1 Message Parsing
Using the knowledge from the message communication fundamentals above, Brother Po implemented a parseMessage function that parses the WebSocket data frames sent by the client. For simplicity, only text frames are processed:
function parseMessage(buffer) {
  // The first byte holds the FIN bit, three reserved bits, and the 4-bit opcode:
  // [FIN, RSV, RSV, RSV, OPCODE, OPCODE, OPCODE, OPCODE]
  const firstByte = buffer.readUInt8(0);
  // Shift right by 7 bits to get the FIN bit, which marks the last frame of a message
  const isFinalFrame = Boolean((firstByte >>> 7) & 0x01);
  console.log("isFIN: ", isFinalFrame);
  /**
   * Extract the opcode:
   * %x0: continuation frame, i.e. one fragment of a fragmented message;
   * %x1: text frame;
   * %x2: binary frame;
   * %x3-7: reserved for non-control frames defined later;
   * %x8: connection close;
   * %x9: heartbeat request (ping);
   * %xA: heartbeat response (pong);
   * %xB-F: reserved for control frames defined later.
   */
  const opcode = firstByte & 0x0f;
  if (opcode === 0x08) {
    // Close frame: return null so that the caller can detect the close
    return null;
  }
  if (opcode === 0x02) {
    // Binary frame: not handled
    return;
  }
  if (opcode === 0x01) {
    // Only text frames are processed for now
    let offset = 1;
    const secondByte = buffer.readUInt8(offset);
    // MASK: 1 bit, indicating whether a mask is used. Frames sent to the server
    // must be masked, while frames returned by the server are not masked
    const useMask = Boolean((secondByte >>> 7) & 0x01);
    console.log("use MASK: ", useMask);
    const payloadLen = secondByte & 0x7f; // The lower 7 bits represent the payload length
    offset += 1;
    // Four-byte masking key (this code assumes the frame is masked)
    let MASK = [];
    if (payloadLen <= 0x7d) {
      // 0 to 125: the value is the payload length, and the next four bytes are the masking key
      MASK = buffer.slice(offset, 4 + offset);
      offset += 4;
      console.log("payload length: ", payloadLen);
    } else if (payloadLen === 0x7e) {
      // 126: the next two bytes are a 16-bit unsigned integer holding the payload
      // length, followed by the four-byte masking key
      console.log("payload length: ", buffer.readUInt16BE(offset));
      MASK = buffer.slice(offset + 2, offset + 2 + 4);
      offset += 6;
    } else {
      // 127: the next eight bytes are a 64-bit unsigned integer holding the payload
      // length, followed by the four-byte masking key
      MASK = buffer.slice(offset + 8, offset + 8 + 4);
      offset += 12;
    }
    // Read the payload and XOR it with the masking key to recover the original bytes
    const newBuffer = [];
    const dataBuffer = buffer.slice(offset);
    for (let i = 0, j = 0; i < dataBuffer.length; i++, j = i % 4) {
      const nextBuf = dataBuffer[i];
      newBuffer.push(nextBuf ^ MASK[j]);
    }
    return Buffer.from(newBuffer).toString();
  }
  return "";
}
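To see the masking step in isolation, here is a small sketch that builds a masked client frame by hand and then unmasks it with the same XOR loop. Both buildMaskedTextFrame and unmask are hypothetical helpers introduced only for this illustration, and the sketch is limited to payloads of at most 125 bytes. The masking key and the "Hello" payload are taken from the masked text frame example in RFC 6455.

```javascript
// Hypothetical helper: builds a masked client-to-server text frame.
// Assumption: the payload is at most 125 bytes, so no extended length is needed.
function buildMaskedTextFrame(text, maskKey) {
  const payload = Buffer.from(text, "utf8");
  if (payload.length > 125) {
    throw new RangeError("This sketch only supports payloads up to 125 bytes");
  }
  const frame = Buffer.alloc(2 + 4 + payload.length);
  frame.writeUInt8(0b10000001, 0); // FIN=1, opcode=0x1 (text)
  frame.writeUInt8(0b10000000 | payload.length, 1); // MASK=1 plus the 7-bit length
  maskKey.copy(frame, 2); // four-byte masking key
  for (let i = 0; i < payload.length; i++) {
    frame[6 + i] = payload[i] ^ maskKey[i % 4]; // mask each byte
  }
  return frame;
}

// Hypothetical helper: XOR with the same key restores the original bytes
function unmask(maskedPayload, maskKey) {
  const out = Buffer.alloc(maskedPayload.length);
  for (let i = 0; i < maskedPayload.length; i++) {
    out[i] = maskedPayload[i] ^ maskKey[i % 4];
  }
  return out;
}

const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const frame = buildMaskedTextFrame("Hello", key);
console.log(frame.toString("hex")); // 818537fa213d7f9f4d5158
console.log(unmask(frame.subarray(6), key).toString("utf8")); // Hello
```

Because XOR is its own inverse, masking and unmasking are the same operation. A frame built this way can also be passed to parseMessage as a quick check without opening a browser.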
After creating the parseMessage function, let’s update the WebSocket server we created earlier:
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client:" + message);
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request");
    return;
  }
  // omit existing code
});
After the update is complete, we restart the server and continue testing the message parsing capability with the “Send plain text” example. The following is the output of the WebSocket server after sending the text message “我是阿宝哥” (“I am Po Ge”), which occupies 15 bytes in UTF-8:
Server running at http://localhost:8888
isFIN: true
use MASK: true
payload length: 15
Message from client: 我是阿宝哥
From the output above, we can see that our WebSocket server successfully parses the data frame containing plain text sent by the client. Next, let’s implement the message response function.
3.4.2 Message Response
To return data to the client, our WebSocket server must also wrap the data in the WebSocket data frame format. As with the parseMessage function, Po Ge wrapped this logic in a constructReply function, which encapsulates the data to be returned. The code is as follows:
function constructReply(data) {
  const json = JSON.stringify(data);
  const jsonByteLength = Buffer.byteLength(json);
  // Currently, only payloads of up to 65535 bytes are supported
  const lengthByteCount = jsonByteLength < 126 ? 0 : 2;
  const payloadLength = lengthByteCount === 0 ? jsonByteLength : 126;
  const buffer = Buffer.alloc(2 + lengthByteCount + jsonByteLength);
  // First byte: FIN=1 and opcode=0x1, i.e. a single-frame text message
  buffer.writeUInt8(0b10000001, 0);
  // Second byte: MASK=0 plus the 7-bit payload length
  buffer.writeUInt8(payloadLength, 1);
  // If payloadLength is 126, the next two bytes hold the actual payload size
  // as a 16-bit unsigned integer
  let payloadOffset = 2;
  if (lengthByteCount > 0) {
    buffer.writeUInt16BE(jsonByteLength, 2);
    payloadOffset += lengthByteCount;
  }
  // Write the JSON data into the buffer
  buffer.write(json, payloadOffset);
  return buffer;
}
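The constructReply function above covers the 7-bit and 16-bit length encodings. As a possible extension, the sketch below shows how all three length encodings of the frame header could be produced, including the 64-bit form used for payloads larger than 65535 bytes. Both buildFrameHeader and buildTextFrame are hypothetical helpers that are not part of the article's server, and they only build unmasked frames with FIN=1, as a server would send them.

```javascript
// Hypothetical helper: builds the header of an unmasked, unfragmented frame
function buildFrameHeader(opcode, payloadByteLength) {
  let header;
  if (payloadByteLength < 126) {
    // 0 to 125: the length fits into the 7-bit field
    header = Buffer.alloc(2);
    header.writeUInt8(payloadByteLength, 1);
  } else if (payloadByteLength <= 0xffff) {
    // 126 marks a 16-bit unsigned length in the next two bytes
    header = Buffer.alloc(4);
    header.writeUInt8(126, 1);
    header.writeUInt16BE(payloadByteLength, 2);
  } else {
    // 127 marks a 64-bit unsigned length in the next eight bytes
    header = Buffer.alloc(10);
    header.writeUInt8(127, 1);
    header.writeBigUInt64BE(BigInt(payloadByteLength), 2);
  }
  header.writeUInt8(0b10000000 | opcode, 0); // FIN=1 plus the opcode
  return header;
}

// Hypothetical helper: wraps a string into a complete text frame
function buildTextFrame(text) {
  const payload = Buffer.from(text, "utf8");
  return Buffer.concat([buildFrameHeader(0x1, payload.length), payload]);
}

console.log(buildFrameHeader(0x1, 5).toString("hex")); // 8105
console.log(buildFrameHeader(0x1, 300).toString("hex")); // 817e012c
console.log(buildFrameHeader(0x1, 70000).toString("hex")); // 817f0000000000011170
console.log(buildTextFrame("Hello").toString("hex")); // 810548656c6c6f
```

Separating the header from the payload keeps the length logic in one place, so the same helper could also be reused for binary frames (opcode 0x2).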
After creating the constructReply function, let’s update the WebSocket server we created earlier:
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client:" + message);
      // Add the following 👇 code
      socket.write(constructReply({ message }));
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
});
Now that our WebSocket server is developed, let’s fully verify its functionality.
As you can see from the figure, the simple WebSocket server we developed can handle normal text messages. Finally, let’s look at the complete code:
custom-websocket-server.js
const http = require("http");
const port = 8888;
const { generateAcceptValue, parseMessage, constructReply } = require("./util");

const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
  res.end("Hello, I'm Po Ge. Thank you for reading \"WebSocket You Didn't Know\".");
});

server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client:" + message);
      socket.write(constructReply({ message }));
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request");
    return;
  }
  // Read the Sec-WebSocket-Key provided by the client
  const secWsKey = req.headers["sec-websocket-key"];
  // Generate the Sec-WebSocket-Accept value using the SHA-1 algorithm
  const hash = generateAcceptValue(secWsKey);
  // Set the HTTP response headers
  const responseHeaders = [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${hash}`,
  ];
  // Return the response to the handshake request
  socket.write(responseHeaders.join("\r\n") + "\r\n\r\n");
});

server.listen(port, () =>
  console.log(`Server running at http://localhost:${port}`)
);
util.js
const crypto = require("crypto");
const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

function parseMessage(buffer) {
  // The first byte holds the FIN bit, three reserved bits, and the 4-bit opcode:
  // [FIN, RSV, RSV, RSV, OPCODE, OPCODE, OPCODE, OPCODE]
  const firstByte = buffer.readUInt8(0);
  // Shift right by 7 bits to get the FIN bit, which marks the last frame of a message
  const isFinalFrame = Boolean((firstByte >>> 7) & 0x01);
  console.log("isFIN: ", isFinalFrame);
  /**
   * Extract the opcode:
   * %x0: continuation frame, i.e. one fragment of a fragmented message;
   * %x1: text frame;
   * %x2: binary frame;
   * %x3-7: reserved for non-control frames defined later;
   * %x8: connection close;
   * %x9: heartbeat request (ping);
   * %xA: heartbeat response (pong);
   * %xB-F: reserved for control frames defined later.
   */
  const opcode = firstByte & 0x0f;
  if (opcode === 0x08) {
    // Close frame: return null so that the caller can detect the close
    return null;
  }
  if (opcode === 0x02) {
    // Binary frame: not handled
    return;
  }
  if (opcode === 0x01) {
    // Only text frames are processed for now
    let offset = 1;
    const secondByte = buffer.readUInt8(offset);
    // MASK: 1 bit, indicating whether a mask is used. Frames sent to the server
    // must be masked, while frames returned by the server are not masked
    const useMask = Boolean((secondByte >>> 7) & 0x01);
    console.log("use MASK: ", useMask);
    const payloadLen = secondByte & 0x7f; // The lower 7 bits represent the payload length
    offset += 1;
    // Four-byte masking key (this code assumes the frame is masked)
    let MASK = [];
    if (payloadLen <= 0x7d) {
      // 0 to 125: the value is the payload length, and the next four bytes are the masking key
      MASK = buffer.slice(offset, 4 + offset);
      offset += 4;
      console.log("payload length: ", payloadLen);
    } else if (payloadLen === 0x7e) {
      // 126: the next two bytes are a 16-bit unsigned integer holding the payload
      // length, followed by the four-byte masking key
      console.log("payload length: ", buffer.readUInt16BE(offset));
      MASK = buffer.slice(offset + 2, offset + 2 + 4);
      offset += 6;
    } else {
      // 127: the next eight bytes are a 64-bit unsigned integer holding the payload
      // length, followed by the four-byte masking key
      MASK = buffer.slice(offset + 8, offset + 8 + 4);
      offset += 12;
    }
    // Read the payload and XOR it with the masking key to recover the original bytes
    const newBuffer = [];
    const dataBuffer = buffer.slice(offset);
    for (let i = 0, j = 0; i < dataBuffer.length; i++, j = i % 4) {
      const nextBuf = dataBuffer[i];
      newBuffer.push(nextBuf ^ MASK[j]);
    }
    return Buffer.from(newBuffer).toString();
  }
  return "";
}

function constructReply(data) {
  const json = JSON.stringify(data);
  const jsonByteLength = Buffer.byteLength(json);
  // Currently, only payloads of up to 65535 bytes are supported
  const lengthByteCount = jsonByteLength < 126 ? 0 : 2;
  const payloadLength = lengthByteCount === 0 ? jsonByteLength : 126;
  const buffer = Buffer.alloc(2 + lengthByteCount + jsonByteLength);
  // First byte: FIN=1 and opcode=0x1, i.e. a single-frame text message
  buffer.writeUInt8(0b10000001, 0);
  // Second byte: MASK=0 plus the 7-bit payload length
  buffer.writeUInt8(payloadLength, 1);
  // If payloadLength is 126, the next two bytes hold the actual payload size
  // as a 16-bit unsigned integer
  let payloadOffset = 2;
  if (lengthByteCount > 0) {
    buffer.writeUInt16BE(jsonByteLength, 2);
    payloadOffset += lengthByteCount;
  }
  // Write the JSON data into the buffer
  buffer.write(json, payloadOffset);
  return buffer;
}

module.exports = {
  generateAcceptValue,
  parseMessage,
  constructReply,
};
In fact, besides WebSocket, the server can also use SSE (Server-Sent Events) to push information to the browser. SSE lets the server stream text messages to the client, such as live messages generated on the server. To achieve this, SSE introduces two components: the EventSource API in the browser and a new “event stream” data format (text/event-stream). EventSource lets the client receive notifications pushed by the server in the form of DOM events, and the new data format is used to deliver each data update.
In practice, SSE provides an efficient, cross-browser implementation of XHR streaming, with messages delivered over a single long-lived HTTP connection. Unlike a hand-rolled XHR streaming implementation, however, the browser manages the connection and parses the messages for us, so we can focus on the business logic. Space is limited, so Po Ge will not go into more detail about SSE here; interested readers can consult the relevant documentation.
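To give a rough idea of what the event stream format looks like on the wire, here is a minimal sketch of a serializer for a single message. formatSseMessage is a hypothetical helper, not part of any library. In a real server the resulting string would be written to an HTTP response whose Content-Type is text/event-stream, and the browser would read it through an EventSource instance.

```javascript
// Hypothetical helper: serializes one message in the text/event-stream format.
// "event" and "id" are optional; "data" may span several lines.
function formatSseMessage({ event, id, data }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  // Each line of the data gets its own "data:" field
  for (const line of String(data).split("\n")) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the message
  return lines.join("\n") + "\n\n";
}

console.log(JSON.stringify(formatSseMessage({ data: "hello" })));
console.log(
  JSON.stringify(formatSseMessage({ event: "tick", id: 1, data: "first line\nsecond line" }))
);
```

The format is plain text, which is why SSE, unlike WebSocket, needs no framing or masking logic on the server side.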
4. Po Ge has something to say
4.1 What is the relationship between WebSocket and HTTP
WebSocket is a different protocol from HTTP. Both sit at the application layer of the OSI model, and both rely on TCP at the transport layer. Although they are different, RFC 6455 states that WebSocket is designed to work over HTTP ports 80 and 443 and to support HTTP proxies and intermediaries, which makes it compatible with HTTP. To achieve this compatibility, the WebSocket handshake uses the HTTP Upgrade header to switch from the HTTP protocol to the WebSocket protocol.
Since we have mentioned the OSI (Open Systems Interconnection) model, here is a diagram that illustrates it:
(Image source: https://www.networkingsphere.com/2019/07/what-is-osi-model.html)
4.2 What is the difference between WebSocket and long polling
Long polling means that the client initiates a request. After receiving it, the server does not respond immediately. Instead, it holds the request open and checks whether the requested data has been updated. If there is an update, it responds at once; if there is no new data, it waits for a certain period of time before returning.
Long polling is still based on HTTP, and it is still a question-and-answer (request-response) pattern. After a WebSocket handshake succeeds, by contrast, the connection becomes a full-duplex channel over TCP, so the server can actively send data to the client.
4.3 What is WebSocket Heartbeat
A socket is used to receive and send data over the network. If the socket is disconnected, sending and receiving data will inevitably fail. But how do we know whether a socket is still usable? This requires a heartbeat mechanism in the system. The so-called “heartbeat” means periodically sending a custom structure (a heartbeat packet or heartbeat frame) to let the other side know that we are still “online”, which ensures that the link remains valid.
A heartbeat packet is a simple message that the client periodically sends to the server to say “I’m still here”. In code, this means sending a fixed message to the server every few minutes, to which the server replies with a fixed message. If the server does not receive a message from the client within a few minutes, it considers the client disconnected.
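The server-side half of this mechanism, deciding that a client is gone after a period of silence, can be sketched as follows. HeartbeatMonitor is a hypothetical class written only for this illustration, and the timeout value and client ids are made up. The clock is passed in as a parameter so that the example can run without real timers; a real server would typically call collectExpired from a setInterval callback and close the sockets of the returned clients.

```javascript
// Hypothetical helper: tracks when each client was last heard from
class HeartbeatMonitor {
  constructor(timeoutMs, now = Date.now) {
    this.timeoutMs = timeoutMs; // allowed silence before a client counts as disconnected
    this.now = now; // injectable clock, defaults to the system time
    this.lastSeen = new Map(); // clientId -> timestamp of the last message
  }

  // Call whenever any message (including a heartbeat) arrives from a client
  touch(clientId) {
    this.lastSeen.set(clientId, this.now());
  }

  // Returns and forgets the clients that have been silent for too long
  collectExpired() {
    const expired = [];
    const current = this.now();
    for (const [clientId, seenAt] of this.lastSeen) {
      if (current - seenAt > this.timeoutMs) {
        expired.push(clientId);
        this.lastSeen.delete(clientId);
      }
    }
    return expired;
  }
}

// Usage with a fake clock: client "a" stops sending, client "b" keeps sending
let fakeTime = 0;
const monitor = new HeartbeatMonitor(60000, () => fakeTime);
monitor.touch("a");
monitor.touch("b");
fakeTime = 50000;
monitor.touch("b");
fakeTime = 90000;
console.log(monitor.collectExpired()); // [ 'a' ]
```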
The WebSocket protocol defines the control frames for heartbeat Ping and heartbeat Pong:
- The heartbeat Ping frame contains the opcode 0x9. When an endpoint receives a Ping frame, it must send a Pong frame in response, unless it has already received a Close frame. The endpoint should reply with the Pong frame as soon as possible.
- The heartbeat Pong frame contains the opcode 0xA. A Pong frame sent in response to a Ping frame must carry the same “Application Data” as the Ping frame it answers. If an endpoint receives a Ping frame and has not yet sent Pong frames in response to previous Ping frames, it may choose to send a Pong frame only for the most recently processed Ping frame. In addition, a Pong frame may be sent unsolicited, in which case it serves as a one-way heartbeat.
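As a small illustration of the rule that a Pong must echo the Ping's application data, the sketch below builds the Pong frame a server could send in reply. buildPongFrame is a hypothetical helper that is not part of the server developed in this article. It assumes the Ping payload has already been unmasked, and it relies on the RFC 6455 rule that control frame payloads are at most 125 bytes.

```javascript
// Hypothetical helper: builds the unmasked Pong frame a server sends for a Ping.
// Assumption: pingPayload is the already unmasked "Application Data" of the Ping.
function buildPongFrame(pingPayload) {
  if (pingPayload.length > 125) {
    throw new RangeError("Control frame payloads must not exceed 125 bytes");
  }
  const pong = Buffer.alloc(2 + pingPayload.length);
  pong.writeUInt8(0b10001010, 0); // FIN=1, opcode=0xA (pong)
  pong.writeUInt8(pingPayload.length, 1); // MASK=0 plus the 7-bit length
  pingPayload.copy(pong, 2); // echo the Ping's application data
  return pong;
}

console.log(buildPongFrame(Buffer.from("Hello")).toString("hex")); // 8a0548656c6c6f
console.log(buildPongFrame(Buffer.alloc(0)).toString("hex")); // 8a00
```

In the parseMessage function of this article, such a reply would belong in a branch for opcode 0x09, which the simplified implementation does not have.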
4.4 What is a Socket
Two programs on a network exchange data over a two-way communication connection, and each end of that connection is called a socket, so establishing a network connection requires at least a pair of port numbers. A socket is essentially an encapsulation of the TCP/IP protocol stack: it provides a programming interface for TCP or UDP and is not a separate protocol. Through sockets, you can use the TCP/IP protocols.
The original meaning of “socket” in English is “hole” or “outlet”. As the inter-process communication mechanism of BSD UNIX, it takes the latter meaning. A socket is used to describe an IP address and port. It is a handle to a communication link and can be used for communication between different virtual machines or different computers.
Hosts on the Internet generally run several pieces of service software and provide several services at the same time. Each service opens a socket bound to a port, and different ports correspond to different services. A socket, as its English meaning implies, is like a multi-hole power outlet. A host is like a room full of outlets, each with a number; some supply 220 V AC, some supply 110 V AC, and some provide cable TV programs. Client software can plug into differently numbered outlets to get different services. (Source: Baidu Encyclopedia)
The following points can be summarized about sockets:
- Sockets enable low-level communication; almost all application-layer protocols communicate through sockets.
- Sockets encapsulate the TCP/IP protocols so that application-layer protocols can call them easily; they form an intermediate abstraction layer between the two.
- In the TCP/IP protocol family, the transport layer has two common protocols, TCP and UDP. Because the two protocols differ, the socket implementation process differs depending on the parameters used.
The following figure illustrates the client/server relationship of a socket API for a connection-oriented protocol.
5. Reference resources
- Wikipedia – WebSocket
- MDN – WebSocket
- MDN – Protocol_upgrade_mechanism
- MDN – Write WebSocket server
- RFC 6455
- The definitive guide to Web Performance