This article was shared by the author “Po Brother”. The original title “WebSocket you didn’t know” has been revised and changed.
1. Introduction
This long article explores WebSocket from several angles: basic concepts, technical principles, common misconceptions, and hands-on practice.
After reading this article, you will know the following:
- 1) Understand the background of WebSocket, what WebSocket is, and its advantages;
- 2) Understand which APIs WebSocket provides and how to use the WebSocket API to send plain text and binary data;
- 3) Become familiar with the WebSocket handshake protocol, data frame format, masking algorithm, and related knowledge;
- 4) Understand the relationship between WebSocket and HTTP, long polling, sockets, and so on, and clear up common misconceptions;
- 5) Understand how to implement a WebSocket server that supports sending plain text.
2. About the author
Author net name: A Bao ge
Personal blog: The whole stack of the road to immortality
Semlinker Github
3. What is WebSocket
3.1 Birth Background of WebSocket
In the early days, polling (also known as short polling) was used by many websites to implement push technology. Polling means that the browser sends HTTP requests to the server at regular intervals, and the server returns the latest data to the client.
Polling comes in two common forms, short polling and long polling. Their differences are shown in the following figure:
To get a better sense of the difference between polling and long polling, let’s look at the code:
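What follows is only a minimal sketch of the two patterns, not the author's original code. The names `shortPoll`, `longPoll`, and `fetchUpdate` are assumptions: `fetchUpdate` is a hypothetical stand-in for one HTTP request (for example a `fetch()` call) that resolves to new data, or to `null` when the server has nothing new.

```javascript
// Minimal sketch; shortPoll, longPoll and fetchUpdate are hypothetical names.
// fetchUpdate() stands in for one HTTP request and resolves to data or null.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Short polling: ask at a fixed interval; the server answers immediately,
// even when it has nothing new (null here).
async function shortPoll(fetchUpdate, onData, { intervalMs, rounds }) {
  for (let i = 0; i < rounds; i++) {
    const data = await fetchUpdate();
    if (data !== null) onData(data);
    await sleep(intervalMs);
  }
}

// Long polling: the server holds each request open until it has data (or
// times out), and the client reconnects as soon as a response arrives.
async function longPoll(fetchUpdate, onData, { rounds }) {
  for (let i = 0; i < rounds; i++) {
    const data = await fetchUpdate();
    if (data !== null) onData(data);
  }
}
```

On the client the two loops look almost identical. The difference lies in when the server answers: immediately for short polling, and only once data is available for long polling.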
The obvious disadvantage of this traditional pattern is that the browser is constantly making requests to the server. However, HTTP requests and responses may contain long headers in which only a small portion of the data is actually valid, thus consuming a lot of bandwidth resources.
PS: About short polling and long polling technology, you can read these two articles in detail: “Beginner’s Notes: The most complete Explanation of the principle of Web instant Messaging technology in history”, “Web instant messaging technology Review: short polling, Comet, Websocket, SSE”.
A relatively new polling technique is Comet. This technique allows two-way communication, but still requires repeated requests. Also, the HTTP long connections that are common in Comet can consume server resources.
Against this background, HTML5 defined the WebSocket protocol, which saves server resources and bandwidth and enables more real-time communication.
WebSocket uses the ws and wss Uniform Resource Identifier (URI) schemes, where wss denotes WebSocket over TLS.
For example:
ws://echo.websocket.org
wss://echo.websocket.org
WebSocket uses the same TCP port as HTTP and HTTPS and can bypass most firewall restrictions.
By default:
- 1) WebSocket protocol uses port 80;
- 2) Port 443 is used by default when running over TLS.
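The default ports can be checked with the standard `URL` class, which treats `ws:` and `wss:` as schemes with default ports 80 and 443. This is a small sketch; `effectivePort` is a hypothetical helper name, not part of any API.

```javascript
// Minimal sketch; effectivePort is a hypothetical helper name.
function effectivePort(wsUrl) {
  const url = new URL(wsUrl);
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new Error(`Not a WebSocket URL: ${wsUrl}`);
  }
  // URL.port is "" when the port is omitted or equals the scheme's default.
  if (url.port !== "") return Number(url.port);
  return url.protocol === "wss:" ? 443 : 80;
}

console.log(effectivePort("ws://echo.websocket.org"));  // 80
console.log(effectivePort("wss://echo.websocket.org")); // 443
console.log(effectivePort("ws://localhost:8888"));      // 8888
```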
3.2 Introduction to WebSocket
WebSocket is a network transport protocol that enables full-duplex communication over a single TCP connection and sits at the application layer of the OSI model. The WebSocket protocol was standardized by the IETF in 2011 as RFC 6455 and later supplemented by RFC 7936.
WebSocket makes it easier to exchange data between the client and the server, allowing the server to actively push data to the client. In the WebSocket API, the browser and server only need to complete a handshake to create a persistent connection and two-way data transfer between the two.
With Polling and WebSockets covered, let’s look at the differences between XHR Polling (short Polling) and WebSockets using a graph.
The differences between XHR Polling and WebSocket are shown in the following figure:
3.3 Advantages of WebSocket
It is generally believed that WebSocket has the following advantages:
- 1) Less control overhead: once the connection is created, the packet headers used for protocol control are relatively small when data is exchanged between server and client;
- 2) Better real-time behavior: because the protocol is full-duplex, the server can actively send data to the client at any time. Compared with HTTP, where the server can respond only after the client initiates a request, the latency is significantly lower;
- 3) Maintained connection state: unlike HTTP, WebSocket must establish a connection first, which makes it a stateful protocol, so part of the state information can be omitted in later communication;
- 4) Better binary support: WebSocket defines binary frames, which handle binary content more easily than HTTP;
- 5) Extensibility: WebSocket defines extensions, so users can extend the protocol and implement custom subprotocols.
WebSocket has the advantages mentioned above, so it is widely used in instant messaging /IM, real-time audio and video, online education, games and other fields.
For front-end developers who want to use the capabilities WebSocket provides, the first step is to master the WebSocket API. Let's take a look at it.
PS: If you want a more basic tutorial on how to get started with WebSocket, you can read this quick Start for Beginners: A Brief Tutorial on WebSocket and then come back.
4. Learn WebSocket API
4.1 Basic Information
Before introducing the WebSocket API, let’s take a look at its compatibility:
(Image from: caniuse.com/#search=Web…)
As you can see from the figure above, WebSocket is supported by all the major Web browsers, so it can be used with confidence in most projects.
To use WebSocket capabilities in a browser, we must first create a WebSocket object that provides an API for creating and managing WebSocket connections and for sending and receiving data over that connection.
Using the WebSocket constructor, we can easily construct a WebSocket object.
We will introduce the WebSocket API from the following four aspects:
- 1) WebSocket constructor;
- 2) WebSocket object properties;
- 3) WebSocket method;
- 4) WebSocket events.
Let’s start with the WebSocket constructor.
4.2 Constructors
The syntax for the WebSocket constructor is:
const myWebSocket = new WebSocket(url [, protocols]);
Related parameters are described as follows:
- 1) url: the URL to connect to, that is, the URL the WebSocket server will respond to;
- 2) protocols (optional): a protocol string or an array of protocol strings.
For point 2): these strings specify subprotocols, so that a single server can implement multiple WebSocket subprotocols.
For example, you might want a server to handle different types of interactions depending on the specified protocol. If no protocol string is specified, an empty string is assumed.
The WebSocket constructor throws a SECURITY_ERR exception when the port being connected to is blocked.
PS: For a more detailed description of the WebSocket constructor, see the official API documentation.
4.3 Properties
The WebSocket object contains the following properties:
The specific meaning of each property is as follows:
- 1) binaryType: the type of binary data received over the connection ("blob" or "arraybuffer");
- 2) bufferedAmount (read-only): the number of bytes queued by send() that have not yet been sent to the server;
- 3) extensions (read-only): the extensions selected by the server;
- 4) onclose: specifies the callback function invoked after the connection is closed;
- 5) onerror: specifies the callback function invoked after the connection fails;
- 6) onmessage: specifies the callback function invoked when a message is received from the server;
- 7) onopen: specifies the callback function invoked after the connection succeeds;
- 8) protocol (read-only): returns the name of the subprotocol selected by the server;
- 9) readyState (read-only): returns the current connection state of the WebSocket. There are four states:
  - CONNECTING: the connection is being established; the value is 0.
  - OPEN: the connection is established and can communicate; the value is 1.
  - CLOSING: the connection is closing; the value is 2.
  - CLOSED: the connection is closed or could not be opened; the value is 3.
- 10) url (read-only): returns the absolute URL that was resolved when the constructor created the WebSocket instance.
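As a small illustration of readyState, the numeric values listed above can be mapped back to their names, which is handy for logging. `describeState` is a hypothetical helper name, and the object passed in only needs a `readyState` property.

```javascript
// Minimal sketch; describeState is a hypothetical helper name.
// Index in the array = numeric readyState value listed above.
const READY_STATE_NAMES = ["CONNECTING", "OPEN", "CLOSING", "CLOSED"];

function describeState(socket) {
  return READY_STATE_NAMES[socket.readyState] ?? "UNKNOWN";
}

// Usage in the browser, assuming an existing `socket` variable:
// console.log(`Socket is ${describeState(socket)}`);
```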
4.4 Methods
There are two main WebSocket methods:
- 1) close([code[, reason]]): closes the WebSocket connection. If the connection is already closed, this method does nothing;
- 2) send(data): queues the data to be sent to the server over the WebSocket connection and increases bufferedAmount by the size of that data. If the data cannot be sent (for example, because it needs to be buffered and the buffer is full), the socket closes itself.
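Because send() only queues data, a sender that produces data quickly may want to check bufferedAmount before queuing more. The sketch below shows one possible approach. `sendIfDrained` and the `maxBufferedBytes` threshold are hypothetical names and values chosen for illustration.

```javascript
// Minimal sketch; sendIfDrained and maxBufferedBytes are hypothetical names.
const OPEN = 1; // same value as WebSocket.OPEN

function sendIfDrained(socket, data, maxBufferedBytes = 1024 * 1024) {
  if (socket.readyState !== OPEN) return false;               // not connected
  if (socket.bufferedAmount > maxBufferedBytes) return false; // still draining
  socket.send(data);
  return true;
}
```

A caller that gets `false` back can retry later, for example on a timer, instead of letting the queue grow without bound.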
4.5 Events
Use addEventListener() or assign an event listener to the oneventname property of the WebSocket object to listen for the following events.
Here are the events:
- 1) close: fired when a WebSocket connection is closed; can also be set through the onclose property;
- 2) error: fired when a WebSocket connection is closed due to an error; can also be set through the onerror property;
- 3) message: fired when data is received through the WebSocket; can also be set through the onmessage property;
- 4) open: fired when a WebSocket connection is established; can also be set through the onopen property.
After introducing the WebSocket API, let’s take an example of sending plain text using WebSocket.
4.6 Code practice: Send plain text
In the example above, we created two Textareas on the page, one for the data to be sent and one for the data returned by the server. When the user finishes typing the text to be sent, clicking the send button sends the input text to the server, and the server successfully receives the message and sends it back to the client intact.
// const socket = new WebSocket("ws://echo.websocket.org");
// const sendMsgContainer = document.querySelector("#sendMessage");
function send() {
  const message = sendMsgContainer.value;
  if (socket.readyState !== WebSocket.OPEN) {
    console.log("Connection not established, cannot send message yet");
    return;
  }
  if (message) socket.send(message);
}
Of course, after the client receives the message returned by the server, the corresponding text content will be saved in the textarea text box corresponding to the received data.
// const socket = new WebSocket("ws://echo.websocket.org");
// const receivedMsgContainer = document.querySelector("#receivedMessage");
socket.addEventListener("message", function (event) {
  console.log("Message from server ", event.data);
  receivedMsgContainer.value = event.data;
});
To get a more intuitive understanding of the data interaction process, let’s take a look at the corresponding process using Chrome’s developer tools.
As shown below:
The complete code for the above example is as follows:
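<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Example of WebSocket sending plain text</title>
    <style>.block { flex: 1; }</style>
  </head>
  <body>
    <h3>Example of WebSocket sending plain text</h3>
    <div style="display: flex;">
      <div class="block">
        <p>Data to be sent: <button onclick="send()">Send</button></p>
        <textarea id="sendMessage" rows="5" cols="15"></textarea>
      </div>
      <div class="block">
        <p>Received data:</p>
        <textarea id="receivedMessage" rows="5" cols="15"></textarea>
      </div>
    </div>
    <!-- The send() function and the message listener shown above go in a script here -->
  </body>
</html>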
In addition to sending plain text, WebSocket also supports sending binary data, such as ArrayBuffer, Blob, or ArrayBufferView objects.
A code example is as follows:
const socket = new WebSocket("ws://echo.websocket.org");
socket.onopen = function () {
  // Send a UTF-8 encoded text message
  socket.send("Hello Echo Server!");
  // Send UTF-8 encoded JSON data
  socket.send(JSON.stringify({ msg: "I am Po Brother" }));
  // Send a binary ArrayBuffer
  const buffer = new ArrayBuffer(128);
  socket.send(buffer);
  // Send a binary ArrayBufferView
  const intview = new Uint32Array(buffer);
  socket.send(intview);
  // Send a binary Blob
  const blob = new Blob([buffer]);
  socket.send(blob);
};
After the above code runs successfully, we can see the corresponding data interaction process through Chrome developer tools.
As shown below:
The following example shows how to send binary data by sending a Blob object.
A Blob (Binary Large Object) is a large object of binary type. In a database management system, it is binary data stored as a single entity. Blobs are usually video, sound, or other multimedia files. In JavaScript, a Blob object represents immutable, file-like raw data.
If you're interested in Blobs, you can read "Blobs You Don't Know".
4.7 Code practice: send binary data
In the example above, we created two Textareas on the page, one for the data to be sent and one for the data returned by the server.
When the user clicks the Send button after typing the text to be sent, we take the input text, wrap it as a Blob object and send it to the server, which receives the message successfully and sends it back to the client intact.
When the browser receives a new message, it automatically converts text data to a DOMString. Binary data is handed to the application as a Blob or an ArrayBuffer, depending on the socket's binaryType, and the application processes it according to the type it receives.
Data sending code:
// const socket = new WebSocket("ws://echo.websocket.org");
// const sendMsgContainer = document.querySelector("#sendMessage");
function send() {
  const message = sendMsgContainer.value;
  if (socket.readyState !== WebSocket.OPEN) {
    console.log("Connection not established, cannot send message yet");
    return;
  }
  const blob = new Blob([message], { type: "text/plain" });
  if (message) socket.send(blob);
  console.log(`Number of bytes not sent to server: ${socket.bufferedAmount}`);
}
When the client receives the message from the server, it determines the data type returned and calls the Blob object’s text() method to retrieve the Blob object’s UTF-8 content. The corresponding text content is then saved into the textarea text box corresponding to the received data.
Data receiving code:
// const socket = new WebSocket("ws://echo.websocket.org");
// const receivedMsgContainer = document.querySelector("#receivedMessage");
socket.addEventListener("message", async function (event) {
  console.log("Message from server ", event.data);
  const receivedData = event.data;
  if (receivedData instanceof Blob) {
    receivedMsgContainer.value = await receivedData.text();
  } else {
    receivedMsgContainer.value = receivedData;
  }
});
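As an alternative to reading a Blob asynchronously, the socket's binaryType can be set to "arraybuffer" so that binary messages arrive as ArrayBuffer objects, which TextDecoder can decode synchronously. This is a sketch of that variant rather than the approach used in this example, and `toText` is a hypothetical helper name.

```javascript
// Minimal sketch; toText is a hypothetical helper name.
const decoder = new TextDecoder("utf-8");

function toText(data) {
  if (typeof data === "string") return data;                    // text frame
  if (data instanceof ArrayBuffer) return decoder.decode(data); // binary frame
  throw new TypeError("Unsupported message data type");
}

// Usage in the browser (assumes the `socket` and `receivedMsgContainer`
// variables from the example above):
// socket.binaryType = "arraybuffer";
// socket.addEventListener("message", (event) => {
//   receivedMsgContainer.value = toText(event.data);
// });
```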
Again, let’s take a look at the process using Chrome’s developer tools:
It is clear from the above figure that when sending a Blob object, the Data field displays the Binary Message, whereas when sending plain text, the Data field displays the text Message sent directly.
The complete code for the above example is as follows:
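<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>WebSocket sends binary data</title>
<style>.block { flex: 1; }</style>
<h3>Example of WebSocket sending binary data</h3>
<div style="display: flex;">
  <div class="block"><p>Data to be sent: <button onclick="send()">Send</button></p><textarea id="sendMessage" rows="5" cols="15"></textarea></div>
  <div class="block"><p>Received data:</p><textarea id="receivedMessage" rows="5" cols="15"></textarea></div>
</div>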
Some readers may not be satisfied with only knowing how to use the WebSocket API. The following section walks you through implementing a WebSocket server that supports sending plain text.
5. Writing a WebSocket server by hand
5.1 Before we start
Before we introduce how to write a WebSocket server, we need to understand the life cycle of WebSocket connections.
The figure shows that the client and server need a Handshake before using WebSocket for full-duplex communication. Two-way data communication can be started only after the Handshake is completed.
A handshake takes place after the communication channel is created and before the transmission of information begins.
A handshake is used to agree on parameters such as:
- 1) Information transfer rate;
- 2) Alphabet;
- 3) Parity;
- 4) Interrupt procedure;
- 5) Other protocol features.
Handshakes let systems or devices of different kinds connect over a communication channel without the need to set parameters manually.
Since the handshake is the first part of the WebSocket connection life cycle, let’s first examine the WebSocket handshake protocol.
5.2 Handshake Protocol
WebSocket is an application-layer protocol that relies on TCP at the transport layer. WebSocket uses the 101 status code of the HTTP/1.1 protocol for handshake. To create a WebSocket connection, a request is made through the browser, and the server responds, a process often referred to as “Handshaking.”
Using HTTP to complete a handshake has several benefits:
- 1) First: it makes WebSocket compatible with the existing HTTP infrastructure, so that WebSocket servers can run on ports 80 and 443, which are usually the only ports open to clients;
- 2) Second: it lets us reuse and extend the HTTP Upgrade flow, adding custom WebSocket headers to complete the negotiation.
Let’s take a closer look at the handshake process using the example of sending plain text shown earlier.
5.2.1) Client request:
GET ws://echo.websocket.org/ HTTP/1.1
Host: echo.websocket.org
Origin: file://
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: Zx8rNEkBE4xnwifpuh8DHQ==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Note: some HTTP request headers have been omitted.
The fields in the above request are described as follows:
- 1) Connection: must be set to Upgrade, indicating that the client wants to upgrade the connection;
- 2) Upgrade: must be set to websocket, indicating that the client wants to upgrade to the WebSocket protocol;
- 3) Sec-WebSocket-Version: the supported WebSocket version. RFC 6455 requires version 13; versions from earlier drafts should no longer be used;
- 4) Sec-WebSocket-Key: a random string that the server uses to construct a SHA-1 digest;
- 5) Sec-WebSocket-Extensions: used to negotiate the WebSocket extensions for this connection. The client sends the extensions it supports, and the server confirms that it supports one or more of them by returning the same header;
- 6) Origin: optional. It usually indicates the page from which this WebSocket connection was initiated in the browser, similar to Referer. Unlike Referer, however, Origin contains only the protocol and host name.
With reference to point 4 above: the server appends the special string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to the Sec-WebSocket-Key value, computes the SHA-1 digest, Base64-encodes it, and returns the result to the client as the value of the Sec-WebSocket-Accept header. This prevents ordinary HTTP requests from being mistaken for WebSocket handshakes.
5.2.2) Server response:
HTTP/1.1 101 Web Socket Protocol Handshake ①
Connection: Upgrade ②
Upgrade: websocket ③
Sec-WebSocket-Accept: 52rg3vw4jq1ywpkvflstsiezlqw= ④
Note: some HTTP response headers have been omitted.
The fields in the response are described as follows:
- ① The 101 status code indicates that the protocol is being switched to WebSocket;
- ② The Connection header is set to "Upgrade" to indicate that this is an upgrade. (HTTP provides a special mechanism for upgrading an established connection to a new, incompatible protocol.)
- ③ The Upgrade header specifies one or more protocol names, sorted by priority and separated by commas. Here it indicates an upgrade to the WebSocket protocol;
- ④ The Sec-WebSocket-Accept header carries the signed key value, which proves that the server supports the requested protocol.
After introducing the WebSocket handshake protocol, we will use Node.js to develop our WebSocket server.
5.3 Enabling the Handshake Function
To develop a WebSocket server, we first need to implement the handshake function. Here I use the built-in HTTP module of Node.js to create an HTTP server.
The specific code is as follows:
const http = require("http");
const port = 8888;
const { generateAcceptValue } = require("./util");
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
  res.end("Hello, my name is Po Ge. Thank you for reading \"WebSocket you Don't know\"");
});
server.on("upgrade", function (req, socket) {
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
    return;
  }
  // Read the Sec-WebSocket-Key provided by the client
  const secWsKey = req.headers["sec-websocket-key"];
  // Generate Sec-WebSocket-Accept using the SHA-1 algorithm
  const hash = generateAcceptValue(secWsKey);
  // Set the HTTP response headers
  const responseHeaders = [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${hash}`,
  ];
  // Return the response to the handshake request
  socket.write(responseHeaders.join("\r\n") + "\r\n\r\n");
});
server.listen(port, () =>
  console.log(`Server running at http://localhost:${port}`)
);
In the above code: we first import the http module and create an HTTP server by calling the module's createServer() method. We then listen for the upgrade event, which is emitted each time a client requests a protocol upgrade. Since our server only supports upgrading to WebSocket, we return "400 Bad Request" if the client requests an upgrade to any other protocol.
When the server receives a WebSocket handshake request, it takes the value of "Sec-WebSocket-Key" from the request headers and appends the special string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to it. It then computes the SHA-1 digest, Base64-encodes it, and returns the result to the client as the value of the Sec-WebSocket-Accept header.
The above process may seem a bit tedious, but with the built-in crypto module of Node.js, you can do it in a few lines of code.
The code is as follows:
// util.js
const crypto = require("crypto");
const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}
// Export the helper so the server file can require it
module.exports = { generateAcceptValue };
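One way to sanity-check this function is the sample handshake in RFC 6455, which pairs the key "dGhlIHNhbXBsZSBub25jZQ==" with the accept value "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=". The sketch below repeats the function so that it runs on its own.

```javascript
// Standalone check; the function body repeats the util.js code above.
const crypto = require("crypto");
const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}

// Sample key from the RFC 6455 handshake example
const accept = generateAcceptValue("dGhlIHNhbXBsZSBub25jZQ==");
console.log(accept); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```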
Once you have developed the handshake feature, you can test it using the previous example. Once the server is started, we can verify functionality by simply adjusting the “send plain text” example by replacing the previous URL address with ws://localhost:8888.
Interested readers can give it a try. Here is the result of my local run:
As you can see from the image above, the handshake function we implemented already works. So can a handshake fail? The answer is yes, for example because of network problems, server exceptions, or an incorrect Sec-WebSocket-Accept value.
Let's change the rule for generating "Sec-WebSocket-Accept", for example by modifying MAGIC_KEY, and verify the handshake again.
The browser console will output the following exception message:
WebSocket connection to 'ws://localhost:8888/' failed: Error during WebSocket handshake: Incorrect 'Sec-WebSocket-Accept' header value
If your WebSocket server supports subprotocols, you can refer to the following code for handling subprotocols, without further elaboration here.
// Read the subprotocol from the request header
const protocol = req.headers["sec-websocket-protocol"];
// If subprotocols are present, parse them into an array
const protocols = !protocol ? [] : protocol.split(",").map((s) => s.trim());
// For simplicity, we only check whether the JSON subprotocol is requested
if (protocols.includes("json")) {
  responseHeaders.push(`Sec-WebSocket-Protocol: json`);
}
Ok, the WebSocket handshake protocol is pretty much covered. Next, let’s cover some of the basics you need to know to develop messaging capabilities.
5.4 Message Communication Basics
In the WebSocket protocol, data is transmitted through a series of data frames.
To avoid confusing network intermediaries (such as intercepting proxies) and for security reasons, the client must mask all frames it sends to the server. When a server receives a frame that is not masked, it must immediately close the connection.
5.4.1) Data frame format:
To implement message communication, we need to understand the format of WebSocket data frames:
Some of you may be a little confused after reading the above.
Let’s further analyze it in combination with actual data frames:
The figure above briefly analyzes the data frame corresponding to the "Send plain text" example. Here we take a closer look at the Payload length field.
Payload Length Indicates the length of Payload data in bytes.
It has the following cases:
- 1) If the value is 0-125, it is the length of the payload data;
- 2) If it is 126, the next 2 bytes are interpreted as a 16-bit unsigned integer giving the length of the payload data;
- 3) If it is 127, the next 8 bytes are interpreted as a 64-bit unsigned integer (the most significant bit must be 0) giving the length of the payload data.
Note: multi-byte length values are expressed in network byte order, and the payload length is the length of the "extension data" plus the length of the "application data". The length of the "extension data" may be 0, in which case the payload length equals the length of the "application data".
In addition: the length of the "extension data" is 0 bytes unless an extension has been negotiated. Any extension must specify during the handshake the length of the "extension data", how that length is computed, and how the extension is to be used. If an extension is present, its "extension data" is included in the total payload length.
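The three cases above can be sketched as a small Node.js function. `readPayloadLength` is a hypothetical helper name, and the sketch assumes that `frame` is a Buffer holding one frame from its first byte. The returned `headerBytes` counts the bytes before the masking key (if any).

```javascript
// Minimal sketch; readPayloadLength is a hypothetical helper name.
// `frame` is assumed to be a Node.js Buffer that starts at the frame's first byte.
function readPayloadLength(frame) {
  const marker = frame.readUInt8(1) & 0x7f; // low 7 bits of the second byte
  if (marker <= 125) {
    return { length: marker, headerBytes: 2 };
  }
  if (marker === 126) {
    // Next 2 bytes: 16-bit unsigned integer in network byte order
    return { length: frame.readUInt16BE(2), headerBytes: 4 };
  }
  // marker === 127: next 8 bytes are a 64-bit unsigned integer
  const length = frame.readBigUInt64BE(2);
  if (length > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError("Payload too large for this sketch");
  }
  return { length: Number(length), headerBytes: 10 };
}

console.log(readPayloadLength(Buffer.from([0x81, 0x85])));             // length 5
console.log(readPayloadLength(Buffer.from([0x81, 0xfe, 0x01, 0x00]))); // length 256
```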
PS: For a detailed explanation of data frame format, you can read the following articles in depth:
- WebSocket from beginner to Master, half an hour is enough!
- Theory and Practice: Understanding Communication Principle, Protocol Format and Security of WebSocket from Zero
5.4.2) Mask algorithm:
The masking key is a 32-bit value randomly selected by the client. It must be unpredictable, so it must come from a strong source of entropy, and the key for a given frame must not make it easy for a server or proxy to predict the keys of subsequent frames. The unpredictability of the masking key is critical to preventing authors of malicious applications from controlling the bytes that appear on the wire.
The mask does not affect the length of the data load, and the steps involved in the operation of the mask and the operation of the inverse mask are the same.
The following algorithms are used for mask and inverse mask operations:
j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
Where:
- 1) original-octet-i: the i-th byte of the original data;
- 2) transformed-octet-i: the i-th byte of the transformed data;
- 3) masking-key-octet-j: the j-th byte of the masking key.
To make the calculation easier to follow, let's mask the text "我是阿宝哥" ("I am Po Brother") from the example.
The UTF-8 encoding of "我是阿宝哥" is as follows:
E6 88 91 E6 98 AF E9 98 BF E5 AE 9D E5 93 A5
The corresponding Masking-Key is 0x08F6EFB1.
According to the above algorithm, we can perform the mask operation as follows:
let uint8 = new Uint8Array([0xE6, 0x88, 0x91, 0xE6, 0x98, 0xAF, 0xE9, 0x98, 0xBF, 0xE5, 0xAE, 0x9D, 0xE5, 0x93, 0xA5]);
let maskingKey = new Uint8Array([0x08, 0xf6, 0xef, 0xb1]);
let maskedUint8 = new Uint8Array(uint8.length);
// XOR each byte with the masking-key byte at position i % 4
for (let i = 0, j = 0; i < uint8.length; i++, j = i % 4) {
  maskedUint8[i] = uint8[i] ^ maskingKey[j];
}
console.log(Array.from(maskedUint8).map(num => Number(num).toString(16)).join(' '));
After the above code runs successfully, the console prints the following:
ee 7e 7e 57 90 59 6 29 b7 13 41 2c ed 65 4a
The above results are consistent with the values corresponding to Masked payload in WireShark, as shown in the following figure:
In the WebSocket protocol, data masking is used to enhance the security of the protocol. However, it is not meant to protect the data itself, because the algorithm is public and the operation is simple.
So why introduce data masking? It was introduced to prevent problems such as the proxy cache poisoning attacks that existed in earlier versions of the protocol.
Now that we understand the WebSocket masking algorithm and the role of data masking, let's introduce the concept of data fragmentation.
5.4.3) Data fragmentation:
Each WebSocket message may be split into multiple data frames. When the WebSocket receiver receives a data frame, it uses the value of the FIN bit to determine whether this is the last data frame of the message.
With FIN and opcode, we can send messages across frames.
The opcode tells the receiver how to interpret the frame:
- 1) If it is 0x1, the payload is text;
- 2) If it is 0x2, the payload is binary data;
- 3) If it is 0x0, the frame is a continuation frame (meaning that the server should append the frame's payload to the last frame received from the client).
To give you a better understanding of the above, let's look at an example from MDN:
Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
In the example above, the client sends two messages to the server: the first in a single frame and the second across three frames.
The first message is a complete message (FIN=1 and opcode != 0x0), so the server can process or respond to it as needed. The second message is a text message (opcode=0x1) with FIN=0, indicating that the message is not yet complete and that more data frames follow. All remaining parts of the message are sent in continuation frames (opcode=0x0), and the final frame of the message is marked with FIN=1.
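The reassembly rule described above can be sketched as follows. This is an illustration under stated assumptions, not part of the server built in this article: `createAssembler` is a hypothetical name, and frames are assumed to be already parsed and unmasked into `{ fin, opcode, payload }` objects whose `payload` is a Buffer.

```javascript
// Minimal sketch; createAssembler is a hypothetical name. Frames are assumed
// to be already parsed into { fin, opcode, payload } with a Buffer payload.
function createAssembler(onMessage) {
  let parts = null; // payloads of the message currently being assembled
  return function push({ fin, opcode, payload }) {
    if (opcode === 0x0) {
      if (parts === null) throw new Error("Continuation frame without a start frame");
      parts.push(payload);
    } else if (opcode === 0x1 || opcode === 0x2) {
      if (parts !== null) throw new Error("New message started before the previous one finished");
      parts = [payload];
    } else {
      return; // control frames are outside this sketch
    }
    if (fin) {
      // Concatenate bytes first, so multi-byte characters split across frames survive
      onMessage(Buffer.concat(parts));
      parts = null;
    }
  };
}

const messages = [];
const push = createAssembler((buf) => messages.push(buf.toString("utf8")));
push({ fin: true, opcode: 0x1, payload: Buffer.from("hello") });
push({ fin: false, opcode: 0x1, payload: Buffer.from("and a ") });
push({ fin: false, opcode: 0x0, payload: Buffer.from("happy new ") });
push({ fin: true, opcode: 0x0, payload: Buffer.from("year!") });
console.log(messages); // [ 'hello', 'and a happy new year!' ]
```

The payloads are joined as bytes and decoded only once the message is complete, because a UTF-8 character may be split across two frames.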
That concludes the brief introduction to data fragmentation. Next, let's start implementing the message communication capability.
5.5 Realizing message Communication
The author divides the message communication function into two sub-functions: message parsing and message response. Here we introduce how to implement these two sub-functions respectively.
5.5.1) Message parsing:
Using the relevant knowledge introduced in the message communication basics section, I implemented a parseMessage function, which is used to parse WebSocket data frames from the client.
For simplicity, only text frames are processed, as follows:
function parseMessage(buffer) {
  // The first byte contains the FIN bit, three reserved bits, and the opcode
  const firstByte = buffer.readUInt8(0);
  // [FIN, RSV, RSV, RSV, OPCODE, OPCODE, OPCODE, OPCODE];
  // Shift right by 7 bits to get the first bit, which indicates whether this is the final frame
  const isFinalFrame = Boolean((firstByte >>> 7) & 0x01);
  console.log("isFIN: ", isFinalFrame);
  // Retrieve the opcode
  /**
   * %x0: continuation frame. An opcode of 0 means the message is fragmented and this frame is one of its fragments;
   * %x1: text frame;
   * %x2: binary frame;
   * %x3-7: reserved for non-control frames defined later;
   * %x8: connection close;
   * %x9: heartbeat request (ping);
   * %xA: heartbeat response (pong);
   * %xB-F: reserved for control frames defined later.
   */
  const opcode = firstByte & 0x0f;
  if (opcode === 0x08) {
    // The connection is closed; return null so the caller can detect it
    return null;
  }
  if (opcode === 0x02) {
    // Binary frame (not handled here)
    return;
  }
  if (opcode === 0x01) {
    // Currently only text frames are processed
    let offset = 1;
    const secondByte = buffer.readUInt8(offset);
    // MASK: 1 bit, indicating whether a mask is used. Frames sent to the server must be masked; frames sent by the server are not
    const useMask = Boolean((secondByte >>> 7) & 0x01);
    console.log("use MASK: ", useMask);
    const payloadLen = secondByte & 0x7f; // The lower 7 bits represent the payload length
    offset += 1;
    // The four-byte masking key
    let MASK = [];
    // If the value is between 0 and 125, the next four bytes (32 bits) are the masking key
    if (payloadLen <= 0x7d) {
      // Payload length of at most 125 bytes
      MASK = buffer.slice(offset, 4 + offset);
      offset += 4;
      console.log("payload length: ", payloadLen);
    } else if (payloadLen === 0x7e) {
      // If the value is 126, the next two bytes are a 16-bit unsigned integer giving the payload length
      console.log("payload length: ", buffer.readUInt16BE(offset));
      // Skip the two length bytes; the following four bytes are the masking key
      MASK = buffer.slice(offset + 2, offset + 2 + 4);
      offset += 6;
    } else {
      // If the value is 127, the next 8 bytes are a 64-bit unsigned integer giving the payload length
      MASK = buffer.slice(offset + 8, offset + 8 + 4);
      offset += 12;
    }
    // Read the payload and XOR it with the masking key to recover the original bytes
    const newBuffer = [];
    const dataBuffer = buffer.slice(offset);
    for (let i = 0, j = 0; i < dataBuffer.length; i++, j = i % 4) {
      const nextBuf = dataBuffer[i];
      newBuffer.push(nextBuf ^ MASK[j]);
    }
    return Buffer.from(newBuffer).toString();
  }
  return "";
}
After creating the parseMessage function, let’s update the WebSocket server we created earlier:
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client: " + message);
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
    return;
  }
  // omit existing code
});
After the update is complete, restart the server and test message parsing again with the “Send plain text” example.
After the client sends the text message “I am Po Brother”, the WebSocket server prints the following:
Server running at http://localhost:8888
isFIN: true
use MASK: true
payload length: 15
Message from client: I am Po Brother
The output above shows that our WebSocket server can successfully parse a data frame containing plain text sent by the client. Next, we implement the message response function.
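Before moving on, the masking step that parseMessage undoes can be checked in isolation. The following is a minimal sketch, not part of the author's code: buildMaskedTextFrame and unmaskTextFrame are hypothetical helpers, and the sketch assumes a payload of at most 125 bytes so that the masking key always starts at byte 2.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical helper (not from the article): builds a masked text frame
// the way a client would, for payloads of at most 125 bytes.
function buildMaskedTextFrame(text) {
  const payload = Buffer.from(text, "utf8");
  if (payload.length > 125) {
    throw new RangeError("this sketch only handles payloads up to 125 bytes");
  }
  const mask = randomBytes(4); // random 4-byte masking key
  const frame = Buffer.alloc(2 + 4 + payload.length);
  frame.writeUInt8(0b10000001, 0); // FIN = 1, opcode = 0x1 (text)
  frame.writeUInt8(0b10000000 | payload.length, 1); // MASK = 1 plus 7-bit length
  mask.copy(frame, 2);
  for (let i = 0; i < payload.length; i++) {
    frame[6 + i] = payload[i] ^ mask[i % 4];
  }
  return frame;
}

// Hypothetical helper: reverses the masking by XOR-ing each payload byte
// with mask[i % 4], which is the same operation the client applied.
function unmaskTextFrame(frame) {
  const payloadLen = frame.readUInt8(1) & 0x7f;
  const mask = frame.subarray(2, 6);
  const payload = Buffer.alloc(payloadLen);
  for (let i = 0; i < payloadLen; i++) {
    payload[i] = frame[6 + i] ^ mask[i % 4];
  }
  return payload.toString("utf8");
}

const frame = buildMaskedTextFrame("I am Po Brother");
console.log(frame.length); // 21: 2 header bytes + 4 mask bytes + 15 payload bytes
console.log(unmaskTextFrame(frame)); // I am Po Brother
```

Because XOR is its own inverse, applying the same masking key twice restores the original bytes, which is why the server can reuse the client's algorithm unchanged.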
5.5.2) Message Response:
To return data to the client, our WebSocket server must also wrap the data in the WebSocket data frame format.
As with the parseMessage function introduced earlier, I wrote a constructReply function that wraps the returned data in a frame.
The code for this function is as follows:
function constructReply(data) {
  const json = JSON.stringify(data);
  const jsonByteLength = Buffer.byteLength(json);
  // Currently, only payloads of up to 65535 bytes are supported
  const lengthByteCount = jsonByteLength < 126 ? 0 : 2;
  const payloadLength = lengthByteCount === 0 ? jsonByteLength : 126;
  const buffer = Buffer.alloc(2 + lengthByteCount + jsonByteLength);
  // First byte: FIN = 1 and opcode = 0x1, indicating a final text frame
  buffer.writeUInt8(0b10000001, 0);
  // Second byte: MASK = 0 and the 7-bit payload length
  buffer.writeUInt8(payloadLength, 1);
  // If payloadLength is 126, the next two bytes hold the actual length as a 16-bit unsigned integer
  let payloadOffset = 2;
  if (lengthByteCount > 0) {
    buffer.writeUInt16BE(jsonByteLength, 2);
    payloadOffset += lengthByteCount;
  }
  // Write the JSON data into the buffer
  buffer.write(json, payloadOffset);
  return buffer;
}
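constructReply stops at the 16-bit length form. For payloads larger than 65535 bytes, the 7-bit length field is set to 127 and the real length follows as a 64-bit unsigned integer. The sketch below shows one way to cover all three forms; encodeTextFrame is a hypothetical helper that is not part of the author's code, and it takes a plain string rather than an object to serialize.

```javascript
// Hypothetical helper (not from the article): encodes an unmasked
// server-to-client text frame using whichever length form the payload needs.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, "utf8");
  const len = payload.length;
  let header;
  if (len < 126) {
    // 7-bit length stored directly in the second byte
    header = Buffer.alloc(2);
    header.writeUInt8(len, 1);
  } else if (len <= 0xffff) {
    // Second byte is 126, followed by a 16-bit unsigned length
    header = Buffer.alloc(4);
    header.writeUInt8(126, 1);
    header.writeUInt16BE(len, 2);
  } else {
    // Second byte is 127, followed by a 64-bit unsigned length
    header = Buffer.alloc(10);
    header.writeUInt8(127, 1);
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  header.writeUInt8(0b10000001, 0); // FIN = 1, opcode = 0x1 (text)
  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame("hi").length); // 4
console.log(encodeTextFrame("a".repeat(70000)).length); // 70010
```

Building the header separately and concatenating it with the payload keeps the offset arithmetic out of the payload write, at the cost of one extra copy.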
After creating the constructReply function, let’s update the WebSocket server we created earlier:
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client: " + message);
      // Add the following code
      socket.write(constructReply({ message }));
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
});
Now that our WebSocket server is developed, let’s fully verify its functionality.
As you can see from the figure above, the simple WebSocket server developed above can handle normal text messages.
Finally, let’s look at the complete code.
custom-websocket-server.js file:
const http = require("http");
const port = 8888;
const { generateAcceptValue, parseMessage, constructReply } = require("./util");
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
  res.end("Hello, my name is Po Ge. Thank you for reading \"WebSocket you don't know\".");
});
server.on("upgrade", function (req, socket) {
  socket.on("data", (buffer) => {
    const message = parseMessage(buffer);
    if (message) {
      console.log("Message from client: " + message);
      socket.write(constructReply({ message }));
    } else if (message === null) {
      console.log("WebSocket connection closed by the client.");
    }
  });
  if (req.headers["upgrade"] !== "websocket") {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
    return;
  }
  // Read the Sec-WebSocket-Key provided by the client
  const secWsKey = req.headers["sec-websocket-key"];
  // Generate the Sec-WebSocket-Accept value using the SHA-1 algorithm
  const hash = generateAcceptValue(secWsKey);
  // Set the HTTP response headers
  const responseHeaders = [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${hash}`,
  ];
  // Return the response to the handshake request
  socket.write(responseHeaders.join("\r\n") + "\r\n\r\n");
});
server.listen(port, () =>
  console.log(`Server running at http://localhost:${port}`)
);
util.js file:
const crypto = require("crypto");
const MAGIC_KEY = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
function generateAcceptValue(secWsKey) {
  return crypto
    .createHash("sha1")
    .update(secWsKey + MAGIC_KEY, "utf8")
    .digest("base64");
}
function parseMessage(buffer) {
  // The first byte contains the FIN bit, three reserved bits, and the 4-bit opcode
  const firstByte = buffer.readUInt8(0);
  // [FIN, RSV, RSV, RSV, OPCODE, OPCODE, OPCODE, OPCODE]
  // Shift right by 7 bits to get the FIN bit, which indicates whether this is the final frame
  const isFinalFrame = Boolean((firstByte >>> 7) & 0x01);
  console.log("isFIN: ", isFinalFrame);
  // Extract the opcode from the lower 4 bits
  /**
   * %x0: continuation frame; the message is fragmented and this frame is one of the fragments
   * %x1: text frame
   * %x2: binary frame
   * %x3-7: reserved for further non-control frames
   * %x8: connection close
   * %x9: heartbeat request (ping)
   * %xA: heartbeat response (pong)
   * %xB-F: reserved for further control frames
   */
  const opcode = firstByte & 0x0f;
  if (opcode === 0x08) {
    // Close frame: return null so the caller can tell that the client closed the connection
    return null;
  }
  if (opcode === 0x02) {
    // Binary frames are not handled in this example
    return "";
  }
  if (opcode === 0x01) {
    // Currently only text frames are processed
    let offset = 1;
    const secondByte = buffer.readUInt8(offset);
    // MASK: 1 bit indicating whether the payload is masked. Frames sent from the client
    // to the server must be masked; frames sent by the server are not masked
    const useMask = Boolean((secondByte >>> 7) & 0x01);
    console.log("use MASK: ", useMask);
    const payloadLen = secondByte & 0x7f; // The lower 7 bits hold the payload length
    offset += 1;
    // The four-byte masking key
    let MASK = [];
    if (payloadLen <= 0x7d) {
      // 0 to 125: this is the payload length, and the next four bytes are the masking key
      MASK = buffer.slice(offset, 4 + offset);
      offset += 4;
      console.log("payload length: ", payloadLen);
    } else if (payloadLen === 0x7e) {
      // 126: the next two bytes are a 16-bit unsigned payload length, followed by the masking key
      console.log("payload length: ", buffer.readUInt16BE(offset));
      MASK = buffer.slice(offset + 2, offset + 2 + 4);
      offset += 6;
    } else {
      // 127: the next eight bytes are a 64-bit unsigned payload length, followed by the masking key
      MASK = buffer.slice(offset + 8, offset + 8 + 4);
      offset += 12;
    }
    // Read the payload and XOR each byte with the masking key to recover the original bytes
    const newBuffer = [];
    const dataBuffer = buffer.slice(offset);
    for (let i = 0; i < dataBuffer.length; i++) {
      newBuffer.push(dataBuffer[i] ^ MASK[i % 4]);
    }
    return Buffer.from(newBuffer).toString();
  }
  return "";
}
function constructReply(data) {
  const json = JSON.stringify(data);
  const jsonByteLength = Buffer.byteLength(json);
  // Currently, only payloads of up to 65535 bytes are supported
  const lengthByteCount = jsonByteLength < 126 ? 0 : 2;
  const payloadLength = lengthByteCount === 0 ? jsonByteLength : 126;
  const buffer = Buffer.alloc(2 + lengthByteCount + jsonByteLength);
  // First byte: FIN = 1 and opcode = 0x1, indicating a final text frame
  buffer.writeUInt8(0b10000001, 0);
  // Second byte: MASK = 0 and the 7-bit payload length
  buffer.writeUInt8(payloadLength, 1);
  // If payloadLength is 126, the next two bytes hold the actual length as a 16-bit unsigned integer
  let payloadOffset = 2;
  if (lengthByteCount > 0) {
    buffer.writeUInt16BE(jsonByteLength, 2);
    payloadOffset += lengthByteCount;
  }
  // Write the JSON data into the buffer
  buffer.write(json, payloadOffset);
  return buffer;
}
module.exports = {
  generateAcceptValue,
  parseMessage,
  constructReply,
};
In fact, besides WebSocket, the server can also use SSE (Server-Sent Events) to push information to the browser. SSE lets the server stream text messages to the client, such as live messages generated on the server.
To achieve this, SSE defines two components: the EventSource API in the browser and the new “event stream” data format (text/event-stream). EventSource lets the client receive notifications pushed by the server as DOM events, and the new data format is used to deliver each data update.
In effect, SSE provides an efficient, cross-browser implementation of XHR streaming, with messages delivered over a single long-lived HTTP connection. Unlike an XHR stream we implement ourselves, however, the browser manages the connection and parses the messages for us, so we can focus on the business logic. Because space is limited, SSE is not covered in more detail here. Interested readers can refer to the following articles:
- Instant Messaging on the Web: Short Polling, Comet, Websocket, SSE
- SSE Technical Details: A new HTML5 Server Push event Technology
- Implementation of Web Side Message Push using WebSocket and SSE Technology
- Evolution of Web Side Communication: From Ajax and JSONP to SSE and Websocket
- Web-Side IM Communication Technology Quick Start: Short Polling, Long Polling, SSE, WebSocket
- WebSocket, Socket.IO, SSE
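To make the event-stream format more concrete, here is a minimal server-side sketch. It is an illustration under stated assumptions rather than code from the original article: formatSseEvent and handleSse are hypothetical names, and the “tick” event name and one-second interval are arbitrary choices.

```javascript
// Hypothetical helper: formats one event in the text/event-stream format.
// An optional "event:" line is followed by one "data:" line per line of
// payload, and a blank line terminates the event.
function formatSseEvent(data, eventName) {
  const lines = [];
  if (eventName) {
    lines.push(`event: ${eventName}`);
  }
  for (const line of String(data).split("\n")) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}

// Hypothetical request handler for Node's http module: keeps the response
// open and pushes a timestamp every second until the client disconnects.
function handleSse(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  const timer = setInterval(() => {
    res.write(formatSseEvent(new Date().toISOString(), "tick"));
  }, 1000);
  req.on("close", () => clearInterval(timer));
}

console.log(JSON.stringify(formatSseEvent("hello", "tick")));
// "event: tick\ndata: hello\n\n"
```

The handler could be mounted with require("http").createServer(handleSse).listen(...) on a port of your choice. In the browser, an EventSource pointed at that URL would receive each event, and addEventListener("tick", ...) would handle the named events.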
6. Common Misconceptions When Learning WebSocket
6.1 What is the relationship between WebSocket and HTTP?
WebSocket is a different protocol from HTTP. Both sit at the application layer of the OSI model, and both rely on TCP at the transport layer.
Although they are different, RFC 6455 states that WebSocket is designed to work over HTTP ports 80 and 443 and to support HTTP proxies and intermediaries, which makes it compatible with HTTP. For this compatibility, the WebSocket handshake uses the HTTP Upgrade header to switch from the HTTP protocol to the WebSocket protocol.
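As a small illustration of this HTTP-based handshake, the sketch below computes the Sec-WebSocket-Accept value for the sample key given in RFC 6455 and assembles the 101 response. The helper names computeAccept and buildHandshakeResponse are hypothetical; the computation itself is the same one performed by generateAcceptValue in util.js, and the status line here uses the standard “Switching Protocols” reason phrase.

```javascript
const { createHash } = require("crypto");

// GUID fixed by RFC 6455; it is appended to the client's key before hashing
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// SHA-1 over (key + GUID), then Base64
function computeAccept(secWebSocketKey) {
  return createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// Hypothetical helper: builds the HTTP response that completes the upgrade
function buildHandshakeResponse(secWebSocketKey) {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAccept(secWebSocketKey)}`,
  ].join("\r\n") + "\r\n\r\n";
}

// Sample key from RFC 6455, section 1.3
console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Until this response is sent, the exchange is ordinary HTTP; afterwards the same TCP connection carries WebSocket frames.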
Since we have mentioned the OSI (Open Systems Interconnection) model, here is a diagram that illustrates it (see the figure below).
(image references from: www.networkingsphere.com/2019/07/wha.)
Of course, the brief description above does not fully explain the relationship between WebSocket and HTTP. Interested readers can read the following two articles:
- WebSocket in Detail (4): Getting to the Bottom of the HTTP and WebSocket Relationship (Part 1)
- WebSocket in Detail (5): Getting to the Bottom of the HTTP and WebSocket Relationship (Part 2)
6.2 What is the difference between WebSocket and long polling?
Long polling: the client sends a request, and the server does not respond immediately. Instead, it holds the request open and checks whether the requested data has been updated. If there is an update, it responds; if not, it waits for a certain period before returning.
Long polling is still based on HTTP and is still a request-response (question-and-answer) pattern. WebSocket, by contrast, becomes a full-duplex channel over TCP once the handshake succeeds, so the server can push data to the client on its own initiative.
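The request-response nature of long polling can be seen in a small client-side sketch. This is a simulation under stated assumptions, not a real network client: longPoll is a hypothetical function, and requestOnce stands in for one HTTP request that the server holds open until data arrives or a timeout elapses.

```javascript
// Hypothetical long-polling loop. `requestOnce` stands in for one HTTP request
// that the server holds open; it resolves to new data, or to null on timeout.
async function longPoll(requestOnce, onMessage, maxRounds) {
  for (let round = 0; round < maxRounds; round++) {
    const data = await requestOnce();
    if (data !== null) {
      onMessage(data);
    }
    // Whether or not data arrived, the client has to issue a new request
    // before the server can deliver anything else.
  }
}

// Simulated server replies: data, then a timeout, then data again
const replies = ["price: 1", null, "price: 2"];
longPoll(async () => replies.shift(), (data) => console.log(data), 3);
```

Every message costs a full request, including the timeouts that carry no data. With WebSocket the loop disappears: after the handshake, the client registers a message handler once and the server writes frames whenever it has something to send.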
To understand the difference between WebSocket and long polling, you need a solid understanding of how long polling works. The descriptions of long polling in the following articles are recommended:
- Comet Technology in Detail: Real-time Web Communication Technology Based on HTTP Long Connection
- Beginner’s Notes: The Most Complete Explanation of Instant Messaging Technology on the Web
- Instant Messaging on the Web: Short Polling, Comet, Websocket, SSE
- Web-Side IM Communication Technology Quick Start: Short Polling, Long Polling, SSE, WebSocket
6.3 What Is the WebSocket Heartbeat?
A socket is used to send and receive data over the network. If the socket has been disconnected, sending and receiving data will fail.
But how do you know whether a socket is still usable? This requires a heartbeat mechanism in the system.
A “heartbeat” means periodically sending a custom structure (a heartbeat packet or heartbeat frame) so that the other party knows the sender is still online, which confirms that the link is still valid.
A heartbeat packet is a simple message that the client sends to the server periodically to say “I’m still here”. In code, the client sends a fixed message to the server at a fixed interval (for example, every few minutes), and the server replies with a fixed message. If the server receives nothing from the client within that period, it treats the client as disconnected.
The WebSocket protocol defines two control frames for the heartbeat, Ping and Pong:
- 1) A Ping frame has the opcode 0x9: an endpoint that receives a Ping frame must send a Pong frame in response, unless it has already received a Close frame, and it should do so as soon as practical;
- 2) A Pong frame has the opcode 0xA: a Pong frame sent in response to a Ping must carry the same “application data” as the Ping frame it answers.
On point 2): if an endpoint receives a Ping frame and has not yet sent Pong frames in response to previous Ping frames, it may choose to send a Pong frame only for the most recently processed Ping frame. In addition, a Pong frame may be sent unsolicited, in which case it serves as a one-way heartbeat.
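The two rules above can be sketched in a few lines. The helpers buildControlFrame and buildPongFor are hypothetical and not part of the article's server; the sketch builds unmasked frames, as a server would, and the “hb-1” payload is an arbitrary example.

```javascript
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xa;

// Hypothetical helper: builds an unmasked control frame (server to client).
// Control frames always have FIN = 1 and carry at most 125 bytes of payload.
function buildControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError("control frame payload must not exceed 125 bytes");
  }
  const header = Buffer.from([0x80 | opcode, payload.length]);
  return Buffer.concat([header, payload]);
}

// A Pong must echo the application data of the Ping it answers
function buildPongFor(pingPayload) {
  return buildControlFrame(OPCODE_PONG, pingPayload);
}

const ping = buildControlFrame(OPCODE_PING, Buffer.from("hb-1"));
const pong = buildPongFor(ping.subarray(2)); // skip the 2-byte header
console.log(ping); // <Buffer 89 04 68 62 2d 31>
console.log(pong); // <Buffer 8a 04 68 62 2d 31>
```

Only the opcode differs between the two frames; the echoed payload lets the sender match a Pong to the Ping that triggered it.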
PS: For a summary of the WebSocket heartbeat in IM practice, see the article “Web-Side IM Practice: How to Make Your WebSocket Reconnect Faster After Disconnection?”.
6.4 What is a Socket?
Two programs on the network exchange data through a two-way communication connection, and one end of that connection is called a socket. Establishing a network communication connection therefore requires at least a pair of port numbers.
The essence of a socket: it encapsulates the TCP/IP protocol stack and provides a programming interface for TCP or UDP; it is not another protocol. Sockets are how applications use the TCP/IP protocols.
Baidu Encyclopedia describes the socket as follows:
As the name of the BSD UNIX inter-process communication mechanism, “socket” takes the meaning of an outlet. A socket is used to describe an IP address and a port. It is a handle to a communication link and can be used for communication between different virtual machines or computers.
Hosts on the Internet generally run several service programs and provide several services at the same time. Each service opens a socket and binds it to a port; different ports correspond to different services. As its English meaning implies, a socket is like a multi-hole power outlet. A host is like a room full of outlets, each with its own number: some supply 220-volt AC, some supply 110-volt AC, and some supply cable TV programs. Client software plugs into outlets with different numbers to obtain different services.
The following points summarize sockets:
- 1) Sockets provide the underlying communication; almost all application-layer communication goes through sockets;
- 2) They encapsulate the TCP/IP protocols for application-layer protocols to call, forming an intermediate abstraction layer between the two;
- 3) In the TCP/IP protocol family, the transport layer has two common protocols, TCP and UDP. Because the two protocols differ, the socket implementation process and parameters differ as well.
The following figure illustrates the client/server relationship of a socket API for a connection-oriented protocol:
PS: For the relationship between WebSocket and Socket, the article “WebSocket in Detail (6): Getting to the Bottom of the Relationship Between WebSocket and Socket” covers it in detail and is recommended reading.
7. Reference materials
[1] Quick Start for Beginners: WebSocket Tutorial
[2] WebSocket from Beginner to Master: Half an Hour Is Enough!
[3] Beginner’s Notes: The Most Complete Explanation of the Principles of Web Instant Messaging Technology
[4] Web Instant Messaging Technology Review: Short Polling, Comet, WebSocket, SSE
[5] SSE Technical Details: A New HTML5 Server Push Event Technology
[6] Comet Technology in Detail: Real-Time Web Communication Technology Based on HTTP Long Connection
[7] WebSocket in Detail (4): Getting to the Bottom of the HTTP and WebSocket Relationship (Part 1)
[8] WebSocket in Detail (5): Getting to the Bottom of the HTTP and WebSocket Relationship (Part 2)
[9] WebSocket in Detail (6): Getting to the Bottom of the Relationship Between WebSocket and Socket
[10] Web-Side IM Practice: How to Make Your WebSocket Reconnect Faster After Disconnection?
[11] Connecting Theory with Practice: Understanding the Communication Principle, Protocol Format, and Security of WebSocket from Scratch
[12] WebSocket Core Essentials: 200 Lines of Code to Hand-Write a WebSocket Server
[13] Web-Side IM Communication Technology Quick Start: Short Polling, Long Polling, SSE, WebSocket
[14] Understanding Modern Web Instant Messaging Technology: WebSocket, Socket.IO, SSE
This article has been simultaneously published at: www.52im.net/thread-3713…