I used WebSocket briefly in a project before, but at the time I did not really understand why it was chosen, what its advantages were, or how it worked. After studying it systematically, I am recording what I learned here.
Without further ado, let's get to the point. Many people who first encounter this protocol do not know what it does or why it is needed, so let's start with the basics.
A GitHub address for the code below may be added here later.
1. Basic knowledge of WebSocket
1.1 WebSocket Introduction
WebSocket is a network communication protocol that provides full-duplex communication over a single TCP connection. It sits at the application layer of the OSI model and, like HTTP, runs on top of TCP.
WebSocket makes it easier to exchange data between the client and the server, and allows the server to push data to the client. With the WebSocket API, the browser and server only need to complete one handshake to create a persistent connection and transfer data in both directions.
The WebSocket protocol specification defines two new Uniform Resource Identifier schemes, ws (WebSocket) and wss (WebSocket Secure), corresponding to plaintext and encrypted connections respectively.
1.2 Why is WebSocket needed
The biggest feature of WebSocket is that the server can actively push information to the client, and the client can also actively send information to the server.
Before this protocol, many websites used polling to implement push. With polling, the browser sends an HTTP request to the server at a fixed interval, and the server returns the latest data. This traditional pattern has obvious drawbacks: the browser must keep sending requests, and each HTTP request may carry long headers in which only a small portion of the data is actually useful, which wastes a lot of bandwidth.
To address this, the WebSocket protocol was introduced alongside HTML5 (the protocol itself is standardized as RFC 6455). It saves server resources and bandwidth and allows communication that is closer to real time.
1.3 WebSocket characteristics
- It is built on TCP, so the server side is relatively easy to implement.
- It is highly compatible with HTTP. The default ports are also 80 and 443, and the handshake phase uses HTTP, so the handshake is not easily blocked and can pass through various HTTP proxy servers.
- The data format is lightweight, with low performance overhead and high communication efficiency.
- It can send text or binary data.
- There are no same-origin restrictions, and clients can communicate with any server.
- The protocol identifier is ws (or wss if encrypted), and the server address is written as a URL.
1.4 WebSocket advantages
After the brief introduction above, we can sum up some advantages of WebSocket compared with HTTP:
- Less control overhead: once the connection is created, the frame headers used for protocol control are relatively small. Without extensions, the header is only 2 to 10 bytes (depending on payload length) for server-to-client data; for client-to-server data, an additional 4-byte masking key is required. This is far less than an HTTP request that carries full headers every time.
- Better real-time performance: because the protocol is full-duplex, the server can send data to the client at any time. Compared with HTTP, where the server can only respond after the client initiates a request, latency is significantly lower.
- Stateful connection: WebSocket must establish a connection first, which makes it a stateful protocol; subsequent messages can omit some state information. HTTP requests, by contrast, may need to carry state information (such as authentication) with every request.
- Better binary support: WebSocket defines binary frames, making it easier to handle binary content than with HTTP.
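The header sizes quoted in the first point can be checked with a small calculation. This is only a sketch under the assumption that no extensions are in use; `frameHeaderSize` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: size in bytes of a WebSocket frame header (no extensions).
function frameHeaderSize(payloadLength, masked) {
  let size = 2; // byte 1: FIN/RSV/opcode, byte 2: MASK bit + 7-bit payload length
  if (payloadLength > 65535) {
    size += 8; // 64-bit extended payload length
  } else if (payloadLength > 125) {
    size += 2; // 16-bit extended payload length
  }
  if (masked) {
    size += 4; // masking key, required for client-to-server frames
  }
  return size;
}

console.log(frameHeaderSize(5, false));      // 2
console.log(frameHeaderSize(1000, false));   // 4
console.log(frameHeaderSize(100000, false)); // 10
console.log(frameHeaderSize(5, true));       // 6
```

So a short server-to-client message costs 2 header bytes, and even the largest client-to-server frame costs 14.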
2. Advanced knowledge of WebSocket
To put it simply, the WebSocket protocol consists of two parts: the connection establishment process (handshake) and data transmission.
2.1 Establishing a Connection (Handshake)
In the first part of the introduction, we mentioned that WebSocket requires a handshake before creating a persistent connection, and that WebSocket uses HTTP handshake channels for compatibility. Specifically, the client negotiates the upgrade protocol with the WebSocket server through HTTP requests. After the protocol upgrade, the subsequent data exchange follows the WebSocket protocol.
Client: requests a protocol upgrade
First, the client initiates a protocol upgrade request. The request uses the standard HTTP message format and supports only the GET method.
The four key request headers have the following meaning:
- Connection: Upgrade: indicates that the protocol is to be upgraded.
- Upgrade: websocket: indicates that the upgrade target is the WebSocket protocol.
- Sec-WebSocket-Version: 13: indicates the WebSocket version. If the server does not support this version, it must return a Sec-WebSocket-Version header listing the versions it does support.
- Sec-WebSocket-Key: paired with the Sec-WebSocket-Accept header in the server response. It provides basic protection against malicious or accidental connections.
Note that a real request also carries headers that are not the focus here. Since it is a standard HTTP request, headers such as Host, Origin, and Cookie are sent as usual. During the handshake phase, these headers can be used for security restrictions and permission checks.
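On the server side, checking these headers might look like the sketch below. `isWebSocketUpgrade` is a hypothetical helper written for illustration, and it assumes the header names arrive lower-cased, as Node's http module provides them.

```javascript
// Hypothetical helper: does this request ask for a WebSocket (version 13) upgrade?
// `headers` is assumed to use lower-cased names, as in Node's req.headers.
function isWebSocketUpgrade(method, headers) {
  if (method !== 'GET') return false; // the handshake only supports GET

  // Connection may be a comma-separated list, e.g. "keep-alive, Upgrade"
  const connectionTokens = (headers['connection'] || '')
    .toLowerCase()
    .split(',')
    .map((token) => token.trim());
  const key = headers['sec-websocket-key'];

  return (
    connectionTokens.includes('upgrade') &&
    (headers['upgrade'] || '').toLowerCase() === 'websocket' &&
    headers['sec-websocket-version'] === '13' &&
    typeof key === 'string' &&
    key.length > 0
  );
}

const sampleHeaders = {
  host: 'localhost:8080',
  connection: 'Upgrade',
  upgrade: 'websocket',
  'sec-websocket-version': '13',
  'sec-websocket-key': 'dGhlIHNhbXBsZSBub25jZQ==',
};
console.log(isWebSocketUpgrade('GET', sampleHeaders));  // true
console.log(isWebSocketUpgrade('POST', sampleHeaders)); // false
```

A production server would also validate Origin and any authentication headers at this point, as mentioned above.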
Server: responds to the protocol upgrade
The server replies with status code 101, which indicates that the protocol is being switched. Once the upgrade is complete, all subsequent data exchange follows the new protocol.
Calculating Sec-WebSocket-Accept
Sec-WebSocket-Accept is calculated from the Sec-WebSocket-Key header of the client request.
The calculation is:
Concatenate Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, compute the SHA-1 digest of the result, and encode the digest as a Base64 string. The pseudocode is as follows:
toBase64( sha1( Sec-WebSocket-Key + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 ) )
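In Node.js the formula can be written with the built-in crypto module. This is a minimal sketch; `computeAccept` is a hypothetical function name, and the sample key and expected result are the ones given as an example in RFC 6455.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by the WebSocket protocol specification
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: derive Sec-WebSocket-Accept from Sec-WebSocket-Key
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same calculation and compares it with the header the server returned.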
2.2 Data Transfer
Once the WebSocket client and server establish a connection, subsequent operations are based on the transmission of data frames.
Because data frames are involved here, you can first look at the section 2.3 Data Frame Format.
1. Data fragmentation
Each WebSocket message may be split into multiple data frames. When the receiver gets a data frame, it uses the value of FIN to decide whether the last frame of the message has arrived.
FIN=1 means the current frame is the last frame of the message; the receiver now has the complete message and can process it. FIN=0 means the receiver must keep listening for the remaining frames.
In addition, the opcode indicates the type of data being exchanged: 0x01 is text and 0x02 is binary. 0x00 is special: it marks a continuation frame, meaning the frame carries a later fragment of a message whose first frame was sent earlier.
2. Example of data fragmentation
An example makes this clearer. The following example from MDN illustrates data fragmentation well. The client sends two messages to the server, and the server responds after receiving each one. The focus here is on the messages sent from the client to the server.
First message
- FIN=1, opcode=0x1: this is the last frame of the message and the client is sending text. The server can process the message as soon as it receives this frame.
Second message
- FIN=0, opcode=0x1: the message type is text, the message is not finished, and more frames follow.
- FIN=0, opcode=0x0: the message is still not finished and more frames follow. This frame is appended to the previous one.
- FIN=1, opcode=0x0: the message is finished and no more frames follow. This frame is appended to the previous one, and the server can assemble the related frames into the complete message.
Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
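The receiver's side of this exchange can be sketched as a small reassembler. `MessageAssembler` is a hypothetical class for illustration only: it assumes frames are already decoded into `{ fin, opcode, payload }` objects, treats payloads as strings for simplicity, and handles only data frames (opcodes 0x0, 0x1, 0x2).

```javascript
// Hypothetical reassembler for decoded data frames of the form { fin, opcode, payload }.
class MessageAssembler {
  constructor() {
    this.parts = null; // fragments of the message in progress, or null if none
    this.type = null;  // 'text' or 'binary', taken from the first frame
  }

  // Returns the complete message once a FIN=1 frame arrives, otherwise null.
  push(frame) {
    if (frame.opcode === 0x0) {
      if (this.parts === null) {
        throw new Error('continuation frame without a started message');
      }
    } else {
      if (this.parts !== null) {
        throw new Error('new message started before the previous one finished');
      }
      this.parts = [];
      this.type = frame.opcode === 0x1 ? 'text' : 'binary';
    }
    this.parts.push(frame.payload);
    if (!frame.fin) return null; // keep listening for more frames

    const message = { type: this.type, data: this.parts.join('') };
    this.parts = null;
    this.type = null;
    return message;
  }
}

const assembler = new MessageAssembler();
console.log(assembler.push({ fin: true, opcode: 0x1, payload: 'hello' }));
console.log(assembler.push({ fin: false, opcode: 0x1, payload: 'and a ' }));     // null
console.log(assembler.push({ fin: false, opcode: 0x0, payload: 'happy new ' })); // null
console.log(assembler.push({ fin: true, opcode: 0x0, payload: 'year!' }));
```

The first call returns a complete message immediately; the last call returns the three fragments joined together.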
2.3 Data frame format
Data exchange between client and server depends on the definition of the data frame format, so let's take a look at WebSocket's data frame format here.
The smallest unit of communication between a WebSocket client and server is the frame. A complete message consists of one or more frames.
- Sender: cuts the message into one or more frames and sends them to the receiver.
- Receiver: receives the frames and reassembles the related frames into the complete message.
The focus of this section is to explain the format of data frames.
1. Overview of data frame format
A unified format for WebSocket data frames is given below.
Read from left to right, in bits. For example, FIN and RSV1 each occupy one bit, and opcode occupies four bits. The fields cover flags, the operation code, the mask, the data length, and the data.
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+
2. Detailed explanation of data frame format
The fields in the format overview are explained one by one below; refer to the protocol specification for full details.
FIN: 1 bit.
A value of 1 means this is the last fragment of the message. A value of 0 means it is not the last fragment.
RSV1, RSV2, RSV3: 1 bit each.
Normally all three are 0. When the client and server negotiate a WebSocket extension, these bits may be non-zero, with the meaning defined by the extension. If a non-zero value appears and no negotiated extension defines it, the receiver must fail the connection.
Opcode: 4 bits.
The opcode determines how the payload that follows should be interpreted. If the opcode is unknown, the receiver should fail the connection. The defined opcodes are:
- %x0: a continuation frame. The message is fragmented, and the received frame is one of its fragments.
- %x1: a text frame.
- %x2: a binary frame.
- %x3-7: reserved for non-control frames defined later.
- %x8: a connection close.
- %x9: a ping.
- %xA: a pong.
- %xB-F: reserved for control frames defined later.
Mask: 1 bit.
Indicates whether the payload is masked. Data sent from the client to the server must be masked; data sent from the server to the client must not be.
If the server receives data that has not been masked, it must close the connection.
If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the payload. Mask is 1 for every data frame sent from the client to the server.
Payload length: the length of the payload, in bytes. It occupies 7 bits, 7+16 bits, or 7+64 bits.
Suppose Payload len == x:
- If x is 0 to 125, the length of the data is x bytes.
- If x is 126, the next 2 bytes are a 16-bit unsigned integer whose value is the length of the data.
- If x is 127, the next 8 bytes are a 64-bit unsigned integer (most significant bit must be 0) whose value is the length of the data.
When the payload length occupies more than one byte, it is encoded in big-endian (network) byte order.
Masking-key: 0 or 4 bytes (32 bits).
Every data frame sent from the client to the server is masked, with Mask set to 1 and a 4-byte masking key. If Mask is 0, there is no Masking-key field.
Note: the payload length does not include the length of the masking key.
Payload data: (x+y) bytes.
Payload data consists of extension data and application data, where the extension data is x bytes and the application data is y bytes.
- Extension data: 0 bytes if no extension was negotiated. Every extension must declare the length of its extension data, or how that length can be calculated. How the extension is used must be negotiated during the handshake. If extension data is present, the payload length includes it.
- Application data: arbitrary application data that follows the extension data (if any) and occupies the rest of the frame. Its length is the payload length minus the length of the extension data.
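Putting the field descriptions together, reading a frame header in Node.js might look like the sketch below. `parseFrameHeader` is a hypothetical function, not a library API; it assumes the Buffer already holds the whole header and that a 64-bit length fits in a JavaScript number. The sample bytes are the "Hello" text frames used as examples in RFC 6455.

```javascript
// Hypothetical parser: reads the header fields of one frame from a Buffer.
function parseFrameHeader(buf) {
  const fin = (buf[0] & 0x80) !== 0;    // first bit of byte 0
  const opcode = buf[0] & 0x0f;         // low 4 bits of byte 0
  const masked = (buf[1] & 0x80) !== 0; // first bit of byte 1
  let payloadLength = buf[1] & 0x7f;    // low 7 bits of byte 1
  let offset = 2;

  if (payloadLength === 126) {
    payloadLength = buf.readUInt16BE(offset); // 16-bit big-endian length
    offset += 2;
  } else if (payloadLength === 127) {
    // 64-bit big-endian length; Number() assumes it is below 2^53
    payloadLength = Number(buf.readBigUInt64BE(offset));
    offset += 8;
  }

  let maskingKey = null;
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }
  return { fin, opcode, masked, payloadLength, maskingKey, payloadOffset: offset };
}

// Unmasked text frame carrying "Hello"
const unmaskedHello = Buffer.from([0x81, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f]);
console.log(parseFrameHeader(unmaskedHello));

// Masked text frame carrying "Hello" (client-to-server)
const maskedHello = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
console.log(parseFrameHeader(maskedHello).payloadOffset); // 6
```

`payloadOffset` is where the payload starts, so the payload itself is `buf.subarray(payloadOffset, payloadOffset + payloadLength)`.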
3. Masking algorithm
The masking key is a random 32-bit number chosen by the client. Masking does not change the length of the payload. The same algorithm is used for both masking and unmasking:
First, assume:
- original-octet-i: the i-th byte of the original data.
- transformed-octet-i: the i-th byte of the transformed data.
- j: the result of i mod 4.
- masking-key-octet-j: the j-th byte of the masking key.
The algorithm XORs original-octet-i with masking-key-octet-j to obtain transformed-octet-i.
j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
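The XOR step translates almost directly into code. The following is a minimal sketch with a hypothetical `applyMask` helper; the key and bytes are taken from the masked "Hello" example in RFC 6455.

```javascript
// Hypothetical helper: XOR each byte with masking-key byte (i mod 4).
// Because XOR is its own inverse, the same function masks and unmasks.
function applyMask(data, maskingKey) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] ^ maskingKey[i % 4];
  }
  return out;
}

const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = applyMask(Buffer.from('Hello'), key);
console.log(masked.toString('hex'));       // 7f9f4d5158
console.log(applyMask(masked, key).toString()); // Hello
```

Note that the output has the same length as the input, as stated above.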
2.4 Keeping the connection alive (heartbeat)
To maintain real-time bidirectional communication, WebSocket must keep the TCP channel between the client and server open. However, keeping a connection open for a long time with no data exchanged may waste connection resources.
Still, in some scenarios the client and server need to stay connected even though no data has been exchanged for a long time. A heartbeat can be used for this.
- Sender -> Receiver: ping
- Receiver -> Sender: pong
The ping and pong operations correspond to two WebSocket control frames, whose opcodes are 0x9 and 0xA respectively.
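At the byte level, a ping and its pong are very small frames. The sketch below builds them by hand; `buildControlFrame` and `pongFor` are hypothetical helpers, and the frames are unmasked, which corresponds to the server-to-client direction. In practice a WebSocket library normally sends and answers heartbeats for you.

```javascript
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xa;

// Hypothetical helper: build an unmasked control frame.
function buildControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError('control frame payload must be 125 bytes or less');
  }
  // 0x80 sets FIN=1: control frames are never fragmented
  const header = Buffer.from([0x80 | opcode, payload.length]);
  return Buffer.concat([header, payload]);
}

// Hypothetical helper: a pong carries the same payload as the ping it answers.
// Assumes `pingFrame` is an unmasked ping built as above.
function pongFor(pingFrame) {
  const length = pingFrame[1] & 0x7f;
  return buildControlFrame(OPCODE_PONG, pingFrame.subarray(2, 2 + length));
}

const ping = buildControlFrame(OPCODE_PING, Buffer.from('hi'));
console.log(ping.toString('hex'));          // 89026869
console.log(pongFor(ping).toString('hex')); // 8a026869
```

A heartbeat then amounts to sending such a ping on a timer and treating a missing pong as a dead connection.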
To close this part, two further points are worth noting (without going into detail).
1. The purpose of Sec-WebSocket-Key/Sec-WebSocket-Accept is to provide basic protection and reduce malicious and accidental connections. In summary:
- It prevents the server from accepting invalid WebSocket connections (for example, if an HTTP client accidentally requests a connection to the WebSocket service, the server can reject it directly).
- It ensures that the server understands WebSocket connections. Because the handshake uses HTTP, the request may be handled and answered by a plain HTTP server; the client can use Sec-WebSocket-Key to check that the server really speaks the WebSocket protocol. (This is not 100% reliable: some HTTP servers may process Sec-WebSocket-Key without actually implementing WebSocket.)
- Browsers forbid setting Sec-WebSocket-Key and related headers on Ajax requests. This prevents a client from accidentally requesting a WebSocket upgrade when sending an Ajax request.
- It prevents a reverse proxy that does not understand WebSocket from returning wrong data. For example, if the reverse proxy receives two WebSocket upgrade requests, it might cache the response to the first and return that cached response for the second, which would be meaningless.
- The main purpose of Sec-WebSocket-Key is not to ensure data security, because the formula that converts Sec-WebSocket-Key to Sec-WebSocket-Accept is public and very simple. Its main function is to prevent common accidents (unintentional connections).
2. The purpose of data masking:
In the WebSocket protocol, data masking strengthens protocol security. Its purpose is not to protect the data itself, since the algorithm is public and simple, but to prevent problems such as proxy cache poisoning attacks that existed in earlier versions of the protocol. Apart from encrypting the channel itself, there do not seem to be many effective ways to secure the communication.
3. WebSocket example
3.1 Client code examples
<input type="text" id="sendTxt">
<button id="sendBtn">Send</button>
<div id="recv"></div>
<script>
  /**
   * The WebSocket object is used as a constructor to create a new WebSocket instance.
   * After the following statement executes, the client connects to the server.
   */
  let webSocket = new WebSocket("wss://echo.websocket.org");

  /**
   * WebSocket instance attributes and methods
   * 1. Attributes
   * 1.1 webSocket.readyState (returns the current state of the instance)
   *     CONNECTING: 0, the connection is being established.
   *     OPEN: 1, the connection succeeded and communication can start.
   *     CLOSING: 2, the connection is closing.
   *     CLOSED: 3, the connection is closed or failed to open.
   */

  /** 1.2 webSocket.onopen (callback invoked when the connection succeeds) */
  webSocket.onopen = function () {
    console.log("webSocket open");
    document.getElementById('recv').innerHTML = "Connected";
  };

  /** 1.3 webSocket.onclose (callback invoked after the connection is closed) */
  webSocket.onclose = function () {
    console.log("webSocket close");
  };

  /** 1.4 webSocket.onmessage (callback invoked when data arrives from the server) */
  webSocket.onmessage = function (e) {
    console.log(e.data);
    document.getElementById('recv').innerHTML = e.data;
  };

  // Send a message when the button is clicked
  document.getElementById('sendBtn').onclick = function () {
    var text = document.getElementById('sendTxt').value;
    /** 2. Methods: 2.1 webSocket.send() (sends data to the server) */
    webSocket.send(text);
  };
</script>
The code above briefly introduces and uses the client API. For more detailed documentation, see MDN.
Using WebSocket on the client is somewhat easier than implementing the server, so next we move on to the server side.
3.2 Implementation of the server
Because the author currently works only in JS, the server is implemented with Node. There are three commonly used Node implementations:
- µWebSockets
- Socket.IO
- WebSocket-Node
Since Socket.IO is used in the author's project, the explanation below is based on that experience. The other implementations should be similar, and you can try them yourself if you are interested.
WebSocket is of course an essential technology for Socket.IO's two-way communication, but Socket.IO is more than a wrapper around WebSocket: in environments that do not support WebSocket, it falls back to various polling solutions so that it still works properly.
The official Socket.IO documentation is in English, and the author also found a Chinese translation. It is suggested to compare the two. If you can, it is of course better to read the full English documentation, because its content is more accurate.
Socket.IO
Socket.IO consists of two parts:
- A server that is integrated into the Node.js HTTP server: the socket.io module
- A client library that is loaded in the browser: socket.io-client
Socket.IO supports a variety of transport mechanisms, such as WebSocket, Adobe Flash Sockets, XHR polling, and JSONP polling, all hidden behind a unified interface. This means that practically any browser can act as a client.
Note that a standard WebSocket server cannot communicate directly with a Socket.IO client.
1.1 Introduction
Socket.IO is a real-time communication library built around WebSocket that includes client-side JS and server-side Node.js code. Its goal is to support real-time applications across different browsers and mobile devices. Depending on the browser, it automatically chooses the best option among WebSocket, AJAX long polling, iframe streaming, and other transports, which is very convenient, and it supports browsers as old as IE5.5.
1.2 Usage
Node.js has a wide variety of frameworks, such as Express, ThinkJS, Koa, and Egg.js. Each may wrap Socket.IO further; for example, the Egg.js framework used by the author provides the egg-socket.io plugin. When using a framework, follow that framework's documentation. For this reason, only usage without any framework is covered here.
Installation
$ npm install socket.io
Using the Node HTTP server
Let's go straight to the (basic) code; refer to the official documentation for the rest.
// index.html
<script src="./node_modules/socket.io-client/dist/socket.io.js"></script>
<script>
  let socket = io('http://localhost');
  socket.on('news', (data) => { // listen for the 'news' event
    console.log(data);
    socket.emit('my other event', { // trigger 'my other event'
      my: 'data'
    })
  })
</script>
// app.js
let app = require('http').createServer(handler); // Create an HTTP server with Node
let io = require('socket.io')(app); // Bind Socket.IO to the server created above
let fs = require('fs');
app.listen(80);
// Declared with `function` so that it is hoisted and already defined
// when createServer(handler) runs above
function handler(req, res) {
  fs.readFile(__dirname + '/index.html', (err, data) => {
    if (err) {
      res.writeHead(500);
      return res.end('Error loading index.html');
    }
    res.writeHead(200);
    res.end(data);
  })
}
io.on('connection', (socket) => {
  socket.emit('news', { // trigger the 'news' event
    hello: 'world'
  });
  socket.on('my other event', (data) => { // listen for 'my other event'
    console.log(data);
  })
})
After the code above is complete, run node app.js, open index.html, and then open the browser console. { hello: 'world' } is printed in the browser console, after which { my: 'data' } is printed in the terminal running Node. Note the order of the two. The code is straightforward, so I will not explain it further.
1.3 emit and on
emit and on are the two most important APIs, used to send events and to listen for events respectively.
- socket.emit(eventName[, ...args]): emits an event
- socket.on(eventName, callback): listens for an emitted event
We are free to define an event and emit it on the server and then listen for it on the client, and vice versa. The content can be in many formats: basic data types such as Number, String, Boolean, Object, and Array, and even functions. Passing callback functions allows for more convenient interaction.
The sample code in Section 1.2 already uses these two APIs, so I will not go into more detail here.
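The emit/on pattern, including passing a callback as the last argument, can be tried locally without any network. The sketch below uses Node's built-in EventEmitter as a stand-in for a socket; this is an assumption made for illustration, not Socket.IO itself, although Socket.IO supports a similar callback argument (acknowledgements).

```javascript
const { EventEmitter } = require('events');

// Stand-in for a socket: a plain EventEmitter, no network involved.
const fakeSocket = new EventEmitter();
const received = [];

// "Listener side": handle the 'news' event and answer through the callback
fakeSocket.on('news', (data, ack) => {
  received.push(data);
  ack('got it');
});

// "Sender side": emit the event with a payload and a callback as last argument
fakeSocket.emit('news', { hello: 'world' }, (reply) => {
  received.push(reply);
});

console.log(received); // [ { hello: 'world' }, 'got it' ]
```

With a real Socket.IO connection the two sides run in different processes, but the shape of the calls is the same.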
1.4 Broadcast
By default, a broadcast goes to every socket connection except the sender itself.
Note: the socket connections must be in the same namespace.
Code example:
io.on('connection', (socket) => {
  // Send to all other clients except the sender
  socket.broadcast.emit('news', {
    hello: 'world'
  })
})
To see the effect, create an HTML page (the client code above will do) and open it in two browser tabs at the same time. Refreshing a tab triggers a new connection: the console of the refreshed tab prints nothing, while the console of the other tab prints the message (you can open more tabs to check).
If you also want the sender itself to receive the message, you can use:
io.on('connection', (socket) => {
  // Send to the sender itself
  socket.emit('news', {
    hello: 'world'
  })
})
1.5 Namespace (Namespace)
A namespace scopes messages: a message sent within one namespace can only be received by sockets in that namespace.
Function: minimizes the number of resources (TCP connections) and provides channel partitioning for applications, so that multiple application modules can share a single TCP connection.
If you want to isolate scopes or partition business modules, you can use namespaces. A namespace is like a new channel: it lets you isolate connections, events, and middleware within one Socket.IO service.
The default namespace is /. Socket.IO clients connect to this namespace by default, and servers listen on it by default.
Custom namespaces
Important: namespaces are an implementation detail of the Socket.IO protocol and are independent of the actual URL used by the underlying transport, which defaults to /socket.io/… .
The first way to use a namespace is to append a sub-path directly to the connection URL. This still uses the same socket server process, so it is a soft isolation.
Server code:
io
  .of('my-nsp')
  .on('connection', (socket) => {
    // Send to all other clients except the sender
    socket.broadcast.emit('news', {
      hello: 'world'
    })
    // Send to the sender itself
    socket.emit('news', {
      hello: 'world'
    })
  })
The client code needs to be changed to:
let socket = io('http://localhost:3000/my-nsp');
The second way to use a namespace is with the path parameter, which effectively starts a separate socket service.
1.6 Room
Namespace, room, and socket
A socket belongs to a room; if none is specified, there is a default room. A room belongs to a namespace; if none is specified, the default is /. (A namespace can have multiple rooms.)
After the client connects, the server adds the socket to the specified namespace. If no room is specified, the server adds the socket to the default room. The server can also put it into a room named bar with socket.join('bar').
By default, each socket is in a room named after its socket.id (when a namespace is specified, the id is prefixed with the namespace), and the socket joins this room automatically. After you join custom rooms, this default room still exists. A room is an object that contains the sockets currently in the room and a length.
Code example:
io
  .on('connection', (socket) => {
    // Add the socket to a room on the server
    socket.join('manannan', () => {
      console.log(socket.rooms);
    });
    // Emit to the room; the event is delivered only to sockets in that room
    io.to('manannan').emit('news', {
      hello: 'world'
    })
    // Leave the room
    socket.leave('manannan')
  })
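The relationship between namespace, room, and socket described in this section can be pictured with a toy in-memory model. To be clear, `ToyNamespace` is a hypothetical class written only for illustration; it is not how Socket.IO is implemented and none of its methods are Socket.IO APIs.

```javascript
// Toy model, NOT Socket.IO's implementation: one namespace holds rooms,
// and each room holds the ids of the sockets that joined it.
class ToyNamespace {
  constructor(name) {
    this.name = name;
    this.rooms = new Map(); // room name -> Set of socket ids
  }

  // On connect, a socket automatically joins a room named after its own id
  connect(id) {
    this.join(id, id);
  }

  join(id, room) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(id);
  }

  leave(id, room) {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(id);
    if (members.size === 0) this.rooms.delete(room);
  }

  // Ids that would receive an event sent to `room`;
  // pass `exceptId` to model a broadcast that skips the sender
  recipients(room, exceptId) {
    return [...(this.rooms.get(room) || [])].filter((id) => id !== exceptId);
  }
}

const nsp = new ToyNamespace('/');
nsp.connect('a');
nsp.connect('b');
nsp.join('a', 'manannan');
nsp.join('b', 'manannan');
console.log(nsp.recipients('manannan'));      // [ 'a', 'b' ]
console.log(nsp.recipients('manannan', 'a')); // [ 'b' ]
nsp.leave('a', 'manannan');
console.log(nsp.recipients('manannan'));      // [ 'b' ]
console.log(nsp.recipients('a'));             // [ 'a' ]
```

The last line shows that leaving a custom room does not remove the socket from the default room named after its id.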
The content above covers basic usage. Real projects will certainly be more complicated, so I will not go further here. When using something unfamiliar, we should make a habit of checking the official documentation and searching Baidu.
For more information on the client and server APIs, check the official documentation when you need it.