Preface

Suppose we need to build a multiplayer chat room. The idea is that a client sends a message to the server, and the server returns all the chat messages. It seems simple, but there is a problem: if I don't send a message, how can the server deliver other people's messages to me? A clumsy workaround is to keep asking the server whether anyone else has sent something (polling). This is necessary because HTTP is a request-response protocol: the client initiates every exchange, and the server cannot push data to the client on its own. Is there a simpler solution? To meet the need for real-time data transmission and two-way communication, our protagonist, WebSocket, takes the stage.

Introduction to WebSocket

1. What is WebSocket

WebSocket is a network technology, introduced alongside HTML5, that provides full-duplex communication between the browser and the server. Its biggest feature is that the server can actively push information to the client, and the client can actively send information to the server. It is a true two-way dialogue between equals, and it belongs to the family of server push technologies.

2. Advantages

Compared with HTTP, WebSocket has the following advantages:

  1. It supports two-way communication.

  2. It is simple to use: calling the API in the browser is enough to complete the protocol switch.

  3. It is extensible: custom subprotocols can be implemented on top of the protocol.

  4. The header overhead is low, so data can be transmitted faster over the network.

What we need to learn about WebSocket

For a network protocol, the first thing we need to know is how it works.

1. How do I establish a connection

Client: requests a protocol upgrade

First, the client initiates an HTTP request to upgrade the protocol.

WebSocket request headers

GET ws://localhost:8080/ws HTTP/1.1
Host: localhost:8080
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
Upgrade: websocket // Indicates an upgrade to the WebSocket protocol
Origin: http://localhost:3000
Sec-WebSocket-Version: 13 // Indicates the WebSocket version
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Sec-WebSocket-Key: C7dHJDei+oA4n+deqF1sVQ== // Generated by the browser; paired with Sec-WebSocket-Accept in the response, providing basic protection
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits // This header field is only used in the WebSocket opening handshake

HTTP request headers

POST /api/proxy?info=studylist-all HTTP/1.1
Host: oms.test.igetget.dc
Connection: keep-alive
Content-Length: 132
Accept: application/json, text/plain, */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
Content-Type: application/json; charset=UTF-8
Origin: http://oms.test.igetget.dc
Referer: http://oms.test.igetget.dc/oms-host
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: // The cookie is too long and has been omitted

Server: responds to protocol upgrade

WebSocket response headers

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: AJ50yH17ZJy90K+0rmqlyIfdsBM= // Calculated from the Sec-WebSocket-Key header of the client request
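The server derives Sec-WebSocket-Accept by appending a fixed GUID defined in RFC 6455 to the client's Sec-WebSocket-Key, hashing the result with SHA-1, and Base64-encoding the digest. The following is a minimal Node.js sketch of that calculation. The function name computeAccept is made up for illustration, and the sample key is the one used in RFC 6455, not the key from the capture above.

```javascript
const crypto = require("crypto")

// Fixed GUID defined by RFC 6455 for the opening handshake
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// computeAccept is a hypothetical helper name, not part of any library
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64")
}

// Sample key from RFC 6455
console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="))
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same calculation on the key it sent and compare the result with the header it receives, which is how it confirms that the server really understood the WebSocket handshake.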

HTTP response headers

HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Wed, 04 Aug 2021 07:09:56 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-download-options: noopen
x-readtime: 19

2. Data frame format

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+

FIN: 1 bit. Indicates whether this is the final fragment of a message.

RSV1, RSV2, RSV3: 1 bit each. They must be 0 unless an extension that defines the meaning of nonzero values has been negotiated. If an endpoint receives a nonzero value and no negotiated extension defines it, the endpoint must fail the connection.

Opcode: 4 bits.

The opcode determines how the payload data that follows should be interpreted. If the opcode is unknown, the receiver must fail the connection. The defined opcodes are as follows:

  • 0x0: indicates a continuation frame.

  • 0x1: indicates a text frame.

  • 0x2: indicates a binary frame.

  • 0x3-7: reserved for further non-control frames.

  • 0x8: indicates a connection close.

  • 0x9: indicates a ping.

  • 0xA: indicates a pong.

  • 0xB-F: reserved for further control frames.

Both 0x3-7 and 0xB-F are reserved, but the former range is for non-control frames and the latter is for control frames.
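The opcode list above can be written down as a small lookup, sketched here under the assumption that the opcode has already been extracted as a number between 0x0 and 0xF. The helper name describeOpcode and the table OPCODE_NAMES are invented for this illustration. The control check relies on the fact that all control opcodes (0x8-0xF) have the highest of the four opcode bits set.

```javascript
// Names follow the opcode list above; OPCODE_NAMES and describeOpcode
// are hypothetical names used only for this illustration
const OPCODE_NAMES = {
  0x0: "continuation",
  0x1: "text",
  0x2: "binary",
  0x8: "close",
  0x9: "ping",
  0xa: "pong",
}

function describeOpcode(opcode) {
  if (!Number.isInteger(opcode) || opcode < 0x0 || opcode > 0xf) {
    throw new RangeError("opcode must be a 4-bit value")
  }
  return {
    // Opcodes without a defined meaning are reserved
    name: OPCODE_NAMES[opcode] || "reserved",
    // Control opcodes (0x8-0xF) have the highest of the four bits set
    control: (opcode & 0x8) !== 0,
  }
}

console.log(describeOpcode(0x1)) // { name: 'text', control: false }
console.log(describeOpcode(0x9)) // { name: 'ping', control: true }
console.log(describeOpcode(0xb)) // { name: 'reserved', control: true }
```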

Mask: 1 bit.

Defines whether the payload data is masked. If set to 1, a masking key is present in Masking-key and is used to unmask the payload data. All frames sent from the client to the server have this bit set to 1.

Payload length: 7 bits, 7+16 bits, or 7+64 bits

The length of the payload data, in bytes. If the value is 0-125, that is the payload length. If it is 126, the following 2 bytes, interpreted as a 16-bit unsigned integer, are the payload length. If it is 127, the following 8 bytes, interpreted as a 64-bit unsigned integer (the most significant bit must be 0), are the payload length. Multibyte length values are expressed in network byte order. Note that in all cases the minimum number of bytes must be used to encode the length; for example, the length of a 124-byte string cannot be encoded as the sequence 126, 0, 124. The payload length is the length of the extension data plus the length of the application data. The length of the extension data can be zero, in which case the payload length is the length of the application data.
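The three length encodings can be sketched in Node.js as follows. The helper name encodePayloadLength is an assumption made for this example; it returns only the bytes that follow the first (FIN/opcode) byte of the frame, that is, the mask bit, the 7-bit length field, and any extended length.

```javascript
// encodePayloadLength is a hypothetical helper. It returns the second byte of
// the frame (mask bit + 7-bit length) followed by any extended length bytes.
function encodePayloadLength(length, masked = false) {
  if (!Number.isSafeInteger(length) || length < 0) {
    throw new RangeError("length must be a non-negative safe integer")
  }
  const maskBit = masked ? 0x80 : 0x00
  if (length <= 125) {
    // Short form: the length fits in the 7-bit field itself
    return Buffer.from([maskBit | length])
  }
  if (length <= 0xffff) {
    // 126 marks a 16-bit extended length in network byte order
    const buf = Buffer.alloc(3)
    buf[0] = maskBit | 126
    buf.writeUInt16BE(length, 1)
    return buf
  }
  // 127 marks a 64-bit extended length; the most significant bit stays 0
  const buf = Buffer.alloc(9)
  buf[0] = maskBit | 127
  buf.writeBigUInt64BE(BigInt(length), 1)
  return buf
}

console.log(encodePayloadLength(124).toString("hex")) // 7c
console.log(encodePayloadLength(126).toString("hex")) // 7e007e
console.log(encodePayloadLength(65536).toString("hex")) // 7f0000000000010000
```

Choosing the branch by the size of the value is what guarantees the minimal encoding that the text requires: a 124-byte payload can only ever come out as the single byte 0x7c.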

Masking-key: 0 or 4 bytes

Every frame sent from the client to the server is masked: the Mask bit is 1 and a 4-byte masking key is present. If the Mask bit is 0, there is no masking key.

Payload data: (x+y) bytes

The payload data is defined as the extension data followed by the application data.

Extension data: x bytes. The extension data is 0 bytes long unless an extension has been negotiated. Any extension must specify the length of its extension data, or how that length can be calculated, and how its use is negotiated during the opening handshake. If present, the extension data is included in the total payload length.

Application data: y bytes. The length of the application data equals the payload length minus the length of the extension data.
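Putting the fields together, a frame header can be decoded roughly as shown below. This is a sketch under stated assumptions: parseFrameHeader is a name chosen for this example, the input is a Node.js Buffer that already holds the complete header, and no extension is in use. The sample bytes are the masked single-frame "Hello" text message given as an example in RFC 6455.

```javascript
// parseFrameHeader is a hypothetical helper. It assumes `buf` is a Buffer
// that already contains at least the complete frame header.
function parseFrameHeader(buf) {
  const fin = (buf[0] & 0x80) !== 0 // first bit of byte 0
  const rsv = (buf[0] & 0x70) >> 4 // RSV1-3 as a 3-bit number
  const opcode = buf[0] & 0x0f // low 4 bits of byte 0
  const masked = (buf[1] & 0x80) !== 0 // first bit of byte 1
  let payloadLength = buf[1] & 0x7f // low 7 bits of byte 1
  let offset = 2

  if (payloadLength === 126) {
    payloadLength = buf.readUInt16BE(offset)
    offset += 2
  } else if (payloadLength === 127) {
    // Converted to Number for simplicity; values above 2^53 - 1 would lose precision
    payloadLength = Number(buf.readBigUInt64BE(offset))
    offset += 8
  }

  let maskingKey = null
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4)
    offset += 4
  }

  // payloadOffset is the index of the first payload byte
  return { fin, rsv, opcode, masked, payloadLength, maskingKey, payloadOffset: offset }
}

// Masked single-frame text message "Hello" (example bytes from RFC 6455)
const frame = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58])
const header = parseFrameHeader(frame)
console.log(header.fin, header.opcode, header.masked, header.payloadLength, header.payloadOffset)
// true 1 true 5 6
```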

3. Mask algorithm

The masking key is a random 32-bit value chosen by the client. Masking does not change the length of the payload data. The same algorithm is used for both masking and unmasking.

Assumptions:

  • original-octet-i: the i-th byte of the original data.

  • transformed-octet-i: the i-th byte of the transformed data.

  • j: the result of i mod 4.

  • masking-key-octet-j: the j-th byte of the masking key.

The algorithm is described as follows:

XORing original-octet-i with masking-key-octet-j yields transformed-octet-i.

That is:

// j = i MOD 4, where MOD is the modulo operator
// transformed-octet-i = original-octet-i XOR masking-key-octet-j

// "hello websocket\n" as bytes
let uint8 = new Uint8Array([
  0x68, 0x65, 0x6c, 0x6f, 0x20, 0x77, 0x65, 0x62,
  0x73, 0x6f, 0x63, 0x6b, 0x65, 0x74, 0x0a,
])
let maskingKey = new Uint8Array([0x08, 0xf6, 0xef, 0xb1])
let maskedUint8 = new Uint8Array(uint8.length)

// XOR every byte with the masking key byte at index i mod 4
for (let i = 0, j = 0; i < uint8.length; i++, j = i % 4) {
  maskedUint8[i] = uint8[i] ^ maskingKey[j]
}

console.log(
  Array.from(maskedUint8)
    .map((num) => Number(num).toString(16))
    .join(" ")
)
// 60 93 83 de 28 81 8a d3 7b 99 8c da 6d 82 e5

❓ Think about it: data masking improves the security of the protocol, yet it is not meant to protect the data itself, because the algorithm is public and the operation is simple. So why introduce masking at all? 🤔

Masking was introduced to prevent attacks such as proxy cache poisoning, which were possible in earlier versions of the protocol.

4. Data transfer

In WebSocket, a message can be transmitted as a series of fragments.

4.1 Data fragmentation

You may have seen the TCP segment format; WebSocket fragmentation is similar in spirit. A complete piece of data is split into several pieces that are sent separately, and this is called fragmentation. The receiver uses the FIN bit to tell whether a frame is the last fragment of a message: FIN=0 means the message is not complete yet and more fragments will follow, while FIN=1 means this is the last fragment. During data transmission, opcode 0x1 means the data is text (UTF-8), 0x2 means binary data, and 0x0 marks a continuation frame.

4.2 Data fragmentation example

Source: MDN

Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.

Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!

In this example, the client sends two messages to the server: the first in a single frame and the second across three frames.

The first message is complete in one frame, so the server can respond to it or act on it as soon as it sees FIN=1 and opcode=0x1.

For the second message, the server sees FIN=0 and opcode=0x1, so it knows that a text message has started but is not finished. It keeps collecting continuation frames until it receives the one with FIN=1, concatenates the payloads, and only then responds to or acts on the message.
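The reassembly logic described above can be sketched as a small state machine. Everything named here is an assumption for illustration: createAssembler is a made-up helper, and each frame is taken to be an already-parsed and unmasked object of the form { fin, opcode, payload }. The sketch handles data frames only; a real receiver must also deal with control frames arriving between fragments.

```javascript
// createAssembler is a hypothetical helper. Each frame is assumed to be an
// already-parsed, unmasked object: { fin: boolean, opcode: number, payload: Buffer }.
function createAssembler(onMessage) {
  let parts = [] // payloads of the message currently being assembled
  let firstOpcode = null // opcode of the first fragment (0x1 text, 0x2 binary)

  return function handleFrame({ fin, opcode, payload }) {
    if (opcode !== 0x0 && opcode !== 0x1 && opcode !== 0x2) {
      throw new Error("this sketch only handles data frames")
    }
    if (opcode === 0x0) {
      if (firstOpcode === null) {
        throw new Error("continuation frame without a started message")
      }
    } else {
      if (firstOpcode !== null) {
        throw new Error("new message started before the previous one finished")
      }
      firstOpcode = opcode
    }

    parts.push(payload)

    if (fin) {
      // Last fragment: join the payloads and deliver the complete message
      const data = Buffer.concat(parts)
      const message = firstOpcode === 0x1 ? data.toString("utf8") : data
      parts = []
      firstOpcode = null
      onMessage(message)
    }
  }
}

// Replay the frames from the MDN example above
const received = []
const handleFrame = createAssembler((message) => received.push(message))
handleFrame({ fin: true, opcode: 0x1, payload: Buffer.from("hello") })
handleFrame({ fin: false, opcode: 0x1, payload: Buffer.from("and a") })
handleFrame({ fin: false, opcode: 0x0, payload: Buffer.from("happy new") })
handleFrame({ fin: true, opcode: 0x0, payload: Buffer.from("year!") })
console.log(received) // [ 'hello', 'and ahappy newyear!' ]
```

The payloads are joined byte for byte, so no spaces are inserted between fragments. Decoding the text only after the last fragment also avoids breaking a multibyte UTF-8 character that happens to be split across two frames.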

5. How do I maintain the connection

Among the opcodes listed above, two are special: Ping and Pong. Together they are used to keep the connection alive.

An endpoint can send a Ping frame at any time after the connection is established and before it is closed. On receiving a Ping frame, the other endpoint must respond with a Pong frame as soon as practical.

If an endpoint receives a Ping frame before it has responded to an earlier one, it may respond only to the most recent Ping.

Ping frames can serve both as a keepalive and as a way to verify that the remote endpoint is still responsive.

❓ Think about it: WebSocket is two-way communication, so can an endpoint send a Pong frame on its own, without having received a Ping?

A Pong frame may be sent unsolicited. This serves as a unidirectional heartbeat. A response to an unsolicited Pong frame is not expected.
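On the wire, Ping and Pong are ordinary frames with opcodes 0x9 and 0xA. A rough sketch of building them is shown below. The helper names buildControlFrame and pongFor are invented for this example, and the frames are built unmasked, as a server would send them. Two rules from RFC 6455 are relied on: control frames carry at most 125 bytes of payload and are never fragmented, and a Pong echoes the application data of the Ping it answers.

```javascript
const OPCODE_PING = 0x9
const OPCODE_PONG = 0xa

// buildControlFrame is a hypothetical helper for unmasked
// (server-to-client) control frames
function buildControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError("control frame payload must be 125 bytes or fewer")
  }
  // FIN=1 (control frames are never fragmented), RSV1-3=0, then the opcode;
  // the second byte is MASK=0 plus the 7-bit payload length
  const header = Buffer.from([0x80 | opcode, payload.length])
  return Buffer.concat([header, payload])
}

// A Pong echoes the application data of the Ping it answers
function pongFor(pingPayload) {
  return buildControlFrame(OPCODE_PONG, pingPayload)
}

console.log(buildControlFrame(OPCODE_PING, Buffer.from("Hello")).toString("hex"))
// 890548656c6c6f
console.log(pongFor(Buffer.from("Hello")).toString("hex"))
// 8a0548656c6c6f
```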

6. How do I close the connection

Closing a connection resembles closing a TCP connection, except that TCP's four-way handshake is not needed: a single exchange of Close frames is enough. Either endpoint, or both, can initiate the close by sending a Close frame.

❓ Think about it: under what circumstances is the connection not closed immediately? 🤔

If an endpoint receives a Close frame while it is still in the middle of sending a fragmented message, it may finish sending that message before replying with its own Close frame, so the connection is not torn down right away. Each endpoint sends a Close frame only once; after sending it, the endpoint sends no more data frames.

Disconnection also includes abnormal shutdown. For details, see RFC6455.
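As one detail from RFC 6455, a Close frame may carry a body: a 2-byte status code in network byte order, optionally followed by a UTF-8 reason. The sketch below builds and reads such a body. The helper names buildClosePayload and parseClosePayload are assumptions made for this example, and 1000 is the status code the RFC defines for a normal closure.

```javascript
// buildClosePayload and parseClosePayload are hypothetical helpers
function buildClosePayload(code, reason = "") {
  const reasonBytes = Buffer.from(reason, "utf8")
  // Control frames carry at most 125 bytes; 2 of them hold the status code
  if (reasonBytes.length > 123) {
    throw new RangeError("close reason must be 123 bytes or fewer")
  }
  const payload = Buffer.alloc(2 + reasonBytes.length)
  payload.writeUInt16BE(code, 0) // status code in network byte order
  reasonBytes.copy(payload, 2)
  return payload
}

function parseClosePayload(payload) {
  // A Close frame without a body carries no status code
  if (payload.length === 0) {
    return { code: null, reason: "" }
  }
  return {
    code: payload.readUInt16BE(0),
    reason: payload.subarray(2).toString("utf8"),
  }
}

const closePayload = buildClosePayload(1000, "bye")
console.log(closePayload.toString("hex")) // 03e8627965
console.log(parseClosePayload(closePayload)) // { code: 1000, reason: 'bye' }
```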

WebSocket example

Chat room

index.js

const app = require("express")()
const WebSocket = require("ws")

// WebSocket server on port 8080; the page itself is served on port 3000
const MyWs = new WebSocket.Server({ port: 8080 })

MyWs.on("listening", () => {
  console.log("listening")
})

MyWs.on("close", () => {
  console.log("disconnected")
})

MyWs.on("connection", (ws, req) => {
  // Use the client's remote port as a simple display name
  const port = req.socket.remotePort
  const clientName = port
  console.log(`${clientName} is connected`)

  // Send a welcome message to the new client
  ws.send("Welcome: " + clientName + " joined the chat room")

  ws.on("message", (message) => {
    // Broadcast the message to all connected clients
    MyWs.clients.forEach((client) => {
      if (client.readyState === WebSocket.OPEN) {
        client.send(clientName + " -> " + message)
      }
    })
  })
})

app.get("/", (req, res) => {
  res.sendFile(__dirname + "/index.html")
})

app.listen(3000)

index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>WebSocket DEMO</title>
  </head>
  <body>
    <script type="text/javascript">
      if (!window.WebSocket) {
        window.WebSocket = window.MozWebSocket
      }
      const socket = new WebSocket("ws://localhost:8080/ws")
      socket.onmessage = function (event) {
        let ta = document.getElementById("responseText")
        ta.value = ta.value + "\n" + event.data
      }
      socket.onopen = function (event) {
        let ta = document.getElementById("responseText")
        ta.value = "Connection open!"
      }
      socket.onclose = function (event) {
        let ta = document.getElementById("responseText")
        ta.value = ta.value + "Connection closed"
      }
      function send(message) {
        if (!window.WebSocket) {
          return
        }
        if (socket.readyState == WebSocket.OPEN) {
          socket.send(message)
        } else {
          alert("Connection is not open")
        }
      }
    </script>
    <form onsubmit="return false;">
      <textarea id="responseText" style="width: 500px; height: 300px"></textarea>
      <br />
      <input type="text" name="message" style="width: 300px" value="Hello" />
      <input type="button" value="Send" onclick="send(this.form.message.value)" />
      <input type="button" onclick="javascript:document.getElementById('responseText').value=''" value="Clear chat" />
    </form>
  </body>
</html>

Closing remarks

This article briefly introduced how the WebSocket protocol establishes a connection, the data frame format, how data is transferred, and how the connection is kept alive and closed. For the detailed process, see the full text of the WebSocket specification. There is much more to WebSocket, such as extensions and security; more can be found in the links below.

It has been about 10 years since the WebSocket protocol came out, but it is still not very widely used. Personally, I think this comes down to its usage scenarios: long-lived connections occupy server resources, which can lead to some waste. If you spot any problems in this introduction, please point them out.

References

RFC 6455: datatracker.ietf.org/doc/html/rf…

WebSocket: 5 minutes from beginner to master juejin.cn/post/684490…

In-depth WebSocket communication protocol details: www.52im.net/thread-332-…

Cross-site WebSocket hijacking: www.52im.net/thread-793-…

MDN documentation: developer.mozilla.org/zh-CN/docs/…