I. Content overview
WebSocket gives the browser real-time, bidirectional communication. This article walks through how WebSocket establishes a connection, exchanges data, and formats data frames. It also briefly introduces security attacks against WebSocket and how the protocol defends against them.
II. What is WebSocket
WebSocket is a network technology, introduced with HTML5, for full-duplex communication between the browser and the server. It is an application-layer protocol built on top of TCP, and it reuses the HTTP handshake channel.
This description is a bit dry for most Web developers, but just keep a few things in mind:
- WebSocket can be used in the browser
- Support for two-way communication
- It’s easy to use
1. What are the advantages
The advantages here are relative to HTTP. In a nutshell, WebSocket supports two-way communication and is more flexible, more efficient, and more extensible.
- Support two-way communication, more real-time.
- Better binary support.
- Less control overhead. After the connection is created, the protocol-level frame header exchanged between the WS client and server is small. Without extensions, the server-to-client header is only 2 to 10 bytes (depending on the payload length); client-to-server frames carry an additional 4-byte masking key. HTTP, by contrast, sends a complete set of headers with every request.
- Support for extensions. The WS protocol defines extensions, so users can extend the protocol or implement custom subprotocols (for example, support for a custom compression algorithm).
For the last two points, readers who have not studied the WebSocket specification may not find them intuitive, but that does not get in the way of learning and using WebSocket.
2. What you need to learn
When learning a network application-layer protocol, the most important parts are usually the connection establishment process and the data exchange process. The data format cannot be skipped either, since it directly determines what the protocol is capable of. A good data format makes a protocol more efficient and more extensible.
The following points are covered below:
- How to establish a connection
- How to exchange data
- Data frame format
- How to maintain connections
III. Getting-started example
Before going into the details of the protocol, let's take a look at a simple example to get a feel for it. The example includes a WebSocket server and a WebSocket client (a web page). The full code can be found here.
Here the server uses the ws library. Compared with the more familiar Socket.IO, ws is a lighter implementation and better suited to learning.
1. Server
The code is as follows: listen on port 8080. When a new connection request arrives, a log is printed and a message is sent to the client. Logs are also generated when a message is received from the client.
var app = require('express')();
var server = require('http').Server(app);
var WebSocket = require('ws');

var wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function connection(ws) {
    console.log('server: receive connection.');
    ws.on('message', function incoming(message) {
        console.log('server: received: %s', message);
    });
    // Send a message to the client as soon as it connects
    ws.send('world');
});

app.get('/', function (req, res) {
    res.sendFile(__dirname + '/index.html');
});

// Port 3000 is an assumption; the listing does not show which port serves the page
server.listen(3000);
2. Client
Initiate a WebSocket connection to port 8080. After the connection is established, logs are generated and messages are sent to the server. Logs are also generated after receiving messages from the server.
var ws = new WebSocket('ws://localhost:8080');

// Fired once the handshake has completed
ws.onopen = function () {
    console.log('ws onopen');
    ws.send('from client: hello');
};

// Fired for every message received from the server
ws.onmessage = function (e) {
    console.log('ws onmessage');
    console.log('from server: ' + e.data);
};
3. Running results
You can view server logs and client logs separately.
Server output:
server: receive connection.
server: received hello
Client output:
client: ws connection is open
client: received world
IV. How to establish a connection
As mentioned earlier, WebSocket reuses the HTTP handshake channel. Specifically, the client negotiates a protocol upgrade with the WebSocket server through an HTTP request. Once the upgrade completes, subsequent data exchange follows the WebSocket protocol.
1. Client: request the protocol upgrade
First, the client initiates a protocol upgrade request. As you can see, it uses the standard HTTP message format, and only the GET method is supported.
GET / HTTP/1.1
Host: localhost:8080
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: w4v7O6xFTi36lq3RNcgctw==
The key request headers have the following meaning:
- Connection: Upgrade: indicates that the protocol is to be upgraded.
- Upgrade: websocket: indicates that the upgrade target is the WebSocket protocol.
- Sec-WebSocket-Version: 13: indicates the WebSocket version. If the server does not support this version, it must return a Sec-WebSocket-Version header listing the versions it does support.
- Sec-WebSocket-Key: paired with the Sec-WebSocket-Accept header in the server response. It provides basic protection against malicious or unintentional connections.
Note that the request above omits some non-essential headers. Since this is a standard HTTP request, headers such as Host, Origin, and Cookie are sent as usual. During the handshake phase, the server can use these headers for security restrictions and permission checks.
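To make the request headers concrete, here is a minimal sketch of how a non-browser client might assemble the upgrade request with Node's built-in crypto module. The helper name buildUpgradeRequest and its parameters are hypothetical, chosen only for illustration. It relies on the RFC 6455 rule that Sec-WebSocket-Key is a randomly chosen 16-byte value encoded as Base64.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds the raw text of a WebSocket upgrade request.
function buildUpgradeRequest(host, path) {
    // RFC 6455: the key is a randomly chosen 16-byte value, Base64-encoded.
    const key = crypto.randomBytes(16).toString('base64');
    const lines = [
        'GET ' + path + ' HTTP/1.1',
        'Host: ' + host,
        'Connection: Upgrade',
        'Upgrade: websocket',
        'Sec-WebSocket-Version: 13',
        'Sec-WebSocket-Key: ' + key
    ];
    // Header lines are separated by \r\n and terminated by an empty line.
    return { key: key, text: lines.join('\r\n') + '\r\n\r\n' };
}

const request = buildUpgradeRequest('localhost:8080', '/');
console.log(request.text);
```

The key is returned alongside the text because the client needs it again later to check the server's Sec-WebSocket-Accept value.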
2. Server: respond to the protocol upgrade
The server returns the following. Status code 101 indicates a protocol switch. The protocol upgrade is complete at this point, and all subsequent data exchange uses the new protocol.
HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Accept: Oy4NRAQ13jhfONC7bP8dTKb4PTU=
Note: every header line ends with \r\n, and an extra blank line \r\n follows the last header. In addition, HTTP status codes can only be used by the server during the handshake phase. After the handshake, only the specific error codes defined by the protocol can be used.
3. Calculating Sec-WebSocket-Accept
Sec-WebSocket-Accept is calculated from the Sec-WebSocket-Key header of the client request.
The calculation is:
- Concatenate Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11.
- Compute the SHA-1 digest of the result and convert it to a Base64 string.
The pseudocode is as follows:
toBase64( sha1( Sec-WebSocket-Key + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 ) )
Verify the previous result:
const crypto = require('crypto');
const magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const secWebSocketKey = 'w4v7O6xFTi36lq3RNcgctw==';
// SHA-1 over key + magic, encoded as Base64
let secWebSocketAccept = crypto.createHash('sha1')
    .update(secWebSocketKey + magic)
    .digest('base64');
console.log(secWebSocketAccept);
// Oy4NRAQ13jhfONC7bP8dTKb4PTU=
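Putting the two handshake steps together, the server side can be sketched roughly as follows. This is an illustration only: buildHandshakeResponse is a hypothetical helper, the headers argument is assumed to be an object with lower-cased header names (the shape Node's http module uses), and version negotiation and Origin checks are left out. The sample key and expected value used at the bottom are the ones given in RFC 6455, section 1.3.

```javascript
const crypto = require('crypto');

const MAGIC = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64( sha1( key + magic ) )
function computeAccept(key) {
    return crypto.createHash('sha1').update(key + MAGIC).digest('base64');
}

// Hypothetical helper: returns the raw 101 response text,
// or null if the headers do not describe a WebSocket upgrade request.
function buildHandshakeResponse(headers) {
    const upgrade = (headers['upgrade'] || '').toLowerCase();
    const connection = (headers['connection'] || '').toLowerCase();
    const key = headers['sec-websocket-key'];
    if (upgrade !== 'websocket' || !connection.includes('upgrade') || !key) {
        return null;
    }
    return [
        'HTTP/1.1 101 Switching Protocols',
        'Upgrade: websocket',
        'Connection: Upgrade',
        'Sec-WebSocket-Accept: ' + computeAccept(key)
    ].join('\r\n') + '\r\n\r\n';
}

console.log(buildHandshakeResponse({
    'upgrade': 'websocket',
    'connection': 'Upgrade',
    'sec-websocket-key': 'dGhlIHNhbXBsZSBub25jZQ=='
}));
// The Accept line reads: Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```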
V. Data frame format
Data exchange between client and server depends on the data frame format. So, before actually discussing data exchange, let's look at the WebSocket data frame format.
The smallest unit of communication between a WebSocket client and server is the frame; one or more frames make up a complete message.
- Sender: cuts the message into multiple frames and sends them to the receiver;
- Receiver: receives the frames and reassembles the related frames into a complete message.
The focus of this section is the format of data frames. Refer to section 5.2 of RFC6455 for the detailed definition.
1. Overview of data frame format
The unified format of WebSocket data frames is given below. Readers familiar with TCP/IP will recognize this kind of diagram.
- From left to right, the unit is the bit. For example, FIN and RSV1-3 each take 1 bit, and opcode takes 4 bits.
- The content includes flags, opcode, mask, data, and data length. (Expanded in the next section.)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
2. Detailed explanation of data frame format
Based on the format overview above, this section explains each field one by one. If anything is unclear, refer to the protocol specification or leave a comment.
FIN: 1 bit.
A value of 1 indicates that this is the last fragment of the message. A value of 0 indicates that it is not the last fragment.
RSV1, RSV2, RSV3: 1 bit each.
In general, they are all 0. When the client and server negotiate a WebSocket extension, these three flag bits may be non-zero, and the meaning of the values is defined by the extension. If a non-zero value appears and no WebSocket extension is in use, the connection fails.
Opcode: 4 bits.
The opcode determines how the payload that follows should be interpreted. If the opcode is unknown, the receiver should fail the connection. The defined opcodes are:
- %x0: a continuation frame. When the opcode is 0, the message is being sent in fragments, and the received frame is one of those fragments.
- %x1: a text frame.
- %x2: a binary frame.
- %x3-7: reserved for non-control frames to be defined later.
- %x8: connection close.
- %x9: a ping.
- %xA: a pong.
- %xB-F: reserved for control frames to be defined later.
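The opcode list above maps naturally onto a small lookup. The sketch below is illustrative only; describeOpcode is a hypothetical helper. It also uses one property from RFC 6455 that the list implies: control opcodes (%x8 to %xF) are exactly those whose most significant bit is set.

```javascript
// Opcodes that have a defined meaning; reserved values are left out.
const OPCODE_NAMES = {
    0x0: 'continuation',
    0x1: 'text',
    0x2: 'binary',
    0x8: 'close',
    0x9: 'ping',
    0xA: 'pong'
};

// Hypothetical helper: classifies a 4-bit opcode.
function describeOpcode(opcode) {
    const name = OPCODE_NAMES[opcode];
    return {
        name: name || 'reserved',
        // Control opcodes have the highest of the 4 bits set.
        isControl: (opcode & 0x8) !== 0,
        // An unknown opcode means the receiver should fail the connection.
        known: name !== undefined
    };
}

console.log(describeOpcode(0x1)); // { name: 'text', isControl: false, known: true }
console.log(describeOpcode(0x9)); // { name: 'ping', isControl: true, known: true }
console.log(describeOpcode(0x3)); // { name: 'reserved', isControl: false, known: false }
```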
Mask: 1 bit.
Indicates whether the payload is masked. Data sent from the client to the server must be masked; data sent from the server to the client must not be.
If the server receives data that has not been masked, it must close the connection.
If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the payload. Mask is 1 for every data frame sent from the client to the server.
The masking algorithm and its purpose are explained in the next section.
Payload length: the length of the payload, in bytes. It is 7 bits, 7+16 bits, or 7+64 bits.
Assume Payload length === x:
- x is 0 to 125: the length of the data is x bytes.
- x is 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data.
- x is 127: the next 8 bytes represent a 64-bit unsigned integer (most significant bit 0) whose value is the length of the data.
In addition, when the payload length takes more than one byte, it is encoded in big-endian (network byte order).
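The three length cases can be sketched as an encoder. This is a minimal illustration, not taken from any library; encodePayloadLength is a hypothetical helper that returns only the length bytes of the header, with the MASK bit left as 0.

```javascript
// Hypothetical helper: returns the header bytes that encode `length`,
// i.e. the 7-bit "Payload len" field plus any extended length bytes.
function encodePayloadLength(length) {
    if (length <= 125) {
        return Buffer.from([length]);             // fits in the 7-bit field
    }
    if (length <= 0xFFFF) {
        const buf = Buffer.alloc(3);
        buf[0] = 126;                             // marker: 16-bit length follows
        buf.writeUInt16BE(length, 1);             // big-endian
        return buf;
    }
    const buf = Buffer.alloc(9);
    buf[0] = 127;                                 // marker: 64-bit length follows
    buf.writeBigUInt64BE(BigInt(length), 1);      // big-endian
    return buf;
}

console.log(encodePayloadLength(5));      // <Buffer 05>
console.log(encodePayloadLength(256));    // <Buffer 7e 01 00>
console.log(encodePayloadLength(65536));  // <Buffer 7f 00 00 00 00 00 01 00 00>
```

Note how 125 is the largest value stored directly: 126 and 127 are not lengths but markers for the two extended forms.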
Masking-key: 0 or 4 bytes (32 bits).
Every data frame sent from the client to the server is masked: Mask is 1 and a 4-byte Masking-key is present. If Mask is 0, there is no Masking-key.
Note: the payload length does not include the length of the masking key.
Payload data: (x+y) bytes.
Payload data consists of extension data and application data, where the extension data is x bytes and the application data is y bytes.
Extension data: 0 bytes if no extension was negotiated. Every extension must declare the length of its extension data, or how that length can be calculated. How the extension is used must also be negotiated during the handshake. If extension data is present, the payload length includes its length.
Application data: arbitrary application data that follows the extension data (if any) and takes up the rest of the frame. Its length is the payload length minus the extension data length.
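To tie the fields together, the following is a rough sketch of decoding one frame from a Buffer. parseFrame is a hypothetical helper written for this article's explanation; it assumes the buffer holds exactly one complete frame and that no extension is in use, and it leaves the payload masked. The sample bytes are the "Hello" examples from RFC 6455, section 5.7.

```javascript
// Hypothetical helper: decodes the header fields of one complete frame.
function parseFrame(buf) {
    const first = buf[0];
    const second = buf[1];
    const frame = {
        fin: (first & 0x80) !== 0,
        rsv1: (first & 0x40) !== 0,
        rsv2: (first & 0x20) !== 0,
        rsv3: (first & 0x10) !== 0,
        opcode: first & 0x0F,
        masked: (second & 0x80) !== 0
    };
    let length = second & 0x7F;
    let offset = 2;
    if (length === 126) {
        length = buf.readUInt16BE(offset);
        offset += 2;
    } else if (length === 127) {
        // Converted to Number for simplicity; very large lengths would lose precision.
        length = Number(buf.readBigUInt64BE(offset));
        offset += 8;
    }
    if (frame.masked) {
        frame.maskingKey = buf.subarray(offset, offset + 4);
        offset += 4;
    }
    frame.payloadLength = length;
    frame.payload = buf.subarray(offset, offset + length); // still masked if frame.masked
    return frame;
}

// Unmasked single-frame text message "Hello".
const sample = parseFrame(Buffer.from([0x81, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f]));
console.log(sample.fin, sample.opcode, sample.payload.toString()); // true 1 Hello
```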
3. Mask algorithm
The masking key is a random 32-bit value chosen by the client. Masking does not change the length of the payload. The same algorithm is used both to mask and to unmask:
First, assume:
- original-octet-i: the i-th byte of the original data.
- transformed-octet-i: the i-th byte of the transformed data.
- j: the result of i mod 4.
- masking-key-octet-j: the j-th byte of the masking key.
The algorithm: original-octet-i is XORed with masking-key-octet-j to produce transformed-octet-i.
j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
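The formula translates almost directly into code. The sketch below is a straightforward, unoptimized illustration; applyMask is a hypothetical name. The key and the masked bytes used to check it are the masked "Hello" sample from RFC 6455, section 5.7.

```javascript
// XORs byte i with masking-key byte (i mod 4). Applying the function twice
// with the same key restores the input, so it both masks and unmasks.
function applyMask(data, maskingKey) {
    const out = Buffer.alloc(data.length);
    for (let i = 0; i < data.length; i++) {
        const j = i % 4;
        out[i] = data[i] ^ maskingKey[j];
    }
    return out;
}

const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = applyMask(Buffer.from('Hello'), key);
console.log(masked);                            // <Buffer 7f 9f 4d 51 58>
console.log(applyMask(masked, key).toString()); // Hello
```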
VI. Data transmission
Once the WebSocket client and server have established a connection, all subsequent operations are based on the transmission of data frames.
WebSocket distinguishes operation types by opcode. For example, 0x8 indicates a close, and 0x0-0x2 indicate data exchange.
1. Data fragmentation
Each WebSocket message may be split into multiple data frames. When the receiver gets a data frame, it uses the value of FIN to decide whether the last frame of the message has arrived.
FIN=1 indicates that the current frame is the last frame of the message. The receiver now has the complete message and can process it. If FIN=0, the receiver must keep listening for the remaining frames.
In addition, during data exchange the opcode gives the type of the data. 0x01 indicates text, and 0x02 indicates binary. 0x00 is special: it marks a continuation frame, which, as the name suggests, means the frames that make up the complete message have not all been received yet.
2. Example of data fragmentation
An example makes this clearer. The following example from MDN illustrates data fragmentation well. The client sends two messages to the server, and the server responds to each after receiving it. Only the messages sent from the client to the server are analyzed here.
First message
FIN=1 indicates the last frame of the current message, so the server can process the message as soon as it receives this frame. opcode=0x1 indicates that the client is sending text.
Second message
- FIN=0, opcode=0x1: the message type is text, the message is not finished, and more frames follow.
- FIN=0, opcode=0x0: the message is not finished and more frames follow. The current frame is appended to the previous one.
- FIN=1, opcode=0x0: the message is finished and no more frames follow. The current frame is appended to the previous one, and the server can now assemble the related frames into the complete message.
Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
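The receiver's side of the exchange above can be sketched as a small reassembly function. This is an illustration under simplifying assumptions: createAssembler is a hypothetical helper, frames are plain objects with already-decoded fin, opcode, and payload fields, and control frames arriving between fragments are not handled. Spaces were added to the sample payloads so the joined text reads naturally.

```javascript
// Hypothetical helper: returns a function that collects frames and yields
// the complete message once a frame with FIN=1 arrives (null until then).
function createAssembler() {
    let parts = [];
    let type = null; // opcode of the first frame: 0x1 text or 0x2 binary
    return function onFrame(frame) {
        if (frame.opcode !== 0x0) {
            type = frame.opcode;     // first frame of a new message
            parts = [];
        }
        parts.push(frame.payload);   // continuation frames are appended
        if (!frame.fin) return null; // keep listening for more frames
        const data = Buffer.concat(parts);
        const message = type === 0x1 ? data.toString('utf8') : data;
        parts = [];
        type = null;
        return message;
    };
}

const onFrame = createAssembler();
console.log(onFrame({ fin: true, opcode: 0x1, payload: Buffer.from('hello') }));       // hello
console.log(onFrame({ fin: false, opcode: 0x1, payload: Buffer.from('and a ') }));     // null
console.log(onFrame({ fin: false, opcode: 0x0, payload: Buffer.from('happy new ') })); // null
console.log(onFrame({ fin: true, opcode: 0x0, payload: Buffer.from('year!') }));       // and a happy new year!
```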
VII. Keeping the connection alive + heartbeat
To keep real-time bidirectional communication going, WebSocket must make sure the TCP channel between client and server stays open. However, a connection that is held open for a long time with no data exchanged may waste connection resources.
Still, in some scenarios the client and server need to stay connected even though no data has been exchanged for a long time. A heartbeat can be used for this.
- Sender -> Receiver: ping
- Receiver -> Sender: pong
Ping and pong correspond to two WebSocket control frames, with opcode 0x9 and 0xA respectively.
For example, a WebSocket server can ping a client with the following code (using the ws module):
ws.ping('', false, true);
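At the frame level, a heartbeat is just a pair of small control frames. The sketch below builds them by hand to show what goes over the wire; buildControlFrame and pongFor are hypothetical helpers, not part of the ws module. Two RFC 6455 rules are assumed: a control frame payload is at most 125 bytes, and a pong carries the same application data as the ping it answers. Masking is left out, so pongFor only demonstrates the payload echo and is not a frame a real client could send as is.

```javascript
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xA;

// Hypothetical helper: builds an unmasked control frame with FIN=1.
function buildControlFrame(opcode, payload) {
    if (payload.length > 125) {
        throw new RangeError('control frame payload must be at most 125 bytes');
    }
    const header = Buffer.from([0x80 | opcode, payload.length]); // FIN=1, MASK=0
    return Buffer.concat([header, payload]);
}

// Hypothetical helper: answers an unmasked ping by echoing its payload.
function pongFor(pingFrame) {
    const length = pingFrame[1] & 0x7F;
    return buildControlFrame(OPCODE_PONG, pingFrame.subarray(2, 2 + length));
}

const ping = buildControlFrame(OPCODE_PING, Buffer.from('Hello'));
console.log(ping);          // <Buffer 89 05 48 65 6c 6c 6f>
console.log(pongFor(ping)); // <Buffer 8a 05 48 65 6c 6c 6f>
```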
VIII. The role of Sec-WebSocket-Key/Accept
As mentioned earlier, Sec-WebSocket-Key/Sec-WebSocket-Accept provides basic protection against malicious and accidental connections.
Its functions can be summarized as follows:
- It prevents the server from accepting invalid WebSocket connections (for example, if an HTTP client accidentally requests a connection to the WebSocket service, the server can reject it directly).
- It makes sure the server understands WebSocket connections. Because the WS handshake uses HTTP, the WS connection may be handled and answered by a plain HTTP server; the client can use Sec-WebSocket-Key to confirm that the server actually understands the WS protocol. (This is not 100% reliable: there will always be some odd HTTP server that handles Sec-WebSocket-Key yet does not implement the WS protocol.)
- Sec-WebSocket-Key and related headers cannot be set on Ajax requests in the browser. This prevents a client from accidentally requesting a protocol upgrade while sending an Ajax request.
- It prevents a reverse proxy that does not understand the WS protocol from returning wrong data. For example, the reverse proxy receives two WS upgrade requests, caches the response to the first one, and then answers the second request directly from the cache (a meaningless response).
- The main purpose of Sec-WebSocket-Key is not data security: the formula that converts Sec-WebSocket-Key to Sec-WebSocket-Accept is public and very simple. Its main role is to prevent common accidents (unintentional connections).
Note: the Sec-WebSocket-Key/Sec-WebSocket-Accept conversion only provides a basic guarantee. It says nothing about whether the connection is secure, whether the data is safe, or whether the client and server are legitimate WS implementations.
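The second point, the client confirming that the server really speaks WebSocket, can be sketched as a check on the handshake response. This is only an illustration: isValidHandshakeResponse is a hypothetical helper, and headers is assumed to be an object with lower-cased header names. The sample key and Accept value in the usage lines come from RFC 6455, section 1.3.

```javascript
const crypto = require('crypto');

const MAGIC = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: `sentKey` is the Sec-WebSocket-Key the client sent;
// `statusCode` and `headers` come from the server's response.
function isValidHandshakeResponse(sentKey, statusCode, headers) {
    const expected = crypto.createHash('sha1').update(sentKey + MAGIC).digest('base64');
    return statusCode === 101 &&
        (headers['upgrade'] || '').toLowerCase() === 'websocket' &&
        headers['sec-websocket-accept'] === expected;
}

const sentKey = 'dGhlIHNhbXBsZSBub25jZQ==';
// A server that understands the protocol:
console.log(isValidHandshakeResponse(sentKey, 101, {
    'upgrade': 'websocket',
    'sec-websocket-accept': 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='
})); // true
// A plain HTTP server answering with an ordinary page:
console.log(isValidHandshakeResponse(sentKey, 200, { 'content-type': 'text/html' })); // false
```

A server that merely echoes the key back, or returns a cached response computed for a different key, fails the comparison as well.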
IX. The role of data masking
In the WebSocket protocol, data masking strengthens the security of the protocol. However, masking is not meant to protect the data itself, because the algorithm is public and the operation is simple. Apart from encrypting the channel itself, there do not seem to be many effective ways to protect the communication.
So why introduce the mask calculation at all? It seems to bring little benefit beyond extra computation (this is also a point that confuses many readers).
The answer is one word: security. Not to prevent data leaks, but to prevent proxy cache poisoning attacks and similar problems that existed in earlier versions of the protocol.
1. Proxy cache poisoning attacks
Here is an excerpt from a 2010 paper on security. It describes the security problems that can result from flaws in how proxy servers implement the protocol. See the source cited below.
“We show, empirically, that the current version of the WebSocket consent mechanism is vulnerable to proxy cache poisoning attacks. Even though the WebSocket handshake is based on HTTP, which should be understood by most network intermediaries, the handshake uses the esoteric “Upgrade” mechanism of HTTP [5]. In our experiment, we find that many proxies do not implement the Upgrade mechanism properly, which causes the handshake to succeed even though subsequent traffic over the socket will be misinterpreted by the proxy.”
[TALKING] Huang, L-S., Chen, E., Barth, A., Rescorla, E., and C.
Jackson, "Talking to Yourself for Fun and Profit", 2010,
Before formally describing the attack steps, assume the following participants:
- The attacker, a server controlled by the attacker (the "malicious server"), and a resource forged by the attacker (the "malicious resource").
- The victim, and the resource the victim wants to access (the "legitimate resource").
- The server the victim actually wants to access (the "legitimate server").
- An intermediate proxy server.
Attack step 1:
- The attacker's browser initiates a WebSocket connection to the malicious server. As described earlier, this starts with a protocol upgrade request.
- The protocol upgrade request actually arrives at the proxy server.
- The proxy server forwards the protocol upgrade request to the malicious server.
- The malicious server agrees to the connection, and the proxy server forwards the response to the attacker.
Because of a flaw in its Upgrade implementation, the proxy server believes it has been forwarding an ordinary HTTP message. So when the malicious server agrees to the connection, the proxy server assumes the session is over.
Attack step 2:
- Over the connection established earlier, the attacker sends data to the malicious server through the WebSocket interface. The data is carefully constructed HTTP-format text that contains the address of the legitimate resource and a forged Host header pointing to the legitimate server. (See the message at the end of this section.)
- The request arrives at the proxy server. Although the earlier TCP connection is being reused, the proxy server believes it is a new HTTP request.
- The proxy server requests the malicious resource from the malicious server.
- The malicious server returns the malicious resource. The proxy server caches it (the URL is right, but the host is the address of the legitimate server).
This is where the victim comes in:
- The victim accesses the legitimate resource on the legitimate server through the proxy server.
- The proxy server checks the URL and host of the resource and finds a (forged) copy in its local cache.
- The proxy server returns the malicious resource to the victim.
- The victim is compromised.
Attached: the carefully constructed "HTTP request message" mentioned earlier.
Client → Server:
POST /path/of/attackers/choice HTTP/1.1
Host: host-of-attackers-choice.com
Sec-WebSocket-Key: <connection-key>
Server → Client:
HTTP/1.1 200 OK
Sec-WebSocket-Accept: <connection-key>
2. Current solution
The original proposal was to encrypt the data. After weighing security against efficiency, a compromise was adopted: masking the payload.
Note that this only constrains the browser to mask the payload. An attacker can still implement a custom WebSocket client and server that ignore the rules, and the attack works as before.
But constraining the browser greatly raises the difficulty of the attack and limits its reach. Without this constraint, an attacker would only need to put a phishing site online and trick people into visiting it to launch a large-scale attack in a short time.
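Why masking blunts the attack can be shown with a short experiment. The sketch below is illustrative only: maskWithRandomKey is a hypothetical helper that mimics what a browser does for every outgoing frame, and the path /script.js and host target.example are made-up placeholders. The point is that page script chooses the payload but not the key, so it cannot make HTTP-looking bytes of its choosing appear on the wire for a confused proxy to interpret.

```javascript
const crypto = require('crypto');

// Masks `payload` with a fresh random 4-byte key, one key per frame.
function maskWithRandomKey(payload) {
    const key = crypto.randomBytes(4);
    const wire = Buffer.alloc(payload.length);
    for (let i = 0; i < payload.length; i++) {
        wire[i] = payload[i] ^ key[i % 4];
    }
    return { key: key, wire: wire };
}

// Attacker-chosen application data that looks like an HTTP request.
const crafted = Buffer.from('GET /script.js HTTP/1.1\r\nHost: target.example\r\n\r\n');
const a = maskWithRandomKey(crafted);
const b = maskWithRandomKey(crafted);

// The same payload yields different wire bytes on each send
// (barring a collision between the two random keys).
console.log(a.wire.toString('hex'));
console.log(b.wire.toString('hex'));
```

The server recovers the original bytes by XORing with the key carried in the frame header, so masking costs little while keeping the on-the-wire bytes unpredictable to the script.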
X. Closing remarks
There is much more to write about WebSocket, for example WebSocket extensions and how clients and servers negotiate and use them. Extensions can add a lot of capability and room for imagination to the protocol, such as data compression, encryption, and multiplexing.
Space is limited, so this is not expanded here; interested readers are welcome to leave a comment. Please point out any mistakes or omissions in the article.
XI. Related links
RFC6455: WebSocket specification
Specification: Data frame mask details
Specification: Data frame format
Write a Websocket server
Attacks on network infrastructure (something to be prevented by data mask operations)
Talking to Yourself for Fun and Profit
What is Sec-WebSocket-Key for?
10.3. Attacks On Infrastructure (Masking)
Why are WebSockets masked?
How does websocket frame masking protect against cache poisoning?
What is the mask in a WebSocket frame?