An overview of the

This article covers Chapter 5 of the WebSocket protocol specification, which describes how data is framed and transmitted over a WebSocket connection.

Those interested in the previous chapters of this document can see:

  • WebSocket Protocol — Abstract
  • WebSocket Protocol Chapter 1 — Introduction
  • WebSocket Protocol Chapter 2 — Conformance Requirements
  • WebSocket Protocol chapter 3 — WebSocket URIs
  • WebSocket Protocol chapter 4 — Opening Handshake

Data Framing (Protocol Body)

5.1 Overview

In the WebSocket protocol, data is transmitted as a sequence of frames. To avoid confusing network intermediaries (such as intercepting proxies), and for the security reasons discussed in Section 10.3, a client must mask every frame it sends to the server (see Section 5.3 for details). (Note: masking is required whether or not the WebSocket protocol is running over TLS.) When a server receives an unmasked frame, it must close the connection. In this case, the server may send a Close frame with status code 1002 (protocol error), defined in Section 7.4.1. A server must not mask any frame it sends to the client. If a client receives a masked frame, it must close the connection; it may likewise use status code 1002 (protocol error), defined in Section 7.4.1. (These rules may be relaxed in a future specification.)
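The masking rule above can be sketched as a small check on the second header byte. This is a minimal illustration, not a full endpoint: the function name `mask_violation` and the constant name are assumptions made for this example, and the caller is assumed to already hold the first two bytes of a frame.

```python
PROTOCOL_ERROR = 1002  # status code an endpoint may use when closing (Section 7.4.1)


def mask_violation(first_two_bytes: bytes, receiver_is_server: bool) -> bool:
    """Return True if the frame breaks the masking rule for this receiver.

    Hypothetical helper: a server expects every incoming frame to be masked,
    a client expects every incoming frame to be unmasked.
    """
    masked = bool(first_two_bytes[1] & 0x80)  # MASK is the top bit of the second byte
    return masked != receiver_is_server
```

An endpoint that sees `mask_violation(...)` return True would close the connection, optionally sending a Close frame carrying `PROTOCOL_ERROR`.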

The base framing protocol defines a frame type with an opcode, a payload length, and designated locations within the payload data for “Extension data” and “Application data”. Certain bits and opcodes are reserved for future expansion of the protocol.

A data frame may be transmitted by either the client or the server at any time after the opening handshake has completed and before that endpoint has sent a Close frame (Section 5.5.1).

5.2 Base Framing Protocol

The wire format for the data transfer part is described in this section using ABNF [RFC5234]. Note: unlike the rest of this document, the ABNF in this section operates on groups of bits. The length of each group of bits is given in a comment. When encoded on the wire, the most significant bit is the leftmost one in the ABNF. A high-level overview of the frame layout is given in the figure below. If the figure conflicts with the ABNF given later in this section, the figure takes precedence.

   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-------+-+-------------+-------------------------------+
  |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
  |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
  |N|V|V|V|       |S|             |   (if payload len==126/127)   |
  | |1|2|3|       |K|             |                               |
  +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
  |     Extended payload length continued, if payload len == 127  |
  + - - - - - - - - - - - - - - - +-------------------------------+
  |                               |Masking-key, if MASK set to 1  |
  +-------------------------------+-------------------------------+
  | Masking-key (continued)       |          Payload Data         |
  +-------------------------------- - - - - - - - - - - - - - - - +
  :                     Payload Data continued ...                :
  + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
  |                     Payload Data continued ...                |
  +---------------------------------------------------------------+

FIN: 1 bit

Indicates that this is the final fragment of a message. The first fragment may also be the final fragment.

RSV1, RSV2, RSV3: 1 bit each

Must be 0 unless an extension has been negotiated that defines a meaning for non-zero values. If a non-zero value is received and none of the negotiated extensions defines its meaning, the receiving endpoint must fail the WebSocket connection.

Opcode: 4 bits

Defines how the “Payload data” is interpreted. If an unknown opcode is received, the receiving endpoint must fail the WebSocket connection. The following values are defined.

%x0 denotes a continuation frame

%x1 denotes a text frame

%x2 denotes a binary frame

%x3-7 are reserved for future non-control frames

%x8 denotes a connection close frame

%x9 denotes a ping frame

%xA denotes a pong frame

%xB-F are reserved for future control frames
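As a rough sketch, the opcode table above maps naturally onto an enumeration. The names `Opcode` and `classify_opcode` are assumptions made for this example. The control/non-control split uses the most significant bit of the 4-bit opcode, as described in Sections 5.5 and 5.6.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Opcode(IntEnum):
    """The opcodes defined so far; every other 4-bit value is reserved."""
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


def classify_opcode(value: int) -> Tuple[bool, Optional[str]]:
    """Return (is_control, name), where name is None for a reserved opcode."""
    if not 0 <= value <= 0xF:
        raise ValueError("opcode must fit in 4 bits")
    is_control = bool(value & 0x8)  # top bit set means control frame
    try:
        return is_control, Opcode(value).name
    except ValueError:
        return is_control, None  # reserved: 0x3-0x7 or 0xB-0xF
```

A receiver that gets `None` as the name, with no extension defining that opcode, would fail the connection.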

Mask: 1 bit

Defines whether the “Payload data” is masked. If set to 1, a masking key is present in the Masking-key field, and that key is used to unmask the “Payload data” as described in Section 5.3. All frames sent from the client to the server have this bit set to 1.

Payload length: 7 bits, 7+16 bits, or 7+64 bits

The length of the “Payload data”, in bytes. If the value is 0-125, that is the payload length. If it is 126, the following 2 bytes, interpreted as a 16-bit unsigned integer, are the payload length. If it is 127, the following 8 bytes, interpreted as a 64-bit unsigned integer (the most significant bit must be 0), are the payload length. Multi-byte length quantities are expressed in network byte order. In all cases, the minimal number of bytes must be used to encode the length; for example, the length of a 124-byte string cannot be encoded as the sequence 126, 0, 124. The payload length is the length of the “Extension data” plus the length of the “Application data”. The length of the “Extension data” may be zero, in which case the payload length is the length of the “Application data”.
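The three length encodings can be sketched as follows. The function name `encode_payload_length` is an assumption made for this example. It returns the second header byte (MASK bit plus 7-bit length) followed by any extended length bytes, and always picks the shortest form.

```python
import struct


def encode_payload_length(length: int, masked: bool = False) -> bytes:
    """Encode a payload length using the minimal number of bytes."""
    if not 0 <= length <= 0x7FFFFFFFFFFFFFFF:
        raise ValueError("length must fit in 63 bits")
    mask_bit = 0x80 if masked else 0x00
    if length <= 125:
        return bytes([mask_bit | length])                          # 7-bit form
    if length <= 0xFFFF:
        return bytes([mask_bit | 126]) + struct.pack("!H", length)  # 7+16 bits
    return bytes([mask_bit | 127]) + struct.pack("!Q", length)      # 7+64 bits
```

For instance, `encode_payload_length(124)` yields the single byte 0x7C rather than the non-minimal sequence 126, 0, 124.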

Masking-Key: 0 or 4 bytes

All frames sent from the client to the server are masked with a 32-bit value that is carried in the frame. This field is present if the MASK bit is 1 and absent if the MASK bit is 0. See Section 5.3 for more information on client-to-server masking.

Payload data: (x+y) bytes

The “Payload data” is the “Extension data” followed by the “Application data”.

Extension data: x bytes

The “Extension data” is 0 bytes long unless an extension has been negotiated. Any extension must specify the length of the “Extension data”, or how that length can be calculated, and how the extension's use is to be negotiated during the opening handshake. If present, the “Extension data” is included in the total payload length.

Application data: y bytes

Arbitrary “Application data”, taking up the remainder of the frame after any “Extension data”. The length of the “Application data” is equal to the payload length minus the length of the “Extension data”.
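Putting the fields together, a frame can be decoded roughly as shown below. This is a sketch under several assumptions: the buffer starts at a frame boundary and contains at least one complete frame, no extension has been negotiated (so the whole payload is application data), and the names `Frame` and `parse_frame` are invented for this example.

```python
import struct
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    fin: bool
    rsv: int                       # RSV1-RSV3 packed into three bits
    opcode: int
    masking_key: Optional[bytes]   # None when the MASK bit is 0
    payload: bytes                 # still masked if masking_key is not None


def parse_frame(data: bytes) -> Frame:
    """Parse one complete frame from the start of `data`."""
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)
    rsv = (b0 >> 4) & 0x07
    opcode = b0 & 0x0F
    masked = bool(b1 & 0x80)
    length = b1 & 0x7F
    offset = 2
    if length == 126:              # 16-bit extended length follows
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:            # 64-bit extended length follows
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    key = None
    if masked:                     # 4-byte masking key precedes the payload
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return Frame(fin, rsv, opcode, key, payload)
```

The sketch deliberately leaves out validation (reserved bits, unknown opcodes, non-minimal lengths) and unmasking, which a real endpoint would also perform.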

The base framing protocol is formally defined by the following ABNF. It is important to note that this data is represented in binary, not as ASCII characters. For example, a 1-bit field with the value %x0 / %x1 is a single bit whose value is 0 or 1, not a full byte (8 bits) holding the ASCII character “0” or “1”. A 4-bit field with values in the range %x0-F is likewise 4 bits, not a byte holding the ASCII code of those characters. ABNF itself does not specify a character encoding: “Rules resolve into a string of terminal values, sometimes called characters. In ABNF, a character is merely a non-negative integer. In certain contexts, a specific mapping (encoding) of values into a character set (such as ASCII) will be specified.” Here, the specified encoding is a binary encoding in which each terminal value is encoded in the stated number of bits, which differs from field to field.

ws-frame =

  • frame-fin; The length is 1 bit
  • frame-rsv1; The length is 1 bit
  • frame-rsv2; The length is 1 bit
  • frame-rsv3; The length is 1 bit
  • frame-opcode; The length is 4 bits
  • frame-masked; The length is 1 bit
  • frame-payload-length; The length can be 7 or 7+16 or 7+64 bits
  • [frame-masking-key]; The value contains 32 bits
  • frame-payload-data; The length is n*8 bits, where n >= 0

frame-fin =

  • %x0, more frames of this message follow
  • %x1, final frame of this message
  • The length is 1 bit

frame-rsv1 =

  • %x0 / %x1. The length is 1 bit, and must be 0 if there is no negotiation

frame-rsv2 =

  • %x0 / %x1. The length is 1 bit, and must be 0 if there is no negotiation

frame-rsv3 =

  • %x0 / %x1. The length is 1 bit, and must be 0 if there is no negotiation

frame-opcode =

  • frame-opcode-non-control
  • frame-opcode-control
  • frame-opcode-cont

frame-opcode-cont =

  • %x0, continuation frame
  • The length is 4 bits

frame-opcode-non-control =

  • %x1, text frame
  • %x2, binary frame
  • %x3-7, reserved for future non-control frames
  • The length is 4 bits

frame-opcode-control =

  • %x8, connection close
  • %x9, ping frame
  • %xA, pong frame
  • %xB-F, reserved for future control frames
  • The length is 4 bits

frame-masked =

  • %x0, the frame is not masked and there is no frame-masking-key
  • %x1, the frame is masked and frame-masking-key is present
  • The length is 1 bit

frame-payload-length =

  • %x00-7D, the length is 7 bits
  • %x7E frame-payload-length-16, the length is 7+16 bits
  • %x7F frame-payload-length-63, the length is 7+64 bits

frame-payload-length-16 =

  • %x0000-FFFF, the length is 16 bits

frame-payload-length-63 =

  • %x0000000000000000-7FFFFFFFFFFFFFFF, the length is 64 bits

frame-masking-key =

  • 4( %x00-FF ), present only when frame-masked is 1. The length is 32 bits

frame-payload-data =

  • frame-masked-extension-data frame-masked-application-data, when frame-masked is 1
  • frame-unmasked-extension-data frame-unmasked-application-data, when frame-masked is 0

frame-masked-extension-data =

  • *( %x00-FF ), reserved for future extensions, the length is n*8 bits, where n >= 0

frame-masked-application-data =

  • *( %x00-FF ), the length is n*8 bits, where n >= 0

frame-unmasked-extension-data =

  • *( %x00-FF ), reserved for future extensions, the length is n*8 bits, where n >= 0

frame-unmasked-application-data =

  • *( %x00-FF ), the length is n*8 bits, where n >= 0

5.3 Client-to-Server Masking

Masked frames must have the frame-masked field set to 1, as defined in Section 5.2.

The masking key is contained entirely within the frame, in the frame-masking-key field described in Section 5.2. It is used to mask the “Payload data” defined in the same section, which comprises the “Extension data” and the “Application data”.

The masking key is a 32-bit value chosen at random by the client. When preparing a masked frame, the client must pick a fresh masking key from the set of allowed 32-bit values. The masking key must be unpredictable; it must therefore be derived from a strong source of entropy, and the masking key for a given frame must not make it simple for a server or proxy to predict the masking key for a subsequent frame. The unpredictability of the masking key is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire. RFC 4086 discusses what constitutes a suitable source of entropy for security-sensitive applications.

Masking does not affect the length of the “Payload data”. To convert masked data into unmasked data, or vice versa, the following algorithm is applied. The same algorithm applies regardless of the direction of the translation; for example, the steps used to mask the data are the same as the steps used to unmask it.

Octet i of the transformed data (“transformed-octet-i”) is the XOR of octet i of the original data (“original-octet-i”) with the octet of the masking key at index i modulo 4 (“masking-key-octet-j”):

j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j

The payload length given by the frame-payload-length field does not include the length of the masking key. It is the length of the “Payload data” only, that is, the number of bytes following the masking key.
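The masking transform is short enough to write out directly. The sketch below is one straightforward way to express it, with no attempt at speed. The names `apply_mask` and `new_masking_key` are assumptions made for this example, and Python's `secrets` module is used here as one possible strong entropy source.

```python
import secrets


def new_masking_key() -> bytes:
    """Draw a fresh, unpredictable 32-bit masking key."""
    return secrets.token_bytes(4)


def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR each octet with key[i % 4]; the same call masks and unmasks."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(octet ^ key[i % 4] for i, octet in enumerate(data))
```

Because XOR is its own inverse, applying `apply_mask` twice with the same key returns the original bytes, and the output always has the same length as the input.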

5.4 Message Fragmentation

The primary purpose of fragmentation is to allow a message of unknown size to be sent without buffering it once transmission has started. If messages could not be fragmented, an endpoint would have to buffer the entire message so that its length could be computed before the first byte is sent. With fragmentation, a server or intermediary can choose a reasonably sized buffer and, when the buffer is full, write a fragment to the network.

A secondary use case for fragmentation is multiplexing, where it is undesirable for a large message on one logical channel to monopolize the output channel. Multiplexing needs to be free to split messages into smaller fragments so that the output channel can be shared. (Note: the multiplexing extension is not described in this document.)

Unless an extension specifies otherwise, frames have no semantic meaning. An intermediary may coalesce or split frames if the client and server negotiated no extensions, or if they negotiated some extensions but the intermediary understands all of them and knows how to coalesce or split frames in their presence. One implication is that, in the absence of extensions, senders and receivers must not depend on the presence of specific frame boundaries.

The rules for fragmentation are as follows:

  • An unfragmented message consists of a single frame with the FIN bit set (1) and an opcode other than 0.
  • A fragmented message consists of a single frame with the FIN bit clear (0) and an opcode other than 0, followed by zero or more frames with the FIN bit clear and opcode 0, and terminated by a single frame with the FIN bit set and opcode 0. The payloads of the fragments, concatenated in order, are conceptually equivalent to the payload of a single larger message. In the presence of extensions this may not hold, because the extension defines how “Extension data” is interpreted. For instance, “Extension data” may appear only at the beginning of the first fragment and apply to subsequent fragments, or there may be “Extension data” in every fragment that applies only to that fragment. In the absence of “Extension data”, fragmentation works as in this example: for a text message sent as three fragments, the first fragment has opcode 0x1 and the FIN bit clear, the second has opcode 0x0 and the FIN bit clear, and the third has opcode 0x0 and the FIN bit set.
  • Control frames (see Section 5.5) may be injected in the middle of a fragmented message. Control frames themselves must not be fragmented.
  • Message fragments must be delivered to the recipient in the order sent by the sender.
  • The fragments of one message must not be interleaved with the fragments of another message unless an extension has been negotiated that can interpret the interleaving.
  • An endpoint must be capable of handling control frames in the middle of a fragmented message.
  • A sender may create fragments of any size for non-control messages.
  • Clients and servers must support receiving both fragmented and unfragmented messages.
  • As control frames cannot be fragmented, an intermediary must not attempt to change the fragmentation of a control frame.
  • An intermediary must not change the fragmentation of a message if any reserved bit values are used and the intermediary does not know their meaning.
  • An intermediary must not change the fragmentation of any message on a connection where extensions have been negotiated and the intermediary is not aware of the semantics of those extensions. Similarly, an intermediary that did not see the WebSocket handshake (and was not notified of its content) must not change the fragmentation of any message on the resulting connection.
  • As a consequence of these rules, all fragments of a message are of the same type, as set by the opcode of the first fragment. Since control frames cannot be fragmented, the type of all fragments in a message must be text, binary, or one of the reserved opcodes.
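The framing rules in this list can be illustrated with a pair of helpers that split a payload into fragments and join them back together. This is a sketch assuming no extensions are in use. Frames are modeled as plain `(fin, opcode, payload)` tuples, and the names `fragment` and `reassemble` are invented for this example.

```python
from typing import Iterable, List, Tuple

FrameTuple = Tuple[bool, int, bytes]  # (fin, opcode, payload), a hypothetical model


def fragment(opcode: int, payload: bytes, size: int) -> List[FrameTuple]:
    """Split one message: first frame carries the opcode, the rest use 0x0."""
    if size <= 0:
        raise ValueError("fragment size must be positive")
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)] or [b""]
    last = len(chunks) - 1
    return [(i == last, opcode if i == 0 else 0x0, chunk)
            for i, chunk in enumerate(chunks)]


def reassemble(frames: Iterable[FrameTuple]) -> Tuple[int, bytes]:
    """Join the data frames of one message, skipping interleaved control frames."""
    message_opcode = None
    parts = []
    for fin, opcode, payload in frames:
        if opcode & 0x8:
            continue  # control frame; a real endpoint would process it here
        if message_opcode is None:
            if opcode == 0x0:
                raise ValueError("continuation frame without a first fragment")
            message_opcode = opcode
        elif opcode != 0x0:
            raise ValueError("new message started before the previous one ended")
        parts.append(payload)
        if fin:
            return message_opcode, b"".join(parts)
    raise ValueError("message was never finished")
```

With these definitions, `fragment(0x1, b"Hello", 3)` produces a text frame with FIN clear carrying `b"Hel"` followed by a continuation frame with FIN set carrying `b"lo"`.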

Note: if control frames could not be interjected, the latency of a ping, for example, would be very long when it is queued behind a large message. Hence the requirement to handle control frames in the middle of a fragmented message.

Implementation note: in the absence of any extension, a receiver does not have to buffer a whole frame in order to process it. For example, with a streaming API, part of a frame can be delivered to the application as it arrives. This assumption may not hold for all future WebSocket extensions.

5.5 Control Frames

Control frames are identified by opcodes whose most significant bit is 1. The control frame opcodes currently defined are 0x8 (Close), 0x9 (Ping), and 0xA (Pong). Opcodes 0xB-0xF are not yet defined and are reserved for future control frames.

Control frames are used to communicate state about the WebSocket connection. They may be interjected in the middle of a fragmented message.

All control frames must have a payload length of 125 bytes or less and must not be fragmented.
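The two constraints on control frames are easy to check once the header has been decoded. The sketch below uses the 125-byte limit that RFC 6455 Section 5.5 gives for control frame payloads. The function name `check_control_frame` is an assumption made for this example.

```python
MAX_CONTROL_PAYLOAD = 125  # control frames can only use the 7-bit length form


def check_control_frame(fin: bool, opcode: int, payload_length: int) -> None:
    """Raise ValueError if a control frame is fragmented or too long."""
    if not opcode & 0x8:
        return  # data frame: these limits do not apply
    if not fin:
        raise ValueError("control frames must not be fragmented")
    if payload_length > MAX_CONTROL_PAYLOAD:
        raise ValueError("control frame payload exceeds 125 bytes")
```

A receiver would typically treat either failure as a protocol error and fail the connection.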

5.5.1 Close

The Close frame has opcode 0x8.

A Close frame may contain a body (the “Application data” portion of the frame) that indicates why the connection is being closed, such as an endpoint shutting down, an endpoint having received a frame that is too large, or an endpoint having received a frame that does not conform to the format it expects. If there is a body, its first two bytes must be a 2-byte unsigned integer (in network byte order) giving one of the status codes defined in Section 7.4. After those two bytes, the body may contain UTF-8 encoded data (a reason) whose interpretation is not defined by this specification. This data is not necessarily human-readable, but may be useful for debugging or for passing information relevant to the script that opened the connection. Because the data is not guaranteed to be human-readable, clients must not show it to end users.

Close frames sent from the client to the server must be masked, as described in Section 5.3.

An application must not send any more data frames after sending a Close frame.

If an endpoint receives a Close frame and has not previously sent one, it must send a Close frame in response. (When responding, the endpoint typically echoes the status code it received.) The response should be sent as soon as practical. An endpoint may delay sending its Close frame until the current message has been sent (for example, if most of a fragmented message has already been sent, the endpoint may send the remaining fragments first). However, there is no guarantee that an endpoint that has already sent a Close frame will continue to process incoming data.

After both sending and receiving a Close frame, an endpoint considers the WebSocket connection closed and must close the underlying TCP connection. The server must close the underlying TCP connection immediately. The client should wait for the server to close the connection, but may close it at any time after sending and receiving a Close frame, for example if it has not received a TCP close from the server within a reasonable period of time.

If a client and server both send a Close frame at the same time, both endpoints will have sent and received a Close frame, and both should consider the WebSocket connection closed and close the underlying TCP connection.
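The layout of a Close frame body, a 2-byte status code followed by an optional UTF-8 reason, can be sketched as below. The names `build_close_payload` and `parse_close_payload` are assumptions made for this example. The length check reflects the 125-byte limit on control frame payloads in RFC 6455.

```python
import struct
from typing import Optional, Tuple


def build_close_payload(code: Optional[int] = None, reason: str = "") -> bytes:
    """Build the application data of a Close frame (may be empty)."""
    if code is None:
        if reason:
            raise ValueError("a reason requires a status code")
        return b""
    body = struct.pack("!H", code) + reason.encode("utf-8")  # network byte order
    if len(body) > 125:
        raise ValueError("close payload exceeds the control frame limit")
    return body


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Split a Close frame body into (status_code, reason)."""
    if not payload:
        return None, ""
    if len(payload) < 2:
        raise ValueError("a non-empty close body needs a 2-byte status code")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")
```

The reason string returned by `parse_close_payload` is intended for logs and debugging, not for display to end users.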

5.5.2 Ping

The Ping frame has opcode 0x9.

A Ping frame may contain “Application data”.

On receiving a Ping frame, an endpoint must send a Pong frame in response, unless it has already received a Close frame. It should respond with the Pong frame as soon as practical. Pong frames are discussed in Section 5.5.3.

An endpoint may send a Ping frame at any time after the connection is established and before it is closed.

Note: a Ping frame may serve either as a keepalive or as a means of verifying that the remote endpoint is still responsive.

5.5.3 Pong

The Pong frame has opcode 0xA.

Section 5.5.2 sets out requirements that apply to both Ping and Pong frames.

A Pong frame sent in response to a Ping frame must carry “Application data” identical to that of the Ping frame it answers.

If an endpoint receives a Ping frame and has not yet sent a Pong frame in response to one or more previous Ping frames, it may choose to send a Pong frame only for the most recently processed Ping frame.

A Pong frame may also be sent unsolicited. This serves as a unidirectional heartbeat. No response to an unsolicited Pong frame is expected.
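The Ping/Pong rules can be sketched from a server's point of view, where outgoing frames are unmasked. Both helper names are assumptions made for this example: `pings_to_answer` models the option of answering only the most recent unanswered Ping, and `build_unmasked_pong` echoes the Ping's application data unchanged.

```python
from typing import List

PONG = 0xA


def pings_to_answer(pending_ping_payloads: List[bytes]) -> List[bytes]:
    """Given unanswered Ping payloads (oldest first), keep only the latest."""
    return pending_ping_payloads[-1:]


def build_unmasked_pong(ping_payload: bytes) -> bytes:
    """Build a server-side Pong frame carrying the Ping's application data."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payload exceeds 125 bytes")
    # FIN set, opcode 0xA, MASK clear, 7-bit length, then the echoed payload.
    return bytes([0x80 | PONG, len(ping_payload)]) + ping_payload
```

A client would build the same frame but with the MASK bit set and the payload masked as described in Section 5.3.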

5.6 Data Frames

Data frames (that is, non-control frames) are identified by opcodes whose most significant bit is 0. The data frame opcodes currently defined are 0x1 (Text) and 0x2 (Binary). Opcodes 0x3-0x7 are reserved for future non-control frames.

Data frames carry application-layer and/or extension-layer data. The opcode determines how the data is interpreted:

Text

The “Payload data” is text encoded as UTF-8. Note that a particular text frame may contain a partial UTF-8 sequence; the whole message, however, must be valid UTF-8. The handling of invalid UTF-8 in a reassembled message is described in Section 8.1.

Binary

The “Payload data” is arbitrary binary data whose interpretation is left solely to the application layer.
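Because a text frame may end in the middle of a multi-byte UTF-8 sequence, a receiver cannot simply decode each fragment on its own. One possible approach, sketched here with Python's standard incremental decoder, carries the partial sequence over to the next fragment. The function name `decode_text_fragments` is an assumption made for this example.

```python
import codecs
from typing import List


def decode_text_fragments(fragments: List[bytes]) -> str:
    """Decode the fragments of one text message as a single UTF-8 stream."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    pieces = []
    for index, fragment in enumerate(fragments):
        final = index == len(fragments) - 1
        # With final=False the decoder holds back an incomplete trailing
        # sequence; with final=True an incomplete sequence is an error.
        pieces.append(decoder.decode(fragment, final))
    return "".join(pieces)
```

If the reassembled message is not valid UTF-8, the decoder raises `UnicodeDecodeError`, at which point an endpoint would apply the handling described in Section 8.1.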

5.7 Examples

  • A single-frame unmasked text message: 0x81 0x05 0x48 0x65 0x6c 0x6c 0x6f (contains “Hello”)
  • A single-frame masked text message: 0x81 0x85 0x37 0xfa 0x21 0x3d 0x7f 0x9f 0x4d 0x51 0x58 (contains “Hello”)
  • A fragmented unmasked text message: 0x01 0x03 0x48 0x65 0x6c (contains “Hel”), then 0x80 0x02 0x6c 0x6f (contains “lo”)
  • An unmasked Ping request and a masked Pong response: 0x89 0x05 0x48 0x65 0x6c 0x6c 0x6f (contains a body of “Hello”, but the body is arbitrary), then 0x8a 0x85 0x37 0xfa 0x21 0x3d 0x7f 0x9f 0x4d 0x51 0x58 (contains a body of “Hello”, matching the body of the Ping)
  • 256 bytes of binary data in a single unmasked frame: 0x82 0x7E 0x0100 [256 bytes of binary data]
  • 64 KiB of binary data in a single unmasked frame: 0x82 0x7F 0x0000000000010000 [65536 bytes of binary data]
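The sample frames above can be reproduced with a small frame builder, which is a convenient way to check an implementation against the specification. This is a sketch assuming no extensions. The function name `build_frame` and its parameters are invented for this example.

```python
import struct
from typing import Optional


def build_frame(opcode: int, payload: bytes, fin: bool = True,
                masking_key: Optional[bytes] = None) -> bytes:
    """Serialize one frame; pass a 4-byte masking_key to send it masked."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if masking_key is not None else 0x00
    n = len(payload)
    if n <= 125:
        header = bytes([first, mask_bit | n])
    elif n <= 0xFFFF:
        header = bytes([first, mask_bit | 126]) + struct.pack("!H", n)
    else:
        header = bytes([first, mask_bit | 127]) + struct.pack("!Q", n)
    if masking_key is None:
        return header + payload
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    masked = bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))
    return header + masking_key + masked


key = bytes.fromhex("37fa213d")  # the masking key used in the samples
hello_unmasked = build_frame(0x1, b"Hello")
hello_masked = build_frame(0x1, b"Hello", masking_key=key)
```

Here `hello_unmasked` and `hello_masked` correspond to the first two samples. Using a fixed key is only for reproducing the examples; real clients must pick a fresh random key for each frame.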

5.8 Extensibility

The protocol is designed to allow for extensions, which add capabilities to the base protocol. The endpoints of a connection must negotiate the use of any extensions during the opening handshake. This specification provides opcodes 0x3 through 0x7 and 0xB through 0xF, the “Extension data” field, and the frame-rsv1, frame-rsv2, and frame-rsv3 bits of the frame header for use by extensions. The negotiation of extensions is discussed in more detail in Section 9.1. Below are some anticipated uses of extensions. The list is neither complete nor prescriptive.

  • “Extension data” may be placed in the “Payload data” before the “Application data”.
  • Reserved bits can be allocated for per-frame needs.
  • Reserved opcode values can be defined.
  • Reserved bits can be allocated to the opcode field if more opcode values are needed.
  • A reserved bit or an “extension” opcode can be defined that allocates additional bits out of the “Payload data” to define larger opcodes or more per-frame bits.