Introduction
I’ve written two previous articles about WebSocket: “From Getting started with WebSocket to writing an Open Source library” and “How Python crawls WebSocket Data in Real time.” Today’s article has a similar structure to those, but it goes deeper and is aimed at readers who want to fully master the WebSocket protocol, so I’m sharing it with you here.
WebSocket is a protocol for full-duplex communication over a single TCP connection, which makes data exchange between the client and the server easier. WebSocket is usually applied in scenarios that demand real-time performance, such as live event data, stock and securities quotes, web chat and online drawing.
WebSocket is a completely different protocol from HTTP, but it is just as widely used. Back-end developers, front-end developers, crawler engineers and information security workers should all understand the WebSocket protocol.
In this article, you will learn the following:
- Read the WebSocket protocol specification document RFC6455
- WebSocket versus HTTP
- Data frame format and field meaning
- Client-server interaction flow
- How clients and servers stay connected
- When to disconnect
This article is intended for web developers and product managers.
Getting started
WebSocket is a protocol for full duplex communication over a single TCP connection. The WebSocket communication protocol was standardized by the IETF in 2011 as RFC6455, and is supplemented by RFC7936. This leaves many readers wondering: What is an RFC?
An RFC (Request for Comments) is one of a series of numbered documents that includes both drafts and standards. Almost all Internet communication protocols are recorded in RFCs, such as the HTTP standard, the WebSocket standard, the Base64 encoding specification and so on. The RFC series also covers many other topics. In this article, our study and discussion of WebSocket is based on RFC6455.
Origins of the WebSocket protocol
Before the WebSocket protocol, websites used polling to achieve something like “real-time data updates.” The quotation marks matter: polling does not deliver truly real-time updates. Polling means that the client actively sends an HTTP request to the server at a fixed interval to check whether there is new data. The following diagram depicts the polling process:
First, the client sends an HTTP request to the server, in effect asking, “Do you have any new data?” After receiving the request, the server responds according to the actual situation (with or without data):
- If there is data, the server sends it to the client;
- If there is no data, the server tells the client to ask again later.
The obvious disadvantage of this question-and-answer approach is that the browser is constantly making requests to the server. Because every HTTP request carries long headers (such as User-Agent, Referer, Host, etc.) while the useful data is only a small portion of each exchange, polling is likely to waste a lot of bandwidth.
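To make the cost of polling concrete, here is a minimal in-memory sketch. The `fake_server` function and its 1-in-5 chance of having new data are assumptions made up for illustration; no real network request is sent.

```python
import random


def fake_server(rng):
    """Hypothetical server: has new data for roughly 1 in 5 requests."""
    return "new data" if rng.random() < 0.2 else None


def poll(times, seed=0):
    """Ask the server `times` times and count useful versus empty answers."""
    rng = random.Random(seed)
    useful = sum(1 for _ in range(times) if fake_server(rng) is not None)
    return useful, times - useful


useful, wasted = poll(100)
print(f"{useful} responses carried data, {wasted} were empty")
```

Every empty response in this sketch still costs a full HTTP request and response in a real deployment, which is the waste described above.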
A better way to “update data in real time” than polling is Comet. This technique allows two-way communication, but still requires repeated requests. Comet also relies on long-lived HTTP connections, which consume server resources. Against this background, HTML5 defined the WebSocket protocol, which is more resource efficient and enables stable real-time communication between the two ends. Under the WebSocket protocol, the client and server only need to complete one handshake to create a persistent connection and transmit data in both directions. The following figure describes the process of two-end communication in the WebSocket protocol:
Advantages of WebSocket
Compared with HTTP, WebSocket has the advantages of low overhead, better real-time performance, binary message transmission, extensibility and better compression. These advantages are described below:
Less overhead
WebSocket requires only one handshake, and after that only data frames are transmitted each time data is sent. Under the HTTP protocol, each request needs to carry complete request header information, such as User-Agent, Referer and Host. So WebSocket has much less overhead than HTTP.
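A rough size comparison illustrates the difference. The HTTP header values below are sample values chosen for illustration, and `ws_header_size` is a hypothetical helper that follows the frame layout described later in this article (a 2-byte base header, an optional extended length, and an optional 4-byte masking key).

```python
# Illustrative HTTP request head that polling would repeat on every request.
http_head = (
    "GET /updates HTTP/1.1\r\n"
    "Host: server.example.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
    "Referer: http://example.com/\r\n"
    "\r\n"
).encode()


def ws_header_size(payload_len, masked):
    """Size in bytes of a WebSocket frame header for a given payload length."""
    size = 2                  # FIN/RSV/opcode byte + MASK/length byte
    if payload_len > 65535:
        size += 8             # 64-bit extended payload length
    elif payload_len > 125:
        size += 2             # 16-bit extended payload length
    if masked:
        size += 4             # masking key on client-to-server frames
    return size


print("HTTP request head:", len(http_head), "bytes")
print("WebSocket frame header:", ws_header_size(5, masked=True), "bytes")
```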
More real-time
Because the protocol is full-duplex, the server can proactively send data to the client at any time. Compared with the question-and-answer style of HTTP, the latency of data transmission under the WebSocket protocol is significantly lower.
Binary message support
WebSocket defines binary frames, which makes it easier to process binary content.
Extensibility
Developers can extend the protocol or implement custom subprotocols.
Better compression
With appropriate extension support, WebSocket can reuse the context of previous content, which can significantly improve the compression ratio when transmitting similarly structured data.
WebSocket protocol specification
WebSocket is a communication protocol whose specifications and standards are recorded in RFC6455. The document has 14 sections, of which only 11 concern the protocol itself:
- Introduction
- Terminology and other conventions
- WebSocket URIs
- Opening handshake
- Data framing
- Sending and receiving data
- Closing the connection
- Error handling
- Extensions
- Security considerations
- IANA considerations
This article covers sections 4, 5, 6 and 7, which are also the most important parts of the WebSocket protocol. We will study them next.
Client-server interaction flow
Before the connection between the client and server is established, the two ends communicate over HTTP. Once the connection is established, they use the WebSocket protocol. The following figure illustrates the flow of the interaction:
First, the client sends an HTTP request to the server that carries the information required by the specification and indicates that it wants to upgrade the protocol to WebSocket. This request is called an upgrade request, and the whole process by which the two ends upgrade the protocol is called a handshake. The server then verifies the request sent by the client; if it conforms to the specification, the server switches the protocol to WebSocket and responds to the client with a message that the upgrade succeeded. Finally, both parties can push messages to each other over the WebSocket protocol. The first thing we need to study, then, is the handshake.
The opening handshake
Let’s start with the Opening Handshake section of RFC6455. The passage reads as follows:
The opening handshake is intended to be compatible with HTTP-based server-side software and intermediaries, so that a single port can be used by both HTTP clients talking to that server and WebSocket clients talking to that server. To this end, the WebSocket client's handshake is an HTTP Upgrade request:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
In compliance with [RFC2616], header fields in the handshake may be sent by the client in any order, so the order in which different header fields are received is not significant.
The handshake itself is not WebSocket but HTTP, and the handshake request is called an upgrade request. During the handshake, the client uses the Connection and Upgrade header fields and their values to ask the server to upgrade the current communication protocol to a specified protocol, here WebSocket. The other header fields and their values are used as follows:
- GET /chat HTTP/1.1: the request is based on HTTP/1.1 and the request method is GET;
- Sec-WebSocket-Protocol: specifies the subprotocols the client wants to use;
- Sec-WebSocket-Version: indicates the protocol version. The versions on both ends must match. The current version of the WebSocket protocol is 13;
- Origin: indicates which site the request came from;
- Host: indicates the target host;
- Sec-WebSocket-Key: a random value that the server must transform and send back, which lets the client confirm that the server really understood the WebSocket handshake and prevents attackers from spoofing the server.
In other words, the client only needs to send an HTTP request to the server as described above.
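As a minimal sketch of what such a request looks like in code, the hypothetical helper `build_upgrade_request` below assembles the raw request bytes. The function name and its parameters are assumptions for illustration. RFC6455 requires Sec-WebSocket-Key to be a randomly selected 16-byte value that is Base64-encoded, which is what `os.urandom` and `base64.b64encode` produce here.

```python
import base64
import os


def build_upgrade_request(host, path="/chat", origin=None):
    """Return (request_bytes, key) for a WebSocket opening handshake."""
    # The key is 16 random bytes, Base64-encoded (24 characters).
    key = base64.b64encode(os.urandom(16)).decode()
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if origin:
        lines.append(f"Origin: {origin}")
    # An HTTP request head ends with an empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode(), key


request, key = build_upgrade_request("server.example.com", origin="http://example.com")
print(request.decode())
```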
After receiving a request from a client, the server verifies the request according to RFC6455. If verification succeeds, the handshake succeeds, and the server should respond to the client with the following content as agreed:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
The server provides a response status code representing the connection result. The 101 status code (Switching Protocols) indicates that the server accepts the upgrade. Connection and Upgrade indicate that the protocol has been switched to WebSocket. Sec-WebSocket-Accept is not an encrypted value but a digest that confirms the server processed the handshake: the server appends a fixed GUID defined in RFC6455 to the Sec-WebSocket-Key sent by the client, takes the SHA-1 hash of the result, and Base64-encodes it. Sec-WebSocket-Protocol indicates the subprotocol both ends agreed on.
This completes the handshake between the client and server. The communication protocol will be switched from HTTP to WebSocket.
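The Sec-WebSocket-Accept value in the response above can be reproduced in a few lines. This is a minimal sketch: the function name `accept_value` is an assumption, while the GUID is the fixed string defined by RFC6455 and the key is the sample key from the request quoted earlier.

```python
import base64
import hashlib

# Fixed GUID defined by RFC6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()


print(accept_value("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it receives, which is how it confirms that the server really handled the WebSocket handshake.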
Send and receive data
After the two parties shake hands and confirm the agreement, they can send messages to each other. The client and server send messages to each other in the same way that we send messages to each other in social applications. For example:
client: Hello, Server boy.
server: Hello, Client Man.
Of course, Hello Server boy and Hello Client Man are metaphors that help us understand. In fact, the data transfer format in the WebSocket protocol is not directly presented in this way.
Data frame
WebSocket transmits data frames one by one. The convention of data frames is as follows:
In the WebSocket Protocol, data is transmitted using a sequence of frames. To avoid confusing network intermediaries (such as intercepting proxies) and for security reasons that are further discussed in Section 10.3, a client MUST mask all frames that it sends to the server (see Section 5.3 for further details). (Note that masking is done whether or not the WebSocket Protocol is running over TLS.) The server MUST close the connection upon receiving a frame that is not masked. In this case, a server MAY send a Close frame with a status code of 1002 (protocol error) as defined in Section 7.4.1. A server MUST NOT mask any frames that it sends to the client. A client MUST close a connection if it detects a masked frame. In this case, it MAY use the status code 1002 (protocol error) as defined in Section 7.4.1. (These rules might be relaxed in a future specification.)
The base framing protocol defines a frame type with an opcode, a payload length, and designated locations for "Extension data" and "Application data", which together define the "Payload data". Certain bits and opcodes are reserved for future expansion of the protocol.
A data frame MAY be transmitted by either the client or the server at any time after opening handshake completion and before that endpoint has sent a Close frame (Section 5.5.1).
Data is not sent over the connection as plain encoded text; it is carried inside data frames. The following figure describes the composition of data frames:
A data frame consists of FIN, RSV1, RSV2, RSV3, Opcode, Mask, Payload length, Masking-key, and Payload Data. Let’s look at the meaning or role of each component.
FIN
Occupies 1 bit. The value is 0 or 1, with the following meanings:
- 0: this is not the last fragment of the message;
- 1: this is the last fragment of the message.
RSV1, RSV2, RSV3
Each occupies 1 bit, and the value is normally 0. When the client and server negotiate a WebSocket extension, these three flag bits can be non-zero, and the meaning of the values is defined by the extension. If a non-zero value appears but no extension that defines it has been negotiated, the connection fails.
Opcode
Occupies 4 bits. The value can be any of %x0, %x1, %x2, %x3-7, %x8, %x9, %xA, or %xB-F, with the following meanings:
- %x0: indicates a continuation frame. The message is being sent in fragments, and this frame is one of the fragments;
- %x1: indicates a text frame;
- %x2: indicates a binary frame;
- %x3-7: reserved for non-control frames defined in the future;
- %x8: indicates that the connection is being closed. It is a control frame;
- %x9: indicates a heartbeat request (ping);
- %xA: indicates a heartbeat response (pong);
- %xB-F: reserved for control frames defined in the future.
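The FIN, RSV and Opcode fields all live in the first byte of a frame, so they can be separated with bitwise AND. The sketch below is illustrative only; `parse_first_byte` and the `OPCODES` table are names made up for this example.

```python
# Opcodes defined by RFC6455; every other value is reserved.
OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def parse_first_byte(byte):
    """Split the first byte of a frame into FIN, RSV1-3 and opcode."""
    opcode = byte & 0b00001111
    return {
        "fin": bool(byte & 0b10000000),
        "rsv1": bool(byte & 0b01000000),
        "rsv2": bool(byte & 0b00100000),
        "rsv3": bool(byte & 0b00010000),
        "opcode": opcode,
        "type": OPCODES.get(opcode, "reserved"),
    }


# 0x81 = 1000 0001: a final (FIN=1) text frame.
print(parse_first_byte(0x81))
```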
Mask
Occupies 1 bit, and the value is 0 or 1. A value of 1 indicates that the payload has been masked with an XOR operation and must be unmasked by the receiver; a value of 0 indicates that no masking was applied.
Payload length
Occupies 7 bits, 7+16 bits, or 7+64 bits. The 7-bit value can be any number from 0 to 127, with the following meanings:
- 0 to 125: the data length is equal to the value;
- 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data;
- 127: the next 8 bytes represent a 64-bit unsigned integer (highest bit 0) whose value is the length of the data.
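The three length encodings can be sketched with the standard `struct` module. The helper name `encode_length` is an assumption for illustration; it returns the second header byte (the Mask bit plus the 7-bit length) followed by any extended length bytes, in network byte order.

```python
import struct


def encode_length(length, masked=False):
    """Return the MASK/length byte plus any extended payload length bytes."""
    mask_bit = 0b10000000 if masked else 0
    if length <= 125:
        # The length fits directly in the 7-bit field.
        return struct.pack("!B", mask_bit | length)
    if length < 2**16:
        # 126 signals that a 16-bit unsigned length follows.
        return struct.pack("!BH", mask_bit | 126, length)
    if length < 2**63:
        # 127 signals that a 64-bit unsigned length follows (highest bit 0).
        return struct.pack("!BQ", mask_bit | 127, length)
    raise ValueError("payload too long")


for n in (5, 126, 70000):
    print(n, encode_length(n).hex())
```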
Masking
Masking is not intended to prevent data leaks, but rather to prevent proxy cache poisoning attacks, a problem that existed in earlier versions of the protocol. Data sent from the client to the server must be masked; data sent from the server to the client must not be masked.
If the server receives a frame that has not been masked, it must close the connection. If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the payload data.
Mask is 1 for all data frames sent by the client to the server.
Masking algorithm: a cyclic XOR applied byte by byte. For the byte at index i of the payload, take i modulo 4 to select the corresponding byte x of the Masking-key, then XOR the payload byte with x. Applying the same operation to masked data recovers the original bytes.
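The masking algorithm is short enough to sketch in full. The function name `apply_mask` is an assumption; the key and the expected bytes are taken from the masked "Hello" example in RFC6455, so the output can be checked against the specification.

```python
from itertools import cycle


def apply_mask(data: bytes, masking_key: bytes) -> bytes:
    """XOR each byte of data with the 4-byte masking key, repeated cyclically."""
    if len(masking_key) != 4:
        raise ValueError("the masking key must be exactly 4 bytes")
    return bytes(b ^ k for b, k in zip(data, cycle(masking_key)))


key = bytes([0x37, 0xFA, 0x21, 0x3D])
masked = apply_mask(b"Hello", key)
print(masked.hex())             # 7f9f4d5158
print(apply_mask(masked, key))  # b'Hello'
```

Because XOR is its own inverse, the same function both masks and unmasks.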
Masking-key
Occupies 0 or 4 bytes, depending on the value of Mask:
- Mask is 0: there is no Masking-key;
- Mask is 1: a 4-byte Masking-key is present.
Payload Data
After receiving a data frame, each end can process the Payload Data or extract it directly, depending on the values of the other frame components.
Data sending and receiving process
Now that we know the format of the data frames WebSocket transmits, we can look at how data is sent and received. After a WebSocket connection is established, either end can send a message to the other, and on the wire each message is carried as one or more data frames. The information we input or output, however, is usually “plaintext”, so before a message is sent the “plaintext” must be converted into data frames, and at the receiving end the data frames must be converted back into “plaintext” according to the same rules. The following figure describes the main process of sending and receiving Hello and World at both ends:
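The conversion in both directions can also be sketched in code. This is a simplified illustration that handles only single, unfragmented text frames; `encode_text_frame`, `decode_frame` and `xor_mask` are hypothetical names, not functions from any library.

```python
import os
import struct
from itertools import cycle


def xor_mask(data, key):
    """Cyclic XOR of data with a 4-byte key (masks and unmasks)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encode_text_frame(text, masked=True, key=None):
    """Wrap text in a single final text frame (FIN=1, opcode=%x1)."""
    payload = text.encode("utf-8")
    n = len(payload)
    mask_bit = 0x80 if masked else 0
    head = bytes([0x80 | 0x1])
    if n <= 125:
        head += struct.pack("!B", mask_bit | n)
    elif n < 2**16:
        head += struct.pack("!BH", mask_bit | 126, n)
    else:
        head += struct.pack("!BQ", mask_bit | 127, n)
    if masked:
        key = key or os.urandom(4)  # clients pick a fresh random key per frame
        return head + key + xor_mask(payload, key)
    return head + payload


def decode_frame(frame):
    """Return (fin, opcode, payload) for one complete frame."""
    fin, opcode = bool(frame[0] & 0x80), frame[0] & 0x0F
    masked, n = bool(frame[1] & 0x80), frame[1] & 0x7F
    pos = 2
    if n == 126:
        (n,) = struct.unpack("!H", frame[pos:pos + 2])
        pos += 2
    elif n == 127:
        (n,) = struct.unpack("!Q", frame[pos:pos + 8])
        pos += 8
    if masked:
        key = frame[pos:pos + 4]
        pos += 4
        return fin, opcode, xor_mask(frame[pos:pos + n], key)
    return fin, opcode, frame[pos:pos + n]


frame = encode_text_frame("Hello")  # client to server, masked
print(decode_frame(frame))          # (True, 1, b'Hello')
```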
Keeping and closing connections
A WebSocket connection between two ends can in principle stay open for a long time, but in practice it is not kept open forever. If the server keeps every connection open while many of the connected clients are inactive, it wastes a lot of resources.
How does the server determine whether a client is active?
The server periodically sends a data frame with opcode %x9 to all clients. This data frame is called a Ping frame. When a client receives a Ping frame, it must reply with a data frame with opcode %xA (also known as a Pong frame); otherwise the server may close the connection on its own initiative. Conversely, if the server sends a Ping frame and receives a Pong frame in response, the client is active and the connection should be kept open.
To close the connection, one end sends the other a data frame with opcode %x8, which is called a Close frame.
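The Ping, Pong and Close frames described above are all control frames, and a minimal sketch of building and answering them follows. The names `control_frame` and `reply_to` are assumptions for illustration, and masking is deliberately left out to keep the example short: the frames shown are the unmasked form a server would send, while a client would additionally mask its replies.

```python
import struct

OP_CLOSE, OP_PING, OP_PONG = 0x8, 0x9, 0xA


def control_frame(opcode, payload=b""):
    """Build an unmasked control frame (FIN=1, payload at most 125 bytes)."""
    if len(payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    return bytes([0x80 | opcode, len(payload)]) + payload


def reply_to(opcode, payload=b""):
    """Return the control frame to send back, or None for other frames."""
    if opcode == OP_PING:
        return control_frame(OP_PONG, payload)       # Pong echoes the Ping payload
    if opcode == OP_CLOSE:
        return control_frame(OP_CLOSE, payload[:2])  # echo the 2-byte status code
    return None


ping = control_frame(OP_PING, b"hi")
close = control_frame(OP_CLOSE, struct.pack("!H", 1000))  # 1000 = normal closure
print(ping.hex(), reply_to(OP_PING, b"hi").hex(), close.hex())
```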
Reading real code: Python
All of the above comes from the WebSocket protocol specification in RFC6455. Having studied the theory, we can use some examples (code and pseudocode) to deepen our understanding.
Echo Test is a test platform provided by WebSocket.org that allows developers to test WebSocket connections, message sending, and message receiving. The code demos below are also based on Echo Test.
Client handshake
As mentioned above, when the client sends an upgrade request to the server, the request header is as follows:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
The corresponding Python code is as follows:
import requests
url = 'http://echo.websocket.org/?encoding=text'
header = {
    "Host": "echo.websocket.org",
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Key": "9GxOnSwEuBNbLeBwiltymg==",
    "Origin": "http://www.websocket.or",
    "Sec-WebSocket-Protocol": "chat, superchat",
    "Sec-WebSocket-Version": "13"
}
# The upgrade request is an ordinary HTTP GET carrying the handshake headers
resp = requests.get(url, headers=header)
print(resp.status_code)
When the code runs, it prints 101, indicating that the server accepted the upgrade request sent by the code above.
Data is converted to data frames
Converting data to data frames involves a lot of detail and requires a full WebSocket client to run. This article does not show the complete code structure, only the relevant logic. A full WebSocket client is available in the open source library I wrote, aiowebsocket, which can be cloned from GitHub.
After cloning the repository, open freams.py, the main file responsible for data frame conversion.
First look at the write() method, which data passes through when the sender sends it. The complete code for the write() method is as follows:
# Relies on module-level imports: io, random, and pack from struct
async def write(self, fin, code, message, mask=True, rsv1=0, rsv2=0, rsv3=0):
    """Convert a message to a data frame and send it.

    Client data frames must be masked, so mask defaults to True.
    """
    head1, head2 = self.pack_message(fin, code, mask, rsv1, rsv2, rsv3)
    output = io.BytesIO()
    length = len(message)
    # Payload length: 7 bits, 7+16 bits or 7+64 bits depending on size
    if length < 126:
        output.write(pack('!BB', head1, head2 | length))
    elif length < 2**16:
        output.write(pack('!BBH', head1, head2 | 126, length))
    elif length < 2**64:
        output.write(pack('!BBQ', head1, head2 | 127, length))
    else:
        raise ValueError('Message is too long')
    if mask:
        # Generate a random 4-byte masking key and mask the payload with it
        mask_bits = pack('!I', random.getrandbits(32))
        output.write(mask_bits)
        message = self.message_mask(message, mask_bits)
    output.write(message)
    self.writer.write(output.getvalue())
First, the pack_message() method is called to construct FIN, Opcode, RSV1, RSV2, and RSV3 in the data frame. The Payload length of the data frame is constructed based on the length of the message. The data is then masked according to whether the sender is a client or a server. Finally, the data is put into the data frame, and the data frame is sent to the receiver. The pack_message() method used here is as follows:
@staticmethod
def pack_message(fin, code, mask, rsv1=0, rsv2=0, rsv3=0):
    """Build the first two header bytes of a data frame.

    Conversion rule reference: https://tools.ietf.org/html/rfc6455#section-5.2
    """
    head1 = (
        (0b10000000 if fin else 0)
        | (0b01000000 if rsv1 else 0)
        | (0b00100000 if rsv2 else 0)
        | (0b00010000 if rsv3 else 0)
        | code
    )
    head2 = 0b10000000 if mask else 0  # Whether to mask or not
    return head1, head2
The message_mask() method used to perform the mask operation is as follows:
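# Relies on a module-level import: cycle from itertools
@staticmethod
def message_mask(message: bytes, mask):
    if len(mask) != 4:
        raise FrameError("The 'mask' must contain 4 bytes")
    # XOR each payload byte with the masking key, repeating the 4-byte key
    return bytes(b ^ m for b, m in zip(message, cycle(mask)))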
This is the main code for converting data into data frames and sending them to the receiver.
Data frames are converted to data
Again with the freams.py file, this time we’ll look at the read() method. After the receiver receives the data, it passes through this method. The complete code for the read() method is as follows:
async def read(self, text=False, mask=False, maxsize=None):
    """Return the message carried by the next data frame."""
    fin, code, rsv1, rsv2, rsv3, message = await self.unpack_frame(mask, maxsize)
    # Determine subsequent operations based on the opcode
    await self.extra_operation(code, message)
    if any([rsv1, rsv2, rsv3]):
        logging.warning('RSV not 0')
    if not fin:
        logging.warning('Fragmented control frame:Not FIN')
    # Convert between bytes and str when the frame type and requested type differ
    if code == DataFrames.binary.value and text:
        if isinstance(message, bytes):
            message = message.decode()
    if code == DataFrames.text.value and not text:
        if isinstance(message, str):
            message = message.encode()
    return message
First, the unpack_frame() method is called to extract FIN, Opcode, RSV1, RSV2, RSV3, and Payload Data (message in the code) from the Data frame. Subsequent actions are then determined according to Opcode, such as extracting data, closing the connection, sending a Ping or Pong frame, etc.
The complete code for the unpack_frame() method is as follows:
async def unpack_frame(self, mask=False, maxsize=None):
    reader = self.reader.readexactly
    frame_header = await reader(2)
    head1, head2 = unpack('!BB', frame_header)
    # First byte: FIN, RSV1-3 and the 4-bit opcode
    fin = bool(head1 & 0b10000000)
    rsv1 = bool(head1 & 0b01000000)
    rsv2 = bool(head1 & 0b00100000)
    rsv3 = bool(head1 & 0b00010000)
    code = head1 & 0b00001111
    # Second byte: Mask bit and 7-bit payload length
    if bool(head2 & 0b10000000) != mask:
        raise FrameError("Incorrect masking")
    length = head2 & 0b01111111
    if length == 126:
        message = await reader(2)
        length, = unpack('!H', message)
    elif length == 127:
        message = await reader(8)
        length, = unpack('!Q', message)
    if maxsize and length > maxsize:
        raise FrameError("Message length is too long ({} > {})".format(length, maxsize))
    if mask:
        mask_bits = await reader(4)
    # Read the payload, then unmask it if the frame was masked
    message = await reader(length)
    if mask:
        message = self.message_mask(message, mask_bits)
    return fin, code, rsv1, rsv2, rsv3, message
Bitwise AND operations are used to extract components such as FIN, RSV1 and Opcode from the header bytes of the data frame, and the Payload Data (message in the code) is then read according to the decoded length. If you are not familiar with bit operations, you can refer to my article “Seven-minute comprehensive Understanding of bit-operation” published on my WeChat public account. The message_mask() method is then called if the frame is masked, and the resulting components are returned to the caller.
Conclusion
In this article, we looked at the origins of the WebSocket protocol and discussed its benefits. We then read the WebSocket conventions in RFC6455 and studied the interaction flow between the two ends and how connections are kept open and closed. Finally, we saw how the WebSocket protocol translates into concrete code.
WebSocket has several key points: the handshake, the conversion between data and data frames, Ping and Pong frames to keep the connection alive, and Close frames to actively close the connection. Hopefully, after reading this article, you will have a new understanding of the WebSocket protocol.