MDN: Writing WebSocket Servers

This article is fairly long and my ability is limited, so some mistakes are inevitable. I offer it as a starting point so that we can improve together.

The nature of a WebSocket server

A WebSocket server is simply a TCP application that listens on some port of a server and follows a specific protocol. Building a custom server often seems daunting, but implementing a simple WebSocket server on the platform of your choice is not that difficult.

A WebSocket server can be written in any server-side programming language that supports Berkeley sockets, such as C(++), Python, PHP, or server-side JavaScript (Node.js). What follows is not a language-specific tutorial but a guide to building your own server.

This guide assumes that you understand how HTTP works and have moderate programming experience. Depending on what your language provides, knowledge of TCP sockets may also be necessary. The scope of this tutorial is the minimum knowledge required to write a WebSocket server.

This article explains a WebSocket server from a very low-level point of view. WebSocket servers are often separate, dedicated servers (for load balancing and other practical reasons), so a reverse proxy (such as a regular HTTP server) is often used to detect WebSocket handshakes, preprocess them, and forward the client details to the real WebSocket server. This means the WebSocket server does not have to be cluttered with cookie and authentication handling, which can be done in the proxy.

WebSocket handshake rules

First, the server must listen for incoming socket connections using a standard TCP socket. Depending on your platform, this may be handled for you (mature server-side languages provide these interfaces, so you don't have to start from scratch). For example, suppose our server listens on port 8000 of example.com and responds to GET requests for /chat.
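
As a rough illustration of the listening step, here is a minimal Node.js sketch. The helper name createHandshakeListener and its callback signature are assumptions made for this example, not part of any specification. It only buffers incoming bytes until the blank line that ends the HTTP headers; bytes that arrive after the headers are ignored in this sketch.

```javascript
const net = require("net");

// Hypothetical helper: creates a TCP server that collects the bytes of the
// opening HTTP request and hands them to onRequest once the blank line
// ("\r\n\r\n") that ends the headers has arrived.
function createHandshakeListener(onRequest) {
  return net.createServer((socket) => {
    let received = Buffer.alloc(0);
    const onData = (chunk) => {
      received = Buffer.concat([received, chunk]);
      const end = received.indexOf("\r\n\r\n");
      if (end === -1) return; // headers not complete yet
      socket.removeListener("data", onData);
      // Pass along only the header section, decoded byte-for-byte.
      onRequest(socket, received.subarray(0, end + 4).toString("latin1"));
    };
    socket.on("data", onData);
    socket.on("error", () => socket.destroy());
  });
}

// Usage (not executed here): createHandshakeListener(handler).listen(8000);
```

The server does not start listening until listen() is called, so the port (8000 in the article's example) stays a deployment decision.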

Warning: The server may listen on any port it chooses, but ports other than 80 and 443 may run into problems with firewalls or proxies. Port 443 works in most cases, but of course it requires a secure connection (TLS/SSL). Also note that most browsers do not allow connections from secure pages to insecure WebSocket servers.

The handshake is the "Web" in WebSockets: it is the bridge from HTTP to WebSocket. During the handshake the details of the connection are negotiated, and either party can back out before completion if the terms are unacceptable. The server must carefully parse everything the client asks for, otherwise security problems may arise.

Client handshake request

Even though we are building a server, it is the client that initiates the WebSocket handshake, so we must know how to interpret the client's request. The client sends a fairly standard HTTP request that looks like the example below (the HTTP version must be 1.1 or higher, and the method must be GET).

    GET /chat HTTP/1.1
    Host: example.com:8000
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
    Sec-WebSocket-Version: 13

The client can request extensions and/or subprotocols here; see Miscellaneous for details. Common headers such as User-Agent, Referer, Cookie, or authentication headers may also be present. Do whatever you want with them: they are not directly related to WebSocket, and it is safe to ignore them. In many common setups, a reverse proxy has already dealt with them.
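
To make the request format concrete, the following sketch splits a raw handshake request into its request line and headers. The function name parseHandshakeRequest and the shape of its return value are assumptions for illustration; a production server would normally rely on a tested HTTP parser.

```javascript
// Hypothetical helper: splits the raw opening request into its request line
// and a map of lower-cased header names to values.
function parseHandshakeRequest(raw) {
  const lines = raw.split("\r\n");
  const [method, path, version] = lines[0].split(" ");
  const headers = Object.create(null);
  for (const line of lines.slice(1)) {
    if (line === "") break; // blank line ends the header section
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed lines in this sketch
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    // Repeated headers are joined with ", " as HTTP allows for list-valued headers.
    headers[name] = name in headers ? headers[name] + ", " + value : value;
  }
  return { method, path, version, headers };
}

const request = parseHandshakeRequest(
  "GET /chat HTTP/1.1\r\n" +
  "Host: example.com:8000\r\n" +
  "Upgrade: websocket\r\n" +
  "Connection: Upgrade\r\n" +
  "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n" +
  "Sec-WebSocket-Version: 13\r\n" +
  "\r\n"
);
console.log(request.method, request.path, request.headers["sec-websocket-key"]);
```

Header names are lower-cased because HTTP header names are case-insensitive, which keeps later lookups simple.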

If any header is not understood or has an invalid value, the server should send a 400 Bad Request response and immediately close the socket. As usual, the reason for the handshake failure may be given in the HTTP response body, but the message may never be shown (browsers do not display it). If the server does not understand the requested WebSocket version, it should send back a Sec-WebSocket-Version header listing the versions it does understand (preferably 13, the latest). Now let's look at the most mysterious header, Sec-WebSocket-Key.

Tip:

  • All browsers send an Origin header. You can use it for security (checking that the origin is the expected one) and return 403 Forbidden if you do not like what you see. Keep in mind, however, that non-browser clients can send a faked Origin. Most applications reject requests that lack this header.
  • The request URI (/chat here) has no defined meaning in the specification, so many people use it to let one server handle multiple WebSocket applications. For example, example.com/chat could invoke a multi-user chat app, while /game on the same server invokes a multiplayer game. In other words, paths under the same domain can point to different applications.
  • Regular HTTP status codes can be used only before the handshake. Once the handshake succeeds, a different set of codes applies, defined in section 7.4 of the specification.
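
The checks described so far can be sketched as one function. Everything here is illustrative: checkHandshake, the request shape (method plus lower-cased headers), and the allowedOrigin parameter are assumptions for this example. The 426 status for an unsupported version is the code RFC 6455 suggests; this article itself only asks for the Sec-WebSocket-Version header to be returned.

```javascript
// Hypothetical check: returns the HTTP status the server should answer with.
// `req` is assumed to look like { method, headers } with lower-cased header
// names; `allowedOrigin` is an assumed configuration value.
function checkHandshake(req, allowedOrigin) {
  const h = req.headers;
  const upgradeOk = (h["upgrade"] || "").toLowerCase() === "websocket";
  const connectionOk = (h["connection"] || "")
    .toLowerCase()
    .split(",")
    .map((token) => token.trim())
    .includes("upgrade");
  if (req.method !== "GET" || !upgradeOk || !connectionOk || !h["sec-websocket-key"]) {
    return { status: 400 };
  }
  if (h["sec-websocket-version"] !== "13") {
    // Tell the client which version this server accepts (426 per RFC 6455).
    return { status: 426, headers: { "Sec-WebSocket-Version": "13" } };
  }
  // A missing Origin header is rejected here as well.
  if (h["origin"] !== allowedOrigin) return { status: 403 };
  return { status: 101 };
}
```

Any result other than 101 means the server answers with that status and closes the socket instead of completing the handshake.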

Server handshake response

When the server receives the request, it should send back a rather unusual response that looks like the one below but still follows the HTTP specification. Note that every header line ends with \r\n, and an extra \r\n follows the last one.

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

In addition, the server can decide on extension and subprotocol requests here; see Miscellaneous for details. The Sec-WebSocket-Accept header is the interesting part. The server must derive it from the Sec-WebSocket-Key sent by the client as follows: concatenate the client's Sec-WebSocket-Key with the string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" (a fixed, case-sensitive "magic string"), take the SHA-1 hash of the result, and return the Base64 encoding of that hash.

This seemingly overcomplicated process exists so that the client can tell for certain whether the server supports WebSocket. That matters for security: if a server accepted a WebSocket connection but interpreted the data as HTTP requests, serious problems could follow.

So, if the key is “dGhlIHNhbXBsZSBub25jZQ==”, Accept will be “s3pPLMBiTxaQ9kYGzzhZRbK+xOo=”, and once the server sends these headers, the handshake protocol is complete.
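
The derivation can be checked with a few lines of Node.js using the built-in crypto module. The function names computeAccept and buildHandshakeResponse are made up for this sketch. The GUID is the fixed string from RFC 6455, which is written in upper case and must be used exactly as shown.

```javascript
const crypto = require("crypto");

// Fixed GUID defined by RFC 6455; it is case-sensitive.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Sec-WebSocket-Accept = Base64(SHA-1(key + GUID))
function computeAccept(key) {
  return crypto.createHash("sha1").update(key + WS_GUID).digest("base64");
}

// Builds the full 101 response; every line ends with \r\n and an empty
// line terminates the header section.
function buildHandshakeResponse(key) {
  return (
    "HTTP/1.1 101 Switching Protocols\r\n" +
    "Upgrade: websocket\r\n" +
    "Connection: Upgrade\r\n" +
    "Sec-WebSocket-Accept: " + computeAccept(key) + "\r\n" +
    "\r\n"
  );
}

console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// Prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo= (the example value used in this article)
```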

Before replying to the handshake, the server can also send other headers such as Set-Cookie, or ask for authentication or a redirect through other status codes.

Keeping track of clients

This is not directly related to the WebSocket protocol, but it is worth mentioning. The server must keep track of client sockets so that it does not repeat the handshake with a client that has already completed it. The same client IP address may try to connect multiple times, but the server can reject it if it tries too many times, to protect itself against denial-of-service attacks.
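
One possible way to do this bookkeeping is sketched below. The class ClientRegistry, its method names, and the limit MAX_PER_IP are all hypothetical; the limit of 10 is an arbitrary value chosen for illustration, not a recommendation from the protocol.

```javascript
// Arbitrary illustrative limit on simultaneous connections per IP address.
const MAX_PER_IP = 10;

// Hypothetical registry: remembers which sockets finished the handshake and
// counts open connections per IP address.
class ClientRegistry {
  constructor() {
    this.handshaken = new Set(); // sockets that completed the handshake
    this.perIp = new Map();      // ip -> number of open connections
  }
  // Returns false when the IP already holds too many connections.
  admit(ip) {
    const count = this.perIp.get(ip) || 0;
    if (count >= MAX_PER_IP) return false;
    this.perIp.set(ip, count + 1);
    return true;
  }
  markHandshaken(socket) {
    this.handshaken.add(socket);
  }
  needsHandshake(socket) {
    return !this.handshaken.has(socket);
  }
  // Call when a connection closes so the counters stay accurate.
  release(socket, ip) {
    this.handshaken.delete(socket);
    const count = this.perIp.get(ip) || 0;
    if (count <= 1) this.perIp.delete(ip);
    else this.perIp.set(ip, count - 1);
  }
}
```

A server would call admit() when a connection arrives, markHandshaken() after sending the 101 response, and release() when the socket closes.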

Exchanging data frames

Both the client and the server can send messages at any time, which is the magic of WebSocket. Extracting information from the data frames, however, is less magical. Although all frames follow the same format, data sent from the client to the server is masked using XOR with a 32-bit key, as described in detail in section 5 of the specification.

Format

Each data frame (whether sent from the client to the server or the other way) follows the same format:

The frame format:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+

The MASK bit only tells whether the message is masked, that is, whether its payload has been XOR-ed with a masking key. Messages from the client must be masked, so the server should expect this bit to be 1 (in fact, section 5.1 of the specification says the server must disconnect if a client sends an unmasked message). When sending a frame to the client, do not mask the data and do not set the mask bit. Masking is explained later. Note: clients must mask messages even when using a secure socket. RSV1-3 can be ignored; these bits are reserved for extensions.

The opcode field defines how to interpret the payload data:

  • 0x0 continuation
  • 0x1 text (always encoded as UTF-8)
  • 0x2 binary
  • 0x8-0xA the so-called control codes, discussed later
  • 0x3-0x7 and 0xB-0xF have no meaning in this version of WebSocket

The FIN bit tells whether this is the last frame of a message. If it is 0, the server keeps listening for more parts of the message; otherwise, the server should consider the message delivered.
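
The fields in the first two bytes of a frame can be pulled apart with bit masks, as in this small sketch. The function name parseFrameHeader and the returned property names are chosen for this example only.

```javascript
// Decodes the first two bytes of a frame into the fields of the frame format.
function parseFrameHeader(byte0, byte1) {
  return {
    fin: (byte0 & 0x80) !== 0,    // bit 0
    rsv: (byte0 & 0x70) >> 4,     // RSV1-3 packed into three bits
    opcode: byte0 & 0x0f,         // bits 4-7
    masked: (byte1 & 0x80) !== 0, // bit 8
    payloadLen: byte1 & 0x7f,     // bits 9-15
  };
}

console.log(parseFrameHeader(0x81, 0x85));
// fin: true, rsv: 0, opcode: 1 (text), masked: true, payloadLen: 5
```

The bytes 0x81 0x85 are the start of a single-frame masked text message with a 5-byte payload.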

Decoding the payload length

To read the payload data, we must know when to stop reading, which is why the payload length matters. Unfortunately, this is somewhat complicated. To read it, follow these steps:

  1. Read bits 9-15 (inclusive) and interpret them as an unsigned integer. If the value is 125 or less, that is the length; done. If it is 126, go to step 2. If it is 127, go to step 3.
  2. Read the next 16 bits and interpret them as an unsigned integer; done.
  3. Read the next 64 bits and interpret them as an unsigned integer; done.
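
The three steps above can be sketched as follows, assuming the frame is held in a Node.js Buffer that starts at the first byte of the frame. The function name readPayloadLength is hypothetical, and converting the 64-bit length to a Number is a simplification that is only exact up to Number.MAX_SAFE_INTEGER.

```javascript
// Returns the payload length and the offset of the first byte after the
// length fields. `frame` is a Buffer that starts at the first byte of a frame.
function readPayloadLength(frame) {
  const len7 = frame[1] & 0x7f; // bits 9-15
  if (len7 <= 125) return { length: len7, offset: 2 };
  if (len7 === 126) return { length: frame.readUInt16BE(2), offset: 4 };
  // len7 === 127: 64-bit big-endian length, converted to Number in this sketch.
  return { length: Number(frame.readBigUInt64BE(2)), offset: 10 };
}

console.log(readPayloadLength(Buffer.from([0x82, 0x7e, 0x01, 0x00])));
// { length: 256, offset: 4 }
```

The returned offset is where the masking key starts for a masked frame, or where the payload starts for an unmasked one.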

Reading and unmasking the data

If the MASK bit is set (and it should be, for client-to-server messages), read the next 4 bytes (32 bits): this is the masking key. Once the payload length and masking key are known, you can read that number of bytes from the socket. To decode, loop over the bytes (octets) of the masked data and XOR each one with byte (i mod 4) of the masking key, where i is the index of the byte. In JavaScript it looks like this (this is simply the rule for masking and unmasking; there is no need to dig deeper as long as you know how to apply it):

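    // ENCODED holds the masked payload bytes; MASK holds the 4-byte masking key.
    var DECODED = new Uint8Array(ENCODED.length);
    for (var i = 0; i < ENCODED.length; i++) {
        // XOR each byte with the masking key byte at position i mod 4
        DECODED[i] = ENCODED[i] ^ MASK[i % 4];
    }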
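
As a worked example of the same rule, the sketch below unmasks the payload of the masked "Hello" text frame given in RFC 6455, section 5.7. The function name unmask is an assumption for this example.

```javascript
// Unmasks a payload by XOR-ing each byte with the masking key byte at i mod 4.
function unmask(encoded, mask) {
  return Uint8Array.from(encoded, (byte, i) => byte ^ mask[i % 4]);
}

// Masking key and masked payload taken from the RFC 6455 "Hello" example.
const mask = [0x37, 0xfa, 0x21, 0x3d];
const encoded = [0x7f, 0x9f, 0x4d, 0x51, 0x58];
const text = Buffer.from(unmask(encoded, mask)).toString("utf8");
console.log(text); // Hello
```

Because XOR is its own inverse, applying the same function to the decoded bytes with the same key yields the masked bytes again.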

Now we can see what the decoded data means for our application.

Message fragmentation

The FIN and opcode fields work together to send a message split across several frames. This is called message fragmentation, and it is only available for opcodes 0x0 to 0x2 (control frames cannot be fragmented).

Recall that the opcode says what a frame is meant to do. If it is 0x1, the payload is text; if it is 0x2, the payload is binary data. If it is 0x0, however, the frame is a continuation frame, which means the server should append its payload to the last frame it received on that connection. Here is a rough sketch in which a server reacts to a client sending text messages: the first message is sent in a single frame, while the second message is sent across three frames. FIN and opcode details are shown only for the client.

    Client: FIN=1, opcode=0x1, msg="hello"
    Server: (message transmission is complete) Hi.
    Client: FIN=0, opcode=0x1, msg="and a"
    Server: (listening, a new message containing text has started)
    Client: FIN=0, opcode=0x0, msg="happy new"
    Client: FIN=1, opcode=0x0, msg="year!"
    Server: Happy new year to you too!

Note: The first frame contains a complete message (FIN=1 and opcode != 0x0), so the server can process it and respond as soon as it arrives. The second frame carries a text payload (opcode=0x1), but the message is not yet complete (FIN=0). All remaining parts of that message are sent in continuation frames (opcode=0x0), and the final frame of the message is marked by FIN=1.
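
Reassembly of fragmented messages might be sketched like this. The class MessageAssembler and the frame shape ({ fin, opcode, payload }) are assumptions for illustration, and the sketch expects to be fed only data frames (opcodes 0x0 to 0x2) whose payloads are already unmasked.

```javascript
// Hypothetical assembler: feed it decoded data frames in arrival order;
// push() returns a complete message, or null while more frames are expected.
class MessageAssembler {
  constructor() {
    this.opcode = null; // opcode of the first frame of the message in progress
    this.parts = [];
  }
  push(frame) {
    if (frame.opcode !== 0x0) {
      // A new message starts; it must not interrupt an unfinished one.
      if (this.opcode !== null) throw new Error("expected continuation frame");
      this.opcode = frame.opcode;
    } else if (this.opcode === null) {
      throw new Error("continuation frame without a started message");
    }
    this.parts.push(frame.payload);
    if (!frame.fin) return null;
    const message = { opcode: this.opcode, payload: Buffer.concat(this.parts) };
    this.opcode = null;
    this.parts = [];
    return message;
  }
}

const assembler = new MessageAssembler();
assembler.push({ fin: false, opcode: 0x1, payload: Buffer.from("and a") });
assembler.push({ fin: false, opcode: 0x0, payload: Buffer.from("happy new") });
const done = assembler.push({ fin: true, opcode: 0x0, payload: Buffer.from("year!") });
console.log(done.payload.toString("utf8")); // and ahappy newyear!
```

The payloads are joined exactly as sent, so the sample fragments produce a string without spaces between the parts.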

WebSockets heartbeat: Ping and pong

At any point after the handshake, either the client or the server can choose to send a ping to the other party. When a ping is received, the recipient must send back a pong as soon as possible. This can be used to make sure the connection is still alive.

A ping or pong is just a regular frame, but it is a control frame. Pings have opcode 0x9 and pongs have opcode 0xA. When you receive a ping, send back a pong with exactly the same payload data as the ping (for pings and pongs, the maximum payload length is 125). You might also receive a pong without ever having sent a ping; ignore it if that happens.

If you have received more than one ping before getting the chance to send a pong, you only need to send one pong.
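
Answering a ping from the server side could look like the sketch below. The function name buildPong is an assumption; the frame is left unmasked because it travels from server to client.

```javascript
// Builds an unmasked Pong frame (opcode 0xA) that echoes the Ping payload.
// `pingPayload` is a Buffer holding the already unmasked payload of the Ping.
function buildPong(pingPayload) {
  if (pingPayload.length > 125) {
    throw new Error("control frame payload must be 125 bytes or fewer");
  }
  // 0x8A = FIN bit set + opcode 0xA; second byte = MASK bit clear + length.
  return Buffer.concat([Buffer.from([0x8a, pingPayload.length]), pingPayload]);
}

console.log(buildPong(Buffer.from("Hello")));
// <Buffer 8a 05 48 65 6c 6c 6f>
```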

Closing the connection

To close the connection, either the client or the server can send a control frame whose data contains a specified control sequence, which starts the closing handshake (detailed in section 5.5.1 of the specification). On receiving such a frame, the other peer sends a Close frame in response. The first peer then closes the connection. Any further data received after the connection is closed is discarded.
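
A server-side Close frame could be built as in this sketch. The details come from RFC 6455 rather than from this article: the Close opcode is 0x8, the body may start with a 2-byte status code in network byte order followed by a UTF-8 reason, and 1000 means normal closure. The function name buildCloseFrame is hypothetical.

```javascript
// Builds an unmasked Close frame (opcode 0x8) with a status code and reason.
function buildCloseFrame(code, reason) {
  const reasonBytes = Buffer.from(reason, "utf8");
  // Control frames carry at most 125 payload bytes; 2 are used by the code.
  if (reasonBytes.length > 123) {
    throw new Error("reason too long for a control frame");
  }
  const body = Buffer.alloc(2 + reasonBytes.length);
  body.writeUInt16BE(code, 0); // status code in network byte order
  reasonBytes.copy(body, 2);
  // 0x88 = FIN bit set + opcode 0x8; second byte = MASK bit clear + length.
  return Buffer.concat([Buffer.from([0x88, body.length]), body]);
}

console.log(buildCloseFrame(1000, ""));
// <Buffer 88 02 03 e8>
```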

Miscellaneous

WebSocket extensions and subprotocols are negotiated through headers during the handshake. Extensions and subprotocols can seem very similar, but there is a clear distinction: extensions control the WebSocket frame and modify the payload, while subprotocols structure the WebSocket payload and never modify anything. Extensions are optional and generalized (like compression); subprotocols are mandatory and specific to an application.

Extensions

Think of an extension as compressing a file before sending it: whatever you do, you are sending the same data, just in different frames. The recipient ends up with the same data as your local copy, but it was sent differently. That is what an extension does. WebSocket defines a protocol and a simple way to send data, while an extension such as compression can deliver the same data in shorter frames.

Subprotocols

Think of a subprotocol as a custom XML schema or doctype declaration. You are still using XML and its syntax, but you are additionally restricted to a structure you agreed on. WebSocket subprotocols are just like that. They do not introduce anything fancy; they just establish a structure, like a doctype or schema, that both parties (client and server) agree on. Unlike a doctype or schema, a subprotocol is implemented on the server and cannot be referenced externally by the client. A client has to ask for a specific subprotocol, and to do so it sends something like the following as part of the original handshake.

    GET /chat HTTP/1.1
    ...
    Sec-WebSocket-Protocol: soap, wamp

Or, equivalently:

    ...
    Sec-WebSocket-Protocol: soap
    Sec-WebSocket-Protocol: wamp

Now the server must pick one of the protocols that the client suggested and that the server supports. If there is more than one, send the first one the client sent. Suppose our server can use both soap and wamp; the response handshake would then contain the following.

    Sec-WebSocket-Protocol: soap

The server cannot send more than one Sec-WebSocket-Protocol header. If the server does not want to use any of the subprotocols, it should not send a Sec-WebSocket-Protocol header at all. Sending a blank header is an error. The client may close the connection if it does not get the subprotocol it wants.
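
The selection rule can be sketched in a few lines. The function name selectSubprotocol and the supported list are assumptions for this example; headerValue is the client's Sec-WebSocket-Protocol value, with repeated headers already joined by commas.

```javascript
// Picks the first subprotocol in the client's list that the server supports.
// Returns null when there is no match, in which case the header is omitted.
function selectSubprotocol(headerValue, supported) {
  if (!headerValue) return null;
  const requested = headerValue
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
  return requested.find((name) => supported.includes(name)) || null;
}

console.log(selectSubprotocol("soap, wamp", ["wamp", "soap"])); // soap
```

The order of the client's list decides the result, not the order of the server's list.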

If we want our server to obey certain subprotocols, we naturally need extra code on the server. Imagine we use a subprotocol called json, in which all data is passed as JSON. If a client asks for this subprotocol and the server wants to use it, the server needs a JSON parser. In practice this will be part of a library, but the server still has to pass the data along.

To avoid name conflicts, it is recommended to make the subprotocol name part of a domain name. If we were building a chat app that uses a specific format, we might use a name like Sec-WebSocket-Protocol: chat.example.com. Note that this is not required. It is only an optional convention, and we can use any string we like.

Conclusion

I translated this document because most of the WebSocket material I found in Chinese covers the client side. I was interested in server-side implementation, could not find suitable material, and had to read the English original, so I translated it along the way. The goal was to improve my own understanding, and I hope it is helpful to other readers as well; please consult the original for reference. Look forward to a hands-on implementation of a WebSocket server in Node.

Source document

MDN: Writing WebSocket Servers