WebSocket was designed to solve the problem of two-way communication. HTTP follows a one-way request/response model: the client sends a request and the server answers, but the server cannot push data on its own. On the other hand, HTTP runs on top of TCP, and a TCP connection is itself a long-lived connection: as long as neither side closes it, it stays connected. HTTP, however, closes the TCP connection once the request is done (or once a keep-alive timeout expires). So is something like WebSocket really necessary? Consider how we might implement long-lived connections without WebSocket:

(1) HTTP has a keep-alive mechanism, which reuses TCP connections: a single TCP connection can carry multiple HTTP requests, avoiding a new three-way handshake for each one. The keep-alive timeout is set by the server: Apache defaults to 5 s, while nginx defaults to 75 s. Once the timeout expires, the server actively closes the TCP connection, because otherwise large numbers of idle TCP connections would tie up system resources. So keep-alive is not designed for long-lived connections; it only makes HTTP requests more efficient. And, as mentioned above, HTTP remains one-way: every exchange is still a client request followed by a server response.

(2) HTTP polling is another very common approach. Before WebSocket, chat features on web pages were basically implemented this way: the page sends a request to the server every few seconds to pull new messages. The problem is that it keeps issuing new requests (often over new TCP connections), and each request carries large HTTP headers, which is inefficient.

(3) Establish a TCP connection directly with the server and keep it open. This is not possible, at least in the browser, because browsers expose no API for raw TCP connections. WebSocket fills exactly this gap: it lets the browser establish a TCP connection with the server directly.

A TCP connection is established through a socket. If you have written Linux services, you know how to set up a TCP connection with the low-level C API, which is built on sockets. On the server side, the process is as follows:

// Create a socket; it returns a handle (a file descriptor), similar to the ID returned by setTimeout
// AF_INET means IPv4 addresses; SOCK_STREAM means a TCP connection (as opposed to SOCK_DGRAM for UDP)
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
// Bind the socket to an address such as localhost:9000 (servaddr is a struct sockaddr_in)
bind(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
// Start listening, with at most 100 pending connections in the backlog
listen(sockfd, 100);

The client also uses a socket to connect:

// The client also creates a socket
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
// Connect this socket to the server address servaddr
connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
// Send data through this socket
send(sockfd, sendline, strlen(sendline), 0);
// Close the connection
close(sockfd);
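For comparison, Node.js's built-in net module wraps the same socket calls. A minimal sketch, assuming a hypothetical tcpRoundTrip helper that listens on an OS-assigned port on 127.0.0.1, connects to it, sends one message, and resolves with what the server received:

```javascript
const net = require('net');

// Hypothetical helper: start a TCP server on an OS-assigned port (port 0),
// connect a client to it, send one message, and resolve with what the server received.
function tcpRoundTrip(message) {
  return new Promise((resolve, reject) => {
    // Server side: net.createServer + listen cover socket(), bind(), listen() and accept()
    const server = net.createServer(conn => {
      let received = '';
      conn.setEncoding('utf8');
      conn.on('data', chunk => { received += chunk; });
      // 'end' fires when the client closes its side of the connection
      conn.on('end', () => {
        server.close();
        resolve(received);
      });
    });
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      // Client side: net.createConnection covers socket() + connect()
      const client = net.createConnection({ port, host: '127.0.0.1' }, () => {
        client.end(message); // like send() followed by close()
      });
      client.on('error', reject);
    });
  });
}

tcpRoundTrip('hello, this is from client').then(text => console.log(text));
```

Unlike the C version, there is no explicit accept loop: Node.js calls the createServer callback once per accepted connection.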

In other words, TCP connections (and UDP communication as well) are created through sockets, hence the name WebSocket: it is essentially a socket, standardized for the web. Browsers expose an API that lets web developers create a socket directly to communicate with a server, and it is up to you to decide when the socket is closed, unlike HTTP, where the browser or server closes the TCP connection automatically.

So WebSocket is nothing magical; it is a socket. At the same time, WebSocket has to build on the existing network infrastructure, since starting from scratch would be very costly. Previously, the only way a browser could talk to a server was through HTTP requests, so WebSocket uses an HTTP request to set up the underlying socket connection; hence the protocol upgrade step.

Setting up a WebSocket connection in the browser takes only a few lines of code:

// Create a socket
const socket = new WebSocket('ws://192.168.123.20:9090');
// The connection succeeded
socket.onopen = function (event) {
    console.log('opened');
    // Send data
    socket.send('hello, this is from client');
};

Since the browser side is already provided and documented, how do we build the server side of a WebSocket? Let's set the Chrome source code aside for now: first look at a server-side implementation, then at the browser client's implementation. We will implement a WebSocket server in Node.js to see how the whole process works, from establishing the connection to receiving and sending data.

WebSocket is standardized in RFC 6455. As long as we implement it according to the document, we can interoperate with the browser. The document is an interesting read, especially Section 1. If you have time, try implementing a server yourself first, then come back and compare it with the implementation here.

1. Establish the connection

Create a "Hello, world" HTTP service with Node.js in index.js:

let http = require("http");
const hostname = "192.168.123.20"; // or localhost
const port = "9090";

// Create an HTTP service
let server = http.createServer((req, res) => {
    // The request was received
    console.log("recv request");
    console.log(req.headers);
    // Respond and send data
    // res.write('hello, world');
    // res.end();
});

// Start listening
server.listen(port, hostname, () => {
    // The startup succeeded
    console.log(`Server running at ${hostname}:${port}`);
});

Note that error and exception handling is omitted here. Real code needs it to be robust, especially for a server like this one: a single bad request must not bring down the entire server. Refer to the Node.js documentation for error handling.

Save the file and run node index.js to start the service.

Then write an index.html that requests the service:

<html>
<head>
    <meta charset="utf-8">
</head>
<body>
<script>
!function() {
    const socket = new WebSocket('ws://192.168.123.20:9090');
    socket.onopen = function (event) {
        console.log('opened');
        socket.send('hello, this is from client');
    };
}();
</script>
</body>
</html>

The Node.js HTTP server also emits an upgrade event:

// Protocol upgrade
server.on("upgrade", (request, socket, head) => {
    console.log(request.headers);
});

Because a WebSocket connection first has to be upgraded from HTTP, the upgrade request arrives in the upgrade event handler rather than in the normal request handler. Printing the received request headers gives:

{ host: '192.168.123.20:9090',
  connection: 'Upgrade',
  pragma: 'no-cache',
  'cache-control': 'no-cache',
  upgrade: 'websocket',
  origin: 'http://127.0.0.1:8080',
  'sec-websocket-version': '13',
  'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
  'accept-encoding': 'gzip, deflate',
  'accept-language': 'en,zh-CN;q=0.9,zh;q=0.8,zh-TW;q=0.7',
  'sec-websocket-key': 'KR6cP3rhKGrnmIY2iu04Uw==',
  'sec-websocket-extensions': 'permessage-deflate; client_max_window_bits' }

This is the first request we receive when the connection is being established. Two fields are key here. One is connection: 'Upgrade' (together with upgrade: 'websocket'), which marks this as a protocol upgrade request. The other is sec-websocket-key, a random Base64-encoded string that the server must use to prove it understands the WebSocket protocol, as described below.

We need to respond to this request, and according to RFC 6455 the response must include the following fields:

server.on("upgrade", (request, socket, head) => {
    let base64Value = ''; // placeholder; how to compute it is shown below
    // The first line is the status line, which returns status code 101
    socket.write('HTTP/1.1 101 Web Socket Protocol Handshake\r\n' +
        // HTTP response header fields are separated by \r\n
        'Upgrade: WebSocket\r\n' +
        'Connection: Upgrade\r\n' +
        // The value the browser uses to verify the server
        `Sec-WebSocket-Accept: ${base64Value}\r\n` +
        // An empty line ends the headers
        '\r\n');
});

The first line is the status line, which contains the HTTP version, the status code 101, and a description of the status code. Header fields are separated by \r\n. The most critical one is Sec-WebSocket-Accept, which must be computed and returned to the browser. So how is it calculated? RFC 6455 specifies:

GUID (Globally Unique Identifier) = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

Sec-WebSocket-Accept = base64(sha1(Sec-WebSocket-Key + GUID))

That is, take the Sec-WebSocket-Key value sent by the browser, append the fixed GUID string, compute the SHA-1 hash, Base64-encode the raw 20-byte digest, and return the result to the browser. If the browser finds the value incorrect, it reports an error and rejects the connection.

In effect, the browser concludes that you are a fake WebSocket server, or at least one not implemented according to the specification; with no common language, there is no point in continuing the conversation.

To calculate this value, we pull in a sha1 library (the sha1 npm package), and the Base64 conversion can use Node.js's Buffer, as in the following code:

let sha1 = require('sha1');
// Protocol upgrade
server.on("upgrade", (request, socket, head) => {
    // Retrieve the key sent by the browser
    let secKey = request.headers['sec-websocket-key'];
    // Global identifier (GUID) specified in RFC 6455
    const UNIQUE_IDENTIFIER = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
    // Calculate sha1 and base64 values
    let shaValue = sha1(secKey + UNIQUE_IDENTIFIER),
        base64Value = Buffer.from(shaValue, 'hex').toString('base64');
    socket.write('HTTP/1.1 101 Web Socket Protocol Handshake\r\n' +
        'Upgrade: WebSocket\r\n' +
        'Connection: Upgrade\r\n' +
        `Sec-WebSocket-Accept: ${base64Value}\r\n` +
        '\r\n');
});

The accept value calculated using the key sent by the browser above is:

RWMSYL3Zmo91ZR+r39JVM2+PxXc=

Once this value is sent back, Chrome no longer reports the verification error: the check passes and the two sides recognize each other. The WebSocket connection is now established; yes, it is that simple. In the Network panel of Chrome DevTools, the status of the WebSocket request changes from pending to 101 (or to 200 once the connection is closed).
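The sha1 package is not strictly required: Node.js's built-in crypto module can produce the raw SHA-1 digest and Base64-encode it in one step. A minimal sketch (computeAccept and buildHandshakeResponse are hypothetical names), checked against the sample key and accept value given in RFC 6455 Section 1.3. It uses the conventional reason phrase Switching Protocols; the reason phrase used above works as well, since the browser checks the 101 status code and the accept value:

```javascript
const crypto = require('crypto');

// The fixed GUID from RFC 6455
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64(sha1(key + GUID)), over the raw 20-byte digest
function computeAccept(secKey) {
  return crypto.createHash('sha1')
    .update(secKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Build the 101 response; the header block ends with an empty line (\r\n\r\n)
function buildHandshakeResponse(secKey) {
  return 'HTTP/1.1 101 Switching Protocols\r\n' +
    'Upgrade: websocket\r\n' +
    'Connection: Upgrade\r\n' +
    `Sec-WebSocket-Accept: ${computeAccept(secKey)}\r\n` +
    '\r\n';
}

// Sample key from RFC 6455, Section 1.3
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Inside the upgrade handler this becomes socket.write(buildHandshakeResponse(request.headers['sec-websocket-key'])).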

The browser code also sends a message once the connection is established:

socket.send('hello, this is from client');

How does the server read this data?

2. Receive data

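For data transmission, RFC 6455 (Section 5.2) specifies the WebSocket frame format. In bit order, a frame consists of: FIN (1 bit), RSV1 to RSV3 (1 bit each), opcode (4 bits), MASK (1 bit), Payload len (7 bits), an optional Extended payload length (16 or 64 bits), an optional Masking-key (32 bits, present only when MASK is 1), and finally the Payload Data.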

Don't be intimidated by it; it is quite easy once broken down. The header fields describe how to interpret the frame. The first bit, FIN, set to 1 marks a final frame: long data can be split into several frames, and FIN = 1 indicates the last frame of the current message. Bits 4 to 7 hold the opcode, which controls how the frame is handled: 1 means the Payload Data is text, 2 means binary content, and 8 means the connection is being closed. Payload len (bits 9 to 15) gives the payload size in bytes. Because it has only 7 bits, it can hold at most 127: values from 0 to 125 are the actual length, 126 means the real length is in the following 16-bit Extended payload length field, and 127 means it is in the following 64-bit field.
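The header layout can be decoded with byte-level Buffer reads and bit masks. A minimal sketch, assuming the whole header is already in the buffer (a real server must also handle headers split across several data events); parseFrameHeader and its field names are hypothetical:

```javascript
// Hypothetical helper: decode a frame header laid out as in RFC 6455 Section 5.2
function parseFrameHeader(buf) {
  const byte0 = buf[0];
  const byte1 = buf[1];
  const header = {
    fin: (byte0 & 0x80) !== 0,    // bit 0
    opcode: byte0 & 0x0f,         // bits 4-7
    masked: (byte1 & 0x80) !== 0, // bit 8
    payloadLength: byte1 & 0x7f,  // bits 9-15
  };
  let offset = 2;
  if (header.payloadLength === 126) {
    // The real length is in the next 16 bits (unsigned, big-endian)
    header.payloadLength = buf.readUInt16BE(offset);
    offset += 2;
  } else if (header.payloadLength === 127) {
    // The real length is in the next 64 bits (unsigned, big-endian)
    header.payloadLength = Number(buf.readBigUInt64BE(offset));
    offset += 8;
  }
  if (header.masked) {
    // 4-byte masking key, present only when MASK is 1
    header.maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }
  header.payloadOffset = offset; // byte index where Payload Data starts
  return header;
}

// The 32-byte frame from the browser example in this article
const sampleFrame = Buffer.from('819a4c3f6475245a08192313440124561755254c44133e5009552f530d10224b', 'hex');
console.log(parseFrameHeader(sampleFrame));
```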

The MASK bit at position 8, if set to 1, indicates that the frame's payload has been masked. Frames sent from the client to the server must be masked, while frames sent from the server to the client must not be. Why is a mask used, and how is it calculated? The calculation is simple: the data to be sent is XORed with a 32-bit number, the Masking-key, which is carried in the frame just before the Payload Data. The receiver recovers the original data by XORing the Payload Data with the same key again, because:

a ^ b ^ b = a
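A minimal sketch of masking in Node.js, assuming a hypothetical applyMask helper: byte i is XORed with key byte i mod 4, so applying the same function twice returns the input. The key 4c 3f 64 75 used here is the one that appears in the frame captured later in this article:

```javascript
// Hypothetical helper: XOR byte i of `data` with byte (i % 4) of the masking key.
// The same function both masks and unmasks, because a ^ b ^ b = a.
function applyMask(data, maskingKey) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] ^ maskingKey[i % 4];
  }
  return out;
}

const key = Buffer.from([0x4c, 0x3f, 0x64, 0x75]);
const masked = applyMask(Buffer.from('hello'), key);
const restored = applyMask(masked, key);
console.log(masked.toString('hex')); // 245a081923
console.log(restored.toString());    // hello
```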

The masking key in each frame must also be random and unpredictable. Why? Here is what the document says:

The unpredictability of the masking key is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire

This explanation is a bit vague; an answer on Stack Overflow says it is meant to prevent proxy cache poisoning attacks. See HTTP cache poisoning for details.

So we need to extract the masking key from this frame and use it to restore the original payload data.

The data is sent and received through the socket object. Because it is no longer an HTTP request, the HTTP request handler never sees it. The upgrade event, however, hands us the socket, and listening to the socket's data event gives us the received data:

socket.on('data', buffer => {
    console.log('buffer len = ', buffer.length);
    console.log(buffer);
});

The data type returned is the Buffer object in Node.js. Print this Buffer:

buffer len = 32

<Buffer 81 9a 4c 3f 64 75 24 5a 08 19 23 13 44 01 24 56 17 55 25 4c 44 13 3e 50 09 55 2f 53 0d 10 22 4b>

This buffer is the data frame the WebSocket client sent us, 32 bytes in total. It is printed above in hexadecimal; converted to binary, it can be matched bit by bit against the frame format to interpret each field. Printed as raw binary:

1000000110011010010011000011111101100100011101010010010001011010000010000001100100100011000100110100010000000001001001000101011000010111010101010010010101001100010001000001001100111110010100000000100101010101001011110101001100001101000100000010001001001011

Comparing this with the frame format, Payload len is 26: the payload is 26 bytes of text, exactly the length of the text that was sent.

The Masking-key occupies bits [16, 16 + 32). Because no extended payload length field is needed, the Masking-key directly follows Payload len, and the Payload Data comes next, occupying bits [48, 48 + 26 × 8).

So this is a complete data frame; the next step is to unmask the payload data and restore the original data in Node.js. Node.js's Buffer class only works at the byte level, such as reading the Nth byte; it cannot directly read the Nth bit. So an extra utility is needed. I found a BitBuffer online, but its implementation seemed to have some problems, so I implemented one myself.

Implement a BitBuffer that can read arbitrary bits, as shown in the following code:

class BitBuffer {
    // The constructor takes a Buffer object
    constructor (buffer) {
        this.buffer = buffer;
    }
    // Get the value of the bit at position offset
    _getBit (offset) {
        let byteIndex = offset / 8 >> 0,
            byteOffset = offset % 8;
        // readUInt8 reads the byte at byteIndex;
        // the AND mask keeps only the bit at byteOffset within that byte
        let num = this.buffer.readUInt8(byteIndex) & (1 << (7 - byteOffset));
        return num >> (7 - byteOffset);
    }
}

The principle is simple: first call readUInt8 of the Node.js Buffer to read the byte that contains the bit, then work out the bit's position within that byte and extract it with a bitwise AND. For more on bit operations, see the article "Using JS bit operations".

Check whether the Mask Flag at bit 8 is set with this code:

socket.on('data', buffer => {
    let bitBuffer = new BitBuffer(buffer);
    let maskFlag = bitBuffer._getBit(8);
    console.log('maskFlag = ' + maskFlag);
});

This prints maskFlag = 1. So how do you get n consecutive bits, such as the opcode in bits 4 to 7? Simple: read bits 4 through 7 and combine them into a number:

// BitBuffer method: read len consecutive bits starting at offset, most significant bit first
getBit (offset, len = 1) {
    let result = 0;
    for (let i = 0; i < len; i++) {
        result += this._getBit(offset + i) << (len - i - 1);
    }
    return result;
}

This code isn't very efficient, but it is easy to understand. One catch is that JS bitwise shifts only operate on 32-bit signed integers, so 1 << 31 becomes a negative number. Using this function to read the 32-bit masking key in one go is therefore problematic.

We can use this function to fetch opcode and payload len:

socket.on('data', buffer => {
    let bitBuffer = new BitBuffer(buffer);
    let maskFlag = bitBuffer.getBit(8),
        opcode = bitBuffer.getBit(4, 4),
        payloadLen = bitBuffer.getBit(9, 7);
    console.log('maskFlag = ' + maskFlag);
    console.log('opcode = ' + opcode);
    console.log('payloadLen = ' + payloadLen);
});

This prints:

maskFlag = 1

opcode = 1

payloadLen = 26

Because the masking key is 32 bits long and getBit cannot safely return a 32-bit value, read it as four separate bytes of 8 bits each:

// BitBuffer method: read the 4-byte masking key starting at bit offset
getMaskingKey (offset) {
    const BYTE_COUNT = 4;
    let masks = [];
    for (let i = 0; i < BYTE_COUNT; i++) {
        masks.push(this.getBit(offset + i * 8, 8));
    }
    return masks;
}

The mask value for this example starts at bit 16, so offset is 16:

let maskKeys = bitBuffer.getMaskingKey(16);
console.log('maskKey = ' + maskKeys);

The printed maskKey is:

maskKey = 76,63,100,117
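As noted earlier, getBit cannot return a 32-bit value because JS shifts work on signed 32-bit integers, which is why getMaskingKey reads the key as four separate bytes. A minimal alternative sketch, using a hypothetical standalone readBits helper (not part of the BitBuffer above), accumulates bits with multiplication instead of <<, so it stays correct for values up to 53 bits, including the whole 32-bit masking key:

```javascript
// Hypothetical helper: read `len` consecutive bits (len <= 53) starting at
// bit `offset`, most significant bit first, from a Node.js Buffer.
function readBits(buffer, offset, len) {
  let result = 0;
  for (let i = 0; i < len; i++) {
    const pos = offset + i;
    // pos >> 3 is the byte index; 7 - (pos & 7) is the bit's shift within that byte
    const bit = (buffer[pos >> 3] >> (7 - (pos & 7))) & 1;
    // Multiplying by 2 (instead of << 1) keeps the result a positive Number
    result = result * 2 + bit;
  }
  return result;
}

// First 6 bytes of the captured frame: 2 header bytes plus the masking key
const head = Buffer.from('819a4c3f6475', 'hex');
console.log(readBits(head, 16, 32)); // masking key as one unsigned 32-bit value
console.log(head.readUInt32BE(2));   // the same value, read byte-wise by Node.js
```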

How is this masking key used for the XOR? RFC 6455 states:

j = i MOD 4

transformed-octet-i = original-octet-i XOR masking-key-octet-j

In other words, the 4 key bytes are repeated cyclically over the payload: payload byte i is XORed with key byte i mod 4. As another BitBuffer method:

// BitBuffer method: unmask byteCount bytes starting at byteOffset and turn them into a string
getXorString (byteOffset, byteCount, maskingKeys) {
    let text = '';
    for (let i = 0; i < byteCount; i++) {
        let j = i % 4;
        // XOR restores the original UTF-8 byte
        let transformedByte = this.buffer.readUInt8(byteOffset + i)
                                  ^ maskingKeys[j];
        // Convert the byte value to a character (correct for ASCII only)
        text += String.fromCharCode(transformedByte);
    }
    return text;
}

The XOR operation yields each byte's encoded value, which String.fromCharCode then turns into a character. For example, 97 maps to the letter 'a' in the ASCII table.

Putting it together, restore the payload data:

let payloadLen = bitBuffer.getBit(9, 7),
    maskKeys = bitBuffer.getMaskingKey(16);
let payloadText = bitBuffer.getXorString(48 / 8, payloadLen, maskKeys);
console.log('payloadText = ' + payloadText);

The printed text is as follows:

payloadText = hello, this is from client
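getXorString converts each byte with String.fromCharCode, which is only correct for ASCII text; a Chinese character such as 你 takes three bytes in UTF-8 and would come out garbled. A minimal sketch, assuming a hypothetical decodeMaskedText helper, collects the unmasked bytes into a Buffer first and decodes them as UTF-8 in one go:

```javascript
// Hypothetical helper: unmask `length` payload bytes starting at byte `offset`
// and decode them as UTF-8 (instead of one String.fromCharCode per byte).
function decodeMaskedText(buffer, offset, length, maskingKey) {
  const bytes = Buffer.alloc(length);
  for (let i = 0; i < length; i++) {
    bytes[i] = buffer[offset + i] ^ maskingKey[i % 4];
  }
  // Buffer#toString('utf8') reassembles multi-byte characters correctly
  return bytes.toString('utf8');
}

// The 32-byte frame captured above: 2 header bytes, 4 key bytes, 26 payload bytes
const frame = Buffer.from('819a4c3f6475245a08192313440124561755254c44133e5009552f530d10224b', 'hex');
console.log(decodeMaskedText(frame, 6, frame[1] & 0x7f, frame.subarray(2, 6)));
// prints: hello, this is from client
```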

At this point, the received data has been restored. To send data, reverse the reading process: assemble a frame in the required format and send it to the other side. The difference is that frames from the server must not be masked; if you do mask them, Chrome reports an error saying the frame must not be masked and refuses to parse the data.
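A minimal sketch of the sending side, assuming a hypothetical encodeTextFrame helper: it builds a single unmasked text frame (FIN = 1, opcode = 1) and picks the 7-bit, 16-bit, or 64-bit length encoding described in the frame format:

```javascript
// Hypothetical helper: build one unmasked server-to-client text frame
// (FIN = 1, opcode = 1, so the first byte is 0x81; the MASK bit stays 0).
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  const len = payload.length; // length in bytes, not characters
  let header;
  if (len <= 125) {
    // Length fits in the 7-bit Payload len field
    header = Buffer.from([0x81, len]);
  } else if (len <= 0xffff) {
    // 126: the real length follows as a 16-bit big-endian integer
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    // 127: the real length follows as a 64-bit big-endian integer
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}

// Inside the upgrade handler: socket.write(encodeTextFrame('hello, this is from server'));
console.log(encodeTextFrame('hi')); // <Buffer 81 02 68 69>
```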

Next, let's look at the WebSocket client implementation in Chrome's source code to fill in some details.

Chrome's WebSocket code lives in src/net/websockets. For example, how does Chrome generate a random Sec-WebSocket-Key during the handshake? The code looks like this:

std::string GenerateHandshakeChallenge() {
  std::string raw_challenge(websockets::kRawChallengeLength, '\0');
  crypto::RandBytes(base::string_as_array(&raw_challenge),
                    raw_challenge.length());
  std::string encoded_challenge;
  base::Base64Encode(raw_challenge, &encoded_challenge);
  return encoded_challenge;
}

It uses crypto::RandBytes to generate the random bytes. Sec-WebSocket-Accept is verified with the same calculation as on our server:

std::string ComputeSecWebSocketAccept(const std::string& key) {
  std::string accept;
  std::string hash = base::SHA1HashString(key + websockets::kWebSocketGuid);
  base::Base64Encode(hash, &accept);
  return accept;
}
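The key generation in GenerateHandshakeChallenge is easy to mirror in Node.js. A minimal sketch, with a hypothetical generateSecWebSocketKey helper: it Base64-encodes random bytes, and since RFC 6455 requires the nonce to be 16 random bytes, the result is always a 24-character string ending in ==:

```javascript
const crypto = require('crypto');

// Hypothetical client-side helper mirroring GenerateHandshakeChallenge:
// 16 cryptographically random bytes, Base64-encoded
function generateSecWebSocketKey() {
  return crypto.randomBytes(16).toString('base64');
}

const key = generateSecWebSocketKey();
console.log(key, key.length); // a random value; the length is always 24
```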

Masking uses the same XOR calculation as our server code:

inline void MaskWebSocketFramePayloadByBytes(
    const WebSocketMaskingKey& masking_key,
    size_t masking_key_offset,
    char* const begin,
    char* const end) {
  for (char* masked = begin; masked != end; ++masked) {
    *masked ^= masking_key.key[masking_key_offset++];
    if (masking_key_offset == WebSocketFrameHeader::kMaskingKeyLength)
      masking_key_offset = 0;
  }
}

Deflate compression, cookies, extensions, etc., are not covered in this article.

If 1,000 users are online at the same time, the server has to maintain 1,000 TCP connections. In a thread-per-connection model, each TCP connection needs its own thread, and threads are expensive. So does WebSocket put heavy pressure on the server? Not that much, because Linux provides epoll, an event-driven I/O model that lets a single thread (and thus a single core) serve many concurrent connections.

Finally, because the connection stays open indefinitely, if one side goes away without sending a close frame, the other side will keep holding a useless connection. For this reason WebSocket defines ping/pong control frames: opcode 0x9 in the frame header indicates a ping frame, and 0xA indicates a pong response frame. The client can send a ping periodically, for example every 30 seconds. When the server receives a ping, it knows the client is still alive and replies with a pong. If the server receives no ping for a long time, say 1 minute, it assumes the client has gone and closes the connection. If the client receives no pong, it assumes the connection is broken and reconnects. The browser's JS API does not expose ping/pong, so in the browser you need to implement your own heartbeat message type.
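The browser cannot send ping frames itself, but a server or a non-browser client can. A minimal sketch, assuming a Node.js endpoint that writes raw frames as in the server code above: a hypothetical encodeControlFrame builds an unmasked ping (0x9) or pong (0xA) frame, and a hypothetical Heartbeat class implements the 1-minute timeout with timestamps. A pong should echo the ping's payload, and a non-browser client would additionally have to mask its frames, which this sketch does not do:

```javascript
// Control-frame opcodes from RFC 6455
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xA;

// Hypothetical helper: build an unmasked control frame (FIN = 1 plus the opcode,
// then the payload length). Control-frame payloads may be at most 125 bytes.
function encodeControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError('control frame payload must be <= 125 bytes');
  }
  return Buffer.concat([Buffer.from([0x80 | opcode, payload.length]), payload]);
}

// Hypothetical liveness tracker: record when the peer was last heard from and
// treat it as gone after timeoutMs of silence (1 minute, as in the text).
class Heartbeat {
  constructor(timeoutMs = 60 * 1000) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = Date.now();
  }
  // Call on every ping (or any other frame) received from the peer
  touch(now = Date.now()) {
    this.lastSeen = now;
  }
  isExpired(now = Date.now()) {
    return now - this.lastSeen > this.timeoutMs;
  }
}

console.log(encodeControlFrame(OPCODE_PONG, Buffer.from('hb'))); // <Buffer 8a 02 68 62>
```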

This article mainly discussed the significance of WebSocket: it exposes a standardized socket API to the browser (and, beyond browsers, apps can implement the same standard), making up for HTTP's one-way transmission. It also covered the WebSocket frame format and how to read frames with Node.js: the client masks what it sends, and the server must unmask what it receives. The Chrome client implementation turned out to work in much the same way.

How to keep WebSocket transmission stable is another topic, including reconnection after errors and the use of dedicated lines between the United States and China.