Why did I write this article?

When I first started out as a programming novice doing backend work, I thought HTTP was something really impressive. But as I studied and practiced more, I found it was very different from what I had imagined: not as complex as we think, just a protocol. After learning more, I slowly calmed down. The reason we're talking about WebSocket today, rather than some other protocol, is that in a sense (allow me to show off a little) once you understand WebSocket, HTTP is no challenge at all. I used to write WebSocket code in C and C++, but for the sake of PHP coders (PHP is the best language in the world), I rewrote it in PHP. It's a simplified version, but it's enough for us to thoroughly understand WebSocket and its essence. I have uploaded the code to Gitee (Code Cloud) as PHP-websocket-base-implemention. Please be sure to download it and run it yourself; practice is the only test of truth. The code runs as-is; if you have trouble running it, please contact me. This post took about three days of revision (fortunately, not much was going on at the company) to explain things as clearly as possible. Writing an article suddenly feels exhausting, but that's not important. I hope you can understand it, or what I've written is useless. Even more, I hope that when you run into something you don't understand, you'll ask. After finishing this post, I reviewed it again and fixed some typos; please point out any I missed.

Preparation

Before reading this post, you need some background knowledge. Here is a list (allow me to show off a bit first).

Socket basics

You need basic socket programming skills. If you don't have them, don't panic. I've prepared something for you just in case: see my post on writing basic socket programs in PHP.

Bitwise operations

PHP programming rarely involves bitwise operations, so if you've forgotten them or aren't familiar with them, refer to the official PHP documentation. I still want to say a little about the XOR (^) operation. Look at the conclusion below: it is very important, so remember it. Important things deserve to be said three times.

If a ^ b = c, then c ^ b = a.
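A quick way to convince yourself of this property is to try it in PHP. This is just a sketch with arbitrary byte values, not code from the example project:

```php
<?php
// XOR is its own inverse: if $a ^ $b === $c, then $c ^ $b === $a.
$a = 0x4F;        // arbitrary data byte (the letter 'O')
$b = 0xA3;        // arbitrary key byte
$c = $a ^ $b;     // "scramble" a with b
$back = $c ^ $b;  // XOR with the same b again recovers a
printf("a=0x%02X b=0x%02X c=0x%02X back=0x%02X\n", $a, $b, $c, $back);
// prints: a=0x4F b=0xA3 c=0xEC back=0x4F
```

WebSocket masking, covered later, relies on exactly this property.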

Binary data and text data

Have you ever opened a file and seen nothing but garbled characters?

The fundamental difference between binary data and text data lies in how the values are stored as bytes.
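For example, the number 1000 can be stored as text (the characters '1', '0', '0', '0') or as a 2-byte binary integer. The sketch below uses PHP's built-in pack() with format 'n' (16-bit unsigned, big endian) to show the difference:

```php
<?php
$value = 1000;

// Text: each digit is stored as its character code ('1' = 0x31, '0' = 0x30).
$asText = (string) $value;
// Binary: the value itself, as a 16-bit unsigned big-endian integer.
$asBinary = pack('n', $value);

echo strlen($asText), ' bytes: ', bin2hex($asText), "\n";     // 4 bytes: 31303030
echo strlen($asBinary), ' bytes: ', bin2hex($asBinary), "\n"; // 2 bytes: 03e8
```

If you open the binary version in a text editor, bytes like 0x03 and 0xE8 are exactly what shows up as garbage.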

Big endian, little endian, and network byte order

This distinction exists because different CPU architectures store multi-byte data in memory in different orders. As an example, take an int m (assumed to be 4 bytes), written in hexadecimal as m = 0x12345678. Consider four consecutive memory locations a, b, c and d, with addresses increasing in that order.

  • Little endian: the low-order byte is stored at the low address and the high-order byte at the high address. What does that mean? 0x78 is stored in a, 0x56 in b, 0x34 in c, and 0x12 in d.
  • Big endian: the high-order byte is stored at the low address and the low-order byte at the high address. 0x12 is stored in a, 0x34 in b, 0x56 in c, and 0x78 in d.
  • Network byte order: network byte order is big endian, and this has become the standard.

As the analysis above shows, byte order must be taken into account when parsing multi-byte values from network data, which is why I emphasize it here.
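PHP's pack() and unpack() make the difference easy to see. Format 'N' is a 32-bit big-endian (network order) integer and 'V' is 32-bit little endian. Here is a small sketch using the m = 0x12345678 example:

```php
<?php
$m = 0x12345678;

$big    = pack('N', $m); // big endian / network order: 12 34 56 78
$little = pack('V', $m); // little endian:              78 56 34 12

echo bin2hex($big), "\n";    // 12345678
echo bin2hex($little), "\n"; // 78563412

// Bytes read from the network must be decoded as big endian.
$decoded = unpack('N', "\x12\x34\x56\x78")[1];
printf("0x%08X\n", $decoded); // 0x12345678
```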

The birth of the protocol

The WebSocket protocol is widely used today, mainly because of the limitations of HTTP. Every exchange must be initiated by the client. Without persistent connections, every request/response pair also needs its own TCP three-way handshake, which is very costly in system-level resources for a high-traffic server. That is why WebSocket was born. Its exact birth date is unclear, but it was standardized by the IETF in 2011; see RFC 6455 for details.

Protocol workflow

Here is a diagram illustrating this, found via Google:

WebSocket and HTTP are both application-layer protocols (running on top of TCP/IP), but WebSocket begins with an extra handshake step that HTTP doesn't have, as the figure above clearly shows. HTTP is a text protocol, whereas WebSocket data has its own strict byte format, which I'll cover later.

Packet format

Overview of the protocol process

The protocol has two parts: the handshake and data transfer. The handshake is not complicated and is based on HTTP. Let's look at the handshake first. The client sends a request like this:

	GET /chat HTTP/1.1
	Host: server.example.com
	Upgrade: websocket
	Connection: Upgrade
	Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
	Origin: http://example.com
	Sec-WebSocket-Protocol: chat, superchat
	Sec-WebSocket-Version: 13

The server responds as follows:

	HTTP/1.1 101 Switching Protocols
	Upgrade: websocket
	Connection: Upgrade
	Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
	Sec-WebSocket-Protocol: chat

In both the request and the response, the order of the header fields doesn't matter. You are probably familiar with most of these fields, and if not, a quick search (on Baidu, say) will clear them up. Let's discuss the fields that are specific to WebSocket:

Upgrade

This field names the protocol to upgrade to. It is required, and its value must be websocket.

Connection

This field signals that a protocol upgrade is requested. It is required, and its value must be Upgrade.

Sec-WebSocket-Key and Sec-WebSocket-Accept

These two fields carry out the handshake between client and server, and both are required. The server transforms the value of Sec-WebSocket-Key in a fixed way and sends the result back in Sec-WebSocket-Accept. The client computes the same value itself and compares the two. If they differ, the client concludes that something is wrong with the server, and the connection fails.

Before getting to the actual computation, we need to introduce a constant GUID: 258EAFA5-E914-47DA-95CA-C5AB0DC85B11. This value is fixed, and every WebSocket server and client (including browsers) must use exactly this value. The computation is as follows: append the GUID to the key, take the SHA-1 hash of the result, and Base64-encode the raw 20-byte digest. If the client sent dGhlIHNhbXBsZSBub25jZQ==, the PHP code looks like this:

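$GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
$sec_websocket_key = "dGhlIHNhbXBsZSBub25jZQ==";
// sha1(..., true) returns the raw binary digest; Base64-encoding the hex string would be wrong
$result = base64_encode(sha1($sec_websocket_key . $GUID, true));
// $result is now "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="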

The computed $result is sent back to the client in the Sec-WebSocket-Accept response header. The client then validates the value; that part is the client's job.
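On the client side, validation just means recomputing the same value and comparing it with the server's answer. Here is a minimal sketch in PHP, using the sample key from the request above. The function names are hypothetical and not taken from the example project:

```php
<?php
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: compute Sec-WebSocket-Accept for a given Sec-WebSocket-Key.
function computeAccept(string $key): string
{
    // sha1(..., true) returns the raw 20-byte digest, which is what gets Base64-encoded.
    return base64_encode(sha1($key . WS_GUID, true));
}

// Hypothetical helper: the check a client performs on the server's response.
function acceptIsValid(string $sentKey, string $receivedAccept): bool
{
    return hash_equals(computeAccept($sentKey), trim($receivedAccept));
}

var_dump(acceptIsValid('dGhlIHNhbXBsZSBub25jZQ==', 's3pPLMBiTxaQ9kYGzzhZRbK+xOo=')); // bool(true)
```

The expected value is the one shown in the server response at the start of this section.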

Sec-WebSocket-Version

This is the version of the WebSocket protocol. RFC 6455 specifies that it must be 13; no other value will work. The relevant text from the RFC:

The request MUST include a header field with the name |Sec-WebSocket-Version|. The value of this header field MUST be 13. NOTE: Although draft versions of this document (-09, -10, -11, and -12) were posted (they were mostly comprised of editorial changes and clarifications and not changes to the wire protocol), values 9, 10, 11, and 12 were not used as valid values for Sec-WebSocket-Version. These values were reserved in the IANA registry but were not and will not be used.

Sec-WebSocket-Protocol

This field selects the subprotocol used over the WebSocket connection. It is optional and depends on the implementation. Browsers such as Chrome only send it when the page passes subprotocols to the WebSocket constructor.

Handshake

Now that we've covered WebSocket's main HTTP header fields, let's look at the server-side handshake code. Here is the relevant code from the example program for you to analyze:

    /**
     * Perform the WebSocket opening handshake with a newly accepted client.
     * @param $client_socket_handle
     * @throws Exception
     */
    private function shakehand($client_socket_handle)
    {
        // Read the request head (simplified: assumes it fits in 1000 bytes)
        if (socket_recv($client_socket_handle, $buffer, 1000, 0) === false) {
            throw new Exception(socket_strerror(socket_last_error($client_socket_handle)));
        }
        // Walk the header lines one by one, looking for Sec-WebSocket-Key
        while (1) {
            if (preg_match("/([^\r]+)\r\n/", $buffer, $match) > 0) {
                $content = $match[1];
                if (strncmp($content, "Sec-WebSocket-Key", strlen("Sec-WebSocket-Key")) == 0) {
                    $this->websocket_key = trim(substr($content, strlen("Sec-WebSocket-Key:")), " \r\n");
                }
                // Drop the line just handled (+2 for the trailing "\r\n")
                $buffer = substr($buffer, strlen($content) + 2);
            } else {
                break;
            }
        }
        // Respond to the client
        $this->writeToSocket($client_socket_handle, "HTTP/1.1 101 Switching Protocols\r\n");
        $this->writeToSocket($client_socket_handle, "Upgrade: websocket\r\n");
        $this->writeToSocket($client_socket_handle, "Connection: Upgrade\r\n");
        $this->writeToSocket($client_socket_handle, "Sec-WebSocket-Accept: " . $this->calculateResponseKey() . "\r\n");
        $this->writeToSocket($client_socket_handle, "Sec-WebSocket-Version: 13\r\n\r\n");
    }

First, we read up to 1000 bytes from the client socket, which is enough to hold all the headers in this example. In production code you must not do this: you can never assume how large the HTTP headers are, and a single socket_recv call may return only part of them. In this post I've simplified a lot of code to keep the focus on the key points, but rest assured, the essential logic is all here. Next, the loop repeatedly matches one line terminated by \r\n. If the line starts with Sec-WebSocket-Key, the loop saves the trimmed value after the colon. It then removes the line and its \r\n from the buffer and continues, stopping when no complete line is left.
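The loop above looks for only one header and compares its name case-sensitively, but HTTP header names are case-insensitive. The sketch below is a slightly more tolerant parser that collects all headers of a complete request head into an array keyed by lowercase name. The function name is hypothetical and not part of the example project:

```php
<?php
// Hypothetical helper: parse a raw HTTP request head into [lowercase-name => value].
function parseHeaders(string $raw): array
{
    $headers = [];
    // Only the part before the empty line contains headers.
    $head = explode("\r\n\r\n", $raw, 2)[0];
    $lines = explode("\r\n", $head);
    array_shift($lines); // drop the request line, e.g. "GET /chat HTTP/1.1"
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "Name: value" line
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

$request = "GET /chat HTTP/1.1\r\n"
    . "Host: server.example.com\r\n"
    . "Upgrade: websocket\r\n"
    . "Connection: Upgrade\r\n"
    . "sec-websocket-key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    . "Sec-WebSocket-Version: 13\r\n\r\n";

$headers = parseHeaders($request);
echo $headers['sec-websocket-key'], "\n"; // dGhlIHNhbXBsZSBub25jZQ==
```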

(For the HTTP message format, see my post "Detailed description of HTTP protocol format".)

For the WebSocket handshake, the status code must be 101 if the server accepts the client's connection. The reason phrase after it doesn't strictly have to be Switching Protocols; that's just what everyone uses. Next, Upgrade: websocket and Connection: Upgrade must be sent to the client. These are fixed, so they shouldn't be hard. The example also sends Sec-WebSocket-Version: 13, which RFC 6455 does not require in a 101 response, but sending it does no harm. Then there is Sec-WebSocket-Accept, which we covered earlier. I posted the code to compute it above, and that computation is fixed as well. Don't forget to end every line with \r\n, and to add one more \r\n (an empty line) after the last header, which is why the last line ends with \r\n\r\n.
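Putting those rules together, here is a sketch of a function that builds the complete 101 response as a single string. The function name is hypothetical. It sends only the headers RFC 6455 requires, so it omits the optional Sec-WebSocket-Version line used in the example project:

```php
<?php
// Hypothetical helper: build the server's handshake response for a client key.
function buildHandshakeResponse(string $clientKey): string
{
    $guid = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
    $accept = base64_encode(sha1($clientKey . $guid, true));

    return "HTTP/1.1 101 Switching Protocols\r\n"
        . "Upgrade: websocket\r\n"
        . "Connection: Upgrade\r\n"
        . "Sec-WebSocket-Accept: {$accept}\r\n"
        . "\r\n"; // empty line: end of the header block
}

echo buildHandshakeResponse('dGhlIHNhbXBsZSBub25jZQ==');
```

Building the response in one string also lets you write it to the socket with a single call instead of several small writes.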

Parsing data frames

After seeing the handshake code above, do you feel like you're on top of the world, thinking this is really easy? Young man, wake up, wake up! Haha, too young, too simple; but being young is a good thing.

Look at the WebSocket packet format I posted above: it's time to unmask it. This part may be a bit difficult, but don't be afraid, I've got you. Now let me analyze it at the atomic level.

FIN

The FIN bit is the highest bit of the first byte of a fragment, and it can be 0 or 1. It has exactly one job: 1 means this fragment is the last fragment of the message, and 0 means more fragments follow. Sounds confusing? What's a fragment? What's a message? Good questions. Time for me to show off again, so without further ado, let the code speak:

const ws = new WebSocket("ws://127.0.0.1:8080"); // example address
ws.onopen = () => ws.send("I am Obama.");

This is JavaScript. The argument to send is a message, and a very short one. But we can't assume messages are always this short. When a message becomes very long, the client may split it. For example, say I have a 4 MB string and split it into four 1 MB strings. Each 1 MB string becomes one fragment and is sent separately, and the four fragments together make up one message. Every fragment has the same fixed format, the one shown in the diagram. By what I just said, FIN is 0 in the first three fragments and 1 in the fourth. Clear? So easy!
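To make the FIN rule concrete, here is a sketch that splits a message into equal-sized fragments and reports the FIN bit each fragment would carry. The function name is hypothetical, and the 1 MB size just mirrors the example above:

```php
<?php
// Sketch: FIN bit for each fragment of a message split into $chunkSize pieces.
function finBits(string $message, int $chunkSize): array
{
    $count = count(str_split($message, $chunkSize)); // number of fragments
    $bits = array_fill(0, $count, 0);                // FIN = 0 on every fragment...
    $bits[$count - 1] = 1;                           // ...except the last one
    return $bits;
}

$message = str_repeat('x', 4 * 1024 * 1024); // a 4 MB message
echo implode(',', finBits($message, 1024 * 1024)), "\n"; // 0,0,0,1
```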

RSV1, RSV2, RSV3

These three bits are reserved for extensions. I don't use any, so we can treat them like air: always set them to zero, and that's it.

opcode

The opcode (operation code) occupies the low four bits of the first byte, so it can take 16 different values. What's the opcode for, you ask? It tells the receiver how to interpret the payload (the data carried) of the current fragment. More on this later.

  • 0x0: the current fragment is a continuation fragment. What does that mean? Remember the FIN discussion above, where one message was split into multiple fragments? Every fragment except the first must have its opcode set to 0.
  • 0x1: the current fragment carries text data. If there are multiple fragments, set this value only in the first fragment; later fragments of the same message use 0.
  • 0x2: the current fragment carries binary data. If there are multiple fragments, set this value only in the first fragment; later fragments of the same message use 0.
  • 0x3-0x7: reserved for future use, which means not used yet.
  • 0x8: close the WebSocket connection. I'll come back to this; let's leave it for now.
  • 0x9: Ping. Put simply, it checks whether the remote endpoint is still alive. A Ping fragment may carry data. If one endpoint sends a Ping, the receiver must reply with a Pong fragment, which in plain words means "I'm still here."
  • 0xA: Pong, the reply to a Ping. Easy, right?
  • 0xB-0xF: reserved for future use, which means not used yet.
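Combining FIN and the opcode, here is a sketch that computes the first byte of each fragment of a fragmented message. The helper name is hypothetical, and RSV1-3 are left at 0 as described above:

```php
<?php
// Sketch: first header byte (FIN | RSV1-3 | opcode) for each of $count fragments
// of one message whose type is $opcode (0x1 text or 0x2 binary).
function firstBytes(int $count, int $opcode): array
{
    $bytes = [];
    for ($i = 0; $i < $count; $i++) {
        $fin = ($i === $count - 1) ? 0x80 : 0x00; // FIN is the highest bit
        $op  = ($i === 0) ? $opcode : 0x0;        // later fragments: continuation (0x0)
        $bytes[] = $fin | $op;                    // RSV1-3 stay 0
    }
    return $bytes;
}

$hex = function (array $bytes): string {
    return implode(' ', array_map(function ($b) { return sprintf('0x%02X', $b); }, $bytes));
};
echo $hex(firstBytes(4, 0x1)), "\n"; // 0x01 0x00 0x00 0x80
echo $hex(firstBytes(1, 0x1)), "\n"; // 0x81
```

On the receiving side the opcode is simply `ord($buffer[0]) & 0x0F`, and FIN is `ord($buffer[0]) >> 7`.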

MASK

RFC 6455 requires that all data sent from the client to the server be masked, so MASK must be 1. Masking means XORing the payload with a key; it is often loosely called encryption, but it provides no real secrecy. Conversely, all data sent from the server to the client must not be masked, so MASK must be 0. Simple as that.

Payload Length

This field defines the length of the payload data. It has 7 bits, so its maximum value is 127. Easy? Hmm, not quite.

  • payload_length <= 125: the data length is simply payload_length.
  • payload_length == 126: the next two bytes hold the data length as an unsigned 16-bit integer. So when the data size is greater than 125 and at most 65535, payload_length is set to 126. I'll come back to this when analyzing the code.
  • payload_length == 127: the next 8 bytes hold the data length as an unsigned 64-bit integer (its most significant bit must be 0), which can represent a very large amount of data. I'll come back to this when analyzing the code.
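The MASK bit and the 7-bit length share the second byte of the header. The sketch below (hypothetical helper) pulls both out and reports how many extended-length bytes follow, according to the three rules above:

```php
<?php
// Sketch: interpret the second header byte of a frame.
// Returns [MASK bit, 7-bit payload length, number of extended-length bytes that follow].
function parseSecondByte(int $byte): array
{
    $mask = ($byte >> 7) & 0x1; // highest bit: MASK
    $len7 = $byte & 0x7F;       // low 7 bits: Payload Length
    if ($len7 <= 125) {
        $extra = 0;             // the length is $len7 itself
    } elseif ($len7 === 126) {
        $extra = 2;             // real length in the next 2 bytes
    } else {
        $extra = 8;             // real length in the next 8 bytes
    }
    return [$mask, $len7, $extra];
}

print_r(parseSecondByte(0x85)); // masked, length 5
print_r(parseSecondByte(0xFE)); // masked, length in the next 2 bytes
print_r(parseSecondByte(0x7F)); // unmasked, length in the next 8 bytes
```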

Masking key

The masking key comes right after the payload length field and is 0 or 4 bytes long. We discussed MASK above: if MASK is 1, the data is masked and the masking key occupies 4 bytes; otherwise its length is 0. How the masking key is used to unmask the data comes later.

Payload data

This is the data we received from the client, for example "I am Obama.", in masked form. payload_length is the length of the masked data, which equals the length of the original data, because masking XORs each byte in place and adds nothing.

With that out of the way, we can start analyzing how to parse WebSocket fragments in PHP.

Parsing a frame

As I said at the beginning of this post, this WebSocket implementation focuses on the most essential and hardest parts of WebSocket, so it leaves out some details. If you understand what follows, the rest won't be a problem.

Calculate the length of the data

// Wait for new data from the client
if (!socket_recv($client_socket_handle, $buffer, 1000, 0)) {
    throw new Exception(socket_strerror(socket_last_error($client_socket_handle)));
}
// Parse the payload length: the low 7 bits of the second byte
$payload_length = ord($buffer[1]) & 0x7f;
if ($payload_length <= 125) {
    // The length fits in the 7-bit field itself
    $this->current_message_length = $payload_length;
    $payload_type = 1;
    echo $payload_length . "\n";
} else if ($payload_length == 126) {
    // The next 2 bytes hold a 16-bit big-endian length
    $payload_type = 2;
    $this->current_message_length = ((ord($buffer[2]) & 0xff) << 8) | (ord($buffer[3]) & 0xff);
    echo $this->current_message_length;
} else {
    // 127: the next 8 bytes hold a 64-bit big-endian length
    $payload_type = 3;
    $this->current_message_length =
        (ord($buffer[2]) << 56)
        | (ord($buffer[3]) << 48)
        | (ord($buffer[4]) << 40)
        | (ord($buffer[5]) << 32)
        | (ord($buffer[6]) << 24)
        | (ord($buffer[7]) << 16)
        | (ord($buffer[8]) << 8)
        | (ord($buffer[9]) << 0);
}

Let's go through the code above piece by piece.

$payload_length = ord($buffer[1]) & 0x7f; // the low 7 bits of the second byte

This reads the low 7 bits of the second byte, which is payload_length. 0x7f in binary is 01111111. ord($buffer[1]) converts the second character (byte) to its integer value (0-255), and ANDing the two yields the low 7 bits of the second byte. If you're not familiar with bitwise operations, see the link I gave earlier in this post.

if ($payload_length <= 125) {
    $this->current_message_length = $payload_length;
    $payload_type = 1;
    echo $payload_length . "\n";
}

When payload_length is at most 125, the data length is simply payload_length.

if ($payload_length == 126) {
    $payload_type = 2;
    $this->current_message_length = ((ord($buffer[2]) & 0xff) << 8) | (ord($buffer[3]) & 0xff);
    echo $this->current_message_length;
}

Things get a little more involved when payload_length equals 126: the third and fourth bytes together form an unsigned 16-bit integer. Remember network byte order? The most significant byte comes first and the least significant byte second. So the third byte holds the high 8 bits and the fourth byte holds the low 8 bits. We shift the high byte left by 8 bits and then OR it with the low byte.

$payload_type = 3;
$this->current_message_length =
    (ord($buffer[2]) << 56)
    | (ord($buffer[3]) << 48)
    | (ord($buffer[4]) << 40)
    | (ord($buffer[5]) << 32)
    | (ord($buffer[6]) << 24)
    | (ord($buffer[7]) << 16)
    | (ord($buffer[8]) << 8)
    | (ord($buffer[9]) << 0);

When payload_length is 127, bytes 3 through 10 form an unsigned 64-bit integer. The first (most significant) byte is shifted left by 56 bits, the next by 48, and so on, down to the last (least significant) byte, which is not shifted at all.
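PHP can also assemble these bytes for you. As an alternative sketch (not how the example project does it), unpack() with format 'n' reads a 16-bit big-endian unsigned integer and 'J' reads a 64-bit big-endian one. 'J' needs PHP 5.6.3+ on a 64-bit build:

```php
<?php
// Sketch: read the payload length from the start of a raw frame in $buffer.
function readPayloadLength(string $buffer): int
{
    $len7 = ord($buffer[1]) & 0x7F;
    if ($len7 === 126) {
        return unpack('n', substr($buffer, 2, 2))[1]; // bytes 3-4, network order
    }
    if ($len7 === 127) {
        return unpack('J', substr($buffer, 2, 8))[1]; // bytes 3-10, network order
    }
    return $len7;
}

echo readPayloadLength("\x81\x05"), "\n";                                 // 5
echo readPayloadLength("\x81\x7E\x01\x00"), "\n";                         // 256
echo readPayloadLength("\x82\x7F\x00\x00\x00\x00\x00\x01\x00\x00"), "\n"; // 65536
```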

Parsing the masking key

// Parse the masking key (always present in client frames); it is 4 bytes long
$mask_key_offset = ($payload_type == 1 ? 0 : ($payload_type == 2 ? 2 : 8)) + 2;
$this->mask_key = substr($buffer, $mask_key_offset, 4);

If payload_length <= 125, the masking key starts at offset 2. If payload_length == 126, the offset is (2+2) = 4. If payload_length == 127, the offset is (2+8) = 10. The masking key is 4 bytes long, so once we know its offset and length, we can extract it.

Decrypting (unmasking) the data

// Get the masked payload, which starts right after the 4-byte masking key
$real_message = substr($buffer, $mask_key_offset + 4);
$i = 0;
$parsed_ret = '';
// Unmask: XOR each byte with the masking-key byte at position i % 4
while ($i < strlen($real_message)) {
    $parsed_ret .= chr(ord($real_message[$i]) ^ ord($this->mask_key[$i % 4]));
    $i++;
}

The first step in unmasking is to find the offset of the masked data within the current fragment. That offset is simply the masking key's offset plus the key's own length, 4. As the code shows, the unmasking loop walks over every byte of the masked data. It XORs the byte's value with the masking-key byte at position i % 4, which is always 0, 1, 2 or 3. This algorithm is specified by RFC 6455, so it is the same everywhere in the world. By the XOR rule from the beginning of this post, applying the same operation to the unmasked data would mask it again.
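To check the whole pipeline end to end, here is a self-contained sketch that decodes one complete masked frame. The input bytes are the single-frame masked text message example from RFC 6455 (section 5.7), which carries the text "Hello". The function name is hypothetical, and like the post, the sketch ignores fragmentation and error handling:

```php
<?php
// Sketch: decode one masked client frame (no fragmentation, no error checks).
function decodeClientFrame(string $frame): string
{
    $len7 = ord($frame[1]) & 0x7F;
    // Offset of the masking key: 2 header bytes plus 0, 2 or 8 extended-length bytes.
    $maskOffset = 2 + ($len7 <= 125 ? 0 : ($len7 === 126 ? 2 : 8));
    $maskKey = substr($frame, $maskOffset, 4);
    $payload = substr($frame, $maskOffset + 4);

    $out = '';
    for ($i = 0, $n = strlen($payload); $i < $n; $i++) {
        // XOR each byte with the masking-key byte at position i % 4.
        $out .= chr(ord($payload[$i]) ^ ord($maskKey[$i % 4]));
    }
    return $out;
}

$frame = "\x81\x85\x37\xfa\x21\x3d\x7f\x9f\x4d\x51\x58";
echo decodeClientFrame($frame), "\n"; // Hello
```

In PHP, `^` also works on two strings byte by byte, and the result is as long as the shorter operand. So the loop could also be written as `$payload ^ str_repeat($maskKey, intdiv(strlen($payload), 4) + 1)`.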

Sending data back to the client

Data sent from the client to the server and data sent from the server to the client use the same frame format, so my implementation looks like this:

function echoContentToClient($client_socket, $content)
{
    $len = strlen($content);
    // First byte: FIN = 1, opcode = 0x1 (text)
    $char_seq = chr(0x80 | 1);

    // Second byte starts with MASK = 0: server-to-client data is never masked
    $b_2 = 0;
    // Fill in the payload length
    if ($len <= 125) {
        $char_seq .= chr($b_2 | $len);
    } else if ($len <= 65535) {
        $char_seq .= chr($b_2 | 126);
        $char_seq .= chr($len >> 8) . chr($len & 0xff);
    } else {
        $char_seq .= chr($b_2 | 127);
        // chr() keeps only the low 8 bits of each shifted value
        $char_seq .=
            chr($len >> 56)
            . chr($len >> 48)
            . chr($len >> 40)
            . chr($len >> 32)
            . chr($len >> 24)
            . chr($len >> 16)
            . chr($len >> 8)
            . chr($len >> 0);
    }
    $char_seq .= $content;
    $this->writeToSocket($client_socket, $char_seq);
}

For simplicity, the first byte has FIN = 1 and opcode = 1 (text). Then we write the data length, which is just the reverse of parsing it, so I won't analyze it again. If you followed everything so far, this part should be no problem. But be careful: as mentioned earlier, data sent from the server to the client must not be masked. So MASK must be 0, and the masking key is absent (its length is 0).
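The same header can be built with pack() instead of manual shifts. Here is an alternative encoder as a sketch. The function name is hypothetical, and it returns the bytes instead of writing them to a socket, so it can be checked against the unmasked "Hello" example frame from RFC 6455 (0x81 0x05 0x48 0x65 0x6c 0x6c 0x6f):

```php
<?php
// Sketch: build an unmasked, unfragmented text frame (server to client).
function encodeTextFrame(string $content): string
{
    $len = strlen($content);
    $frame = chr(0x80 | 0x1);                 // FIN = 1, opcode = 0x1 (text)
    if ($len <= 125) {
        $frame .= chr($len);                  // MASK = 0, 7-bit length
    } elseif ($len <= 0xFFFF) {
        $frame .= chr(126) . pack('n', $len); // 16-bit big-endian length
    } else {
        $frame .= chr(127) . pack('J', $len); // 64-bit big-endian length
    }
    return $frame . $content;                 // no masking key; payload sent as-is
}

echo bin2hex(encodeTextFrame('Hello')), "\n"; // 810548656c6c6f
```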

Running the example

As mentioned at the beginning of this post, I've written a simple WebSocket implementation. Be sure to download it and run it yourself, otherwise it won't really sink in: php-Websocket-base-implemention

To see the result of an actual run, open the websocket.html file. If the expected output appears on the page, the run was successful.

If you still can't get it to run, please contact me. If you want to try sending something else, modify the websocket.html file and restart the server.

Notes

The purpose of this post is just to briefly introduce the core of WebSocket, and some topics are not covered. The rest isn't difficult, and you can implement it yourself if you're interested. To give you a more direct view of WebSocket, the code leaves out error checking, so it isn't rigorous. I hope you enjoy your study.

Contact

If you have any questions, please contact me. You're welcome to join the QQ group: 971572229