An overview of the WebSocket protocol

This article is a translation of the first chapter of the WebSocket protocol specification. It gives a short but comprehensive introduction to WebSocket as a whole, so that readers can gain a general understanding of the protocol.

1 Introduction

This chapter is the first chapter of the protocol specification.

1.1 Background

This section is non-normative.

Historically, creating Web applications that need two-way communication between a client and a server (such as instant messaging and gaming applications) has required abusing HTTP: the client polls the server for updates while sending upstream notifications as separate HTTP requests.

There are many problems with this approach:

  • The server is forced to use several underlying TCP connections for each client: one for sending information to the client, and a new one for each incoming message.
  • The wire protocol (HTTP) has a high overhead: every client-to-server message carries an HTTP header.
  • The client-side script must maintain a mapping from the outgoing connections to the incoming connection in order to track replies.

A simpler solution is to use a single TCP connection for traffic in both directions. This is what the WebSocket protocol provides. Combined with the WebSocket API, it offers an alternative to HTTP polling for two-way communication between a Web page and a remote server.

The same technique can be used for a variety of Web applications: games, stock tickers, multiuser applications with simultaneous editing, user interfaces that expose server-side services in real time, and so on.

The WebSocket protocol is designed to supersede existing two-way communication technologies that use HTTP as a transport layer in order to benefit from existing infrastructure (proxies, filtering, authentication). Those technologies were implemented as trade-offs between efficiency and reliability, because HTTP was not originally meant for two-way communication. The WebSocket protocol attempts to address the goals of those technologies within the existing HTTP infrastructure. It is therefore designed to work over HTTP ports 80 and 443 and to support HTTP proxies and intermediaries, even if this implies some complexity specific to the current environment. However, the design does not limit WebSocket to HTTP: future implementations could use a simpler handshake over a dedicated port without reinventing the entire protocol. This last point is important because the traffic patterns of interactive messaging do not closely match standard HTTP traffic and can induce unusual loads on some components.

1.2 Protocol Overview

This section is non-normative.

The protocol has two parts: the handshake and the data transfer.

The handshake data from the client looks like this:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

The server’s handshake response looks like this:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

The leading line of the client request follows the format of the HTTP request line.

The leading line from the server follows the format of the HTTP status line.

The formats of the HTTP request line and status line are defined in RFC 2616.

In both cases, the leading line is followed by an unordered set of header fields. The meaning of these header fields is specified in Section 4 of the specification. Additional header fields, such as cookies, may also be present. The format and parsing of header fields are defined in RFC 2616.
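As one concrete illustration of these header fields, the Sec-WebSocket-Accept value in the server response above is derived from the client's Sec-WebSocket-Key. The following is a minimal sketch, assuming the derivation defined by RFC 6455 (append a fixed GUID to the key, take the SHA-1 hash, then Base64-encode it). The function name `compute_accept` is illustrative and not part of any library.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    # The key is used exactly as received (still Base64 text), not decoded.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Key taken from the sample client handshake above.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

Run against the sample key, this reproduces the Sec-WebSocket-Accept value shown in the sample server response, which is how a client can check that the server actually read its handshake.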

Once the client and the server have both sent their handshakes, and if the handshake was successful, the data transfer part begins. This is a two-way communication channel on which each side can send data at will, independently of the other.

After a successful handshake, the client and the server exchange data in conceptual units that the specification calls messages. On the wire, a message consists of one or more frames. A WebSocket message does not necessarily correspond to a particular network-layer framing, because a fragmented message may be coalesced or split by an intermediary.

Each frame has an associated type, and every frame belonging to the same message contains the same type of data. Broadly speaking, there are types for text data (interpreted as UTF-8 text), binary data (whose interpretation is left to the application), and control frames (which do not carry application data but are used for protocol-level signaling, for example to signal that the connection should be closed). The current version of the protocol defines six frame types and leaves ten reserved for future use.
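The frame types can be made concrete with a short sketch. It assumes the 4-bit opcode values that RFC 6455 assigns in its framing section (not covered in this chapter). The names `Opcode`, `is_control` and `is_reserved` are illustrative.

```python
from enum import IntEnum


class Opcode(IntEnum):
    """The six frame types defined by the current protocol version."""
    CONTINUATION = 0x0  # continues a fragmented message
    TEXT = 0x1          # UTF-8 text data
    BINARY = 0x2        # application-defined binary data
    CLOSE = 0x8         # control: closing handshake
    PING = 0x9          # control: keepalive probe
    PONG = 0xA          # control: reply to a ping


_DEFINED = frozenset(int(op) for op in Opcode)


def is_control(opcode: int) -> bool:
    """Control opcodes have the high bit of the 4-bit opcode set."""
    return (opcode & 0x8) != 0


def is_reserved(opcode: int) -> bool:
    """True for 4-bit opcode values that the protocol leaves unassigned."""
    return 0 <= opcode <= 0xF and opcode not in _DEFINED
```

Counting the values for which `is_reserved` is true over the sixteen possible opcodes gives the ten reserved types mentioned above.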

1.3 Opening Handshake

This section is non-normative.

The opening handshake is intended to be compatible with HTTP-based server software and intermediaries, so that a single port can serve both HTTP clients and WebSocket clients talking to the same server. To this end, the WebSocket client handshake is an HTTP Upgrade request:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

In accordance with RFC 2616, the client may send the header fields of the handshake in any order, so the order in which the header fields are received is not significant.
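Because header order carries no meaning, a receiver typically stores the fields in a lookup table keyed by name. The following is a minimal sketch of that idea, not a complete HTTP parser: it assumes a well-formed request with CRLF line endings and no repeated or folded header fields, and the function name `parse_handshake` is illustrative.

```python
def parse_handshake(raw: str):
    """Split a handshake request into its request line and header fields."""
    # Everything before the first blank line is the request head.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


if __name__ == "__main__":
    request = "\r\n".join([
        "GET /chat HTTP/1.1",
        "Host: server.example.com",
        "Upgrade: websocket",
        "Connection: Upgrade",
        "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==",
        "Sec-WebSocket-Version: 13",
        "", "",
    ])
    print(parse_handshake(request))
```

Two requests that differ only in the order of their header lines produce the same dictionary, which is the behavior the paragraph above describes.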

The Request-URI of the GET method is used to identify the endpoint of the WebSocket connection. This allows multiple domains to be served from one IP address, and multiple WebSocket endpoints to be served by a single server.

The client includes the hostname in the Host header field of its handshake, so that both the client and the server can verify that they agree on which host is in use.

Additional header fields are used to select options in the WebSocket protocol. The options typically available in this version are the subprotocol selector (Sec-WebSocket-Protocol), the list of extensions supported by the client (Sec-WebSocket-Extensions), the Origin header field, and so on. The Sec-WebSocket-Protocol request header field indicates which subprotocols (application-level protocols layered over the WebSocket protocol) are acceptable to the client. The server selects one or none of the acceptable subprotocols and echoes that value in its handshake response to indicate its choice.

Sec-WebSocket-Protocol: chat

The server can also include cookie-related header fields in its response to set cookies. For details, see RFC 6265.

1.4 Closing Handshake

This section is non-normative.

The closing handshake is far simpler than the opening handshake.

Either peer can send a control frame containing a specified control sequence to begin the closing handshake (see Section 5.5.1 for details). On receiving such a frame, the other peer sends a Close frame in response, if it has not already sent one. On receiving that frame, the first peer closes the connection, safe in the knowledge that no further data is forthcoming.

After sending a control frame indicating that the connection should be closed, a peer does not send any further data. After receiving a control frame indicating that the connection should be closed, a peer discards any further data it receives.

It is safe for both peers to initiate this handshake at the same time.
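The rules above can be sketched as a small state tracker for one peer. This is a hypothetical helper written for illustration only: the class `ClosingHandshake` and its method names are assumptions, and it models only the bookkeeping, not the actual frames or the socket.

```python
class ClosingHandshake:
    """Tracks the closing handshake from one peer's point of view."""

    def __init__(self):
        self.close_sent = False
        self.close_received = False

    def send_close(self):
        """Record that this peer has sent its Close frame."""
        self.close_sent = True

    def receive_close(self) -> bool:
        """Record an incoming Close frame.

        Returns True if this peer had to send a Close frame in reply,
        which is the case only when it has not already sent one.
        """
        self.close_received = True
        must_reply = not self.close_sent
        if must_reply:
            self.send_close()
        return must_reply

    def can_send_data(self) -> bool:
        # No further data is sent after our own Close frame.
        return not self.close_sent

    def accepts_data(self) -> bool:
        # Data arriving after the peer's Close frame is discarded.
        return not self.close_received

    def can_close_tcp(self) -> bool:
        # The connection can be closed once both Close frames are exchanged.
        return self.close_sent and self.close_received
```

If both peers call `send_close()` before seeing the other's frame, each later `receive_close()` returns False and both reach `can_close_tcp()`, which matches the statement that simultaneous initiation is safe.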

The closing handshake is intended to complement the TCP closing handshake (FIN/ACK), which is not always reliable end to end, especially in the presence of intercepting proxies and other intermediaries.

Sending a Close frame and waiting for a Close frame in response avoids certain cases in which data could be lost unnecessarily. For example, on some platforms, if a socket is closed while there is data in its receive queue, an RST packet is sent. Receiving is then made to fail for the party that got the RST, even if data was still waiting to be read.

1.5 Design Philosophy

This section is non-normative.

The WebSocket protocol is designed on the principle that framing should be minimal: the only framing that exists is there to make the protocol frame-based rather than stream-based, and to distinguish Unicode text frames from binary frames. Metadata is expected to be layered on top of WebSocket by the application layer, in the same way that the application layer (HTTP, for example) layers metadata on top of TCP.

Conceptually, WebSocket is just a layer on top of TCP that does the following:

  • Adds a Web origin-based security model for browsers
  • Adds an addressing and protocol naming mechanism to support multiple services on one port and multiple host names on one IP address
  • Layers a framing mechanism on top of TCP to get back to the IP packet mechanism that TCP is built on, but without length limits
  • Includes an additional in-band closing handshake designed to work in the presence of proxies and other intermediaries

Beyond that, WebSocket adds nothing. Basically, it is intended to come as close as possible to exposing raw TCP to scripts, given the constraints of the Web. It is also designed so that its servers can share a port with HTTP servers, by making its handshake a valid HTTP Upgrade request. Other protocols could conceptually be used to establish client-server messaging, but the intent of WebSocket is to provide a relatively simple protocol that can coexist with HTTP and with deployed HTTP infrastructure such as proxies. It is meant to be as close to TCP as is safe for use with that infrastructure given security considerations, with targeted additions (such as message semantics) that simplify usage and keep simple things simple.

The protocol is intended to be extensible, and future versions will likely introduce additional concepts such as multiplexing.

1.6 Security Model

This section is non-normative.

When the WebSocket protocol is used from a Web page, it uses the origin model of Web browsers to restrict which pages can contact a WebSocket server. Naturally, when the protocol is used directly by a dedicated client (that is, not from a Web page in a Web browser), the origin model is not useful, because the client can supply any origin string it likes.

The protocol is intended to fail to establish a connection with servers of pre-existing protocols such as SMTP (RFC 5321) and HTTP, while allowing HTTP servers to opt in to supporting it if desired. This is achieved through a strict and elaborate handshake, and by limiting the data that can be inserted into the connection before the handshake is finished (thus limiting how much the server can be influenced).

Similarly, the protocol is intended to fail to establish a connection when data from other protocols, especially HTTP, is sent to a WebSocket server, as might happen if an HTML "form" were submitted to a WebSocket server. This is achieved mainly by requiring the server to prove that it read the handshake, which it can do only if the handshake contains the appropriate parts, and those parts can be sent only by a WebSocket client. In particular, at the time the specification was written, an attacker could not set fields starting with Sec- from a Web browser using only HTML and JavaScript APIs.

1.7 Relationship with TCP and HTTP

This section is non-normative.

The WebSocket protocol is an independent TCP-based protocol. Its only relationship to HTTP is that its handshake is interpreted by HTTP servers as an Upgrade request.

By default, the WebSocket protocol uses port 80 for regular WebSocket connections and port 443 for WebSocket connections tunneled over TLS (RFC 2818).
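The default ports can be illustrated with a short sketch. It assumes the `ws` and `wss` URI schemes, which the specification defines outside this chapter for plain and TLS connections respectively. The function name `websocket_port` is illustrative.

```python
from urllib.parse import urlsplit

# Default ports: 80 for plain connections, 443 for connections over TLS.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def websocket_port(uri: str) -> int:
    """Return the TCP port a client would connect to for a WebSocket URI."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    # An explicit port in the URI overrides the scheme's default.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]
```

For example, a `ws://` URI without an explicit port resolves to 80, and a `wss://` URI without one resolves to 443.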

1.8 Establishing a Connection

This section is non-normative.

When a connection is made to a port shared with an HTTP server (which is quite likely for traffic to ports 80 and 443), the connection appears to the HTTP server as a regular GET request with an Upgrade offer. In relatively simple setups, with one IP address and a single server handling all traffic for a single hostname, this can be a practical way to deploy systems based on the WebSocket protocol. In more elaborate setups (for example, with load balancers and multiple servers), a dedicated set of hosts for WebSocket connections, separate from the HTTP servers, is probably easier to manage. At the time the specification was written, connections on ports 80 and 443 had significantly different success rates, with connections on port 443 being significantly more likely to succeed, although this may change over time.

1.9 Subprotocols Using the WebSocket Protocol

The client can request that the server use a specific subprotocol by including the Sec-WebSocket-Protocol field in its handshake. If the field is present, the server must include the same field, with one of the subprotocol values offered by the client, in its response for the connection to be established.
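The selection step on the server side can be sketched as follows. The policy shown here, picking the first subprotocol in the client's list that the server also supports, is an assumption made for illustration. The function name `select_subprotocol` is hypothetical.

```python
from typing import Iterable, Optional


def select_subprotocol(header_value: str, supported: Iterable[str]) -> Optional[str]:
    """Pick a subprotocol from a Sec-WebSocket-Protocol request value.

    Returns the chosen name to echo back in the response, or None when
    none of the client's subprotocols is supported by the server.
    """
    # The header carries a comma-separated list, e.g. "chat, superchat".
    offered = [name.strip() for name in header_value.split(",") if name.strip()]
    supported_set = set(supported)
    for name in offered:
        if name in supported_set:
            return name
    return None


if __name__ == "__main__":
    # Values taken from the sample handshake earlier in the chapter.
    print(select_subprotocol("chat, superchat", ["chat"]))
```

With the sample handshake, a server that supports only `chat` selects `chat`, which is the value echoed in the sample response.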

Subprotocol names should be registered as described in Section 11.5. To avoid potential collisions, it is recommended to use names that contain the ASCII form of the domain name of the subprotocol's originator. For example, if Example Corporation created a chat subprotocol to be implemented by many servers across the Web, it could be named chat.example.com. If the Example Organization named its competing subprotocol chat.example.org, servers could implement both subprotocols at the same time and choose between them dynamically, based on the value sent by the client.

Subprotocols can be versioned in backward-incompatible ways by changing the subprotocol name, for example from bookings.example.net to v2.bookings.example.net. WebSocket clients treat such subprotocols as completely separate. Backward-compatible versioning can be achieved by reusing the same subprotocol string while carefully designing the subprotocol itself to support this kind of extensibility.

Conclusion

This article has given a general introduction to WebSocket and should give readers a preliminary understanding of the protocol.

Translation of the WebSocket protocol will continue in future articles, and interested readers are welcome to follow along.