
This section describes concepts related to the HTTP protocol.

The Hypertext Transfer Protocol (HTTP) is an application-layer protocol that defines the communication rules between the browser and the World Wide Web server. "Hypertext" is plain text extended with images, video, audio, hyperlinks and other content that goes beyond ordinary text; such text is hypertext. Hypertext is described and displayed using HTML (Hypertext Markup Language)!

The side that requests access to resources such as text or images is called the client (for example, a browser), and the side that provides the resources in response is called the server. Simply put, HTTP is a set of communication rules that specifies the format of the content the client sends to the server and the format of the content the server sends back to the client.

Most Web applications are developed on top of HTTP, and our operations on the Web are transmitted over the HTTP protocol. It can be said that the Web is built on HTTP communication, and the protocol an ordinary programmer touches most in Web application development is HTTP at the application layer, so HTTP is a protocol that ordinary programmers should focus on understanding!

This article draws on the book Illustrated HTTP, which I recommend to anyone learning Java. It is very approachable!

1 HTTP does not save state

HTTP is a stateless protocol: it does not save state. The HTTP protocol itself does not store the state of communication between requests and responses; that is, at the HTTP level, requests and responses that have already been sent are not persisted.

With HTTP, a new response is generated for every new request, and the protocol itself retains nothing about previous request or response messages. HTTP was deliberately designed this simply in order to process a large number of transactions quickly and to keep the protocol scalable.

However, as the Web has continued to evolve, statelessness makes more and more business scenarios awkward to handle. For example, a user who logs in to a shopping site needs to stay logged in even after jumping to other pages on the site. For that to work, the site has to save the user's state so that it knows who sent each request.

Although HTTP/1.1 is a stateless protocol, Cookie technology was introduced to provide the state retention that applications need. For example, the server sends the information it wants remembered to the client (the browser) in a Cookie, and the client saves the Cookie locally. On the next request to that server, the Cookie is sent back automatically, and the server can parse it to obtain the relevant information, such as whether you have already logged in. With Cookies on top of HTTP communication, session state can be managed to some extent.

2 Cookie state management

To keep the stateless nature of the protocol while still solving the problem that a user's login state cannot otherwise be preserved, Cookie technology was introduced. Cookies control client state by carrying Cookie information in request and response messages.

The server tells the client to save a Cookie through the Set-Cookie header field in its response message. The next time the client sends a request to that server, it automatically adds the Cookie value to the request message before sending it.

When the server finds the Cookie sent by the client, it checks which client the request came from and looks it up against the records kept on the server to recover the previous state information.

(Figure: first request, without Cookie information)

(Figure: second and subsequent requests, carrying Cookie information)

The figures above show the Cookie interaction. The contents of the corresponding HTTP request and response messages are as follows:

Request message (no Cookie information yet):

GET /reader/ HTTP/1.1
Host: hackr.jp
*The header fields contain no Cookie-related information

Response message (the server generates Cookie information):

HTTP/1.1 200 OK
Set-Cookie: sid=1342077140226724; path=/; expires=Wed, 10-Oct-12 07:12:20 GMT
Content-Type: text/plain; charset=UTF-8

Request message (the saved Cookie information is sent automatically):

GET /image/ HTTP/1.1
Host: hackr.jp
Cookie: sid=1342077140226724
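
As a rough illustration of this automatic Cookie handling on the client side, here is a minimal Java sketch (assuming Java 11+ and java.net.http.HttpClient with a CookieManager; the URLs are placeholders, not a real server): the first request may receive a Set-Cookie header, and the second request to the same host sends the stored Cookie back automatically.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CookieDemo {
    public static void main(String[] args) throws Exception {
        // A cookie store shared by all requests made through this client.
        CookieManager cookieManager = new CookieManager();
        cookieManager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);

        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(cookieManager)
                .build();

        // First request: the server may answer with a Set-Cookie header
        // (e.g. a session id), which the CookieManager stores automatically.
        HttpRequest first = HttpRequest.newBuilder(URI.create("https://example.com/reader/")).GET().build();
        client.send(first, HttpResponse.BodyHandlers.ofString());

        // Second request to the same site: the stored cookie is attached
        // automatically as a Cookie request header, so the server can
        // recognize the client and restore its previous state.
        HttpRequest second = HttpRequest.newBuilder(URI.create("https://example.com/image/")).GET().build();
        HttpResponse<String> response = client.send(second, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```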

3 The request URI

The HTTP protocol uses URIs to locate resources on the Internet. Because a URI can identify any resource, resources anywhere on the Internet can be located and accessed.

When a client sends a request to access a resource, it must include the URI of that resource as the request URI in the request message. A request URI can be specified in several forms; currently the most widely used is the URL, which is one kind of URI that locates a resource by its position.

Strictly speaking, however, it is most accurate to say that the requested address is a URI!
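
As a small aside, a URL-style URI breaks down into a scheme, host, path and so on; the sketch below uses Java's java.net.URI to pull those pieces apart (the address reuses the host from the messages above and is purely illustrative).

```java
import java.net.URI;

public class UriDemo {
    public static void main(String[] args) {
        // A URL is a URI that locates a resource by its position on the network.
        URI uri = URI.create("http://hackr.jp/reader/index.html?page=1#top");

        System.out.println("scheme   = " + uri.getScheme());   // http
        System.out.println("host     = " + uri.getHost());     // hackr.jp
        System.out.println("path     = " + uri.getPath());     // /reader/index.html
        System.out.println("query    = " + uri.getQuery());    // page=1
        System.out.println("fragment = " + uri.getFragment()); // top
    }
}
```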

4 Long connection and short connection

HTTP/1.0 uses short connections by default. That is, each time the client and server perform one HTTP operation (one request), a connection is established and then torn down when the task finishes. When the browser accesses an HTML page or any other page that references further Web resources (JavaScript files, image files, CSS files, and so on), it establishes a new connection for each such resource it encounters.

Since HTTP/1.1, persistent (long) connections are the default. When a persistent connection is used, the "Connection: keep-alive" header appears in the response; in HTTP/1.1 the connection stays open unless one side sends "Connection: close".

With a persistent connection, after a Web page has been opened, the TCP connection used to transfer HTTP data between the client and the server is not closed; when the client accesses the same server again (further requests), it continues to use the established connection. Keep-Alive does not hold the connection forever: it has a hold time that can be configured in server software such as Apache.

HTTP long and short connections are essentially TCP long and short connections. HTTP is an application-layer protocol; it uses TCP at the transport layer and IP at the network layer. The IP protocol mainly solves network routing and addressing, while the TCP protocol mainly solves how to transmit packets reliably on top of IP, so that the receiver gets every packet the sender sent, in the same order in which they were sent. TCP is reliable and connection-oriented.

To use persistent connections, both the client and the server must support them. The TCP connection is then maintained as long as neither end explicitly closes it.
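
For a feel of what connection reuse looks like from application code, here is a minimal Java sketch (assuming Java 11+ and a placeholder host): a single java.net.http.HttpClient instance keeps an internal connection pool, so successive requests to the same server can generally ride on one persistent TCP connection instead of reconnecting each time.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // One client instance keeps an internal connection pool, so several
        // requests to the same host can reuse the same TCP connection
        // (an HTTP/1.1 persistent connection) instead of reconnecting each time.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();

        String[] paths = {"/", "/style.css", "/logo.png"}; // placeholder resources
        for (String path : paths) {
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com" + path))
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(path + " -> " + response.statusCode());
        }
    }
}
```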

4.1 Pipelining

HTTP/1.1's persistent connections make it possible to send most requests in a pipelined fashion. Previously, after sending one request, the client had to wait for and receive the response before it could send the next request.

With pipelining, the next request can be sent directly without waiting for the response, so multiple requests can be sent in parallel instead of waiting for responses one by one.

For example, when requesting an HTML page containing 10 images, a persistent connection finishes the requests faster than opening a connection for each one, and pipelining is faster still. The more requests there are, the more significant the time difference becomes.

However, the client still receives the responses in the order in which it sent the requests. HTTP/1.1 pipelining is turned off by default.
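
Most client libraries, like browsers, do not expose real HTTP/1.1 pipelining, so the sketch below only illustrates the underlying idea of "send the next request without waiting for the previous response". It uses Java's sendAsync, which achieves the concurrency through multiple connections (or HTTP/2 multiplexing when available) rather than true pipelining; the image URLs are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class NoWaitDemo {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // Fire several requests without waiting for the previous response --
        // the same idea pipelining aims at. Note: java.net.http.HttpClient does
        // not implement HTTP/1.1 pipelining; it gets this concurrency from
        // multiple connections (or HTTP/2 multiplexing when available).
        List<String> images = List.of("/img/1.png", "/img/2.png", "/img/3.png"); // placeholders
        List<CompletableFuture<String>> futures = images.stream()
                .map(path -> HttpRequest.newBuilder(URI.create("https://example.com" + path)).GET().build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                        .thenApply(resp -> req.uri().getPath() + " -> " + resp.statusCode()))
                .collect(Collectors.toList());

        // Collect the results once all responses have arrived.
        futures.forEach(f -> System.out.println(f.join()));
    }
}
```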

5 Development of the HTTP protocol

5.1 HTTP1.0 / HTTP1.1

  1. HTTP1.0 defaults to short connections, while HTTP1.1 defaults to long (persistent) connections.
    1. In HTTP1.0 the default is a short connection: every interaction with the server opens a new connection, each connection carries only one request and one response, and then the connection is closed. Since HTTP runs on TCP, every connection costs a three-way handshake to open and a four-way teardown to close, which consumes a lot of resources!
    2. HTTP1.1 defaults to long (persistent) connections to solve this problem: one connection is established and multiple requests are completed over it. (But if a request or response blocks the connection, a new TCP connection is opened.) In HTTP1.0, the Connection: keep-alive request header has to be set to obtain a persistent connection.
  2. HTTP1.1 added the Host header field. HTTP1.0 assumed that each server was bound to a unique IP address, so the URL in the request message did not carry the hostname. With the development of virtual hosting, however, one physical server can host multiple virtual hosts (multi-homed Web servers) that share the same IP address, so the Host field is needed to tell them apart.
  3. HTTP1.1 introduced chunked transfer coding, range requests, and resumable downloads (essentially using header fields to transfer the entity body in pieces); see the sketch after this list.
  4. HTTP1.1 introduced pipelining in theory: a client can issue multiple HTTP requests without waiting for each previous response in turn. In practice pipelining has remained largely theoretical, and most desktop browsers keep HTTP pipelining turned off by default, so applications on HTTP1.1 often open multiple TCP connections instead!
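
For item 3 above, here is a hedged Java sketch of a range request (the URL is a placeholder): the client asks for only part of the entity body via the Range header; a server that supports range requests replies with 206 Partial Content, which is the building block for resumable downloads.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangeRequestDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Ask the server for only the first 1024 bytes of the resource.
        // A server that supports range requests answers with
        // "206 Partial Content"; one that does not simply returns 200 and
        // the whole body. This mechanism underlies resumable downloads.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/big-file.zip")) // placeholder URL
                .header("Range", "bytes=0-1023")
                .GET()
                .build();

        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        System.out.println(response.statusCode() + ", bytes received: " + response.body().length);
    }
}
```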

5.2 HTTP2.0

HTTP pipelining sends multiple HTTP requests over one TCP connection, one after another, without waiting for the server's responses. However, the client must still receive the responses in the same order it sent the requests! So whether it is HTTP1.0 or HTTP1.1 with pipelining, head-of-line blocking (HOLB) can still occur; pipelining has largely stayed at the theoretical stage and is turned off by default in almost all browsers.

The most important difference between HTTP2.0 and HTTP1.1 is that it eliminates head-of-line blocking at the HTTP level! The key change is multiplexing: multiple request-response messages can be in flight at the same time over a single HTTP2.0 connection, so head-of-line blocking is no longer a problem. (The old optimization of merging multiple requests into one, which was done purely to reduce the number of HTTP requests, is no longer needed.)
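
A minimal Java sketch of what multiplexing buys you, assuming Java 11+ and a placeholder server that speaks HTTP/2: several requests are issued at once, and when the negotiated version is HTTP/2 they can share a single connection as independent streams instead of queuing behind each other.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class Http2MultiplexDemo {
    public static void main(String[] args) {
        // Prefer HTTP/2; if the server only speaks HTTP/1.1 the client falls back.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        // Over HTTP/2 these requests can be multiplexed as separate streams on
        // one TCP connection, so no request has to queue behind another.
        CompletableFuture<?>[] futures = new CompletableFuture<?>[3];
        for (int i = 0; i < 3; i++) {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("https://example.com/resource/" + i)) // placeholder URL
                    .GET()
                    .build();
            futures[i] = client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenAccept(resp -> System.out.println(resp.uri() + " " + resp.version() + " " + resp.statusCode()));
        }
        CompletableFuture.allOf(futures).join();
    }
}
```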

6 HTTP-based WebSocket

WebSocket is a standard for full-duplex communication between Web browsers and Web servers. The WebSocket protocol is standardized by the IETF, and the WebSocket API by the W3C. WebSocket is designed to address the drawbacks of Ajax and Comet, which are built on XMLHttpRequest. In short, WebSocket is a network communication protocol.

The HTTP protocol has a limitation: communication can only be initiated by the client, and the server cannot actively push information to the client. Once a WebSocket connection has been established between the Web server and the client, all subsequent communication uses this dedicated protocol, and data in any format, such as JSON, XML, HTML or images, can be sent in either direction.

Because WebSocket is a protocol built on top of HTTP, the initiator of the connection is still the client; but once the WebSocket connection is established, either the server or the client can send packets directly to the other side. WebSocket is commonly used to build online web chat systems and all kinds of customer service systems, so it is an important protocol; if you later have high-concurrency requirements, you can use the Netty framework! This is only a brief introduction to WebSocket; a dedicated article will cover how to build an online chat system with it!

6.1 WebSocket Protocol Features

  1. Push function: the server can push data to the client, so it can send data directly without waiting for the client to request it.
  2. Reduced traffic: once a WebSocket connection is established, it is expected to stay open. Compared with HTTP, not only is the total overhead per connection lower, but the amount of traffic is also smaller because WebSocket headers are small.

To set up WebSocket communication, a "handshaking" step must be completed after the HTTP connection is established:

  1. Handshake request: to switch to WebSocket communication, the HTTP Upgrade header field is used to tell the server that the communication protocol is changing, which accomplishes the handshake.

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

The Sec-WebSocket-Key field carries the key value that is essential to the handshake.

The Sec-WebSocket-Protocol field records the subprotocol in use.

Subprotocols are defined within the scope of the WebSocket protocol standard; they divide connections into separate, named uses.

  2. Handshake response: the server returns a 101 Switching Protocols response to the previous request.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

The value of the Sec-WebSocket-Accept field is generated from the value of the Sec-WebSocket-Key field in the handshake request.

After the handshake succeeds and the WebSocket connection is established, communication no longer uses HTTP but WebSocket's own data frames.
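
To make the flow concrete, here is a minimal Java 11+ WebSocket client sketch using java.net.http.WebSocket (the wss:// URL and the messages are placeholders): the builder performs the Upgrade handshake described above, after which the client can send text frames and receive frames pushed by the server at any time.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class WebSocketClientDemo {
    public static void main(String[] args) throws Exception {
        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public void onOpen(WebSocket webSocket) {
                System.out.println("Handshake finished, WebSocket connection open");
                WebSocket.Listener.super.onOpen(webSocket);
            }

            @Override
            public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
                // Messages pushed by the server arrive here without the client asking.
                System.out.println("Server says: " + data);
                return WebSocket.Listener.super.onText(webSocket, data, last);
            }
        };

        // The builder performs the HTTP Upgrade handshake shown above, then
        // switches to WebSocket data frames. The URL is a placeholder.
        WebSocket webSocket = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create("wss://example.com/chat"), listener)
                .join();

        webSocket.sendText("hello", true);   // client -> server
        Thread.sleep(3000);                  // wait briefly for pushed messages
        webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "bye").join();
    }
}
```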

Reference:

Illustrated HTTP

If you want to discuss anything, or the article contains a mistake, please leave a comment. I also hope you will like, bookmark and follow; I will keep publishing all kinds of Java learning posts!