Illustrated HTTP is an easy-to-understand, 11-chapter book on the HTTP protocol. This series of articles records my notes on the book chapter by chapter and is divided into two parts. The first part covers HTTP itself, including its history, methods, protocol format, message structure, header fields, and status codes; the second part covers Web security.
Chapter 1 Understanding the Web and network basics
HTTP stands for HyperText Transfer Protocol. The "transfer protocol" part is easy to understand, but what is hypertext? Hypertext is formed when multiple documents are linked to one another. HTTP is the protocol for transferring Web documents, and its main purpose is to solve the problem of transferring that text.
# Network basics: TCP/IP
HTTP is a subset of the TCP/IP protocol family, and the networks in common use, including the Internet, operate on top of TCP/IP. An important aspect of the TCP/IP protocol family is layering, as in the classic five-layer model: application layer, transport layer, network layer, data link layer, and physical layer. The layers are easiest to understand by following a single request. First, the client, acting as the sender, issues an HTTP request at the application layer (HTTP) to view a Web page. Then, for ease of transmission, the transport layer (TCP) splits the data received from the application layer (the HTTP request message) into segments, marks each one with a sequence number and port numbers, and passes them to the network layer. The network layer (IP) adds the destination IP address and passes the packet to the link layer, which attaches the MAC address of the next hop. The server on the receiving end receives the data at the link layer and passes it up layer by layer until it reaches the application layer. On the way down, the sender adds a layer's header each time the data passes through that layer; on the way up, the receiver removes the corresponding header at each layer.
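The header wrapping and unwrapping described above can be sketched in a few lines. This is only a toy model: the bracketed header strings such as `[TCP]` and the function names are hypothetical placeholders, not real protocol headers.

```python
# Toy model of layered encapsulation. The header markers are made-up
# placeholders; real TCP, IP and Ethernet headers are binary structures.
SENDER_HEADERS = (b"[TCP]", b"[IP]", b"[ETH]")  # order in which the sender adds them


def encapsulate(http_message: bytes) -> bytes:
    """Sender side: each lower layer prepends its own header."""
    frame = http_message
    for header in SENDER_HEADERS:
        frame = header + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer strips its header, outermost first."""
    for header in reversed(SENDER_HEADERS):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


request = b"GET /index.html HTTP/1.1"
on_the_wire = encapsulate(request)
print(on_the_wire)
print(decapsulate(on_the_wire))
```

The outermost header belongs to the lowest layer, which is why the receiver removes headers in the reverse of the order in which the sender added them.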
# Three-way handshake
TCP sits at the transport layer and provides a reliable byte stream service. A byte stream service splits large blocks of data into segments so that they are easier to transmit and manage. To deliver data accurately to its destination, TCP uses a three-way handshake, which relies on the TCP flags SYN (synchronize) and ACK (acknowledgement). The sender first sends a segment with the SYN flag set. On receiving it, the receiver replies with a segment that has both SYN and ACK set to acknowledge it. Finally, the sender replies with a segment with the ACK flag set, which completes the handshake.
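The three steps can be modeled as data to make the exchange concrete. The sketch below is a simulation, not real networking: the `Segment` class, the function name, and the initial sequence numbers are assumptions for illustration. The sequence and acknowledgement numbers are a detail the notes above do not cover; in TCP a SYN consumes one sequence number, so each side acknowledges the peer's initial number plus one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Segment:
    """Minimal stand-in for a TCP segment (hypothetical, for illustration)."""
    flags: FrozenSet[str]
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged when a connection is opened."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # 2. Server -> client: SYN/ACK, acknowledging the client's SYN.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> server: ACK, acknowledging the server's SYN.
    ack = Segment(frozenset({"ACK"}), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(sorted(segment.flags), segment.seq, segment.ack)
```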
# URI and URL
A Uniform Resource Identifier (URI) is a string that identifies an Internet resource. A URI begins with a scheme that names how the resource is accessed, such as http, ftp, mailto, telnet, or file. A Uniform Resource Locator (URL) indicates the location of a resource on the Internet and is a subset of URI.
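As a small illustration, Python's standard `urllib.parse.urlsplit` breaks a URI into its scheme and the remaining components. The example URIs and the `describe_uri` helper are made up for this sketch.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into the parts discussed above (helper name is illustrative)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # e.g. http, ftp, mailto
        "host": parts.hostname,      # None when the URI has no authority part
        "port": parts.port,          # None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for example in (
    "http://www.example.com:8080/index.html?lang=en#top",
    "ftp://ftp.example.com/pub/file.txt",
    "mailto:someone@example.com",
):
    print(describe_uri(example))
```

Note that `mailto:someone@example.com` is a valid URI with a scheme but no host, which shows that not every URI is a network location in the way an `http` URL is.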
Chapter 2 The simple HTTP protocol
In HTTP, the client always initiates communication by sending a request, and the server responds to it. A request message consists of the request method, request URI, protocol version, optional request header fields, and an entity body. A response message consists of the protocol version, the status code, a reason phrase that explains the status code, optional response header fields, and the entity body. HTTP is a stateless protocol: it does not record or manage the state of previous requests or responses. Cookies were introduced to work around this statelessness.
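To make the message layout concrete, here is a minimal sketch that assembles a request message and reads the status line of a response. The function names and the example host are assumptions for illustration; real clients handle many more details such as header validation and body framing.

```python
def build_request(method: str, uri: str, headers: dict,
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Assemble: request line, header fields, blank line, entity body."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the header section
    return head.encode("ascii") + body


def parse_status_line(response: bytes) -> tuple:
    """Return (version, status code, reason phrase) from a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_request("GET", "/index.html", {"Host": "www.example.com"}))
print(parse_status_line(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
```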
# Persistent connections
In the original version of HTTP, the TCP connection was closed after every HTTP exchange, so the next exchange needed another three-way handshake to set up a new connection, which is expensive. HTTP/1.1 therefore introduced persistent connections, which keep the TCP connection open as long as neither the client nor the server explicitly closes it. Persistent connections also make pipelining possible: several requests can be sent back to back without waiting for the response to the previous one.
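A rough way to see the saving is to count network round trips. The model below is a deliberate simplification and rests on stated assumptions: the handshake costs one round trip before the first request can be sent, each request and response costs one round trip, and connection teardown, TLS, and pipelining are ignored. The function name is made up for this sketch.

```python
def round_trips(num_requests: int, persistent: bool) -> int:
    """Estimate round trips under the simplified model described above."""
    if num_requests == 0:
        return 0
    handshake = 1  # SYN and SYN/ACK; the final ACK can travel with the request
    if persistent:
        # One handshake, then every request reuses the same connection.
        return handshake + num_requests
    # A fresh connection, and therefore a fresh handshake, per request.
    return num_requests * (handshake + 1)


for n in (1, 10, 50):
    print(n, round_trips(n, persistent=False), round_trips(n, persistent=True))
```

Under these assumptions a page that needs 10 resources takes 20 round trips without persistent connections and 11 with them.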
Chapter 3 HTTP information in HTTP messages
# HTTP message
The information exchanged over HTTP is called an HTTP message. An HTTP message is text made up of multiple lines and is divided roughly into a message header and a message body. The message body carries the entity body of the request or response. Normally the message body is identical to the entity body; the two differ only when an encoding applied during transmission changes the content of the entity body.
# Encoding
HTTP can transmit data as is, or encode it to improve transmission efficiency. Content encoding specifies the encoding applied to the entity content and compresses the entity while keeping its information intact. The client receives the encoded entity and decodes it. Common content encodings are gzip, deflate, compress, and identity (no encoding).
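The round trip of a gzip content encoding can be sketched with Python's standard `gzip` module. The helper names and the sample body are assumptions for illustration; a real server would also check the client's Accept-Encoding header before compressing.

```python
import gzip


def encode_body(body: bytes):
    """Server side: compress the entity body and describe it in the headers."""
    payload = gzip.compress(body)
    headers = {"Content-Encoding": "gzip", "Content-Length": str(len(payload))}
    return headers, payload


def decode_body(headers: dict, payload: bytes) -> bytes:
    """Client side: undo the content encoding named in the headers."""
    if headers.get("Content-Encoding", "identity") == "gzip":
        return gzip.decompress(payload)
    return payload  # identity: the body was sent as is


original = b"<p>Hello, HTTP!</p>" * 100
headers, payload = encode_body(original)
print(len(original), "bytes before,", len(payload), "bytes after gzip")
print(decode_body(headers, payload) == original)
```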
Chapter 4 HTTP status codes that return the result
A status code describes the result of the request when the client sends a request to the server. Some common status codes are worth knowing:
200 OK: The request from the client was processed normally by the server.
301 Moved Permanently: Permanent redirection. The requested resource has been assigned a new URI, and that URI should be used from now on.
302 Found: Temporary redirection. The requested resource has been temporarily assigned a new URI, and the client is expected to use the new URI for this request only.
304 Not Modified: The resource on the server has not changed, so the client can use its cached copy directly.
404 Not Found: The requested resource cannot be found on the server.
500 Internal Server Error: An error occurred while the server was executing the request.
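Python's standard library ships these codes as the `http.HTTPStatus` enum, which is a convenient way to look up reason phrases. The grouping by first digit in `STATUS_CLASSES` is general HTTP knowledge rather than something stated in the notes above, and the helper name is made up for this sketch.

```python
from http import HTTPStatus

# Class of a status code by its first digit (added here for context).
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return 'code phrase (class)' for a known HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (200, 301, 302, 304, 404, 500):
    print(describe_status(code))
```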
Chapter 5 Web servers that collaborate with HTTP
An HTTP server can build multiple Web sites by using virtual hosts.
# Proxy
A proxy is a forwarding application that receives requests from the client and forwards them to the server, and receives responses from the server and forwards them to the client. The reasons for using a proxy server include: 1. Reducing network bandwidth usage through caching; 2. Controlling access to specific websites from within an organization; 3. Collecting access logs. A caching proxy stores a copy of the resource on the proxy server when it forwards a response. When the proxy receives a request for the same resource again, it can return the cached copy directly instead of fetching the resource from the origin server.
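The caching behavior described above can be sketched as a small class. This is a minimal illustration under strong assumptions: the class name, the `fetch_from_origin` callback, and the fake origin are all hypothetical, and the sketch ignores cache expiry, Cache-Control directives, and revalidation, which a real caching proxy must honor.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy caching proxy: remembers every response it has forwarded."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many times the origin server was contacted

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # Cache miss: forward to the origin and keep a copy of the response.
            self.origin_hits += 1
            self._cache[url] = self._fetch(url)
        # Cache hit (or freshly stored copy): answer from the proxy itself.
        return self._cache[url]


def fake_origin(url: str) -> bytes:
    """Stand-in for a real origin server."""
    return b"content of " + url.encode("ascii")


proxy = CachingProxy(fake_origin)
proxy.get("/index.html")
proxy.get("/index.html")
print(proxy.origin_hits)
```

Only the first request for `/index.html` reaches the origin; the second is served from the stored copy, which is where the bandwidth saving comes from.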
# Gateway
A gateway lets a server on the communication path offer non-HTTP services by converting between HTTP and another protocol. Using a gateway can also improve the security of the communication.
# Tunnel
The purpose of a tunnel is to ensure secure communication between the client and the server, for example by using encryption such as SSL.
Chapter 6 HTTP headers
HTTP request and response messages must both contain HTTP headers. A request message consists of the method, URI, HTTP version, and HTTP header fields. A response message consists of the HTTP version, status code, and HTTP header fields.
# HTTP header fields
Of all the parts of a message, the HTTP header fields carry the most information. A header field consists of a field name and a field value separated by a colon (:), and multiple values are separated by commas (,):
Header field name: field value 1, field value 2, field value 3
HTTP header fields are classified into the following four types according to how they are used.
1. General header fields: used in both request and response messages.
Cache-Control: no-cache // Force revalidation with the origin server
Cache-Control: no-store // Do not cache any part of the request or response
Cache-Control: max-age=[seconds] // Maximum age of the response
Connection: close // Close the connection
Connection: keep-alive // Persistent connection
Date: Tue, 03 Jul 2018 // Date and time when the HTTP message was created
2. Request header fields
Accept: // Media types the user agent can handle
Accept-Charset: unicode-1-1 // Character sets supported by the user agent and their priority
Accept-Encoding: // Content encodings supported by the user agent and their priority
Accept-Language: zh-cn, en // Natural languages the user agent can handle
Host: www.baidu.com // Internet host name and port number of the requested resource; the only required request header field
If-Match: "123abc" // The server compares this value with the ETag of the resource and executes the request only if the two match
If-Modified-Since: // If the resource has been updated after the given date, the server processes the request; otherwise it returns 304 Not Modified
User-Agent: // Passes the name of the browser or user agent that created the request to the server
3. Response header fields
Accept-Ranges: bytes / none // Tells the client whether the server can handle range requests
Age: 600 // Tells the client how long ago the origin server created the response
ETag: "ABC-123" // The server assigns an ETag value to each resource and updates it when the resource is updated. ETags are either strong or weak. A strong ETag changes no matter how slight the change to the entity. A weak ETag only indicates whether the resource is the same; its value changes only when the resource changes fundamentally
Location: // Directs the recipient of the response to a resource at a different location than the requested URI. Almost all browsers that receive a response containing Location will try to access the redirected resource
4. Entity header fields
Allow: GET, HEAD // Tells the client all the HTTP methods the resource supports
Content-Encoding: gzip // Tells the client which content encoding (compression) the server applied to the entity body
Content-Language: zh-CN // Tells the client the natural language used by the entity body
Content-Length: 15000 // Size of the entity body in bytes
Expires: Date() // Tells the client the expiry date of the resource; lower priority than max-age
Last-Modified: Date() // Time when the resource was last modified
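The "name: value 1, value 2" format described above can be parsed with a few lines. This is a sketch with hypothetical helper names. It splits on the first colon only, since values such as a host with a port contain colons themselves, and it applies comma splitting only where the caller knows the field holds a list, because some values (dates, for example) contain commas of their own.

```python
def parse_header_line(line: str) -> tuple:
    """Split 'Name: value' at the first colon; field names are case-insensitive."""
    name, _, value = line.partition(":")
    return name.strip().lower(), value.strip()


def split_list_value(value: str) -> list:
    """Split a comma-separated field value such as 'zh-cn, en'."""
    return [item.strip() for item in value.split(",") if item.strip()]


raw_headers = [
    "Host: www.baidu.com:80",
    "Accept-Language: zh-cn, en",
    "Connection: keep-alive",
]
headers = dict(parse_header_line(line) for line in raw_headers)
print(headers)
print(split_list_value(headers["accept-language"]))
```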
# Header fields for the Cookie service
Set-Cookie attributes (in a real response they are combined in a single Set-Cookie header, separated by semicolons):
Set-Cookie: name=Jack // Name and value of the cookie
Set-Cookie: expires=DATE // Expiry date of the cookie
Set-Cookie: domain=domain name // Domain the cookie applies to; a cookie set for a parent domain is shared with its subdomains
Set-Cookie: Secure // The cookie is sent only over secure HTTPS connections
Set-Cookie: HttpOnly // Prevents JavaScript from accessing the cookie
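Python's standard `http.cookies` module can assemble such a header value, which shows how the attributes end up together in one field. The helper name, the example domain, and the expiry date are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str, domain: str, expires: str) -> str:
    """Return the value of a Set-Cookie header with the attributes above."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["expires"] = expires
    cookie[name]["secure"] = True    # send only over HTTPS
    cookie[name]["httponly"] = True  # hide from JavaScript
    return cookie[name].OutputString()


header_value = build_set_cookie(
    "name", "Jack", "example.com", "Wed, 01 Jan 2031 00:00:00 GMT"
)
print("Set-Cookie:", header_value)
```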