1. Understand Web and network basics

1.1 TCP/IP network basics

Before understanding HTTP, you need to understand the TCP/IP protocol family. Commonly used networks, including the Internet, operate on the basis of the TCP/IP protocol family, and HTTP is one of the protocols in it. An important aspect of the TCP/IP protocol family is layering. There are four layers: the application layer, transport layer, network layer, and link layer (also called the data link layer).

  1. Application layer: Determines the communication activities involved in providing application services to users.
  2. Transport layer: Provides the application layer above it with data transfer between two computers connected over the network.
  3. Network layer: Handles the packets flowing over the network. A packet is the smallest unit of data transmitted over a network. This layer determines the path by which packets reach the other computer and sends the packets to it.
  4. Link layer: Handles the hardware part of connecting to the network.

1.2 TCP/IP communication transmission flow

1.3 Protocols closely related to HTTP: IP, TCP, and DNS

  1. IP, responsible for transmission: The IP address is the address assigned to a node, and the MAC address is the fixed address of its network interface card (NIC). An IP address can be paired with a MAC address. The IP address can change, but the MAC address basically does not. No single computer or router has a complete picture of the transmission state of the Internet; data is relayed from hub to hub, much like a logistics network.
  2. TCP, which ensures reliability: TCP makes sure that data is delivered to the correct target. To establish a connection it uses the three-way handshake (SYN, SYN/ACK, ACK).
  3. DNS, responsible for resolving domain names: The DNS protocol provides the service of looking up IP addresses from domain names, and of looking up domain names from IP addresses in reverse.

1.4 Relationship between various protocols and HTTP

The following figure shows the roles of IP, TCP, and DNS in communication over HTTP.

1.5 URI and URL

  1. URI: Uniform Resource Identifier. A URI identifies a resource; common schemes include ftp, http, ldap, tel, and news. An absolute URI is made up of the following parts.

    Protocol scheme: http: or https: specifies the protocol used to access the resource. The scheme is case insensitive and is followed by a colon (:). Schemes such as data: and javascript: can also be used to specify data or a script.

    Login information: Specify the user name and password as the login information (identity authentication) required to obtain resources from the server. This item is optional.

    Server address: An absolute URI must specify the address of the server to be accessed. The address can be a DNS-resolvable name like hackr.jp, an IPv4 address like 192.168.1.1, or an IPv6 address enclosed in square brackets like [0:0:0:0:0:0:0:1].

    Server port number: Specifies the network port number on the server to connect to. This item is optional. If omitted, the default port number is used automatically.

    Hierarchical file path: Specifies the file path on the server to locate the specified resource. This is similar to the file directory structure on UNIX systems.

    Query string: You can use the query string to pass in any parameter for a resource within a specified file path. This item is optional.

    Fragment identifier: A fragment identifier usually marks a sub-resource (a location inside the document) within the resource that has been retrieved. This item is optional.

  2. URL: Uniform Resource Locator. A URL represents the location of a resource (its location on the Internet). URLs are a subset of URIs.
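The parts of an absolute URI listed above can be picked apart with Python's standard urllib.parse module. This is a minimal sketch; the URI itself is a made-up example, not taken from any real site.

```python
from urllib.parse import urlsplit

# Hypothetical URI used only for illustration.
uri = "http://user:pass@www.example.jp:80/dir/index.htm?uid=1#ch1"

parts = urlsplit(uri)

components = {
    "scheme": parts.scheme,      # protocol scheme
    "username": parts.username,  # login information (optional)
    "password": parts.password,
    "host": parts.hostname,      # server address
    "port": parts.port,          # server port number (optional)
    "path": parts.path,          # hierarchical file path
    "query": parts.query,        # query string (optional)
    "fragment": parts.fragment,  # fragment identifier (optional)
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

For an IPv6 server address, urlsplit expects the square brackets in the URI and returns the host without them.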

2. The simple HTTP protocol

2.1 HTTP Client and Server


Request message composition: a request message consists of the request method, the request URI, the protocol version, optional request header fields, and the content entity.
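To make the composition concrete, here is a small sketch that assembles a request message by hand. The function name build_request, the host, the path, and the form data are all assumptions chosen for illustration.

```python
def build_request(method, uri, headers=None, body=b"", version="HTTP/1.1"):
    """Assemble an HTTP/1.x request message as bytes (illustrative helper)."""
    headers = dict(headers or {})
    if body:
        # The receiver needs to know where the content entity ends.
        headers.setdefault("Content-Length", str(len(body)))
    lines = [f"{method} {uri} {version}"]                      # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"                     # blank line ends the header
    return head.encode("ascii") + body


# Hypothetical request: host, path, and form values are made up.
request = build_request(
    "POST",
    "/form/entry",
    {"Host": "www.example.jp", "Content-Type": "application/x-www-form-urlencoded"},
    b"name=ueno&age=37",
)
print(request.decode("ascii"))
```

The first line carries the method, request URI, and protocol version; the header fields follow, and the content entity comes after the blank line.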

2.2 HTTP is a protocol that does not save state

HTTP is a stateless protocol: it does not save state. The protocol itself does not persist the requests or responses that have been sent (sent and forgotten). This lets it process a large number of transactions quickly and keeps the protocol scalable. To maintain state where it is needed, cookie technology was introduced.

2.3 The request URI locates the resource

The HTTP protocol uses URIs to locate resources on the Internet. (When a request is addressed to the server itself rather than to a specific resource, an * can be used in place of the request URI.)

2.4 HTTP methods that tell the server the intent

  1. GET: obtains a resource. Used to request access to the resource identified by the URI. The server parses the specified resource and returns the response content; if the requested resource is text, it is returned as is. (The status code returned in the original sample was 304.)
  2. POST: transmits the entity body.
  3. PUT: transfers a file. The PUT method in HTTP/1.1 has no authentication mechanism of its own, so anyone can upload files, which is a security issue.
  4. HEAD: obtains the message header. The same as the GET method, except that the body of the message is not returned. It is used to confirm the validity of the URI and the date and time the resource was last updated.
  5. DELETE: deletes a file. As with the PUT method, there is no authentication mechanism.
  6. OPTIONS: asks for the supported methods. Used to query the methods supported for the resource specified by the request URI. The server returns the methods it supports (GET, POST, and so on).
  7. TRACE: traces the path. This method makes the Web server loop the request it received back to the client. When sending the request, the client fills in a value in the Max-Forwards header field. Each server the request passes through reduces the value by one, and when the value reaches zero the request is no longer forwarded and the server that received it responds. (TRACE can lead to cross-site tracing (XST) attacks.)
  8. CONNECT: requires a tunnel to be established to the proxy. The client asks the proxy server to establish a tunnel so that TCP communication can be carried over it. It is mainly used with the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols, which encrypt the communication content before it is sent through the tunnel.
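The Max-Forwards countdown described for TRACE can be sketched as a small simulation. No network is involved; the function name forward_trace and the server names are hypothetical.

```python
def forward_trace(max_forwards, hops):
    """Return the name of the server that answers a TRACE request.

    hops is the ordered list of servers on the path; the last entry is the
    origin server. Each intermediary checks Max-Forwards before forwarding.
    """
    for hop in hops[:-1]:
        if max_forwards == 0:
            return hop        # value is zero: respond here, do not forward
        max_forwards -= 1     # reduce by one, then pass the request along
    return hops[-1]           # the request reached the origin server


# Hypothetical path: two proxies in front of the origin server.
path = ["proxy1", "proxy2", "origin"]
for value in (0, 1, 2):
    print(value, "->", forward_trace(value, path))
```

With two proxies on the path, a value of 0 is answered by the first proxy, 1 by the second, and 2 or more by the origin server.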

2.5 Persistent Connection saves traffic

To avoid setting up and tearing down a TCP connection for every request, HTTP/1.1 and some HTTP/1.0 implementations introduced persistent connections (also known as HTTP keep-alive or HTTP connection reuse). The characteristic of a persistent connection is that the TCP connection stays open as long as neither end explicitly disconnects.

  1. Pipelining: Persistent connections make it possible to pipeline most requests. Previously, after sending a request, the client had to wait for and receive the response before sending the next request. With pipelining, the next request can be sent directly without waiting for the response.
  2. Cookie state management: The server sends a header field called Set-Cookie in its response message, telling the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request message. When the server finds the cookie sent by the client, it checks which client sent the request and compares it with the records on the server to obtain the previous state information.
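The cookie round trip described above can be sketched with the standard http.cookies module. The session id, the cookie attributes, and the server-side sessions table are all made-up values for illustration.

```python
from http.cookies import SimpleCookie

# Value of the Set-Cookie header field in the server's response (hypothetical).
set_cookie = "sid=1342077140226724; Path=/; HttpOnly"

# Client side: save the cookie announced by Set-Cookie.
jar = SimpleCookie()
jar.load(set_cookie)

# Client side: add the saved cookie value to the next request as a Cookie header.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

# Server side: compare the received cookie with the records kept on the server.
sessions = {"1342077140226724": {"user": "ueno", "cart": ["book"]}}
received = SimpleCookie()
received.load(cookie_header)
state = sessions.get(received["sid"].value)

print("Cookie:", cookie_header)
print("state:", state)
```

HTTP itself stays stateless here: the state lives in the server's own records, and the cookie is only the key that lets the server find it again.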

3. HTTP information in HTTP messages

3.1 HTTP message

HTTP messages sent by the requesting end (client) are called request messages, and those sent by the responding end (server) are called response messages. An HTTP message is itself a multi-line text string, with CR+LF as the line break. Its header part contains the following.

  1. Request line (request messages): contains the method used for the request, the request URI, and the HTTP version.
  2. Status line (response messages): contains the HTTP version, and the status code and reason phrase indicating the result of the response.
  3. Header fields: the various headers describing the conditions and attributes of the request and response. Generally there are four types of header: general headers, request headers, response headers, and entity headers.
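The structure described above can be seen by splitting a message at its CR+LF boundaries. This is a simplified sketch (it ignores repeated header fields and other edge cases); parse_message and the sample response are illustrative only.

```python
def parse_message(raw: bytes):
    """Split an HTTP/1.x message into start line, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line separates header and body
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0].split(" ", 2)          # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical response message.
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)

start_line, headers, body = parse_message(response)
print(start_line)   # version, status code, reason phrase
print(headers)
print(body)
```

The same function splits a request message; the start line then holds the method, request URI, and HTTP version instead.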

3.2 Encoding improves transmission rate

HTTP can transmit data as it is, but it can also raise the transmission rate by encoding the data during transfer. Encoding at transfer time makes it possible to handle a large number of access requests efficiently. However, the encoding has to be done by the computer, so it consumes more resources such as CPU.

  1. Message: the basic unit of HTTP communication. It consists of a sequence of octets (8-bit bytes) and is transmitted through HTTP communication.
  2. Entity: the payload data (supplementary information) transmitted as a request or response. It consists of entity header fields and an entity body.

Content codings for compressed transfer: the commonly used content codings are the following.

  1. gzip (GNU zip)
  2. compress (the standard compression of UNIX systems)
  3. deflate (zlib)
  4. identity (no encoding)

Chunked transfer coding: divides the entity body into parts (chunks). Each chunk is marked with its size in hexadecimal, and the end of the entity body is marked with "0(CR+LF)".
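Both ideas can be tried out in a few lines. The sketch below compresses a body with gzip (a content coding) and frames a body in chunks (chunked transfer coding). The helper names and the tiny chunk size are assumptions, and chunk extensions and trailers are not handled.

```python
import gzip


def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Frame body as chunks: hexadecimal size, CR+LF, data, CR+LF."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += f"{len(chunk):X}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    out += b"0\r\n\r\n"   # "0" CR+LF marks the end, then a final blank line
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the entity body from chunked framing."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].decode("ascii"), 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2   # skip the chunk data and its trailing CR+LF


text = b"Hello, chunked world!"
print(encode_chunked(text))

# Content coding: the body is compressed and restored unchanged.
page = b"<p>hello</p>" * 100
compressed = gzip.compress(page)
print(len(page), "->", len(compressed), "bytes with gzip")
```

Compression trades CPU time for fewer bytes on the wire, which is the cost mentioned above.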

3.3 Multipart object collections for sending multiple kinds of data

When we send an email, we can write text in the message and add multiple attachments. This is possible because of the MIME mechanism, which allows mail to handle many different types of data, such as text, images, and video. HTTP adopts the same multipart object collection, which includes the following types.

  1. multipart/form-data: used when uploading files from a Web form.
  2. multipart/byteranges: used when a response message with status code 206 (Partial Content) contains multiple ranges of content.

3.4 Range requests for obtaining part of the content

A range request uses the Range header field (Range: bytes=-3000, 5000-7000, 7000-) to specify the byte range of the resource. For a range request, the server returns a response message with status code 206 Partial Content. For a request with multiple ranges, the response carries multipart/byteranges in the Content-Type header field. If the server cannot handle the range request, it returns status code 200 OK and the complete entity content.
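As a sketch of how a server might interpret the Range header, the function below turns a header value into inclusive byte offsets for a resource of known length. It follows the HTTP specification's reading of the three forms: first-last, first- (to the end), and -N (the last N bytes). The names resolve_ranges and content_range and the 10,000-byte resource are assumptions.

```python
def resolve_ranges(header: str, length: int):
    """Turn a Range header value into inclusive (first, last) byte offsets."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is handled in this sketch")
    result = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":                       # "-N": the last N bytes
            start, end = max(length - int(last), 0), length - 1
        else:                                 # "first-last" or "first-"
            start = int(first)
            end = min(int(last), length - 1) if last else length - 1
        if start <= end:                      # drop ranges outside the resource
            result.append((start, end))
    return result


def content_range(first: int, last: int, length: int) -> str:
    """Value of the Content-Range header field for one returned range."""
    return f"bytes {first}-{last}/{length}"


# Hypothetical resource of 10,000 bytes.
for first, last in resolve_ranges("bytes=-3000, 5000-7000, 7000-", 10000):
    print(content_range(first, last, 10000))
```

A real server would also decide between 206, 200, and an error status; this sketch covers only the offset arithmetic.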

3.5 Content Negotiation Returns the most appropriate content

  1. Server-driven negotiation: the server performs the content negotiation. It uses the request header fields (such as Accept, Accept-Charset, Accept-Encoding, and Accept-Language) as a reference and processes them automatically on the server side.
  2. Client-driven negotiation: the client performs the content negotiation. The user manually selects from the list of options displayed in the browser.
  3. Transparent negotiation: a combination of server-driven and client-driven negotiation, in which the server and the client each perform part of the content negotiation.
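Server-driven negotiation can be sketched as picking the best match from an Accept-Language header. This simplified version compares language tags by exact match only and ignores wildcards; the function names and the header value are assumptions.

```python
def parse_accept(header: str):
    """Return (value, q) pairs from an Accept-* header, highest q first."""
    items = []
    for part in header.split(","):
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0                                # default quality when q is absent
        for param in params:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        if value:
            items.append((value, q))
    # sorted() is stable, so equal q values keep their order in the header.
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose_language(header: str, available, default):
    """Pick the client's most preferred language that the server can provide."""
    for value, q in parse_accept(header):
        if q > 0 and value in available:
            return value
    return default


# Hypothetical request header value.
accept_language = "ja,en-us;q=0.7,en;q=0.3"
print(parse_accept(accept_language))
print(choose_language(accept_language, {"en", "fr"}, default="en"))
```

The q value expresses the client's relative preference from 0 to 1, so the server can fall back to a lower-ranked language when the first choice is not available.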

4. HTTP status codes that return the result

An HTTP status code represents the result returned for the client's HTTP request: it indicates whether the server processed the request normally or reports the error that occurred. This chapter explains how status codes work.

200 OK

Indicates that the request from the client was processed normally by the server.

204 No Content

This status code indicates that the server received and successfully processed the request, but the response message it returns does not contain an entity body.

206 Partial Content

This status code indicates that the client made a range request and that the server successfully executed that partial GET request. The response message contains the entity content in the range specified by Content-Range.

301 Moved Permanently

Permanent redirect. This status code indicates that the requested resource has been assigned a new URI and that the new URI should be used from now on.

302 Found

Temporary redirect. This status code indicates that the requested resource has been assigned a new URI and that the user is expected to access it with the new URI for this request.

303 See Other

This status code indicates that the resource corresponding to the request has another URI and that the GET method should be used to retrieve it. The 303 status code has the same function as 302 Found, but differs in that it explicitly states that the client should use the GET method to obtain the resource.

304 Not Modified

This status code indicates that the client sent a conditional request and the server allows access to the resource, but the condition was not met. A 304 response contains no entity body. (A conditional request is a GET request message that contains any of the If-Match, If-Modified-Since, If-None-Match, If-Range, or If-Unmodified-Since header fields.)
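A conditional GET with If-None-Match can be sketched as follows. The respond function, the ETag values, and the body are hypothetical; the point is only that a matching ETag makes the If-None-Match condition fail, so the server answers 304 without a body.

```python
def respond(request_headers: dict, body: bytes, etag: str):
    """Return (status, headers, body) for a GET on a single resource."""
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is current: no entity body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


# Hypothetical resource and validator.
page = b"<p>Hello</p>"
current_etag = '"v2"'

print(respond({}, page, current_etag))                          # first visit
print(respond({"If-None-Match": '"v2"'}, page, current_etag))   # cached copy is current
print(respond({"If-None-Match": '"v1"'}, page, current_etag))   # cached copy is stale
```

On a 304 the client reuses the copy it already holds, which saves transferring the body again.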

307 Temporary Redirect

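Temporary redirect. This status code has the same meaning as 302 Found, but the client must not change the request method when following it (a POST is not changed to a GET).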

400 Bad Request

The request message contains a syntax error.

401 Unauthorized

The request requires HTTP authentication information. The browser displays an authentication dialog box.

403 Forbidden

Access to the requested resource was denied by the server.

404 Not Found

The requested resource cannot be found on the server, for example because the path is wrong.

500 Internal Server Error

The server encountered an internal error while executing the request.

503 Service Unavailable

The server is temporarily overloaded or is down for maintenance and cannot process requests at this time.
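The status codes above fall into classes given by their first digit. A minimal sketch using the standard http.HTTPStatus enum follows; the describe helper and the CLASSES table are illustrative names.

```python
from http import HTTPStatus

# The first digit of the status code gives the class of the response.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code, its reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 204, 206, 301, 302, 303, 304, 307, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```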

5. Web servers that collaborate with HTTP

A single Web server can host websites for multiple independent domain names, and can also act as a relay server on the communication path to improve transmission efficiency.

Proxy

A proxy is a forwarding application. It does not change the request URI and sends the request directly on to the target server that holds the resource. Multiple proxy servers can be cascaded, and each one must append the Via header field to record the hosts the message has passed through. Proxies are used in a variety of ways, classified along two axes: whether the proxy caches, and whether it modifies messages. A caching proxy stores a copy of the resource (the cache) on the proxy server in advance. When the proxy receives another request for the same resource, it does not need to fetch the resource from the source server and can return the previously cached copy as the response. A transparent proxy is one that forwards messages without processing them; a proxy that does process the message content is called a non-transparent proxy.
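The Via bookkeeping can be sketched as a function that each proxy applies before forwarding. The function name and the proxy host names are made up, and only the header handling is shown, not the forwarding itself.

```python
def forward_through_proxy(headers: dict, proxy_name: str, version: str = "1.1") -> dict:
    """Return a copy of the headers with this proxy appended to Via."""
    forwarded = dict(headers)                  # leave the incoming headers untouched
    entry = f"{version} {proxy_name}"          # protocol version and host name
    existing = forwarded.get("Via")
    forwarded["Via"] = f"{existing}, {entry}" if existing else entry
    return forwarded


# Hypothetical chain of two cascaded proxies.
request_headers = {"Host": "www.example.jp"}
after_first = forward_through_proxy(request_headers, "proxy1.example.com")
after_second = forward_through_proxy(after_first, "proxy2.example.com")
print(after_second["Via"])
```

Each proxy appends its own entry to the end, so the field lists the hosts in the order the message passed through them.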

Gateway

A gateway is a server that relays communication data for other servers. When it receives a request from a client, it handles the request as if it were the source server that owns the resource. Using a gateway can improve communication security, because the communication line between the client and the gateway can be encrypted to secure the connection.

Tunnel

A tunnel is an application that relays between a client and a server that are far apart and keeps the communication connection between them open. It can establish a communication line to another server on request and communicate over it using encryption such as SSL, so that the client can communicate securely with the server.

Caches that store resources

A cache is a copy of a resource stored on the local disk of a proxy server or client. A cache server is a type of proxy server and belongs to the caching proxy category. Using the cache avoids forwarding the same resource from the source server multiple times. A cache has an expiration date: when the cached copy expires, the cache server fetches the "new" resource from the source server again.
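The expiry behaviour can be sketched with a small in-memory cache. The class name, the max_age parameter, and the injected clock are assumptions made so the example runs without a network or real waiting; an actual cache server decides freshness from the response header fields.

```python
class ExpiringCache:
    """Keep copies of resources and refetch them once they expire."""

    def __init__(self, fetch, max_age, clock):
        self.fetch = fetch        # callable(uri) -> bytes, asks the source server
        self.max_age = max_age    # seconds a stored copy stays valid
        self.clock = clock        # callable() -> current time in seconds
        self.store = {}           # uri -> (stored_at, body)

    def get(self, uri):
        now = self.clock()
        entry = self.store.get(uri)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]               # valid copy: the source server is not contacted
        body = self.fetch(uri)            # missing or expired: fetch from the source again
        self.store[uri] = (now, body)
        return body


# Hypothetical source server that counts how often it is asked.
calls = []

def origin(uri):
    calls.append(uri)
    return f"{uri} version {len(calls)}".encode()

now = [0.0]                               # fake clock that the example advances by hand
cache = ExpiringCache(origin, max_age=60, clock=lambda: now[0])

first = cache.get("/index.htm")           # fetched from the source server
now[0] = 30.0
second = cache.get("/index.htm")          # served from the cache
now[0] = 61.0
third = cache.get("/index.htm")           # expired, fetched again
print(first, second, third, len(calls))
```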