One, what happens after you type a URL and press Enter

HTTP request process:

  • The browser obtains the server’s IP address and port number from the URL entered in the address bar.
  • The browser establishes a connection with the server using TCP’s three-way handshake.
  • The browser sends a request message to the server.
  • After receiving the request message, the server processes it and sends a response message back to the browser.
  • The browser parses the response message and renders the page.

Domain name resolution: There are multiple levels of cache in domain name resolution. The browser first checks whether the resolution result for the domain name is in its own cache. If it is not, the browser asks the operating system, which checks its own cache and the hosts file before a DNS server is queried.

CDN: Because a CDN can cache most of a website’s static resources, such as images and CSS style sheets, some HTTP requests never need to reach the website itself; the CDN responds directly and sends the data to you. A CDN generally cannot cache dynamic resources generated by back-end services such as PHP and Java.

Summary:

  1. HTTP is built on top of TCP/IP, so an IP address is required to establish a connection;
  2. If you do not know the IP address, you must use DNS to resolve the domain name to an IP address; otherwise the connection cannot be established;
  3. After a TCP connection is established, data is sent and received in order. Both the requester and the responder construct and parse messages according to the HTTP specification;
  4. To reduce response time, there is a cache at almost every step of the process, enabling “short-circuit” operations;
  5. Although real-world HTTP transfers are very complex, in theory they can still be reduced to the “two-point” model used in experiments.
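
The steps above can be sketched with Python’s standard socket module. This is only a minimal illustration under simplifying assumptions (plain HTTP without TLS, no caching, no redirects), not how a browser is actually implemented; the function name http_get is made up for this example.

```python
import socket


def http_get(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send one hand-written GET request and return the raw response bytes."""
    # Step 1: resolve the host name to an address (DNS, hosts file, caches).
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]

    # Step 2: open a TCP connection; the three-way handshake happens in connect().
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        sock.connect(sockaddr)

        # Step 3: send the request message (start line, headers, blank line).
        host_header = host if port == 80 else f"{host}:{port}"
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host_header}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))

        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling http_get("example.com") on a machine with network access would return the raw bytes of the response, starting with the status line; parsing and rendering are left to the caller.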

What does an HTTP message look like?

The structure of HTTP request and response messages is basically the same and consists of three parts:

  • Start line: describes basic information about the request or response (the request line or the status line).
  • Header fields: describe the message in more detail in key-value format.
  • Entity: the data actually transmitted. It need not be plain text; it can be binary data such as pictures and videos (the body).

The request line consists of three parts:

  • Request method: a verb, such as GET or POST, indicating the operation on a resource;
  • Request target: usually a URI that marks the resource on which the request method will operate;
  • Version: the HTTP version used by the message.

The status line also consists of three parts:

  • Version: the HTTP version used by the message;
  • Status code: a three-digit code representing the result of the processing, such as 200 for success or 500 for server error;
  • Reason: a short explanatory text that supplements the numeric status code and helps people understand the result.

Summary:

  • An HTTP message is “top-heavy”: it consists of “start line + header fields + blank line + entity”. Simply put, it is “header + body”.
  • An HTTP message may have no body, but it must have a header, and the header must be followed by a blank line.
  • The request header consists of “request line + header fields”, and the response header consists of “status line + header fields”.
  • The request line has three parts: request method, request target, and version number;
  • The status line also has three parts: version number, status code, and reason string;
  • Header fields are key-value pairs separated by a colon (:). Field names are case-insensitive and the fields may appear in any order. In addition to the standard headers, you can add custom fields to extend functionality.
  • The only header field required in HTTP/1.1 is Host, which must appear in the request header and identifies the virtual host name.
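
The structure summarized above can be made concrete with a small parser. The following is a minimal sketch, assuming a complete HTTP/1.x response held in memory; the function name parse_response is made up for this example, and details such as repeated header fields or chunked bodies are deliberately ignored.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line, header fields, and body."""
    # The blank line (CRLF CRLF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: version, status code, reason string.
    version, code, reason = lines[0].split(" ", 2)

    # Header fields: "Name: value"; names are case-insensitive, so lower-case them.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)      # HTTP/1.1 200 OK
print(headers["content-length"])  # 5
print(body)                       # b'hello'
```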

Understanding request methods

1. Common request methods

  • GET: retrieves a resource; it can be understood as reading or downloading data;
  • HEAD: retrieves the meta information of a resource;
  • POST: writes or uploads data to a resource;
  • PUT: similar to POST;
  • DELETE: deletes a resource;
  • CONNECT: establishes a special connection tunnel;
  • OPTIONS: lists the methods that can be applied to a resource;
  • TRACE: traces the transmission path of the request and response.

1.1 GET

GET requests a resource from the server. The resource can be static text, a page, an image, or a video, or it can be a page or data in any other format generated dynamically by PHP, Java, and so on.

1.2 HEAD

The HEAD method is similar to GET in that it also asks the server for a resource, and the server processes it in the same way. However, the server does not return the requested entity data, only the response header, which is the “meta information” of the resource.

1.3 POST/PUT

POST is also a frequently used request method, probably second only to GET. It has many applications: whenever data needs to be sent to the server, POST is used in most cases.

PUT works much like POST. It can also submit data to the server, but with a slightly different meaning: POST usually means “create” or “add”, while PUT means “modify” or “update”.

1.4 Other Methods

The DELETE method instructs the server to delete a resource. Because this is dangerous, the server usually does not actually delete the resource but only marks it as deleted. More often, of course, the server simply does not process DELETE requests at all.

CONNECT is a special method that asks the server to establish a special connection tunnel between the client and another remote server, with the web server acting as a proxy.

The OPTIONS method asks the server to list the operations available on a resource, which are returned in the Allow field of the response header. Its functionality is limited and it is not used much, and some servers (such as Nginx) simply do not support it.

The TRACE method is used to test or diagnose HTTP links and shows the request-response transmission path. It is well intentioned, but it has vulnerabilities that can leak information about a website, so web servers usually disable it.

1.5 Differences between GET and POST

  • From a caching perspective, GET requests are actively cached by the browser and leave a history record, whereas POST requests are not cached by default.
  • From an encoding perspective, GET parameters can only be URL-encoded and can only contain ASCII characters, while POST has no such limitation.
  • From a parameter perspective, GET parameters are generally placed in the URL and are therefore not secure, while POST parameters are placed in the request body, which is better suited to transmitting sensitive information.
  • From an idempotence perspective, GET is idempotent and POST is not. (Idempotent means that performing the same operation repeatedly gives the same result.)
  • From a TCP perspective, a GET request is typically sent all at once, while some clients send a POST in two TCP packets: the header first and then, once the server responds with 100 (Continue), the body. (Firefox is an exception: its POST requests send only one TCP packet.) This is client behavior rather than something HTTP itself requires.
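
To make the difference in parameter placement concrete, here is a minimal sketch that builds the same form data once as a GET request and once as a POST request. The helper names build_get and build_post, the host example.com, and the parameters are all hypothetical and serve only to illustrate the point.

```python
from urllib.parse import urlencode


def build_get(host: str, path: str, params: dict) -> bytes:
    """GET: the parameters travel in the query string of the request target."""
    target = f"{path}?{urlencode(params)}"
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def build_post(host: str, path: str, params: dict) -> bytes:
    """POST: the same parameters travel in the body, after the blank line."""
    body = urlencode(params).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


params = {"q": "a b", "page": 2}
print(build_get("example.com", "/search", params).decode())
print(build_post("example.com", "/search", params).decode())
```

In both cases the data crosses the network in plain text unless HTTPS is used; the difference is only where it sits in the message and therefore where it tends to be logged or cached.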

2. Safety and idempotence

In HTTP, “safe” means that the request method does not “damage” the resources on the server, that is, it does not substantially modify them. GET and HEAD are “safe” because they are “read-only” operations: no matter how many times they are performed, the data on the server stays intact, as long as the server does not deliberately misinterpret how these methods should be handled.

An operation is “idempotent” if performing it many times produces the same result as performing it once. GET and HEAD are both safe and idempotent. DELETE can delete the same resource multiple times and the effect is always that the resource does not exist, so it is also idempotent.
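
The two properties can be illustrated with a toy in-memory resource store. This is a sketch only: the class NoteStore and its methods are hypothetical and merely mimic what a server might do for each method. The treatment of PUT as idempotent follows the HTTP specification rather than the text above.

```python
class NoteStore:
    """A toy in-memory resource collection (hypothetical, for illustration only)."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def get(self, note_id):
        # Safe: reads only, never changes self.notes.
        return self.notes.get(note_id)

    def post(self, text):
        # Not idempotent: every call creates a new resource.
        note_id = self.next_id
        self.notes[note_id] = text
        self.next_id += 1
        return note_id

    def put(self, note_id, text):
        # Idempotent: repeating the call leaves the same final state.
        self.notes[note_id] = text

    def delete(self, note_id):
        # Idempotent: after the first call the note is gone; repeats change nothing.
        self.notes.pop(note_id, None)


store = NoteStore()
store.post("hello")
store.post("hello")      # two notes now exist
store.put(1, "updated")
store.put(1, "updated")  # still a single note with id 1
store.delete(2)
store.delete(2)          # the second delete is a no-op
print(store.notes)       # {1: 'updated'}
```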

Summary:

  • A request method is an operation on a resource that the client issues and asks the server to perform.
  • The request method is only an “instruction” to the server; the server decides how to handle it.
  • The most common request methods are GET and POST, which fetch data and send data respectively.
  • The HEAD method is a lightweight GET that fetches only the meta information of a resource.
  • PUT is basically equivalent to POST and is mostly used to update data;
  • “Safety” and “idempotence” are two important attributes for describing request methods. They provide theoretical guidance and can help us design systems.

Can you write the correct web address?

  • A URI is a string that uniquely identifies a resource on a server, often called a URL;
  • A URI usually consists of scheme, host:port, path, and query, some of which can be omitted;
  • scheme is called the “scheme name” or “protocol name” and indicates which protocol should be used to access the resource;
  • host:port gives the host name and port number where the resource is located;
  • path marks the location of the resource;
  • query carries additional requirements for the request on the resource;
  • Special characters such as @&/ and Chinese characters must be percent-encoded when they appear as data in a URI. Otherwise, the server may fail to process the HTTP message correctly.
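
The components listed above can be inspected with Python’s standard urllib.parse module. The URI below is an invented example, used only to show how the parts are split and how special characters are percent-encoded.

```python
from urllib.parse import parse_qs, quote, urlsplit

uri = "https://www.example.com:8443/docs/index.html?lang=zh&page=2"
parts = urlsplit(uri)
print(parts.scheme)           # https
print(parts.hostname)         # www.example.com
print(parts.port)             # 8443
print(parts.path)             # /docs/index.html
print(parse_qs(parts.query))  # {'lang': ['zh'], 'page': ['2']}

# Percent-encoding: each byte of a special character becomes %XX.
print(quote("a&b/c", safe=""))  # a%26b%2Fc
print(quote("中文"))             # %E4%B8%AD%E6%96%87
```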

Five, response status codes

1. The five classes of status codes

  • 1xx: informational. Protocol processing is in an intermediate state and further operations are required.
  • 2xx: success. The message was received and processed correctly.
  • 3xx: redirection. The location of the resource has changed and the client needs to resend the request.
  • 4xx: client error. The request message is incorrect and the server cannot process it.
  • 5xx: server error. An internal error occurred while the server was processing the request.
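
Because the class is determined entirely by the first digit, classification takes only a few lines. This is a minimal sketch; the names CLASSES and classify are made up for this example, while HTTPStatus comes from Python’s standard library and supplies the standard reason phrases.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def classify(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CLASSES[code // 100]


for code in (200, 301, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase for known codes.
    print(code, HTTPStatus(code).phrase, "->", classify(code))
```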

2. The status codes one by one

2.1 1xx

The 1xx status codes are informational and represent an intermediate state in protocol processing. They are rarely used.

2.2 2xx

The 2xx status codes indicate that the server received and successfully processed the client’s request. These are the codes the client most wants to see.

  1. “200 OK” means everything is fine, and there is usually body data after the response header.

  2. “204 No Content” has basically the same meaning as “200 OK”, except that there is no body data after the response header.

  3. “206 Partial Content” means the body contains only part of the resource. It is usually accompanied by the Content-Range header field, which gives the specific range of the body data in the response message.
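
As an illustration of how a 206 response relates to the requested range, the sketch below turns the value of a Range request header into the matching Content-Range value. It assumes a single byte range and a resource of known size; the function name content_range is hypothetical.

```python
def content_range(range_header: str, total: int):
    """Return (start, end, Content-Range value) for a 'bytes=start-end' request."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the resource.
        start = max(total - int(last), 0)
        end = total - 1
    else:
        start = int(first)
        # Open-ended form "bytes=N-": from N to the end of the resource.
        end = min(int(last), total - 1) if last else total - 1
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end, f"bytes {start}-{end}/{total}"


data = b"0123456789"
start, end, header = content_range("bytes=2-5", len(data))
print(header)               # bytes 2-5/10
print(data[start:end + 1])  # b'2345'
```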

2.3 3xx

The 3xx status codes indicate that the resource requested by the client has moved. The client must resend the request using a new URI to obtain the resource. This is commonly known as “redirection” and includes the well-known 301 and 302 redirects.

  1. “301 Moved Permanently”, commonly known as “permanent redirection”, means that the requested resource no longer exists at this URI and must be accessed with a new URI. The response header uses the Location field to indicate the URI to jump to.
  2. “302 Found”, commonly known as “temporary redirection”, means that the requested resource is still there but must temporarily be accessed with another URI. The response header uses the Location field to indicate the URI to jump to.
  3. “304 Not Modified” is used with conditional requests such as If-Modified-Since to indicate that the resource has not been modified, and serves cache control. It does not have the usual meaning of a jump, but can be understood as “redirecting to the cached file” (that is, a “cache redirect”).
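
The decision behind a 304 response can be sketched as a comparison between the date the client sends in If-Modified-Since and the resource’s last modification time. This is a simplified illustration, not a full implementation of conditional requests; the function name conditional_status is made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full body
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # not modified: the client may reuse its cache
    return 200


modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)
print(header)                                # Mon, 01 Jan 2024 12:00:00 GMT
print(conditional_status(header, modified))  # 304
print(conditional_status(None, modified))    # 200
```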

2.4 4xx

The 4xx status codes indicate that the request message sent by the client is incorrect and the server cannot process it. These are error codes in the true sense.

  1. “400 Bad Request” is a common, generic error code. It indicates that the request message is wrong but does not say whether the data format is incorrect, a header is missing, or the URI is too long.
  2. “403 Forbidden” does not actually mean that the client’s request is wrong; it means the server forbids access to the resource.
  3. “404 Not Found” means the requested resource could not be found on the server.
  4. Other:
  • 405 Method Not Allowed: certain methods are not allowed on the resource. For example, POST is not allowed and only GET is;
  • 406 Not Acceptable: the resource cannot satisfy the conditions of the client’s request, for example the client asks for Chinese but only English is available;
  • 408 Request Timeout: the server timed out waiting for the request;
  • 409 Conflict: multiple requests conflict with each other, comparable to a race between concurrent threads;
  • 413 Request Entity Too Large: the body of the request message is too large;
  • 414 Request-URI Too Long: the URI in the request line is too long;
  • 429 Too Many Requests: the client sent too many requests, usually triggering the server’s request-limiting policy;
  • 431 Request Header Fields Too Large: a single header field or the total size of the request header is too large.

2.5 5xx

The 5xx status codes indicate that the client’s request message is correct, but the server failed to return response data because of an internal error during processing.

  1. “500 Internal Server Error” is a generic error code similar to 400: something went wrong on the server, but we do not know what.
  2. “501 Not Implemented” means the functionality requested by the client is not yet supported. This error code is “softer” than 500, similar to “coming soon, stay tuned”, though it is not clear when.
  3. “502 Bad Gateway” is usually returned when the server acts as a gateway or proxy. It indicates that the server itself works properly but an error occurred when it accessed the back-end server. The specific cause is unknown.
  4. “503 Service Unavailable” indicates that the server is busy and temporarily cannot respond. The message “The network service is busy. Please try again later” that we sometimes see online corresponds to status code 503. A 503 response usually also carries a Retry-After field telling the client how long to wait before sending the request again.
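
A client that receives a 503 can use the Retry-After field to decide how long to wait. The field value is either a number of seconds or an HTTP date; the sketch below handles both forms. The function name retry_delay and the sample values are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after: str, now: datetime) -> float:
    """Return how many seconds to wait, given a Retry-After header value."""
    value = retry_after.strip()
    if value.isdigit():
        # Form 1: a number of seconds, e.g. "Retry-After: 120".
        return float(value)
    # Form 2: an HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max((when - now).total_seconds(), 0.0)


now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_delay("120", now))                            # 120.0
print(retry_delay("Wed, 21 Oct 2015 07:28:00 GMT", now))  # 60.0
```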

Summary:

  • The status code in the response message represents the result of processing the request.
  • The reason phrase after the status code is a simple text description that can be customized.
  • The status code is a three-digit decimal number, from 100 through 599, divided into five classes;
  • 2xx status codes indicate success. 200, 204, and 206 are commonly used.
  • 3xx status codes indicate redirection. 301, 302, and 304 are commonly used.
  • 4xx status codes indicate client errors. 400, 403, and 404 are commonly used.
  • 5xx status codes indicate server errors. 500, 501, 502, and 503 are commonly used.

Six, HTTP characteristics

1. Flexible and extensible

HTTP fixes only the basic format of a message, such as header + body. The syntax and semantics of the individual components are not strictly restricted and can be customized by developers.

2. Reliable transmission

Because HTTP is based on TCP/IP, and TCP itself is a “reliable” transport protocol, HTTP naturally inherits this feature and can “reliably” transfer data between requester and responder.

“Reliable” must be understood correctly: HTTP cannot guarantee 100% that data reaches the other end. On a busy network or a poor-quality connection, sending and receiving may still fail. “Reliable” is only a promise to the user that the lower layers will use various means to deliver the data “as far as possible”.

3. Application layer protocol

HTTP is a protocol that can deliver almost anything and satisfy every need, as long as it’s not too performance-demanding.

4. Request-response

The request-response mode is the most fundamental communication model of HTTP. Informally it is “one send, one receive” or “back and forth”, much like a function call when writing code.

5. Stateless

Clients and servers are always in a state of “ignorance”: they know nothing about each other before the connection is established, and every message sent or received is independent of the others. Sending or receiving a message does not change the state of either the client or the server, and neither side is required to save any information afterward.

Cookie technology can be used to make HTTP “stateful”.
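
The cookie mechanism can be sketched with Python’s standard http.cookies module: the server hands out an identifier in a Set-Cookie header, and the client sends it back in a Cookie header on later requests, which lets the server recognize the same visitor. The cookie names session_id and theme and their values are invented for this example.

```python
from http.cookies import SimpleCookie

# Server side: attach a session identifier to the response.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["path"] = "/"
response_cookie["session_id"]["httponly"] = True
print(response_cookie.output())  # a "Set-Cookie: session_id=abc123; ..." line

# Client side: the browser sends the cookie back on later requests.
request_header = "session_id=abc123; theme=dark"
request_cookie = SimpleCookie()
request_cookie.load(request_header)
print(request_cookie["session_id"].value)  # abc123
```

The protocol itself stays stateless; the state lives in whatever the server associates with the identifier.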

6. Other features

The transmitted entity data can be cached and compressed, and HTTP also supports retrieving data in segments, identity authentication, internationalized languages, and more.

Summary:

  1. HTTP is flexible and extensible; you can add header fields to implement almost any function.
  2. HTTP offers reliable transmission: built on TCP/IP, it ensures delivery of data “as far as possible”;
  3. HTTP is an application-layer protocol. It is more general-purpose than FTP and SSH and can transmit arbitrary data.
  4. HTTP uses the request-response mode, in which the client actively initiates a request and the server passively responds to it.
  5. HTTP is stateless in nature. Each request is independent and unrelated to the others, and the protocol does not require the client or server to record information about requests.

Advantages and disadvantages of HTTP

  1. HTTP’s greatest strengths are simplicity, flexibility, and ease of extension;
  2. HTTP has a mature software and hardware ecosystem and is widely used as part of the infrastructure of the Internet;
  3. HTTP is stateless, which makes it easy to cluster servers and scale performance, but cookie technology is sometimes needed to make it “stateful”;
  4. HTTP transmits data in plaintext, so the data is fully readable and easy to study and analyze, but also easy to eavesdrop on;
  5. HTTP is insecure. It cannot verify the identities of the communicating parties or determine whether messages have been tampered with;
  6. HTTP performance is not bad, but it is not fully suited to today’s Internet, and there is a lot of room for improvement.