caibaojian.com/http.html


The HTTP protocol covers general header fields, request messages, response messages, and entity (body) information.

HyperText Transfer Protocol (HTTP) is the most widely used network protocol on the Internet. All WWW files must comply with this standard. HTTP was originally designed to provide a way to publish and receive HTML pages. In the 1960s, the American Ted Nelson conceived a way to process text information by computer and called it hypertext, which became the foundation of the standard architecture of HTTP. The World Wide Web Consortium and the Internet Engineering Task Force coordinated the work that resulted in the release of a series of RFCs, of which the well-known RFC 2616 defines HTTP/1.1.

1 Technical Architecture

HTTP is a standard for requests and responses between a client and a server, usually carried over TCP. The client is the end user and the server is the website. By using a Web browser, Web crawler, or other tool, the client makes an HTTP request to a specified port (default port 80) on the server. This client is called the user agent. The answering server stores resources, such as HTML files and images, and is called the origin server. Multiple intermediate layers, such as proxies, gateways, or tunnels, may exist between the user agent and the origin server. Although TCP/IP is the most popular protocol suite on the Internet, HTTP does not require it or the layers it provides. In fact, HTTP can be implemented over any other Internet protocol, or over any other network. HTTP only assumes reliable transport, and any protocol that provides such a guarantee can be used.

Typically, an HTTP client initiates a request to establish a TCP connection to a specified port (default: port 80) on the server. The HTTP server listens for requests from clients on that port. Once the request is received, the server sends back a status line, such as “HTTP/1.1 200 OK”, and a message, the body of which may be the requested file, an error message, or some other information.



HTTP uses TCP rather than UDP because a web page may involve a lot of data, and TCP provides transmission control, ordered delivery, and error correction.

Resources requested over HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs) or, more specifically, by Uniform Resource Locators (URLs).

2 Protocol Functions

HTTP is used to transfer hypertext from a WWW server to a local browser. It can make browsers more efficient and reduce network traffic. It not only ensures that the computer transfers hypertext documents correctly and quickly, but also determines which parts of a document are transferred and which parts are displayed first (for example, text before graphics).
HTTP is an application-layer communication protocol between a client browser or other program and a Web server. Hypertext information is stored on Web servers on the Internet, and clients retrieve the hypertext information they want to access through HTTP. HTTP contains commands and transmission information. It can be used not only for Web access but also for communication between other Internet/intranet applications, so that hypermedia access to various application resources can be integrated.

The web address we type into the address bar of our browser is called a Uniform Resource Locator (URL). Just as every home has a street address, every web page has an Internet address. When you type a URL into the browser's address box or click on a hyperlink, the URL determines the address to visit. The browser uses HTTP to fetch the page's code from the Web server and render it as a Web page.
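As a rough illustration of how a URL determines the address to visit, the following sketch splits a URL into the pieces an HTTP client needs. The function name `describe_url` and the example URLs are assumptions made for this illustration; only the standard library's `urllib.parse` is used.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts an HTTP client needs to make a request."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    default_ports = {"http": 80, "https": 443}
    port = parts.port or default_ports.get(parts.scheme)
    # An empty path is requested as "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "target": target,
    }


info = describe_url("http://www.example.com/docs/index.html?lang=en")
print(info)
```

The host and port say where to connect, and the target is what the client would place in the request line.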

3 Protocol Basis

HTTP stands for HyperText Transfer Protocol. It is used to transmit data on the WWW. For details about HTTP, see RFC 2616. HTTP uses a request/response model. The client sends a request to the server containing the request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible content. The server responds with a status line containing the protocol version of the message and a success or error code, followed by server information, entity meta-information, and possible entity content.
HTTP messages are either request messages from a client to a server or response messages from a server to a client. Both types consist of a start line, zero or more header fields, a blank line indicating the end of the header fields, and an optional message body. Header fields fall into four groups: general headers, request headers, response headers, and entity headers. Each header field consists of a field name, a colon (:), and the field value. The field name is case-insensitive, and the field value can be preceded by any amount of whitespace. A header field can be extended over multiple lines by starting each extra line with at least one space or tab character.
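The message layout described above (start line, header fields, blank line, optional body) can be sketched as a small parser. This is a minimal illustration, not a complete implementation: the function name `parse_message`, the sample message, and the `X-Note` header are all invented for the example, and repeated header fields simply overwrite each other.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into start line, header fields, and body."""
    # The first blank line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    last = None
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and last:
            # A line starting with a space or tab continues the previous field.
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()  # field names are case-insensitive
        headers[last] = value.strip()  # leading whitespace is not significant
    return start_line, headers, body


SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"X-Note: first part,\r\n"
    b"  second part\r\n"
    b"\r\n"
    b"<html></html>"
)
start_line, headers, body = parse_message(SAMPLE)
print(start_line, headers, body)
```

Lower-casing the field names is one simple way to honor their case-insensitivity when looking them up later.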

General header fields

General header fields are supported by both request and response messages. They include Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Extending the general header fields requires that both communicating parties support the extension. An unsupported general header field is generally treated as an entity header field. Here are a few general header fields used in UPnP messages:
1. The Cache-Control header field
Cache-Control specifies the caching mechanism that requests and responses must follow. Setting Cache-Control in a request or response message does not modify the caching behavior applied to the other message. The cache directives in a request include no-cache, no-store, max-age, max-stale, min-fresh, and only-if-cached. The directives in a response include public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, and max-age. The meanings of the directives are as follows:
public indicates that the response can be cached by any cache.
private indicates that all or part of a response message is intended for a single user and must not be stored by a shared cache. This allows the server to state that the response is meant for one user only and is not a valid response to another user's request.
no-cache indicates that a cached copy must not be used to satisfy a request without first being revalidated with the origin server.
no-store is used to prevent important information from being inadvertently released or retained. When it is sent in a request message, neither the request nor its response may be stored by a cache.
max-age indicates that the client is willing to accept a response whose age is no greater than the specified time in seconds.
min-fresh indicates that the client is willing to accept a response that will remain fresh for at least the specified number of seconds.
max-stale indicates that the client is willing to accept a response that has exceeded its expiration time. If a value is given for max-stale, the client accepts a response that has exceeded its expiration time by no more than that number of seconds.
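To make the directive syntax concrete, here is a small sketch that turns a Cache-Control field value into a dictionary. The function name `parse_cache_control` is an assumption for this example, and the parsing is deliberately simplified: it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Parse a Cache-Control field value into a dict of directives."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        name = name.strip().lower()
        if not sep:
            # Directives without an argument, such as no-cache, become flags.
            directives[name] = True
        else:
            arg = arg.strip().strip('"')
            # Numeric arguments such as max-age are given in seconds.
            directives[name] = int(arg) if arg.isdigit() else arg
    return directives


cc = parse_cache_control("private, max-age=600, must-revalidate")
print(cc)
```

A cache could then check, for example, `cc.get("max-age")` before deciding whether a stored response is still usable.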
HTTP Keep-Alive
The keep-alive function keeps the connection from the client to the server alive, so that a connection does not have to be established or re-established for each subsequent request to the server. Most Web servers on the market, including iPlanet, IIS, and Apache, support HTTP Keep-Alive. This feature is often useful for sites that offer static content. For heavily loaded sites, however, there is a trade-off: although keeping connections open benefits clients, it also affects performance, because resources that could have been freed are still held during pauses in processing. The keep-alive feature has a particularly significant impact on resource utilization when the Web server and application server run on the same machine.
KeepAliveTime controls how often TCP/IP attempts to verify that an idle connection is still intact. If there is no activity during this time, a keepalive message is sent. If the network is working properly and the receiver is active, it responds. Consider reducing this value if you need to detect lost receivers more quickly. If many connections stay idle for a long time and few receivers are lost, you might want to increase this value to reduce overhead. By default, Windows sends a keepalive message if a connection has been idle for 7,200,000 milliseconds (2 hours). Typically, 1,800,000 milliseconds is the preferred value, so that half-closed connections are detected within 30 minutes.
The KeepAliveInterval value defines how often TCP/IP resends keepalive messages when no response is received from the receiver. If keepalive messages are sent repeatedly and the number of unanswered messages exceeds the value of TcpMaxDataRetransmissions, the connection is abandoned. If you expect long response times, you may need to increase this value to reduce overhead. If you need to reduce the time spent verifying whether a receiver has been lost, consider reducing this value or the TcpMaxDataRetransmissions value. By default, Windows waits 1,000 milliseconds (1 second) before resending a keepalive message that has not been answered. KeepAliveTime can be set according to your needs, for example to 10 minutes; be sure to convert the value to milliseconds.
2. The Date header field
The Date header field indicates the time when the message was sent. The time format is defined by RFC 822. For example, Date: Mon, 31 Dec 2001 04:25:57 GMT. The time is given in GMT, so the user's time zone must be taken into account when converting it to local time.
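As a small illustration of the Date format and the conversion to local time, the sketch below uses Python's standard `email.utils` helpers. The UTC+8 offset is only an assumed example of a user's time zone.

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Parse the Date value from the example above into an aware datetime.
sent = parsedate_to_datetime("Mon, 31 Dec 2001 04:25:57 GMT")

# Convert to a local time zone (UTC+8 is an assumed example zone).
local = sent.astimezone(timezone(timedelta(hours=8)))

# Format a datetime back into the form used in HTTP headers.
roundtrip = format_datetime(sent.astimezone(timezone.utc), usegmt=True)

print(sent.isoformat(), local.isoformat(), roundtrip)
```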
3. The Pragma header field
The Pragma header field carries implementation-specific directives. The most common is Pragma: no-cache. In HTTP/1.1, it has the same meaning as Cache-Control: no-cache.

The request message

The first line of a request message has the following format:
Method SP Request-URI SP HTTP-Version CRLF
Method indicates the method to be performed on the Request-URI. This field is case-sensitive, and the methods include OPTIONS, GET, HEAD, POST, PUT, DELETE, and TRACE. The GET and HEAD methods should be supported by all general-purpose Web servers; the implementation of all other methods is optional. The GET method retrieves the information identified by the Request-URI. The HEAD method also retrieves the information identified by the Request-URI, but the response does not include a message body. The POST method asks the server to accept the entity enclosed in the request; it is used to submit forms and to send messages to newsgroups, bulletin boards, mailing lists, and databases.

SP indicates a space. The Request-URI follows the URI format; an asterisk (*) in this field indicates that the request applies not to a specific resource but to the server itself. HTTP-Version indicates the supported HTTP version, for example, HTTP/1.1. CRLF represents a carriage return followed by a line feed. The request header fields allow the client to pass additional information about the request or about the client itself to the server. The request header fields include Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, From, Host, If-Modified-Since, If-Match, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, and User-Agent. Extending the request header fields requires the support of both communicating parties. An unsupported request header field is generally treated as an entity header field.

Typical request message:
Host: download.*******.de
Accept: */*
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/4.04 [en] (small subsidiary; I; Nav)
Range: bytes=554554-
The request line that precedes these header fields indicates that the HTTP client (possibly a browser or a download program) obtains the file at the specified URL through GET. Host, Accept, User-Agent, and Range are request header fields; Pragma and Cache-Control are general header fields.
1. The Host header field
The Host header field specifies the Internet host and port number of the requested resource. It must represent the location of the origin server or gateway named by the requested URL. An HTTP/1.1 request must contain a Host header field; otherwise the server returns a 400 status code.
2. The Referer header field
The Referer header field allows the client to specify the address of the resource from which the Request-URI was obtained. This lets the server generate lists of back-links that can be used for logging, optimized caching, and so on. It also allows obsolete or mistyped links to be traced for maintenance purposes. The Referer field must not be sent if the Request-URI was obtained from a source that has no URI of its own. If a partial URI is given, it should be interpreted relative to the Request-URI.
3. The Range header field
The Range header field can request one or more sub-ranges of an entity. For example:
The first 500 bytes: bytes=0-499
The second 500 bytes: bytes=500-999
The last 500 bytes: bytes=-500
Everything from byte 500 onward: bytes=500-
The first and last bytes: bytes=0-0,-1
Several ranges at once: bytes=500-600,601-999
The server may ignore this request header. If an unconditional GET contains a Range request header and the server honors it, the response is returned with status code 206 (Partial Content) instead of 200 (OK).
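The range forms listed above can be resolved to concrete byte offsets once the entity length is known. The following is a minimal sketch under stated assumptions: the function name `resolve_ranges` is hypothetical, only the bytes unit is handled, and ranges that cannot be satisfied are silently dropped rather than reported.

```python
def resolve_ranges(header, length):
    """Turn a Range value such as 'bytes=0-499' into (first, last) offsets."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is handled in this sketch")
    result = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # Suffix form "-N": the last N bytes of the entity.
            count = int(last)
            start, end = max(length - count, 0), length - 1
            if count == 0:
                continue
        else:
            start = int(first)
            # Open-ended form "N-" runs to the end of the entity.
            end = int(last) if last else length - 1
            end = min(end, length - 1)
        if start <= end:
            result.append((start, end))
    return result


# Offsets are inclusive; 1234 is an assumed entity length for illustration.
for value in ("bytes=0-499", "bytes=-500", "bytes=500-", "bytes=0-0,-1"):
    print(value, resolve_ranges(value, 1234))
```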
4. The User-Agent header field
The User-Agent header field contains information about the user agent (the client program) making the request.

The response message

The first line of a response message has the following format:
HTTP-Version SP Status-Code SP Reason-Phrase CRLF
HTTP-Version indicates the supported HTTP version, for example, HTTP/1.1. Status-Code is a three-digit result code. Reason-Phrase provides a short text description of the Status-Code. The Status-Code is intended for automated processing, and the Reason-Phrase is intended to help human users. The first digit of the Status-Code defines the class of the response; the last two digits have no categorization role. The first digit can take five values:
1xx: Informational. The request has been received and processing continues
2xx: Success. The action was successfully received, understood, and accepted
3xx: Redirection. Further action must be taken in order to complete the request
4xx: Client error. The request contains bad syntax or cannot be fulfilled
5xx: Server error. The server failed to fulfill an apparently valid request
The response header fields allow the server to pass additional information that cannot be placed in the status line. These fields mainly give information about the server and about further access to the resource identified by the Request-URI. The response header fields include Age, Location, Proxy-Authenticate, Public, Retry-After, Server, Vary, Warning, and WWW-Authenticate. Extending the response header fields requires the support of both communicating parties. An unsupported response header field is generally treated as an entity header field.
Typical response message:
HTTP/1.0 200 OK
Date: Mon, 31 Dec 2001 04:25:57 GMT
Server: Apache/1.3.14 (Unix)
Content-Type: text/html
Last-Modified: Tue, 17 Apr 2001 06:46:28 GMT
ETag: "a030f020ac7c01:1e9f"
Content-Length: 39725426
Content-Range: bytes 55******/40279980
The first line of the previous example is the status line with which the HTTP server answers the GET request. Server is a response header field, Date is a general header field, and Content-Type, Last-Modified, ETag, Content-Length, and Content-Range are entity header fields.
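The status line format and the five response classes can be illustrated with a short parser. This is only a sketch; the function name `parse_status_line` and the class labels in `STATUS_CLASSES` are wording chosen for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase' into its parts."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase may contain spaces, or may be missing entirely.
    reason = parts[2] if len(parts) > 2 else ""
    # Only the first digit of the status code determines the response class.
    return version, code, reason, STATUS_CLASSES.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.0 200 OK\r\n"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```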
1. The Location response header
The Location response header is used to redirect the recipient to a new URI.
2. The Server response header
The Server response header contains information about the software used by the origin server to handle the request. This field can contain multiple product identifiers and comments, typically listed in order of importance.

Entity information

Both request messages and response messages can contain entity information, which generally consists of entity header fields and an entity body. The entity header fields contain meta-information about the entity. They include Allow, Content-Base, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, ETag, Expires, Last-Modified, and extension-header. The extension-header mechanism allows new entity headers to be defined, but such fields may not be recognized by the recipient. An entity body can be an encoded byte stream whose encoding is defined by Content-Encoding or Content-Type and whose length is defined by Content-Length or Content-Range.
1. The Content-Type entity header
The Content-Type entity header indicates the media type of the entity sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
2. The Content-Range entity header
The Content-Range entity header specifies where in the complete entity a partial entity should be inserted. It also indicates the length of the complete entity. When the server returns a partial response to the client, it must describe both the range covered by the response and the length of the complete entity. General format:
Content-Range: bytes-unit SP first-byte-pos - last-byte-pos / entity-length
For example, when sending the first 500 bytes of an entity: Content-Range: bytes 0-499/1234. If an HTTP message contains this field (for example, in a response to a Range request or to an overlapping request for a set of ranges), Content-Range indicates the range being transmitted, and Content-Length indicates the number of bytes actually transmitted.
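The relationship between Content-Range and Content-Length can be checked with a small sketch. The function name `parse_content_range` is hypothetical, and only the bytes unit in the form shown above is handled; an asterisk for an unknown total length is also accepted.

```python
import re

# first-byte-pos "-" last-byte-pos "/" entity-length (or "*" when unknown)
CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Return (first, last, total, transmitted) for a Content-Range value."""
    match = CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range value: " + value)
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    # Byte positions are inclusive, so the transmitted length is last-first+1.
    return first, last, total, last - first + 1


first, last, total, transmitted = parse_content_range("bytes 0-499/1234")
print(first, last, total, transmitted)
```

Here `transmitted` is the value the accompanying Content-Length would be expected to carry for a single-range response.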
3. The Last-Modified entity header
The Last-Modified entity header specifies the time at which the content stored on the server was last modified.

4 Operation Mode

In the WWW, "client" and "server" are relative terms that apply only for the duration of a particular connection; a client in one connection may act as a server in another. Under the HTTP client/server model, an information exchange is divided into four stages: establishing a connection, sending the request, sending the response, and closing the connection.

The HTTP protocol is based on the request/response paradigm. After a client establishes a connection to a server, it sends a request consisting of a request method, a Uniform Resource Identifier (URI), and the protocol version number, followed by MIME-like information including request modifiers, client information, and possible content. After receiving the request, the server responds with a status line containing the protocol version number of the message and a success or error code, followed by MIME-like information including server information, entity information, and possible content.

In short, any server that holds HTML files also runs an HTTP daemon that responds to user requests. Your browser is an HTTP client that sends requests to the server. When a URL is entered or a hyperlink is clicked in the browser, the browser sends an HTTP request to the server at the IP address that the URL resolves to. The daemon receives the request, performs the necessary operations, and sends back the requested file. In this process, the data sent and received on the network is divided into one or more packets. Each packet contains the data to be transmitted and control information that tells the network how to handle the packet. TCP/IP determines the format of each packet. Unless you were told beforehand, you might not know that the information is broken up into many small pieces for transmission and then reassembled.

Most HTTP communication is initiated by a user agent and consists of a request for a resource on the origin server. In the simplest case, this is done through a single connection between the user agent (UA) and the origin server (O).
Things get more complicated when one or more intermediaries appear in the request/response chain. There are three common types of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent that receives requests for a URI in its absolute form, rewrites all or part of the message, and forwards the reformatted request toward the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are used when communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

5 Packet Format

HTTP messages consist of requests from the client to the server and responses from the server to the client. The format of the request packet is as follows:
Request line – General header – Request header – Entity header – Message body
The request line begins with a method field, followed by a URI field and an HTTP protocol version field, and ends with CRLF. The fields are separated by SP. No CR or LF is allowed except in the final CRLF sequence. See the documentation for details on general headers, request headers, and entity headers.
The response packet format is as follows:
Status line – General header – Response header – Entity header – Message body
The status code consists of three digits that indicate whether the request was understood and satisfied. The reason phrase is a short textual description of the status code; the status code is intended for automated use, and the reason phrase is intended for human users. The client is not required to examine or display the reason phrase. See the documentation for more information on general headers, response headers, and entity headers.
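The request packet layout described in this section can be sketched as a function that assembles the request line, the header fields, the blank line, and the body. The name `build_request`, the target `/index.html`, and the host `www.example.com` are assumptions chosen for illustration.

```python
def build_request(method, target, headers, body=b"", version="HTTP/1.1"):
    """Assemble request line, header fields, blank line, and message body."""
    # Request line: Method SP Request-URI SP HTTP-Version (CRLF added below).
    lines = [f"{method} {target} {version}"]
    fields = dict(headers)
    if body:
        # Tell the recipient how many body bytes follow the blank line.
        fields.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in fields.items()]
    # Every line ends with CRLF; an empty line terminates the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request(
    "GET", "/index.html", {"Host": "www.example.com", "Accept": "*/*"}
)
print(request.decode("ascii"))
```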

6 Working Principles

An HTTP operation is called a transaction, and it works in four steps:
1. The client and server establish a connection. Clicking a hyperlink is enough to start HTTP working.
2. After the connection is established, the client sends a request to the server in the form of a Uniform Resource Identifier (URI) and protocol version number, followed by MIME information including request modifiers, client information, and possible content.
3. After receiving the request, the server sends the corresponding response in the form of a status line, including the protocol version number of the message and a success or error code, followed by MIME information including server information, entity information, and possible content.
4. The client receives the information returned by the server and displays it on the user's screen through the browser. Then the client disconnects from the server.
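The four steps can be walked through without touching the network by using a pair of connected local sockets in place of a real TCP connection. This is a sketch under stated assumptions: `serve_once` is a hypothetical one-shot server, the request target and HTML body are made up, and `socket.socketpair()` stands in for the connection a browser would open to port 80.

```python
import socket
import threading


def serve_once(conn):
    """Hypothetical one-shot server: read a request, answer it, and close."""
    data = b""
    while b"\r\n\r\n" not in data:  # read until the end of the header section
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    body = b"<html>hello</html>"
    head = "\r\n".join([
        "HTTP/1.1 200 OK",
        "Content-Type: text/html",
        "Content-Length: " + str(len(body)),
        "Connection: close",
    ]) + "\r\n\r\n"
    conn.sendall(head.encode("ascii") + body)  # step 3: send the response
    conn.close()


def run_transaction():
    # Step 1: establish a connection (a local socket pair stands in for TCP).
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    # Step 2: the client sends its request.
    client.sendall(b"GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n")
    # Step 3 (client side): read the response until the server closes.
    response = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        response += chunk
    # Step 4: the client disconnects.
    client.close()
    worker.join()
    return response


response = run_transaction()
status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```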


If an error occurs in any of the preceding steps, an error message is returned to the client and shown on the display. For the user, all of this is handled by HTTP itself; the user just clicks the mouse and waits for the information to appear.
Most HTTP communication is initiated by a user agent and consists of a request for a resource on the origin server. In the simplest case, this is done through a single connection between the user agent and the server. On the Internet, HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80, but other ports can also be used. This does not prevent HTTP from being implemented on top of other protocols on the Internet or on other networks. HTTP only presumes a reliable transport.
The process is much like ordering goods by phone. We call the merchant and say what we need, and the merchant tells us what is in stock and what is out of stock. We place the order by telephone (for HTTP, the telephone is TCP/IP), but of course we could also order by fax, as long as the merchant has a fax machine.

7 Status Message

1xx: information
100 Continue: The server has received the initial part of the request and has not rejected it; the client should continue to send the rest of the request.
101 Switching Protocols: The server is switching to another protocol at the client's request.
2xx: success
200 OK: The request was successful (the reply document follows for GET and POST requests).
201 Created: The request was fulfilled and a new resource was created.
202 Accepted: The request was accepted for processing, but processing has not completed.
203 Non-Authoritative Information: The document was returned normally, but some of the reply headers may be incorrect because a copy of the document is being used.
204 No Content: There is no new document. The browser should continue to display the original document. This status code is useful if the user refreshes the page periodically and the Servlet can determine that the user's document is sufficiently recent.
205 Reset Content: There is no new document, but the browser should reset what it displays. Used to force the browser to clear form input.
206 Partial Content: The client sent a GET request with a Range header, and the server fulfilled it.
3xx: redirect
300 Multiple Choices: A list of links. The user can select a link to reach the destination. A maximum of five addresses are allowed.
301 Moved Permanently: The requested page has been moved to a new URL.
302 Found: The requested page has been temporarily moved to a new URL.
303 See Other: The requested page can be found at a different URL.
304 Not Modified: The document has not been modified. The client has a cached document and made a conditional request (typically by providing an If-Modified-Since header to indicate that it only wants the document if it has changed since the specified date). The server tells the client that the cached document can still be used.
305 Use Proxy: The document requested by the client should be retrieved through the proxy server specified in the Location header.
306 Unused: This code was used in a previous version. It is no longer in use, but the code remains reserved.
307 Temporary Redirect: The requested page has been temporarily moved to a new URL.
4xx: Client error
400 Bad Request: The server failed to understand the request.
401 Unauthorized: The requested page requires a username and password.
401.1: Login failed.
401.2: Login failed due to server configuration.
401.3: Not authorized due to an ACL on the resource.
401.4: Filter authorization failed.
401.5: ISAPI/CGI application authorization failed.
401.7: Access denied by the URL authorization policy on the Web server. This error code is specific to IIS 6.0.
402 Payment Required: This code is not yet available.
403 Forbidden: Access to the requested page is forbidden.
403.1: Execute access is forbidden.
403.2: Read access is forbidden.
403.3: Write access is forbidden.
403.4: SSL is required.
403.5: SSL 128 is required.
403.6: The IP address was rejected.
403.7: A client certificate is required.
403.8: Site access denied.
403.9: Too many users.
403.10: The configuration is invalid.
403.11: Password change.
403.12: Access to the mapping table is denied.
403.13: The client certificate is revoked.
403.14: Directory listing denied.
403.15: Client access licenses exceeded.
403.16: The client certificate is not trusted or is invalid.
403.17: The client certificate has expired or is not yet valid.
403.18: The requested URL cannot be executed in the current application pool. This error code is specific to IIS 6.0.
403.19: CGI cannot be executed for clients in this application pool. This error code is specific to IIS 6.0.
403.20: Passport login failed. This error code is specific to IIS 6.0.
404 Not Found: The server could not find the requested page.
404.0: (None) No file or directory was found.
404.1: The Web site could not be accessed on the requested port.
404.2: The Web service extension lockdown policy prevents this request.
404.3: The MIME map policy prevents this request.
405 Method Not Allowed: The method specified in the request is not allowed.
406 Not Acceptable: The server can only generate a response that the client does not accept.
407 Proxy Authentication Required: The user must first authenticate with a proxy server before the request can be processed.
408 Request Timeout: The request took longer than the server was prepared to wait.
409 Conflict: The request could not be completed due to a conflict.
410 Gone: The requested page is no longer available.
411 Length Required: Content-Length is not defined. Without it, the server will not accept the request.
412 Precondition Failed: A precondition in the request was evaluated as false by the server.
413 Request Entity Too Large: The server will not accept the request because the request entity is too large.
414 Request-URI Too Long: The server will not accept the request because the URL is too long. This can happen when a POST request is converted into a GET request with long query information.
415 Unsupported Media Type: The server will not accept the request because the media type is not supported.
416 Requested Range Not Satisfiable: The server could not satisfy the Range header specified by the client in the request.
417 Expectation Failed: The server could not meet the expectation given in the request.
423: Locking error.
5xx: Server error
500 Internal Server Error: Request not completed. The server encountered an unexpected condition.
500.12: The application is busy restarting on the Web server.
500.13: The Web server is too busy.
500.15: Direct requests to global.asa are not allowed.
500.16: The UNC authorization credentials are incorrect. This error code is specific to IIS 6.0.
500.18: The URL authorization store cannot be opened. This error code is specific to IIS 6.0.
500.100: Internal ASP error.
501 Not Implemented: Request not completed. The requested functionality is not supported by the server.
502 Bad Gateway: Request not completed. The server received an invalid response from the upstream server.
502.1: The CGI application timed out.
502.2: CGI application error.
503 Service Unavailable: Request not completed. The server is temporarily overloaded or down.
504 Gateway Timeout: The gateway timed out.
505 HTTP Version Not Supported: The server does not support the HTTP protocol version specified in the request.

8 Version History

The Hypertext Transfer Protocol has evolved through several versions, most of which are backward compatible. RFC 2145 describes the use of HTTP version numbers. The client tells the server its protocol version number at the beginning of the request, and the server responds with the same or an earlier protocol version.

HTTP/0.9: Obsolete. Only GET is accepted as a request method, the version number is not specified in the communication, and request headers are not supported. Because this version does not support the POST method, the client cannot pass much information to the server.
HTTP/1.0: The first version of the HTTP protocol to specify a version number in the communication. It is still widely used today, especially in proxy servers.
HTTP/1.1: Current version. Persistent connections are used by default and work well with proxy servers. Multiple requests can also be pipelined on one connection to reduce line load and increase transmission speed.
The differences between HTTP/1.1 and HTTP/1.0 are as follows:
1 Cache handling
2 Bandwidth optimization and network connection management
3 Error notification
4 Message transmission
5 Internet address conservation
6 Security and integrity


