preface

A journey of a thousand miles is made of small steps, and rivers and seas are made of small streams. Learning is like rowing upstream: not to advance is to drop back. This article explains the HTTP protocol in detail (part one of a series on network protocols). Without further ado, here is the substance. I hope it helps you.

Before diving into this long article, here is a mind map that gives a general idea of its contents.

I. Overview

1. Hierarchical computer network architecture

Computer network architecture layering

2. TCP/IP communication flow

When the TCP/IP protocol family is used for network communication, the communication with the peer party is hierarchical and sequential. The sending end goes down from the application layer, and the receiving end goes up from the link layer. As follows:

TCP/IP traffic

  • First, the client (the sender) issues an HTTP request at the application layer (HTTP) to view a Web page.
  • To make transmission easier, the transport layer (TCP) splits the data received from the application layer (the HTTP request packet) into segments, marks each one with a sequence number and port number, and passes it to the network layer.
  • The network layer (IP) adds the destination IP address and passes the data to the link layer, which adds the MAC address of the next hop. At this point the request is ready to be sent over the network.
  • The server at the receiving end receives the data at the link layer and passes it up layer by layer to the application layer. Only when the data reaches the application layer has the HTTP request sent by the client actually been received.

As shown below:

The HTTP request

In the network architecture, there are many network protocols. This article mainly focuses on the HTTP protocol (HTTP/1.1 version).

The HyperText Transfer Protocol (HTTP) is used to transfer hypertext from a WWW server to a local browser. It aims to make browsing more efficient and reduce network traffic: it not only ensures that hypertext documents are transferred correctly and quickly, but also determines which parts of a document are transferred and which parts are displayed first (for example, text before graphics). HTTP is an application-layer protocol for communication between a client (a browser or another program) and a Web server. Hypertext information is stored on Web servers across the Internet, and clients retrieve it over HTTP. HTTP carries both commands and the information being transferred, so it can be used not only for Web access but also for communication between other Internet/intranet applications, which makes it possible to integrate hypermedia access to all kinds of application resources.

The web address we type into the browser's address bar is called a Uniform Resource Locator (URL). Just as every home has a street address, every web page has an Internet address. When you type a URL into the address bar or click a hyperlink, the URL determines the address to visit. The browser uses HTTP to fetch the page's code from the Web server and renders it as a web page.

II. HTTP working process

HTTP request response model

HTTP communication mechanism: in a complete HTTP exchange, the client and server go through the following seven steps:

  1. Establish a TCP connection

    Before HTTP work begins, the client must first establish a connection with the server over the network. This connection is made with TCP. TCP and IP together form the basis of the Internet and give their name to the TCP/IP protocol family, which is why the Internet is also called a TCP/IP network. HTTP is an application-layer protocol that sits above TCP, and a higher-layer protocol can only communicate once the lower-layer connection exists. Therefore, a TCP connection must be established first.
  2. The client sends a request command to the server

    Once a TCP connection is established, the client sends a request command to the server.

    For example: GET /sample/hello.jsp HTTP/1.1
  3. After sending its request command, the client sends some additional information to the server in the form of a header. The client then sends a blank line to inform the server that it has finished sending the header.
  4. Server reply

    After the client sends a request to the server, the server returns a response.

    For example: HTTP/1.1 200 OK

    The first part of the response is the protocol version number and the response status code.
  5. Just as the client sends information about itself along with the request, the server sends information about itself and the requested document along with the response, in the form of response headers.
  6. The server sends the data to the client. After sending the headers, the server sends a blank line to indicate that the headers end there, and then sends the actual data requested by the user in the format described by the Content-Type response header.
  7. The server closes the TCP connection

    Traditionally (as in HTTP/1.0), once the server has returned the requested data to the client, it closes the TCP connection. However, if the client or server adds the header Connection: keep-alive, the TCP connection remains open after the response is sent, so the client can continue sending requests over the same connection. Keeping the connection open saves the time needed to establish a new connection for each request and also saves network bandwidth.
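The message-level part of these steps (request command, headers, blank line, then the response with its own headers and data) can be sketched offline in Python. This is only an illustration under stated assumptions: build_request and split_response are hypothetical helper names, sample.com is a placeholder host, and the response bytes are canned rather than read from a real TCP connection.

```python
def build_request(method, path, host, headers=None):
    """Serialize a request: request line, header lines, then a blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # CRLF ends every line; the empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status line, header lines, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    return status_line, header_lines, body


# Steps 2 and 3: the request command followed by headers and a blank line.
request = build_request("GET", "/sample/hello.jsp", "sample.com",
                        {"Connection": "keep-alive"})

# Steps 4 to 6: a canned response standing in for what the server would send.
canned = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")
status_line, header_lines, body = split_response(canned)
```

In a real client the request bytes would be written to a TCP socket and the response read back from it; the sketch leaves that out so it runs without network access.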

III. HTTP protocol basics

1. Communicate by exchanging requests and responses

When HTTP is used, one end must act as the client and the other as the server. The roles of client and server are defined per communication line. Under HTTP, the client issues a request and the server returns a response to that request. In other words, communication always starts with the client, and the server does not send a response until it has received a request.

2. HTTP is a protocol that does not save state

HTTP is a stateless protocol. The protocol itself does not store the state of communication between requests and responses. That is, at the HTTP level, the protocol does not persist requests or responses that have already been sent. HTTP was deliberately designed to be this simple so that it can process large numbers of transactions quickly and remain scalable. But as the Web has evolved, many applications need to preserve communication state, so cookie technology was introduced. With cookies on top of HTTP, state can be managed.

3. Use cookies for state management

Cookie technology controls client state by writing cookie information into request and response packets. The server sends a Set-Cookie header field in its response, which tells the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request. When the server sees the cookie sent by the client, it checks which client the request came from and compares it against its own records to recover the previous state.

The flow of the Cookie
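The cookie round trip can be sketched with Python's standard http.cookies module. The helper name cookie_header_from_response and the cookie names and values are made up for illustration; a real client would also honor attributes such as Path and expiry, which this sketch ignores.

```python
from http.cookies import SimpleCookie


def cookie_header_from_response(set_cookie_values):
    """Build the Cookie request header value from Set-Cookie response values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parses "name=value; attr=..." into the jar
    # Only name=value pairs go back to the server; attributes stay client-side.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Values the server might have sent in Set-Cookie header fields (illustrative).
cookie_header = cookie_header_from_response(
    ["sid=abc123; Path=/; HttpOnly", "lang=en"]
)
```

The resulting string is what the client would place in the Cookie header field of its next request.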

4. Request URIs locate resources

The HTTP protocol uses URIs to locate resources on the Internet. Because of the specific functionality of URIs, resources can be accessed anywhere on the Internet.

5. HTTP methods to inform the server of intent (HTTP/1.1)

HTTP method

6. Persistent connection

In the initial versions of HTTP, the TCP connection was closed after every HTTP exchange. When a browser loads an HTML page that contains several images, it requests the HTML page itself and then each of the other resources the page references. Each of those requests therefore caused a needless TCP connection setup and teardown, increasing communication overhead. To solve this problem, HTTP/1.1 and some HTTP/1.0 implementations introduced persistent connections: the TCP connection is kept open as long as neither end explicitly closes it, so a single TCP connection can carry multiple requests and responses. In HTTP/1.1, all connections are persistent by default.

7. Pipelining

Persistent connections make it possible to pipeline requests. Previously, after sending a request, the client had to wait for the response before sending the next request. With pipelining, the next request can be sent without waiting, so several requests can be sent back to back instead of waiting for responses one by one. For example, when requesting an HTML page with multiple images, persistent connections finish faster than opening one connection per request, and pipelining is faster still. The more requests there are, the more significant the time difference becomes.

IV. HTTP packet structure

1. The HTTP message

The information exchanged over HTTP is called an HTTP packet (also known as an HTTP message). Packets sent by the requesting side (the client) are called request packets, and packets sent by the responding side (the server) are called response packets. An HTTP packet is itself a text string made up of multiple lines of data, with CR+LF as the line terminator.

2. HTTP packet structure

An HTTP packet is roughly divided into a header and a body, separated by the first blank line (CR+LF). The body is not always present. As follows:

HTTP packet Structure

2.1 Request Packet Structure

Request message structure

The header of the request packet consists of the following data:

  • Request line – contains the method used for the request, request URI, and HTTP version.
  • Header fields – contain the various headers that describe the conditions and attributes of the request (general headers, request headers, entity headers, and headers not defined in the RFC, such as cookies).

The following is an example of a request message:

Sample Request message

2.2 Response Packet Structure

The header of the response packet consists of the following data:

  • Status line – contains the HTTP version together with the status code and reason phrase that indicate the result of the response.
  • Header fields – contain the various headers that describe the conditions and attributes of the response (general headers, response headers, entity headers, and headers not defined in the RFC, such as cookies).

The following is an example of a response message:

Response Packet Example

V. Request line and status line in the HTTP packet header

1. The request line

For example, here is an HTTP request:

GET /index.htm HTTP/1.1
Host: sample.com

Of these, the line below is the request line:

GET /index.htm HTTP/1.1

  • GET at the beginning indicates the type of request made to the server, called the method.
  • The string /index.htm that follows specifies the resource to be accessed, also known as the request URI.
  • HTTP/1.1 at the end is the HTTP version number, which indicates the HTTP protocol features the client uses.

This is a request to access a /index.htm page resource on an HTTP server.
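As a small illustration of the three parts of a request line, here is a Python sketch. The function name parse_request_line is an assumption made for this example, and the check is deliberately minimal.

```python
def parse_request_line(line):
    """Split a request line into (method, request URI, HTTP version)."""
    parts = line.split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {line!r}")
    method, request_uri, version = parts
    return method, request_uri, version


method, request_uri, version = parse_request_line("GET /index.htm HTTP/1.1")
```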

2. The status line

For example, here is an HTTP response:

HTTP/1.1 200 OK
Date: Mon, 10 Jul 2017 15:50:06 GMT
Content-Length: 256
Content-Type: text/html

<html>
...

Of these, the line below is the status line:

HTTP/1.1 200 OK

  • HTTP/1.1 at the beginning indicates the HTTP version used by the server.
  • 200 OK that follows is the status code and reason phrase, which represent the result of processing the request.
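The status line can be taken apart in the same way. In this sketch (parse_status_line is a hypothetical helper name), the split is limited to two separators because the reason phrase may itself contain spaces.

```python
def parse_status_line(line):
    """Split a status line into (HTTP version, status code, reason phrase)."""
    version, code, reason = line.split(" ", 2)  # reason may contain spaces
    return version, int(code), reason


ok = parse_status_line("HTTP/1.1 200 OK")
not_modified = parse_status_line("HTTP/1.1 304 Not Modified")
```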

VI. Header fields in the HTTP packet header (key analysis)

1. Overview of header fields

Let’s review the position of the header field in the packet. An HTTP packet contains the header and the packet body, and the header contains the request line (or status line) and the header field. Among many fields in packets, the HTTP header field contains the most abundant information. The header field exists in both the request and response packets and contains information related to HTTP packets. The header field is used to provide the client and server with information such as packet size, language, and authentication information.

2. Header field structure

  • HTTP header fields consist of header field names and field values separated by colons (:).
  • In addition, field values can have multiple values for a single HTTP header field.
  • When two or more HTTP header fields with the same header field name appear in the header, this situation is not clear in the specification. The priority of processing may be different according to the internal processing logic of the browser, and the results may be inconsistent.
Header field name | Colon | Field value
--- | --- | ---
Content-Type | : | text/html
Keep-Alive | : | timeout=30, max=120
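The name/colon/value structure can be sketched as follows. The helper parse_header_fields is hypothetical. It keeps every value seen for a repeated field name in a list rather than choosing one, since the handling of duplicates is exactly the part that differs between implementations.

```python
def parse_header_fields(lines):
    """Map lower-cased field names to the list of values seen for each name."""
    fields = {}
    for line in lines:
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"not a header field: {line!r}")
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


fields = parse_header_fields([
    "Content-Type: text/html",
    "Keep-Alive: timeout=30, max=120",
])
```

Field names are lower-cased here because header field names are case-insensitive.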

3. Header field type

The header field is divided into the following four types based on actual usage:

Type | Description
--- | ---
General header fields | Headers used by both request and response packets.
Request header fields | Headers used when the client sends a request packet to the server. They add information about the request, the client, and the preferred form of the response.
Response header fields | Headers used when the server returns a response packet to the client. They add supplementary information about the response and may ask the client to attach additional information.
Entity header fields | Headers used for the entity portion of request and response packets. They add entity-related information such as when the resource content was last updated.

4. Generic header fields (HTTP/1.1)

Header field name | Description
--- | ---
Cache-Control | Controls caching behavior
Connection | Hop-by-hop header control, connection management
Date | Date and time when the packet was created
Pragma | Packet directives
Trailer | List of the header fields at the end of the packet
Transfer-Encoding | Transfer encoding of the packet body
Upgrade | Upgrade to another protocol
Via | Proxy server information
Warning | Error notification

4.1 Cache-Control

You can control how caching works by specifying directives in the Cache-Control header field.

4.1.1 List of available directives

The available directives, grouped by request and response, are as follows.

Cache request directives

Directive | Parameter | Description
--- | --- | ---
no-cache | none | Force revalidation with the origin server
no-store | none | Do not cache any part of the request or response
max-age=[seconds] | required | Maximum Age value of the response
max-stale[=seconds] | optional | Accept an expired response
min-fresh=[seconds] | required | Expect a response that stays fresh for the specified time
no-transform | none | Proxies must not change the media type
only-if-cached | none | Fetch the resource from the cache only
cache-extension | - | New directive token

Cache response directives

Directive | Parameter | Description
--- | --- | ---
public | none | The cached response may be served to any user
private | optional | The response is intended for a specific user only
no-cache | optional | The cached copy must be revalidated before it is used
no-store | none | Do not cache any part of the request or response
no-transform | none | Proxies must not change the media type
must-revalidate | none | Cacheable, but must be revalidated with the origin server
proxy-revalidate | none | Intermediate cache servers must revalidate the cached response
max-age=[seconds] | required | Maximum Age value of the response
s-maxage=[seconds] | required | Maximum Age value of the response for shared cache servers
cache-extension | - | New directive token

4.1.2 Directives that indicate whether caching is allowed

public directive
Cache-Control: public

When the public directive is specified, it explicitly indicates that the cached response may be served to other users as well.

private directive
Cache-Control: private

When the private directive is specified, the response is intended for a specific user only, which is the opposite of the public directive. The cache server may cache the resource for that particular user, but it does not return the cached response to requests from other users.

no-cache directive
Cache-Control: no-cache

  • The purpose of the no-cache directive is to prevent an expired resource from being returned from the cache.
  • If the client's request contains the no-cache directive, the client will not accept a cached response without revalidation. An intermediate cache server must forward the client's request to the origin server.
  • If the server's response contains the no-cache directive, a cache server may store the resource but must revalidate it with the origin server before reusing it for a later request.

Cache-Control: no-cache=Location

If the no-cache directive in a server response names a header field, the client must not use the cached copy of that header field from the response. In other words, header fields that are not named may still be cached. This parameter can only be specified in the response directive.

no-store directive
Cache-Control: no-store

When the no-store directive is used, it implies that the request (and its response) or the response contains confidential information. The directive therefore requires that the cache not store any part of the request or response locally. Note: despite its name, no-cache does not mean "do not cache". It means "do not serve an expired resource": the cache confirms validity with the origin server before using the resource. no-store is the directive that really prevents caching.
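To make the directive syntax concrete, here is a minimal Python sketch that turns a Cache-Control field value into a dictionary. The helper name parse_cache_control is hypothetical, and the parser is deliberately naive: it splits on commas, so a quoted argument that itself contains a comma would not be handled correctly.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):  # naive: breaks on commas inside quotes
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(directives):
    """no-store forbids storing; no-cache only forces revalidation."""
    return "no-store" not in directives


request_directives = parse_cache_control("no-cache")
response_directives = parse_cache_control("public, max-age=604800")
```

Note that may_store returns True for no-cache, which mirrors the distinction drawn above between no-cache and no-store.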

4.1.3 Directives that specify cache lifetime and validation

s-maxage directive
Cache-Control: s-maxage=604800 (unit: seconds)

  • The s-maxage directive does the same job as the max-age directive, except that it applies only to shared cache servers (typically proxies) used by multiple users. That is, it has no effect on a cache that repeatedly returns responses to the same single user.
  • In addition, when the s-maxage directive is used, the Expires header field and the max-age directive are ignored.

max-age directive
Cache-Control: max-age=604800 (unit: seconds)

  • When a client sends a request containing the max-age directive, it accepts the cached resource if the resource's age is smaller than the specified value. When max-age is set to 0, the cache server usually needs to forward the request to the origin server.
  • When the server returns a response containing the max-age directive, the cache server does not need to reconfirm the validity of the resource during that period. The max-age value is the maximum time the resource may be kept in the cache without revalidation.
  • When an HTTP/1.1 cache server receives a response that contains both max-age and the Expires header field, it applies the max-age directive and ignores Expires. An HTTP/1.0 cache server does the opposite: it ignores max-age and uses Expires.
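The precedence rules above (s-maxage for shared caches, then max-age, then Expires) can be sketched as a small Python function for an HTTP/1.1 cache. The names freshness_lifetime and is_fresh, and the idea of passing Expires as a precomputed number of seconds, are assumptions made to keep the example self-contained.

```python
def freshness_lifetime(directives, expires_delta=None, shared_cache=False):
    """Seconds a response stays fresh for an HTTP/1.1 cache.

    directives: parsed Cache-Control response directives, e.g. {"max-age": "60"}.
    expires_delta: seconds between Date and Expires, if Expires was present.
    """
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])  # overrides max-age and Expires
    if "max-age" in directives:
        return int(directives["max-age"])   # overrides Expires
    return expires_delta                    # may be None: no explicit lifetime


def is_fresh(age_seconds, lifetime):
    """A stored response is fresh while its age is below its lifetime."""
    return lifetime is not None and age_seconds < lifetime


lifetime = freshness_lifetime({"max-age": "604800"}, expires_delta=60)
```

An HTTP/1.0 cache would skip the first two branches and rely on Expires alone.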

min-fresh directive
Cache-Control: min-fresh=60 (unit: seconds)

The min-fresh directive asks the cache server to return a cached resource only if it will remain fresh for at least the specified time.

max-stale directive
Cache-Control: max-stale=3600 (unit: seconds)

  • max-stale indicates that the client will accept a cached resource even if it has expired.
  • If no value is given, the client accepts the response no matter how long ago it expired. If a value is given, the client accepts the response as long as it expired no longer ago than the specified time.

only-if-cached directive
Cache-Control: only-if-cached

The only-if-cached directive indicates that the client wants the target resource returned only if the cache server already holds it locally. In other words, it asks the cache server not to reload the response and not to revalidate the resource.

must-revalidate directive
Cache-Control: must-revalidate

With the must-revalidate directive, the proxy revalidates with the origin server whether the cached response it is about to return is still valid. In addition, must-revalidate causes the max-stale directive in a request to be ignored.

proxy-revalidate directive
Cache-Control: proxy-revalidate

The proxy-revalidate directive requires all cache servers to revalidate the cached response before returning it in response to a client request.

no-transform directive
Cache-Control: no-transform

The no-transform directive specifies that a cache must not change the media type of the entity body, in either the request or the response. This prevents caches or proxies from compressing images and performing similar operations.

4.1.4 Cache-Control extensions

Cache-Control: private, community="UCI"

The Cache-Control header field can be extended with cache-extension tokens. In the example above, community is an extension directive that Cache-Control itself does not define. If a cache server does not understand a new directive such as community, it simply ignores it.

4.2 Connection

The Connection header field does two things:

Controlling header fields that are no longer forwarded by proxies
Connection: Upgrade

In client requests and server responses, the Connection header field names the header fields that must not be forwarded beyond the proxy (hop-by-hop headers). The proxy deletes those fields before forwarding the packet.

Managing persistent connections
Connection: close

HTTP/1.1 uses persistent connections by default. When the server wants to explicitly close the connection, it sets the Connection header field to close.

Connection: keep-alive

In HTTP versions before HTTP/1.1, connections are non-persistent by default. To keep a connection persistent with those older versions, set the Connection header field to keep-alive.

4.3 Date

Date: Mon, 10 Jul 2017 15:50:06 GMT

Indicates the date and time when the HTTP packet was created. HTTP/1.1 uses the date and time format specified in RFC 1123.

4.4 Pragma

Pragma is a legacy header field from before HTTP/1.1 and is defined only for backward compatibility with HTTP/1.0.

Pragma: no-cache

  • Although it is a general header field, it is used only in requests sent by clients, asking all intermediate servers not to return cached resources.
  • If every intermediate server could be assumed to support HTTP/1.1, Cache-Control: no-cache alone would be the ideal way to control caching. In practice, it is not feasible to know the HTTP version of every intermediate server, so requests usually carry both of the following header fields:

Cache-Control: no-cache
Pragma: no-cache

4.5 Trailer

The Trailer header field declares in advance which header fields are recorded after the packet body. It can be used with HTTP/1.1 chunked transfer encoding.

4.6 Transfer-Encoding

Transfer-Encoding: chunked

  • Specifies the encoding used to transfer the packet body.
  • In HTTP/1.1, transfer encoding applies only to chunked transfer encoding.

4.7 Upgrade

Upgrade: TLS/1.0

The Upgrade header field is used to check whether HTTP or another protocol can communicate using a higher version. Its value can also specify a completely different communication protocol.

4.8 Via

Via: 1.1 a1.sample.com (Squid/2.7)

  • Used to trace the path taken by request and response packets between the client and the server.
  • When a packet passes through a proxy or gateway, that server appends its own information to the Via header field before forwarding the packet.
  • The Via header field is used not only to trace forwarding but also to avoid request loops.

4.9 Warning

This header field usually warns the user about cache-related problems. Its format is as follows:

Warning: [warning code] [warning host:port] "[warning text]" ([date and time])

The date and time at the end can be omitted. HTTP/1.1 defines seven warnings. The warning text associated with each code is only a recommendation. In addition, warning codes are extensible, and new ones may be added in the future.

Warning code | Warning text | Description
--- | --- | ---
110 | Response is stale | The proxy returned an expired resource
111 | Revalidation failed | The proxy failed to revalidate the resource (the server could not be reached, etc.)
112 | Disconnected operation | The proxy is deliberately disconnected from the network
113 | Heuristic expiration | The response is more than 24 hours old (when the cache's freshness lifetime is set to more than 24 hours)
199 | Miscellaneous warning | Arbitrary warning content
214 | Transformation applied | The proxy applied some transformation, such as to the content encoding or media type
299 | Miscellaneous persistent warning | Arbitrary warning content

5. Request header Field (HTTP/1.1)

Header field name | Description
--- | ---
Accept | Media types the user agent can handle
Accept-Charset | Preferred character sets
Accept-Encoding | Preferred content encodings
Accept-Language | Preferred (natural) languages
Authorization | Web authentication information
Expect | Expects specific behavior from the server
From | Email address of the user
Host | Server on which the requested resource resides
If-Match | Compares entity tags (ETag)
If-Modified-Since | Compares the update time of the resource
If-None-Match | Compares entity tags (the opposite of If-Match)
If-Range | Sends a byte range request for the entity if the resource has not been updated
If-Unmodified-Since | Compares the update time of the resource (the opposite of If-Modified-Since)
Max-Forwards | Maximum number of hops
Proxy-Authorization | Client authentication information required by the proxy server
Range | Byte range request for the entity
Referer | URI of the resource from which the request originated
TE | Preferred transfer encodings
User-Agent | Information about the HTTP client program

5.1 Accept

Accept: text/html, application/xhtml+xml, application/xml;q=0.5

  • The Accept header field tells the server which media types the user agent can handle and their relative priority. Multiple media types can be specified at once in type/subtype form.
  • To assign priorities to media types, append q=[number] as a weight, separated from the media type by a semicolon (;). The weight ranges from 0 to 1 (up to three decimal places), with 1 the highest. If no weight is specified, the default is q=1.0.

5.2 Accept-Charset

Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

The Accept-Charset header field tells the server which character sets the user agent supports and their relative priority. Multiple character sets can be specified at once, and q=[number] is again used to indicate relative priority.

5.3 Accept-Encoding

The Accept-Encoding header field tells the server which content encodings the user agent supports and their priority. More than one content encoding can be specified at a time, and q=[number] is again used to indicate relative priority. The asterisk (*) can be used as a wildcard to mean any encoding.

5.4 Accept-Language

Accept-Language: zh-cn,zh;q=0.7,en-us,en;q=0.3

Tells the server which natural languages (Chinese, English, and so on) the user agent can handle and their relative priority. Multiple languages can be specified at once, and q=[number] is again used to indicate relative priority.
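The q weights used by Accept, Accept-Charset, Accept-Encoding, and Accept-Language all follow the same pattern, which can be sketched in Python as below. The helper name parse_quality_list is hypothetical, and the sketch ignores any parameters other than q.

```python
def parse_quality_list(value):
    """Return (item, q) pairs sorted from most to least preferred."""
    items = []
    for part in value.split(","):
        token, *params = [p.strip() for p in part.split(";")]
        if not token:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                q = float(number)
        items.append((token, q))
    # sorted() is stable, so items with equal weight keep their listed order.
    return sorted(items, key=lambda item: item[1], reverse=True)


media_types = parse_quality_list(
    "text/html, application/xhtml+xml, application/xml;q=0.5"
)
languages = parse_quality_list("zh-cn,zh;q=0.7,en-us,en;q=0.3")
```

A server would then walk the sorted list and pick the first item it is able to produce.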

5.5 Authorization

Authorization: Basic ldfKDHKfkDdasSAEdasd==

Tells the server the user agent's authentication information (credentials). Typically, a user agent that wants to authenticate with the server adds the Authorization header field to its request after receiving a 401 response. A shared cache handles a request slightly differently when it contains the Authorization header field.
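For the Basic scheme shown in the example, the credential value is the Base64 encoding of "username:password". The Python sketch below illustrates this. The helper names and the guest/guest credentials are made up for the example, and it should be kept in mind that Base64 is an encoding, not encryption.

```python
import base64


def basic_authorization(username, password):
    """Build an Authorization field value for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic(field_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = field_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"unsupported scheme: {scheme!r}")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")  # password may contain ':'
    return username, password


value = basic_authorization("guest", "guest")
```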

5.6 Expect

Expect: 100-continue

Tells the server that the client expects a particular behavior.

5.7 From

From: [email protected]

Tells the server the email address of the person using the user agent.

5.8 Host

Host: www.jianshu.com

  • Tells the server the Internet host name and port number of the requested resource.
  • The Host header field is the only header field that HTTP/1.1 requires in every request.
  • If the server has no host name, send the field with an empty value: Host:

5.9 If-Match

Request header fields of the form If-xxx are called conditional requests. When the server receives a conditional request, it executes the request only if it determines that the specified condition is true.

If-Match: "123456"

  • The If-Match header field is one of the conditional requests. It tells the server the entity tag (ETag) value that the resource must match. Weak ETag values cannot be used here.
  • The server compares the If-Match field value with the ETag value of the resource and executes the request only when the two are the same. Otherwise, it returns a 412 Precondition Failed response.
  • The If-Match field value can also be an asterisk (*). In that case, the server ignores the ETag value and processes the request as long as the resource exists.

5.10 If-Modified-Since

If-Modified-Since: Mon, 10 Jul 2017 15:50:06 GMT

  • The If-Modified-Since header field is one of the conditional requests. It is used to check the validity of a local copy of the resource held by a proxy or client.
  • It tells the server to process the request if the resource has been updated after the date and time in the If-Modified-Since field value. If the requested resource has not been updated since that date and time, the server returns a 304 Not Modified response.

5.11 If-None-Match

If-None-Match: "123456"

The If-None-Match header field is one of the conditional requests and is the opposite of If-Match. It tells the server to process the request only when the entity tag (ETag) value given in If-None-Match does not match the ETag of the requested resource.
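The two ETag conditions can be sketched as a pair of Python predicates. The function names are hypothetical, ETags are compared as plain strings including their quotes, and weak validators (W/"...") are not handled in this sketch.

```python
def if_match_ok(if_match, current_etag):
    """True if the request should proceed under If-Match.

    current_etag is the resource's ETag, or None if the resource does not exist.
    """
    if if_match.strip() == "*":
        return current_etag is not None  # any existing resource matches
    candidates = [tag.strip() for tag in if_match.split(",")]
    return current_etag in candidates


def if_none_match_ok(if_none_match, current_etag):
    """True if the request should proceed under If-None-Match."""
    if if_none_match.strip() == "*":
        return current_etag is None
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    return current_etag not in candidates


# The resource currently has ETag "123456" on the server (illustrative value).
proceed = if_match_ok('"123456"', '"123456"')
```

When if_match_ok returns False, the server would answer with 412 Precondition Failed as described above.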

5.12 If-Range

If-Range: "123456"

  • The If-Range header field is one of the conditional requests. It tells the server to treat the request as a range request if the If-Range field value (an ETag value or a time) matches the ETag value or time of the requested resource. Otherwise, the server returns the entire resource.
  • Consider sending the request without If-Range. If the resource on the server has been updated, the part of the resource the client already holds is no longer valid, and so neither is the range request. The server then returns a 412 Precondition Failed response to prompt the client to send the request again, which means two round trips instead of the single one needed with If-Range.

5.13 If-Unmodified-Since

If-Unmodified-Since: Mon, 10 Jul 2017 15:50:06 GMT

The If-Unmodified-Since header field is the opposite of If-Modified-Since. It tells the server to process the request only if the requested resource has not been updated after the date and time specified in the field value. If the resource has been updated after that date and time, the server returns a 412 Precondition Failed response.

5.14 Max-Forwards

Max-Forwards: 10

When a request is sent with the TRACE or OPTIONS method and includes the Max-Forwards header field, the field specifies, as a decimal integer, the maximum number of servers the request may pass through. Before forwarding the request to the next server, each server decrements the Max-Forwards value by one and writes the new value back. When a server receives a request whose Max-Forwards value is 0, it does not forward the request and returns a response directly.
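The decrement-and-forward rule can be sketched in a few lines of Python. The function names and the tuple return values are assumptions chosen for illustration, not part of any real proxy API.

```python
def forward_or_respond(max_forwards):
    """Decide what a server does with a TRACE/OPTIONS request.

    Returns ("respond", None) when the value is 0, otherwise
    ("forward", new_value) with the value decremented by one.
    """
    hops = int(max_forwards)
    if hops < 0:
        raise ValueError("Max-Forwards must be a non-negative integer")
    if hops == 0:
        return ("respond", None)
    return ("forward", hops - 1)


def servers_traversed(max_forwards):
    """Count the servers that forward the request before one responds."""
    forwarded = 0
    action, value = forward_or_respond(max_forwards)
    while action == "forward":
        forwarded += 1
        action, value = forward_or_respond(value)
    return forwarded


first_hop = forward_or_respond("10")
```

The second helper assumes an unbounded chain of intermediaries and simply shows that the starting value caps how many of them can forward the request.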

5.15 Proxy-Authorization

Proxy-Authorization: Basic dGlwOjkpNLAGfFY5

  • After receiving an authentication challenge from a proxy server, the client sends a request containing the header field Proxy-Authorization to give the proxy the information it needs for authentication.
  • This is similar to HTTP access authentication between a client and a server, except that the authentication takes place between the client and the proxy.
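For the Basic scheme, the field value is the word Basic followed by the Base64 encoding of `user:password`. A minimal sketch, in which the user name `alice` and password `secret` are made-up values:

```python
import base64


def basic_credentials(user: str, password: str) -> str:
    """Build the value of a Basic Proxy-Authorization (or Authorization) header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Hypothetical credentials, for illustration only.
headers = {"Proxy-Authorization": basic_credentials("alice", "secret")}
print(headers)
```

Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can see the request.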
5.16 Range

Range: bytes=5001-10000

  • For a range request that fetches only part of a resource, the header field Range tells the server which range of the resource is wanted.
  • A server that can process a request containing the Range header field returns the requested part with a 206 Partial Content response. When it cannot process the range request, it returns a 200 OK response with the full resource.
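The following sketch shows how a server might answer a single byte range. It follows the behavior described above (206 on success, 200 with the full body otherwise); the function name `serve_range` is hypothetical, multiple ranges and suffix ranges are not handled, and real servers may instead answer an unsatisfiable range with 416.

```python
import re
from typing import Dict, Tuple


def serve_range(body: bytes, range_header: str) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a single "bytes=first-last" range; fall back to 200 otherwise."""
    total = len(body)
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match:
        first = int(match.group(1))
        # An open-ended range ("bytes=7-") runs to the last byte.
        last = min(int(match.group(2)), total - 1) if match.group(2) else total - 1
        if first <= last:
            part = body[first:last + 1]  # byte positions are inclusive
            headers = {
                "Content-Range": f"bytes {first}-{last}/{total}",
                "Content-Length": str(len(part)),
            }
            return 206, headers, part
    return 200, {"Content-Length": str(total)}, body


print(serve_range(b"0123456789", "bytes=2-5"))
```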
5.17 Referer

Referer: http://www.sample.com/index.html

The header field Referer tells the server the URI of the resource from which the request originated.

5.18 TE

TE: gzip, deflate;q=0.5

  • The header field TE tells the server which transfer codings the client can handle in the response, and their relative priority. It is similar to the header field Accept-Encoding, but applies to transfer codings.
  • Besides listing transfer codings, TE can indicate that the client accepts trailer fields in a chunked transfer coding. In that case, the field value is simply trailers: TE: trailers
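To make the priority values concrete, here is a small sketch that splits a TE value into codings and q-values. The function name `parse_te` is hypothetical, and the parsing is deliberately simplified (no quoted strings, no validation of the q range).

```python
from typing import List, Tuple


def parse_te(value: str) -> List[Tuple[str, float]]:
    """Split a TE value into (coding, q) pairs, highest preference first."""
    result = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a coding without an explicit q-value has full priority
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val.strip())
        result.append((parts[0].lower(), q))
    # sorted() is stable, so codings with equal q keep their original order.
    return sorted(result, key=lambda pair: pair[1], reverse=True)


print(parse_te("gzip, deflate;q=0.5"))
```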
5.19 User-Agent

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101

  • The header field User-Agent tells the server the name of the browser or other user agent that created the request, along with related information.
  • When a web crawler makes requests, the email address of the crawler’s author may be added to the field. If the request passes through a proxy, the proxy server’s name may also be added.

6. Response header fields (HTTP/1.1)

Header field name     Description
Accept-Ranges         Whether byte-range requests are accepted
Age                   Estimated time elapsed since the response was created
ETag                  Information for matching the resource
Location              Redirects the client to the specified URI
Proxy-Authenticate    Authentication information from the proxy server to the client
Retry-After           When the client should send the request again
Server                Information about the installed HTTP server
Vary                  Cache management information for proxy servers
WWW-Authenticate      Authentication information from the server to the client
6.1 Accept-Ranges

Accept-Ranges: bytes

  • The header field Accept-Ranges tells the client whether the server can handle range requests, which ask for only part of a resource.
  • The field value is one of two: bytes when range requests can be handled, and none when they cannot.
6.2 Age

Age: 1200

  • The header field Age tells the client how long ago the origin server created the response. The field value is in seconds.
  • If the response comes from a cache server, the Age value is the time elapsed since the cached response was last validated with the origin server. A proxy must add the header field Age when it creates a response.
6.3 ETag

ETag: "usagi-1234"

  • The header field ETag gives the client the entity tag, a string that uniquely identifies a resource. The server assigns an ETag value to each resource.
  • The ETag value must change whenever the resource is updated. There is no standard algorithm for generating ETag values; the server simply assigns them.
  • ETag values are either strong or weak. A strong ETag value changes no matter how small the change to the entity. A weak ETag value only indicates whether the resources are equivalent, and changes only when the resource changes in a meaningful way. A weak value is marked with W/ at the beginning of the field value: ETag: W/"usagi-1234".
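The difference between strong and weak values shows up when two ETags are compared. The two helpers below are a sketch of the usual strong and weak comparison rules; the function names are hypothetical.

```python
def _opaque(tag: str) -> str:
    """Drop the W/ weakness marker, leaving the quoted opaque tag."""
    return tag[2:] if tag.startswith("W/") else tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: ignore any W/ prefix, then compare the opaque tags."""
    return _opaque(a) == _opaque(b)


print(strong_match('"usagi-1234"', 'W/"usagi-1234"'))
print(weak_match('"usagi-1234"', 'W/"usagi-1234"'))
```

Strong comparison suits cases where every byte matters, such as range requests; weak comparison is enough for deciding whether a cached page is still usable.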
6.4 Location

Location: http://www.sample.com/sample.html

  • The header field Location directs the recipient of the response to a resource at a location different from the Request-URI.
  • In most cases, this field accompanies a 3xx Redirection response and supplies the URI to redirect to.
  • Almost all browsers, on receiving a response that contains the header field Location, automatically try to access the redirect target.
6.5 Proxy-Authenticate

Proxy-Authenticate: Basic realm="Usagidesign Auth"

  • The header field Proxy-Authenticate sends the client the authentication information required by the proxy server.
  • It is similar to HTTP access authentication between a client and a server, except that the authentication takes place between the client and the proxy.
6.6 Retry-After

Retry-After: 180

  • The header field Retry-After tells the client how long to wait before sending the request again. It is mainly used with a 503 Service Unavailable response or a 3xx Redirection response.
  • The field value can be a specific date and time (Mon, 10 Jul 2017 15:50:06 GMT, etc.) or a number of seconds after the response was created.
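A client has to accept both forms of the field value. The sketch below does so; the function name `retry_time` is hypothetical, and `now` stands in for the time the response was received, which only approximates the time it was created.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def retry_time(value: str, now: datetime) -> datetime:
    """Return the moment at which the request may be sent again."""
    value = value.strip()
    if value.isdigit():
        return now + timedelta(seconds=int(value))  # delay in seconds
    return parsedate_to_datetime(value)  # absolute HTTP date


received_at = datetime(2017, 7, 10, 15, 47, 6, tzinfo=timezone.utc)
print(retry_time("180", received_at))
print(retry_time("Mon, 10 Jul 2017 15:50:06 GMT", received_at))
```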
6.7 Server

Server: Apache/2.2.6 (Unix) PHP/5.2.5

The header field Server tells the client about the HTTP server application installed on the current server. Besides the name of the server software, it may also include the version number and the options enabled at installation time.

6.8 Vary

Vary: Accept-Language

  • The header field Vary controls caching. The origin server uses it to tell proxy servers how to use their local caches.
  • When a proxy server caches a response from the origin server that contains Vary, it may answer a later request from the cache only if that request has the same values for the header fields listed in Vary. If those header fields differ, the proxy must fetch the resource from the origin server, even when the request is for the same resource.
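One way to picture this is as a cache key that includes the request headers named by Vary. The sketch below is an assumption about how a cache could be organized, not how any particular proxy does it; `cache_key` is a hypothetical helper.

```python
from typing import Dict, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a cache key from the URL plus every request header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)


url = "http://www.sample.com/sample.html"
key_en = cache_key(url, {"Accept-Language": "en-us"}, "Accept-Language")
key_zh = cache_key(url, {"Accept-Language": "zh-cn"}, "Accept-Language")
print(key_en == key_zh)  # different languages map to different cache entries
```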
6.9 WWW-Authenticate

WWW-Authenticate: Basic realm="Usagidesign Auth"

The header field WWW-Authenticate is used for HTTP access authentication. It tells the client which authentication scheme (Basic or Digest) applies to the resource specified by the Request-URI, together with a challenge that carries parameters.

7. Entity header fields (HTTP/1.1)

Header field name    Description
Allow                HTTP methods supported by the resource
Content-Encoding     Encoding applied to the entity body
Content-Language     Natural language of the entity body
Content-Length       Size of the entity body in bytes
Content-Location     Alternative URI for the corresponding resource
Content-MD5          Message digest of the entity body
Content-Range        Position range of the entity body
Content-Type         Media type of the entity body
Expires              Date and time when the entity body expires
Last-Modified        Date and time the resource was last modified
7.1 Allow

Allow: GET, HEAD

  • The header field Allow tells the client all the HTTP methods supported by the resource specified by the Request-URI.
  • When the server receives a request with an unsupported HTTP method, it responds with the status code 405 Method Not Allowed. It also lists all supported HTTP methods in the header field Allow of that response.
7.2 Content-Encoding

Content-Encoding: gzip

  • The header field Content-Encoding tells the client which content encoding the server applied to the entity body. Content encoding means compressing the entity without losing any of its information.
  • Four content encodings are mainly used: gzip, compress, deflate, and identity.
7.3 Content-Language

Content-Language: zh-CN

The header field Content-Language tells the client the natural language (Chinese, English, and so on) used by the entity body.

7.4 Content-Length

Content-Length: 15000

The header field Content-Length specifies the size of the entity body in bytes. When a transfer coding such as chunked is applied to the message body, the Content-Length header field must not be used.

7.5 Content-Location

Content-Location: http://www.sample.com/index.html

The header field Content-Location gives the URI that corresponds to the message body. Unlike the header field Location, Content-Location indicates the URI of the resource returned in the message body.

7.6 Content-MD5

Content-MD5: OGFkZDUwNGVhNGY3N2MxMDIwZmQ4NTBmY2IyTY==

The value of the header field Content-MD5 is a string generated with the MD5 algorithm. Its purpose is to check that the message body arrived intact after transmission.
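As a sketch of how such a value can be produced and checked, the snippet below computes the Base64 encoding of the body’s 128-bit MD5 digest, which is the form defined for Content-MD5 in RFC 1864. The function names are hypothetical. MD5 only guards against accidental corruption; it offers no protection against deliberate tampering.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 of the raw 16-byte MD5 digest of the message body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_is_intact(body: bytes, header_value: str) -> bool:
    """Recompute the digest on the receiving side and compare."""
    return content_md5(body) == header_value.strip()


digest = content_md5(b"hello world")
print(digest, body_is_intact(b"hello world", digest))
```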

7.7 Content-Range

Content-Range: bytes 5001-10000/10000

The header field Content-Range is used in the response to a range request. It tells the client which part of the entity is being returned. The field value gives, in bytes, the part currently being sent and the size of the whole entity.

7.8 Content-Type

Content-Type: text/html; charset=UTF-8

The header field Content-Type specifies the media type of the object in the entity body. As with the header field Accept, the field value is written as type/subtype. The charset parameter names a character set such as ISO-8859-1 or EUC-JP.

7.9 Expires

Expires: Mon, 10 Jul 2017 15:50:06 GMT

  • The header field Expires tells the client the date and time at which the resource expires.
  • After receiving a response that contains the header field Expires, a cache server answers requests from its cache, keeping its copy of the response until the time given in the Expires field value. Once that time has passed, the cache server goes back to the origin server for the resource when a request arrives.
  • When the origin server does not want cache servers to cache the resource, it is best to write the same time in the Expires field as in the header field Date.
7.10 Last-Modified

Last-Modified: Mon, 10 Jul 2017 15:50:06 GMT

The header field Last-Modified gives the time at which the resource was last modified. In general, this is the time at which the resource specified by the Request-URI was modified. For dynamically generated data, such as the output of CGI scripts, the value may instead be the time at which the underlying data was last changed.

8. Header fields for Cookies

Header field name    Description                                           Header type
Set-Cookie           Cookie information used to start state management    Response header field
Cookie               Cookie information received by the server            Request header field
8.1 Set-Cookie

Set-Cookie: status=enable; expires=Mon, 10 Jul 2017 15:50:06 GMT; path=/;

The table below lists the attributes that can appear in the Set-Cookie field value.

Attribute        Description
NAME=VALUE       The name and value assigned to the cookie (required)
expires=DATE     Cookie validity period (if not specified, until the browser is closed)
path=PATH        Directory on the server that the cookie applies to (if not specified, the directory of the document)
domain=DOMAIN    Domain name that the cookie applies to (if not specified, the domain name of the server that created the cookie)
Secure           Send the cookie only over secure HTTPS connections
HttpOnly         Prevent JavaScript from accessing the cookie
8.1.1 expires attribute
  • The expires attribute of a cookie specifies the date until which the browser may send the cookie.
  • When the expires attribute is omitted, the cookie is valid only for the browser session, which usually means until the browser application is closed.
  • Once a cookie has been sent from the server to the client, the server has no way to delete it explicitly. However, the server can effectively delete a client cookie by overwriting it with one that has already expired.
8.1.2 path attribute

The path attribute of a cookie restricts the directory on the server for which the cookie is sent.

8.1.3 domain attribute
  • The domain name specified by the domain attribute of a cookie is matched by suffix. For example, if example.com is specified, the cookie is sent to www.example.com and www2.example.com as well as to example.com.
  • Therefore, unless the cookie really needs to be sent to several specific domains, it is safer not to specify the domain attribute.
8.1.4 secure attribute

The secure attribute of a cookie restricts the cookie so that it is sent only when the Web page is accessed over a secure HTTPS connection.

8.1.5 HttpOnly attribute
  • The HttpOnly attribute is an extension to cookies that makes a cookie unavailable to JavaScript. Its main purpose is to prevent cookie theft through cross-site scripting (XSS).
  • Ordinarily, a cookie can be read from within the Web page. Once the HttpOnly attribute is attached, JavaScript’s document.cookie can no longer read the content of that cookie, so an XSS attack cannot use JavaScript to steal it.
8.2 Cookie

Cookie: status=enable

The header field Cookie tells the server that the client wants HTTP state management: the client includes in its request the cookie it received from the server. When several cookies have been received, they can all be sent together.
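Both directions can be tried out with Python’s standard `http.cookies` module. This is a sketch only: the first half builds a Set-Cookie line using the attributes from section 8.1, and the second half parses a Cookie request header. The `lang=en` cookie is a made-up second cookie added for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header line.
jar = SimpleCookie()
jar["status"] = "enable"
jar["status"]["expires"] = "Mon, 10 Jul 2017 15:50:06 GMT"
jar["status"]["path"] = "/"
jar["status"]["secure"] = True    # send only over HTTPS
jar["status"]["httponly"] = True  # hide from document.cookie
set_cookie_line = jar.output()    # a single "Set-Cookie: ..." line
print(set_cookie_line)

# Server side: parse the Cookie header that the client sends back.
received = SimpleCookie()
received.load("status=enable; lang=en")
values = {name: morsel.value for name, morsel in received.items()}
print(values)
```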

9. Other header fields

HTTP header fields can be extended freely. As a result, Web servers and browsers use a variety of non-standard header fields. The most commonly used ones are described below.

9.1 X-Frame-Options

X-Frame-Options: DENY

The header field X-Frame-Options is an HTTP response header that controls whether the site’s content may be displayed inside a frame on another Web site. Its main purpose is to prevent clickjacking attacks. The following two values can be specified:

  • DENY: the page may not be displayed in a frame at all.
  • SAMEORIGIN: the page may be displayed in a frame only by pages from the same origin. (For example, if the page sample.com/sample.html specifies SAMEORIGIN, frames on any page of sample.com may load it, but pages on example.com and other domains may not.)
9.2 X-XSS-Protection

X-XSS-Protection: 1

The header field X-XSS-Protection is an HTTP response header. It is a countermeasure against cross-site scripting (XSS) attacks and switches the browser’s XSS protection mechanism on or off. The following values can be specified:

  • 0: disables XSS filtering
  • 1: enables XSS filtering
9.3 DNT

DNT: 1

The header field DNT is an HTTP request header. DNT is short for Do Not Track: it expresses a refusal to have personal information collected, and is a way of opting out of tracking for targeted advertising. The following values can be specified:

  • 0: Consent to be tracked
  • 1: Refuse to be tracked

For the header field DNT to have any effect, the Web server must support DNT.

9.4 P3P

P3P: CP="CAO DSP LAW CURa ADMa DEVa TAIa PSAa PSDa IVAa IVDa OUR BUS IND

The header field P3P is an HTTP response header. With P3P (the Platform for Privacy Preferences), a Web site’s privacy policy can be expressed in a form that programs can read, with the aim of protecting user privacy. Setting up P3P involves the following steps:

  • Step 1: Create the P3P privacy policy.
  • Step 2: Create the P3P policy reference file and save it as /w3c/p3p.xml.
  • Step 3: Create a compact policy from the P3P privacy policy and output it in the HTTP response.

VII. HTTP response status codes (key analysis)

1. Status code Overview

  • The HTTP status code describes the result of the client’s HTTP request: it shows whether the server processed the request normally or reports the error that occurred.
  • An HTTP status code, such as 200 OK, consists of a three-digit number and a reason phrase. The first digit specifies the class of the response; the last two digits have no classification role.
  • Many responses carry a status code that does not reflect what actually happened, and the user may not notice. For example, a Web application may still return 200 OK even though an error occurred inside it.

2. Status code category

Code    Category                                     Description
1xx     Informational (informational status code)    The received request is being processed
2xx     Success (success status code)                The request was processed normally
3xx     Redirection (redirection status code)        Additional action is required to complete the request
4xx     Client Error (client error status code)      The server cannot process the request
5xx     Server Error (server error status code)      The server failed to process the request

As long as the class definitions are respected, a server may change the status codes defined in RFC 2616 or create status codes of its own.
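Because the class depends only on the first digit, a client can classify any code, including one it has never seen. A minimal sketch, where the function name `status_class` is hypothetical and `HTTPStatus` is the enumeration from Python’s standard library:

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a three-digit status code to its class using the first digit."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return classes[code // 100]


print(status_class(204), "/", HTTPStatus(204).phrase)
print(status_class(599))  # an unregistered code still has a class
```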

3. Parse common status codes

There are dozens of HTTP status codes. Here are 14 of the most commonly used.

3.1 200 OK

Indicates that the request from the client was processed normally by the server.

3.2 204 No Content
  • Indicates that the server received and successfully processed the request, but the response message contains no entity body. The response is not allowed to include any entity body.
  • Typically used when information only needs to be sent from the client to the server and the server has no new content to send back.
3.3 206 Partial Content

Indicates that the client made a range request and the server successfully carried out that partial GET request. The response message contains the part of the entity specified by the Content-Range header field.

3.4 301 Moved Permanently

Permanent redirect. Indicates that the requested resource has been assigned a new URI, and that the new URI should be used from now on. In other words, if the old URI is saved as a bookmark, the bookmark should be saved again using the URI given in the Location header field.

3.5 302 Found
  • Temporary redirect. Indicates that the requested resource has been assigned a new URI and that the user is expected to access it with the new URI for now.
  • It is similar to 301 Moved Permanently, but 302 Found indicates that the resource has moved only temporarily, not permanently. In other words, the URI of the moved resource may change again in the future.
3.6 303 See Other
  • Indicates that the requested resource is available at another URI and that the GET method should be used to retrieve it.
  • 303 See Other works the same way as 302 Found, except that 303 See Other explicitly states that the client should use the GET method to obtain the resource.
3.7 304 Not Modified
  • Indicates that the client sent a conditional request and the server would allow access to the resource, but the condition was not satisfied.
  • A 304 Not Modified response does not contain a response body.
  • Although 304 Not Modified is classified as 3xx, it has nothing to do with redirection.
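The most common source of 304 responses is a conditional GET that carries If-None-Match. A rough sketch of the server-side check, assuming a hypothetical helper named `conditional_get_status` and ETags that contain no commas:

```python
from typing import Optional


def conditional_get_status(if_none_match: Optional[str], current_etag: str) -> int:
    """Return 304 when the client's cached ETag still matches, else 200."""
    if if_none_match is None:
        return 200  # unconditional request

    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag  # weak comparison

    # Naive split: assumes the ETag values themselves contain no commas.
    candidates = [opaque(t) for t in if_none_match.split(",")]
    if "*" in candidates or opaque(current_etag) in candidates:
        return 304  # no body is sent; the client reuses its cached copy
    return 200


print(conditional_get_status('"123456"', '"123456"'))
print(conditional_get_status('"123456"', '"654321"'))
```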
3.8 307 Temporary Redirect

Temporary redirect. This status code has the same meaning as 302 Found, except that the client must not change the request method when it follows the redirect.

3.9 400 Bad Request
  • Indicates that the request message contains a syntax error. When this happens, the content of the request must be corrected before the request is sent again.
  • The browser treats this status code the same way as 200 OK.
3.10 401 Unauthorized
  • Indicates that the request requires authentication information for HTTP authentication (Basic or Digest authentication).
  • If the client has already sent the request once, it indicates that user authentication failed.
  • A response with 401 Unauthorized must include a WWW-Authenticate header field containing a challenge applicable to the requested resource.
3.11 403 Forbidden

Indicates that the server refused access to the requested resource. The server does not have to give a detailed reason for the refusal, but it can describe the reason in the entity body of the response message.

3.12 404 Not Found

Indicates that the requested resource cannot be found on the server. In addition, it can be used when the server rejects the request without giving a reason.

3.13 500 Internal Server Error

Indicates that an error occurred on the server side while executing the request. It could be a Web application bug or some temporary glitch.

3.14 503 Service Unavailable

Indicates that the server is temporarily overloaded or down for maintenance and cannot process requests at the moment. If the server knows in advance how long it will take to recover, it is best to put that information in the Retry-After header field of the response.

HTTP message entities

1. Overview of HTTP message entities

HTTP message structure

Take a closer look at the components in the example above. Next, let’s look at the concepts of messages and entities. If you think of HTTP messages as the crates of an Internet shipping system, then HTTP entities are the actual goods inside them.

  • Message: the unit of data exchanged and transmitted over the network, that is, the block of data an endpoint sends at one time. A message contains the complete data to be sent, and its length is variable, with no fixed limit.
  • Entity: the payload data (supplementary items) transferred as a request or response. It consists of the entity header fields and the entity body. (The entity header fields are described in the entity header fields section above.)

In the figure, the content in the dark red box on the right is the entity part of the message, and the two parts in the blue boxes are the entity header and the entity body. The pink box on the left is the message body. Generally, the message body is the same as the entity body. Only when a transfer coding is applied does the content of the entity body change, making it differ from the message body.

2. Content encoding

  • HTTP applications sometimes need to encode content before sending it. For example, the server might compress a large HTML document before sending it to a client over a slow connection, which helps reduce the time it takes to transfer the entity. The server can also scramble or encrypt the content to prevent unauthorized third parties from seeing the contents of the document.
  • This type of encoding is applied to the content by the sender. When the content is content-encoded, the encoded data is placed in the entity body and sent to the recipient as usual.

Content encoding type:

Encoding    Description
gzip        The entity is encoded with GNU zip
compress    The entity is compressed with the Unix file compression program
deflate     The entity is compressed in zlib format
identity    The entity is not encoded. This is the default when there is no Content-Encoding header field
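The gzip case can be sketched with Python’s standard `gzip` module. The function name `encode_body` is hypothetical, and the negotiation is simplified: it only checks whether gzip is listed in Accept-Encoding and ignores q-values.

```python
import gzip
from typing import Dict, Tuple


def encode_body(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Compress with gzip if the client lists it; otherwise send as identity."""
    offered = [item.split(";")[0].strip().lower()
               for item in accept_encoding.split(",")]
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}  # no Content-Encoding header means identity


page = b"<p>hello</p>" * 100
encoded, headers = encode_body(page, "gzip, deflate")
print(len(page), len(encoded), headers)

# The recipient reverses the encoding before using the entity.
assert gzip.decompress(encoded) == page
```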

3. Transfer encoding

A content encoding is a reversible transformation of the message body that is closely tied to the specific format of the content. Transfer encodings are also reversible transformations of the entity body, but they are applied for architectural reasons and are independent of the format of the content. A transfer encoding changes the way the data in a message is carried over the network.

Content encoding versus transport encoding

4. Chunked encoding

Chunked encoding divides a message into chunks of known size. The chunks are sent one after another, so the sender does not need to know the size of the whole message before it starts sending. Chunked encoding is a transfer encoding, so it is a property of the message rather than of the body.

If the connection between client and server is not persistent, the client does not need to know the length of the body it is reading; it simply reads until the server closes the connection. With a persistent connection, the server must know the size of the body and send it in the Content-Length header before writing the body. If the server creates content dynamically, it may not know the length of the body before sending it. Chunked encoding solves this problem by letting the server send the body in chunks and state only the size of each chunk. Because the body is created dynamically, the server can buffer a portion of it, send the size of that portion and the chunk itself, and repeat the process until the whole body has been sent. The server signals the end of the body with a chunk of size 0, so the connection stays open, ready for the next response. Here is an example of a chunked-encoded message:

A chunked-encoded message
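The wire format is simple enough to sketch by hand: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, with a zero-size chunk at the end. The two helpers below are hypothetical names for illustration; trailer fields and chunk extensions are not handled.

```python
from typing import Iterable


def encode_chunked(pieces: Iterable[bytes]) -> bytes:
    """Encode body pieces as HTTP/1.1 chunks, ending with a size-0 chunk."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty piece would look like the terminating chunk
            out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked-encoded byte string."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)
print(decode_chunked(wire))
```

The sender never needs the total length: it only has to know the size of the piece it is about to write.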

5. Multipart media types

A MIME multipart e-mail message contains several messages that are sent together as a single complex message. Each part is independent and carries its own description of its content, and the parts are separated by a boundary string. HTTP adopts the same multipart object collection, so a message body can contain entities of several types. The multipart object collection includes the following types:

  • multipart/form-data: used when uploading files from a Web form.
  • multipart/byteranges: used when a 206 Partial Content response message contains several ranges of content.

6. Range requests

Suppose you are downloading a very large file and are three quarters of the way through when the network connection drops, forcing you to start all over again. To solve this problem, you need a mechanism that can resume a download from the point where it was interrupted. This is what range requests are for. With a range request, an HTTP client can resume downloading an entity by requesting only the range (or part) of the entity that it failed to obtain. This assumes, of course, that the object has not changed between the client’s previous request and the range request. For example:

HTML HTTP/1.1
Host: www.sample.com
Range: bytes=20224-
···

Example of an entity range request

In the example above, the client is requesting the part of the document that comes after the first 20224 bytes.
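A client resuming a download can derive the Range value from the number of bytes it has already saved. In this sketch the function name `resume_headers` is hypothetical, and the optional ETag is assumed to come from the earlier, interrupted response.

```python
from typing import Dict, Optional


def resume_headers(bytes_saved: int, etag: Optional[str] = None) -> Dict[str, str]:
    """Request headers asking for everything after the bytes already on disk."""
    headers = {"Range": f"bytes={bytes_saved}-"}
    if etag is not None:
        # If the entity has changed, the server sends the whole thing again
        # instead of a part that no longer fits the saved bytes.
        headers["If-Range"] = etag
    return headers


print(resume_headers(20224))
print(resume_headers(20224, '"123456"'))
```

Byte positions start at 0, so after saving 20224 bytes the next byte wanted is at position 20224.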

Web servers that collaborate with HTTP

In addition to clients and servers, there are applications that help with HTTP communication. The important ones are proxies, caches, gateways, tunnels, and agents.

1. Proxies

A proxy

The HTTP proxy server is an important building block for Web security, application integration, and performance optimization. A proxy sits between the client and the server, receives all of the client’s HTTP requests, and forwards them to the server (possibly after modifying them). From the user’s point of view, these applications access the server on the user’s behalf. For security reasons, a proxy is typically used as a trusted intermediary through which all Web traffic is forwarded. A proxy can also filter requests and responses to provide secure or content-filtered Internet access.

2. Caches

First request from the browser

The browser requests again

A Web cache, or caching proxy, is a special HTTP proxy server that keeps copies of frequently used documents that pass through it. The next client that requests the same document can be served from the cache’s own copy. Clients can download documents much faster from a nearby cache than from a remote Web server.

3. Gateways

HTTP/FTP gateway

A gateway is a special type of server that acts as an intermediary for other servers. It is usually used to convert HTTP traffic to other protocols. A gateway receives requests as if it were the origin server for the resource, and the client may not be aware that it is communicating with a gateway.

4. Tunnels

HTTP/SSL tunnel

A tunnel is an HTTP application that, once set up, blindly forwards raw data between two connections. HTTP tunnels are usually used to carry non-HTTP data over one or more HTTP connections without inspecting the data. A common use of HTTP tunneling is to carry encrypted Secure Sockets Layer (SSL) traffic over an HTTP connection so that the SSL traffic can pass through a firewall that only allows Web traffic.

5. Agents

Automated search engine “Web spider”

An agent (user agent) is a client program that makes HTTP requests on behalf of a user. Every application that issues Web requests is an HTTP agent.

conclusion

That is all for today. Learning never stops, so I will keep updating this article. If it has helped you even a little, that is my greatest happiness. If you also enjoy learning iOS, you are welcome to join our group so we can exchange ideas and learn from each other; I also share interview materials there from time to time. Finally, follows and likes are much appreciated.
