Following up on the basics of the HTTP protocol, this article focuses on HTTP headers.
There are so many HTTP headers that few people can say exactly what each one is for. Because the RFC specifications are complex, this article walks through the headers listed in the protocol specification and explains them in plain language to help readers understand HTTP.
While reading the RFC documents, the author found many details that had gone unnoticed before, and most web developers have probably overlooked them too, so this article may hold a few surprises.
Disclaimer: if anything below is incorrect, please go easy on the author.
Accept
Indicates the media types the client expects the server to return. The server may not have the type the client wants, so the client can list several types with priorities, and the server chooses what to return based on those priorities.
# Note: commas separate the types; a semicolon separates a type from its attributes
Accept: audio/*; q=0.2, audio/basic
An audio/basic resource is preferred. If there is none, any other audio resource is acceptable. The value of q ranges from 0 to 1. Its absolute value carries no meaning; it is only used to rank the alternatives. If q is absent, it defaults to 1, the highest priority.
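The priority rule described above can be sketched in a few lines of Python. This is only an illustration: `parse_accept` is a hypothetical helper name, and the parsing is simplified (it ignores quoted parameters and invalid q values).

```python
def parse_accept(header: str) -> list:
    """Return (media_type, q) pairs sorted from most to least preferred."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # default priority when no q attribute is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value.strip())
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_accept("audio/*; q=0.2, audio/basic"))
# prints [('audio/basic', 1.0), ('audio/*', 0.2)]
```

The same approach would apply to Accept-Charset and Accept-Language, which use the same comma and q syntax.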
Accept-Charset
Represents the encoding format of the content that the client expects the server to return. Like the Accept header, it can specify multiple encodings, with the q value representing priority.
# Note: commas separate the types; a semicolon separates a type from its attributes
Accept-Charset: utf-8, gbk; q=0.6
UTF-8 encoding is preferred; if it is not available, GBK encoding is acceptable.
Content-Type
Content-Type is a header the server sends to the client (a client also sends it with any request that has a body). It gives the media type and character encoding of the content, and in a response it is the combined answer to the Accept and Accept-Charset headers.
Content-Type: text/html; charset=utf-8
Indicates that the returned body is HTML text encoded as UTF-8.
Accept-Language
Indicates the languages the client would like the server to respond in. Many large Internet companies operate globally, and their technical documents usually exist in several languages. With this field, documents can be localized: Simplified Chinese for users in mainland China and English for English-speaking users.
Accept-Language: zh-CN, en-US; q=0.8, zh-TW; q=0.6
Mainland Simplified Chinese is preferred, followed by English, followed by Taiwanese Traditional Chinese.
Content-Language
This header is the response-side counterpart of Accept-Language. The server uses it to tell the client which language the returned body is in.
Content-Length
Indicates the length, in bytes, of the body of the request or response being transmitted. A GET request normally does not need this header because it normally has no body. A request or response that carries a body whose length is known in advance should carry this field so that the other side can easily find the message boundary, that is, where the body data ends. If the body is large and has to be generated while it is being transmitted, its total size is not known until generation finishes. In that case the body can be sent with HTTP chunked transfer encoding, and the Content-Length field is not needed.
Content-Location
When the requested resource is available at more than one address on the server, the server can use the Content-Location field to tell the client about an alternative address. This field is rarely seen.
Content-MD5
This header provides a checksum for validating the body content. Its value is the Base64 string of the MD5 digest of the body. The field is rarely seen: the TCP layer already implements checksum validation, so another layer of validation adds little. Separately, the MD5 value of a resource is often used as the resource's unique identifier in the ETag header.
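As a rough illustration of how such a value is derived, the sketch below computes the Base64 string of a body's MD5 digest with Python's standard library. `content_md5` is a hypothetical helper name, and the sample body is made up.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 of the raw 16-byte MD5 digest (not of the hex string)."""
    digest = hashlib.md5(body).digest()
    return base64.b64encode(digest).decode("ascii")


print("Content-MD5:", content_md5(b"hello world"))
```

Note that the Base64 step is applied to the raw digest bytes; encoding the hexadecimal text instead would give a different, longer value.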
Date
If the response is not served from a cache, Date is the time at which the response was generated. If it is served from a cache, Date is the time at which the cached response was originally generated. The value must use a specific format defined in the specification, called HTTP-date; you cannot define your own time format.
Date: Tue, 15 Nov 1994 08:12:31 GMT
Age
Indicates the age of a cached resource: the time, in seconds, that has passed since the resource was cached.
Age: 86400
Expires
Servers use the Expires header to tell the client when a resource expires. If its value equals the value of the Date header, the resource has already expired.
Expires: Thu, 01 Dec 1994 16:00:00 GMT
ETag
A resource tag. The server can attach tag information to each resource, which is used together with If-Match and If-None-Match to check whether a cached copy is still valid. The tag is usually a version identifier for the resource; for example, the MD5 checksum of the resource data can serve as the version.
If-Match
The value of If-Match is typically the ETag value described above, and it is often used to implement optimistic locking over HTTP. With HTTP optimistic locking, the client first fetches the resource with GET and obtains its version identifier from the ETag header. When it later issues a PUT or PATCH request to change the resource, it specifies that version in the If-Match header. If the resource on the server still has the version given in If-Match, the request is executed. If not, the resource has been modified concurrently, and the server returns the 412 Precondition Failed status code. The client can then abort or retry the whole process.
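The optimistic-locking flow described above can be sketched from the server's side as follows. This is a minimal model under stated assumptions, not a real server: `make_etag` and `handle_put` are hypothetical names, and using an MD5 hex digest as the ETag is just one of the options the article mentions.

```python
import hashlib


def make_etag(data: bytes) -> str:
    """Derive a quoted ETag from the resource content (one possible scheme)."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


def handle_put(stored: bytes, if_match: str, new_data: bytes):
    """Return (status, resource_after_request) for a conditional update."""
    current = make_etag(stored)
    candidates = [tag.strip() for tag in if_match.split(",")]
    if "*" in candidates or current in candidates:
        return 200, new_data  # versions agree: apply the change
    return 412, stored        # Precondition Failed: resource changed meanwhile


resource = b"version 1"
etag = make_etag(resource)  # what the client saw in its earlier GET
status, resource = handle_put(resource, etag, b"version 2")
print(status)  # prints 200
status, resource = handle_put(resource, etag, b"version 3")
print(status)  # prints 412, because the stale ETag no longer matches
```

The second update fails because the client is still holding the ETag of the first version; it would need to GET the resource again before retrying.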
If-None-Match
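Similar to If-Match, but with the opposite condition: the request is executed only if the resource's current ETag matches none of the listed values. It is commonly sent with GET to validate a cached copy; if the ETag still matches, the server answers 304 Not Modified instead of resending the body.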
Allow
Lists the HTTP methods that the resource supports. It is the server's advice to the client to access the resource with one of the methods named in Allow.
Allow: GET, HEAD, PUT
Connection
The Connection header can be used when the client and server need to negotiate the properties of the Connection. A common value is close, which notifies the other party to close the connection when the current request ends.
Connection: close
Expect
Used to ask the server for permission before sending the request body. For example, before sending a large file without knowing whether it exceeds the server's limit, the client can include an Expect header in the request.
Expect: 100-continue
If the server refuses, it returns a 417 Expectation Failed error telling the client to give up. If it agrees, it returns a 100 Continue status code telling the client to go ahead, and the client then uploads the body. If the server has already received body content ahead of time, it skips the 100 Continue response.
From
This field usually carries the email address of whoever initiated the request, which amounts to naming a person responsible for it. If the server finds a problem with the request, it can contact the originator through this address. Because an email address is private information, sending a From header requires the user's consent. The RFC recommends that all requests made by robot agents carry this header so that the responsible person can be found if something goes wrong. A malicious robot, of course, is unlikely to heed that advice.
Host
According to the RFC, every HTTP/1.1 request must carry a Host header. Even when there is no host value, the Host header must still be sent with an empty value. If the Host header is missing or invalid, the server should respond with 400 Bad Request. That is what the protocol says, but most gateways and servers are more lenient: if you do not send a Host field, they fill in a default one. A gateway or proxy can forward requests to different upstream service nodes based on the Host value, which is commonly used for virtual hosting.
Last-Modified
Marks the last time a resource was modified. It looks similar to Date, except that Last-Modified is the time the resource was last changed and Date is the time the response was generated.
If-Modified-Since
When a browser requests a static resource for which it already has a local cached copy, it sends the If-Modified-Since header, whose value is the resource's Last-Modified time, asking the server whether the resource has been modified since then. If not, the server returns 304 Not Modified to tell the browser it is safe to use the cached copy. If the resource has been modified, the server returns 200 OK with the contents of the resource, just like a normal GET request.
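The server-side decision described above might look roughly like the sketch below. `conditional_get_status` is a hypothetical name, the timestamps are sample values, and HTTP-date parsing is delegated to Python's `email.utils`, which handles the standard format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(last_modified: datetime, if_modified_since: str) -> int:
    """Return 304 if the resource is unchanged since the given HTTP-date, else 200."""
    since = parsedate_to_datetime(if_modified_since)
    # HTTP-date has one-second resolution, so drop microseconds before comparing
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(1994, 11, 15, 8, 12, 31, tzinfo=timezone.utc)
print(format_datetime(modified, usegmt=True))  # the value sent as Last-Modified
print(conditional_get_status(modified, "Tue, 15 Nov 1994 08:12:31 GMT"))  # prints 304
print(conditional_get_status(modified, "Mon, 14 Nov 1994 08:12:31 GMT"))  # prints 200
```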
If-Unmodified-Since
Similar to If-Modified-Since, but with the opposite meaning. Another difference is that when the condition is not met, the server returns 412 Precondition Failed rather than 304 Not Modified.
Range
A server that supports resumable downloads must handle the Range header, which specifies the byte range requested when a client asks for only part of a resource. It is a request header sent by the client to the server.
Range: bytes=500-999
Content-Range
In answer to the Range header above, the server responds with a Content-Range header that gives the byte range, within the whole resource, of the body data being transferred. The following example shows that the resource has 47022 bytes in total and the current response covers bytes 21010 through 47021.
Content-Range: bytes 21010-47021/47022
The last byte is 47021 rather than 47022 because offsets start at 0, so 47021 is the final byte of the resource.
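The off-by-one arithmetic is easy to check with a small parser. The sketch below is illustrative only: `parse_content_range` is a hypothetical helper, and it handles just the `bytes first-last/total` form shown above (plus `*` for an unknown total).

```python
import re


def parse_content_range(value: str):
    """Return (first, last, total, length); total is None when given as '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    # Both ends are inclusive, so the body holds last - first + 1 bytes
    return first, last, total, last - first + 1


print(parse_content_range("bytes 21010-47021/47022"))
# prints (21010, 47021, 47022, 26012)
```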
If-Range
To make sure the resource on the server has not changed between two consecutive requests of a resumed download, the If-Range header carries the resource's ETag version identifier. The server uses it to decide whether the resource has changed. If it has not, the server returns 206 Partial Content with the requested part of the resource. If it has, the request is treated as a normal GET and the server returns 200 OK with the entire resource.
Location
When the server sends a 302 redirect to the client, it always carries the Location header, whose value is the destination URL.
HTTP/1.1 302 Found
Location: https://www-temp.example.org/
Max-Forwards
Limits the number of gateway or proxy layers a request may pass through, that is, the maximum number of times it can be forwarded. The Max-Forwards value is reduced by 1 each time the request passes through a gateway or proxy. If Max-Forwards is already zero when a proxy such as nginx receives the request, it should not forward the request to the upstream service nodes. The RFC defines this header for the TRACE and OPTIONS methods.
Pragma
This header is relatively common and is often added in front end development mode.
Pragma: no-cache
When a gateway receives a request with this header, it must not answer the client directly from its cache, even if a valid cached copy of the requested resource exists; it must forward the request upstream. If every gateway really followed this rule, however, it would be easy to construct an attack, so the header is generally used only in development mode, to make sure the front end picks up changes to static resources immediately. Other Pragma values are rarely seen.
Referer
Referer is a very common header that gives the source URI of the request, that is, the page from which the current resource was requested. If you go from page A to page B, the request for page B carries a Referer value holding the address of page A. Tracing Referer values reveals the navigation chains between pages, which is well suited to data analysis and path optimization of web pages.
Retry-After
While a server is being upgraded, client requests are answered directly with a 503 (Service Unavailable) error, and the server adds a Retry-After field to the response headers to tell the client when the service should be available again. The Retry-After value can be an HTTP-date or an integer number of seconds after which the service can be accessed normally. Once the browser has this value, it can consider setting a timer to retry at that time.
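A client that wants to honor either form of the value could normalize it to a delay in seconds. The sketch below is one way to do that; `retry_delay_seconds` is a hypothetical name, and the current time is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(value: str, now: datetime) -> float:
    """Convert a Retry-After value (seconds or HTTP-date) to a delay in seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)  # already a number of seconds
    target = parsedate_to_datetime(value)
    # A date in the past means the client may retry immediately
    return max(0.0, (target - now).total_seconds())


now = datetime(1994, 11, 15, 8, 12, 31, tzinfo=timezone.utc)
print(retry_delay_seconds("120", now))                            # prints 120.0
print(retry_delay_seconds("Tue, 15 Nov 1994 08:13:31 GMT", now))  # prints 60.0
```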
Server
Returns information about the server software, telling the client which software provides the current HTTP service. It can be regarded as an advertisement for the software. The RFC cautions about this header: exposing server information may make it easier for attackers to target your service, so use it with care.
User-Agent
Carries information about the current user agent, including the browser, browser engine, and operating system version and model. It is the counterpart of the Server header: Server describes the server, and User-Agent describes the client. The server can use it to compute usage shares of browsers and operating systems, and can also tailor content to the UA information.
Transfer-Encoding
Indicates what transformation has been applied to the body for transfer. When the body is sent in chunks, the following header must be added. Other values are rarely encountered.
Transfer-Encoding: chunked
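To make the chunked format concrete, the sketch below frames a body as a sequence of chunks: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body. `encode_chunked` is a hypothetical helper, and trailers and chunk extensions are left out.

```python
def encode_chunked(chunks) -> bytes:
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the end of the body
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-size chunk, no trailers
    return bytes(out)


print(encode_chunked([b"Hello, ", b"world!"]))
# prints b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'
```

Because each chunk announces its own size, the sender never needs to know the total length in advance, which is why Content-Length can be omitted.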
Upgrade
The server suggests that the client upgrade the transport protocol. For example, when a client sends a request using HTTP/1.0, the server can use the Upgrade header to suggest switching to HTTP/1.1. After receiving it, the client sends subsequent requests in HTTP/1.1 format. Multiple protocols can be listed, separated by commas.
Upgrade: HTTP/1.1
When the client wants to communicate with the server over WebSocket, the Upgrade header is also used during the handshake phase: the client asks for the switch in its handshake request, and the server repeats the header in its 101 Switching Protocols response to confirm the switch to WebSocket.
Upgrade: WebSocket
Vary
This header is used for cache control. For a cache server, adding the Vary header to a response tells the cache to keep separate cache entries for responses that differ in the request headers Vary names. For example, if Vary names Accept-Encoding, pages with different encodings get different cache entries. Vary can have more than one value, and a separate entry is used whenever any of them differs. The following example tells the cache server to use different cache entries for page responses in different languages and encodings.
Vary: Accept-Encoding,Accept-Language
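One way to picture what a cache does with Vary is as part of its lookup key. The sketch below is a simplified model rather than how any particular cache server works; `cache_key` is a hypothetical name, and header values are compared as plain strings without normalization.

```python
def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL plus the request headers named by Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # Only headers listed in Vary take part in the key
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


vary = "Accept-Encoding,Accept-Language"
english = cache_key("/index.html", {"Accept-Encoding": "gzip", "Accept-Language": "en-US"}, vary)
chinese = cache_key("/index.html", {"Accept-Encoding": "gzip", "Accept-Language": "zh-CN"}, vary)
print(english == chinese)  # prints False: the two languages get separate cache entries
```

Headers that Vary does not name, such as User-Agent here, do not affect the key, so requests differing only in those headers share one cache entry.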
Via
This field identifies the gateway routing node through which a request passes. If the request passes through multiple proxy layers, the Via header will have multiple gateway information.
Warning
Used to add extra warning information to the response, consisting of a warning code and a description. Common warning codes are specified in the RFC. For example, code 111 indicates that a cache entry on the cache server had expired and the attempt to revalidate it failed, so the stale content was returned. In this case, the Warning header is needed to report that back to the client.
Warning: 111 - "Revalidation Failed"
WWW-Authenticate
WWW-Authenticate is a header that a 401 Unauthorized response must carry. It presents a challenge to the client, telling it that it must include the answer to the challenge in order to keep accessing the target resource. The challenge scheme can be customized; the most common one is Basic authentication.
WWW-Authenticate: Basic realm=xxx
Basic means the credentials are only Base64-encoded, which is an encoding rather than encryption and is therefore insecure on its own; realm is the name of the authentication scope or context.
Authorization
For resources that require special permissions, the client must supply authentication information, a user name and password, in the request. It is the reply to WWW-Authenticate.
# value = base64(user_name:password)
Authorization: Basic YWRtaW46YWRtaW4xMjM=
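The value in the example above can be reproduced with a few lines of Python. `basic_auth_header` is a hypothetical helper name, and the credentials `admin` / `admin123` are simply what the sample value decodes to.

```python
import base64


def basic_auth_header(user_name: str, password: str) -> str:
    """Build the value of an Authorization header for Basic authentication."""
    credentials = f"{user_name}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("admin", "admin123")
print(header)  # prints Basic YWRtaW46YWRtaW4xMjM=

# Base64 is reversible by anyone, which is why Basic needs HTTPS to be safe
print(base64.b64decode(header.split(" ", 1)[1]).decode("utf-8"))  # prints admin:admin123
```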
Proxy-Authenticate
Same as the WWW-Authenticate header, but for proxy server authentication.
Proxy-Authorization
Same as the Authorization header, but for proxy server authentication.
ETag vs Last-Modified vs Expires
ETag usually carries the version identifier of the resource. The protocol does not specify its format: it can be an MD5 checksum of the resource, a UUID, an incrementing number, or even the modification time of the resource. It is compared by equality or inequality. Because the server has to maintain the version identifier, it can add a storage and computing burden, depending on how the identifier is produced.
Last-Modified carries the modification time of the resource. It is compared by later-than or earlier-than. For a static resource file, it is usually the modification time recorded by the operating system.
Expires is the time at which, according to the server, the resource expires. A resource cached by the client is treated as expired after this time; until then, the client can use it without asking the server whether it is 304 Not Modified.
Cache-Control
This is probably the most complex HTTP header of all. This header can be used for either a request or a response. The values in the request and response are different and represent different meanings.
- no-cache: with no value, it means a cached copy must not be used as-is. In a request, the server must not return cached content directly. In a response, the client must not reuse the cached content without first revalidating it with the server. If no-cache specifies a value, the restriction applies only to the header fields named in the value, and the rest can be cached. In short: I only want data fresh out of the oven.
- no-store: tells the other party not to persist the request or response data anywhere; the information is sensitive and should be kept volatile. In short: keep it in memory, not on disk.
- no-transform: tells the peer not to transform the data. For example, when a client uploads raw image data, the server would normally compress the image for storage; no-transform tells it to keep the original data and do no conversion. In short: don't touch anything I send.
- only-if-cached: used in the request header to tell the server that the client wants only cached content and does not want it reloaded. If there is no cached content, the server returns a 504 Gateway Timeout error. The client doesn't want to bother the server too much.
- max-age: used in the request header to limit the age of cached content. If the age exceeds max-age, the server must reload the resource. Call it age discrimination by the client.
- max-stale: used in the request header. The client allows the server to return cached content that has already expired, but only up to the given amount of time past expiry. The client is tolerant, but within limits.
- min-fresh: used in the request header. The client requires the server not to use content that is about to expire. It is like buying milk at the supermarket: if a carton is about to expire, we pass on it even though it is still within its sell-by date.
- public: used in the response header. Indicates that the response may be cached and shared with others. For example, a proxy server caches static resources for use by all of its users.
- private: used in the response header. Indicates that only the client may cache the response, for its own use. Proxy caches are disallowed, while the client can cache the resource itself. In short: keep it for yourself and don't lend it to anyone else.
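Since a single Cache-Control header can combine several of these directives, code that inspects it usually starts by splitting it into names and optional values. The sketch below shows one simplified way to do so; `parse_cache_control` is a hypothetical helper, and the sample header value is made up.

```python
def parse_cache_control(value: str) -> dict:
    """Map each directive name to its argument, or to None if it has none."""
    directives = {}
    # Simplification: a quoted argument that itself contains commas is not handled
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("no-cache, max-age=3600, private"))
# prints {'no-cache': None, 'max-age': '3600', 'private': None}
```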
Because the HTTP protocol is very detailed, this article does not cover every detail of every HTTP header. The [code hole] public account will continue to publish more detailed articles; readers who want to see them as soon as they appear can follow it.