1. Understand web and network basics

1.1 Accessing the Web Using HTTP

A Web browser uses the URL entered in its address bar to fetch resources such as files from a Web server and display the Web page. The Web uses the HyperText Transfer Protocol (HTTP) as the specification that governs this whole exchange between client and server. In other words, communication on the Web is built on HTTP.


  • What is a protocol?

    For computers and network devices to communicate, both sides must follow the same rules: how to find the communication target, which side starts the exchange, and so on. These rules are called protocols.

  • What is TCP/IP, and how does it relate to HTTP?

    The protocols associated with the Internet are collectively called the TCP/IP protocol family. HTTP is one member of that family.

1.2 Layered TCP/IP Management

  • Application layer: Determines how communication is carried out when application services are provided to users. The TCP/IP protocol family includes a number of common application services:

    • FTP (File Transfer Protocol)
    • DNS (Domain Name System)
    • HTTP
  • Transport layer: Provides the application layer with data transfer between two computers connected over the network. It has two protocols with different characteristics:

    • TCP (Transmission Control Protocol)
    • UDP (User Datagram Protocol)
  • Network layer: Handles the packets that flow over the network. A packet is the smallest unit of data transmitted over the network. This layer determines the path a packet takes to reach the other computer: when the data passes through multiple computers or network devices on the way, the network layer selects one route among the many available.

  • Link layer: Handles the hardware side of the network connection. This includes the operating system's device control, hardware device drivers, the NIC (Network Interface Card, also called a network adapter), and physically visible parts such as optical fiber, connectors, and other transmission media.

1.3 TCP/IP Traffic

When the TCP/IP protocol family is used for network communication, data passes through the layers in order: the sending end works down from the application layer, and the receiving end works up toward the application layer.

First, the sending client issues an HTTP request at the application layer (HTTP). At the transport layer (TCP), the data received from the application layer (the HTTP request message) is split into segments, and each segment is marked with a sequence number and port numbers and passed to the network layer. The network layer (IP) adds the destination IP address and passes the data to the link layer, which adds the destination MAC address. At this point the request is ready to be sent out onto the network.

The server at the receiving end receives the data at the link layer and passes it up layer by layer until it reaches the application layer. Only when the data reaches the application layer has the HTTP request sent by the client actually been received.

  • On the sending side, each layer the data passes through prepends that layer's header information.

  • On the receiving side, each layer the data passes through strips the corresponding header.
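The header handling described above can be illustrated with a toy model. This is only a sketch: the bracketed labels stand in for real TCP, IP, and Ethernet headers, and the function names `send` and `receive` are made up for the illustration.

```python
# Toy model of layered encapsulation: each layer prepends its own header
# on the way down and strips it again on the way up.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link (illustrative labels)


def send(payload: str) -> str:
    """Wrap the application data with one header per lower layer."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]" + frame
    return frame


def receive(frame: str) -> str:
    """Strip the headers in the reverse order from which they were added."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"expected {layer} header")
        frame = frame[len(header):]
    return frame


request = "GET /index.htm HTTP/1.1"
on_the_wire = send(request)
print(on_the_wire)           # [ETH][IP][TCP]GET /index.htm HTTP/1.1
print(receive(on_the_wire))  # GET /index.htm HTTP/1.1
```

The outermost header is the one added last (link layer), which is why the receiver removes it first.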

1.4 Protocols closely related to HTTP

IP protocol

IP is located at the network layer; its job is to deliver packets to the other party. Delivery depends on three things:

1. IP address: the address assigned to a node.
2. MAC address: the fixed address that belongs to the NIC.
3. ARP: the address resolution protocol, which looks up the MAC address that corresponds to a given IP address.

Communication between IP addresses relies on MAC addresses. On a network, the two parties are rarely connected directly; data is usually relayed through several computers and network devices, and at each hop the MAC address of the next relay is used to find the next destination.

TCP protocol

TCP is located at the transport layer. It provides a reliable byte stream service and confirms that data has actually reached the other party. (A byte stream service splits a large block of data into segments so that it is easier to transmit.)

To deliver data reliably, TCP uses a three-way handshake. The handshake uses the TCP flags SYN (synchronize) and ACK (acknowledgement) and proceeds as follows:

1. The sender sends a packet with the SYN flag to the receiver.
2. The receiver replies with a packet carrying the SYN and ACK flags to confirm receipt.
3. The sender sends back a packet with the ACK flag, which ends the handshake.
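The three handshake segments can be written out as data to make the exchange concrete. This is a simplified sketch, not a TCP implementation: the function name and the dictionary layout are invented for illustration, and the only TCP rule it models is that each side acknowledges the other's initial sequence number plus one.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments exchanged when a TCP connection opens.

    client_isn and server_isn are the initial sequence numbers each side picks.
    """
    return [
        # 1. Client asks to synchronize.
        {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None},
        # 2. Server synchronizes too and acknowledges the client's SYN.
        {"from": "server", "flags": "SYN/ACK", "seq": server_isn, "ack": client_isn + 1},
        # 3. Client acknowledges the server's SYN; the connection is open.
        {"from": "client", "flags": "ACK", "seq": client_isn + 1, "ack": server_isn + 1},
    ]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```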

DNS protocol

DNS is located at the application layer and translates between domain names and IP addresses. A computer can be identified by its IP address or by a host name and domain name such as www.example.com. Because computers work with numeric addresses rather than names, DNS is used to look up the IP address for a domain name, or the domain name for an IP address.


The whole process

1.5 URI and URL

  • Uniform Resource Locator (URL): the Web page address you enter when visiting a page in a Web browser, for example http://hackr.jp/
  • Uniform Resource Identifier (URI): a string that identifies an Internet resource. A URL gives the location of a resource, so URLs are a subset of URIs. The three words in the name mean:

    - Uniform: a uniform format makes it possible to handle many different types of resources without having to work out from context how each one is accessed. It also makes it easier to add new protocol schemes (such as http: or ftp:).
    - Resource: anything that can be identified. Not only documents but also images or services (such as today's weather forecast) can be resources, as long as they can be distinguished from other things.
    - Identifier: an object that can be used to identify a resource.

When HTTP is used, the protocol scheme is http. Other schemes include ftp, mailto, telnet, and file. Here are some examples of several types of URIs:

2. HTTP protocol

2.1 Example

In HTTP, the client issues a request and the server returns a response to that request. In other words, communication always starts from the client.

The following is the content of a request message sent by a client to an HTTP server:

    GET /index.htm HTTP/1.1
    Host: hackr.jp

  • GET is the method, which indicates the type of access requested of the server
  • /index.htm is the resource being requested, also called the request URI (Request-URI)
  • HTTP/1.1 is the HTTP version number
  • Host: hackr.jp gives the address of the server being requested
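To see how the pieces of the request fit together, the following sketch splits a raw request into its method, request URI, version, and header fields. It is a minimal illustration that assumes a well-formed message with CRLF line endings; `parse_request` is a hypothetical helper, not part of any library.

```python
def parse_request(raw: str):
    """Split an HTTP request into method, request URI, version, and headers."""
    head, _, _body = raw.partition("\r\n\r\n")   # a blank line ends the header section
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ")    # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")      # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


raw = "GET /index.htm HTTP/1.1\r\nHost: hackr.jp\r\n\r\n"
print(parse_request(raw))
# ('GET', '/index.htm', 'HTTP/1.1', {'host': 'hackr.jp'})
```

Header names are lowercased here because HTTP field names are case-insensitive.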

2.2 Message Types

The information exchanged over HTTP is called an HTTP message. There are two types:

  • Request message: the HTTP message sent by the requesting end (client). It consists of the request method, request URI, protocol version, optional request header fields, and an entity body.

  • Response message: the HTTP message sent by the responding end (server). It consists of the protocol version, status code, a reason phrase that explains the status code, optional response header fields, and an entity body.

2.3 Message Structure

  • Message header: the content and attributes of the request or response that the server or client needs to process

  • Message body: the data to be sent

    What is the entity body?

    The message body of an HTTP message carries the entity body of the request or response. Usually the message body and the entity body are identical. They differ only when a transfer coding is applied in transit, because the message body is then the encoded form of the entity body.


2.4 Message Example

Request line: contains the request method, the request URI, and the HTTP version

Status line: contains the HTTP version, the status code indicating the result of the response, and the reason phrase

Header fields: the headers that carry the conditions and attributes of the request and response. There are four types: general headers, request headers, response headers, and entity headers

Others: header fields not defined in the HTTP RFCs (Cookie, etc.) may also be present

2.5 Request URI

Writing the URI

When a client requests a resource, it can write the request in one of two ways:

  • Write the domain name or IP address in the Host header field:

    GET /index.htm HTTP/1.1
    Host: hackr.jp
  • Write the complete URI as the request URI:

    GET http://hackr.jp/index.htm HTTP/1.1

URI format

To identify a specific resource, you can use an absolute URI or absolute URL, which contains all the necessary information, or a relative URL. A relative URL is resolved against the base URI held by the browser, for example /image/logo.gif. An absolute URI has the following components:

  • Login information (authentication): the user name and password needed to obtain the resource from the server. This item is optional.

  • Server address: an absolute URI must specify the address of the server to be accessed. The address can be a DNS-resolvable name such as hackr.jp, an IPv4 address such as 192.168.1.1, or an IPv6 address enclosed in square brackets such as [0:0:0:0:0:0:0:1].

  • Server port number: the network port on the server to connect to. This item is optional; if omitted, the default port is used.

  • Hierarchical file path: the file path on the server that locates the resource. It resembles the directory structure of a UNIX file system.

  • Query string: passes arbitrary parameters to the resource at the specified file path. This item is optional.

  • Fragment identifier: usually marks a sub-resource (a position within a document) inside the retrieved resource. The RFC does not define exactly how it is used. This item is also optional.
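Python's standard library can split an absolute URI into the components listed above. The URI below is a made-up example chosen so that every component is present; `urlsplit` is the real `urllib.parse` function.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every component described above.
uri = "http://user:pass@www.example.com:8080/dir/index.htm?uid=1#ch1"
parts = urlsplit(uri)

print(parts.scheme)    # http             (protocol scheme)
print(parts.username)  # user             (login information)
print(parts.password)  # pass
print(parts.hostname)  # www.example.com  (server address)
print(parts.port)      # 8080             (server port number)
print(parts.path)      # /dir/index.htm   (hierarchical file path)
print(parts.query)     # uid=1            (query string)
print(parts.fragment)  # ch1              (fragment identifier)

# An IPv6 server address is written in square brackets inside the URI.
print(urlsplit("http://[::1]:80/").hostname)  # ::1
```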

2.6 Request Methods

GET

Requests access to the resource identified by the request URI. The server parses the specified resource and returns the response content.

POST

Transfers an entity body.

PUT

Transfers a file.

HTTP/1.1's PUT method has no authentication mechanism of its own, so anyone could upload a file; for this reason most Web sites do not use it. It may be enabled when combined with the authentication mechanism of a Web application or when the site follows the REST (Representational State Transfer) architectural style. (The same applies to the DELETE method.)

DELETE

Deletes a file.

HEAD

Retrieves the message header. HEAD works like GET but does not return the message body. It is used to check whether a URI is valid and when the resource was last updated.

OPTIONS

Asks which methods are supported.

TRACE

Traces the path. TRACE asks the Web server to loop the request it received back to the client.

When sending the request, the client puts a number in the Max-Forwards header field. The value is decremented by one at each server the request passes through, and the server at which it reaches 0 stops forwarding and returns the response. With TRACE the client can see how its request was modified or tampered with along the way. TRACE is prone to XST (cross-site tracing) attacks.

CONNECT

Asks a proxy to establish a tunnel.

The CONNECT method asks the proxy server to set up a tunnel so that TCP communication can be carried through it. It is mainly used with SSL (Secure Sockets Layer) and TLS (Transport Layer Security) to encrypt the communication before sending it through the network tunnel. The format is as follows:

    CONNECT proxy-server-name:port-number HTTP-version

2.7 Cookies

HTTP is a stateless protocol: it does not itself keep any state about earlier requests and responses. Cookies were introduced to make it possible to keep state.

Cookies manage client state by carrying cookie information in request and response messages. The server sends a Set-Cookie header field in a response, which tells the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request. When the server sees the cookie, it can tell which client the request came from and look up its own records to recover the earlier state.
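The round trip described above can be sketched with Python's `http.cookies` module. The cookie name `sid`, its value, and the two helper functions are assumptions made up for the example; a real browser also applies domain, path, and expiry rules that are left out here.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str) -> str:
    """Server side: build a Set-Cookie header line for a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def make_cookie_header(set_cookie_line: str) -> str:
    """Client side: turn a stored Set-Cookie line into a Cookie request header."""
    _, _, value = set_cookie_line.partition(":")
    jar = SimpleCookie()
    jar.load(value.strip())  # attributes such as Path attach to the cookie itself
    pairs = "; ".join(f"{key}={morsel.value}" for key, morsel in jar.items())
    return f"Cookie: {pairs}"


response_line = make_set_cookie("sid", "1342077140226724")  # hypothetical session id
print(response_line)
print(make_cookie_header(response_line))
```

Only the name and value travel back to the server; attributes such as Path stay with the client.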


3. HTTP transmission

3.1 Persistent Connection

In the original version of HTTP, the TCP connection was closed after every HTTP exchange.

When a document contains a large number of images, each image triggers its own request, and every request sets up and tears down a TCP connection, which adds communication overhead.

To solve this problem, HTTP/1.1 and some HTTP/1.0 implementations introduced persistent connections (also called HTTP keep-alive or HTTP connection reuse). The TCP connection stays open as long as neither end explicitly asks to close it. Persistent connections reduce the overhead of repeatedly establishing and closing TCP connections and lighten the load on the server.

3.2 Pipelining

Previously, after sending a request the client had to wait for the response before sending the next request. With pipelining, the next request can be sent immediately without waiting for the response.

3.3 Transfer Coding

HTTP can transmit data as is, but it can also encode the data in transit to improve the transfer rate. Encoding at transfer time makes it possible to handle a large number of requests efficiently. However, the encoding is done by the computer, so it consumes more resources such as CPU time.

3.4 Content Coding

HTTP provides a content coding feature. Content coding compresses the entity so that the message is smaller and the response arrives faster. The entity is sent in encoded form and decoded by the client that receives it. Common content codings are gzip (GNU zip), compress (the standard UNIX compression), deflate (zlib), and identity (no encoding).

3.5 Chunked Transfer Coding

In HTTP communication, the browser cannot display the requested page until the entire encoded entity has been transferred.

When a large amount of data is transferred, splitting it into several pieces lets the browser display the page progressively. This way of splitting the entity body into chunks is called chunked transfer coding.

The size of each chunk is given in hexadecimal, and the last chunk of the entity body is marked with “0(CR+LF)”. An entity body sent with chunked transfer coding is decoded by the receiving client, which restores the entity body to its form before encoding. HTTP/1.1 has a mechanism called transfer coding that allows a message to be transmitted in a particular encoding, but it is defined only for chunked transfer coding.
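The chunk format described above (hexadecimal size, CR+LF, data, CR+LF, then a final "0" chunk) can be sketched as a pair of helper functions. This is a minimal illustration that assumes a complete, well-formed body and ignores trailer fields; the names `encode_chunked` and `decode_chunked` are hypothetical.

```python
def encode_chunked(data: bytes, chunk_size: int = 8) -> bytes:
    """Split data into chunks, each prefixed with its size in hexadecimal."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # a zero-size chunk marks the end of the body
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Reassemble an entity body that was sent with chunked transfer coding."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; drop any ";extension" suffix.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(decoded)


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))  # b'Wikipedia'
```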

3.6 Multipart Object Collections

When we send an email, we can write text in it and attach several files. This works because of MIME (Multipurpose Internet Mail Extensions), a mechanism that lets mail handle different types of data such as text, images, and video. For example, binary data such as an image is encoded as an ASCII string, and MIME headers indicate the data type. Within MIME, a method called multipart is used to hold several pieces of data of different types.

HTTP adopts multipart object collections as well: the body of a single message can contain entities of several types. This is typically used when uploading images or text files. To use a multipart object collection in an HTTP message, set the Content-Type header field accordingly. The following multipart types are used:

  • multipart/form-data: used when uploading files from a Web form.

  • multipart/byteranges: used when a response with status code 206 (Partial Content) contains several ranges of content.

The boundary string separates the entities in a multipart object collection. Each entity is preceded by a line consisting of "--" followed by the boundary string (for example --AaB03x or --THIS_STRING_SEPARATES), and the whole collection ends with the boundary string followed by another "--" (for example --AaB03x-- or --THIS_STRING_SEPARATES--). Each part of a multipart object collection can have its own header fields, and a part can itself contain a nested multipart object collection.
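The boundary rules can be seen by building a small multipart/form-data body by hand. This sketch handles plain text fields only and takes the boundary AaB03x from the example in the text; `build_multipart` and the field name are hypothetical, and a real client would pick a boundary that cannot occur in the data.

```python
def build_multipart(fields: dict, boundary: str = "AaB03x"):
    """Return (content_type, body) for a simple multipart/form-data message."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")    # "--" + boundary opens each part
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")                  # a blank line ends the part's headers
        lines.append(value)
    lines.append(f"--{boundary}--")      # a trailing "--" closes the collection
    body = "\r\n".join(lines) + "\r\n"
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart({"field1": "Joe Blow"})
print("Content-Type:", content_type)
print(body)
```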

3.7 Range Requests

In the past, downloading even a moderately large image or file was a struggle: if the network connection dropped during the download, you had to start over from the beginning. Solving this requires a way to resume, that is, to continue a download from the point where it was interrupted. To do so, the client specifies the range of the entity it wants. A request that specifies a range is called a range request.

A range request uses the Range header field to specify the byte range of the resource:

  • Bytes 5001 through 10000

    Range: bytes=5001-10000
  • Everything from byte 5001 onward

    Range: bytes=5001-
  • Multiple ranges: from the beginning through byte 3000, and bytes 5000 through 7000

    Range: bytes=0-3000, 5000-7000

A successful range request is answered with a 206 Partial Content response. For a request with multiple ranges, the response sets Content-Type to multipart/byteranges. If the server cannot honor the range request, it returns 200 OK with the complete entity.
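As a rough sketch of what a server does with the Range header, the function below turns a header value into inclusive byte offsets for a resource of a known size. `parse_range` is a hypothetical helper; it returns None for anything it cannot use, which corresponds to the fallback of answering 200 OK with the whole entity. It also accepts the suffix form `bytes=-N`, which in HTTP means the final N bytes.

```python
def parse_range(value: str, size: int):
    """Convert a Range header value into a list of inclusive (start, end) offsets.

    Returns None when the value is not a usable byte range.
    """
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes" or not spec.strip():
        return None
    ranges = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        if not sep:
            return None
        if first == "" and last.isdigit():
            # Suffix form "-N": the final N bytes of the entity.
            start, end = max(size - int(last), 0), size - 1
        elif first.isdigit() and (last == "" or last.isdigit()):
            start = int(first)
            # An open end ("5001-") or an end past the entity stops at the last byte.
            end = min(int(last), size - 1) if last else size - 1
        else:
            return None
        if start > end:
            return None
        ranges.append((start, end))
    return ranges


print(parse_range("bytes=5001-10000", 20000))         # [(5001, 10000)]
print(parse_range("bytes=5001-", 20000))              # [(5001, 19999)]
print(parse_range("bytes=0-3000, 5000-7000", 20000))  # [(0, 3000), (5000, 7000)]
```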

3.8 Content Negotiation

A Web site may offer several pages with the same content. For example, the English and Chinese versions of a page have the same content but differ in language. When a browser whose default language is English or Chinese requests the same URI, the English or the Chinese version is displayed accordingly. This mechanism is called content negotiation.

The following header fields in the request serve as the criteria (explained below):

  • Accept
  • Accept-Charset
  • Accept-Encoding
  • Accept-Language
  • Content-Language

There are three types of content negotiation:

  • Server-driven negotiation: the server performs the negotiation. It uses the request header fields as a reference and selects automatically. Because the choice is based only on what the browser sends, it does not always pick the best content for the user.
  • Agent-driven negotiation: the client performs the negotiation. The user chooses manually from a list of options shown in the browser, or a script such as JavaScript on the page chooses automatically, for example switching between the PC and mobile versions of a page according to the OS or browser type.
  • Transparent negotiation: a combination of server-driven and agent-driven negotiation, in which the server and the client each perform part of the negotiation.
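Server-driven negotiation can be sketched for the Accept-Language case. The code below is a simplified illustration, not a complete implementation of the HTTP matching rules: it orders the client's language tags by their q-value (default 1) and returns the first one the server has. The function names, the list of available languages, and the default are assumptions made for the example.

```python
def parse_accept_language(header: str):
    """Return language tags ordered by descending q-value (default q=1)."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    prefs.sort(key=lambda pref: pref[1], reverse=True)  # stable: ties keep header order
    return [tag for tag, q in prefs if q > 0]


def choose_language(header: str, available, default: str = "en") -> str:
    """Pick the best available language for an Accept-Language header."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]   # fall back from "zh-cn" to "zh"
        if primary in available:
            return primary
    return default


print(choose_language("zh-CN,zh;q=0.7,en-US;q=0.3", ["en", "zh"]))  # zh
print(choose_language("fr;q=0.9", ["en", "zh"]))                    # en
```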

4. HTTP status codes

4.1 What is a Status Code

When a client sends a request to a server, the status code describes the result that is returned. From the status code the user can tell whether the server processed the request normally or an error occurred.

4.2 Status Code Categories

The first digit of the code specifies the response category; the last two digits do not follow any classification. There are five response categories.

RFC 2616 alone records 40 HTTP status codes. Adding the extensions from WebDAV (Web-based Distributed Authoring and Versioning, RFC 4918 and RFC 5842) and Additional HTTP Status Codes (RFC 6585) brings the number to more than 60. Despite this variety, only about 14 are in common use.
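Because the first digit determines the category, classifying a status code is a one-line calculation. The sketch below uses the standard category names (1XX Informational through 5XX Server Error); `status_category` is a hypothetical helper, while `HTTPStatus` is Python's built-in table of codes and reason phrases.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_category(code: int) -> str:
    """Classify a status code by its first digit."""
    return CATEGORIES.get(code // 100, "Unknown")


for code in (200, 206, 304, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```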

4.3 2XX Success

A 2XX response indicates that the request was processed normally.

  • 200 OK

    The request from the client was processed normally by the server.

  • 204 No Content

    The server received and successfully processed the request, but the response does not contain an entity body, and none is allowed to be returned.

  • 206 Partial Content

    The client made a range request and the server successfully carried out that partial GET. The response contains the portion of the entity specified by Content-Range.

4.4 3XX Redirection

A 3XX response indicates that the browser needs to take some additional action for the request to be handled correctly.

  • 301 Moved Permanently

    Permanent redirect. The requested resource has been assigned a new URI, which should be used from now on. If the old URI was bookmarked, the bookmark should be updated to the URI given in the Location header field. A request like the one below, where the trailing slash “/” of the resource path is omitted, produces a 301.

    http://example.com/sample

  • 302 Found

    Temporary redirect. The requested resource has been assigned a new URI, and the client is expected to use the new URI for this request only. Unlike 301, 302 means the resource has moved only temporarily and its URI may change again. So if the user has bookmarked the URI, the bookmark is not updated as it would be for 301.

  • 303 See Other

    The requested resource is available at another URI, and the client should retrieve it with the GET method. 303 works like 302, except that it states explicitly that the client should use GET.

When a 301, 302, or 303 response is returned, almost all browsers change POST to GET, drop the request body, and resend the request automatically. The specifications for 301 and 302 prohibit changing POST to GET, but browsers do it anyway.

  • 304 Not Modified

    The client sent a conditional request and the server would allow access to the resource, but the condition was not met, so the server returns 304 Not Modified without a response body. Although 304 is in the 3XX class, it has nothing to do with redirection.

  • 307 Temporary Redirect

    Temporary redirect. It has the same meaning as 302, but with 307 browsers follow the standard and do not change POST to GET.

4.5 4XX Client Error

A 4XX response indicates that the error originated with the client.

  • 400 Bad Request

    The request message contains a syntax error. The client needs to correct the request and send it again.

  • 401 Unauthorized

    The request requires HTTP authentication (BASIC or DIGEST). A 401 response must include a WWW-Authenticate header field that challenges the client for credentials for the requested resource. When a browser receives a 401 for the first time, it shows an authentication dialog. If the request had already been sent with credentials, the 401 means that authentication failed.

  • 403 Forbidden

    The server refused access to the requested resource. Possible causes include missing file system permissions or an access control violation (for example, a request from a source IP address that is not allowed).

  • 404 Not Found

    The requested resource could not be found on the server. It can also be used when the server refuses the request and does not want to give a reason.

4.6 5XX Server Error

  • 500 Internal Server Error

    The server encountered an error while executing the request. The cause may be a bug in the Web application or a temporary fault.

  • 503 Service Unavailable

    The server is temporarily overloaded or down for maintenance and cannot process requests right now. If it is known in advance how long recovery will take, it is best to put that in a Retry-After header field in the response.

The status code does not always match the actual situation. Quite often the status code returned is wrong, and the user may not notice. For example, it is not uncommon for a Web application to return 200 OK even though an internal error occurred.

5. Web servers that work with HTTP

5.1 Hosting multiple domain names with virtual hosts

The HTTP/1.1 specification allows a single HTTP server to host several Web sites. For example, a Web hosting provider can:

  • Serve multiple customers from one server
  • Run a separate Web site under each customer's own domain name

On the Internet, DNS maps a domain name to an IP address before the target site is accessed, so by the time a request reaches the server it is addressed by IP address. If several domains are hosted on one server, they all resolve to the same IP address. To tell them apart, every HTTP request must therefore give the host name or domain name in the Host header field.
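The role of the Host header can be sketched as a lookup table on the server. The host names, directory paths, and the `select_site` function are all hypothetical; the point is only that one server at one IP address picks the site from the Host value. The sketch does not handle IPv6 literals in the Host value.

```python
# Hypothetical mapping from host names to document roots on a single server.
VIRTUAL_HOSTS = {
    "hackr.jp": "/var/www/hackr",
    "www.example.com": "/var/www/example",
}


def select_site(headers: dict):
    """Pick the site for a request from its Host header.

    Returns None when the header is missing or names an unknown host.
    """
    host = headers.get("Host", "")
    hostname = host.split(":")[0].strip().lower()  # drop an optional ":port"
    return VIRTUAL_HOSTS.get(hostname)


print(select_site({"Host": "hackr.jp"}))              # /var/www/hackr
print(select_site({"Host": "www.example.com:8080"}))  # /var/www/example
print(select_site({}))                                # None
```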

5.2 Programs that forward communication data

Besides clients and servers, HTTP communication involves applications that relay data: proxies, gateways, and tunnels. These applications forward requests to the next server on the communication path and relay the responses they receive back toward the client.

Proxy

A proxy is a forwarding application that acts as a “middleman” between server and client. It receives requests from the client and forwards them to the server, then receives responses from the server and forwards them to the client.

The server that holds the resource is called the source server (origin server). Responses from the source server pass through the proxy on their way to the client. Each proxy that forwards a message adds a Via header field recording the host it passed through.

Purposes: reducing network bandwidth usage through caching (explained later), enforcing an organization's access control for specific Web sites, collecting access logs, and so on.

There are two ways to classify proxies:

  • Caching proxy: when forwarding a response, a caching proxy stores a copy of the resource (a cache) on the proxy server. When the proxy later receives a request for the same resource, it can return the cached copy as the response instead of fetching it from the source server.
  • Transparent proxy: a proxy that forwards requests and responses without modifying the messages. A proxy that does modify message content is called a non-transparent proxy.

Gateway

A gateway is a server that relays communication data to other servers. To the client it looks like the source server that owns the resource: it receives the request and handles it as if it were. A gateway works much like a proxy, except that it lets the servers behind it provide non-HTTP services.

Purpose: a gateway can improve security, because the line between the client and the gateway can be encrypted. For example, a gateway can connect to a database and query it with SQL, or link to a credit card settlement system when a payment is made on a shopping site.

Tunnel

A tunnel is an application that relays communication between a client and a server that are far apart. It sets up a communication line to another server on request and can carry traffic encrypted with SSL or similar. The tunnel itself does not parse HTTP requests; it passes them to the next server unchanged. The tunnel ends when both ends close the connection.

Purpose: ensuring that the client can communicate securely with the server.

5.3 Caching

A cache is a copy of a resource kept on the local disk of a proxy server or client. Using a cache reduces accesses to the source server and so saves traffic and communication time. A cache server is a kind of proxy server: when the proxy forwards a response from the server, it keeps a copy of the resource.

A cache has an expiration date, so the cache server does not always return the cached copy for a request for the same resource. Once the resource on the source server has been updated, continuing to serve the cached copy would return the “old” resource from before the update. For this reason the cache checks with the source server whether the resource is still valid, depending on the client's request and the cache's expiration time. If the cache is found to be out of date, the cache server fetches the “new” resource from the source server again.

Clients can cache too. A cache can exist not only on a cache server but also in the client's browser. Internet Explorer, for example, calls its client cache Temporary Internet Files. If the browser's cache is still valid, the browser reads the resource directly from the local disk without requesting it from the server.

As with a cache server, when the cached copy is judged to have expired, the browser checks with the source server whether the resource is still valid. If the browser's cache is out of date, the browser requests the new resource.

6. HTTP headers

6.1 Header Fields

Header fields give the browser and the server information such as the size of the message body, the language used, and authentication details.

6.2 Header Field Structure

  • An HTTP header field consists of a field name and a field value separated by a colon (:):

    field-name: field-value
    Content-Type: text/html
  • A field can carry multiple values:

    Keep-Alive: timeout=15, max=100

6.3 Header Field Types

  • General Header Fields

  • Request Header Fields

  • Response Header Fields

  • Entity Header Fields

6.4 General Header Fields

General header fields are used in both request and response messages.

Cache-Control

Cache-Control:private,max-age=0,no-cache
Copy the code

Indicates whether it can be cached:

  • Cache-control :public # indicates that other users can use the CacheCopy the code
  • Cache-control :private # The Cache server will Cache resources for specific usersCopy the code

  • Cache-control :no-cache # The Cache server cannot Cache resources. The client does not receive cached responses. Prevents the return of expired resources from the cacheCopy the code

  • Cache-control :no-cache=Location # If a specific parameter value is specified, then the client cannot use the Cache after receiving the response packet corresponding to the header field of the specified parameter value. In other words, a header field with no parameter value can be cached.Copy the code

Controlling cacheable objects:

  • Cache-control :no-store # specifies that the Cache cannot store any part of the request or response locally.Copy the code

Specifying cache expiry and validation:

  • Cache-Control: s-maxage=604900 (in seconds) # Works like max-age, but applies only to shared cache servers that serve multiple users. When s-maxage is present, the Expires header field and the max-age directive are ignored.
  • Cache-Control: max-age=604900 (in seconds) # In a request, the client accepts a cached resource whose age is less than the specified value. In a response, the cache server may return the resource without revalidating it until that much time has elapsed.

    A cache server running HTTP/1.1 gives the max-age directive precedence over the Expires header field when both are present. An HTTP/1.0 cache server, however, ignores the max-age directive.

  • Cache-Control: min-fresh=60 # The cache server must return only a cached resource that will still be fresh for at least the next 60 seconds.

  • Cache-Control: max-stale=3600 # The client accepts a cached resource even if it has expired. If no value is given, the client accepts the response no matter how long ago it expired. If a value is given, the client accepts a response that has been expired for no more than that many seconds.
  • Cache-Control: only-if-cached # The client wants the target resource only if the cache server already holds it. The cache server does not reload the response and does not revalidate the resource.
  • Cache-Control: must-revalidate # The proxy must check with the origin server that the cached response is still valid before returning it. If the proxy cannot reach the origin server to obtain a valid resource, the cache must return status code 504 Gateway Timeout to the client.
  • Cache-Control: proxy-revalidate # Requires every cache server to revalidate the cached response with the origin server before returning it for a later request.
  • Cache-Control: no-transform # A cache must not change the media type of the entity body, in either the request or the response. This prevents caches and proxies from compressing images and performing similar operations.
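
To make the directive syntax above concrete, here is a minimal Python sketch that splits a Cache-Control field value into directives and applies a simplified freshness check. The function names parse_cache_control and is_fresh are hypothetical, and the check only looks at no-store, no-cache, and max-age; a real cache applies many more rules. Splitting on commas also means a quoted argument that itself contains a comma is not handled.

```python
def parse_cache_control(value):
    """Split a Cache-Control field value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without an argument (for example no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def is_fresh(directives, age_seconds):
    """Simplified check: may a stored response be served without revalidation?"""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age_seconds < int(max_age)


if __name__ == "__main__":
    parsed = parse_cache_control("private,max-age=0,no-cache")
    print(parsed)
    print(is_fresh(parse_cache_control("max-age=604900"), age_seconds=600))
```

Under these assumptions, a response carrying only max-age=604900 counts as fresh at an age of 600 seconds, while the private,max-age=0,no-cache example always requires revalidation.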

Connection

Controlling header fields that are not forwarded by proxies

Connection: name of the header field that must not be forwarded

Managing persistent connections

Connections in HTTP/1.1 are persistent by default. When the server wants to close the connection explicitly, it sets the value of the Connection header field to close:

Connection:close

Connections in HTTP versions before HTTP/1.1 were non-persistent by default. To keep a connection open with one of these older versions, set the value of the Connection header field to Keep-Alive:

Connection:Keep-Alive

Date

Indicates the date and time at which the HTTP message was created.

Date: Tue, 03 Jul 2012 04:40:59 GMT # HTTP/1.1 uses the format defined in RFC 1123
Date: Tue, 03-Jul-12 04:40:59 GMT # earlier versions of HTTP used the format defined in RFC 850
Date: Tue Jul 03 04:40:59 2012 # another format, identical to the output of the asctime() function in the C standard library
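
As an illustration of the three formats listed above, the following Python sketch tries each one in turn with datetime.strptime. The helper name parse_http_date is hypothetical, the format strings follow the examples as written in this section, and the sketch assumes the default C/English locale for day and month names.

```python
from datetime import datetime, timezone

# Format strings matching the three example forms shown above.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123 style
    "%a, %d-%b-%y %H:%M:%S GMT",  # RFC 850 style with a two-digit year
    "%a %b %d %H:%M:%S %Y",       # asctime() style
)


def parse_http_date(value):
    """Return a UTC datetime for an HTTP date in any of the three formats."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next format
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unrecognized HTTP date: {value!r}")


if __name__ == "__main__":
    for text in (
        "Tue, 03 Jul 2012 04:40:59 GMT",
        "Tue, 03-Jul-12 04:40:59 GMT",
        "Tue Jul 03 04:40:59 2012",
    ):
        print(parse_http_date(text).isoformat())
```

All three inputs describe the same instant, so a receiver that accepts every format can compare dates regardless of which one the sender chose.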

Pragma

Pragma is a legacy field from versions before HTTP/1.1 and is defined only for backward compatibility with HTTP/1.0. It is a general header field, but it is used only in requests sent by the client, where it asks all intermediate servers not to return cached resources.

Pragma:no-cache

If every intermediate server supported HTTP/1.1, specifying Cache-Control: no-cache alone would be ideal. In practice, however, the client cannot know which HTTP version each intermediate server uses, so the request usually contains both of the following header fields.

Cache-Control:no-cache
Pragma:no-cache

Trailer

Declares in advance which header fields are recorded after the message body. This header field can be used with HTTP/1.1 chunked transfer encoding. For example, if the value of Trailer is Expires, the Expires header field appears after the message body (after the chunk of length 0).

Transfer-Encoding

Specifies the transfer coding used to transmit the message body.

Upgrade

Used to check whether a higher version of HTTP, or another protocol, can be used for communication. The parameter value can name an entirely different communication protocol.

The Upgrade header field takes effect only between the client and the adjacent server. Therefore, when using Upgrade, you must also specify Connection: Upgrade.

Via

Tracks the transmission path of request and response messages between the client and the server. When a message passes through a proxy or gateway, information about that server is appended to the Via header field before the message is forwarded. Via also helps to avoid request loops, so a proxy must append its own entry whenever a message passes through it.

Because the Via header field traces the transmission path, it is often used together with the TRACE method. For example, when a proxy server receives a TRACE request with Max-Forwards: 0, it must not forward the request any further. Instead, the proxy server appends its own information to the Via header field and returns a response to the request.

Warning

Alerts users to cache-related problems.

Warning: 113 gw.hackr.jp:8080 "Heuristic expiration" Tue, 03 Jul 2012 05:09:44 GMT
Format: Warning: [warning code] [host:port of the warning host] "[warning text]" ([date and time])

6.5 Request Header Fields

Header fields used in request messages sent from the client to the server. They carry additional information about the request, information about the client, and priorities for the content of the response.

Accept

Notifies the server of the media types that the user agent can handle and of their relative priority.

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

  • Text files: text/html, text/plain, text/css, application/xhtml+xml, application/xml…
  • Image files: image/jpeg, image/gif, image/png…
  • Video files: video/mpeg, video/quicktime…
  • Binary files used by applications: application/octet-stream, application/zip…

To give a media type priority, add a weight with q=, separated from the media type by a semicolon (;). The weight q ranges from 0 to 1 (up to three decimal places), with 1 as the highest value. If q is omitted, the weight defaults to q=1.0. When the server can provide several types of content, it returns the media type with the highest weight first.
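
The weighting rule above can be sketched in a few lines of Python. The function name parse_accept is hypothetical; the sketch reads only the q parameter, ignores any other media type parameters, and does not rank specific types above wildcards when weights tie.

```python
def parse_accept(value):
    """Return (media_type, q) pairs from an Accept value, highest q first."""
    entries = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # default weight when q is omitted
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal weight keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    header = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    for media_type, q in parse_accept(header):
        print(media_type, q)
```

The same q-value syntax appears in Accept-Charset, Accept-Encoding, Accept-Language, and TE, so a parser of this shape could be reused for those fields.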

Accept-Charset

Notifies the server of the character sets supported by the user agent and of their relative priority.

Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

Accept-Encoding

Inform the server of the content encoding supported by the user agent and the priority order of the content encoding. Multiple content encodings can be specified at once.

Accept-Encoding:gzip,deflate
  • gzip: the encoding format produced by the file compression program gzip (GNU zip) (RFC 1952), which uses the Lempel-Ziv algorithm (LZ77) and a 32-bit cyclic redundancy check (CRC).
  • compress: the encoding format produced by the UNIX compress program, which uses the Lempel-Ziv-Welch algorithm (LZW).
  • deflate: an encoding format that combines the zlib format (RFC 1950) with the deflate compression algorithm (RFC 1951).
  • identity: the default encoding format, which applies no compression and leaves the content unchanged.

Relative priority can be expressed with the q weight value. You can also use an asterisk (*) as a wildcard to indicate any encoding format.

Accept-Language

Tells the server which natural languages (Chinese, English, and so on) the user agent can handle, and their relative priority. Several languages can be specified at once.

Accept-Language: zh-cn,zh;q=0.7,en-us,en;q=0.3

Authorization

Informs the server of the user agent's authentication information (credentials). It is exchanged between the client and the server.

Authorization: Basic dWVub3NlbjpwYXNzd29yZA==
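
For the Basic scheme, the credential value is the Base64 encoding of the string user:password. The short Python sketch below builds such a field value; the helper name basic_authorization is hypothetical, and the user name uenosen and password password are simply what the example value in this section decodes to.

```python
import base64


def basic_authorization(user, password):
    """Build the value of an Authorization header for the Basic scheme."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print("Authorization:", basic_authorization("uenosen", "password"))
```

Base64 is an encoding, not encryption: anyone who sees the header can decode the credentials, which is why Basic authentication is normally combined with HTTPS.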

Proxy-Authorization

When the client receives an authentication challenge from a proxy server, it uses this header field to send the information required for authentication. It is exchanged between the client and the proxy.

Expect

Tells the server that the client expects a particular behavior. If the server cannot understand or satisfy the expectation, it returns status code 417 Expectation Failed.

Expect:100-continue

From

Tells the server the E-mail address of the user using the user agent.

From: [email protected]

Host

Tells the server the Internet host name and port number of the requested resource. The Host header field is closely tied to virtual hosting, in which a single server is assigned several domain names. When a request is sent, the host name is resolved to an IP address, and the request arrives at that address. If several domain names are deployed under the same IP address, the server cannot tell which one was meant, so the Host header field states the requested host name explicitly.

Host:www.hackr.jp

If the server does not have a host name, send a null value:

Host:

Range

For range requests that fetch only part of a resource, the Range header field tells the server which range of the resource is wanted. A server that receives a request with the Range header field and can process it returns a response with status code 206 Partial Content. When the server cannot process the range request, it returns status code 200 OK together with the whole resource.

Range:bytes=5001-10000
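
The behavior described above can be sketched for a hypothetical in-memory resource. The names parse_byte_range and respond are assumptions made for this example, only a single bytes range is handled, and any range the sketch cannot process falls back to 200 OK with the whole body, as the text describes (other error statuses are not modeled).

```python
def parse_byte_range(value, size):
    """Return an inclusive (start, end) pair for 'bytes=a-b', or None."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # other units and multi-range requests are not handled
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":  # suffix form such as bytes=-500 (the last 500 bytes)
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    return (start, end) if start <= end else None


def respond(body, range_value=None):
    """Return (status, headers, payload) for an in-memory resource."""
    rng = parse_byte_range(range_value, len(body)) if range_value else None
    if rng is None:
        return 200, {"Content-Length": str(len(body))}, body
    start, end = rng
    part = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(len(part)),
    }
    return 206, headers, part


if __name__ == "__main__":
    status, headers, part = respond(bytes(20000), "bytes=5001-10000")
    print(status, headers, len(part))
```

Byte positions are inclusive at both ends, so bytes=5001-10000 selects 5000 bytes.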

A request that uses a header field of the form If-xxx is called a conditional request. When the server receives a conditional request, it executes the request only if it determines that the specified condition is true.

If-Match

Tells the server the entity tag (ETag) value that the resource must match. Weak ETag values cannot be used here. The server executes the request only if the If-Match value and the ETag of the resource are identical. Otherwise, it returns a response with status code 412 Precondition Failed. You can also specify an asterisk (*) as the If-Match value, in which case the server ignores the ETag value and processes the request as long as the resource exists.

If-Match:"123456"

If-Range

Tells the server that if the If-Range field value (an ETag value or a date and time) matches the ETag value or modification time of the requested resource, the request should be processed as a range request. Otherwise, the whole resource is returned. (It combines the effect of If-Match and Range in a single request.)

If-Modified-Since

Tells the server to process the request if the resource has been updated after the date and time in the If-Modified-Since field value. If the resource has not been updated since then, the server returns a response with status code 304 Not Modified.

If-Modified-Since: Thu, 15 Apr 2004 00:00:00 GMT

If-Modified-Since is used to check whether a local copy of a resource held by a proxy or client is still valid. The date and time at which the resource was last updated can be determined from the Last-Modified header field.
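
A minimal Python sketch of this check follows. The function name status_for_if_modified_since is hypothetical, the resource's modification time is passed in as a timezone-aware datetime, and the header value is parsed with email.utils.parsedate_to_datetime from the standard library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_if_modified_since(header_value, last_modified):
    """Return 200 if the resource changed after the given date, else 304."""
    since = parsedate_to_datetime(header_value)
    return 200 if last_modified > since else 304


if __name__ == "__main__":
    header = "Thu, 15 Apr 2004 00:00:00 GMT"
    updated = datetime(2012, 5, 23, 9, 59, 55, tzinfo=timezone.utc)
    print(status_for_if_modified_since(header, updated))
```

A 304 response carries no body, so the proxy or client keeps using its local copy.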

If-Unmodified-Since

This is the opposite of If-Modified-Since. It tells the server to process the request only if the requested resource has not been updated after the date and time in the field value. If the resource has been updated after that time, the server returns a response with status code 412 Precondition Failed.

Max-Forwards

When a request is sent with the TRACE or OPTIONS method and contains the Max-Forwards header field, the field value gives the maximum number of servers that may forward the request. When a server receives a request whose Max-Forwards value is 0, it does not forward the request and returns the response itself.

Max-Forwards:10

In HTTP communication, a request may pass through several servers such as proxies. If a proxy server along the way fails to forward the request for some reason, the client never receives a response from the server and has no way of knowing why. Max-Forwards can be used to investigate the cause of such problems.

Referer

Tells the server the URI of the resource from which the request originated. Clients normally send the Referer header field to the server. It may be omitted when the URI is typed directly into the browser's address bar, or for security reasons. Because the query string in the URI of the originating resource may contain confidential information such as an ID and password, writing it into Referer and sending it to other servers may disclose that information.

TE

Tells the server which transfer codings the client can handle in the response, and their relative priority. It is similar to the Accept-Encoding header field, but applies to transfer codings.

TE: gzip,deflate;q=0.5

TE can also indicate that the client accepts trailer fields in chunked transfer encoding, by giving the value trailers:

TE: trailers

User-Agent

Passes information such as the name of the browser or user agent that created the request to the server.

6.6 Response Header Fields

Header fields used in response messages returned from the server to the client. They add information about the response and may also ask the client to supply additional information.

Accept-Ranges

Tells the client whether the server can handle range requests, which ask for only part of a resource on the server. Two field values can be specified: bytes when range requests can be handled, and none when they cannot.

Accept-Ranges:bytes

Age

Tells the client how long ago the origin server created the response. The field value is in seconds.

Age:600

If the server that creates the response is a cache server, the Age value is the time elapsed since the cached response was last validated with the origin server.

ETag

Tells the client the entity tag, a string that uniquely identifies a resource. The server assigns an ETag value to each resource.

ETag:"82e22293907ce725faf67773957acd12"

The ETag value must also change when the resource is updated. There is no standard algorithm for generating ETag values; each server assigns them in its own way.

When a resource is cached, it is assigned a unique identity. For example, when using a Chinese browser to access www.google.com/, the Chinese version of the corresponding…


Strong and weak ETag values

Strong ETag value: the value changes whenever the entity changes, however slightly.

ETag: "usagi-1234"

Weak ETag value: indicates only whether the resources are equivalent. The value changes only when the resource changes fundamentally. A weak ETag is marked by W/ at the beginning of the field value.

ETag: W/"usagi-1234"
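
The difference between the two kinds of value shows up when tags are compared. The Python sketch below follows the strong and weak comparison rules of the HTTP specification; the function names parse_etag, strong_match, and weak_match are hypothetical.

```python
def parse_etag(value):
    """Split an ETag field value into (is_weak, opaque_tag)."""
    value = value.strip()
    is_weak = value.startswith("W/")
    if is_weak:
        value = value[2:]
    return is_weak, value.strip('"')


def strong_match(a, b):
    """Strong comparison: both tags must be strong and identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: the opaque tags match, ignoring any W/ prefix."""
    return parse_etag(a)[1] == parse_etag(b)[1]


if __name__ == "__main__":
    print(strong_match('"usagi-1234"', '"usagi-1234"'))
    print(strong_match('W/"usagi-1234"', '"usagi-1234"'))
    print(weak_match('W/"usagi-1234"', '"usagi-1234"'))
```

If-Match relies on the strong comparison, which is why a weak ETag value cannot satisfy it.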

Location

The Location header field directs the recipient of the response to a resource at a location other than the request URI. It is normally used with 3xx Redirection responses to provide the URI of the redirect target. Almost all browsers, on receiving a response that contains Location, immediately try to access the indicated resource.

Location:http://www.usagidesign.jp/sample.html

WWW-Authenticate

Used for HTTP access authentication. It tells the client which authentication scheme (Basic or Digest) applies to the resource specified by the request URI, together with a challenge that carries parameters. A response with status code 401 Unauthorized must contain the WWW-Authenticate header field. (Between client and server)

WWW-Authenticate:Basic realm="Usagidesign Auth"

Proxy-Authenticate

Tells the client what authentication the proxy server requires. (Between client and proxy)

Proxy-Authenticate:Basic realm="Usagidesign Auth"

Retry-After

Tells the client how long to wait before sending the request again. It is used mainly with 503 Service Unavailable responses and 3xx Redirection responses.

Retry-After:120

Server

The Server header field tells the client about the HTTP server application installed on the server. Besides the software name, it may include the version number and options enabled at installation time.

Server:Apache/2.2.17 (Unix)
Server:Apache/2.2.6 (Unix) PHP/5.2.5

Vary

Controls caching. When a proxy server receives from the origin server a response that contains Vary and caches it, it may return the cached copy only to later requests whose header field named in Vary has the same value. For a request to the same resource with a different value in that header field, the proxy must fetch the resource from the origin server.

Vary:Accept-Language

6.7 Entity Header Fields

Header fields used for the entity part of request and response messages. They add information about the entity, such as when the resource content was last updated.

Allow

Notifies the client of all the HTTP methods supported for the resource specified by the Request-URI. When the server receives a request with an unsupported HTTP method, it returns a response with status code 405 Method Not Allowed and lists every supported method in the Allow header field.

Content-Encoding

Tells the client which content encoding the server applied to the entity body. Content encoding means compressing the entity without losing any of its information. (The main encodings are gzip, compress, deflate, and identity.)

Content-Language

Tells the client the natural language used by the entity body

Content-Length

Gives the size (in bytes) of the entity body. (The Content-Length header field must not be used when a transfer coding is applied to the entity body.)

Content-Location

Gives the URI that corresponds to the body of the message. Unlike the Location header field, Content-Location indicates the URI of the resource actually returned in the message body.

For example, for server-driven requests that use the Accept-Language header field, the URI will be specified in the Content-Location header field if the page content returned is different from the actual requested object. (Visiting www.hackr.jp/ returns HTTP…

Content-MD5

Content-MD5 is a value generated with the MD5 algorithm. Its purpose is to check whether the message body arrived intact after transmission.

The 128-bit binary value obtained by applying the MD5 algorithm to the message body is Base64-encoded and written into the Content-MD5 field. Base64 encoding is used because an HTTP header cannot carry binary values. To check the message, the receiving client runs the same MD5 algorithm on the message body and compares the result with the field value.

This method can detect accidental changes to the content, but it cannot detect malicious tampering. Anyone able to alter the content can also recalculate and replace the Content-MD5 value, so the receiving client has no way of noticing that both the message body and the Content-MD5 header field have been tampered with.
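
The calculation described above (MD5 digest of the body, then Base64) can be sketched with Python's standard library. The function names content_md5 and body_is_intact are hypothetical.

```python
import base64
import hashlib


def content_md5(body):
    """Return the Content-MD5 field value for a message body given as bytes."""
    digest = hashlib.md5(body).digest()  # 128-bit (16-byte) binary digest
    return base64.b64encode(digest).decode("ascii")


def body_is_intact(body, header_value):
    """Recompute the digest on the receiving side and compare the values."""
    return content_md5(body) == header_value.strip()


if __name__ == "__main__":
    body = b"<html>hello</html>"
    value = content_md5(body)
    print("Content-MD5:", value)
    print(body_is_intact(body, value))
    print(body_is_intact(body + b"!", value))
```

The comparison catches accidental corruption only. As noted above, an attacker who can change the body can recompute the header as well.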

Content-Range

In the response to a range request, the Content-Range header field tells the client which part of the entity is being returned.

Content-Type

Describes the media type of the object in the entity body.

Content-Type: text/html; charset=UTF-8

Expires

Tells the client the date and time at which the resource expires. After receiving a response that contains Expires, a cache server answers requests from its cache and keeps the copy of the response until the time given in the field value. Once that time has passed, the cache server goes back to the origin server for the resource when a request arrives.

Expires:Wed,04 Jul 2012 08:26:05 GMT

When the origin server does not want the cache server to cache the resource, it is best to write the same time in the Expires field as in the Date header field. However, when the Cache-Control header field contains a max-age directive, max-age takes precedence over Expires.

Last-Modified

Indicates when the resource was last modified. In general, this is the time at which the resource specified by the Request-URI was modified. For dynamic content such as the output of CGI scripts, however, the value may be the time at which the underlying data was last modified.

Last-Modified:Wed,23 May 2012 09:59:55 GMT

6.8 Header Fields Not Defined in HTTP/1.1

The header fields used in HTTP communication are not limited to the 47 defined in RFC 2616. Header fields such as Cookie, Set-Cookie, and Content-Disposition are defined in other RFCs and are also used frequently.

6.9 End-to-End and Hop-by-Hop Header Fields

  • End-to-end header: A header in this category is forwarded to the final recipient of the request or response. It must be stored in responses generated by a cache and must always be forwarded.

  • Hop-by-hop header: A header in this category is valid for a single hop only and is not forwarded further by a cache or proxy. In HTTP/1.1 and later versions, the Connection header field must be provided if a hop-by-hop header is to be used.

    • Connection

    • Keep-Alive

    • Proxy-Authenticate

    • Proxy-Authorization

    • Trailer

    • TE

    • Transfer-Encoding

    • Upgrade

      All header fields other than these eight are end-to-end headers.
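
As a sketch of how a proxy could apply this classification, the Python function below removes the hop-by-hop fields listed above, plus any field named in the Connection header, before forwarding. The function name strip_hop_by_hop and the sample header values (including X-Custom) are hypothetical, and headers are modeled as a plain dict for brevity.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "trailer", "te", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers):
    """Return only the header fields a proxy would forward."""
    listed = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            # Fields named in Connection are hop-by-hop for this connection.
            listed.update(t.strip().lower() for t in value.split(",") if t.strip())
    drop = HOP_BY_HOP | listed
    return {n: v for n, v in headers.items() if n.lower() not in drop}


if __name__ == "__main__":
    incoming = {
        "Host": "www.hackr.jp",
        "Connection": "Upgrade, X-Custom",
        "Upgrade": "TLS/1.0",
        "X-Custom": "1",
        "Accept": "*/*",
    }
    print(strip_hop_by_hop(incoming))
```

Header field names are compared case-insensitively, since HTTP field names are not case-sensitive.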

6.10 Header Fields for Cookies

Cookies provide user identification and state management. To manage a user's state, a Web site temporarily writes some data to the user's computer through the Web browser. When the user visits the Web site again, the cookie stored earlier is sent back and can be read by the site.

When a cookie is about to be sent, its validity period is checked, along with the domain, path, protocol, and other attributes of the destination. Data in a properly issued cookie is therefore not leaked to other Web sites or to attackers.

Set-Cookie

When the server starts managing the state of the client, it announces the details in advance with this header field.

Set-Cookie: status=enable; expires=Tue, 05 Jul 2011 07:26:31 GMT; path=/; domain=.hackr.jp;

  • Expires: Specifies the date until which the browser may send the cookie. When the Expires attribute is omitted, the cookie is valid only for the browser session, which usually means until the browser application is closed. Once a cookie has been sent from the server to the client, the server has no way to delete it explicitly. In practice, however, a client cookie can be deleted by overwriting it with a cookie that has already expired.

  • Path: Restricts the cookie to the specified directory on the server.

  • Domain: The specified domain name is matched by suffix. For example, if example.com is specified, the cookie is sent to www.example.com and www2.example.com as well as to example.com. Unless the cookie must be sent to several explicitly chosen domain names, it is safer not to specify the domain attribute.

  • Secure: The cookie is sent only over a secure HTTPS connection.

    Set-Cookie: name=value; secure

    In the example above, the cookie is sent back only when www.example.com/ is accessed over a secure (HTTPS) connection.

  • HttpOnly: Prevents JavaScript from obtaining the cookie. Its main purpose is to prevent cookie theft through cross-site scripting (XSS).

    Set-Cookie: name=value; HttpOnly

    Normally, a cookie can be read from within a Web page by JavaScript. With the HttpOnly attribute attached, however, document.cookie cannot read the content of the cookie, so the cookie cannot be stolen by JavaScript injected through XSS.
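
The attributes above can be assembled with Python's standard http.cookies module. This is a minimal sketch: the helper name build_set_cookie and its default arguments are assumptions made for illustration, and only the Path, Secure, and HttpOnly attributes are set.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, path="/", secure=True, http_only=True):
    """Return the value part of a Set-Cookie header with common attributes."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["path"] = path
    morsel["secure"] = secure       # flag attribute: emitted only when true
    morsel["httponly"] = http_only  # flag attribute: emitted only when true
    return morsel.OutputString()


if __name__ == "__main__":
    print("Set-Cookie:", build_set_cookie("status", "enable"))
```

Leaving the domain attribute unset follows the advice above: the browser then returns the cookie only to the host that issued it.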

Cookie

Tells the server that the client wants HTTP state management: the client includes in the request the cookie it received from the server. When several cookies have been received, they can likewise be sent as multiple cookies.

Cookie:status=enable

6.11 Other Header Fields

HTTP header fields are extensible. As a result, Web servers and browsers use a variety of non-standard header fields.

X-Frame-Options

An HTTP response header that controls whether the Web content may be displayed inside a frame on another Web site. Its main purpose is to prevent clickjacking attacks.

X-Frame-Options:DENY

The X-Frame-Options header field can take the following values:

  • DENY: Framing is refused.
  • SAMEORIGIN: Allows the browsing of pages within a same-origin domain name only. (For example, when specifying hackr.jp/sample.html…

Browsers that support this header field include Internet Explorer 8, Firefox 3.6.9+, Chrome 4.1.249.1042+, Safari 4+, and Opera 10.50+.

X-XSS-Protection

An HTTP response header that serves as a countermeasure against cross-site scripting (XSS) attacks. It switches the browser's XSS protection mechanism on or off.

X-XSS-Protection:1

The X-XSS-Protection header field can take the following values:

  • 0: Disables XSS filtering
  • 1: Enables XSS filtering

DNT

An HTTP request header. DNT is short for Do Not Track. It expresses a refusal to have personal information collected and is a way to opt out of tracking for targeted advertising.

The DNT header field can take the following values:

  • 0: Consent to be tracked
  • 1: Refuse to be tracked

P3P

P3P (the Platform for Privacy Preferences) is a technology that expresses a Web site's privacy policy in a form that programs can understand, with the aim of protecting user privacy.

P3P: CP="CAO DSP LAW CURa ADMa DEVa TAIa PSAa PSDa IVAa IVDa OUR BUS IND UNI COM NAV INT"

To set up P3P, follow these steps:

  • Step 1: Create the P3P privacy policy
  • Step 2: Create the P3P policy reference file and save it as /w3c/p3p.xml
  • Step 3: Create a compact policy from the P3P privacy policy and output it in the HTTP response

In many protocols, HTTP among them, non-standard parameters were distinguished from standard ones by the prefix X-, which made it possible to extend the protocol with such parameters. This simple approach turned out to do more harm than good, and RFC 6648, "Deprecating the "X-" Prefix and Similar Constructs in Application Protocols", proposed discontinuing it. Names with the X- prefix that are already in use, however, are not required to change.