Preface

Recently, I read "Illustrated HTTP protocol" in my spare time and learned a lot from it. After reading the book, many things I had not understood before suddenly became clear. So I spent some time writing a summary, and I also looked up other material to supplement it. I hope this article helps everyone. If you think it is well written, I would appreciate it if you bookmarked and liked it!

Network basics

TCP/IP layers

Depending on the book, the TCP/IP system is described as having either four layers or five layers. The difference is that the data link layer and physical layer of the five-layer model together correspond to the network interface layer of the four-layer model. Both descriptions are right, so there is no need to worry about it. If the layers are divided by protocol, the physical layer does not need to be listed separately, since it has no protocol of its own. This article adopts the four-layer structure.

Layer | Role
Application layer | Protocols such as HTTP, FTP, and DNS reside at this layer and provide application services to users.
Transport layer | The main protocols at this layer are TCP and UDP, which provide end-to-end communication for application layer entities.
Network layer | The core of this layer is the IP protocol, which determines the path by which packets reach the other computer. ARP and RARP also reside at this layer.
Network interface layer | Responsible for receiving IP packets and sending them over the network, or receiving physical frames from the network, extracting the IP packets and delivering them to the network layer. Common protocols at this layer are Ethernet 802.3 and Token Ring 802.5.

TCP/IP communication flow

When the TCP/IP protocol family is used for communication, the data passes through the layers in order. The sending end goes down from the application layer, and the receiving end goes up from the bottom layer. On the sending side, each layer the data passes through adds its own header. On the receiving side, each layer removes the corresponding header before passing the data up.
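The wrapping and unwrapping described above can be sketched in a few lines. This is only an illustration: the bracketed strings such as `[TCP]` are placeholders standing in for real headers, and the names `LAYERS`, `send` and `receive` are made up for this example.

```javascript
// Each layer prepends its own header on the way down the stack.
// The bracketed strings are placeholders, not real protocol headers.
const LAYERS = ["HTTP", "TCP", "IP", "ETH"];

function send(payload) {
  // Application layer first, network interface layer last.
  return LAYERS.reduce((data, layer) => `[${layer}]` + data, payload);
}

function receive(frame) {
  // Strip headers in the reverse order: the outermost one first.
  let data = frame;
  for (const layer of [...LAYERS].reverse()) {
    const header = `[${layer}]`;
    if (!data.startsWith(header)) {
      throw new Error(`expected ${layer} header`);
    }
    data = data.slice(header.length);
  }
  return data;
}

const frame = send("GET / HTTP/1.1");
console.log(frame);          // [ETH][IP][TCP][HTTP]GET / HTTP/1.1
console.log(receive(frame)); // GET / HTTP/1.1
```

The header added last by the sender is the first one removed by the receiver, which is why the two directions mirror each other.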

TCP three-way handshake

TCP is located at the transport layer, and its purpose is to provide a reliable byte stream service. To make sure data reaches the target accurately, TCP uses a three-way handshake when establishing a connection. The overall process is as follows:

The sender first sends a packet with the SYN flag to the peer. After receiving the packet, the receiving end sends a packet with the SYN/ACK flag to confirm the packet. Finally, the sender sends back a packet with an ACK flag, indicating the end of the handshake.
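The three packets can be modelled as plain objects. The sketch below goes slightly beyond the text by adding sequence and acknowledgment numbers, which is how each side shows that it really received the other's SYN; `threeWayHandshake` and the sample numbers are made up for illustration, and real TCP tracks far more state.

```javascript
// Simplified model of the three handshake packets.
function threeWayHandshake(clientIsn, serverIsn) {
  const log = [];
  // 1. Client -> server: SYN carrying the client's initial sequence number.
  const syn = { flags: ["SYN"], seq: clientIsn };
  log.push(syn);
  // 2. Server -> client: SYN/ACK. ack = client's seq + 1 acknowledges the SYN.
  const synAck = { flags: ["SYN", "ACK"], seq: serverIsn, ack: syn.seq + 1 };
  log.push(synAck);
  // 3. Client -> server: ACK. ack = server's seq + 1 acknowledges the server's SYN.
  const ack = { flags: ["ACK"], seq: syn.seq + 1, ack: synAck.seq + 1 };
  log.push(ack);
  return log;
}

for (const packet of threeWayHandshake(100, 300)) {
  console.log(packet.flags.join("/"), packet);
}
```

After the second packet the client knows the server heard it; only after the third packet does the server know the client heard it too.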

Many readers probably have this question about the TCP three-way handshake: why does it take three handshakes to be reliable, and not one or two? Let's use a simple example to illustrate.

Scene description: Two close friends lost contact after graduating from high school. Ten years later, friend A got hold of friend B's contact information and is going to call friend B to catch up and revive the old friendship.

Scenario analysis: In this scenario, friend A is the client and friend B is the server. The question is how to make sure that A and B can actually reconnect. From A's point of view, A has to be sure the person on the phone is B. From B's point of view, B has to be sure the person on the phone is A. Only when both can be sure who is on the other end can the conversation begin. Let's analyze the three handshakes from this angle.

First handshake: friend A dials the number and says, "Hello, is this friend B?" (This is like the client sending a packet with the SYN flag to the server.) At this point, A is not sure that the other party is B, and B is not sure that the other party is A.

Second handshake: friend B picks up the phone and says, "I'm friend B, who are you?" (This is like the server sending back a packet with the SYN/ACK flag as an acknowledgement.) After this sentence, A can confirm that the other party is B, but B still cannot confirm that the other party is A.

Third handshake: friend A, now sure that B is on the other end, excitedly replies, "Don't you recognize me? I'm friend A!" (This is like the client sending back a packet with the ACK flag.) After this sentence, B can confirm that the other end of the phone is A. Now both parties have confirmed each other, and the two old friends can start catching up.

Through this analysis of the three handshakes, we can see that only three handshakes let both sides confirm each other, while two cannot. I hope this gives you a better understanding of the three-way handshake.

TCP four-way wave

Above we talked about how TCP establishes a connection. Once a connection exists, it eventually has to be closed, and TCP needs four waves (four goodbye messages) to close it. Many people find this puzzling: the client sends a disconnection request, and the server receives it and sends back a confirmation. Why four messages when two would seem to be enough? The reason is that TCP is full duplex. Full duplex simply means that data can be transmitted in both directions at the same time. The client is therefore both a sender and a receiver, and so is the server. So we can split one TCP connection into two one-way connections, as follows:

1. The client is the sender and the server is the receiver.

2. The server is the sender and the client is the receiver.

Closing each one-way connection takes one disconnection request from its sender and one confirmation from its receiver, that is, two waves. Two one-way connections therefore need four waves in total. The whole process is as follows:

First wave: The client sends a FIN packet to close the client-to-server direction. At this point, the client is in the FIN_WAIT_1 state: it has no more data to send and is waiting for the server's confirmation.

Second wave: After receiving the FIN packet from the client, the server sends an ACK packet agreeing to the client's request, and the server enters the CLOSE_WAIT state. When the client receives the ACK packet, the client-to-server direction is closed and the client enters the FIN_WAIT_2 state.

Third wave: Once the server has finished sending its own data, it sends a FIN packet to the client, requesting to close the server-to-client direction. The server enters the LAST_ACK state and waits for confirmation from the client.

Fourth wave: After receiving the FIN packet from the server, the client sends an ACK packet agreeing to close the connection and enters the TIME_WAIT state. After receiving the ACK packet from the client, the server closes the server-to-client direction and enters the CLOSED state. The client is still waiting at this point; it automatically enters the CLOSED state after waiting for two MSLs (maximum segment lifetimes).
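The state changes in these four steps can be traced with a small script. This is a simplified sketch that hard-codes the sequence described above; `fourWayClose` and the step labels are made up for the example, and it leaves out timers, retransmission and simultaneous close.

```javascript
// Walk through the four closing messages and record both sides' states.
function fourWayClose() {
  let client = "ESTABLISHED";
  let server = "ESTABLISHED";
  const trace = [];
  const record = (step) => trace.push({ step, client, server });

  client = "FIN_WAIT_1";   // 1. client sends FIN
  record("client sends FIN");
  server = "CLOSE_WAIT";   // 2. server sends ACK ...
  client = "FIN_WAIT_2";   //    ... and the client receives it
  record("server sends ACK");
  server = "LAST_ACK";     // 3. server sends its own FIN
  record("server sends FIN");
  client = "TIME_WAIT";    // 4. client sends ACK ...
  server = "CLOSED";       //    ... and the server receives it
  record("client sends ACK");
  client = "CLOSED";       // client closes after waiting 2 MSL
  record("2 MSL elapsed");
  return trace;
}

for (const { step, client, server } of fourWayClose()) {
  console.log(step.padEnd(18), "client:", client.padEnd(11), "server:", server);
}
```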

The difference between URIs and URLs

When it comes to URLs and URIs, many people really can't tell the difference. URL stands for Uniform Resource Locator, and URI stands for Uniform Resource Identifier. Although the names are similar, they are easy to tell apart once you know that a URI identifies a resource, while a URL is the kind of URI that also says where the resource is and how to reach it. The scope of a URI is therefore greater than that of a URL: every URL is a URI, but not every URI is a URL. Let's take Taobao as an example. https://www.taobao.com/ and the address of each product are all URLs, and therefore URIs as well, whereas a name such as urn:isbn:0451450523 is a URI but not a URL, because it identifies a book without saying where to fetch it.

Many AJAX tools, such as jQuery's $.ajax method, name the address parameter url, while the HTTP specification calls the request address a URI. Since every URL is also a URI, both names refer to the same address here.

Simple HTTP protocol

HTTP request methods

Method | Description | Lowest supported protocol version
GET | Requests a resource on the server. | 1.0
POST | Submits data to the specified resource for processing; the data is contained in the request body. | 1.0
HEAD | Used to confirm the validity of the URI and the date and time the resource was updated. Only the packet header is returned. | 1.0
PUT | Transfers a file: the file content is put into the packet body and saved to the specified URI. | 1.1
DELETE | The opposite of PUT: asks the server to delete the resource at the specified URI. | 1.1
OPTIONS | Queries the methods supported for the resource specified by the request URI. | 1.1
TRACE | Used to trace the path of a request. The request carries the header field Max-Forwards, whose value is reduced by one at each server it passes through. When the value reaches 0, forwarding stops and the last server to receive the request returns the response. | 1.1
CONNECT | Used to establish a tunnel when communicating with a proxy server, so that TCP communication can go through the tunnel. | 1.1

HTTP/1.1

Persistent connections

In version 1.0, a TCP connection was closed after every HTTP exchange. Creating TCP connections is expensive, and frequently opening and closing them greatly increases overhead and hurts performance. Version 1.1 introduced persistent connections: as long as neither end explicitly asks to disconnect, the TCP connection is kept open.

Pipelining

Prior to version 1.1, within one TCP connection a client had to wait for the server's response before it could send the next request. The pipelining mechanism in 1.1 lets the client send several requests one after another without waiting for each response; the server still returns the responses in the order of the requests.

The Host header

When version 1.0 was designed, it was assumed that each server had its own unique IP address, so requests did not carry a hostname. With the development of Web technology, one physical server can host multiple virtual hosts that share the same IP address. The Host header field was added so that the server can tell which virtual host a request is meant for.

Chunked transfer encoding

In version 1.1, one TCP connection can carry multiple responses, so the client needs a way to tell where one response ends and the next begins. For this, version 1.1 uses the Content-Length field to mark the length of the response body.

For example, Content-Length: 2333 indicates that the body of the response is 2333 bytes long. The bytes after that do not belong to this response.

The prerequisite for using Content-Length is knowing the entire length of the data in advance. This means that a data block cannot be sent to the client as soon as it is generated; the server has to wait until all the data has been generated, and this waiting time hurts performance. To address this drawback, version 1.1 introduced chunked transfer encoding: the Transfer-Encoding: chunked field in the response header tells the client that the response body is made up of an undetermined number of chunks, each preceded by its length as a hexadecimal value. A final chunk of length 0 indicates that the response has been sent completely.
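To make the chunk format concrete, here is a small sketch of an encoder and decoder for a chunked body. The function names `encodeChunked` and `decodeChunked` are made up for this example, and the code assumes ASCII text so that string length equals byte length; it also ignores optional features such as chunk extensions and trailers.

```javascript
// Encode an array of string chunks as a chunked body.
// Assumes single-byte (ASCII) characters, so length in characters = length in bytes.
function encodeChunked(chunks) {
  let out = "";
  for (const chunk of chunks) {
    if (chunk.length === 0) continue; // a zero-length chunk would end the body early
    out += chunk.length.toString(16) + "\r\n" + chunk + "\r\n";
  }
  return out + "0\r\n\r\n"; // the zero-length chunk marks the end
}

// Decode a chunked body back into the original data.
function decodeChunked(body) {
  let pos = 0;
  let result = "";
  while (true) {
    const lineEnd = body.indexOf("\r\n", pos);
    if (lineEnd === -1) throw new Error("malformed chunk size line");
    const size = parseInt(body.slice(pos, lineEnd), 16); // size is hexadecimal
    if (Number.isNaN(size)) throw new Error("malformed chunk size");
    if (size === 0) return result;
    const start = lineEnd + 2;
    result += body.slice(start, start + size);
    pos = start + size + 2; // skip the CRLF that follows the chunk data
  }
}

const wire = encodeChunked(["Hello, ", "world"]);
console.log(JSON.stringify(wire));
console.log(decodeChunked(wire));
```

The sender never needs to know the total length: it can emit each chunk as soon as that piece of data is ready.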

The 100 status code

As Web applications grow more complex, servers often add permission control. If a client sends an HTTP request carrying a lot of data and the server then rejects it because the client lacks permission, all that data was transmitted for nothing. With the 100 status code, the client first sends only the request headers, including Expect: 100-continue. If the server answers with status code 100, the client goes on to send the request body; otherwise the body is never sent.

HTTP/2.0

Binary framing

In HTTP 2.0, a binary framing layer is added between the application layer and the transport layer. At this layer, HTTP 2.0 splits all transmitted information into smaller frames and encodes them in binary format. The header information of an HTTP/1.x message is encapsulated in HEADERS frames, while the message body is encapsulated in DATA frames. We used to transmit an HTTP packet as one unit, but now a packet is divided into multiple frames. Frames belonging to different streams can be interleaved on the connection, and the receiver reassembles them according to the stream identifier in each frame header. This greatly improves HTTP performance.

Multiplexing

Multiplexing allows multiple request-response messages to be in flight simultaneously over a single TCP connection. In HTTP/1.1, browsers limit the number of simultaneous requests to the same domain name, and requests beyond the limit are blocked. HTTP 2.0 addresses this with multiplexing: multiple request-response messages share a single HTTP/2 connection instead of relying on multiple TCP connections.

Header compression

Every HTTP request has a request header carrying important information such as Cookie and User-Agent. These fields are usually the same in every request, yet they must be sent each time, which is wasteful. Version 2.0 introduces a header compression mechanism to optimize this. The client and the server maintain the same header table. For a header that is already in the table, a request only needs to send its index number instead of the full key-value pair, which greatly reduces the waste.
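The shared-table idea can be shown with a toy model. Note that this is not the real HTTP/2 algorithm (HPACK), which also has a predefined static table, Huffman coding and table size limits; the class `HeaderTable` and its methods are invented purely to illustrate "send an index instead of the key-value pair".

```javascript
// Toy model of a header table shared by sender and receiver.
// Not HPACK: it only shows the "send the index instead of the text" idea.
class HeaderTable {
  constructor() {
    this.entries = [];        // index -> "name: value"
    this.lookup = new Map();  // "name: value" -> index
  }

  add(key) {
    this.lookup.set(key, this.entries.length);
    this.entries.push(key);
  }

  // Sender side: emit an index if the header was sent before, else the literal.
  encode(headers) {
    return headers.map(([name, value]) => {
      const key = `${name}: ${value}`;
      if (this.lookup.has(key)) return { index: this.lookup.get(key) };
      this.add(key);
      return { literal: key };
    });
  }

  // Receiver side: resolve indexes and learn new literals in the same order,
  // so both tables stay identical.
  decode(encoded) {
    return encoded.map((item) => {
      if ("index" in item) return this.entries[item.index];
      this.add(item.literal);
      return item.literal;
    });
  }
}

const sender = new HeaderTable();
const receiver = new HeaderTable();
const headers = [["user-agent", "demo-client"], ["cookie", "id=1"]];

console.log(sender.encode(headers)); // first request: literals
console.log(sender.encode(headers)); // second request: [{ index: 0 }, { index: 1 }]
```

The second request carries only two small numbers, and a receiver that processed the first request can expand them back into the full headers.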

Server push

In versions prior to 2.0, the server was passive and could only send a resource after the client requested it. In the 2.0 protocol, the server can proactively push resources to the client. For example, when the client requests an HTML page that needs JS and CSS, the client no longer has to parse the HTML first and then request those files one by one; the server can send them back together with the HTML.

Use cookies to manage state

HTTP is a stateless protocol, meaning that it does not keep track of previous requests and responses. In other words, a request cannot be processed based on earlier state. For example, if we visit a web page that requires login and the login state is not managed afterwards, we would have to log in again every time we request a new page. That is a bad experience. Cookie technology was introduced to solve this problem.

First request (no cookie yet): The client sends an authentication request to the server. After the authentication passes, the server puts the authentication information into a cookie and returns it to the client. The client stores the cookie locally.

Second and later requests (with a cookie): When the client sends another request, it attaches the locally stored cookie. The server verifies the authentication information carried by the cookie. If verification passes, the server returns the data directly; otherwise it redirects to the login page.

Since cookies are stored in the browser, JS can access them, and we can also use cookies for some local storage. The JS code for operating on cookies is given below.

// Write a cookie that expires after `exdays` days
function setCookie(cname, cvalue, exdays) {
    var d = new Date();
    d.setTime(d.getTime() + (exdays * 24 * 60 * 60 * 1000));
    var expires = "expires=" + d.toUTCString();
    document.cookie = cname + "=" + cvalue + "; " + expires;
}

// Read a cookie by name; returns "" if it is not set
function getCookie(cname) {
    var name = cname + "=";
    var ca = document.cookie.split(';');
    for (var i = 0; i < ca.length; i++) {
        var c = ca[i];
        while (c.charAt(0) == ' ') c = c.substring(1);
        // Match only at the start, so that "id=" does not match "uid=..."
        if (c.indexOf(name) == 0) return c.substring(name.length, c.length);
    }
    return "";
}

// Clear a cookie by giving it an expiry date in the past
function clearCookie(name) {
    setCookie(name, "", -1);
}

Information contained in HTTP packets

HTTP packet structure

The information exchanged over HTTP is called HTTP packets (messages). Packets sent by the requesting end (client) are called request packets, and those sent by the responding end (server) are called response packets. An HTTP packet is itself a text string made up of multiple lines of data (using CR+LF as the line break). An HTTP packet consists of a start line (a request line or a response line), packet headers, and a packet body. The headers and the body are separated by the first blank line (CR+LF). The packet header is the key part, and we will elaborate on it at length later in the article, so we will not cover it here.

A request line consists of three parts: request method, URI address, and HTTP version.

For example:

POST https://segmentfault.com/api/article/draft/save HTTP/1.1

The response line consists of the HTTP version number, HTTP status code, and status description.

For example:

HTTP/1.1 200 OK
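Both kinds of start line can be split into their three parts with a few lines of code. The helpers below, `parseRequestLine` and `parseStatusLine`, are illustrative names rather than a standard API, and they assume the parts are separated by single spaces.

```javascript
// Split a request line into method, URI and HTTP version.
function parseRequestLine(line) {
  const match = /^(\S+) (\S+) (\S+)$/.exec(line.trim());
  if (!match) throw new Error("malformed request line");
  return { method: match[1], uri: match[2], version: match[3] };
}

// Split a response (status) line into version, status code and description.
// The description may contain spaces, e.g. "Not Found", or be absent.
function parseStatusLine(line) {
  const match = /^(\S+) (\d{3})(?: (.*))?$/.exec(line.trim());
  if (!match) throw new Error("malformed status line");
  return { version: match[1], status: Number(match[2]), reason: match[3] || "" };
}

console.log(parseRequestLine("POST https://segmentfault.com/api/article/draft/save HTTP/1.1"));
console.log(parseStatusLine("HTTP/1.1 200 OK"));
```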

HTTP can transfer data as it is, but the data can also be encoded during transmission to increase the transfer rate. Encoding at transfer time lets a large number of access requests be handled efficiently. However, the encoding has to be done by the computer, so it consumes more resources such as CPU.

Content encoding for compressed transmission

When adding an attachment to an email, we often ZIP the file first to reduce the size of the email and then attach it. A feature of the HTTP protocol called content encoding does something similar.

Content encoding specifies the encoding format applied to the entity content, and the entity is transmitted in that compressed form. The client receives the encoded entity and decodes it.

Common content encoding formats:

  • gzip
  • compress
  • deflate
  • identity

Chunked transfer encoding for sending in pieces

We covered this above when we introduced version 1.1 of the HTTP protocol. Together with the accompanying picture, you can go through the whole process again.

Multipart object collections for sending multiple kinds of data

When we send an email, we can attach pictures, videos and other data to it. This is possible thanks to the MIME mechanism, which allows mail to handle different types of data, such as text, images, and video. MIME uses a multipart object collection to hold several pieces of data of different types.

When we upload data such as pictures or text files over HTTP, we use the same multipart object collection approach.

The multi-part object collection contains the following objects:

  • multipart/form-data

    Web form for uploading files

  • multipart/byteranges

    Used when a response with status code 206 contains multiple ranges of content.

For example:

Content-Type: multipart/form-data; boundary=WebKitFormBoundary5V53Jp7BUFBGzu9B
// Note: boundary specifies the delimiter that marks where each part of the multipart object collection starts and ends

--WebKitFormBoundary5V53Jp7BUFBGzu9B
Content-Disposition: form-data; name="image"; filename="Table 2: Values scale. Numbers"
Content-Type: application/x-iwork-keynote-sffnumbers

--WebKitFormBoundary5V53Jp7BUFBGzu9B--

// Start tag: --WebKitFormBoundary5V53Jp7BUFBGzu9B
// End tag: --WebKitFormBoundary5V53Jp7BUFBGzu9B--

Range requests for fetching part of the content

Long ago, when Internet technology was not yet well developed, downloading something large like a game was risky: if the download was interrupted, you had to start again from scratch. That kind of pain is hard to imagine in today's era of high-speed networks. Of course, I can't really imagine it either, because I was very young at the time, ha ha! To address this pain, range requests were developed so that interrupted downloads could be resumed.

When performing a Range request, the header field Range is used to specify the byte Range of the resource.

1. Range: bytes=5001-10000 (bytes 5001 to 10000)
2. Range: bytes=5001- (everything from byte 5001 onward)
3. Range: bytes=-3000, 5000-7000 (multiple ranges: the last 3000 bytes, and bytes 5000 to 7000)

As mentioned above, the response to a range request returns status code 206. For a request with multiple ranges, the response carries the header field Content-Type: multipart/byteranges.
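As a rough sketch of how a server might interpret these values, the function below turns a Range value into concrete start and end offsets for a resource of a known size. `parseRange` is a made-up helper, not part of any library, and it handles only the `bytes` unit.

```javascript
// Parse a Range header value into { start, end } pairs (inclusive offsets)
// for a resource of `size` bytes. Returns null if the value is not a usable
// byte range.
function parseRange(value, size) {
  const match = /^bytes=(.+)$/.exec(value.trim());
  if (!match) return null;
  const ranges = [];
  for (const part of match[1].split(",")) {
    const m = /^(\d*)-(\d*)$/.exec(part.trim());
    if (!m || (m[1] === "" && m[2] === "")) return null;
    let start;
    let end;
    if (m[1] === "") {
      // Suffix form "-N": the last N bytes of the resource.
      start = Math.max(size - Number(m[2]), 0);
      end = size - 1;
    } else {
      start = Number(m[1]);
      // Open-ended form "N-": up to the last byte.
      end = m[2] === "" ? size - 1 : Math.min(Number(m[2]), size - 1);
    }
    if (start > end) return null;
    ranges.push({ start, end });
  }
  return ranges;
}

console.log(parseRange("bytes=5001-10000", 20000));
console.log(parseRange("bytes=5001-", 20000));
console.log(parseRange("bytes=-3000, 5000-7000", 20000));
```

A resumed download is then just a request for `bytes=N-`, where N is the number of bytes already saved.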

Content negotiation returns the most appropriate content

With the tide of globalization, a large number of international companies have emerged. For example, YouTube, the internationally famous video website, is visited every day by users from all over the world. One problem is that different countries speak different languages. YouTube cannot treat everyone the same and show only English, which would not fit the image of an international company. So how can a website respond to people from different countries in different languages? The content negotiation mechanism came into being to solve this problem. If our browser is set to simplified Chinese, YouTube will show us simplified Chinese when we visit it, and so on.

The content negotiation mechanism is where the client and server negotiate the response resources and then provide the resources that are most suitable for the client.

Some header fields of the request packet are used as the criteria for judgment, as follows:

  • Accept
  • Accept-Charset
  • Accept-Encoding
  • Accept-Language
  • Content-Language

There are three types of content negotiation techniques:

  • Server-driven negotiation (server-side content negotiation)
  • Client-driven negotiation (client-side content negotiation)
  • Transparent negotiation (server- and client-side content negotiation)

HTTP status codes that report the result

When the client sends a request to the server, the status code describes the result of that request. It lets the user know whether the server handled the request normally or an error occurred. Here we list the common HTTP status codes. For those who are interested, a complete list is available at: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Status

1XX

100 Continue

The client should continue sending the request. This provisional response tells the client that the part of the request received so far has been accepted by the server and has not been rejected. The client should send the rest of the request, or ignore this response if the request has already been completed. The server must send a final response to the client after the request completes.

101 Switching Protocols

The server has understood the client’s request and will inform the client via the Upgrade header that a different protocol is used to complete the request. After sending the blank line at the end of this response, the server will switch to the protocols defined in the Upgrade header.

2XX

200 OK (Success)

Indicates that the request from the client was processed normally by the server.

204 No Content

The server successfully processed the request, but the response contains no entity body. For example, suppose the page has an <a> tag whose href attribute is set to HTTP-204.html. Clicking the tag would normally navigate to HTTP-204.html. However, if the response code for HTTP-204.html is 204, the page does not navigate and stays where it is.

206 Partial Content

This status code, mentioned earlier with range requests, indicates that the client made a range request and that the server successfully executed that partial GET request. The response contains the entity content in the range specified by Content-Range.

3XX

301 Moved Permanently

This status code indicates that the requested resource has been assigned a new URI and that the URI to which the resource now refers should be used later. The header field of the response message marks the new URI with the Location field.

302 Found (Temporary Redirection)

This status code indicates that the requested resource has been temporarily assigned a new URI. Since the redirect is temporary, the client should continue to send future requests to the original address. As with 301, the new URI is given in the Location header field of the response.

303 See Other

This status code indicates that the response to the current request can be found at another URI, and that the client should access that resource using GET. The point to note is that 303 explicitly requires the GET method. As above, the new URI is given in the Location header field of the response.

304 Not Modified

This status code indicates that the requested resource has not been modified. When the server returns this status code, no resource body is returned. Clients typically cache the resources they have accessed and send a header such as If-Modified-Since, indicating that they only want the resource returned if it has been modified after the specified date.

307 Temporary Redirect

In meaning it is the same as 302 Found. So why did 307 appear? The standard forbids 302 from changing POST into GET, yet in practice clients often do exactly that. 307 strictly abides by the standard and does not change POST to GET, which is useful when redirecting POST requests.

4XX

400 Bad Request

This status code indicates that the server cannot understand the current request because of a syntax error. The request needs to be modified and sent again. During development, people tend to blame every 400 on the front end, and the back end then ignores it. In most cases, though, the syntax error is not a front-end coding mistake. Rather, some field types sent by the front end do not match what the back end expects: for example, the back end accepts a Long or Date type while the front end passes a String, or the front end passes JSON while the back end does not parse JSON. Many of these problems come from poor coordination between the front end and the back end, rather than being purely front-end problems.

401 Unauthorized

This status code indicates that the current request requires user authentication. The response must include a WWW-Authenticate header applicable to the requested resource to ask for user information.

403 Forbidden

This status code indicates that access to the requested resource has been denied by the server. This is most common in crawlers, where they find you crawling their site’s data and restrict your IP, resulting in 403.

404 Not Found

This status code indicates that the requested resource was not found on the server. This happens all the time, for example when your hand slips and you mistype the address, or when the website you are visiting has been blocked.

405 Method Not Allowed

This status code indicates that the request method specified in the request line cannot be used to request the corresponding resource. The response message must return an Allow header representing a list of request methods that the current resource can accept.

5XX

500 Internal Server Error

This status code indicates that the server has made an error while executing the request, either because of a bug in the code or some temporary glitch.

503 Service Unavailable

This status code indicates that the server is temporarily overloaded or is down for maintenance and cannot process requests at this time.

The HTTP header

When we talked about the structure of HTTP packets, we said that the HTTP header is a very important piece, and we will talk about it at length. We will focus on this question here.

First of all, we need to be clear about what the packet header is made of. A packet header is either a request header or a response header. A request header can contain request header fields, general header fields, entity header fields and other extension header fields, while a response header can contain response header fields, general header fields, entity header fields and other extension header fields. From this we can conclude:

Packet header = request header fields / response header fields + general header fields + entity header fields + other extension header fields

Next, we go through the request header fields, response header fields, general header fields and entity header fields in turn. Because there are so many fields, we only cover the more common ones. The complete list of HTTP headers can be found at: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Headers

Request header fields

Accept

This field is used to specify which types of information the client accepts.

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8

In the example above, q= assigns a weight (priority) to a media type, and a semicolon (;) separates it from the type. The value of q ranges from 0 to 1 (up to 3 decimal places), with 1 being the highest. If q is not specified, the weight defaults to q=1.0. In addition, you can use (*) as a wildcard to match any type.
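The weighting rule can be illustrated with a small parser that orders the listed types by their q value. `parseAccept` is a hypothetical helper written for this article; it ignores parameters other than q, and the same approach would apply to Accept-Charset, Accept-Encoding and Accept-Language.

```javascript
// Parse an Accept-style header value and sort the entries by weight.
function parseAccept(value) {
  return value
    .split(",")
    .map((item, order) => {
      const [name, ...params] = item.trim().split(";");
      let q = 1.0; // default weight when q is not given
      for (const param of params) {
        const [key, raw] = param.trim().split("=");
        const parsed = parseFloat(raw);
        if (key.toLowerCase() === "q" && !Number.isNaN(parsed)) q = parsed;
      }
      return { value: name.trim(), q, order };
    })
    // Highest weight first; entries with equal weight keep their original order.
    .sort((a, b) => b.q - a.q || a.order - b.order)
    .map(({ value, q }) => ({ value, q }));
}

const accepted = parseAccept(
  "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
);
for (const { value, q } of accepted) {
  console.log(q.toFixed(1), value);
}
```

A server doing server-driven negotiation would walk this sorted list and return the first representation it can actually produce.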

Accept-Charset

This field is used to specify the character set accepted by the client.

Accept-Charset: iso-8859-15, unicode-1-1;q=0.8

The effect of Q here is the same as that of Accept, so I will not repeat it. The wildcard (*) is also consistent.

Accept-Encoding

Used to specify the content encoding acceptable to the client.

Accept-Encoding: gzip, deflate;q=0.9, br

Here we can also use q to indicate its priority, the same as Accept. In addition, the wildcard character (*) is also consistent.

Accept-Language

Used to specify the set of natural languages (Chinese, English, etc.) acceptable to the client.

Accept-Language: zh-CN,zh;q=0.9,en;q=0.8

The weight value q is the same as Accept, and the wildcard character (*) is the same.

Authorization

It is used to prove that a client has the right to access a resource. Typically, after receiving a 401 response, a user agent that wants to authenticate with the server adds the Authorization header field to its next request.

Authorization: Basic YWxhZGRpbjpvcGVuc2VzYW1l
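With the Basic scheme, the value after the word Basic is simply user:password encoded in Base64, which is an encoding and not encryption. The sketch below shows both directions; the function names are made up, and it assumes Node.js, where `Buffer` is available globally (in a browser, `btoa` and `atob` play the same role).

```javascript
// Build a Basic Authorization header value from a user name and password.
// Assumes Node.js (global Buffer).
function encodeBasicAuth(user, password) {
  return "Basic " + Buffer.from(`${user}:${password}`, "utf8").toString("base64");
}

// Recover the user name and password from a Basic Authorization header value.
// Returns null if the value does not use the Basic scheme.
function decodeBasicAuth(headerValue) {
  const [scheme, token] = headerValue.trim().split(" ");
  if (!token || scheme.toLowerCase() !== "basic") return null;
  const decoded = Buffer.from(token, "base64").toString("utf8");
  const sep = decoded.indexOf(":"); // only the first ":" separates; a password may contain ":"
  if (sep === -1) return null;
  return { user: decoded.slice(0, sep), password: decoded.slice(sep + 1) };
}

console.log(decodeBasicAuth("Basic YWxhZGRpbjpvcGVuc2VzYW1l"));
// { user: 'aladdin', password: 'opensesame' }
```

Because anyone who sees the header can decode it like this, Basic authentication is normally used only over HTTPS.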

Expect

Used to specify an expected condition and tell the server that the request can be properly processed only if this expected condition is met. HTTP/1.1 specifies only one expected condition, namely:

Expect: 100-continue

The server returns either a status code of 100 indicating that the expected condition in the request header was met or a status code of 417 indicating that the expected condition in the request header was not met.

From

Used to tell the server the e-mail address of the person using the user agent that sent the request. For example, if you are running a bot (such as a crawler), the From header should be sent with its requests, so that the site administrator can contact you if the bot causes problems on the server, such as sending excessive, unwanted, or illegal requests.

From: [email protected]

Host

Specifies the domain name of the server (to distinguish virtual hosts) and, optionally, the TCP port number on which the server listens. If no port is specified, the default port of the requested service is assumed (for example, 80 for an HTTP URL).

Host: developer.cdn.mozilla.net

If-Match

Request header fields of the form If-xxx make a request conditional. When the server receives a conditional request, it executes the request only if it determines that the specified condition is true.

If-Match tells the server the ETag value that the resource is expected to have. The server compares the If-Match field value with the ETag of the resource and executes the request only when the two match. ETags are compared with the strong comparison algorithm, under which two files are considered identical only if every byte is identical. A W/ prefix on an ETag marks it as a weak validator, which stands for a looser comparison.

If-None-Match is the opposite of If-Match: the request is executed only when no ETag matches.

If-Match

If-Match: "bfc13a64729c4290ef5b2c2730249c88ca92d82d"
If-Match: W/"67ab43", "54ed21", "7892dd"
If-Match: *

If-None-Match

If-None-Match: "bfc13a64729c4290ef5b2c2730249c88ca92d82d"
If-None-Match: W/"67ab43", "54ed21", "7892dd"
If-None-Match: *
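The difference between strong and weak comparison is easier to see in code. The following is only a sketch: the helper names are hypothetical, and splitting the header on commas assumes that the ETag values themselves contain no commas. Following the HTTP specification, If-Match is evaluated with strong comparison and If-None-Match with weak comparison.

```python
def parse_etag(tag):
    """Return (is_weak, opaque_value) for a tag such as W/"67ab43" or "54ed21"."""
    tag = tag.strip()
    is_weak = tag.startswith("W/")
    if is_weak:
        tag = tag[2:]
    return is_weak, tag.strip('"')


def strong_match(a, b):
    # Strong comparison: both tags must be strong and their values identical.
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a, b):
    # Weak comparison: only the opaque values are compared, W/ is ignored.
    return parse_etag(a)[1] == parse_etag(b)[1]


def if_match_passes(header_value, current_etag):
    """True if the request should be executed under If-Match."""
    if current_etag is None:      # the resource does not exist
        return False
    if header_value.strip() == "*":
        return True
    return any(strong_match(t, current_etag) for t in header_value.split(","))


def if_none_match_passes(header_value, current_etag):
    """True if the request should be executed under If-None-Match."""
    if current_etag is None:
        return True
    if header_value.strip() == "*":
        return False
    return not any(weak_match(t, current_etag) for t in header_value.split(","))
```

For example, if_match_passes('W/"67ab43", "54ed21"', '"54ed21"') is true because of the second, strong tag, while the weak tag alone would never satisfy If-Match.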

If-Modified-Since/If-Unmodified-Since

If-Modified-Since: the server returns the requested resource, with status code 200, only if its content has been modified since the date and time given in If-Modified-Since. If it has not been modified, the server returns a 304 response without a message body. The date to send comes from the Last-Modified header of an earlier response.

If-Unmodified-Since is the opposite of If-Modified-Since: the request is executed only if the resource has not been modified since the given date and time.

If-Modified-Since

If-Modified-Since: Wed, 21 Oct 2015 07:28:00 GMT

If-Unmodified-Since

If-Unmodified-Since: Wed, 21 Oct 2015 07:28:00 GMT
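As a rough sketch of the server-side decision, the dates can be parsed with Python's standard email.utils module and compared. The function names are made up for illustration. The 412 (Precondition Failed) status for a failed If-Unmodified-Since comes from the HTTP specification, not from the text above.

```python
from email.utils import parsedate_to_datetime


def status_for_if_modified_since(header_value, last_modified):
    """Return 200 if the resource changed after the given date, otherwise 304."""
    since = parsedate_to_datetime(header_value)
    modified = parsedate_to_datetime(last_modified)
    return 200 if modified > since else 304


def status_for_if_unmodified_since(header_value, last_modified):
    """Return 200 if the resource is unchanged since the given date, otherwise 412."""
    since = parsedate_to_datetime(header_value)
    modified = parsedate_to_datetime(last_modified)
    return 200 if modified <= since else 412


print(status_for_if_modified_since(
    "Wed, 21 Oct 2015 07:28:00 GMT",   # value sent by the client
    "Thu, 22 Oct 2015 08:00:00 GMT",   # hypothetical Last-Modified of the resource
))
```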

If-Range

This field is used together with the Range field. The Range header takes effect only when the condition in the If-Range value (an ETag or a date) is met; in that case the server replies with status code 206 (Partial Content) and the part specified by Range. Otherwise, the server returns status code 200 and the complete resource.

If-Range: Wed, 21 Oct 2015 07:28:00 GMT

Max-Forwards

We mentioned the Max-Forwards field back in the section on the TRACE method. When communicating over HTTP, a request may pass through several proxy servers. If one of them fails to forward the request for some reason, we have no way of telling which one it was. With Max-Forwards set, each proxy server that forwards the request decrements the value by 1, and a proxy that receives the value 0 does not forward the request any further but responds itself. This makes it easy to work out which proxy server is causing the problem.
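The countdown can be simulated in a few lines. This is a toy model, not real proxy code: the proxy names and the function forward_trace are made up for illustration.

```python
def forward_trace(max_forwards, proxy_chain):
    """Return the name of the node that ends up answering the TRACE request."""
    for proxy in proxy_chain:
        if max_forwards == 0:
            return proxy          # value is 0: this proxy answers instead of forwarding
        max_forwards -= 1         # otherwise decrement and pass the request on
    return "origin server"


chain = ["proxy-a", "proxy-b", "proxy-c"]   # hypothetical proxies on the path
for hops in range(4):
    print(hops, forward_trace(hops, chain))
```

Sending the request with Max-Forwards set to 0, 1, 2 and so on probes one hop further each time, so the first value that gets no answer points at the broken proxy.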

Proxy-Authorization

This field works the same way as Authorization. The difference is that Proxy-Authorization is used for authentication between the client and a proxy server, while Authorization is used for authentication between the client and the origin server.

Proxy-Authorization: Basic YWxhZGRpbjpvcGVuc2VzYW1l

Range

This field is used for range requests, which fetch only part of a resource; the Range field specifies which part. If the range request is processed successfully, status code 206 (Partial Content) is returned. If the specified range lies outside the resource, status code 416 (Range Not Satisfiable) is returned. If the server does not process the range (for example, because it does not support range requests), it returns status code 200 with the complete resource.

Range: bytes=200-1000, 2000-6576, 19000-
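Here is a sketch of how a server might turn a Range value into concrete byte positions and pick the status code. The function name resolve_ranges is an assumption of mine, and real servers do more (for example, merging overlapping ranges).

```python
def resolve_ranges(header_value, resource_size):
    """Turn 'bytes=200-1000, 2000-6576, 19000-' into (status, [(start, end), ...]).
    Both ends are inclusive, as in HTTP."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes":
        return 200, []                       # unknown unit: ignore Range, send everything
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":                      # suffix form: '-500' means the last 500 bytes
            length = int(last)
            start = max(resource_size - length, 0)
            end = resource_size - 1
            if length == 0:
                continue
        else:
            start = int(first)
            end = int(last) if last else resource_size - 1
            end = min(end, resource_size - 1)
        if start <= end and start < resource_size:
            ranges.append((start, end))
    if not ranges:
        return 416, []                       # nothing satisfiable
    return 206, ranges


print(resolve_ranges("bytes=200-1000, 2000-6576, 19000-", 67589))
```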

Referer

The address of the source page of the currently requested page, which indicates that the current page is accessed through a link in this source page.

Referer: https://developer.mozilla.org/en-US/docs/Web/JavaScript

TE

This field tells the server which transfer codings the client can handle in the response, and their relative priority. Not to be confused with Accept-Encoding: TE is about transfer coding, while Accept-Encoding is about content coding.

TE: trailers, deflate;q=0.5

User-Agent

The application type, operating system, software developer, and version number of the browser or user agent that initiates the request.

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36

Response header field

Accept-Ranges

Tells the client whether the server can handle range requests and, if so, in which unit ranges are expressed. There are two possible values: none and bytes.

Accept-Ranges: bytes // The unit for range requests is bytes
Accept-Ranges: none // Range requests are not supported

Age

When the cache server responds to a request with its own cached resource, this header identifies the length of time, in seconds, that the resource has been cached by the cache server.

Age: 600

ETag

A unique identifier that the server assigns to each version of a resource. When the resource is updated, the ETag is updated accordingly. ETags come in two kinds, strong and weak.

ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4" // Strong ETag: the value changes no matter how small the change to the entity
ETag: W/"0815" // Weak ETag: the value changes only when the resource changes in a meaningful way; the field value starts with W/

Location

Specify the address to redirect the page to, which is typically meaningful only in response codes 3xx.

Location: http://www.haimaiche.com/index.html

Retry-After

Tells the client how long it should wait before sending the request again. It is used mainly with 503 Service Unavailable or 3xx redirect responses. The value can be either a specific date and time or the number of seconds after the response was created.

Retry-After: Wed, 21 Oct 2015 07:28:00 GMT
Retry-After: 120

Server

Contains information about the software used by the server.

Server: Apache/2.4.1 (Unix)

Vary

Used to control caching. When a proxy server caches a response that contains Vary, it may answer a later request from that cache only if the request carries the same values for the header fields listed in Vary. Even for the same resource, if the value of a header field listed in Vary differs, the resource must be fetched from the origin server again.

Vary: Accept-Language

Note: with the Vary value above, when the proxy server receives a request for the resource, it returns the response directly from the cache only if the Accept-Language value is the same as that of the cached request. Otherwise, it must first fetch the resource from the origin server and then return it as the response.
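One way to picture this is a cache whose key is made of the URL plus the values of the request headers named in Vary. The class below is a toy model under that assumption; VaryAwareCache and its method names are invented for this example and ignore expiry entirely.

```python
class VaryAwareCache:
    """Toy cache keyed on the URL plus the request headers named in Vary."""

    def __init__(self):
        self._vary_names = {}  # url -> header names listed in the Vary response header
        self._responses = {}   # (url, header values) -> cached body

    def _key(self, url, request_headers):
        headers = {name.lower(): value for name, value in request_headers.items()}
        names = self._vary_names.get(url, ())
        return url, tuple(headers.get(name, "") for name in names)

    def store(self, url, request_headers, vary_value, body):
        names = tuple(n.strip().lower() for n in vary_value.split(",") if n.strip())
        self._vary_names[url] = names
        self._responses[self._key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        # None means a cache miss: the proxy has to ask the origin server.
        return self._responses.get(self._key(url, request_headers))


cache = VaryAwareCache()
cache.store("/index.html", {"Accept-Language": "zh-CN"}, "Accept-Language", "zh-CN page")
print(cache.lookup("/index.html", {"Accept-Language": "zh-CN"}))  # served from the cache
print(cache.lookup("/index.html", {"Accept-Language": "en"}))     # miss, go to the origin
```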

WWW-Authenticate

Defines which authentication scheme must be used to gain access to the resource.

WWW-Authenticate: Basic realm="Access to the staging site"

Generic header field

Cache-Control

Specifies directives that control cache behavior. Several directives can be given at once, separated by commas (,).

Cache-Control: no-cache, no-store, must-revalidate

Cache request directives

Directive | Parameter | Description
no-cache | none | Forces revalidation with the origin server
no-store | none | Nothing in the request or response may be cached
no-transform | none | Proxies must not change the media type
only-if-cached | none | Fetch the resource from the cache only
max-age= | required (seconds) | The maximum Age value of the response
max-stale[=] | optional (seconds) | Accept a response that has already expired
min-fresh= | required (seconds) | Expect a response that will still be fresh for at least the specified time

Cache response directives

Directive | Parameter | Description
no-cache | none | The cache must revalidate the stored response before using it
no-store | none | Nothing in the request or response may be cached
no-transform | none | Proxies must not change the media type
public | none | The response may be cached and served to any user
private | none | The response is returned only to a specific user
must-revalidate | none | Cacheable, but must be revalidated with the origin server
proxy-revalidate | none | Intermediate cache servers are required to revalidate the cached response
max-age= | required (seconds) | The maximum Age value of the response
s-maxage= | required (seconds) | The maximum Age value of the response on shared (public) cache servers
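To show how a cache might read these directives, here is a small parsing sketch. The function names are my own, and the parser assumes no commas inside quoted parameter values, which a full implementation would have to handle.

```python
def parse_cache_control(header_value):
    """Parse 'public, max-age=3600' into {'public': True, 'max-age': 3600}."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, separator, value = part.partition("=")
        name = name.strip().lower()
        if not separator:
            directives[name] = True          # directive without a parameter
        else:
            value = value.strip().strip('"')
            directives[name] = int(value) if value.isdigit() else value
    return directives


def may_store(directives):
    # no-store forbids keeping any copy of the response.
    return "no-store" not in directives


def is_fresh(directives, age_seconds):
    # A stored response is fresh while its Age is below max-age;
    # no-cache forces revalidation regardless.
    if "no-cache" in directives:
        return False
    return age_seconds < directives.get("max-age", 0)


print(parse_cache_control("no-cache, no-store, must-revalidate"))
```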

Connection

Determines whether the network connection is closed after the current transaction completes. In HTTP/1.1 the default is keep-alive (persistent connection); in HTTP/1.0 it is close (non-persistent connection).

Connection: keep-alive
Connection: close

Date

Indicates the date and time at which the HTTP message was created.

Date: Wed, 21 Oct 2015 07:28:00 GMT

Pragma

Used only for backward compatibility with cache servers that support HTTP/1.0 only.

Pragma: no-cache // the only defined form

Trailer

Allows the sender to append additional meta information to the end of a message that is sent in chunks. It is used with chunked transfer encoding.

HTTP/1.1 200 OK
Content-Type: text/plain
...
Transfer-Encoding: chunked
Trailer: Expires

... (message body) ...
0
Expires: Wed, 21 Oct 2015 07:28:00 GMT

In the example above, the value of the Trailer header field is Expires, so the Expires header field appears after the message body (after the chunk of length 0).
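The wire format behind this example can be sketched as follows: each chunk is preceded by its length in hexadecimal, a chunk of length 0 ends the body, and the trailer fields come after it. The function encode_chunked is a hypothetical helper written for this article, not a library API.

```python
def encode_chunked(chunks, trailers=None):
    """Encode byte chunks as an HTTP/1.1 chunked body with optional trailer fields."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue                          # a zero-length chunk would end the body early
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n"                           # last chunk: length 0
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")
    out += b"\r\n"                            # blank line ends the message
    return bytes(out)


body = encode_chunked(
    [b"Hello, ", b"world"],
    {"Expires": "Wed, 21 Oct 2015 07:28:00 GMT"},
)
print(body.decode("ascii"))
```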

Transfer-Encoding

Specifies the transfer coding used to transmit the message body.

Transfer-Encoding: chunked

Via

Used to trace the transmission path of request and response messages between client and server. When a message passes through a proxy or gateway, that server appends its own information to the Via header field before forwarding the message. Often used together with the TRACE method.

Via: 1.0 fred, 1.1 p.example.net

Warning

Carries a warning that informs the user of cache-related problems.

Warning: <warn-code> <warn-agent (host:port)> "<warn-text>" ["<date and time>" (optional)]
Warning: 112 gw.hacker.jp:8080 "cache down" "Wed, 21 Oct 2015 07:28:00 GMT"

Warning codes

Warning code | Warning text | Description
110 | Response is Stale | The response provided by the cache server has expired (its freshness lifetime has passed).
111 | Revalidation Failed | Revalidation of the response failed because the server could not be reached.
112 | Disconnected Operation | The cache server is disconnected from the network.
113 | Heuristic Expiration | The cache server chose a heuristic freshness lifetime of more than 24 hours, and the response is older than 24 hours.
199 | Miscellaneous Warning | Arbitrary warning message.
214 | Transformation Applied | Added by a proxy server if it transforms the returned representation in any way, such as changing the content encoding or media type.
299 | Miscellaneous Persistent Warning | Arbitrary warning message, like 199, but persistent.

Entity header field

Allow

Tell the client which HTTP methods the resource supports.

Allow: GET, POST, HEAD

Content-Encoding

Inform the client of the content encoding method chosen by the server for the body of the entity.

Content-Encoding: gzip

Content-Language

Tells the client the natural language used by the entity body.

Content-Language: zh-CN

Content-Length

Indicates the size of the body part of the entity, in bytes.

Content-Length: 15000

Content-Location

Indicates the URI of the returned data, which is used to specify the result of content negotiation of the resource to be accessed.

Content-Location: http://www.hacker.jp/index.html

Content-Range

Mainly used in responses to range requests, to tell the client which part of the entity is being sent and how large the whole entity is.

Content-Range: bytes 200-1000/67589

Content-Type

Indicates the media type of the object within the entity body.

Content-Type: text/html; charset=utf-8

Expires

Tells the client when the resource expires. When a cache server receives a response containing the Expires header field, it answers requests with its stored copy of the response until the time specified by Expires. After that time, the cache server requests the resource from the origin server again. Expires is ignored when Cache-Control contains a max-age or s-maxage directive.

Expires: Thu, 01 Dec 1994 16:00:00 GMT

Last-Modified

Indicates the final modification time of the resource.

Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT

Other extension header fields

Set-Cookie

Used to send cookies from the server to the client

Set-Cookie: id=a3fWa; Domain=somecompany.co.uk; Path=/; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly

  • Domain: specifies the hosts to which the cookie can be sent
  • Path: specifies the path to which the cookie can be sent
  • Expires: specifies the expiration date of the cookie
  • Max-Age: the number of seconds until the cookie expires. IE8 and below do not support this attribute. Max-Age takes priority over Expires
  • Secure: the cookie is sent only over secure HTTPS connections
  • HttpOnly: prevents JavaScript from reading the cookie
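Python's standard http.cookies module can build and parse these headers, which makes for a quick way to experiment with the attributes listed above. This is only a sketch using the example values from this article; note that the module emits the attributes in its own order, which may differ from the line shown earlier.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header with the attributes described above.
cookie = SimpleCookie()
cookie["id"] = "a3fWa"
cookie["id"]["domain"] = "somecompany.co.uk"
cookie["id"]["path"] = "/"
cookie["id"]["expires"] = "Wed, 21 Oct 2015 07:28:00 GMT"
cookie["id"]["secure"] = True
cookie["id"]["httponly"] = True
header_line = cookie.output()
print(header_line)

# Server side, later request: parse the Cookie header sent back by the browser.
received = SimpleCookie()
received.load("PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1")
values = {name: morsel.value for name, morsel in received.items()}
print(values)
```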

Cookie

Contains the cookies that the server previously set through the Set-Cookie header and that the client has stored.

Cookie: PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1;

DNT

A request header. DNT stands for Do Not Track and expresses the user's wish not to be tracked, for example by targeted advertising.

DNT: 0 // The user allows the target site to track their personal information.
DNT: 1 // The user does not allow the target site to track their personal information.

X-Frame-Options

Used mainly in HTTP responses to control whether the page may be displayed inside a frame on another web site. Its main purpose is to prevent clickjacking attacks.

There are three values:

  • DENY: indicates that the page is not allowed to be displayed in the frame, even if the page is nested in the same domain name.
  • SAMEORIGIN: indicates that the page can be displayed in the frame of the same domain name page.
  • ALLOW-FROM uri: indicates that the page can be displayed in a frame on the specified origin.

X-Frame-Options: DENY
X-Frame-Options: SAMEORIGIN
X-Frame-Options: ALLOW-FROM http://caibaojian.com/

X-XSS-Protection

When a cross-site scripting attack (XSS) is detected, the browser stops loading the page. For modern browsers, the more powerful Content-Security-Policy is preferred. For more on Content-Security-Policy, see the article “Content Security Policy Introduction” by Ruan Yifeng.

X-XSS-Protection: 0 // Disables XSS filtering.
X-XSS-Protection: 1 // Enables XSS filtering (usually the browser default). If a cross-site scripting attack is detected, the browser sanitizes the page (removes the unsafe parts).
X-XSS-Protection: 1; mode=block // Enables XSS filtering. If an attack is detected, the browser does not sanitize the page but blocks it from loading.
X-XSS-Protection: 1; report=<reporting-uri> // Enables XSS filtering. If a cross-site scripting attack is detected, the browser sanitizes the page and sends a violation report, using the functionality of the CSP report-uri directive.

HTTPS: securing the Web

I'm sure you are all familiar with HTTPS. But what exactly is it? Simply put, HTTPS is the secure version of HTTP, with security controls added on top of HTTP. So what are the security problems with HTTP, and how does HTTPS solve them?

HTTP shortcomings

Communications using plaintext may be eavesdropped

HTTP provides no encryption, so HTTP messages are sent in plain text. The network devices along the way across the Internet are not all yours, so you cannot rule out that one of them is snooping maliciously at some point.

Suppose you deliberately encrypt the message with symmetric encryption before sending it. Setting efficiency aside (the recipient has to decrypt every message), does that really keep it safe? No. To decrypt the ciphertext, the recipient must first know how it was encrypted and with which key. That information has to be sent from the sender to the recipient, and the transmission of the key or the encryption method can itself be eavesdropped on. So encrypting the data alone cannot guarantee its security.

Failure to verify the identity of the communicating party may expose you to impersonation

HTTP does not verify who it is communicating with. Whoever sends a request gets a response. The client or the server may be an impostor, and resources that should be protected by access control may be stolen. DoS attacks exploit exactly this lack of verification: they send massive numbers of meaningless requests that exceed the server's capacity and bring it down.

Message integrity could not be proved and may have been tampered with

As mentioned above, the Internet is full of devices that do not belong to you, so it is hard to guarantee that nobody intercepts your message and tampers with it. There is no guarantee that the file the server sent is identical to the file the client received.

HTTPS communication process

Step 1: The client sends the server a message containing the SSL/TLS protocol versions and the encryption components (encryption algorithms and key lengths) that the client supports.
Step 2: From those, the server selects the SSL/TLS protocol version and encryption component to be used for the rest of the communication and sends its choice to the client.
Step 3: The server then sends another message containing its public key certificate. The private key that matches the public key stays with the server.
Step 4: After receiving the public key certificate, the client verifies that it was really issued by the certificate authority named in it. If the certificate is valid, the client randomly generates a key based on the encryption component negotiated with the server. This key will be used to encrypt subsequent messages. The client encrypts the generated key with the server's public key and sends it to the server.
Step 5: The server receives the encrypted key and decrypts it with the private key it kept, obtaining the key.
Step 6: From then on, all messages between the client and the server are encrypted with this key.

The above process may be difficult to understand on first reading, so let me summarize it again. The whole HTTPS communication can be divided into three phases:

1. Determine the encryption component through negotiation.
2. Determine the encryption and decryption key.
3. Communicate using that key.

To be clear, the key used for encryption and decryption is not generated by the server, but by the client. We’ll talk about some of the details later.

HTTP+ Encryption + Authentication + Integrity Protection =HTTPS

First of all, let's be clear: HTTPS is not a new application-layer protocol. It is simply HTTP with its communication interface replaced by SSL or TLS. Whereas HTTP talks to TCP directly, HTTPS talks to SSL/TLS first, and SSL/TLS then talks to TCP.

Encryption

As mentioned earlier, HTTP is transmitted in plain text, which means others can snoop on your information. And even if you use a symmetric encryption algorithm, there is no guarantee that the key will not be stolen while it is being sent. Put the other way round: as long as we can deliver the key safely, the rest of the communication has nothing to fear from prying eyes. For this, HTTPS uses asymmetric encryption, in which the public key is used for encryption and the private key for decryption. The server sends its public key to the client, the client uses it to encrypt the communication key, and the server decrypts the result with its own private key to obtain the communication key. The most common asymmetric encryption algorithm is RSA; if you are interested, you can read up on it yourself.

Asymmetric encryption sounds like a good thing, so why not simply encrypt all the messages between server and client with it? Why the detour of combining asymmetric and symmetric encryption? HTTPS is designed this way for a reason. Asymmetric algorithms look nice, but they are very expensive in CPU and memory. Some people have run experiments and found that, for the same amount of data, the overhead of an asymmetric algorithm is more than 1000 times that of a symmetric one, so its impact on performance is considerable. And even if the performance were tolerable, asymmetric algorithms have a fatal limitation: the content to be encrypted cannot be longer than the key. With a 2048-bit key, that means at most 256 bytes of content per operation. That is a real pain: these days even a single picture on the Internet runs to thousands of bytes. If you can put up with that, you are tougher than I am.
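The combination of the two kinds of encryption can be illustrated with a deliberately tiny toy. Everything here is an assumption made for demonstration: the RSA numbers are the classic textbook key pair (far too small to be secure), and the symmetric cipher is a homemade XOR keystream, not what TLS actually uses. The point is only the shape of the exchange: the expensive asymmetric step protects one small key, and the cheap symmetric step protects the data.

```python
import hashlib
import secrets

# Toy RSA key pair with textbook numbers. Real keys are 2048 bits or more.
N, E, D = 3233, 17, 2753   # n = 61 * 53, public exponent e, private exponent d


def keystream_xor(key, data):
    """Toy symmetric cipher: XOR the data with a SHA-256-derived keystream.
    Applying it twice with the same key restores the original data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


# Client: pick a random session key and encrypt it with the server's PUBLIC key.
session_key = secrets.randbelow(N - 2) + 2
encrypted_key = pow(session_key, E, N)

# Server: recover the session key with its PRIVATE key.
recovered_key = pow(encrypted_key, D, N)

# Both sides now hold the same key and switch to the cheap symmetric cipher.
request = b"GET /index.html HTTP/1.1"
ciphertext = keystream_xor(session_key, request)
plaintext = keystream_xor(recovered_key, ciphertext)
print(plaintext)
```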

Authentication

Over HTTP your messages can be hijacked, so it is hard to make sure the public key is not swapped out while it is being sent. To guarantee that a public key is genuine, HTTPS relies on digital certificates. The whole certification process looks like this:

First, the server operator submits its public key to an authoritative third party, a digital certificate authority (CA), and applies for a certificate. After verifying the applicant's identity, the CA digitally signs the submitted public key and puts the signed public key into a public key certificate. When the client receives the public key certificate, it checks the CA's digital signature on it, using the CA's own public key, to confirm that the server's public key is authentic.

With a third-party certificate authority vouching for it, we can be confident that a maliciously altered public key will be detected.

Integrity protection

Encryption and authentication are enough to keep others from reading our messages, but the messages can still be intercepted and tampered with. Take the complete message “I don't want you to be my girlfriend, I want you to be my wife”. It is a very romantic sentence, but if it is cut off so that only “I don't want you to be my girlfriend” gets through, and the receiver has no way of knowing that it was tampered with, then the young couple says goodbye and a good marriage is ruined. So, for the sake of world peace, we also have to protect the integrity of messages. HTTPS uses a MAC (message authentication code) algorithm for this. The sender attaches to each message a MAC value calculated with the MAC algorithm. On receiving the message, the receiver calculates a MAC value from the key and the message with the same algorithm and compares it with the transmitted one. If the two are the same, the message has not been tampered with; if they differ, it has.
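The MAC check described above can be tried out with Python's standard hmac module. This is a sketch under assumptions: HMAC-SHA256 is used here as one example of a MAC algorithm, and the shared key is a made-up constant, whereas in real HTTPS the keys come out of the handshake.

```python
import hashlib
import hmac


def compute_mac(key, message):
    """Calculate the MAC value the sender attaches to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_untampered(key, message, received_mac):
    """Receiver side: recompute the MAC and compare it with the transmitted one."""
    return hmac.compare_digest(compute_mac(key, message), received_mac)


shared_key = b"hypothetical-session-key"   # assumption: both sides already share this key
original = b"I don't want you to be my girlfriend, I want you to be my wife"
mac = compute_mac(shared_key, original)

truncated = b"I don't want you to be my girlfriend"
print(is_untampered(shared_key, original, mac))    # True: message arrived intact
print(is_untampered(shared_key, truncated, mac))   # False: message was cut off
```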

HTTPS has disadvantages compared to HTTP

HTTPS is not superior to HTTP in every respect. If that were the case, we wouldn’t be seeing HTTP sites anymore. HTTPS has three major disadvantages compared to HTTP:

Slow speed

HTTP used to talk to TCP directly, but now SSL/TLS sits in between, which inevitably adds communication and processing overhead and drags down the speed.

Resources such as CPU and memory are consumed

Frequent encryption and decryption will undoubtedly require more CPU and memory resources to support, resulting in increased load.

It costs money

HTTPS communication requires a certificate, and certificates generally have to be bought from a certificate authority; they do not hand them out for free. The first two disadvantages are acceptable for the sake of security, so I personally think cost is the main reason some sites do not use HTTPS. After all, talking about money hurts feelings. Of course, being poor does not mean there is no way out: there are always some kind souls quietly rescuing those of us on a budget, see “Let's Encrypt, free and easy to use HTTPS certificate” [15]. And while I am at it, a small plug for someone else's article: “Putting a ‘condom’ on your website” [16]. Please remember my name: my name is Lei Feng!

Authentication that confirms the identity of an accessing user

As Web technology has developed, more and more sites control access in finer detail: some content or operations are available only to specific users. In that case, the site has to verify that the person sitting in front of the computer really is that user. Here are the authentication methods used with HTTP.

BASIC authentication

Step 1: When the client accesses a resource that requires BASIC authentication, the server returns 401 and adds the header field WWW-Authenticate to the response. This field names the authentication scheme (BASIC) and a realm that tells the client which protected area of the server the resource belongs to. If no realm is specified, the client usually displays a formatted hostname instead.
Step 2: After receiving status code 401, the client follows the scheme (BASIC) given in WWW-Authenticate: it joins the user name and password with a colon (:), Base64-encodes the result, puts it in the Authorization field of the request header, and sends the request to the server.

Here's an example:

User name: admin
Password: 123456
admin:123456 => YWRtaW46MTIzNDU2
Authorization: Basic YWRtaW46MTIzNDU2

Step 3: When the server receives the request containing the Authorization header field, it checks the credentials. If they are correct, it returns the requested resource.

BASIC authentication only Base64-encodes the credentials and does not encrypt them, so they can be eavesdropped on and stolen; its security is poor.
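The encoding step is short enough to reproduce with Python's standard base64 module. The two helper names are my own; the second one also shows why BASIC is considered weak, since anyone who sees the header can reverse it.

```python
import base64


def basic_authorization(username, password):
    """Build the value of the Authorization header for BASIC authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(header_value):
    """Recover (username, password) from a BASIC Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


print(basic_authorization("admin", "123456"))             # Basic YWRtaW46MTIzNDU2
print(decode_basic("Basic YWxhZGRpbjpvcGVuc2VzYW1l"))     # ('aladdin', 'opensesame')
```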

DIGEST authentication

Step 1: When the client accesses a resource that requires DIGEST authentication, the server returns 401. The response header is more complex than in BASIC mode: WWW-Authenticate: Digest realm="myTomcat",qop="auth",nonce="xxxxxxxxxxx",opaque="xxxxxxxx". Here, qop="auth" indicates the authentication mode; nonce is a random string; opaque is a value chosen by the server that the client returns unchanged.
Step 2: The browser shows a dialog asking the user for a user name and password. It then runs MD5 over the user name, password, nonce value, HTTP request method, and requested URI, and sends the resulting digest to the server. The request header looks like this: Authorization: Digest username="xxxxx",realm="myTomcat",qop="auth",nonce="xxxxx",uri="xxxx",cnonce="xxxxxx",nc=00000001,response="xxxxxxxxx",opaque="xxxxxxxxx". Here, username is the user name; cnonce is a random string generated by the client; nc is the nonce count, that is, how many requests have been sent with this nonce; response is the final computed digest.
Step 3: The server's web container reads the authentication information from the HTTP header, extracts username, looks up the corresponding password, and runs MD5 over the same combination of user name, password, nonce value, HTTP request method, and requested URI. It compares the result with response. If they are the same, authentication succeeds and the requested resource is returned.

DIGEST authentication is more secure than BASIC authentication because the password itself is never transmitted. However, an attacker who intercepts your message can still reuse the value of the Authorization header field to impersonate you in requests to the server.
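For reference, here is a sketch of how the response value is computed when qop is auth, following the formula in the Digest specification (RFC 2617): two intermediate MD5 hashes are combined with the nonce values. The function names and the sample credentials (admin, 123456, and the nonce strings) are placeholders of my own, not values from a real exchange.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of a Digest Authorization header (qop=auth)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # who the user is
    ha2 = md5_hex(f"{method}:{uri}")                  # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Client side: compute the digest from the values in the 401 challenge.
client_value = digest_response(
    "admin", "myTomcat", "123456", "GET", "/index.html",
    nonce="abc123", nc="00000001", cnonce="xyz789",
)

# Server side: look up the stored password and repeat the same calculation.
server_value = digest_response(
    "admin", "myTomcat", "123456", "GET", "/index.html",
    nonce="abc123", nc="00000001", cnonce="xyz789",
)
print(client_value == server_value)
```

Because the password only ever enters the calculation as part of a hash, it never travels over the network, which is the advantage over BASIC described above.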

SSL client authentication

When we talked about HTTPS earlier, the server was the one sending a certificate and the client was the one receiving and verifying it. Here the roles are reversed: the client presents a certificate, and the communication key is generated on the server, not on the client. SSL client authentication is usually not used on its own but in combination with form-based authentication: SSL client authentication verifies the client computer, and form-based authentication verifies the person sitting at it.

Form-based authentication

Most mainstream websites today use form-based authentication. The principle is simple: the client submits a form with the user name and password to the backend; the backend generates a SessionId and returns it in the Set-Cookie header field of the response; the client stores it in a local cookie. Every subsequent request carries that cookie, and the server identifies the user from the SessionId stored in it.
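The flow can be sketched with an in-memory session table. Everything in this example is hypothetical: the user table, the cookie name SessionId, and the two functions. A real site would store password hashes instead of plain passwords and would expire old sessions.

```python
import secrets

USERS = {"admin": "123456"}   # hypothetical user table (real systems store password hashes)
SESSIONS = {}                 # SessionId -> user name, kept on the server


def login(username, password):
    """Check the submitted form and return a Set-Cookie header line, or None."""
    if USERS.get(username) != password:
        return None
    session_id = secrets.token_hex(16)      # unguessable random identifier
    SESSIONS[session_id] = username
    return f"Set-Cookie: SessionId={session_id}; Path=/; HttpOnly"


def current_user(cookie_header):
    """Identify the user from the Cookie header of a later request."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "SessionId":
            return SESSIONS.get(value)
    return None


set_cookie = login("admin", "123456")
session_id = set_cookie.split("SessionId=")[1].split(";")[0]
print(current_user(f"theme=dark; SessionId={session_id}"))   # admin
```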

Conclusion

This article is a summary of what I took away from reading the book “Illustrated HTTP protocol”. For HTTP, the points to focus on are the TCP/IP model, the HTTP communication process, HTTP status codes, HTTP headers, how HTTPS works, and the respective strengths of HTTP and HTTPS. HTTP status codes and HTTP headers are the most important parts, which is why this article spends so much space on them. Then there is HTTPS: more and more websites are adopting it, so understanding HTTPS is essential. I hope this article helps those who are still hazy about HTTP to deepen their understanding, and comments pointing out its shortcomings are welcome. Finally, writing a blog post is not easy, so I would appreciate your likes and bookmarks!