Preface

HTTP is the network protocol we deal with most often, yet for many of us it is a familiar stranger: something we use every day without really knowing it, the proverbial dark spot under the lamp. This article explains where HTTP sits in network communication and teaches you how to read the information in an HTTP message. It looks at the HTTP protocol from four dimensions.

  • The past and present of HTTP
  • HTTP in networks
  • Description of HTTP packets
  • HTTP security

1. HTTP’s past and present

  • HTTP/0.9 (1990): released before the HTTP standard was formally established.
  • HTTP/1.0 (1996): the first officially published version of HTTP.
  • HTTP/1.1 (1997): probably the most widely used version at present. It has seen no major change in the nearly 20 years up to 2016; it may not be outstanding, but it wins on stability.
  • HTTP/2: published as a standard in 2015 but not yet widely rolled out. It is designed to be faster and is better suited to mobile networks.

My favorite is WebSocket, which is a full-duplex communication standard.

2. HTTP on the network

2.1 Build the big picture first

It helps to understand HTTP’s place and role in the overall network communication process. Where you sit determines what you see, right?




A communication process can be divided into three simple stages: the client sends, the line transmits, and the server responds. Devices on the Internet must communicate according to rules that both parties recognize, covering things such as the language of communication, data formats, hardware, and operating systems. This collection of rules is called the TCP/IP protocol family. Everyone plays by the same set of rules.

2.2 Client sends

The client initiates a request through a URL.

A Uniform Resource Locator (URL) is a subset of the Uniform Resource Identifier (URI). URIs can identify any resource on the network. Only with URIs can we find the part we need among the massive amount of network resources.
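To make the parts of a URL concrete, here is a minimal sketch using Python's standard `urllib.parse`. The URL is a hypothetical example, not one taken from this article.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com:8080/docs/index.html?lang=en#top"
parts = urlsplit(url)

print(parts.scheme)    # protocol used to fetch the resource
print(parts.hostname)  # host name that DNS will later resolve
print(parts.port)      # TCP port on that host
print(parts.path)      # location of the resource on the server
print(parts.query)     # extra parameters for the resource
print(parts.fragment)  # position inside the resource (kept by the client)
```

The host name is the piece handed to DNS; the path and query are what end up in the request line.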

You can see that the request first goes through the DNS service, which returns the IP address corresponding to the domain name in the URL.

DNS (Domain Name System) exists because IP addresses are hard for humans to remember, while machines cannot work with domain names directly. DNS converts freely between the two.
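The lookup from name to address can be tried with the standard `socket` module. This is only a sketch: the helper name `resolve` is made up for illustration, and `localhost` is used so the example does not depend on an external DNS server.

```python
import socket

def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for a host name."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry ends with a socket address whose first field is the IP.
    return sorted({info[4][0] for info in infos})

addresses = resolve("localhost")
print(addresses)
```

Replacing `localhost` with a public domain name would send a real query through the system resolver.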

Once it has the complete information, the HTTP protocol wraps the request into an HTTP request message. As we know, protocol layering is described by the OSI 7-layer model and the TCP/IP 4-layer model.




According to the figure above, HTTP uses TCP at the transport layer and IP at the network layer. On the way down, each layer adds its own header to the data it receives from the layer above (encapsulation), and the information is sent layer by layer.
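The layer-by-layer wrapping can be sketched with a toy model. The bracketed "headers" below are invented placeholders, not real TCP or IP wire formats, and the address and host name are hypothetical.

```python
def encapsulate(http_message: bytes) -> bytes:
    """Wrap an HTTP message in toy TCP and IP headers (not real formats)."""
    tcp_segment = b"[TCP dst_port=80]" + http_message   # transport layer
    ip_packet = b"[IP dst=192.0.2.10]" + tcp_segment    # network layer
    return ip_packet

def decapsulate(ip_packet: bytes) -> bytes:
    """Strip the toy headers in reverse order, as the receiver would."""
    tcp_segment = ip_packet.split(b"]", 1)[1]   # remove the IP header
    return tcp_segment.split(b"]", 1)[1]        # remove the TCP header

request = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
packet = encapsulate(request)
print(packet)
print(decapsulate(packet) == request)
```

The receiving side simply undoes the wrapping in the opposite order, which is why both ends must agree on the same protocol family.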

2.3 Transmission on the line

While the information travels along the line, routers keep forwarding it hop by hop. So how is the target server found? You might say by IP address, and that is right, but communication between IP addresses ultimately depends on MAC addresses. This is where the Address Resolution Protocol (ARP) comes in. ARP is a protocol for resolving addresses: given the IP address of the other party, it looks up the corresponding MAC address. Besides routers, the information may also pass through proxy servers, gateways, and other devices before it reaches the target server. Space is limited, so I won't elaborate here; leave a comment if you would like to hear more.

2.4 Server Response




After receiving the packet, the server restores the original message by reversing the encapsulation process performed by the client.

3. HTTP packet details

3.1 Packet Structure

First, what exactly is a message?

The information exchanged in an HTTP interaction is called an HTTP message (packet).

The following figure shows the packet structure. The important parts are the header and the body of the message. The part in the middle, a blank line (CR+LF), serves mainly to separate the header from the body.




There are some differences between the request message and the response message.




Request line vs. status line

The request line contains the method used for the request, the request URI, and the HTTP version. The status line contains the HTTP version, the status code indicating the result of the response, and the reason phrase.

The status line is where the familiar 200, 404, and 500 status codes appear.
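As a rough illustration of this structure, the sketch below splits a hand-written HTTP/1.x response into status line, header fields, and body. The function name `parse_response` and the sample response are made up for the example; a real client should rely on an HTTP library instead.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line, headers, and body."""
    # The blank line (CR+LF CR+LF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP version, status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers)
print(body)
```

A request message can be split the same way; only the first line differs (method, request URI, HTTP version).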

3.2 Message Details

This section describes the commonly used HTTP header information in detail.

Cache-Control

Directives that control caching behavior. The main uses are as follows.

Cache-Control: no-cache

Indicates that the client will not accept a cached response as is; the cache must go back to the origin server for the latest resource.

Cache-Control: no-store

Indicates that a cache must not store any part of the request or response.

Cache-Control: max-age=604800 (unit: seconds)

max-age indicates the maximum length of time a resource may be kept in the cache. When a max-age value of 0 is specified, or the maximum cache time has been exceeded, the cache server usually needs to forward the request to the origin server.
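The three directives above can be turned into a simplified decision, sketched below. The function names are hypothetical, and the logic deliberately ignores the many other rules a real cache follows (Expires, revalidation, and so on).

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives

def must_contact_origin(directives: dict, age_seconds: int) -> bool:
    """Rough check: True when a cache should go back to the origin server."""
    if "no-cache" in directives or "no-store" in directives:
        return True
    max_age = directives.get("max-age")
    if not isinstance(max_age, str) or not max_age.isdigit():
        return True  # no usable max-age: this sketch plays it safe
    return age_seconds >= int(max_age)

week = parse_cache_control("max-age=604800")
print(week)
print(must_contact_origin(week, 600))      # still within one week
print(must_contact_origin(week, 604800))   # maximum cache time reached
print(must_contact_origin(parse_cache_control("max-age=0"), 0))
```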

Connection

Since HTTP/1.1, the client and server can exchange multiple requests and responses after establishing a single connection. Whether the connection is kept open or closed is controlled by the following directives.

Connection: close

Indicates that you want to disconnect the current connection.

Connection: Keep-Alive

Indicates that you want to keep the current connection.

Date

Indicates the date and time when the HTTP packet is created.

Upgrade

Used to check whether communication can switch to a higher version of HTTP or to another protocol. The value can name a completely different communication protocol, such as WebSocket, which I prefer:

Upgrade: websocket

Via

Tracks the transmission path of request and response messages between client and server. As mentioned earlier, a request may pass through proxies, gateways, and similar devices on its way, and each of them is recorded in this header.

Warning

Some warning messages.

Accept

Media types and priorities that the user agent can handle.

Accept: text/html, image/jpeg

The media types the client can process; here, HTML text and JPEG images.

Accept-Charset

The character set supported by the user agent and the relative priority of the character set.

Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

The q value is a weight that indicates relative priority.
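A small sketch of how q weights rank the candidates, assuming the usual convention that an item without a q parameter has weight 1. The function name `parse_weighted` is made up for illustration.

```python
def parse_weighted(value: str):
    """Return (item, q) pairs sorted from most to least preferred."""
    items = []
    for part in value.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val.strip())
        items.append((fields[0], q))
    # sorted() is stable, so items with equal weight keep their order.
    return sorted(items, key=lambda pair: pair[1], reverse=True)

charsets = parse_weighted("iso-8859-5, unicode-1-1;q=0.8")
print(charsets)
```

The same parsing applies to Accept, Accept-Encoding, and Accept-Language, which use the same weighting syntax.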

Accept-Encoding

Content encoding supported by the user agent and the priority order of content encoding.

Accept-Encoding: gzip, deflate, compress

Accept-Language

The set of natural languages (Chinese, English, etc.) and priority that the user agent can handle.

Accept-Language: zh-cn

Authorization

Authentication information of the user agent.

Host

The Internet host name and port number of the requested resource.

Range

Range: bytes=5001-10000

Requests only part of the resource: the bytes at offsets 5001 through 10000, inclusive (offsets start at 0).
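The arithmetic of a byte range is easy to get wrong by one, so here is a minimal sketch. It handles only the single `bytes=first-last` form, and the 20,000-byte resource is a stand-in invented for the example.

```python
def parse_byte_range(value: str):
    """Parse a single 'bytes=first-last' range into integer offsets."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is handled in this sketch")
    first, _, last = spec.partition("-")
    return int(first), int(last)

first, last = parse_byte_range("bytes=5001-10000")
resource = bytes(20000)            # stand-in for a 20,000-byte resource
chunk = resource[first:last + 1]   # both ends of the range are inclusive
print(first, last, len(chunk))
```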

Referer

The URI of the page from which the request originated.

Referer: www.xxx.com/index.html

User-Agent

Information about the browser or other user agent that created the request, such as its name.

Age

How long ago the origin server created the response. The field value is in seconds.

Age: 600

Expires

The expiration date of the resource.

Last-Modified

Specifies the time when the resource was last modified.

Allow

Lists all HTTP methods supported by the resource identified by the Request-URI.

Allow: GET, HEAD

You know, GET, POST, stuff like that.

Content-Type

The media type of the object in the entity body.

Content-Type: text/html; charset=UTF-8

Content-Encoding

The content encoding method chosen by the server for the body of the entity.

Content-Encoding: gzip

Content-Language

The natural language used by the entity body (Chinese, English, etc.).

Content-Length

Content-Length specifies the size of the entity body in bytes.

Content-Length: 15000

We often use this value to compute download progress.
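A minimal sketch of that calculation; the function name and the number of received bytes are made up, while 15000 is the Content-Length from the example above.

```python
def progress_percent(received_bytes: int, content_length: int) -> float:
    """Download progress as a percentage of the announced Content-Length."""
    if content_length <= 0:
        raise ValueError("Content-Length must be positive to compute progress")
    return min(100.0, received_bytes * 100.0 / content_length)

print(progress_percent(3750, 15000))
```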

Set-Cookie

Information related to cookies.




Two attributes related to security




Cookie

When HTTP state management is needed, the following header is added to the request:

Cookie: status=enable

HTTP is stateless and relies on cookies for state management.
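On the server side, the Cookie header has to be parsed back into names and values. A minimal sketch with the standard `http.cookies` module, using the header value shown above:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie.load("status=enable")    # value of the Cookie request header

for name, morsel in cookie.items():
    print(name, morsel.value)
```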

3.3 Packet Analysis

The following is the packet information for an image request on Tmall.

Request header

:authority:img.alicdn.com
:method:GET
:path:/tps/i2/TB1xgT8LVXXXXaZXFXX8ueZHFXX-180-72.png
:scheme:https
accept:image/webp,image/*,*/*;q=0.8
accept-encoding:gzip, deflate, sdch
accept-language:zh-cn,zh;q=0.8
cache-control:max-age=0
if-modified-since:Tue, 15 Mar 2016 11:51:20 GMT
referer:https://www.tmall.com/?ali_trackid=2:mm_26632322_6858406_23810104:1469694734_252_1633093166
user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36

Response header

access-control-allow-origin:*
age:535988
cache-control:max-age=31536000
content-type:image/png
date:Fri, 22 Jul 2016 03:39:36 GMT
eagleid:deba31c914696947642232836e
expires:Sat, 22 Jul 2017 03:39:36 GMT
last-modified:Tue, 15 Mar 2016 11:51:20 GMT
server:Tengine
status:304
timing-allow-origin:*
via:cache3.l2cn8[0,200-0,H], cache15.l2cn8[0,0], cache1.cn74[0,304-0,H], cache1.cn74[0,0]
x-cache:HIT TCP_IMS_HIT dirn:2:604845409

Readers can analyze these headers on their own, following the approach of the previous section.
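As one possible starting point, the sketch below checks a few of the cache-related values from the response header against each other. All numbers and dates are copied from the sample above; the variable names are illustrative.

```python
from email.utils import parsedate_to_datetime

date = parsedate_to_datetime("Fri, 22 Jul 2016 03:39:36 GMT")
expires = parsedate_to_datetime("Sat, 22 Jul 2017 03:39:36 GMT")
max_age = 31536000   # cache-control:max-age
age = 535988         # age

# Expires minus Date should match max-age (one year, in seconds).
lifetime = int((expires - date).total_seconds())
print(lifetime == max_age)

# Remaining time for which the cached copy is still considered fresh.
print(max_age - age)

# How many days ago the origin server produced this response.
print(round(age / 86400, 1))
```

Note also that the request's if-modified-since value equals the response's last-modified value, which is consistent with the 304 status: the image has not changed since the browser cached it.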

3.4 Status codes




Sometimes a server defines status codes of its own that do not follow HTTP conventions. Using 400 to indicate a successful request, for example, is painful to see.
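For reference, the standard meaning of a code can be looked up with Python's `http.HTTPStatus`; the first digit gives the class of the response. The helper name `describe` and the class labels in `CLASSES` are chosen for this sketch.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code: int) -> str:
    """Return 'code phrase (class)' for an HTTP status code."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "non-standard"   # not a code defined by the standard library
    return f"{code} {phrase} ({category})"

for code in (200, 304, 400, 404, 500):
    print(describe(code))
```

Seen this way, a 400 on a successful request tells every client and intermediary that the client made an error.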

4. HTTP security

HTTP communication has the following security risks:

  1. Communication is in plaintext, so information can be eavesdropped
  2. The identity of the communicating party is not verified
  3. Message integrity cannot be guaranteed, so content may be falsified or altered

HTTPS is the widely accepted solution. It adds Secure Sockets Layer (SSL, since succeeded by TLS) between HTTP and the transport layer. SSL takes care of authentication, integrity protection, and encryption.
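In Python, this extra layer appears as a wrapper around the plain TCP socket. The sketch below only builds the default context and defines a helper; `open_https_connection` is a hypothetical name, and calling it would need network access.

```python
import socket
import ssl

# Default settings: verify the server certificate and its host name.
context = ssl.create_default_context()

def open_https_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS before any HTTP is sent."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise

print(context.verify_mode == ssl.CERT_REQUIRED)  # server identity is checked
print(context.check_hostname)                    # certificate must match host
```

HTTP messages written to the returned socket are encrypted and integrity-protected in transit, which addresses the three risks listed above.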

Afterword

Get to know HTTP better and turn a familiar stranger into a close friend. If you found this useful, please give it a like; if you have questions, leave a comment and let's discuss.

Acknowledgements

Illustrated HTTP