Preface
HTTP is the network protocol we deal with most often, yet many of us know it only on the surface. It is a familiar stranger, or as the saying goes, the dark spot right under the lamp. This article explains where HTTP sits in network communication and teaches you how to read the information in its packets. It looks at HTTP from four angles:
- The past and present of HTTP
- HTTP in networks
- Description of HTTP packets
- HTTP security
1. HTTP's past and present
- HTTP/0.9: released in 1990, before HTTP had been established as a formal standard.
- HTTP/1.0: published in 1996, the first official version of HTTP.
- HTTP/1.1: published in 1997 and probably still the most widely used version. It went almost 20 years, up to 2016, without a major change, so while it may not be excellent, it wins on stability.
- HTTP/2: standardized in 2015 but not yet widely rolled out at the time of writing. It aims to be faster, and its optimizations help mobile use in particular.
My favorite is WebSocket, which is a full-duplex communication standard.
2. HTTP on the network
2.1 Build the big picture first
It helps to understand HTTP's place, or role, in the overall network communication process. Where you stand shapes what you see, right?
A communication process can be divided into three stages: the client sends, the line transmits, and the server responds. Devices on the Internet, whatever their hardware and operating system, must communicate by rules that both parties recognize, covering things such as the language and format of the communication. These rules are collectively called the TCP/IP protocol family. Everyone plays by the same set of rules.
2.2 Client sends
The client initiates a request through a URL.
A Uniform Resource Locator (URL) is a subset of the Uniform Resource Identifier (URI). A URI can identify any resource on the network. Only with URIs can we find the part we need among the massive amount of network resources.
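To make the parts of a URL concrete, here is a minimal sketch using Python's standard urllib.parse module. The URL itself is made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "https://www.example.com:8080/docs/index.html?lang=en#intro"

parts = urlsplit(url)

print(parts.scheme)    # the protocol used to reach the resource
print(parts.hostname)  # the domain name that DNS will resolve
print(parts.port)      # the port number, if one is given
print(parts.path)      # the location of the resource on the server
print(parts.query)     # optional parameters
print(parts.fragment)  # a position inside the resource
```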
The request first goes through the DNS service to find the IP address that corresponds to the domain name in the URL, and is then sent to that address.
DNS (Domain Name System): IP addresses are hard for humans to remember, and machines cannot work with domain names directly, so DNS exists to convert between the two.
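As a rough sketch of the name-to-address lookup, the function below asks the system resolver through Python's socket.getaddrinfo. The helper name resolve_ipv4 is my own, and the result for a real domain depends on your network and DNS configuration.

```python
import socket

def resolve_ipv4(host):
    """Return the IPv4 addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})

# A literal IP address needs no lookup and comes back unchanged.
print(resolve_ipv4("127.0.0.1"))
```

Calling resolve_ipv4 with a domain name instead of a literal address performs an actual DNS query.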
Once it has the complete information, HTTP wraps the request into an HTTP request packet. Recall that protocols are layered, described either by the OSI 7-layer model or by the TCP/IP 4-layer model.
As the figure above shows, HTTP relies on TCP at the transport layer and IP at the network layer. On the way down, each layer adds its own header, and the data is split into packets before being sent.
2.3 Transmission over the line
While the data travels over the line, routers keep forwarding it, so how is the target server found? You might say by IP address, and that is true, but communication between IP addresses ultimately depends on MAC addresses. This is where the Address Resolution Protocol (ARP) comes in: given the IP address of the other party, it looks up the corresponding MAC address. Besides routers, the data may also pass through proxy servers, gateways, and other devices on its way to the target server. Space is limited, so I won't elaborate here. Leave a comment if you want to hear more.
2.4 Server Response
After receiving the packets, the server reverses the client's encapsulation process, stripping the headers layer by layer to restore the original message.
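The idea of adding headers on the way down and stripping them on the way up can be sketched with a toy model. The bracketed header strings below are invented for illustration. Real TCP, IP, and Ethernet headers are binary structures, and this sketch leaves out the splitting of data into packets.

```python
# Toy model of layered encapsulation. The header strings are made up.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link (top to bottom)

def encapsulate(http_message):
    """Sender side: each layer prepends its own header on the way down."""
    data = http_message
    for layer in LAYERS:
        data = f"[{layer}]".encode() + data
    return data

def decapsulate(frame):
    """Receiver side: strip the headers in reverse order, lowest layer first."""
    data = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not data.startswith(header):
            raise ValueError(f"expected {layer} header")
        data = data[len(header):]
    return data

request = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
frame = encapsulate(request)
print(frame[:14])
print(decapsulate(frame) == request)
```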
3. HTTP packet details
3.1 Packet Structure
First, what exactly is a message?
The information exchanged over HTTP is called an HTTP message, also referred to in this article as an HTTP packet.
The following figure shows the packet structure. The important parts are the header and the body. Between them is a blank line (CR+LF) whose main job is to separate the two.
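Here is a small sketch of that structure in Python: a start line, header lines, a blank line, and then the body. The helper names build_request and split_message, and the request values, are my own choices for illustration.

```python
CRLF = "\r\n"

def build_request(method, path, headers, body=""):
    """Assemble start line + header lines + blank line + body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body

def split_message(message):
    """Split a message at the first blank line into its parts."""
    head, _, body = message.partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers, body

message = build_request(
    "POST", "/form",
    {"Host": "www.example.com", "Content-Length": "9"},
    "name=http",
)
print(split_message(message))
```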
There are some differences between the request message and the response message.
Request line vs. status line
The request line contains the method used for the request, the request URI, and the HTTP version. The status line contains the HTTP version, the status code that indicates the result of the response, and the reason phrase.
The status line is where the familiar 200, 404, and 500 status codes appear.
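As a minimal sketch, both lines can be split on spaces into their three parts. The function names are hypothetical, and no validation is done.

```python
def parse_request_line(line):
    """Split 'method request-URI HTTP-version' into its three parts."""
    method, uri, version = line.split(" ", 2)
    return {"method": method, "uri": uri, "version": version}

def parse_status_line(line):
    """Split 'HTTP-version status-code reason-phrase' into its three parts."""
    # maxsplit=2 keeps a multi-word reason phrase such as "Not Found" whole.
    version, code, reason = line.split(" ", 2)
    return {"version": version, "status": int(code), "reason": reason}

print(parse_request_line("GET /index.html HTTP/1.1"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```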
3.2 Message Details
This section describes the commonly used HTTP header fields in detail.
Cache-Control
Directives that control how caches behave. The main uses are:
Cache-Control: no-cache
In a request, indicates that the client will not accept a cached response as it is. The cache must confirm with the origin server that the resource is still current.
Cache-Control: no-store
Indicates that caches must not store any part of the request or response.
Cache-Control: max-age=604800 (unit: seconds)
max-age is the maximum length of time a resource may be kept in the cache. When max-age is 0 or the maximum cache time has been exceeded, the cache server usually needs to forward the request to the origin server.
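The rules above can be sketched as a small freshness check. This is a simplified model with hypothetical function names. It ignores quoted values that contain commas and every directive other than the three discussed here.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

def can_serve_from_cache(value, age_seconds):
    """Simplified check: may a cache answer without asking the origin server?"""
    directives = parse_cache_control(value)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None:
        return False  # Assumption of this sketch: no max-age means "ask the origin".
    return age_seconds < int(max_age)

print(can_serve_from_cache("max-age=604800", 600))
print(can_serve_from_cache("max-age=0", 0))
```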
Connection
Since HTTP/1.1, the client and server can exchange multiple requests and responses over a single connection once it is established. Whether the connection is kept or closed is controlled by the following directives.
Connection: close
Indicates that you want to disconnect the current connection.
Connection: Keep-Alive
Indicates that you want to keep the current connection.
Date
Indicates the date and time when the HTTP message was created.
Upgrade
Used to check whether HTTP or another protocol can communicate using a newer version. The value can also name a completely different protocol, such as WebSocket, which I prefer:
Upgrade: websocket
Via
Tracks the transmission path of request and response messages between client and server. As mentioned earlier, besides routers, a request may also pass through proxies, gateways, and so on, and each of them is recorded here.
Warning
Carries additional warning information about the message, usually related to caching.
Accept
The media types that the user agent can handle, and their relative priority.
Accept: text/html, image/jpeg
The client can process HTML text and JPEG images.
Accept-Charset
The character sets supported by the user agent and their relative priority.
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
The q value is a weight that indicates relative priority.
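To show how the q weights turn into an order of preference, here is a minimal sketch. The function name parse_weighted is my own, and the same approach applies to the other Accept-* headers that use q values.

```python
def parse_weighted(value):
    """Return (item, q) pairs sorted from highest to lowest priority."""
    result = []
    for part in value.split(","):
        pieces = [p.strip() for p in part.split(";")]
        item = pieces[0]
        if not item:
            continue
        q = 1.0  # An item without a q parameter has the default weight of 1.
        for param in pieces[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        result.append((item, q))
    # sorted() is stable, so items with equal q keep their original order.
    return sorted(result, key=lambda pair: pair[1], reverse=True)

print(parse_weighted("iso-8859-5, unicode-1-1;q=0.8"))
```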
Accept-Encoding
The content encodings supported by the user agent and their order of priority.
Accept-Encoding: gzip, deflate, compress
Accept-Language
The natural languages (Chinese, English, etc.) that the user agent can handle, and their priority.
Accept-Language: zh-cn
Authorization
Authentication information of the user agent.
Host
The Internet host name and port number of the requested resource.
Range
Range: bytes=5001-10000
Requests the bytes at offsets 5001 through 10000 of the resource, inclusive. Offsets count from 0.
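Because both ends of the range are inclusive, converting it to a slice is an easy place for off-by-one mistakes. The sketch below, with a hypothetical helper name, handles only the single first-last form shown above.

```python
def parse_byte_range(value, total_size):
    """Turn 'bytes=first-last' into Python slice bounds (start, stop).

    Only the single 'first-last' form is handled. Open-ended ranges,
    suffix ranges, and multi-range requests are left out of this sketch.
    """
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is supported")
    first, _, last = spec.partition("-")
    start = int(first)
    # The last position is inclusive and is clamped to the resource size.
    stop = min(int(last), total_size - 1) + 1
    if start >= stop:
        raise ValueError("range not satisfiable")
    return start, stop

start, stop = parse_byte_range("bytes=5001-10000", 20000)
print(start, stop, stop - start)
```

With these bounds, resource[start:stop] on a bytes object returns exactly the requested bytes.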
Referer
The URI of the page from which the request originated.
Referer: www.xxx.com/index.html
User-Agent
Information such as the name of the browser or other user agent that created the request.
Age
How long ago the origin server generated the response, in seconds.
Age: 600
Expires
The expiration date of the resource.
Last-Modified
Specifies the time when the resource was last modified.
Allow
All HTTP methods supported by the resource identified by the Request-URI.
Allow: GET, HEAD
You know, GET, POST, stuff like that.
Content-Type
The media type of the object in the entity body.
Content-Type: text/html; charset=UTF-8
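One way to pull the media type and the charset apart is the standard library's email.message.Message class, which understands this header syntax. A minimal sketch:

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = "text/html; charset=UTF-8"

media_type = msg.get_content_type()    # the type/subtype part
charset = msg.get_content_charset()    # the charset parameter, lowercased

print(media_type, charset)
```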
Content-Encoding
The content encoding that the server applied to the entity body.
Content-Encoding: gzip
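As a small sketch of what this header implies, the server compresses the body before sending it and the client restores it on arrival. The body here is made-up sample data.

```python
import gzip

# Made-up, repetitive body so the compression effect is visible.
body = b"<html>" + b"hello " * 200 + b"</html>"

# Server side: compress the entity body and describe it in the headers.
compressed = gzip.compress(body)
headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# Client side: undo the encoding named in Content-Encoding.
restored = gzip.decompress(compressed)

print(len(body), len(compressed), restored == body)
```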
Content-Language
The natural language used in the entity body (Chinese, English, etc.).
Content-Length
Specifies the size of the entity body in bytes.
Content-Length: 15000
We often use this value when we need to show download progress.
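A minimal sketch of that progress calculation follows. The function name and the number of bytes received are made up for illustration.

```python
def download_progress(received_bytes, content_length):
    """Return how much of the download is complete, as a percentage."""
    if content_length <= 0:
        raise ValueError("Content-Length must be positive to compute progress")
    return min(received_bytes * 100 / content_length, 100.0)

headers = {"Content-Length": "15000"}
total = int(headers["Content-Length"])

# Suppose 4500 bytes have arrived so far (a made-up number).
print(f"{download_progress(4500, total):.1f}%")
```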
Set-Cookie
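Sent by the server to set a cookie on the client, along with its attributes.
Two attributes are related to security: Secure, which restricts the cookie to HTTPS connections, and HttpOnly, which hides the cookie from JavaScript.
Cookie
When HTTP state management is wanted, the client adds this header to the request: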
Cookie: status=enable
HTTP is stateless and relies on cookies for state management.
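Both cookie headers can be handled with the standard library's http.cookies module. In this sketch I assume the two security-related attributes meant above are Secure and HttpOnly.

```python
from http.cookies import SimpleCookie

# Client to server: parse the value of a Cookie request header.
request_cookie = SimpleCookie()
request_cookie.load("status=enable")
print(request_cookie["status"].value)

# Server to client: build a Set-Cookie value with two security attributes.
# Assumption: Secure and HttpOnly are the attributes the text refers to.
response_cookie = SimpleCookie()
response_cookie["status"] = "enable"
response_cookie["status"]["secure"] = True
response_cookie["status"]["httponly"] = True
set_cookie_value = response_cookie["status"].OutputString()
print(set_cookie_value)
```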
3.3 Packet Analysis
The following is the packet information for an image request from Tmall.
Request header
:authority:img.alicdn.com
:method:GET
:path:/tps/i2/TB1xgT8LVXXXXaZXFXX8ueZHFXX-180-72.png
:scheme:https
accept:image/webp,image/*,*/*;q=0.8
accept-encoding:gzip, deflate, sdch
accept-language:zh-cn,zh;q=0.8
cache-control:max-age=0
if-modified-since:Tue, 15 Mar 2016 11:51:20 GMT
referer:https://www.tmall.com/?ali_trackid=2:mm_26632322_6858406_23810104:1469694734_252_1633093166
user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
Response header
access-control-allow-origin:*
age:535988
cache-control:max-age=31536000
content-type:image/png
date:Fri, 22 Jul 2016 03:39:36 GMT
eagleid:deba31c914696947642232836e
expires:Sat, 22 Jul 2017 03:39:36 GMT
last-modified:Tue, 15 Mar 2016 11:51:20 GMT
server:Tengine
status:304
timing-allow-origin:*
via:cache3.l2cn8[0,200-0,H], cache15.l2cn8[0,0], cache1.cn74[0,304-0,H], cache1.cn74[0,0]
x-cache:HIT TCP_IMS_HIT dirn:2:604845409
Readers can analyze these headers themselves, using what was covered in the previous section.
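As a starting point for that analysis, here is a sketch of why this exchange ends in status 304. The values are taken from the capture above, with the if-modified-since date written in standard form. The comparison is a simplified version of what a server or cache does.

```python
from email.utils import parsedate_to_datetime

# Values taken from the request and response headers above.
if_modified_since = "Tue, 15 Mar 2016 11:51:20 GMT"
last_modified = "Tue, 15 Mar 2016 11:51:20 GMT"
age = 535988
max_age = 31536000

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource has not changed since the client's copy, else 200."""
    client_time = parsedate_to_datetime(if_modified_since)
    resource_time = parsedate_to_datetime(last_modified)
    return 304 if resource_time <= client_time else 200

print(conditional_status(if_modified_since, last_modified))
print(age < max_age)  # True means the cached copy is still within max-age
```

The resource has not changed since the time the client reported, so the response carries status 304 and no image data needs to be sent again.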
3.4 Status codes
Sometimes a server defines status codes of its own that do not follow HTTP conventions. Using 400 to indicate a successful request, for example, is just sad.
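For reference, the convention is that the first digit of a status code names its class. A minimal sketch using the standard http.HTTPStatus enum, with a helper name of my own:

```python
from http import HTTPStatus

def status_class(code):
    """Name the class of a status code from its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

for code in (200, 304, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

By this convention 400 belongs to the client error class, which is why using it for success confuses clients.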
4. HTTP security
HTTP communication has the following security risks:
- Data is transmitted in plaintext, so information can leak
- The identity of the communicating party is not authenticated
- Message integrity cannot be guaranteed, so content may be falsified or altered
HTTPS is the widely accepted solution. It adds SSL (Secure Sockets Layer), or its successor TLS, between HTTP and the transport layer. SSL takes care of authentication, integrity protection, and encryption.
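In Python, that extra layer looks roughly like this: a TCP socket is wrapped by an SSL context before any HTTP is spoken. This is a sketch. The function open_tls is my own name and is only defined here, not called, because calling it needs network access.

```python
import socket
import ssl

# The default context verifies the server's certificate chain and checks
# that the certificate matches the host name (authentication).
context = ssl.create_default_context()

def open_tls(host, port=443, timeout=5.0):
    """Open a TCP connection and wrap it in TLS (hypothetical helper)."""
    raw = socket.create_connection((host, port), timeout=timeout)
    # Everything sent over the returned socket is encrypted and
    # integrity-protected. HTTP requests would be written to it as usual.
    return context.wrap_socket(raw, server_hostname=host)

print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```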
Afterword
Get to know HTTP better and turn a familiar stranger into family. If this was useful, please give it a like. If you have questions, leave a comment and let's discuss.
Thank you
Illustrated HTTP