In the HTTP protocol, the core element is the message exchanged between client and server. An HTTP message is text made up of multiple lines. In general, it consists of the following four parts:
- Start line
- Header fields
- A blank line (CRLF)
- Message body
Of these four parts, the start line and the header fields together are often called the “request header” or “response header”, and the message body is often called the “entity”. The two are separated by the first blank line (CRLF) in the message. The message body is optional and may be absent.
We usually describe the format of HTTP messages informally, as above, but informal descriptions are imprecise. For example, the request line is usually described as containing the request method, the request URI, and the HTTP version number. But is there a space between them? Can there be two spaces? In a header field such as Host, must the colon be followed by a space? Are two spaces allowed? Can a tab be used instead of a space? Can a space precede the colon?
Informal descriptions cannot settle these questions. RFC 7230 therefore uses ABNF (Augmented Backus-Naur Form) to define the HTTP message format rigorously.
ABNF notation
ABNF is generally described in two parts: operators and core rules. We do not distinguish between the two here and only introduce the notation relevant to the HTTP protocol.
- Alternation: a forward slash “/” indicates that exactly one of several rules is chosen
    rule1 / rule2
    start-line = request-line / status-line    ; the start line is either a request line or a status line
- Variable repetition: “m*n” before an element means at least m and at most n occurrences; either bound may be omitted
    m*n
    *(header-field CRLF)    ; "*" means zero or more: zero or more header fields, each ending in CRLF
    1*element               ; "1*" means one or more elements
- Sequence group: parentheses “()” group several rules so that they are treated as a single element
    (rule1 rule2)
    *(SP / HTAB)    ; zero or more (spaces or horizontal tabs)
- Optional sequence: square brackets “[]” mark an element as optional
    [rule]
    [message-body]    ; the message body is optional
- Space: written “SP”, used to separate the elements of a definition
    SP = %x20    ; space
    request-line = method SP request-target SP http-version CRLF
- Horizontal tab: written “HTAB”
    HTAB = %x09    ; horizontal tab
    header-field = field-name ":" *(SP / HTAB) field-value *(SP / HTAB)
    ; A header field consists of a field name and a field value separated by a colon.
    ; The colon can be followed by zero or more spaces or horizontal tabs.
- CRLF: the Internet standard line ending, consisting of CR (carriage return) and LF (line feed)
    CRLF = CR LF    ; %x0D %x0A
    start-line *(header-field CRLF) CRLF [message-body]    ; a CRLF (the blank line) must separate the header fields from the message body
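One way to get a feel for these operators is to translate a few of them into regular expressions. The following is a minimal sketch for illustration only; the constant names are our own and do not come from RFC 7230.

```python
import re

# Alternation "SP / HTAB" maps naturally onto a character class.
SP_OR_HTAB = r"[ \t]"

# "*(SP / HTAB)": zero or more repetitions of the group.
ZERO_OR_MORE = re.compile(rf"{SP_OR_HTAB}*")

# "1*(SP / HTAB)": one or more repetitions.
ONE_OR_MORE = re.compile(rf"{SP_OR_HTAB}+")

# "2*3(SP / HTAB)": the general m*n form, here at least 2 and at most 3.
TWO_TO_THREE = re.compile(rf"{SP_OR_HTAB}{{2,3}}")

# "3DIGIT": exactly three decimal digits.
THREE_DIGIT = re.compile(r"[0-9]{3}")

# '[ "x" ]': an optional element.
OPTIONAL_X = re.compile(r"x?")

print(bool(ZERO_OR_MORE.fullmatch("")))     # the empty string is allowed
print(bool(ONE_OR_MORE.fullmatch("")))      # the empty string is rejected
print(bool(THREE_DIGIT.fullmatch("200")))
```

Regular expressions cannot express every ABNF grammar, but for the simple whitespace and digit rules used in HTTP messages the correspondence is direct.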
With these operators and core rules in hand, we can use ABNF to define the HTTP message format rigorously:
    ; HTTP message: a start line; zero or more header fields; a blank line; an optional message body
    http-message = start-line *(header-field CRLF) CRLF [message-body]

    ; Start line: a request line or a status line
    start-line = request-line / status-line

    ; Request line: request method; space; request target; space; protocol version
    request-line = method SP request-target SP http-version CRLF

    ; Status line: protocol version; space; status code; space; reason phrase
    status-line = http-version SP status-code SP reason-phrase CRLF

    ; Header field: a case-insensitive field name; a colon; zero or more spaces
    ; or horizontal tabs; the field value; zero or more spaces or horizontal tabs
    header-field = field-name ":" OWS field-value OWS
    field-name = token
    OWS = *(SP / HTAB)
    field-value = *(field-content / obs-fold)

    ; Message body: carries the payload of the request or response
    message-body = *OCTET
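The top-level rule can be sketched in a few lines of Python. This is a simplified illustration, not a complete parser: the function name `split_message` is hypothetical, and the sketch ignores obs-fold line folding and does not use Content-Length or Transfer-Encoding to delimit the body.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.1 message into (start_line, header_lines, body)."""
    # The first blank line (CRLF CRLF) ends the header section.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line terminating the header section")
    # The first line is the start line; every other line is one header field.
    lines = head.split(b"\r\n")
    return lines[0], lines[1:], body


raw = b"GET /index.html HTTP/1.1\r\nHost: 127.0.0.1:9090\r\n\r\n"
start_line, header_lines, body = split_message(raw)
print(start_line)    # the request line
print(header_lines)  # a list with one header field
print(body)          # empty: this request has no message body
```

Working on bytes rather than text mirrors the grammar: message-body is defined as `*OCTET`, so the body is not necessarily text.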
The start line
An HTTP message is either a request sent from a client to a server or a response sent from a server to a client. For a request, the start line is called the request line; for a response, it is called the status line.
The request line
The request line describes what the client wants to do with a resource on the server. It consists of:
- Request method (method): the operation to perform on the resource
- Request target (request-target): usually a URI that identifies the location of the resource
- Protocol version (HTTP-version): the HTTP version in use
Take a practical example:
    GET /index.html HTTP/1.1
“GET” is the request method, “/index.html” is the request target, and “HTTP/1.1” is the protocol version. With this request line, you can explicitly tell the server: I want to get the index.html file in the root directory, and my HTTP version number is 1.1.
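The request-line grammar also answers the earlier question about spaces: the parts are separated by exactly one SP. Below is a minimal sketch of a strict parser; the function name `parse_request_line` is hypothetical and the request-target pattern is deliberately simplified to "any run of non-whitespace characters".

```python
import re

# method SP request-target SP HTTP-version
# The method is a token; the request-target pattern is a simplification.
REQUEST_LINE = re.compile(
    r"([!#$%&'*+\-.^_`|~0-9A-Za-z]+) (\S+) (HTTP/[0-9]\.[0-9])"
)


def parse_request_line(line: str):
    """Return (method, request_target, http_version) or raise ValueError."""
    match = REQUEST_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed request line: {line!r}")
    return match.groups()


print(parse_request_line("GET /index.html HTTP/1.1"))

try:
    parse_request_line("GET  /index.html HTTP/1.1")  # two spaces
except ValueError as error:
    print(error)
```

The line passed in is assumed to have had its trailing CRLF removed already.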
The status line
The status line describes the server's response status. It consists of:
- Protocol version (HTTP-version): the HTTP version in use
- Status code (status-code): a three-digit integer such as 200; its ABNF definition is 3DIGIT
- Reason phrase (reason-phrase): a short textual explanation of the status code
Again, the actual status line:
    HTTP/1.1 200 OK
“HTTP/1.1” is the protocol version, “200” is the status code, and “OK” is the reason phrase. This tells the client that the requested resource was found and the request was handled successfully.
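The status line can be parsed in the same way. The sketch below assumes the trailing CRLF has already been removed; the function name `parse_status_line` is hypothetical. Unlike the request target, the reason phrase may contain spaces, and it may also be empty.

```python
import re

# HTTP-version SP status-code SP reason-phrase
# status-code is 3DIGIT; the reason phrase allows HTAB, SP, visible ASCII
# characters, and obs-text bytes, and may be empty.
STATUS_LINE = re.compile(
    r"(HTTP/[0-9]\.[0-9]) ([0-9]{3}) ([\t \x21-\x7e\x80-\xff]*)"
)


def parse_status_line(line: str):
    """Return (http_version, status_code, reason_phrase) or raise ValueError."""
    match = STATUS_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed status line: {line!r}")
    version, code, reason = match.groups()
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```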
Header fields
Each header field is a key-value pair terminated by a CRLF, and the header section as a whole is terminated by one additional CRLF (the blank line).
    Host: 127.0.0.1:9090
    Content-Type: text/html
    ...
For header fields, there are a few features to note:
- Header fields are fully extensible: there is no limit on introducing new field names
- Field names are case-insensitive, but by convention the first letter of each word is capitalized
- The order of header fields with different field names is not significant, but it is good practice to send header fields that contain control data first, such as Host in a request and Date in a response, so that an implementation can decide as early as possible when not to handle a message
- Whitespace between the field name and the colon is not allowed, because it can lead to security vulnerabilities. The colon can be followed by zero or more spaces (or horizontal tabs). The field value can also be followed by zero or more spaces (or horizontal tabs), and the line ends with a CRLF. In general, it is good practice to follow the colon with a single space
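The whitespace rules in this list can be sketched as a small parser for a single header line (without its trailing CRLF). The function name `parse_header_field` is hypothetical, and lowercasing the field name is just one way to reflect that field names are compared case-insensitively.

```python
import re

# field-name = token
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def parse_header_field(line: str):
    """Return (lowercased_field_name, field_value) or raise ValueError."""
    # Split at the first colon only; the value may contain colons itself.
    name, colon, value = line.partition(":")
    if not colon:
        raise ValueError(f"missing colon: {line!r}")
    # A token cannot contain whitespace, so "Host : x" is rejected here.
    if TOKEN.fullmatch(name) is None:
        raise ValueError(f"invalid field name: {name!r}")
    # OWS (spaces and horizontal tabs) around the value is not part of it.
    return name.lower(), value.strip(" \t")


print(parse_header_field("Host: 127.0.0.1:9090"))
print(parse_header_field("Content-Type:text/html"))
print(parse_header_field("HOST:\t  127.0.0.1:9090  "))
```

All three of the questions raised earlier are covered: zero, one, or several spaces or tabs after the colon are accepted, while a space before the colon raises an error.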
Header fields are usually grouped into four categories:
- General headers: can appear in both requests and responses, such as the Date field
- Request headers: appear only in requests and describe the request, such as the Host field
- Response headers: appear only in responses and describe the response, such as the Server field
- Entity headers: describe the message body, such as the Content-Length field, which gives the length of the body
HTTP messages are the core of the HTTP protocol, and header fields are the core of HTTP messages. Once you understand the common header fields, the protocol itself becomes much easier to follow. Later articles will focus on the common and important header fields.
Message body
The HTTP protocol does not require a message body to exist, and if it does, the message body is used to carry the payload of the request or response.
In a request, the presence of a message body is signaled by a Content-Length or Transfer-Encoding header field. In a response, whether a message body is present depends on the request method and the status code. For example, a response to a HEAD request never contains a message body, and neither do 1xx, 204, and 304 responses.
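These rules can be written down as two small predicates. This is a sketch covering only the cases named above, not the full message-length algorithm of RFC 7230; the function names and the use of a plain dict for header fields are assumptions made for illustration.

```python
def request_has_body(headers: dict) -> bool:
    """A request has a body if Content-Length or Transfer-Encoding is present."""
    # Field names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    return "content-length" in names or "transfer-encoding" in names


def response_may_have_body(request_method: str, status_code: int) -> bool:
    """Apply the rules from the text: HEAD, 1xx, 204, and 304 have no body."""
    # Method names are case-sensitive, so "HEAD" is compared as-is.
    if request_method == "HEAD":
        return False
    if 100 <= status_code <= 199 or status_code in (204, 304):
        return False
    return True


print(request_has_body({"Host": "127.0.0.1:9090"}))
print(request_has_body({"content-length": "12"}))
print(response_may_have_body("HEAD", 200))
print(response_may_have_body("GET", 200))
```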
Summary
This article described the HTTP message structure in detail. Besides the usual informal description, it introduced the ABNF notation that RFC 7230 uses to define HTTP messages.
- An HTTP message consists of a start line, zero or more header fields, a blank line (CRLF), and an optional message body
- The start line and the header fields together are often called the request header or response header
- The start line of a request is called the request line and consists of the request method, the request target, and the protocol version
- The start line of a response is called the status line and consists of the protocol version, the status code, and the reason phrase
- Field names usually start with a capital letter and are followed immediately by a colon, with no whitespace in between. Zero or more spaces or horizontal tabs may appear between the colon and the field value
- Header fields typically fall into four categories: general, request, response, and entity headers
- The header fields are separated from the message body by a blank line (CRLF)
- The message body is optional
References
- RFC 7230
- Geek Time – Perspective HTTP Protocol
- Geek Time – Web Protocol Details and Packet Capture
- Illustrated HTTP