Abstract: Understand the HTTP protocol…
- After all this HTTP, do you understand Content-Length and Transfer-Encoding?
- Author: Piao Ruiqing’s blog
A series of thoughts prompted by a Content-Length problem: some time ago, while developing an API gateway, I ran into a timeout when debugging with Postman. Investigation showed that, after the request had been processed, the Content-Length header no longer matched the actual data, hence this article.
Content-Length is the length of the HTTP message body: the number of octets (8-bit bytes), written in decimal digits. Frameworks usually handle it for us, so we rarely pay attention to it, but in the rare cases where Content-Length does not match the actual body length, the program can fail in strange ways, such as:
- No response until the request times out.
- The request is truncated and the next request is parsed incorrectly.
Content-Length is a common header field. Its value must be exact, otherwise the receiver misreads the message (note that the field is optional in HTTP/1.0).
The Content-Length header indicates the size in bytes of the entity body in the message. This size is measured after any content encoding has been applied. For example, if a text file is gzip-compressed, Content-Length is the compressed size, not the original size.
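As a rough illustration of the gzip point above, the following sketch builds two minimal responses by hand and reads back their Content-Length. The helper names `build_response` and `content_length_of` and the sample text are made up for this example; they do not come from any framework.

```python
import gzip

def build_response(text: str, compress: bool) -> bytes:
    """Build a minimal HTTP/1.1 response whose Content-Length matches the bytes sent."""
    body = text.encode("utf-8")
    headers = ["HTTP/1.1 200 OK", "Content-Type: text/plain; charset=utf-8"]
    if compress:
        body = gzip.compress(body)  # Content-Length must describe these bytes
        headers.append("Content-Encoding: gzip")
    headers.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body

def content_length_of(message: bytes) -> int:
    """Return the Content-Length value declared in the header section."""
    head, _, _ = message.partition(b"\r\n\r\n")
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip())
    raise ValueError("no Content-Length header")

text = "param=piaoruiqing\n" * 200  # hypothetical sample payload
plain = build_response(text, compress=False)
packed = build_response(text, compress=True)
print(content_length_of(plain))   # size of the uncompressed body
print(content_length_of(packed))  # size of the gzip-compressed body
```

The second number is the one a client needs, because it is the number of bytes that actually travel on the wire.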
How does Content-Length work
Content-Length gives the length of the message body as a decimal number; the server or client uses it to decide how many bytes to read.
If this length is incorrect, the following happens:
Content-Length > actual length
If Content-Length is larger than the actual length, the receiver (server or client) reads to the end of the data that was sent and then keeps waiting for the remaining bytes, so there is no response until the timeout fires.
Similarly, a response message whose Content-Length exceeds the actual length has the same effect on the client.
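The waiting behaviour can be reproduced without a real HTTP server. The sketch below uses a local socket pair; `read_body` is a hypothetical stand-in for the read loop a Content-Length based receiver would run, and the 0.5 second timeout is an arbitrary choice for the demonstration.

```python
import socket

def read_body(conn: socket.socket, declared_length: int, timeout: float) -> bytes:
    """Read exactly declared_length bytes, as a Content-Length based reader would."""
    conn.settimeout(timeout)
    received = b""
    while len(received) < declared_length:
        chunk = conn.recv(declared_length - len(received))
        if not chunk:  # peer closed the connection early
            break
        received += chunk
    return received

sender, receiver = socket.socketpair()
sender.sendall(b"param=piaoruiqing")  # only 17 bytes are actually sent
try:
    read_body(receiver, declared_length=20, timeout=0.5)  # header claimed 20
    outcome = "complete"
except socket.timeout:
    outcome = "timed out waiting for the remaining bytes"
finally:
    sender.close()
    receiver.close()
print(outcome)
```

The reader receives the 17 bytes at once, then blocks on the missing 3 until the timeout expires, which matches the "no response until timeout" symptom.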
Content-Length < actual length
If Content-Length is smaller than the actual length, the first request message is truncated. For example, with the body param=piaoruiqing and Content-Length: 10, the server reads only param=piao, as shown in the figure below:
But is that all? Of course not. Let’s take a look at what happens on the second request, as shown here:
Two consecutive requests were sent. The first was truncated as before, but on the second the server threw an exception: Request method ‘ruiqingPOST’ not supported.
So is ruiqingPOST some magical new method? With the instinct that comes from years of debugging, we can guess that the previous request was truncated and its leftover bytes have turned up in this one. A Wireshark capture confirms it, as shown below:
This happens because Connection: keep-alive is in effect, so both requests travel over the same connection. With Connection: close, each request is still truncated, but parsing does not get scrambled (the leftover bytes of one request are not spliced onto the next).
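The ruiqingPOST effect can be simulated with a toy parser that, like a real server on a keep-alive connection, trusts Content-Length to find where each request ends. This is a simplified sketch, not how any particular server is implemented; the path `/test` and the `Host` value are placeholders.

```python
def parse_requests(stream: bytes):
    """Split a keep-alive byte stream into (method, body) pairs using Content-Length only."""
    requests = []
    while b"\r\n\r\n" in stream:  # stop when no complete header section remains
        head, _, rest = stream.partition(b"\r\n\r\n")
        lines = head.split(b"\r\n")
        method = lines[0].split(b" ")[0].decode("ascii")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        # Whatever lies beyond `length` bytes is treated as the next request.
        body, stream = rest[:length], rest[length:]
        requests.append((method, body.decode("ascii")))
    return requests

def request(body: bytes, declared_length: int) -> bytes:
    """Build a POST request with a possibly wrong Content-Length."""
    return (b"POST /test HTTP/1.1\r\n"
            b"Host: localhost\r\n"
            b"Content-Length: " + str(declared_length).encode("ascii") + b"\r\n"
            b"\r\n" + body)

# Two requests on one connection, each declaring 10 bytes but sending 17.
stream = request(b"param=piaoruiqing", 10) + request(b"param=piaoruiqing", 10)
print(parse_requests(stream))
# [('POST', 'param=piao'), ('ruiqingPOST', 'param=piao')]
```

The seven unread bytes `ruiqing` from the first body are glued onto the request line of the second request, which is where the unsupported method name comes from.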
What if you’re not sure about the value of Content-Length
The Content-Length header indicates the size in bytes of the entity body in the message. If the body length cannot be known before request processing finishes, we cannot set Content-Length explicitly and should use Transfer-Encoding: chunked instead.
What is Transfer-Encoding: chunked
Data is sent as a series of chunks, and the Content-Length header is not sent. Each chunk begins with its own length in hexadecimal, followed by \r\n, then the chunk data, then another \r\n. The terminating chunk is a regular chunk, except that its length is 0.
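A minimal sketch of that framing, assuming the body is already available as a list of byte strings. `encode_chunked` is a name chosen for this example, and trailers and chunk extensions are left out.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        # hexadecimal length, CRLF, chunk data, CRLF
        out += f"{len(part):x}".encode("ascii") + b"\r\n" + part + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk: length 0, then an empty line
    return bytes(out)

body = encode_chunked([b"param=", b"piaoruiqing", b"x" * 26])
print(body)
# b'6\r\nparam=\r\nb\r\npiaoruiqing\r\n1a\r\nxxxxxxxxxxxxxxxxxxxxxxxxxx\r\n0\r\n\r\n'
```

Note that the lengths are hexadecimal: the 11-byte chunk is announced as `b` and the 26-byte chunk as `1a`.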
How does Transfer-Encoding: chunked work
Let’s explore how Transfer-Encoding: chunked works using a file download as the example. The server code is as follows:
Use Postman to send a request and Wireshark to capture and inspect the packets, as shown in the figure below:
The chunked data is clearly visible in Wireshark. Its structure is roughly as follows: the returned message is divided into multiple chunks. Each chunk has two parts, length and data, and each part ends with CRLF (i.e. \r\n). The terminating chunk is a special chunk with a length of 0, as shown in the figure below:
That completes the chunked encoding. It is mainly used when a large amount of data is transferred and the length of the response cannot be known until the request has been processed, for example when generating a large HTML table from a database query, or when transferring a large number of images.
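The chunk structure described above (a length line, the data, a CRLF, and a zero-length terminator) can be read back with a few lines of code. This is a simplified sketch: `decode_chunked` is a hypothetical helper that expects the whole body in memory, ignores chunk extensions, and does not parse trailers.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body from its wire format."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:  # terminating chunk
            break
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    return bytes(body)

wire = b"6\r\nparam=\r\nb\r\npiaoruiqing\r\n0\r\n\r\n"
print(decode_chunked(wire))  # b'param=piaoruiqing'
```

Because every chunk announces its own length, the receiver never needs a Content-Length for the whole body.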
- Content-Length: if it is present and valid, it must be correct, otherwise errors occur (a value larger than the actual length causes a timeout; a value smaller than the actual length truncates the message and may scramble the parsing of subsequent data).
- If the message contains the Transfer-Encoding: chunked header, Content-Length is ignored.
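The precedence between the two headers can be written as a small decision function. This is a simplified sketch of the rule rather than a complete implementation of the HTTP framing rules; `body_framing` and its return strings are invented for this example.

```python
def body_framing(headers: dict) -> str:
    """Decide how a receiver should find the end of the message body."""
    normalized = {name.lower(): value for name, value in headers.items()}
    codings = [c.strip().lower()
               for c in normalized.get("transfer-encoding", "").split(",")
               if c.strip()]
    if codings and codings[-1] == "chunked":
        return "chunked"  # Content-Length, if present, is ignored
    if "content-length" in normalized:
        return f"content-length:{int(normalized['content-length'])}"
    return "no-length-declared"

print(body_framing({"Transfer-Encoding": "chunked", "Content-Length": "10"}))  # chunked
print(body_framing({"Content-Length": "10"}))  # content-length:10
```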
References
- MDN Web Docs (developer.mozilla.org)
- HTTP: The Definitive Guide
Copyright statement
This article is published on Piao Ruiqing’s blog. It may be reproduced for non-commercial purposes, provided that the original author, Piao Ruiqing, and the link blog.piaoruiqing.com are credited. For licensing negotiation or cooperation, please contact [email protected].