You've used HTTP for so long, but do you really understand Content-Length and Transfer-Encoding?

Reflections on a problem caused by Content-Length

Original article: blog.piaoruiqing.com/2019/09/08/…

Some time ago, while developing an API gateway, I ran into a timeout when debugging a request with Postman. The investigation showed that the Content-Length of the request did not match the actual length of the data, which is what prompted this article.

Preface

Content-Length: the length of an HTTP message body, given as a decimal number of octets (eight-bit bytes). The framework usually does this work for us, so we rarely pay attention to it, but in the rare cases where Content-Length does not match the actual message length, the program can fail in strange ways, such as:

No response until the request times out.
The request is truncated, and the next request is parsed incorrectly.

What is Content-Length

Content-Length is the length of an HTTP message body, expressed as a decimal number of octets (eight-bit bytes). It is a common header field. Its value must be exact, otherwise errors will occur (note that in HTTP/1.0 this field is optional).
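Because the unit is octets, Content-Length counts encoded bytes, not characters. A minimal sketch, assuming a UTF-8 text body (the body string and header dict are illustrative, not from the original):

```python
# Hypothetical body containing one non-ASCII character.
body = "name=José"

char_count = len(body)                  # 9 characters
byte_count = len(body.encode("utf-8"))  # 10 octets: "é" takes two bytes in UTF-8

headers = {
    "Content-Type": "text/plain; charset=utf-8",
    # Content-Length must be the octet count, never the character count.
    "Content-Length": str(byte_count),
}
```

Using `len(body)` here would understate the length by one byte, which is exactly the "Content-Length < actual length" case discussed below.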

The Content-Length header indicates the size, in bytes, of the entity body in the message. This size includes any content encoding: for example, if a text file is gzip-compressed, Content-Length refers to the compressed size, not the original size.
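To illustrate the gzip case, the sketch below compresses a text body with the standard `gzip` module and sets Content-Length from the compressed bytes. The `gzip_response` helper and the sample text are hypothetical.

```python
import gzip

def gzip_response(text: str):
    """Return (headers, body) for a gzip-encoded text response."""
    original = text.encode("utf-8")
    body = gzip.compress(original)  # these are the bytes that actually go on the wire
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Encoding": "gzip",
        # Content-Length counts the compressed bytes, not len(original).
        "Content-Length": str(len(body)),
    }
    return headers, body

headers, body = gzip_response("hello " * 1000)
print(len(("hello " * 1000).encode("utf-8")), headers["Content-Length"])  # original vs compressed size
```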

How does Content-Length work

Content-Length specifies the length of the message body as a decimal number, which the server or client uses to determine how many bytes of the message to read.
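Since the receiver trusts this number, it first has to parse it strictly. A minimal sketch, assuming the field value has already been extracted from the headers (the `parse_content_length` helper is hypothetical); the grammar in the HTTP/1.1 specification is simply one or more decimal digits:

```python
import re

_DIGITS = re.compile(r"[0-9]+")

def parse_content_length(value: str) -> int:
    """Parse a Content-Length field value, which must be one or more decimal digits."""
    value = value.strip()  # surrounding whitespace is not part of the field value
    if not _DIGITS.fullmatch(value):
        # Rejects "", "-1", "0x11", "1.5" and similar: signs, hex and fractions are not allowed.
        raise ValueError(f"invalid Content-Length: {value!r}")
    return int(value)

print(parse_content_length("17"))  # the receiver will then read exactly 17 body bytes
```

The regex is used instead of `str.isdigit()` because `isdigit()` also accepts non-ASCII digits such as "²".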

If this length is incorrect, the following happens:

Content-Length > actual length

If Content-Length is larger than the actual length, the server (or client) reaches the end of the message and then keeps waiting for the next byte, and does not respond until the read times out.
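The hang can be reproduced without a real server. The sketch below uses a `socket.socketpair()` as a stand-in for a client/server connection: the "client" declares 20 bytes but sends only 17 and keeps the connection open, so the reader blocks until its timeout fires. The helper names and the 0.5-second timeout are assumptions for illustration.

```python
import socket

def read_exact(sock: socket.socket, length: int) -> bytes:
    """Keep calling recv() until `length` bytes have arrived."""
    data = b""
    while len(data) < length:
        chunk = sock.recv(length - len(data))  # blocks until more bytes arrive or the timeout fires
        if not chunk:
            raise EOFError("peer closed the connection early")
        data += chunk
    return data

def simulate_short_body(declared_length: int, actual_body: bytes, timeout: float = 1.0):
    """Return the body, or None if the reader was still waiting when the timeout fired."""
    server_side, client_side = socket.socketpair()
    server_side.settimeout(timeout)  # stand-in for the server's read timeout
    try:
        client_side.sendall(actual_body)  # the client keeps its end open (keep-alive)
        return read_exact(server_side, declared_length)
    except socket.timeout:
        return None
    finally:
        server_side.close()
        client_side.close()

result = simulate_short_body(20, b"param=piaoruiqing", timeout=0.5)  # None: 3 bytes never arrive
```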

Similarly, a Content-Length that exceeds the actual length in a response message has the same effect.

Content-Length < actual length

If Content-Length is smaller than the actual length, the first request message is truncated. For example, with the body param=piaoruiqing and a Content-Length of 10, the message is truncated to param=piao, as shown in the figure below:
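The truncation is easy to reproduce with an in-memory stream standing in for the connection (an illustrative sketch, not the server used in the article): a reader that trusts a declared length of 10 consumes only the first 10 bytes and leaves the rest unread.

```python
import io

# The client really sent 17 bytes of body but declared Content-Length: 10.
wire = io.BytesIO(b"param=piaoruiqing")
declared_length = 10

body = wire.read(declared_length)  # the server trusts Content-Length
leftover = wire.read()             # bytes the server never consumed for this request

print(body)      # b'param=piao'
print(leftover)  # b'ruiqing'
```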

But is that all? Of course not. Let’s look at what happens when a second request is sent, as shown here:

I sent two consecutive requests. The first message was truncated, and on the second one the server threw an exception: Request method ‘ruiqingPOST’ not supported. (゚д゚)

So what kind of exotic method is ruiqingPOST?? At this point, with the instincts of years of debugging, we can guess that the previous request was truncated and its leftover bytes showed up at the start of this one. Fire up Wireshark to verify it, as shown below:

This happens because Connection: keep-alive is enabled, so both requests travel over the same TCP connection. With Connection: close, each request is still truncated, but there is no parsing chaos (such as the remainder of the previous message being spliced onto the next request).
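To make the keep-alive effect concrete, the sketch below models one TCP connection as a byte buffer and splits it into requests using only Content-Length. The `parse_connection` helper and the `/test` request are hypothetical, and real servers are much stricter about syntax; the point is only which method name such a parser ends up seeing.

```python
def parse_connection(data: bytes, keep_alive: bool) -> list:
    """Split one connection's bytes into (method, body) pairs using Content-Length.

    Deliberately simplified: no chunked bodies, header names are matched exactly.
    """
    requests = []
    while data:
        head, sep, rest = data.partition(b"\r\n\r\n")
        if not sep:
            break  # incomplete header block: a real server would wait for more bytes
        lines = head.split(b"\r\n")
        method = lines[0].split(b" ", 1)[0]
        headers = dict(line.split(b": ", 1) for line in lines[1:])
        length = int(headers.get(b"Content-Length", b"0"))
        body, data = rest[:length], rest[length:]  # bytes past `length` stay buffered
        requests.append((method, body))
        if not keep_alive:
            break  # Connection: close, so the leftover bytes are discarded with the socket
    return requests

REQUEST = (b"POST /test HTTP/1.1\r\n"
           b"Host: localhost\r\n"
           b"Content-Type: application/x-www-form-urlencoded\r\n"
           b"Content-Length: 10\r\n"  # the real body is 17 bytes
           b"\r\n"
           b"param=piaoruiqing")

print(parse_connection(REQUEST + REQUEST, keep_alive=True))
# [(b'POST', b'param=piao'), (b'ruiqingPOST', b'param=piao')]
print(parse_connection(REQUEST, keep_alive=False))
# [(b'POST', b'param=piao')]
```

With keep-alive, the leftover `ruiqing` is glued to the next request line; with one request per connection, it is simply dropped.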

[Copyright Notice]

This article was published on PiaoRuiQing’s blog. Non-commercial reprints are allowed, but they must retain the original author, PiaoRuiQing, and a link to blog.piaoruiqing.com. For negotiation or cooperation on authorization, please contact:
[email protected].

What if you’re not sure about the value of Content-Length

The Content-Length header indicates the size in bytes of the entity body in the message. However, if the message length cannot be known before processing of the request is complete, we cannot set Content-Length explicitly and should use Transfer-Encoding: chunked instead.

What is Transfer-Encoding: chunked

The data is sent as a series of chunks, and the Content-Length header is omitted in this case. Each chunk begins with its length in hexadecimal, followed by \r\n, then the chunk data itself, followed by another \r\n. The terminating chunk is a regular chunk, except that its length is 0.
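The framing rule above fits in a few lines. This is a minimal encoder sketch (the `encode_chunked` name is hypothetical) that ignores the optional chunk extensions and trailer fields HTTP/1.1 also allows:

```python
def encode_chunked(pieces) -> bytes:
    """Frame each piece as <hex length>\r\n<data>\r\n, then add the 0-length terminator."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # an empty piece would be mistaken for the terminating chunk
        out += b"%x\r\n" % len(piece)  # chunk size in hexadecimal
        out += piece + b"\r\n"         # chunk data, then CRLF
    out += b"0\r\n\r\n"  # terminating chunk of length 0 (empty data, then CRLF)
    return bytes(out)

print(encode_chunked([b"Hello, ", b"chunked world!"]))
# b'7\r\nHello, \r\ne\r\nchunked world!\r\n0\r\n\r\n'
```

Note that the second chunk is 14 bytes long, so its size line is `e`, not `14`.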

How does Transfer-Encoding: chunked work

Let’s explore how Transfer-Encoding: chunked works with a file-download example 🌰. The server code is as follows:
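The following is a minimal sketch of such a download endpoint using Python's standard `http.server` module; it is not the author's original server code, and the handler name, the `/download` path and `FILE_PIECES` are hypothetical. The chunk framing is written by hand so the bytes on the wire match what a packet capture would show.

```python
import http.client
import http.server
import threading

# Hypothetical file content, produced piece by piece (e.g. read from disk in blocks).
FILE_PIECES = [b"hello, ", b"chunked ", b"world"]

class ChunkedDownloadHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer coding is an HTTP/1.1 feature

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Transfer-Encoding", "chunked")  # note: no Content-Length header
        self.end_headers()
        for piece in FILE_PIECES:
            # Each chunk: size in hex, CRLF, data, CRLF.
            self.wfile.write(b"%x\r\n" % len(piece) + piece + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # zero-length terminating chunk

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

def fetch_once():
    """Start the server on a free port, download once, then shut the server down."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), ChunkedDownloadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/download")
        resp = conn.getresponse()
        result = (resp.getheader("Transfer-Encoding"), resp.read())  # read() removes the chunk framing
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

transfer_encoding, body = fetch_once()
```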

Send the request with Postman and capture the packets with Wireshark, as shown in the figure below:

The chunked data is clearly visible in Wireshark. Its structure is roughly as follows: the response body is split into multiple chunks, each consisting of two parts, length and data, and both parts end with CRLF (i.e. \r\n). The terminating chunk is a special chunk with a length of 0, as shown in the figure below:
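On the receiving side, the same structure is read back chunk by chunk. A minimal decoder sketch, assuming a binary file-like stream (`decode_chunked` is a hypothetical name); it skips any chunk extensions after `;` and any trailer fields, which HTTP/1.1 permits but the capture above does not need:

```python
import io

def decode_chunked(stream) -> bytes:
    """Read chunks until the zero-length terminating chunk and return the joined data."""
    body = bytearray()
    while True:
        size_line = stream.readline()  # e.g. b"e\r\n"
        if not size_line.endswith(b"\r\n"):
            raise ValueError("truncated chunk-size line")
        # Drop any ";extension" and parse the length as hexadecimal.
        size = int(size_line[:-2].split(b";", 1)[0], 16)
        if size == 0:
            break
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("malformed chunk")
        body += data
    # Skip optional trailer fields up to the blank line that ends the message.
    while stream.readline() not in (b"\r\n", b""):
        pass
    return bytes(body)

wire = io.BytesIO(b"7\r\nHello, \r\ne\r\nchunked world!\r\n0\r\n\r\n")
print(decode_chunked(wire))  # b'Hello, chunked world!'
```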

That is how chunked encoding works. It is mainly used when a large amount of data is transferred and the length of the response is not known until the request has been processed, for example when generating a large HTML table from the results of a database query, or when transferring a large number of images.
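The HTML-table case can be sketched as a generator: each row becomes one chunk as soon as it is available, so the total size never has to be computed. The `rows` iterable stands in for a lazily fetched database result and, like the function names, is a hypothetical illustration.

```python
import html
from typing import Iterable, Iterator

def _chunk(data: bytes) -> bytes:
    """Frame one piece of data as a single chunk."""
    return b"%x\r\n" % len(data) + data + b"\r\n"

def stream_html_table(rows: Iterable[tuple]) -> Iterator[bytes]:
    """Lazily yield a chunked-encoded HTML table, one row per chunk."""
    yield _chunk(b"<table>")
    for row in rows:  # e.g. a database cursor that fetches rows on demand (hypothetical)
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        yield _chunk(f"<tr>{cells}</tr>".encode("utf-8"))
    yield _chunk(b"</table>")
    yield b"0\r\n\r\n"  # terminating zero-length chunk

wire = b"".join(stream_html_table([(1, "apple"), (2, "<pear>")]))
```

In a real server, each yielded piece would be written to the socket immediately instead of being joined.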

Conclusion

If Content-Length is present and valid, it must be correct; otherwise errors occur (a value larger than the actual length causes a timeout, while a smaller value causes truncation and may corrupt the parsing of subsequent data).
If the message contains a Transfer-Encoding: chunked header, Content-Length is ignored.
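The precedence rule can be written as a small decision function, loosely following the message-length rules in the HTTP/1.1 specification (RFC 7230 §3.3.3, now RFC 9112). This is a simplified sketch: the `body_framing` name is hypothetical, and it ignores the special cases of HEAD requests and 1xx, 204 and 304 responses, which never carry a body.

```python
def body_framing(headers: dict, is_request: bool):
    """Return how the body length is determined: ("chunked", None), ("length", n) or ("until-close", None)."""
    h = {name.lower(): value.strip() for name, value in headers.items()}  # names are case-insensitive
    if "transfer-encoding" in h:
        codings = [c.strip().lower() for c in h["transfer-encoding"].split(",")]
        if codings[-1] == "chunked":
            return ("chunked", None)  # any Content-Length is ignored
        if is_request:
            raise ValueError("request body length cannot be determined")  # answered with 400
        return ("until-close", None)
    if "content-length" in h:
        return ("length", int(h["content-length"]))
    return ("length", 0) if is_request else ("until-close", None)

print(body_framing({"Transfer-Encoding": "chunked", "Content-Length": "10"}, is_request=True))
# ('chunked', None)
```

The specification also notes that a message carrying both headers may be a request-smuggling attempt and ought to be handled as an error, so stricter servers reject it outright.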

If this article helped you, please give it a thumbs up (~ ▽ ~)
