Suppose you need to implement a forced-download feature (that is, force a download dialog so that the browser does not try to parse and display certain file formats), and the file name must stay the same as the one the user originally uploaded (which may contain non-ASCII characters).

The former requirement is easy to implement: use the HTTP header Content-Type: application/octet-stream. The latter requirement is more painful, because it involves the encoding of the header itself (the file name is placed in Content-Disposition as the filename parameter). As we all know, the Content-Type header can specify the encoding of the content (body), but how do you specify the encoding of the header itself? Do headers even allow non-ASCII encodings at all?

If you ignore the encoding problem, sooner or later you will see garbled download file names on some combination of system and browser. If you search for solutions, you’re likely to find a pile of contradictory answers (I can safely tell you that 99% of them are substandard tricks). Let’s see how to solve this problem gracefully and correctly.

I took a lot of detours exploring this question. I went from my own experiments, to Google (searching in both Chinese and English), to reading the source code of classic projects such as Discuz, and found differing opinions with no agreement. Finally, I went back to the RFCs and read the standards documents, and that worked. Because the inquiry was so tortuous, I’ll write down the standard approach first. Content-Disposition should be set like this:

Content-Disposition: attachment;
                     filename="$encoded_fname";
                     filename*=utf-8''$encoded_fname

Here $encoded_fname is the original file name, encoded as UTF-8 and then percent-encoded according to RFC 3986 (in PHP, use rawurlencode()). The three lines can also be combined into one line (separating the parts with a space is recommended).

Also, to stay compatible with IE6, make sure the original file name has an ASCII extension!
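As a rough illustration of the recipe above, here is a minimal Python sketch. The function name content_disposition is hypothetical, and urllib.parse.quote with safe="" is used as an assumed stand-in for PHP's rawurlencode() (both leave only unreserved characters unencoded).

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value carrying both filename and filename*."""
    # Percent-encode the UTF-8 bytes of the name; safe="" means that even "/"
    # is encoded, which is close to what PHP's rawurlencode() does.
    encoded = quote(filename, safe="", encoding="utf-8")
    return f"attachment; filename=\"{encoded}\"; filename*=utf-8''{encoded}"


header_value = content_disposition("€ rates.txt")
print("Content-Disposition: " + header_value)
```

Python emits uppercase hex digits (%E2 rather than %e2); percent-encoding is case-insensitive, so this is equivalent to the lowercase form used elsewhere in this article.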

In depth

Let’s look at why this is necessary and why it works.

First, according to the HTTP 1.1 protocol defined in RFC 2616 (RFC 2068 was the earliest version; 2616 replaced 2068 and became the most widely used, and it was later replaced by other RFCs, which are mentioned below), the HTTP message format is actually based on the old ARPA Internet Text Messages format, which can only be ASCII encoded (RFC 822 Section 3). RFC 2616 Section 2.2 restates the rule for TEXT (the building block of header field content, see Section 4.2): to use any other character set, the string must be encoded using the rules of RFC 2047. Note that this rule was originally an extension to MIME (e-mail), and its format is very different from percent-encoding. Here is an example from MIME:

Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
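To see what such an RFC 2047 encoded-word contains, it can be decoded with Python's standard email package. This is only a sketch for inspecting the MIME format; as the article goes on to say, it is not something to send in HTTP headers.

```python
from email.header import decode_header, make_header

# charset = ISO-8859-1, "B" = Base64 encoding, then the encoded text
encoded_word = "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?="

# decode_header() splits the value into (bytes, charset) chunks;
# make_header() joins them back into a readable string.
decoded = str(make_header(decode_header(encoded_word)))
print(decoded)
```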
=

When RFC 2616 was published in 1999, the Content-Disposition header was not yet part of the official HTTP protocol; it was simply borrowed directly from the MIME standard because it was widely used (RFC 2616 Section 19.5.1). As a result, few browsers supported multilingual encoding in Content-Disposition, which was effectively an “extension of an extension”. In fact, RFC 2616’s suggestion to use RFC 2047 for multilingual encoding was never supported by mainstream browsers, so you can ignore the MIME scheme above.

But this is a real problem, so browsers have come up with some solutions:

  • IE supports percent-encoding directly in filename: filename="$encoded_text" (not MIME encoding!). According to RFC 2616, if the part inside the quotes is not MIME encoded, it should be treated as literal content, even if it “looks like a percent-encoded string”. Internet Explorer, however, will “automatically” decode such file names, as long as they have an unencoded (ASCII) suffix!
  • Some other browsers support a cruder approach: they allow a raw UTF-8 string to be used directly in filename="TEXT"! This also directly violates RFC 2616, which states that HTTP headers must be ASCII encoded.


The two behaviors are incompatible with each other. So you can sniff the User-Agent (UA), then use the former for IE and the latter for other browsers, which usually just works (Discuz does this). For Opera and Safari, however, this may not work.

Times move on, and RFC 5987 was released in 2010. It formally specifies that HTTP headers use the parameter*=charset'lang'value format to carry multilingual text, where:

  • charset and lang are case-insensitive.
  • lang marks the language of the field, for use by screen readers or for language-specific rendering, and can be left empty.
  • value is percent-encoded according to RFC 3986 Section 2.1, and the RFC requires browsers to support at least ISO-8859-1 and UTF-8.
  • When parameter and parameter* appear together in the HTTP header, the browser should use the latter.
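The charset'lang'value structure described in the list above can be taken apart with a few lines of Python. This is a simplified sketch (the helper name decode_ext_value is hypothetical, and it does not validate the input against the full RFC 5987 grammar):

```python
from typing import Tuple
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> Tuple[str, str, str]:
    """Split charset'lang'value and percent-decode the value part."""
    # Only the first two apostrophes are delimiters; lang may be empty.
    charset, lang, encoded = ext_value.split("'", 2)
    # charset is case-insensitive, so normalize it before use as a codec name.
    charset = charset.lower()
    text = unquote(encoded, encoding=charset, errors="strict")
    return charset, lang, text


print(decode_ext_value("UTF-8''%e2%82%ac%20rates"))
```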


This has the advantage of staying compatible with existing clients: HTTP headers are still ASCII-only, and older browsers that don’t support the standard treat parameter* as just another parameter name and ignore it as unknown, as specified in RFC 2616. This was followed in 2011 by RFC 6266, which formally incorporated Content-Disposition into the HTTP standard, reaffirmed the multilingual encoding approach of RFC 5987, and gave an example that addresses backward compatibility:

Content-Disposition: attachment;
                     filename="EURO rates";
                     filename*=utf-8''%e2%82%ac%20rates

In this example, the value of filename is a plain English stand-in for the real name: in accordance with RFC 2616, an ordinary parameter should not be encoded. UTF-8 is used in filename* simply because the standard requires browsers to support it. However, if we think about it a little more, the most common older browser on the market today is Internet Explorer, which percent-decodes filename on its own. So we can make the filename parameter carry the percent-encoded string as well:

Content-Disposition: attachment;
                     filename="%e2%82%ac%20rates.txt";
                     filename*=utf-8''%e2%82%ac%20rates.txt

Newer browsers such as Firefox, Chrome, Opera, and Safari all support filename* as specified in the new standard and prefer it, so it does not matter that they don’t percent-decode filename. Older versions of Internet Explorer do not recognize filename*, so they ignore it and use the old filename parameter (the only minor catch is that the name must have an ASCII suffix). This solves multi-browser, multi-language compatibility without any UA sniffing, and it stays closer to the standard.

P.S. Why use PHP’s rawurlencode() function? Because it is the real percent-encoding of RFC 3986; it only has a strange name because an older function, urlencode(), already implemented the similar encoding rules used for HTTP POST forms. The difference is that the former encodes spaces as %20, while the latter encodes spaces as + signs. If you use the latter, the spaces in a file name turn into plus signs when IE6 downloads it. Normally you should not use urlencode() here (some versions of Discuz incorrectly use it for file name encoding, which is why spaces become plus signs).
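Python has the same pair of behaviors, which makes the difference easy to see. In this sketch, quote() with safe="" plays the role of rawurlencode() and quote_plus() plays the role of urlencode(); the mapping to the PHP functions is an approximation, and the sample file name is made up.

```python
from urllib.parse import quote, quote_plus

name = "my file.txt"

raw_style = quote(name, safe="")  # RFC 3986 style: space becomes %20
form_style = quote_plus(name)     # form style: space becomes +

print(raw_style)
print(form_style)
```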

Content-Disposition, from MDN

In a regular HTTP response, the Content-Disposition header indicates whether the content of the response should be presented inline (that is, as a web page or part of a page) or downloaded and saved locally as an attachment.

In a multipart/form-data message body, the Content-Disposition header is used on each subpart of the multipart body to give information about the field it corresponds to. The subparts are separated by the boundary defined in the Content-Type header. Used on the message body itself, Content-Disposition has no effect.

The Content-Disposition header was originally defined in the MIME standard, and HTTP forms and POST requests use only a subset of its parameters. Only form-data and the optional name and filename parameters can be used in the HTTP context.

Syntax

As a response header for the main body

In the HTTP context, the first parameter is either inline (the default, which indicates that the response body will be displayed as part of the page or as the entire page) or attachment (which indicates that the body should be downloaded locally; most browsers present a “Save as” dialog, prefilled with the value of the filename parameter if it is present).

Content-Disposition: inline
Content-Disposition: attachment
Content-Disposition: attachment; filename="filename.jpg"

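For a concrete picture of the attachment form, here is a minimal WSGI sketch that sends the third header variant. The app name download_app, the body bytes, and the fake_start_response helper are all made up for illustration; a real application would stream an actual file.

```python
def download_app(environ, start_response):
    """Tiny WSGI app that always offers the same bytes as a download."""
    body = b"placeholder bytes, not a real JPEG"
    headers = [
        # A generic binary type discourages the browser from rendering it.
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(body))),
        # attachment plus filename prefills the "Save as" dialog.
        ("Content-Disposition", 'attachment; filename="filename.jpg"'),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stub start_response to inspect the headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = download_app({}, fake_start_response)
print(captured["headers"]["Content-Disposition"])
```
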
As a header in a multipart body

In the HTTP context, the first parameter is always form-data. Additional parameters are case-insensitive; each one is a name and a value joined by an equals sign (=), with the value enclosed in double quotes. Multiple parameters are separated by semicolons (;).

Content-Disposition: form-data
Content-Disposition: form-data; name="fieldName"
Content-Disposition: form-data; name="fieldName"; filename="filename.jpg"

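On the receiving side, such a subpart header can be taken apart with Python's standard email package, which understands this MIME-style parameter syntax. This is a sketch for inspecting a single header value, not a full multipart/form-data parser.

```python
from email.message import Message

part = Message()
part["Content-Disposition"] = 'form-data; name="fieldName"; filename="filename.jpg"'

# The disposition type, lower-cased, without its parameters.
disposition = part.get_content_disposition()
# Parameters are looked up by name; surrounding quotes are removed.
field_name = part.get_param("name", header="content-disposition")
file_name = part.get_filename()

print(disposition, field_name, file_name)
```
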
Directives

name

This is followed by a string containing the name of the form field that the subpart corresponds to. When the same field name corresponds to multiple files (for example, an <input type="file"> element with the multiple attribute), several subparts share the same field name. If the value of the name parameter is ‘_charset_’, the subpart is not an HTML field; instead it gives the default character set to use for parts that do not specify one explicitly.

filename

This is followed by a string containing the original name of the file being transferred. This parameter is always optional and must not be used blindly: path information should be stripped, and the name should be converted to conform to the server’s file system rules. The parameter is mainly indicative. When used with Content-Disposition: attachment, it is the default file name presented to the user in the “Save as” dialog.
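The advice about dropping path information could be sketched as follows. The helper name safe_basename and the "download" fallback are assumptions for illustration, and a real server would likely apply further rules (length limits, reserved names, and so on).

```python
import ntpath


def safe_basename(client_filename: str, fallback: str = "download") -> str:
    """Keep only the last path component of a client-supplied file name."""
    # ntpath.basename() treats both "/" and "\" as separators, so it strips
    # Windows-style and POSIX-style paths alike.
    name = ntpath.basename(client_filename)
    # Reject empty results and the special directory names.
    if name in ("", ".", ".."):
        return fallback
    return name


print(safe_basename("C:\\Users\\me\\report.pdf"))
print(safe_basename("../../etc/passwd"))
```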

filename*

The only difference between filename and filename* is that filename* uses the encoding specified in RFC 5987. When filename and filename* appear together, filename* should be preferred if both are supported.
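The precedence rule can be sketched from the client's point of view. The function pick_filename is hypothetical, and it assumes the header has already been split into a dictionary of parameter names and unquoted values.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def pick_filename(params: Dict[str, str]) -> Optional[str]:
    """Prefer filename* (RFC 5987 form) and fall back to plain filename."""
    ext = params.get("filename*")
    if ext is not None:
        try:
            charset, _lang, encoded = ext.split("'", 2)
            return unquote(encoded, encoding=charset, errors="strict")
        except (ValueError, LookupError):
            # Malformed value, undecodable bytes, or unknown charset:
            # ignore filename* and use the plain parameter instead.
            pass
    return params.get("filename")


both = {"filename": "EURO rates", "filename*": "utf-8''%e2%82%ac%20rates"}
print(pick_filename(both))
print(pick_filename({"filename": "EURO rates"}))
```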