When developing with the typical LAMP stack, the character encoding is specified in many places. Overlooking any one of them can make Chinese characters on the page appear garbled. This article summarizes the possible causes of such garbled text to make troubleshooting easier.
1. Problems in the page.
Every web page file is saved in some encoding. A declaration in the page source tells the browser which encoding to use when interpreting the page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The encoding declared here must match the encoding the file is actually saved in. Otherwise the text will be garbled.
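To make the mismatch concrete, here is a small Python sketch. Python is used only for illustration, and the hypothetical render function simply stands in for a browser that trusts the declared charset. The same GBK-encoded bytes decode correctly with the matching codec and turn into replacement characters when decoded as UTF-8.

```python
def render(page_bytes: bytes, declared_charset: str) -> str:
    """Decode page bytes as a browser would if it trusted the declared charset.

    Hypothetical helper: errors="replace" mimics a browser showing junk
    instead of failing outright.
    """
    return page_bytes.decode(declared_charset, errors="replace")


# The file is actually saved as GBK.
page = "中文标题".encode("gbk")

print(render(page, "gbk"))    # declaration matches the file: 中文标题
print(render(page, "utf-8"))  # declaration does not match: garbled output
```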
One more thing to note: suppose a UTF-8 page is viewed in Internet Explorer, the <title> tag comes before the <meta> tag, and the title is in Chinese (for example, a Chinese blog name or article title). In that case IE displays a blank page. Pages encoded in GBK, GB2312 or similar encodings do not have this problem.
The cause is that UTF-8 uses three bytes to represent a Chinese character, while GB2312 or BIG5 uses two. When IE reads the title, it has not yet reached the <meta> tag, so it decodes the title with its default two-byte encoding. If the title contains an odd number of Chinese characters, its UTF-8 byte count is odd. The last byte is then left over as half a character and combines with the < of </title> into a single garbled character. IE can no longer find the </title> tag, and the entire page is output blank. If you view the page source at this point, you will see that the whole page was in fact downloaded.
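The byte arithmetic behind this can be checked with a short Python sketch. It is a simplified model that assumes the title consists only of Chinese characters. The helper names encoded_lengths and leaves_half_character are hypothetical, and the sketch does not reproduce IE's parser. It only shows when a leftover byte appears.

```python
def encoded_lengths(text: str) -> dict:
    """Byte length of text in UTF-8 and in the two-byte Chinese encodings."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "gb2312", "gbk")}


def leaves_half_character(title: str) -> bool:
    """True if the UTF-8 bytes of title cannot be split evenly into
    two-byte units, i.e. one byte is left to pair with the '<' of '</title>'.

    Assumes the title contains only Chinese characters (3 UTF-8 bytes each).
    """
    return len(title.encode("utf-8")) % 2 == 1


for title in ("中文", "博客园"):
    print(title, encoded_lengths(title), leaves_half_character(title))
# 中文 {'utf-8': 6, 'gb2312': 4, 'gbk': 4} False
# 博客园 {'utf-8': 9, 'gb2312': 6, 'gbk': 6} True
```

With two characters the six UTF-8 bytes split evenly into pairs. With three characters one byte is left over, which is the case that breaks </title>.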
How the browser determines the encoding:
The charset in the Content-Type field of the HTTP header; whether the returned HTML begins with a Byte Order Mark (BOM); the <meta> tag in the HTML code.
How the browser decodes and parses a web page:
When parsing a page, the browser (whether Internet Explorer or Firefox) first reads the Content-Type field of the HTTP header. If a charset is specified there, it assumes the page is in that encoding; if not, it assumes its default encoding. According to the table above, the default for Chinese IE is GB2312 and the default for Chinese Firefox is GBK, although IE's GB2312 appears to behave the same as GBK. The browser then checks for a BOM. If it finds the 3-byte UTF-8 BOM, it reinterprets the page as UTF-8.
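The decoding stage described above can be sketched in Python as follows. This is a simplified model, not browser code. The function names header_charset and decoding_stage are hypothetical, and using "gbk" as the default parameter is an assumption standing in for the Chinese-locale browser default.

```python
import codecs
from typing import Optional


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip().strip('"').lower()
    return None


def decoding_stage(content_type: Optional[str], body: bytes, default: str = "gbk") -> str:
    """Pick the encoding used before any HTML is parsed.

    1. Use the charset from the HTTP Content-Type header, else the browser default.
    2. If the body starts with the 3-byte UTF-8 BOM, switch to UTF-8.
    """
    encoding = header_charset(content_type) or default
    if body.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        encoding = "utf-8"
    return encoding
```

For example, decoding_stage("text/html", b"<html>") falls back to the default, while a body starting with the UTF-8 BOM is treated as UTF-8 even if the header named another charset.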
Decoding is followed by the HTML parsing stage. During parsing, the browser may come across a declaration such as <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />.
Meta tags:
“A meta tag is used in an HTML document to simulate an HTTP response header.” Writing a value in a meta tag is equivalent to writing it in the HTTP header. This lets authors of ordinary HTML pages, who cannot set HTTP headers themselves, work around that limitation. However, meta is an HTML tag, so it only takes effect once HTML parsing reaches it. At that point the browser backs up, replaces the value taken from the HTTP header, and starts decoding and parsing the HTML all over again. According to this explanation, the value in the meta tag therefore overrides the one in the HTTP header regardless of browser. Browsers that follow the current HTML standard use a different order, however: a BOM comes first, then a charset in the HTTP Content-Type header, and only then the meta tag. This matches the Apache behavior described in section 2.
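The "step back and decode again" behavior can be sketched in Python. This is a simplified model under stated assumptions. The regular expression META_CHARSET is a hypothetical stand-in for a browser's real meta prescan and handles only the common forms. The 1024-byte scan limit is also an assumption. The sketch follows the meta-wins description above. A browser following the current HTML standard would skip this step whenever a BOM or a header charset had already been found.

```python
import re
from typing import Optional, Tuple

# Matches the charset in <meta http-equiv="Content-Type" content="...; charset=xxx">
# or <meta charset="xxx">. Simplified: real prescans handle many more cases.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def meta_charset(body: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset declared by the first meta tag in the first limit bytes."""
    match = META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def reparse_with_meta(body: bytes, current_encoding: str) -> Tuple[str, str]:
    """Restart decoding with the meta charset if one is declared.

    Returns (encoding actually used, decoded text).
    """
    encoding = meta_charset(body) or current_encoding
    return encoding, body.decode(encoding, errors="replace")
```

Usage: a UTF-8 page first decoded with the GBK default ends up re-decoded as UTF-8 once the meta tag is found, so the Chinese title comes out correctly.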
2. Apache AddDefaultCharset configuration.
Apache2 provides the AddDefaultCharset directive, which is not set in the configuration file by default.
The AddDefaultCharset directive adds a default character set to the HTTP response header when the response content type is text/plain or text/html.
Syntax: AddDefaultCharset On|Off|charset
Default: AddDefaultCharset Off
Context: server config, virtual host, directory, .htaccess
Override: FileInfo
Status: Core
Module: core
This directive adds the default character set to the HTTP response header if and only if the response content type is text/plain or text/html. In theory this overrides the character set specified by a <meta> tag in the document body, but the actual behavior usually depends on the user's browser settings. AddDefaultCharset Off disables this feature. AddDefaultCharset On enables Apache's internal default character set, ISO-8859-1. You can also specify any other character set by its name as registered with IANA, for example: AddDefaultCharset UTF-8
In other words, when Apache does not set AddDefaultCharset, the page encoding is determined by the page's own meta tag. When Apache does set it, the encoding in the page's meta tag is ignored. A script, however, can still send its own encoding to the client directly in a response header.
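The rule can be modeled as a small Python function. This is a simplified sketch based on the directive description above, not Apache's source. The name add_default_charset is hypothetical. The assumption that a Content-Type that already carries a charset (for example one set by a script) is left unchanged is consistent with the note about scripts sending their own header.

```python
def add_default_charset(content_type: str, setting: str = "Off") -> str:
    """Simplified model of the AddDefaultCharset rule (hypothetical helper).

    setting is "Off", "On" (meaning ISO-8859-1) or a charset name such as "UTF-8".
    """
    if setting.lower() == "off":
        return content_type  # feature disabled: header left as-is
    charset = "ISO-8859-1" if setting.lower() == "on" else setting
    media_type = content_type.split(";")[0].strip().lower()
    already_has_charset = "charset=" in content_type.lower()
    # Only text/plain and text/html responses without a charset are changed.
    if media_type in ("text/plain", "text/html") and not already_has_charset:
        return f"{content_type}; charset={charset}"
    return content_type
```

With the default Off setting nothing is added, so the browser falls back to the BOM or meta tag. With AddDefaultCharset UTF-8 an HTML response gains "; charset=UTF-8", while a script that already sent "text/html; charset=gbk" keeps its own value.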
This is why servers are generally configured without this option. Leaving it unset gives us more flexibility when writing pages, since pages in different encodings can coexist on the same server. Of course, mixing encodings like this is not a very good habit.