I keep running into garbled text (mojibake) during everyday development, which is frustrating, so I am summarizing the problem here to reinforce my own understanding. These notes are a bit rough, and corrections are welcome. The most effective way to understand the problem is to simulate every possible garbling scenario.
Why is it garbled
In one sentence: garbled characters appear when text is decoded (displayed) with a different encoding than the one it was encoded (saved) with. This is why the front end and the back end are generally both kept on UTF-8.
Analysis of several kinds of garbled text
- "??" garble: ISO-8859-1 can only encode Latin-1 (Western European) characters, so when text is encoded with it, every character outside that range (Chinese, for example) is replaced with 0x3F, which is "?" in ASCII, ISO-8859-1, and UTF-8 alike. The original bytes are lost at that point, so the damage is irreversible: whichever ASCII-compatible encoding is then used to decode the bytes, the result is "?".
- "²âÊÔ" garble: ISO-8859-1 is a single-byte encoding, so decoding with it interprets the stream strictly byte by byte. This does not destroy the data: re-encoding the garbled string with ISO-8859-1 recovers the original byte stream, which can then be decoded with the correct encoding to obtain the correct string.
- "���" garble: when a byte stream encoded with GB18030 is decoded as UTF-8, the bytes (four of them for a two-character Chinese string) do not form a legal UTF-8 sequence, so the decoder replaces them directly with the replacement character � (U+FFFD).
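The three cases above can be reproduced in a few lines of Java. This is only a minimal sketch: the class and method names are made up for illustration, the sample text is an arbitrary two-character Chinese string, and it assumes the JDK in use ships the GB18030 charset (full JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GarbleDemo {
    // Assumption: GB18030 is available in this JDK.
    static final Charset GB18030 = Charset.forName("GB18030");

    /** "??" case: encoding with ISO-8859-1 replaces unmappable characters with 0x3F. */
    static String questionMarks(String text) {
        byte[] lossy = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(lossy, StandardCharsets.UTF_8);
    }

    /** Latin-1 case: GB18030 bytes are decoded byte by byte as ISO-8859-1. */
    static String latin1Mojibake(String text) {
        return new String(text.getBytes(GB18030), StandardCharsets.ISO_8859_1);
    }

    /** Repair for the Latin-1 case: recover the bytes, then decode them correctly. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), GB18030);
    }

    /** Replacement-character case: GB18030 bytes are decoded as UTF-8. */
    static String replacementChars(String text) {
        return new String(text.getBytes(GB18030), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6d4b\u8bd5"; // two Chinese characters, written as escapes
        System.out.println(questionMarks(original)); // ??
        System.out.println(repair(latin1Mojibake(original)).equals(original));
        System.out.println(replacementChars(original).contains("\uFFFD"));
    }
}
```

Only the second case can be undone, because it is the only one in which the original bytes survive.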
Factors that determine the encoding
An HTTP request goes through:
Browser -> Server -> Browser
Each step of this process has factors that affect the encoding:
1. Page encoding
The charset declared in the HTML meta data:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>hello</title>
</head>
2. Browser encoding
If the browser's encoding differs from the page's encoding, the page text will be garbled.
3. Ajax requests
ajaxGet: The query parameters of an Ajax GET request are affected by the browser encoding, so it is best to encode them explicitly, applying encodeURI to the whole URL or encodeURIComponent to the query string values. Both always produce UTF-8 percent-encoding.
ajaxPost: An Ajax POST request uses UTF-8 for both the URI and the request body, regardless of the content type. The server therefore needs to set the request's characterEncoding to "UTF-8" for the POST data not to be garbled.
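To see why the charset used for percent-encoding matters, here is a small Java sketch. It uses java.net.URLEncoder as a stand-in for the browser; URLEncoder is not identical to encodeURIComponent (it encodes a space as "+", for instance), and the class and method names are hypothetical. It assumes Java 10 or later and a JDK that provides the GBK charset.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QueryEncodingDemo {
    /** Percent-encodes a query value using the given charset. */
    static String encodeQueryValue(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String value = "\u6d4b\u8bd5"; // two Chinese characters, written as escapes
        // The same text yields different bytes on the wire depending on the charset.
        System.out.println(encodeQueryValue(value, StandardCharsets.UTF_8)); // %E6%B5%8B%E8%AF%95
        System.out.println(encodeQueryValue(value, Charset.forName("GBK")));
    }
}
```

If the server decodes with a charset other than the one used here, the parameter arrives garbled, which is why encoding explicitly to UTF-8 on the client removes the guesswork.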
4. Spring encoding
The class org.springframework.web.filter.CharacterEncodingFilter defines the encoding of the request and the response. It is needed because ISO-8859-1 is the default when no encoding is specified, and ISO-8859-1 cannot represent Chinese, so UTF-8 is configured here. Configure it in web.xml:
<filter>
    <filter-name>CharacterEncodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
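CharacterEncodingFilter has two parameters, encoding and forceEncoding. encoding names the charset to apply to the request; by default it is applied only when the request does not already specify one. forceEncoding controls whether that charset is applied unconditionally, overriding any encoding the request already declares, and whether it is also set on the response.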
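The interaction between the two parameters can be sketched as follows. This is not Spring's source code: Req and Resp are hypothetical stand-ins that model only the character encoding of a servlet request and response, and the logic reflects the filter's behaviour as I understand it.

```java
final class EncodingFilterSketch {
    /** Hypothetical stand-in for a servlet request; null means "no encoding declared". */
    static final class Req { String characterEncoding; }

    /** Hypothetical stand-in for a servlet response with the container default. */
    static final class Resp { String characterEncoding = "ISO-8859-1"; }

    /** Applies the configured encoding the way the two init-params describe. */
    static void apply(String encoding, boolean forceEncoding, Req request, Resp response) {
        if (encoding == null) {
            return; // nothing configured, leave both untouched
        }
        // Request: set when forced, or when the client declared no encoding.
        if (forceEncoding || request.characterEncoding == null) {
            request.characterEncoding = encoding;
        }
        // Response: only set when forced.
        if (forceEncoding) {
            response.characterEncoding = encoding;
        }
    }
}
```

With forceEncoding set to false, a request that already declares an encoding keeps it and the response is left alone, which is why the web.xml configuration sets it to true.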
5. Tomcat configuration
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" URIEncoding="UTF-8" />
The URIEncoding attribute in server.xml, in Tomcat's conf directory, is generally set to UTF-8 or ISO-8859-1. Tomcat uses it to decode the URI of an incoming GET request, so the query parameters must be handled accordingly to avoid garbled characters. The body of a POST request is not affected by this setting. Solution:
$.ajax({
    // encodeURI is applied twice, so the value is still pure ASCII after the container decodes it once
    url: "/hello?param=" + encodeURI(encodeURI($("#before").html())),
    type: "GET",
    contentType: "application/x-www-form-urlencoded; charset=utf-8",
    success: function (result) {
        $("#after").val(result);
    }
});
On the client, apply encodeURI twice to the parameter in the GET request URL.
@RequestMapping(value = "/hello", method = RequestMethod.GET, produces = "text/plain; charset=UTF-8")
@ResponseBody
public String index(@RequestParam("param") String param) throws UnsupportedEncodingException {
    // The container has already decoded one layer; decode the remaining layer explicitly as UTF-8.
    String newParam = URLDecoder.decode(param, "utf-8");
    String handleMsg = "Data after background processing: ";
    return handleMsg + newParam;
}
On the server, decode the parameter once more. Together with the decoding the container has already performed, this undoes both layers of encoding, so the received parameter is not garbled.
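The whole round trip can be simulated in plain Java to check the reasoning. This is a sketch under stated assumptions: java.net.URLEncoder stands in for encodeURI (the two differ in details such as how spaces are handled), the container is assumed to decode the URI with ISO-8859-1 as the worst case, the class and method names are hypothetical, and Java 10 or later is required for the Charset overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DoubleEncodingDemo {
    /** Client side: percent-encode twice, like the two encodeURI calls. */
    static String clientEncodeTwice(String value) {
        String once = URLEncoder.encode(value, StandardCharsets.UTF_8);
        return URLEncoder.encode(once, StandardCharsets.UTF_8);
    }

    /** Container side: one automatic decode, assumed here to use ISO-8859-1. */
    static String containerDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    /** Application side: the explicit UTF-8 decode done in the controller. */
    static String applicationDecode(String param) {
        return URLDecoder.decode(param, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6d4b\u8bd5"; // two Chinese characters, written as escapes
        String received = containerDecode(clientEncodeTwice(original));
        // After the container's decode the value is still ASCII-only percent-encoding.
        System.out.println(received);
        System.out.println(applicationDecode(received).equals(original));
    }
}
```

Because the doubly encoded value contains only ASCII characters, the container's charset no longer matters for its decode; the charset-sensitive step is moved into application code, where UTF-8 is chosen explicitly.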
Code address: github.com/PanPanda/en…
Reference: In-depth analysis of coding issues in Web request responses