The premise
While studying Tomcat today, I started it from source, and the messages printed to the console came out garbled. As a Java developer I am no stranger to encoding problems, so I first changed the encoding settings in the configuration files as usual. The console output stayed garbled whether I chose GBK or UTF-8. I then searched the Internet for blog posts on the problem and found a workable solution, but it still left some issues. After more searching and debugging, I finally solved the problem completely.
First solution attempt
A record of debugging garbled Chinese console output when starting Tomcat from source
I started by following the second method in that blogger's post, which modifies one method in each of two classes:
- the getString(final String key, final Object... args) method of org.apache.tomcat.util.res.StringManager
- the getMessage(String errCode) method of org.apache.jasper.compiler.Localizer
The following statement is added to the corresponding method in both classes:
value = new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
After restarting Tomcat, the console messages are displayed correctly.
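Why that one added statement works can be sketched in isolation. The example below is not Tomcat code: the class and method names are made up for illustration. It first simulates the garbling (UTF-8 bytes decoded as ISO-8859-1) and then applies the same re-decoding as the statement above.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepairDemo {

    // Simulates the bug: UTF-8 bytes are decoded as if they were ISO-8859-1.
    static String garble(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the post: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String original = "\u4e2d\u6587";
        String garbled = garble(original);
        System.out.println(original.equals(garbled));         // false
        System.out.println(original.equals(repair(garbled))); // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered. The drawback is that every call site that reads a message has to apply the repair.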
Second resolution attempt
However, when I opened the host-manager and manager applications that ship with Tomcat, garbled characters appeared on their pages again.
A lengthy debugging session followed to work out why the garbling was happening.
Why is there a garble problem?
- First I picked a class whose console output was garbled and clicked through to the statement that produces the output.
- From there, step into the getString method that it calls.
This turns out to be the same method that the blogger modified earlier, so the logical next step is to keep tracing where the string comes from.
Then step into the next layer of the source.
Here you can see that the bundle's getString method is called, so let's continue inside.
Note the top-level directory in the upper left corner: it is rt.jar, which means this code is loaded by the Bootstrap ClassLoader. Step in further.
This is the getObject method of ResourceBundle, and the key call is handleGetObject, so let's step into that.
The handleGetObject method of PropertyResourceBundle is called. The key part of this method is lookup.get(key).
From the definition and the values shown in the debugger, it is easy to guess that the lookup map holds the message strings. The strings stored in the map are already garbled, so any string we fetch by key is garbled to begin with.
The earlier workaround took the garbled string after it had been fetched and re-decoded it as UTF-8. My guess is that the console output was fixed while host-manager and manager stayed garbled because other places still use this StringManager without the manual re-decoding. So I wondered whether the strings in lookup could be made correct in the first place. The messages are loaded by a ResourceBundle, so the next debugging step follows how that bundle is created, which leads to a Control object.
As you can see, the bundle is ultimately created by the newBundle method of ResourceBundle.Control.
Now we have the answer. When the bundle is created, the JDK source passes the raw InputStream to PropertyResourceBundle, which reads it as ISO-8859-1 by default, while our properties files are saved as UTF-8. That mismatch is what garbles the characters. My first idea was to wrap the stream so that it is read as UTF-8:
InputStreamReader isr = new InputStreamReader(stream, "UTF-8");
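A minimal sketch of the mismatch, using java.util.Properties directly rather than Tomcat's code. It assumes the JDK 8 behavior described above, where PropertyResourceBundle hands the stream to Properties.load(InputStream). The key name greeting and the class name are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {

    // Bytes of a one-line properties file saved as UTF-8.
    static byte[] utf8File() {
        return "greeting=\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
    }

    // Properties.load(InputStream) is specified to decode the stream as ISO-8859-1.
    static String loadFromStream(byte[] file) {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(file));
            return props.getProperty("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wrapping the stream in a Reader lets the caller choose the charset.
    static String loadFromUtf8Reader(byte[] file) {
        try {
            Properties props = new Properties();
            props.load(new InputStreamReader(new ByteArrayInputStream(file), StandardCharsets.UTF_8));
            return props.getProperty("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = utf8File();
        System.out.println("\u4e2d\u6587".equals(loadFromStream(file)));     // false
        System.out.println("\u4e2d\u6587".equals(loadFromUtf8Reader(file))); // true
    }
}
```

The same bytes produce a garbled value through the stream overload and the correct value through the Reader overload, which is why wrapping the stream is enough to fix the strings at the source.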
However, I then realized that this code is part of the JDK and cannot be modified. Does that mean there is no other way? No. Look closely at the getBundle methods of ResourceBundle.
Besides the overloads that take only a baseName, there are overloads that also accept a Control. Since the bundle is created by calling the Control's newBundle method, we only need to extend the Control class, override newBundle, and wrap the InputStream in an InputStreamReader inside the overridden method.
I searched the Internet and found this answer on Stack Overflow:
java – How to use UTF-8 in resource properties with ResourceBundle – Stack Overflow
The answer gives a complete and detailed explanation along with a full solution. The root cause is that the properties stream is read as ISO-8859-1 by default. My IDE is IntelliJ IDEA, whose default file encoding is UTF-8, so the files are saved in one encoding and read in another, and the text is displayed garbled. With the cause identified, how do we fix it?
The first option is to convert every character in the saved files that falls outside ISO-8859-1 into \uXXXX escapes. You can use the native2ascii tool that comes with JDK 8 and earlier.
But converting everything is a hassle and there are a lot of files, so I doubt anyone would choose this approach.
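To make the first option concrete, here is a rough sketch of what such a conversion does. It is not the native2ascii tool itself. The class name, method names, and the greeting key are hypothetical, and the cutoff follows the description above (only characters outside ISO-8859-1 are escaped).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class UnicodeEscapeDemo {

    // Replaces every char above U+00FF with a backslash-u escape of four hex digits.
    static String escapeNonLatin1(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Saves the escaped line as ISO-8859-1 and reads it back the way the default loader would.
    static String roundTrip(String value) {
        try {
            byte[] file = ("greeting=" + escapeNonLatin1(value)).getBytes(StandardCharsets.ISO_8859_1);
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(file));
            return props.getProperty("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d\u6587";
        System.out.println(escapeNonLatin1(value));
        System.out.println(value.equals(roundTrip(value))); // true
    }
}
```

Because the escaped file contains only characters that ISO-8859-1 can represent, the default loader reads it correctly and decodes the escapes back into the original text.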
The second option is to pass a custom UTF8Control when creating the ResourceBundle.
Here is a ready-made UTF8Control class provided by the blogger:
package org.apache.tomcat.util.res;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class UTF8Control extends ResourceBundle.Control {

    // Override the parent class's newBundle method
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
            ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        // The below is a copy of the default implementation (copied and pasted as-is).
        String bundleName = toBundleName(baseName, locale);
        String resourceName = toResourceName(bundleName, "properties");
        ResourceBundle bundle = null;
        InputStream stream = null;
        if (reload) {
            URL url = loader.getResource(resourceName);
            if (url != null) {
                URLConnection connection = url.openConnection();
                if (connection != null) {
                    connection.setUseCaches(false);
                    stream = connection.getInputStream();
                }
            }
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream != null) {
            try {
                // Only this line is changed to make it read properties files as UTF-8.
                // This is the key point: the stream is wrapped in an InputStreamReader
                // so that the file is decoded as UTF-8.
                bundle = new PropertyResourceBundle(new InputStreamReader(stream, StandardCharsets.UTF_8));
            } finally {
                stream.close();
            }
        }
        return bundle;
    }
}
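How a Control like this gets used can be sketched with a small self-contained example. It is not the actual change made in Tomcat's StringManager. The class name, the bundle name messages, and the key greeting are made up for illustration, and the inline Control is a compact stand-in for UTF8Control without the reload branch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class Utf8BundleDemo {

    // Compact stand-in for UTF8Control: reads the properties resource as UTF-8.
    static final ResourceBundle.Control UTF8 = new ResourceBundle.Control() {
        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            String resourceName = toResourceName(toBundleName(baseName, locale), "properties");
            try (InputStream stream = loader.getResourceAsStream(resourceName)) {
                if (stream == null) {
                    return null; // no bundle for this locale
                }
                return new PropertyResourceBundle(new InputStreamReader(stream, StandardCharsets.UTF_8));
            }
        }
    };

    // Writes a UTF-8 properties file to a temp directory and loads it through the custom Control.
    static String loadGreeting() {
        try {
            Path dir = Files.createTempDirectory("utf8-bundle-demo");
            Files.write(dir.resolve("messages.properties"),
                    "greeting=\u4e2d\u6587".getBytes(StandardCharsets.UTF_8));
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()}, null)) {
                // The Control is passed as the last argument of getBundle.
                ResourceBundle bundle = ResourceBundle.getBundle("messages", Locale.ROOT, loader, UTF8);
                return bundle.getString("greeting");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("\u4e2d\u6587".equals(loadGreeting())); // true
    }
}
```

With this approach the strings are already correct when they enter the bundle, so every caller benefits and no per-call re-decoding is needed.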
Let's test the result.
The third option is to rely on the IDE. Eclipse automatically converts characters outside the ISO-8859-1 range to \uXXXX escapes when editing .properties files, so with Eclipse you can start Tomcat without any garbled output and without any extra setup.
As you can see, I have now commented out all of the statements added earlier, so let's run it and see what happens.
References
java – How to use UTF-8 in resource properties with ResourceBundle – Stack Overflow
A record of debugging garbled Chinese console output when starting Tomcat from source, zhoutaoping1992's blog, CSDN