Java has eight primitive types and one special type, String. To make these types faster and more memory-efficient, Java provides a constant pool for each of them. A constant pool works like a cache provided at the Java system level.

The constant pools of the eight primitive types are all managed by the system, but the String constant pool is special. There are two main ways to use it:

Strings declared directly in double quotes are stored directly in the constant pool.

If a String object was not declared in double quotes, you can call the intern method provided by String. The intern method looks up the string constant pool to see whether the current string exists; if it does not, it puts the current string into the pool.

The rest of this article focuses on the String#intern method.
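The two ways of reaching the pool can be seen in a minimal sketch. The class name InternBasics and the helper buildAtRuntime are made up for illustration; only String#intern and the == comparison come from the text above.

```java
public class InternBasics {
    /** Builds a String at run time so that it is a fresh heap object, not a literal. */
    public static String buildAtRuntime(String text) {
        return new String(text.toCharArray());
    }

    public static void main(String[] args) {
        String literal = "constant-pool-demo";         // literal: taken from the string constant pool
        String built = buildAtRuntime(literal);        // same characters, different object

        System.out.println(literal.equals(built));     // true: equal contents
        System.out.println(literal == built);          // false: distinct objects
        System.out.println(literal == built.intern()); // true: intern() returns the pooled string
    }
}
```

The third comparison is the one that matters: whatever object intern is called on, the returned reference is the pool's canonical instance for those contents.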

Let’s take a closer look at how it works.

1. Java code

/**
 * Returns a canonical representation for the string object.
 * <p>
 * A pool of strings, initially empty, is maintained privately by the
 * class String.
 * <p>
 * When the intern method is invoked, if the pool already contains a
 * string equal to this String object as determined by
 * the {@link #equals(Object)} method, then the string from the pool is
 * returned. Otherwise, this String object is added to the
 * pool and a reference to this String object is returned.
 * <p>
 * It follows that for any two strings s and t,
 * s.intern() == t.intern() is true
 * if and only if s.equals(t) is true.
 * <p>
 * All literal strings and string-valued constant expressions are
 * interned. String literals are defined in section 3.10.5 of the
 * <cite>The Java™ Language Specification</cite>.
 *
 * @return a string that has the same contents as this string, but is
 *         guaranteed to be from a pool of unique strings.
 */
public native String intern();

As the declaration shows, String#intern is a native method, but the comment is very clear: if the constant pool already contains the current string, the string from the pool is returned; if it does not, the current string is put into the pool and then returned.
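The rule quoted in the comment, that s.intern() == t.intern() holds exactly when s.equals(t), can be checked with a short sketch. The class InternContract and its helper are hypothetical names used only for this illustration.

```java
public class InternContract {
    /** True when both strings map to the same pooled object. */
    public static boolean samePooledObject(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        // Two equal strings built in different ways, plus one with different contents.
        String s = new StringBuilder("in").append("tern").toString();
        String t = new String(new char[] {'i', 'n', 't', 'e', 'r', 'n'});
        String u = "other";

        System.out.println(s.equals(t) + " " + samePooledObject(s, t)); // true true
        System.out.println(s.equals(u) + " " + samePooledObject(s, u)); // false false
    }
}
```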

2. Native code

After JDK 7, once Oracle took over Java, the JDK source was no longer open to the public. According to statements from the JDK's main developers, OpenJDK 7 and JDK 7 share the same main code, with only slight changes in branch code. So we can trace the OpenJDK 7 source directly to explore how intern is implemented.

Native implementation code:

\openjdk7\jdk\src\share\native\java\lang\String.c

JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
    return JVM_InternString(env, this);
}

\openjdk7\hotspot\src\share\vm\prims\jvm.h

/*
 * java.lang.String
 */
JNIEXPORT jstring JNICALL
JVM_InternString(JNIEnv *env, jstring str);

\openjdk7\hotspot\src\share\vm\prims\jvm.cpp

// String support ///////////////////////////////////////////////////////////////////////////

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END

\openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp

oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = java_lang_String::hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop string = the_table()->lookup(index, name, len, hashValue);

  // Found
  if (string != NULL) return string;

  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string_or_null, name, len,
                                hashValue, CHECK_NULL);
}

\openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp

oop StringTable::lookup(int index, jchar* name,
                        int len, unsigned int hash) {
  for (HashtableEntry<oop>* l = bucket(index); l != NULL; l = l->next()) {
    if (l->hash() == hash) {
      if (java_lang_String::equals(l->literal(), name, len)) {
        return l->literal();
      }
    }
  }
  return NULL;
}

The overall structure is this: Java uses JNI to call the intern method of the C++ StringTable. StringTable's intern method is similar to the HashMap implementation in Java, except that it cannot grow automatically. Its default size is 1009.

Note that the string pool is a fixed-size hash table with a default size of 1009. If too many strings are put into the pool, hash collisions become severe and the bucket chains grow very long. The direct consequence of long chains is that String.intern slows down dramatically, because each call has to walk a chain entry by entry.
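To see why a table that never resizes degrades, here is a toy model written in Java. It is only a sketch of the idea: FixedBucketTable is a hypothetical class, it uses String#hashCode rather than HotSpot's hash function, and it is not the StringTable implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedBucketTable {
    private final List<List<String>> buckets = new ArrayList<>();

    public FixedBucketTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Returns the stored equal string, or stores and returns the argument. Never resizes. */
    public String intern(String s) {
        List<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        for (String existing : chain) {   // linear walk of one chain, as in the lookup loop
            if (existing.equals(s)) {
                return existing;
            }
        }
        chain.add(s);
        return s;
    }

    /** Length of the longest chain, a rough measure of the worst-case lookup cost. */
    public int longestChain() {
        int longest = 0;
        for (List<String> chain : buckets) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    public static void main(String[] args) {
        FixedBucketTable table = new FixedBucketTable(1009);
        for (int i = 0; i < 200_000; i++) {
            table.intern("key-" + i);
        }
        // 200,000 distinct keys in 1009 buckets: the average chain holds about 198 entries.
        System.out.println("longest chain: " + table.longestChain());
    }
}
```

With the bucket count fixed, every additional distinct string lengthens some chain, so the cost of each lookup grows in proportion to the number of strings stored.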

In JDK6 the StringTable size is fixed at 1009, so putting too many strings into the constant pool quickly degrades efficiency. In JDK7 the StringTable size can be specified with a parameter:

-XX:StringTableSize=99991

Consider a classic question: how many objects does a statement like String s = new String("ABC") create? This type of question is designed to test the programmer's understanding of the string constant pool. The statement creates two objects: the first is the string "ABC" stored in the constant pool, and the second is a String object in the Java heap.

Here’s a snippet of code:

public static void main(String[] args) {

String s = new String("1");
s.intern();
String s2 = "1";
System.out.println(s == s2);

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4);

}

The printed result is:

false false under JDK6
false true under JDK7

We will explain why shortly. Now move the s3.intern(); statement down one line, after String s4 = "11";, and move s.intern(); after String s2 = "1";. What is the result?

public static void main(String[] args) {

String s = new String("1");
String s2 = "1";
s.intern();
System.out.println(s == s2);

String s3 = new String("1") + new String("1");
String s4 = "11";
s3.intern();
System.out.println(s3 == s4);

}

The printed result is:

false false under JDK6
false false under JDK7

1. Explanation in JDK6

Note: The green lines in the figure represent the contents of the string object. The black lines represent addresses.

As the figure above shows, in JDK6 all of the above prints are false, because the constant pool in JDK6 lives in the Perm area, which is completely separate from the normal Java heap. As mentioned above, strings declared in quotes are created directly in the string constant pool, while strings created with new are placed in the Java heap. Comparing the address of an object in the Java heap with the address of an object in the string constant pool can therefore never yield equality, and calling String.intern has no bearing on that.

2. Explanation in JDK7

Now for JDK7. To be clear, in JDK6 and earlier versions the string constant pool lives in the Perm area of the heap. The Perm area is a static area that mainly stores information about loaded classes, the constant pool, method fragments, and so on. Its default size is only 4 MB, so heavy use of intern can directly produce java.lang.OutOfMemoryError: PermGen space. In the JDK7 release the string constant pool was therefore moved from the Perm area to the normal Java heap. One of the main reasons for the move was that the Perm area is too small; reportedly JDK8 removes the Perm area altogether and introduces a new Metaspace. The JDK developers evidently decided that the Perm area no longer suits the way Java is evolving.

With the string constant pool moved to the Java heap, we can now explain the printed results above.

In the first snippet, look at s3 and s4 first. String s3 = new String("1") + new String("1"); leaves two objects that matter: the "1" in the string constant pool and the object in the Java heap that s3 refers to. Two anonymous new String("1") objects are also created along the way, which we will not discuss. At this point the content of the object s3 refers to is "11", but the constant pool contains no "11" object.

Next, s3.intern(); puts the "11" string of s3 into the string constant pool. Because "11" does not exist in the pool yet, the conventional approach, as in the JDK6 figure, would be to create a new "11" object in the pool. The key point is that in JDK7 the constant pool is no longer in the Perm area, and the behavior was adjusted: the pool does not need to store its own copy of the object and can instead store a reference to an object in the heap. That reference points to the object s3 refers to, so the two reference addresses are identical.

Finally, String s4 = "11"; declares "11" as a literal, so it is looked up in the constant pool. The lookup finds that the entry already exists, namely the reference to the object s3 refers to. s4 therefore points to the same object as s3, and the final comparison s3 == s4 is true.

Now look at s and s2. The first line, String s = new String("1");, creates two objects: the "1" in the constant pool and the String object in the Java heap. s.intern(); then looks in the constant pool and finds that "1" is already there.

Next, String s2 = "1"; makes s2 refer to the "1" object in the constant pool. As a result, s and s2 hold clearly different reference addresses, as the figure shows.

Now look at the second snippet, shown in the second figure above. The only change from the first snippet is that s3.intern(); now comes after String s4 = "11";. So String s4 = "11"; executes first. When s4 is declared, the constant pool contains no "11" object, so after execution the "11" object is the new object created by the declaration of s4. When s3.intern(); then runs, "11" already exists in the constant pool, so s3 and s4 refer to different objects.

For s and s2 in the second snippet, s.intern(); has also moved down a line, but this has no effect: the "1" object was already created in the pool when the first line, String s = new String("1");, executed. The s2 declaration below takes its reference directly from the constant pool, so the reference addresses of s and s2 are not equal.
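The two orderings discussed above can be probed without relying on literals such as "11", which some other class might already have pooled. The sketch below is an illustration under assumptions: InternIdentityProbe and freshString are hypothetical names, and the first printed value reflects the HotSpot JDK 7+ behavior described in this article rather than anything the language specification guarantees.

```java
import java.util.UUID;

public class InternIdentityProbe {
    /** Creates a heap String whose contents are very unlikely to be in the pool already. */
    public static String freshString(String uniquePart) {
        return new StringBuilder("probe-").append(uniquePart).toString();
    }

    public static void main(String[] args) {
        String unique = UUID.randomUUID().toString();

        // Case 1: intern() runs before any equal string is in the pool.
        String first = freshString(unique);
        String pooled = first.intern();
        System.out.println(pooled == first);           // true on HotSpot JDK 7+, false on JDK 6

        // Case 2: an equal string is already pooled when intern() runs.
        String second = freshString(unique);
        System.out.println(second.intern() == second); // false: the pool already has an entry
        System.out.println(second.intern() == pooled); // true: one canonical object per content
    }
}
```

Case 1 corresponds to calling s3.intern() before the literal is declared, and case 2 to calling it afterwards.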

As the example code above shows, both the intern operation and the constant pool were modified in JDK7. There are two main changes:

First, the string constant pool was moved from the Perm area to the Java heap.

Second, when String#intern is called and the object already exists in the heap, the pool saves a reference to that object directly instead of creating a new object.

Let's take a look at a more common example of using the String#intern method.

The code is as follows:

static final int MAX = 1000 * 10000;

static final String[] arr = new String[MAX];

public static void main(String[] args) throws Exception {

Integer[] DB_DATA = new Integer[10];
Random random = new Random(10 * 10000);
for (int i = 0; i < DB_DATA.length; i++) {
    DB_DATA[i] = random.nextInt();
}
long t = System.currentTimeMillis();
for (int i = 0; i < MAX; i++) {
    //arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length]));
     arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length])).intern();
}

System.out.println((System.currentTimeMillis() - t) + "ms");
System.gc();

}

The run parameters are: -Xmx2g -Xms2g -Xmn1500m

The results are shown below:

With intern: 2160ms

Without intern: 826ms

From the above results, the code without intern created 10 million strings, occupying about 640 MB. The code with intern created 1345 strings, occupying about 133 KB in total. In fact, since the program only uses 10 distinct strings, a precise calculation should show a difference of exactly a factor of 1 million. Although the example is somewhat extreme, it accurately reflects the significant space savings that intern can bring.

Careful readers will notice that the run time increases when intern is used. That is because each iteration creates a new String and then calls intern on it, which costs time. With plenty of memory the version without intern is indeed faster, but memory is never unlimited in practice, and without intern the time the JVM spends on garbage collection would far exceed this overhead. After all, 10 million intern calls added only a little over one second.
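The space saving can be reproduced at a smaller scale by counting distinct objects instead of measuring heap usage. This is a sketch, not the benchmark above: InternDedupDemo and distinctObjects are hypothetical names, and the counts are object identities, not bytes.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternDedupDemo {
    /** Counts distinct String objects (by identity, not equals) kept for 'total' values. */
    public static int distinctObjects(int total, int distinctValues, boolean useIntern) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < total; i++) {
            String s = new String(String.valueOf(i % distinctValues));
            identities.add(useIntern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(100_000, 10, false)); // 100000
        System.out.println(distinctObjects(100_000, 10, true));  // 10
    }
}
```

Without intern every element keeps its own String object alive; with intern all elements share ten pooled objects and the temporary copies become garbage immediately.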

Having looked at how intern is used and how it works, let's look at a problem caused by improper use of intern.

While using fastjson to read data from an interface, we found that after reading nearly 700,000 records our log printing became very slow. Each log statement took about 30ms, and with two or three log statements in a request, the request took more than twice as long. The problem disappeared after restarting the JVM and reappeared as we kept reading from the interface. Let's walk through how the problem arose.

1. Find the cause of the problem according to the log4j printed log

Printing logs with log4j#info took a very long time, so we used the housemd tool to trace the time spent in the call stack of the info method.

trace SLF4JLogger.
trace AbstractLoggerWrapper:
trace AsyncLogger

org/apache/logging/log4j/core/async/AsyncLogger.actualAsyncLog(RingBufferLogEvent)    sun.misc.Launcher$AppClassLoader@109aca82    1    1ms    org.apache.logging.log4j.core.async.AsyncLogger@19de86bb
org/apache/logging/log4j/core/async/AsyncLogger.location(String)    sun.misc.Launcher$AppClassLoader@109aca82    1    30ms    org.apache.logging.log4j.core.async.AsyncLogger@19de86bb
org/apache/logging/log4j/core/async/AsyncLogger.log(Marker, String, Level, Message, Throwable)    sun.misc.Launcher$AppClassLoader@109aca82    1    61ms    org.apache.logging.log4j.core.async.AsyncLogger@19de86bb

The time is spent in the AsyncLogger.location method, whose body mainly consists of the call return Log4jLogEvent.calcLocation(fqcnOfLogger);.

The code of Log4jLogEvent.calcLocation() is as follows:

public static StackTraceElement calcLocation(final String fqcnOfLogger) {

if (fqcnOfLogger == null) {  
    return null;  
}  
final StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();  
boolean next = false;  
for (final StackTraceElement element : stackTrace) {  
    final String className = element.getClassName();  
    if (next) {  
        if (fqcnOfLogger.equals(className)) {  
            continue;  
        }  
        return element;  
    }  
    if (fqcnOfLogger.equals(className)) {  
        next = true;  
    } else if (NOT_AVAIL.equals(className)) {  
        break;  
    }  
}  
return null;  

}

Tracing further shows that the problem is in Thread.currentThread().getStackTrace().
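To make the role of getStackTrace concrete, here is a self-contained sketch of the same search. CallSiteDemo and its nested Logger class are hypothetical stand-ins for the application and for log4j; the loop mirrors calcLocation but leaves out the NOT_AVAIL branch, so it is an approximation rather than log4j's code.

```java
public class CallSiteDemo {
    /** Stand-in for a logger class; its frames are the ones to skip. */
    static class Logger {
        static StackTraceElement where() {
            return calcLocation(Logger.class.getName());
        }
    }

    /** Finds the first frame after the logger's own frames, as calcLocation does. */
    public static StackTraceElement calcLocation(String fqcnOfLogger) {
        boolean next = false;
        // Every call captures the whole stack of the current thread.
        for (StackTraceElement element : Thread.currentThread().getStackTrace()) {
            String className = element.getClassName();
            if (next && !fqcnOfLogger.equals(className)) {
                return element;           // the code that called into the logger
            }
            if (fqcnOfLogger.equals(className)) {
                next = true;
            }
        }
        return null;
    }

    public static StackTraceElement callerMethod() {
        return Logger.where();
    }

    public static void main(String[] args) {
        // Prints the frame of callerMethod, the method that called into Logger.
        System.out.println(callerMethod());
    }
}
```

Each log statement that needs location information pays for a full stack capture, which is why a slowdown inside getStackTrace shows up directly in logging time.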

2. Trace the native code of Thread.currentThread().getStackTrace() to verify String#intern

The native method behind Thread.currentThread().getStackTrace():

public StackTraceElement[] getStackTrace() {
    if (this != Thread.currentThread()) {
        // check for getStackTrace permission
        SecurityManager security = System.getSecurityManager();
        if (security != null) {
            security.checkPermission(
                SecurityConstants.GET_STACK_TRACE_PERMISSION);
        }
        // optimization so we do not call into the vm for threads that
        // have not yet started or have terminated
        if (!isAlive()) {
            return EMPTY_STACK_TRACE;
        }
        StackTraceElement[][] stackTraceArray = dumpThreads(new Thread[] {this});
        StackTraceElement[] stackTrace = stackTraceArray[0];
        // a thread that was alive during the previous isAlive call may have
        // since terminated, therefore not having a stacktrace.
        if (stackTrace == null) {
            stackTrace = EMPTY_STACK_TRACE;
        }
        return stackTrace;
    } else {
        // Don't need JVM help for current thread
        return (new Exception()).getStackTrace();
    }
}

private native static StackTraceElement[][] dumpThreads(Thread[] threads);

Download the OpenJDK 7 source to look up the JDK's native implementation. The relevant files are listed below (for space reasons the code is not shown in detail here; if you are interested, you can find it by file name and line number):

\openjdk7\jdk\src\share\native\java\lang\Thread.c
\openjdk7\hotspot\src\share\vm\prims\jvm.h line: 294
\openjdk7\hotspot\src\share\vm\prims\jvm.cpp line: 4382-4414
\openjdk7\hotspot\src\share\vm\services\threadService.cpp line: 235-267
\openjdk7\hotspot\src\share\vm\services\threadService.cpp line: 566-577
\openjdk7\hotspot\src\share\vm\classfile\javaClasses.cpp line: 1635-[1651, 1654, 1658]

Tracing the underlying JVM source showed that the following three lines were what slowed the whole program down.

oop classname = StringTable::intern((char*) str, CHECK_0);
oop methodname = StringTable::intern(method->name(), CHECK_0);
oop filename = StringTable::intern(source, CHECK_0);

These three lines obtain the class name, method name, and file name. Because class names, method names, and file names are all stored in the string constant pool, each one is fetched through the String#intern path. What was overlooked is that the string pool's default size is 1009 and cannot change. Once the number of strings in the constant pool reaches a certain size, performance drops sharply.
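Whether a given JVM interns the names it puts into a stack trace can be probed from Java by comparing a frame's method name with an equal literal by identity. This is only a probe under assumptions: StackTraceInternProbe is a hypothetical class, and the printed value depends on the JVM build, so no expected output is given.

```java
public class StackTraceInternProbe {
    /** Returns the stack frame of this method itself. */
    public static StackTraceElement currentFrame() {
        // Frame 0 of a new Throwable's trace is the method that created it.
        return new Throwable().getStackTrace()[0];
    }

    public static void main(String[] args) {
        StackTraceElement frame = currentFrame();
        String methodName = frame.getMethodName();

        // The literal "currentFrame" is the pooled instance. If the JVM obtained the
        // frame's method name through the string table, both references are the same
        // object; otherwise the frame carries a separate heap copy.
        System.out.println("method name is the pooled instance: " + (methodName == "currentFrame"));
    }
}
```

On a JVM that behaves like the OpenJDK 7 code quoted above, the comparison would be expected to print true, which means every stack capture performs string table lookups.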

3. Improper use of String#intern in fastjson

String#intern had become slow because fastjson used it improperly. Tracing the fastjson implementation turned up the following code:

com.alibaba.fastjson.parser.JSONScanner#scanFieldSymbol()

if (ch == '\"') {

bp = index;
this.ch = ch = buf[bp];
strVal = symbolTable.addSymbol(buf, start, index - start - 1, hash);
break;

}

com.alibaba.fastjson.parser.SymbolTable#addSymbol():

/**
 * Constructs a new entry from the specified symbol information and next entry reference.
 */

public Entry(char[] ch, int offset, int length, int hash, Entry next){

characters = new char[length];
System.arraycopy(ch, offset, characters, 0, length);
symbol = new String(characters).intern();
this.next = next;
this.hashCode = hash;
this.bytes = null;

}

fastjson calls intern on every JSON key, caching the keys in the string constant pool so that each read is very fast, which greatly reduces time and space costs, on the assumption that JSON keys are always the same. What it failed to consider is that when JSON keys vary and there are many of them, they place a heavy burden on the string constant pool.

fastjson fixed this bug in version 1.1.24 by adding a maximum cache size; once that size is exceeded, strings are no longer added to the string constant pool.

[Code from com.alibaba.fastjson.parser.SymbolTable#addSymbol(), line 113, in version 1.1.24]

public static final int MAX_SIZE = 1024;

if (size >= MAX_SIZE) {

return new String(buffer, offset, len);

}

This problem was triggered by 700,000 records; with millions of records, the delay could be far more than 30ms. So be careful when using the system-level String#intern method!
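The idea behind the fix, a cache that stops growing at a fixed cap, can be sketched as follows. BoundedSymbolCache is a hypothetical class written for this illustration: it keeps its own HashMap instead of calling intern, it is not thread-safe, and it is not fastjson's SymbolTable.

```java
import java.util.HashMap;
import java.util.Map;

public class BoundedSymbolCache {
    private final int maxSize;
    private final Map<String, String> symbols = new HashMap<>();

    public BoundedSymbolCache(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns a canonical instance for cached keys; past the cap, returns a plain new String. */
    public String addSymbol(char[] buffer, int offset, int len) {
        String candidate = new String(buffer, offset, len);
        String cached = symbols.get(candidate);
        if (cached != null) {
            return cached;
        }
        if (symbols.size() >= maxSize) {
            return candidate;             // cap reached: do not grow the cache any further
        }
        symbols.put(candidate, candidate);
        return candidate;
    }

    public int size() {
        return symbols.size();
    }

    public static void main(String[] args) {
        BoundedSymbolCache cache = new BoundedSymbolCache(1024);
        for (int i = 0; i < 700_000; i++) {
            char[] key = ("field" + i).toCharArray();
            cache.addSymbol(key, 0, key.length);
        }
        System.out.println(cache.size()); // 1024
    }
}
```

Frequently repeated keys still share one object, while an unbounded stream of distinct keys costs only short-lived garbage instead of permanent pool entries.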

In this article we covered common uses of String#intern and the string constant pool, how String#intern differs between JDK versions, and the harm caused by using String#intern improperly. We hope this helps you avoid some bugs when you use it and makes your systems more robust.