String is one of the most commonly used classes in Java, but it is more special than other classes because it is closely tied to the string constant pool (String Pool). The string constant pool is the area where interned strings reside. In HotSpot it was originally part of the method area (the permanent generation); since JDK 7 it lives in the heap. It is an optimization for string storage that is shared across the whole virtual machine.
How is a String created
There are two ways to create a String object:
- Literal assignment
String s = "Hello World";
- Creation with the new keyword
String s = new String("Hello World");
On the surface the two forms look the same and behave identically in later use, but the objects are stored differently. Let's look at literal assignment first; the relevant parts of the bytecode are shown below.
Constant pool:
#1 = Methodref #4.#23 // java/lang/Object."<init>":()V
#2 = String #24 // hello
#24 = Utf8 hello
......
Code:
stack=1, locals=2, args_size=1
0: ldc #2 // String hello
2: astore_1
......
LocalVariableTable:
Start Length Slot Name Signature
0 6 0 args [Ljava/lang/String;
3 3 1 s Ljava/lang/String;
After compilation, the literal's value is stored in the constant pool of the class file. Only two instructions are needed for the assignment: ldc loads the string constant from the constant pool, and astore_1 stores the reference into s in the local variable table. When the class is loaded, the contents of the class file's constant pool go into the runtime constant pool, but the String object itself is created in the heap and the string constant pool holds only a reference to it. At run time, when the literal is first used, the JVM looks in the string constant pool for a reference to a string with the same content. If one is found it is reused; otherwise a new String object is created in the heap and a reference to it is added to the string constant pool.
Inspecting the heap and searching for the string shows that only one instance exists, and that instance has a single reference, which we can infer is the one held by the string constant pool.
Now look at creation with the new keyword. Judging from the bytecode alone, the only additions are the instructions that create the object and call its constructor; the literal is still emitted into the constant pool. Inspecting the heap this time shows two objects. The first has a reference from the string constant pool. The second has no reference from within the heap, which tells us it is the object created by new: the only reference to it is the variable s, which lives on the stack.
Constant pool:
#1 = Methodref #6.#24 // java/lang/Object."<init>":()V
#2 = Class #25 // java/lang/String
#3 = String #26 // hello
#26 = Utf8 hello
......
Code:
stack=3, locals=2, args_size=1
0: new #2 // class java/lang/String
3: dup
4: ldc #3 // String hello
6: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
9: astore_1
......
LocalVariableTable:
Start Length Slot Name Signature
0 13 0 args [Ljava/lang/String;
10 3 1 s Ljava/lang/String;
Either way, the JVM first checks whether the string constant pool already holds a reference to a string with the same content. If it does, that reference is used; if not, a String object is created on the heap and a reference to it is placed in the pool. Literal assignment simply takes that reference. Creation with new goes through the same steps for its literal argument, but then creates an additional object at run time. This leads to a common interview question: how many objects does a String statement create? The cases below illustrate it in detail.
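The difference between the two creation paths can be observed by comparing references. The following is a minimal sketch; the class name CreationDemo and its method names are made up for illustration and do not come from the original article.

```java
public class CreationDemo {

    // Two identical literals resolve to the same pooled object.
    static boolean literalsShareObject() {
        String a = "Hello World";
        String b = "Hello World";
        return a == b;
    }

    // new String(...) allocates a separate heap object,
    // even though its content equals the pooled literal.
    static boolean newCreatesDistinctObject() {
        String literal = "Hello World";
        String created = new String("Hello World");
        return literal != created && literal.equals(created);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());      // true
        System.out.println(newCreatesDistinctObject()); // true
    }
}
```

The == operator compares references, which is exactly what is needed here; in ordinary application code, string content should be compared with equals().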
Combining the two creation methods
The biggest difference between literal creation and creation with new is how objects are produced in the heap. So what happens when the two are mixed? The following example covers the combinations:
public static void main(String[] args) {
    String s1 = "WordString";
    String s2 = "WordString";
    String s3 = "Word" + "String";              // folded into one literal at compile time
    String s4 = "Word" + new String("String");  // concatenated at run time
    String s5 = new String("WordString");
    String s6 = "Word";
    String s7 = "String";
    String s8 = s6 + s7;                        // concatenated at run time
    System.out.println(s1 == s2); // true
    System.out.println(s1 == s3); // true
    System.out.println(s1 == s4); // false
    System.out.println(s1 == s5); // false
    System.out.println(s1 == s8); // false
}
When s1 is created, a String object is created in the heap and a reference to it is placed in the string constant pool. When s2 is created, the reference is taken directly from the pool, so s1 and s2 are the same object. s1 and s3 are also equal: the bytecode shows that the compiler emits a single merged literal, because concatenation of literals is folded at compile time. s4 is not equal to s1, and s5 obviously is not either. s8 looks similar to s3, but it does not get the same optimization because s6 and s7 are ordinary variables. Like s4, it is built at run time (through StringBuilder in JDK 8 and earlier), so it is a new heap object and is not equal to s1.
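Whether a concatenation is folded depends on whether its operands are compile-time constants. The sketch below contrasts final local variables, which count as constants, with ordinary ones; the class name ConcatDemo and the method names are hypothetical and used only for illustration.

```java
public class ConcatDemo {

    // Both operands are final variables initialized with literals, so
    // a + b is a constant expression and is folded into "WordString".
    static boolean finalOperandsAreFolded() {
        final String a = "Word";
        final String b = "String";
        String joined = a + b;
        return joined == "WordString";
    }

    // Without final, the concatenation happens at run time and
    // produces a new heap object that is not the pooled literal.
    static boolean plainOperandsAreFolded() {
        String a = "Word";
        String b = "String";
        String joined = a + b;
        return joined == "WordString";
    }

    public static void main(String[] args) {
        System.out.println(finalOperandsAreFolded()); // true
        System.out.println(plainOperandsAreFolded()); // false
    }
}
```

Declaring the operands final is the only difference between the two methods, which makes it a convenient way to see where the compiler draws the line.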
The intern() method of String
The intern() method is specific to String and is rarely used directly. It is a native method closely tied to the string constant pool. Its documentation describes it roughly as follows: if the string constant pool already contains a string with the same content as the current one, the reference from the pool is returned; otherwise the current string is added to the pool and a reference to it is returned.
The documentation is clear enough, so let's look at some trickier cases and see what to watch out for:
String s1 = new String("String");
s1.intern(); // no effect: the literal above is already in the pool
String s2 = "String";
System.out.println(s1 == s2); // false
=====================================================
String s3 = new String("Word") + new String("String");
s3.intern(); // "WordString" is not in the pool yet
String s4 = "WordString";
System.out.println(s3 == s4); // true (JDK 7 and later)
The results of these two snippets may be surprising. The first is easier to understand: s1 is a newly created object, and s1.intern() tries to put the string into the string constant pool but in fact does nothing, because the literal on the first line has already placed a reference to a string with that content in the pool. s2 then receives the pooled reference, which is not s1. The result of the second snippet is harder to understand. Doesn't s3 also create a new object? From the previous section we know that s3 is built roughly like this:
// Pseudo-code derived from the bytecode; it differs from the code actually executed
StringBuilder builder = new StringBuilder();
builder.append(new String("Word"));
builder.append(new String("String"));
String s3 = builder.toString();
// StringBuilder source code
public String toString() {
// Create a copy, don't share the array
return new String(value, 0, count);
}
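toString() also creates a new String, but it does not put anything into the string constant pool. s3.intern() then places a reference to the s3 object itself in the pool (this is the behavior since JDK 7, when the pool moved to the heap), and s4 simply fetches that reference from the pool, so s3 and s4 are the same object.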
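The part of intern() that holds on every JDK version is that equal strings always map to one canonical reference. A minimal sketch of that guarantee follows; InternDemo and its helper build are hypothetical names introduced only for this example.

```java
public class InternDemo {

    // Builds the string at run time so that it starts out as a plain heap object.
    static String build(String left, String right) {
        return new StringBuilder().append(left).append(right).toString();
    }

    // A run-time string is not the pooled literal, but its intern() result is.
    static boolean internReturnsPooledLiteral() {
        String runtime = build("Word", "String");
        return runtime != "WordString" && runtime.intern() == "WordString";
    }

    // Two distinct objects with equal content intern to the same reference.
    static boolean internIsCanonical() {
        String x = build("con", "tent");
        String y = build("con", "tent");
        return x != y && x.intern() == y.intern();
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledLiteral()); // true
        System.out.println(internIsCanonical());          // true
    }
}
```

Whether intern() returns the receiver itself, as in the s3 case, depends on the JDK version and on whether an equal string was pooled earlier, so code should rely only on the return value and not on the identity of the original object.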
Three constant pools in the JVM
This article has focused on the string constant pool (String Constant Pool). There is also the constant pool (Constant Pool) stored in the bytecode, and the runtime constant pool (Runtime Constant Pool), which belongs to the JVM's runtime data areas. These three are the most important constant pools in the JVM.
1. String constant pool
Strings are among the most heavily used objects in the JVM, so they are cached to avoid creating and destroying them over and over. Moreover, String is immutable, so a globally shared string constant pool is a safe choice. The logical location of the pool has changed across versions: it moved from the method area to the heap in JDK 7.
2. Bytecode constant pool
When compiled, every Java class produces a .class bytecode file, which is simply a binary file that the JVM understands. An important part of this file is the constant pool, which records the literals and symbolic references used in the class. Literals include text strings and constant values declared final; symbolic references include fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors. Each entry in the constant pool is a table, and the structure, size, and meaning of each kind of table are strictly defined by the Java Virtual Machine Specification.
3. Runtime constant pool
The runtime constant pool is part of the method area. After a class is loaded, the contents of its bytecode constant pool are stored in the runtime constant pool, so each class gets its own small pool. The original symbolic references are also resolved into direct references during the resolution phase.
Compared with the bytecode constant pool, the runtime constant pool is also dynamic: new constants can be added at run time, the usual example being the intern() method described above.
How many objects are created?
String s1 = new String("String");
Considering only objects in the heap, this line creates 1 or 2 objects. In the two-object case, the String object for the literal "String" from the class file is created first and a reference to it is placed in the string constant pool; then the new keyword creates a second object at run time. Only one object is created when the string already exists in the string constant pool, either because earlier code used it or because the JVM loaded it automatically at startup, as happens with strings such as "java".
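Whether the pool already held a given string can be probed with intern(), which helps when reasoning about the one-object and two-object cases. The sketch below assumes the JDK 7+ behavior described earlier, where interning a not-yet-pooled string stores a reference to that same object; PoolCheck and alreadyPooled are hypothetical names.

```java
public class PoolCheck {

    // Returns true if a string with the same content was in the pool before
    // this call. The candidate must be a fresh heap object that has not been
    // interned itself. Side effect: the content is pooled afterwards.
    static boolean alreadyPooled(String candidate) {
        return candidate.intern() != candidate;
    }

    public static void main(String[] args) {
        // The literal argument is pooled before new runs, so this prints true.
        System.out.println(alreadyPooled(new String("String")));

        // A string that is very unlikely to exist anywhere else;
        // on JDK 7+ this is expected to report false.
        String fresh = new StringBuilder("pool-check-").append(System.nanoTime()).toString();
        System.out.println(alreadyPooled(fresh));

        // The result for "java" depends on what the JVM interned during startup.
        String java = new StringBuilder("ja").append("va").toString();
        System.out.println(alreadyPooled(java));
    }
}
```

Because the check itself interns the string, it can only be run once per content value, which is why each probe above builds its own fresh object.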
So that’s an in-depth look at the String class, and the String constant pool.
References
- Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices, Zhipeng Zhou
- The Java Virtual Machine Specification (Java SE 8 Edition), Tim Lindholm
- Understand Java string constant pooling and intern() methods
- Learn more about Java string constant pools