1. Strings

1.1. String Creation (JDK8)

1.1.1. char[] array creation

String s = new String(new char[] {'a', 'b', 'c'});

1.1.2. byte[] array creation

String s = new String(new byte[] {97, 98, 99});

When the string is constructed, the byte[] is decoded into a char[]; how many bytes map to one char depends on the character set.

With GBK, the two bytes 0xD5 and 0xC5 are decoded into the single char 0x5F20.

String s = new String(new byte[] {(byte) 0xD5, (byte) 0xC5}, Charset.forName("gbk"));

With UTF-8, the three bytes 0xE5, 0xBC, and 0xA0 are decoded into the same char 0x5F20.

String s = new String(new byte[] {(byte) 0xE5, (byte) 0xBC, (byte) 0xA0}, Charset.forName("utf-8"));
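
The effect of the character set on the byte-to-char mapping can be checked with a small sketch. It uses only the standard charsets that every JDK ships (UTF-8 and UTF-16BE) rather than GBK; the class name CharsetDecodeSketch and the helper decodeUtf8 are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class CharsetDecodeSketch {
    // Decodes the given bytes as UTF-8 and returns the resulting String.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE5, (byte) 0xBC, (byte) 0xA0};
        String s = decodeUtf8(utf8);
        System.out.println(s.length());                                   // 1
        System.out.println(Integer.toHexString(s.charAt(0)));             // 5f20
        // The same single char encodes to a different number of bytes per charset.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 3
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 2
    }
}
```

Three input bytes become one char, and that char encodes back to three bytes in UTF-8 but two in UTF-16BE.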

1.1.3. int[] Array creation

Some characters need two chars to represent them. For example, 😂 is code point 0x1F602 in Unicode, which is beyond 0xFFFF, the largest value a char can hold, so an int[] of code points is used to construct such a string.

String s = new String(new int[] {0x1F602}, 0, 1);

The int 0x1F602 is converted into two chars, 0xD83D and 0xDE02 (a surrogate pair that together encodes the emoji 😂).
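
The split into two chars can be verified directly. The following is a minimal sketch; the class name SurrogatePairSketch and the helper fromCodePoint are illustrative names, while all the String methods it calls are standard.

```java
class SurrogatePairSketch {
    // Builds a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(new int[] {codePoint}, 0, 1);
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F602);
        System.out.println(s.length());                            // 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));       // 1 (code point)
        System.out.println(Integer.toHexString(s.charAt(0)));      // d83d
        System.out.println(Integer.toHexString(s.charAt(1)));      // de02
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f602
    }
}
```

Note that length() counts chars, not characters, which is why it reports 2 for a single emoji.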

1.1.4. Create from an existing string

Pass in a source string and create a new string.

public String(String original) {
    this.value = original.value;
    this.coder = original.coder;	// JDK 9+ only; this field does not exist in JDK 8
    this.hash = original.hash;
}

Usage

String s1 = new String(new char[] {'a', 'b', 'c'});
String s2 = new String(s1);

As the constructor shows, the value fields of both strings refer to the same char[] array.

1.1.5. Literal creation

The most common creation method.

int i = 10;			// 10 is a literal
String s = "abc";	// "abc" is a string literal

[Not an object yet]

A literal does not become a String object until the statement that contains it is executed.

After the Java code above is compiled into a class file, "abc" is stored in the class file's constant pool.

[Lazy loading]

The String object is created only when the code containing the literal is executed.

// Suppose there are now 100 String objects
System.out.println("a");	// There are 101 String objects after execution
System.out.println("b");	// There are 102 String objects after execution
System.out.println("c");	// There are 103 String objects after execution

[No duplicates]

For the same literal within one class, the class file constant pool holds only one entry at compile time, and only one string object is created at run time.

String s1 = "abc";
String s2 = "abc";
System.out.println(s1 == s2);	// true

For the same literal in different classes, each class file constant pool holds its own entry at compile time, but still only one string object is created at run time.

public class StringTest {
    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = "abc";
        StringTest2.main(new String[] {s1, s2});
    }
}

public class StringTest2 {
    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s == args[0]);	// true
        System.out.println(s == args[1]);	// true
    }
}

1.1.6. Creation by concatenation

Concatenate two strings into a new string using the plus operator +.

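// Each case below is an independent snippet
// ① Literal + literal
String s1 = "a" + "b";
// ② Literal + final variable
final String x = "b";
String s2 = "a" + x;
// ③ Literal + variable
String x = "b";
String s3 = "a" + x;
// ④ Literal + non-String variable (a literal operand such as "a" + 1 would be folded at compile time like ①)
int i = 1;
String s4 = "a" + i;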

In terms of mechanism, ① and ② work the same way, as do ③ and ④.

① : No real concatenation takes place. At compile time, "a" and "b" are already joined into "ab".

Constant pool:
    #1 = Methodref	#4.#20		// java/lang/Object."<init>":()V
    #2 = String		#21		// ab

② : No real concatenation takes place either. final means that the value of x cannot change, so every reference to x is safely replaced with "b" at compile time.

Constant pool:
    #1 = Methodref	#5.#22		// java/lang/Object."<init>":()V
    #2 = String		#23		// b
    #3 = String		#24 		// ab

③ : A real concatenation takes place. At compile time, the plus operator + is replaced with StringBuilder calls that perform the concatenation.

Constant pool: 
    #1 = Methodref	#9.#26		// java/lang/Object."<init>":()V
    #2 = String		#27		// b
    #3 = Class		#28		// java/lang/StringBuilder
    #4 = Methodref	#3.#26		// java/lang/StringBuilder."<init>":()V
    #5 = String		#29		// a

Because x is not a compile-time constant, the concatenation cannot be folded into a single literal; the source is compiled as if it had been written with StringBuilder.

String x = "b";
String s3 = new StringBuilder().append("a").append(x).toString();

In effect, the toString() method creates a new String object from the value array that the StringBuilder maintains.

public final class StringBuilder extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuilder>, CharSequence {
    // Inherited from AbstractStringBuilder
    char[] value;	// JDK 9 uses byte[] value;

    public String toString() {
        return new String(value, 0, count);
    }
}

④ : The principle is exactly the same as ③.
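
The difference between compile-time folding (① and ②) and run-time concatenation (③) can be observed with ==, which compares object identity. This is a minimal sketch; the class and method names are made up for illustration.

```java
class ConcatFoldingSketch {
    // ① literal + literal: javac folds this into the single constant "ab"
    static boolean literalPlusLiteral() {
        String s = "a" + "b";
        return s == "ab";
    }

    // ② literal + final variable: x is a constant variable, so this is folded too
    static boolean literalPlusFinal() {
        final String x = "b";
        String s = "a" + x;
        return s == "ab";
    }

    // ③ literal + non-final variable: concatenated at run time into a new object
    static boolean literalPlusVariable() {
        String x = "b";
        String s = "a" + x;
        return s == "ab";
    }

    public static void main(String[] args) {
        System.out.println(literalPlusLiteral());   // true
        System.out.println(literalPlusFinal());     // true
        System.out.println(literalPlusVariable());  // false
    }
}
```

In the first two methods the result is the very same object as the literal "ab"; in the third it is a distinct object with equal content.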

1.2. Changes in JDK 9

1.2.1. Memory structure changes

To save memory, characters are stored in a byte[] instead of a char[].

① If the string contains only Latin-1 characters, each character occupies one byte in the byte[], whereas it occupied two bytes in a char[].

String s = new String(new byte[] {97, 98, 99});

② If the string contains only Chinese characters, no memory is saved.

String s = new String(new byte[] {(byte) 0xD5, (byte) 0xC5}, Charset.forName("gbk"));

③ If the string contains both Latin and Chinese characters, the Latin characters are also stored as two-byte UTF-16 units, so no memory is saved.

String s = new String(new byte[] {(byte) 0xD5, (byte) 0xC5, 97}, Charset.forName("gbk"));
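
The three cases above follow one rule: a single character outside Latin-1 forces the whole string into two bytes per char. The sketch below only models that rule; estimatedValueBytes is a hypothetical helper, not a JDK API, and it assumes the JDK 9+ default of compact strings being enabled.

```java
class CompactStringSketch {
    // Estimates the length in bytes of the internal byte[] of a JDK 9+ String.
    // Assumption: compact strings are enabled (the default).
    static int estimatedValueBytes(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s.length() * 2;   // UTF-16: two bytes for every char
            }
        }
        return s.length();               // Latin-1: one byte for every char
    }

    public static void main(String[] args) {
        System.out.println(estimatedValueBytes("abc"));      // 3
        System.out.println(estimatedValueBytes("\u5F20"));   // 2
        System.out.println(estimatedValueBytes("\u5F20a"));  // 4
    }
}
```

In the last call the Latin letter a costs two bytes because it shares a string with a Chinese character.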

1.2.2. Changes in the concatenation mechanism

JDK 8 replaces the + operator with StringBuilder calls at compile time. By default, JDK 9 instead emits the invokedynamic bytecode instruction, which resolves the concatenation implementation at run time through a bootstrap method.

// Requires: import java.lang.invoke.*;
public static void main(String[] args) throws Throwable {
    String x = "b";
    // String s = "a" + x;
    // The compiler generates bytecode equivalent to the following

    // The lookup is supplied automatically and is used to find the MethodHandle
    MethodHandles.Lookup lookup = MethodHandles.lookup();
    CallSite callSite = StringConcatFactory.makeConcatWithConstants(
        lookup,
        // The method name does not matter; the compiler generates one automatically
        "arbitrary",
        // The first String.class is the return type, followed by the parameter types
        MethodType.methodType(String.class, String.class),
        // The concatenation recipe, where \1 is the placeholder for a variable, later replaced by x
        "a\1"
    );
    // callSite.getTarget() returns a MethodHandle, which is invoked to perform the concatenation
    String s = (String) callSite.getTarget().invoke(x);
}

Why go to all this trouble? Mainly so that string concatenation can be extended and optimized in different ways without recompiling the code. The most important piece is the MethodHandle, which is generated according to a strategy. All the strategies the JDK provides can be found in StringConcatFactory.Strategy:

Policy name | Internal implementation | Explanation
----------- | ----------------------- | -----------
BC_SB | Bytecode generation that emits StringBuilder code | Equivalent to new StringBuilder()
BC_SB_SIZED | Bytecode generation that emits StringBuilder code | Equivalent to new StringBuilder(n), where n is the estimated size
BC_SB_SIZED_EXACT | Bytecode generation that emits StringBuilder code | Equivalent to new StringBuilder(n), where n is the exact size
MH_SB_SIZED | MethodHandle that produces StringBuilder code | Equivalent to new StringBuilder(n), where n is the estimated size
MH_SB_SIZED_EXACT | MethodHandle that produces StringBuilder code | Equivalent to new StringBuilder(n), where n is the exact size
MH_INLINE_SIZED_EXACT | MethodHandle that constructs strings directly using byte arrays | The default policy

To change the policy, add JVM parameters at run time. For example, to switch the policy to BC_SB:

-Djava.lang.invoke.stringConcat=BC_SB -Djava.lang.invoke.stringConcat.debug=true -Djava.lang.invoke.stringConcat.dumpClasses=<path to which the generated anonymous classes are dumped>

1.2.3. Default concatenation policy

The default policy is MH_INLINE_SIZED_EXACT, which uses byte arrays to construct strings directly.

For example, given the following string concatenation code:

String x = "b";
String s = "a" + x + "c" + "d";

With the MH_INLINE_SIZED_EXACT policy, calls equivalent to the following are made internally:

String x = "b";
// Preallocate exactly the number of bytes the result needs
byte[] buf = new byte[4];
// Create the new string; its internal byte array is [0, 0, 0, 0]
String s = StringConcatHelper.newString(buf, 0);
// Concatenate "a"; the internal byte array is now [97, 0, 0, 0]
StringConcatHelper.prepend(1, buf, "a");
// Concatenate x; the internal byte array is now [97, 98, 0, 0]
StringConcatHelper.prepend(2, buf, x);
// Concatenate "cd"; the internal byte array is now [97, 98, 99, 100]
StringConcatHelper.prepend(4, buf, "cd");
// Concatenation is complete at this point
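
StringConcatHelper is internal to java.lang and cannot be called from application code, so the calls above are not directly runnable. The same idea, sizing the buffer exactly and then writing each piece at its final position, can be imitated in user code. In this sketch prepend and concat are hypothetical helpers that mirror the calls above, not the JDK's implementation, and they assume every character is Latin-1.

```java
import java.nio.charset.StandardCharsets;

class InlineSizedExactSketch {
    // Hypothetical helper: copies the Latin-1 bytes of value into buf so that they end at index `end`.
    static void prepend(int end, byte[] buf, String value) {
        byte[] src = value.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(src, 0, buf, end - src.length, src.length);
    }

    // Imitates "a" + x + "c" + "d" with one exactly sized buffer (Latin-1 input assumed).
    static String concat(String x) {
        byte[] buf = new byte[1 + x.length() + 2];   // exact size, no resizing later
        prepend(1, buf, "a");
        prepend(1 + x.length(), buf, x);
        prepend(buf.length, buf, "cd");              // "c" + "d" was folded by the compiler
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(concat("b"));    // abcd
        System.out.println(concat("xyz"));  // axyzcd
    }
}
```

Unlike a StringBuilder that starts at its default capacity, nothing here is ever resized, which is the point of an exact-size strategy.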


2. StringTable

2.1. Domestic and wild

Of the six ways to create a string, only the literal form produces a domestic string; all the others produce wild ones.

  • Domestic: Strings created from literals are added to StringTable. Strings managed by StringTable are never duplicated.
  • Wild: The char[], byte[], int[], String, and + approaches all essentially use new to create a new string object on the heap without checking for duplicates, which wastes a lot of memory.

The JDK uses a StringTable (a hash table in terms of data structure), in which string objects act as the keys. Uniqueness of keys is a basic property of hash tables.

[Example code]

String s1 = "abc";		// domestic
String s2 = "abc";		// domestic
String s3 = new String(new char[] {'a', 'b', 'c'});	// wild
String s4 = "a" + "bc"; 	// domestic
String x = "a";
String s5 = x + "bc";		// wild
System.out.println(s1 == s2);	// true
System.out.println(s1 == s3);	// false
System.out.println(s1 == s4);	// true
System.out.println(s1 == s5);	// false

2.2. Storage Location of StringTable

  • JDK 1.6 and earlier: StringTable is in the method area (permanent generation).

  • JDK 1.7 and later: StringTable is in heap memory.

2.3. The intern() method

String provides the intern() method for de-duplication, giving string objects a chance to be managed by StringTable.

// Try to put the caller into StringTable
public native String intern();

2.3.1. The string already exists in StringTable

intern() returns the string object that already exists in StringTable.

String x = new String(new char[] {'a', 'b', 'c'});	// wild
String y = "abc";		// adds "abc" to StringTable
String z = x.intern();		// StringTable already has "abc", so that object is returned
System.out.println(z == x);	// false
System.out.println(z == y);	// true

2.3.2. The string does not exist in StringTable (≥ JDK 7)

intern() adds the caller itself to StringTable and returns it.

String x = new String(new char[] {'a', 'b', 'c'});	// wild
String z = x.intern();		// StringTable does not have "abc": x itself is added and returned
String y = "abc";		// StringTable now contains "abc", so it is used directly
System.out.println(z == x);	// true
System.out.println(z == y);	// true
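
The two cases so far can be wrapped in a self-contained check. This is a sketch under the assumption of a HotSpot JVM, version 7 or later; the class and method names are made up, and the second method builds a string from System.nanoTime() only to be reasonably sure it is not in StringTable yet.

```java
class InternBehaviorSketch {
    // The literal "abc" is already in StringTable, so intern() returns that object, not the caller.
    static boolean existingEntryWins() {
        String literal = "abc";
        String wild = new String(new char[] {'a', 'b', 'c'});
        return wild.intern() == literal && wild.intern() != wild;
    }

    // A string built at run time is not in StringTable yet.
    // Assumption: on HotSpot JDK 7+ intern() adds the caller itself and returns it.
    static boolean callerIsAdded() {
        String wild = "wild-" + System.nanoTime();
        return wild.intern() == wild;
    }

    public static void main(String[] args) {
        System.out.println(existingEntryWins());  // true
        System.out.println(callerIsAdded());      // true on HotSpot JDK 7+
    }
}
```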

2.3.3. The string does not exist in StringTable (≤ JDK 6)

intern() makes a copy of the caller, adds the copy to StringTable, and returns the copy.

String x = new String(new char[] {'a', 'b', 'c'});	// wild
String z = x.intern();		// StringTable does not have "abc": a copy of x is added and returned
String y = "abc";		// StringTable now contains "abc", so it is used directly
System.out.println(z == x); 	// false
System.out.println(z == y); 	// true

2.3.4. Principle

  1. Compute the hash value from the char[] and its length.
  2. Compute the bucket index in the hash table from the hash value.
  3. Check whether the string already exists in the hash table:
    • If it exists: return it directly.
    • If not: create a new string object, add it to the hash table, and return it.
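
The steps above can be imitated with an ordinary HashMap. This is only a user-level model: the real StringTable is implemented natively inside the JVM, InternPoolSketch is a made-up class, and for simplicity it stores the caller itself (the JDK 7+ behavior) instead of creating a new string object.

```java
import java.util.HashMap;
import java.util.Map;

class InternPoolSketch {
    // Key and value are the same object: the canonical instance for that content.
    private final Map<String, String> table = new HashMap<>();

    String intern(String s) {
        // HashMap.get hashes s, finds the bucket, and compares with equals()
        String existing = table.get(s);
        if (existing != null) {
            return existing;     // already present: return the canonical instance
        }
        table.put(s, s);         // not present: s becomes the canonical instance
        return s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        InternPoolSketch pool = new InternPoolSketch();
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(pool.intern(a) == a);  // true, a was added
        System.out.println(pool.intern(b) == a);  // true, a is returned for equal content
        System.out.println(pool.size());          // 1
    }
}
```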

2.4. G1 de-duplication

In JDK 8u20 and later, you can turn on the G1 garbage collector and enable string de-duplication.

-XX:+UseG1GC -XX:+UseStringDeduplication

The principle is to save memory by having multiple string objects reference the same char[].

Compared with intern, G1 has the advantage that de-duplication is automatic. The disadvantage is that even though the char[] is no longer duplicated, each string object itself still takes some memory (object header, value reference, hash), whereas intern keeps only one string object and therefore uses less memory.



3. Interview questions

① Predict the output

String str1 = "string";			// domestic
String str2 = new String("string");	// wild
String str3 = str2.intern();		// domestic
System.out.println(str1 == str2);	// false
System.out.println(str1 == str3);	// true

② Predict the output

String baseStr = "baseStr";
final String baseFinalStr = "baseStr";
String str1 = "baseStr01";		// domestic
String str2 = "baseStr" + "01";		// domestic
String str3 = baseStr + "01";		// wild
String str4 = baseFinalStr + "01";	// domestic
String str5 = new String("baseStr01").intern();	// from wild to domestic
System.out.println(str1 == str2);	// true
System.out.println(str1 == str3);	// false
System.out.println(str1 == str4);	// true
System.out.println(str1 == str5);	// true

③ Predict the output (differs between versions)

String str2 = new String("str") + new String("01");
str2.intern();	// JDK 1.6: a copy is added; JDK 1.7: str2 itself is added
String str1 = "str01";
System.out.println(str1 == str2);	// [1.6] false, [1.7] true

④ Predict the output

String str1 = "str01";
String str2 = new String("str") + new String("01");
str2.intern();	// str2 itself does not change; if the return value were assigned to str3, str1 == str3 would hold
System.out.println(str1 == str2);	// false

⑤ String s = new String("xyz"); How many string objects are created?

Two. "xyz" is created in StringTable, and new String() creates another object in heap memory. Both, however, refer to the same char[] array.

⑥ Predict the output

String str1 = "abc";			// domestic
String str2 = "abc";			// domestic
System.out.println(str1 == str2);	// true

⑦ Predict the output

String str1 = new String("abc");	// wild
String str2 = new String("abc");	// wild
System.out.println(str1 == str2);	// false

⑧ Predict the output

String str1 = "abc";			// domestic
String str2 = "a";
String str3 = "bc";
String str4 = str2 + str3;		// wild: variable + variable, concatenated at run time
System.out.println(str1 == str4);	// false

⑨ Predict the output

String str1 = "abc";			// domestic
final String str2 = "a";
final String str3 = "bc";
String str4 = str2 + str3;		// constant + constant, concatenated at compile time
System.out.println(str1 == str4);	// true

⑩ Predict the output

String s = new String("abc");
String str1 = "abc";
String str2 = new String("abc");
System.out.println(s == str1.intern());		// false
System.out.println(s == str2.intern());		// false
System.out.println(str1 == str2.intern());	// true