Preface

Recently I recalled a String-related question I was once asked in an interview. Even though I had prepared thoroughly, it was still easy to get confused on the spot, so this article works through the topic in full to remove that weak point. Peak hiring season is here again, so before you set out, take a look at some of the interviewers’ favorite questions.

Is String thread safe? Why is that?

String Pool? How is it implemented underneath?

String a = new String("java"); How many objects does this create?

Why is String designed to be immutable?

Why did the underlying implementation of String switch from char[] to byte[] in JDK9?

Have you used String#intern()? What does it do internally?

In summary, interview questions about Java’s String focus on these four main points:

  • immutability
  • String constant pool
  • String#intern()
  • Changes to the underlying implementation

Before getting into the main body, here are several questions that often appear in written tests, with their answers attached. The sections that follow explain each of them in full.

Question 1:

String a = "a" + "b";
String b = "ab";
System.out.println(a == b); //true

Question 2:

String s1 = "a" + "b";
String s2 = "a";
String s3 = s2 + "b";
System.out.println(s1 == s3); //false

Question 3:

String s1 = "ab";
final String s2 = "a";
final String s3 = "b";
String s4 = s2 + s3;
String s5 = ("a" + s3).intern();
System.out.println(s1 == s4);  //true
System.out.println(s1 == s5);  //true: "a" + s3 is a compile-time constant, and intern() returns the pooled "ab" in any case

If the reasoning behind any of these three questions is unclear, the following sections go through it step by step. First, a note on the difference between ‘==’ and equals():

‘==’ compares the references held in two variables, that is, the heap addresses of the objects they point to, to determine whether both refer to the same object. Note that:

  • For reference types, it tests whether the operands on both sides of the operator are the same object.
  • The operands on both sides must be of compatible types (for example, a parent class and a child class) for the code to compile.
  • For primitive types, it compares values rather than addresses: with long b = 10L and double c = 10.0, b == c is true because b is promoted to double before the comparison.

‘equals()’ is used to compare whether two objects are equal. It is defined in java.lang.Object, which every class inherits from, and its default implementation compares object references, exactly as ‘==’ does. The String class overrides equals() to compare the contents of the strings rather than their locations in the heap.
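To make the distinction concrete, here is a minimal sketch. The class name EqualsDemo and the helper sameObject are made up for illustration; only ==, equals() and new String(...) come from the discussion above.

```java
public class EqualsDemo {
    /** Reference comparison: true only when both arguments are the same object. */
    public static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        String literal = "java";
        String copy = new String("java"); // explicitly allocates a second object

        System.out.println(sameObject(literal, copy)); // false: two distinct objects
        System.out.println(literal.equals(copy));      // true: same characters

        long b = 10L;
        double c = 10.0;
        System.out.println(b == c); // true: b is promoted to double, values are compared
    }
}
```

For strings, equals() is almost always what you want; == only answers the narrower question of object identity.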

Immutability of a String

The official documentation describes String as follows:

Strings are constant; their values cannot be changed after they are created.

In other words, a String is a constant whose value cannot change once it has been created. Before JDK 9, String stored its characters in a char[] field declared private and final, and the String class itself is also declared final.

When final modifies a class, the class cannot be inherited; in other words, String has no subclasses. When final modifies a member variable, the variable is not given a default value (such as null for a reference type): it must be assigned either where it is declared or in a constructor, and it cannot be reassigned afterwards.

But what about an operation like this?

String s = "a";
s = "a" + "b";
System.out.println(s);   //ab

The printed result is “ab”. If String is immutable, why is the output not “a”?

Take a look at the bytecode instructions:

 0 ldc #2 <a>   // load the string "a" (constant pool entry #2)
 2 astore_1     // store it in local variable s (slot 1)
 3 ldc #3 <ab>  // load the string "ab" (constant pool entry #3)
 5 astore_1     // store it in local variable s (slot 1)
 6 getstatic #4 <java/lang/System.out>
 9 aload_1
10 invokevirtual #5 <java/io/PrintStream.println>
13 return

As the bytecode shows, the variable s is assigned twice, and the two assignments load different constants (entries #2 and #3). The string “a” is never modified; the second assignment simply makes s refer to a different String object. So there is nothing wrong with the official statement: the value of a String does not change, only the reference stored in s does, and what s finally presents is a different string.

Here’s an example that illustrates String immutability in detail:

public static void main(String[] args) {
    String str = "abc";
    changeString(str);
    System.out.println("str = " + str);   // str = abc
}

public static void changeString(String str2) {
    str2 += "aaa";   // rebinds the local parameter only; the caller's str is untouched
}

The printed result is still “abc”: changeString() does not change the value of str. str2 starts out referring to the same object as str, but str2 += “aaa” creates a new string “abcaaa” and makes only str2 refer to it.

So why is String designed to be immutable?

  1. Strings can be shared between threads safely: no thread can modify a String that another thread is using.
  2. The hash value can be cached: it is computed once, on the first call to hashCode(), and reused afterwards.
  3. Immutability is what makes the string pool possible: a pooled instance can be shared by many references only because none of them can change it.

Finally, to recap: immutability means that once a String instance has been created on the heap, it cannot be changed. No method changes the string itself; operations that appear to modify it only create a new string.
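A small sketch of that recap, under the assumption that a helper like appendSuffix (a name invented here) stands in for any "modifying" operation:

```java
public class ImmutabilityDemo {
    /** Returns a new string; the caller's object is never modified. */
    public static String appendSuffix(String s) {
        s += "aaa"; // rebinds the local parameter to a newly created String
        return s;
    }

    public static void main(String[] args) {
        String original = "abc";
        String upper = original.toUpperCase();    // new object
        String appended = appendSuffix(original); // new object

        System.out.println(original); // abc
        System.out.println(upper);    // ABC
        System.out.println(appended); // abcaaa
    }
}
```

Every call hands back a different object, and original still prints abc at the end.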

How the string constant pool has changed, and how it works

In JDK6, HotSpot implements the method area as the permanent generation (PermGen), and the string constant pool is stored there. In JDK7 the string constant pool was moved to the heap, and in JDK8 and later the permanent generation was removed and replaced by Metaspace.

The string constant pool is implemented as a hash table. If the table has too few buckets, hash collisions become frequent and the bucket chains grow long; every call to String#intern() then has to walk a chain entry by entry, which degrades performance significantly. The default table size is 60013 in JDK8, whereas in JDK6 it is only 1009. You can adjust it with -XX:StringTableSize=60013.
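The effect of the bucket count can be modeled with a few lines of Java. This is only a toy model: HotSpot's real table is native code with its own hash function, and the class BucketChainSketch, the "key-" strings and the use of String#hashCode() are all assumptions made for illustration.

```java
public class BucketChainSketch {
    /** Spreads `entries` distinct keys over `buckets` chains and returns the longest chain. */
    public static int longestChain(int entries, int buckets) {
        int[] chainLength = new int[buckets];
        int longest = 0;
        for (int i = 0; i < entries; i++) {
            String key = "key-" + i;
            // floorMod keeps the index non-negative even for negative hash codes
            int bucket = Math.floorMod(key.hashCode(), buckets);
            chainLength[bucket]++;
            longest = Math.max(longest, chainLength[bucket]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int entries = 100_000;
        System.out.println("1009 buckets:  longest chain = " + longestChain(entries, 1009));
        System.out.println("60013 buckets: longest chain = " + longestChain(entries, 60013));
    }
}
```

With 100,000 entries, 1009 buckets must contain at least one chain of 100 or more entries, while the larger table keeps chains short, so a lookup touches far fewer entries.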

Why should the string constant pool be moved from the permanent generation to the heap?

The Java Virtual Machine Specification does not require the method area to be garbage collected. HotSpot does collect it, but far less often than it collects the heap. The method area is also allocated comparatively little memory, so placing strings in the heap lets unused ones be reclaimed promptly.

So where is memory actually allocated when a string is created?

The following examples show how declared strings are stored in memory in different scenarios.

Example 1:

String a = "123";
String b = new String("123");
System.out.println(a == b);  //false

A string declared as a literal lives in the string constant pool. new String(“123”) always allocates a new object on the heap. If “123” is not yet in the String Pool, the literal is added to it; if it is already there, no second pooled copy is created. Either way b refers to the heap object rather than the pooled one, so the comparison prints false. This is a classic interview question.
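The same example can be extended one step to show where the pooled copy is. In this sketch the class PoolDemo and the helper heapCopy are invented names; intern() is used only to fetch the pooled instance.

```java
public class PoolDemo {
    /** Builds a fresh heap String with the same characters as the argument. */
    public static String heapCopy(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String a = "123";           // refers to the pooled literal
        String b = heapCopy("123"); // a distinct object on the heap

        System.out.println(a == b);          // false
        System.out.println(a.equals(b));     // true
        System.out.println(a == b.intern()); // true: intern() returns the pooled instance
    }
}
```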

Note: literals include (1) text strings, (2) values of the eight primitive types, and (3) constants declared final, among others.

Example 2:

String a = "123";
String b = a + "aaa";
String c = "123aaa";
String d = "123" + "aaa";
System.out.println(b == c);    //false
System.out.println(c == d);    //true

Did you get it right?

Both b and d are concatenations, but d consists only of literals, so its value is determined at compile time. Because b involves the variable a, the concatenation is performed at run time: the compiled code builds the result with a StringBuilder and calls toString(), which creates a new object on the heap outside the String Pool. (Since JDK9, javac emits an invokedynamic call instead of explicit StringBuilder code, but the result is still a new heap object.) So b points to an object in the ordinary heap, while c and d both point to the same entry in the String Pool.

String d = “123” + “aaa”; is compiled as if it had been written String d = “123aaa”;

String b = a + “aaa”; is compiled roughly as: create a StringBuilder, append a, append “aaa”, then call StringBuilder#toString() and assign the result to b. As an additional note, StringBuilder#toString() does not add the resulting string to the String Pool.
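Written out by hand, the two cases look roughly like this. It is a sketch of the pre-JDK9 translation described above, not the exact bytecode; ConcatDemo and concatWithBuilder are names chosen for illustration.

```java
public class ConcatDemo {
    /** Hand-written equivalent of `prefix + "aaa"` when prefix is an ordinary variable. */
    public static String concatWithBuilder(String prefix) {
        return new StringBuilder().append(prefix).append("aaa").toString();
    }

    public static void main(String[] args) {
        String a = "123";
        final String f = "123";    // final + literal initializer: a compile-time constant
        String pooled = "123aaa";

        String viaBuilder = concatWithBuilder(a);
        String folded = f + "aaa"; // folded by javac into the literal "123aaa"

        System.out.println(viaBuilder == pooled);      // false: new heap object
        System.out.println(viaBuilder.equals(pooled)); // true
        System.out.println(folded == pooled);          // true: same pooled literal
    }
}
```

The final variable f behaves like a literal, which is why declaring the operands final changes the answer in questions of this kind.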

Any concatenation that involves a variable is carried out through StringBuilder#append() in this way. All three of the following forms are concatenated at run time:

String a = "123";
String b = a + "aaa";  // form 1
String c = new String("123") + "aaa";  // form 2
String d = new String("123") + new String("aaa");  // form 3

Using the String#intern() method

When String#intern() is called, if the string constant pool already contains a string that is equal to this one according to equals(), the pooled string is returned. Otherwise this string is added to the pool and a reference to it is returned.

The purpose of intern() is to ensure that only one copy of a given string is kept in memory, which saves memory and can speed up some string operations.
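The memory-saving effect can be seen by counting objects by identity. The sketch below is only an illustration: InternDemo, build and distinctObjects are invented names, and the string “user-status” is an arbitrary example value.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class InternDemo {
    /** Counts how many distinct String objects (by identity, not equals) the array holds. */
    public static int distinctObjects(String[] values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    /** Creates n equal strings at run time, optionally interning each one. */
    public static String[] build(int n, boolean intern) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            String s = new StringBuilder("user-").append("status").toString();
            out[i] = intern ? s.intern() : s;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(build(1000, false))); // 1000
        System.out.println(distinctObjects(build(1000, true)));  // 1
    }
}
```

Without intern() the array keeps 1000 separate copies alive; with it, every slot refers to the same pooled instance and the temporary copies can be collected.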

Why treat this separately? Because intern() comes up more and more often in interviews: interviewers pick out its tricky aspects to probe the depth and breadth of a candidate’s knowledge. The key points of intern() are summarized below, and the discussion that follows is built around them.

  1. What intern() does

    If the string is not in the String Pool, it will be after the call; if it is already there, a reference to the pooled string is returned.

  2. How the behavior of intern() changed between JDK6 and JDK7

    In JDK6, when intern() is called on a string:

    If the String Pool does not contain it, the string is copied into the String Pool and a reference to the copy is returned.

    If the String Pool contains it, a reference to the pooled string is returned.

    In JDK7, when intern() is called on a string:

    If the String Pool does not contain it, a reference to the existing heap object is stored in the String Pool and returned.

    If the String Pool contains it, a reference to the pooled string is returned.

Here’s an example of a common interview question:

Example 1 (quoted from Understanding the Java Virtual Machine in Depth, 3rd edition):

String str1 = new StringBuilder("Computer").append("Software").toString();
System.out.println(str1.intern() == str1);
String str2 = new StringBuilder("ja").append("va").toString();
System.out.println(str2.intern() == str2);

Run on JDK 6, this code prints false twice; run on JDK 7, it prints true and then false.

The difference arises because in JDK 6 intern() copies the first-encountered string instance into the string constant pool in the permanent generation and returns a reference to that copy, whereas the string instances created by StringBuilder live on the Java heap. The two can never be the same reference, so the result is false.

In JDK 7 (and in some other virtual machines, such as JRockit), intern() no longer needs to copy the string instance into the permanent generation. Since the string constant pool has been moved to the Java heap, the pool only needs to record a reference to the first instance encountered. So the reference returned by intern() is the same instance that StringBuilder created.

From Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (3rd edition) by Zhiming Zhou

Note that the string “java” is already in the String Pool before this code runs, which is why the second comparison is false: the sun.misc.Version class is loaded at VM startup, and its constant puts “java” into the String Pool.

package sun.misc;
import java.io.PrintStream;
public class Version {
    private static final String launcher_name = "java";
    // ...

For the detailed process by which the “java” string gets loaded, see this Zhihu answer:

www.zhihu.com/question/51…

Changes to the underlying implementation of String

The underlying implementation of String switches from char[] to byte[]

An important change in JDK9 is that the backing storage of String changed from char[] to byte[], mainly to save memory. A smaller heap footprint also means fewer garbage collections.

In most Java applications, strings are the largest consumer of heap space, and most of them contain only Latin-1 characters (English letters, digits, and the like), each of which fits in a single byte. Before JDK9, however, the JVM stored strings in a char[], and a char takes up two bytes, so half the space was wasted.

From JDK9 onward, String supports two encodings, Latin-1 and UTF-16. When a String cannot be represented in Latin-1, UTF-16 is used.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence, Constable, ConstantDesc {

    @Stable
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;
    @Native static final byte LATIN1 = 0;
    @Native static final byte UTF16  = 1;

To record which encoding a string uses, the coder field was introduced: it indicates whether the bytes in value are encoded as Latin-1 or UTF-16.
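The selection rule can be sketched in plain Java. This is not the JDK's own code: CoderSketch, chooseCoder and backingBytes are hypothetical names, and the only assumption taken from the excerpt above is that LATIN1 is 0 and UTF16 is 1.

```java
public class CoderSketch {
    public static final byte LATIN1 = 0;
    public static final byte UTF16 = 1;

    /** Picks LATIN1 when every char fits in one byte, otherwise UTF16. */
    public static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Bytes the backing array would need: length << 0 for LATIN1, length << 1 for UTF16. */
    public static int backingBytes(String s) {
        return s.length() << chooseCoder(s);
    }

    public static void main(String[] args) {
        System.out.println(backingBytes("java"));       // 4
        System.out.println(backingBytes("java\u4e2d")); // 10
    }
}
```

A single character outside Latin-1 switches the whole string to two bytes per char, which is why the saving applies per string rather than per character.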

String deduplication in the G1 collector

The G1 collector is now widely used at many large companies, although the mainstream JDK in production is still JDK8, and G1 only became the default garbage collector in JDK9. This section therefore closes the discussion of String with a look at how G1 treats it.

Having some knowledge of GC and JVM is a good plus in your interview.

As mentioned earlier, String objects account for a large share of memory in most Java applications, and memory is a major performance bottleneck in many scenarios. Keeping many copies of the same string on the heap is unnecessary and can be optimized away. G1 therefore adds string deduplication: strings whose contents are equal, as String#equals() would report, are made to share a single backing array.

Concrete implementation:

While the garbage collector runs, it visits the live objects on the heap. Each visited object is checked to see whether it is a String that is a candidate for deduplication.

  1. If it is, a reference to the object is put on a queue for later processing. A deduplication thread runs in the background and works through the queue: it removes an element from the queue and then tries to deduplicate the String it references.
  2. A hash table records all the unique char arrays used by String objects. To deduplicate a String, the table is checked to see whether an identical char array already exists on the heap.
  3. If one does, the String is adjusted to refer to that array, which drops the reference to its original array; the original is eventually reclaimed by the garbage collector.
  4. If the lookup fails, the char array is inserted into the hash table so that it can be shared later.
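The lookup-then-share logic in steps 2 to 4 can be modeled in ordinary Java. This is a toy model under loud assumptions: the real work happens inside the VM on String's private array, so Str below is a hypothetical stand-in for String, and DedupSketch with its HashMap is not how G1 stores its table.

```java
import java.util.HashMap;
import java.util.Map;

public class DedupSketch {
    /** Hypothetical stand-in for String: a mutable reference to a char array. */
    public static final class Str {
        char[] value;
        public Str(String s) { this.value = s.toCharArray(); }
        public char[] value() { return value; }
    }

    // Canonical arrays keyed by their content.
    private final Map<String, char[]> table = new HashMap<>();

    /** Processes one queued candidate; returns true if its array was replaced. */
    public boolean deduplicate(Str candidate) {
        String key = new String(candidate.value);
        char[] existing = table.get(key);
        if (existing == null) {
            table.put(key, candidate.value); // step 4: remember it for later sharing
            return false;
        }
        if (existing == candidate.value) {
            return false;                    // already shares the canonical array
        }
        candidate.value = existing;          // step 3: share; the old array becomes garbage
        return true;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        Str a = new Str("hello");
        Str b = new Str("hello");
        System.out.println(dedup.deduplicate(a));   // false
        System.out.println(dedup.deduplicate(b));   // true
        System.out.println(a.value() == b.value()); // true
    }
}
```

Note that the String objects themselves stay distinct; only the backing arrays are shared, which is what separates deduplication from intern().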

Command line options

  1. UseStringDeduplication (bool): enables String deduplication. It is disabled by default and must be turned on explicitly.
  2. PrintStringDeduplicationStatistics (bool): prints detailed deduplication statistics.
  3. StringDeduplicationAgeThreshold (uintx): a String object that reaches this age is considered a deduplication candidate.

Reference article or book:

Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (3rd edition) by Zhiming Zhou

[String#intern]

www.zhihu.com/question/44…