This article covers several topics related to String in Java: the implementation of the String class and its immutability, the implementation of the related classes StringBuilder and StringBuffer, and the usage and implementation of the String caching mechanism.

Design and implementation of the String class

At its core, the String class represents a string by wrapping a character array, but the implementation details have changed several times as Java has evolved.

Java 6

public final class String implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];
    /** The offset is the first index of the storage that is used. */
    private final int offset;
    /** The count is the number of characters in the String. */
    private final int count;
    /** Cache the hash code for the string */
    private int hash; // Default to 0
}

In Java 6, the String class has four member variables: value (a char array), offset, count, and hash. The value array stores the character sequence, offset and count locate the string within the value array, and hash caches the string's hashCode.

The purpose of using offset and count to locate the string within the value array is to share that array quickly and efficiently. For example, substring() returns a substring that shares the value array with the original string, recording only its own offset and count, instead of making a new copy. substring() is implemented as follows:

String(int offset, int count, char value[]) {
    this.value = value;    // Reuse the original array directly
    this.offset = offset;
    this.count = count;
}
public String substring(int beginIndex, int endIndex) {
    // ... boundary checks omitted ...
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
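To make the sharing concrete, here is a minimal sketch in plain Java. SharedSlice is a hypothetical class (not part of the JDK) that stores value, offset, and count the way the Java 6 String did, so its substring() runs in constant time and reuses the same backing array:

```java
// Hypothetical sketch of the Java 6 design: a slice records (value, offset, count)
// and shares the backing array instead of copying it.
final class SharedSlice {
    final char[] value; // shared backing array
    final int offset;   // index of the first character in value
    final int count;    // number of characters in this slice

    SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSlice of(String s) {
        char[] chars = s.toCharArray();
        return new SharedSlice(chars, 0, chars.length);
    }

    // Mirrors the Java 6 substring(): no copy, the new slice points into the same array
    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(beginIndex + ", " + endIndex);
        }
        if (beginIndex == 0 && endIndex == count) {
            return this;
        }
        return new SharedSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice big = SharedSlice.of("hello, world");
        SharedSlice sub = big.substring(7, 12);
        System.out.println(sub);                    // world
        System.out.println(sub.value == big.value); // true: the array is shared
        System.out.println(sub.value.length);       // 12, although sub holds 5 characters
    }
}
```

Because sub still references the whole 12-character array, that array stays reachable for as long as sub does, which is exactly the problem discussed next.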

This approach, however, can easily cause memory leaks. Consider the following code:

String bigStr = new String(new char[100000]);
String subStr = bigStr.substring(0, 2);
bigStr = null;

After bigStr is set to null, its value array is still referenced by subStr, so the garbage collector cannot reclaim it. As a result, although we need space for only 2 characters, space for 100000 characters stays in use.

In Java 6, you can avoid this kind of memory leak in either of the following ways:

String subStr = bigStr.substring(0, 2) + "";
// or
String subStr = new String(bigStr.substring(0, 2));

Either way, the temporary String returned by substring() is no longer referenced by any other object, so the garbage collector can reclaim it. The resulting subStr has its own small value array instead of referring to the value array of bigStr, so the memory leak is avoided.

Java 7 & Java 8

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    /** Cache the hash code for the string */
    private int hash; // Default to 0
}

In Java 7 and Java 8 (more precisely, starting with Java 7 update 6), the String class was changed: it no longer has the offset and count fields, and substring() no longer shares the value array. Instead, it copies the requested range into a new array, which eliminates the memory leak described above. substring() is implemented as follows:

public String(char value[], int offset, int count) {
    // ... boundary checks omitted ...

    // Copy the requested range out of the original array
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}
public String substring(int beginIndex, int endIndex) {
    // ... boundary checks omitted ...
    int subLen = endIndex - beginIndex;
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}
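The internal copy cannot be observed from ordinary Java code, but the two branches of substring() can. The sketch below assumes JDK 8 or later, where a full-range substring returns the string itself and any other range returns a new String object; SubstringDemo is only an illustrative name:

```java
final class SubstringDemo {
    // Returns {whole-range substring, partial substring} of s
    static String[] slices(String s) {
        String whole = s.substring(0, s.length()); // beginIndex == 0 && endIndex == length: returns this
        String part = s.substring(0, 5);           // any other range: a new String with its own array
        return new String[] {whole, part};
    }

    public static void main(String[] args) {
        String s = "hello world";
        String[] r = slices(s);
        System.out.println(r[0] == s);            // true
        System.out.println(r[1].equals("hello")); // true: same characters
        System.out.println(r[1] == "hello");      // false: a different object from the pooled literal
    }
}
```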

Java 9

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final byte[] value;
    /**  The identifier of the encoding used to encode the bytes in {@code value}. */
    private final byte coder;
    /** Cache the hash code for the string */
    private int hash; // Default to 0
}

To save memory, Java 9 optimized the implementation of String (a feature known as Compact Strings, JEP 254). The value field changed from char[] to byte[], and a new coder field was added. In Java a char occupies two bytes, which is wasteful for characters that fit in a single byte (for example a-z and A-Z), so Java 9 stores the character sequence in a byte[]. The new coder field records whether the value array holds single-byte or double-byte encoded characters: 0 means Latin-1 (single-byte encoding) and 1 means UTF-16 (double-byte encoding). When a string is created, if every character can be encoded in a single byte, Latin-1 is used to save space; otherwise UTF-16 is used. The main constructor is implemented as follows:

String(char[] value, int off, int len, Void sig) {
    if (len == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS) {
        // Try to compress the string so it can be stored with single-byte encoding
        byte[] val = StringUTF16.compress(value, off, len);
        if (val != null) {   // Compression succeeded: store as Latin-1
            this.value = val;
            this.coder = LATIN1;
            return;
        }
    }
    // Otherwise, store with double-byte encoding
    this.coder = UTF16;
    this.value = StringUTF16.toBytes(value, off, len);
}
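The decision the constructor makes can be sketched in a few lines. CompactStringsSketch and its methods are hypothetical; compress() only imitates the role StringUTF16.compress plays above (return a Latin-1 byte array, or null if some character does not fit in one byte) and is not the JDK's code. In the real JVM the compaction can be switched off with -XX:-CompactStrings, which makes COMPACT_STRINGS false:

```java
// Hypothetical sketch of the Compact Strings decision made by the Java 9 constructor
final class CompactStringsSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns a Latin-1 byte array if every char fits in one byte, otherwise null
    static byte[] compress(char[] value, int off, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            char c = value[off + i];
            if (c > 0xFF) {
                return null; // this character needs UTF-16
            }
            out[i] = (byte) c;
        }
        return out;
    }

    // Picks the coder the same way the constructor above does
    static byte coderFor(String s) {
        char[] chars = s.toCharArray();
        return compress(chars, 0, chars.length) != null ? LATIN1 : UTF16;
    }

    // Size of the value array in bytes under the chosen coder
    static int storageBytes(String s) {
        return coderFor(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(coderFor("hello"));              // 0 (LATIN1)
        System.out.println(coderFor("h\u00e9llo"));         // 0: U+00E9 still fits in one byte
        System.out.println(coderFor("\u4f60\u597d"));       // 1 (UTF16)
        System.out.println(storageBytes("hello"));          // 5
        System.out.println(storageBytes("\u4f60\u597d"));   // 4
    }
}
```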

The immutability of the String class

Notice that the String class is declared final, all of its fields are private, and all fields except hash are final. This guarantees that:

  1. The String class is final, so it cannot be subclassed and no subclass can change its semantics;
  2. All fields are private, so code outside String cannot directly access or modify them;
  3. All fields except hash are final, so they cannot be changed after their initial assignment.

Together, these rules give the String class an important property: immutability. Once a String has been created, its contents can never be changed. Methods such as substring(), concat(), and replace() return a new String (or the original one when nothing changes) rather than modifying the original.
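A quick way to see this: each of these methods returns a separate String, and the original keeps its value. ImmutabilityDemo is only an illustrative name:

```java
final class ImmutabilityDemo {
    // Each "modifying" method returns a new String; the receiver is never changed
    static String[] derive(String original) {
        return new String[] {
            original.substring(1),      // "ava" for "Java"
            original.concat(" 8"),      // "Java 8"
            original.replace('a', 'o')  // "Jovo"
        };
    }

    public static void main(String[] args) {
        String original = "Java";
        String[] derived = derive(original);
        System.out.println(String.join(", ", derived)); // ava, Java 8, Jovo
        System.out.println(original);                   // Java: unchanged
    }
}
```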

The hash field is not final because a String's hashCode does not need to be computed when the String is created; it is computed and stored the first time hashCode() is called.
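A minimal sketch of this lazy computation, assuming the Java 8 style in which a hash value of 0 means "not computed yet". LazyHashText is a hypothetical class; its hashCode() uses the formula documented for String.hashCode(), s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]:

```java
import java.util.Arrays;

// Hypothetical class imitating String's lazily computed, cached hash field
final class LazyHashText {
    private final char[] value;
    private int hash; // 0 means "not computed yet", as in the Java 8 String

    LazyHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // cache the result; later calls skip the loop
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LazyHashText && Arrays.equals(value, ((LazyHashText) o).value);
    }
}
```

Caching is safe only because the characters being hashed can never change. In this sketch, a text whose real hash happens to be 0 is simply recomputed on every call, which is harmless.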

Why is the String class designed to be immutable?

  1. Security of String objects. String is widely used in the JDK as a parameter and return value, for example for network connections, opening files, and class loading. If String objects were mutable, they could be modified maliciously, which would raise security concerns.
  2. Thread safety. The immutability of the String class naturally makes it thread-safe.
  3. A stable hashCode. Because String is immutable, its hashCode can be cached after the first computation and never needs to be recomputed. This makes String well suited as a key in containers such as HashMap, and more efficient than keys whose hash codes must be recomputed.
  4. The String constant pool. Java provides a String constant pool so that string objects can be shared to save memory. If strings were mutable, they could not be shared, because changing the value through one reference would change it for every other reference.

Classes related to the String class

Besides String itself, two closely related classes are StringBuffer and StringBuilder. They can be thought of as mutable versions of String and provide various methods for modifying strings. The difference between them is that StringBuffer is thread-safe and StringBuilder is not.

StringBuffer/StringBuilder implementation

Both StringBuffer and StringBuilder extend AbstractStringBuilder, which uses a mutable char array (a byte array since Java 9) to implement the various string modifications. Both classes call AbstractStringBuilder's methods to manipulate the string. The difference is that StringBuffer's modifying methods are declared synchronized and StringBuilder's are not, so StringBuffer is thread-safe and StringBuilder is not.
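The effect of the synchronized methods can be seen with a small experiment. In the sketch below (BufferThreadSafetyDemo is a hypothetical name), several threads append to one shared StringBuffer; because each append() holds the buffer's lock, no append is lost and the final length grows by exactly threads × perThread:

```java
import java.util.ArrayList;
import java.util.List;

final class BufferThreadSafetyDemo {
    // Appends 'x' perThread times from each of `threads` threads and returns the final length
    static int appendConcurrently(StringBuffer sb, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sb.append('x'); // StringBuffer.append is synchronized
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return sb.length();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(new StringBuffer(), 4, 10_000)); // 40000
    }
}
```

Doing the same with a shared StringBuilder (after changing the parameter type) is unsafe: appends can be lost or an ArrayIndexOutOfBoundsException can be thrown, although whether that happens on a given run is not deterministic.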

Using Java 8 as an example, look at the AbstractStringBuilder class implementation:

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    /** The value is used for character storage. */
    char[] value;
    /** The count is the number of characters used. */
    int count;
}

The value array stores the character sequence, and count records how many characters of the value array are in use; the actual string content is the characters in the range [0, count) of value. The count field is needed because AbstractStringBuilder does not reallocate value on every change; instead it preallocates some extra space, which reduces how often the array has to be reallocated (similar to ArrayList).

The value array grows as follows: when a modification needs more space than the current value array provides, a larger array is allocated whose size is max(original array size × 2 + 2, required size), subject to the overflow handling in hugeCapacity(). For the detailed logic, see the code below:

private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

private int newCapacity(int minCapacity) {
    // overflow-conscious code
    int newCapacity = (value.length << 1) + 2;    // The size of the original array ×2 + 2
    if (newCapacity - minCapacity < 0) {     // If less than the required space size, expand to the required space size
        newCapacity = minCapacity;
    }
    return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
        ? hugeCapacity(minCapacity)
        : newCapacity;
}

private int hugeCapacity(int minCapacity) {
    if (Integer.MAX_VALUE - minCapacity < 0) { // overflow
        throw new OutOfMemoryError();
    }
    return (minCapacity > MAX_ARRAY_SIZE)
        ? minCapacity : MAX_ARRAY_SIZE;
}
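To see the resulting sequence of sizes, the sketch below restates newCapacity() and hugeCapacity() as one pure function of the old array length. GrowthSketch is a hypothetical helper, not JDK code. A new StringBuilder() starts with a capacity of 16, so under this rule, appending one character at a time grows the array 16 → 34 → 70 → 142:

```java
final class GrowthSketch {
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Same rule as newCapacity()/hugeCapacity() above, as a function of the old length
    static int newCapacity(int oldLength, int minCapacity) {
        int newCapacity = (oldLength << 1) + 2; // old size x 2 + 2
        if (newCapacity - minCapacity < 0) {    // still too small: use the required size
            newCapacity = minCapacity;
        }
        if (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0) {
            // hugeCapacity(): fail on overflow, otherwise cap near the array-size limit
            if (Integer.MAX_VALUE - minCapacity < 0) {
                throw new OutOfMemoryError();
            }
            return minCapacity > MAX_ARRAY_SIZE ? minCapacity : MAX_ARRAY_SIZE;
        }
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16, 17));  // 34
        System.out.println(newCapacity(34, 35));  // 70
        System.out.println(newCapacity(16, 100)); // 100: doubling is not enough
    }
}
```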

AbstractStringBuilder also provides a trimToSize method to free up excess space:

public void trimToSize() {
    if (count < value.length) {
        value = Arrays.copyOf(value, count);
    }
}

The caching mechanism for String objects

Because strings are used so widely, Java provides a caching mechanism for them that improves both time and space efficiency. The JVM's runtime data area contains a String Pool (also called the String constant pool) that holds all cached strings. When we say a String is interned, we mean that it is in the String Pool.

We understand the String caching mechanism by answering the following three questions:

  1. Which String objects are cached in the String constant pool?
  2. Where are String objects cached, and how are they organized?
  3. When does a String object enter the String constant pool?

Note: unless otherwise specified, all JVM implementations mentioned in this article refer to Oracle's HotSpot VM. The test code was run without any additional JVM parameters, and optimizations such as escape analysis, scalar replacement, and dead-code elimination are not considered.

Preliminary knowledge

Before answering these three questions, it helps to have a basic understanding of the following topics:

  • The JVM runtime data areas
  • The structure of the class file
  • The JVM's stack-based bytecode interpretation engine
  • The class loading process
  • The several constant pools in Java

For completeness, two of the topics most relevant here are introduced briefly below.

The class loading process

The entire life cycle of a class, from being loaded into the virtual machine's memory to being unloaded from it, consists of seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, preparation, and resolution are collectively called Linking. The order of the five phases loading, verification, preparation, initialization, and unloading is fixed, and the class loading process must begin them step by step in that order. The resolution phase is different: in some cases it can begin after the initialization phase, which supports runtime binding in the Java language (also called dynamic binding or late binding).

Several constant pools in Java

1. Class file Constant Pool. Source files with the .java suffix are compiled by javac into class files (bytecode files) with the .class suffix. Part of each class file is the Constant Pool, which mainly stores two kinds of constants:

  • The values of literals and constant expressions in the code;
  • Symbolic references, including fully qualified names of classes and interfaces, names and descriptors of fields, and names and descriptors of methods.

2. Run-time Constant Pool. In the JVM's run-time data areas, the run-time constant pool is part of the method area. It is the run-time representation of the constant pool of each class or interface in a class file: after a class is loaded, the contents of its class file constant pool enter the run-time constant pool in the method area.

3. String Pool. The String Pool is the string constant pool mentioned earlier, which is used to cache strings. It is shared globally and is also part of the runtime data area.

Which Strings will be cached in the String constant pool?

In Java, two kinds of strings are cached in the String constant pool: string literals and string constant expressions defined in the code, and strings that the program explicitly caches by calling String.intern(). Both are briefly introduced below.

1. Implicit caching – string literals and string constant expressions

It is called implicit caching because we do not have to write any caching code ourselves; the compiler and the JVM do it for us.

String literals. The first kind of string that is implicitly cached is the string literal. A literal is the source-code representation of a value of a primitive type, the String type, or the null type. For example:

int i = 100;        // int literal
double f = 10.2;    // double literal
boolean b = true;   // boolean literal
String s = "hello"; // String literal
Object o = null;    // null literal

A string literal consists of zero or more characters enclosed in double quotes. At run time, Java creates String objects for string literals and adds them to the String constant pool. For example, “hello” in the code above is a string literal: during execution, a String containing “hello” is created and cached in the String constant pool, and s is then made to refer to it.

For more details on string literals, see the Java Language Specification (JLS §3.10.5, String Literals).

The other kind of string that can be implicitly cached is the string constant expression. A constant expression is an expression that denotes a value of a primitive type or of type String and whose value can be determined at compile time; a string constant expression is a constant expression of type String. For example:

int a = 1 + 2;
double d = 10 + 2.01;
boolean b = true & false;
String str1 = "abc" + 123;

final int num = 456;
String str2 = "abc" + num;

At run time, Java creates String objects for string constant expressions and adds them to the String constant pool. In the code above, two strings, “abc123” and “abc456”, are created and cached in the String constant pool; str1 refers to the pooled “abc123” and str2 refers to the pooled “abc456”.

See the Java Language Specification (JLS §15.28, Constant Expressions) for more details on constant expressions.
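The difference between a constant expression and an ordinary concatenation can be checked with ==. In this sketch (ConstantExpressionDemo is a hypothetical name), POOLED refers to the pooled “abc456” literal; the final local makes "abc" + num a constant expression that javac folds into that same literal, while the non-final version is evaluated at run time and yields a new, unpooled String:

```java
final class ConstantExpressionDemo {
    static final String POOLED = "abc456"; // a literal, so it refers to the pooled String

    static String withConstant() {
        final int num = 456; // constant variable
        return "abc" + num;  // constant expression: compiled as the literal "abc456"
    }

    static String withVariable() {
        int num = 456;       // ordinary variable (not declared final)
        return "abc" + num;  // evaluated at run time: a new String object
    }

    public static void main(String[] args) {
        System.out.println(withConstant() == POOLED);      // true
        System.out.println(withVariable() == POOLED);      // false
        System.out.println(withVariable().equals(POOLED)); // true: same content
    }
}
```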

2. Explicit caching – the String.intern() method

In addition to being declared as String literals/String constant expressions, strings obtained in other ways can also be actively added to the String constant pool. Such as:

String str = new String("123") + new String("456");
str.intern();

In the code above, after the first statement the constant pool contains the strings “123” and “456” but no “123456”. After str.intern() executes, a String with the contents “123456” is added to the String constant pool.

Let's look at the caching mechanism in more detail through the Javadoc comment of String.intern():

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned. It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

In other words:

When the intern method is called, if the constant pool already contains a string with the same content (as determined by the equals(Object) method, which for strings means the same character sequence), the String object from the constant pool is returned. Otherwise, this String is added to the constant pool and a reference to it is returned. Hence, for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
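The contract can be checked directly. The sketch below (InternDemo is an illustrative name) builds two equal strings at run time, as in the str example earlier; they are different objects, but interning either one yields the same pooled instance, which is also what the literal “123456” refers to:

```java
final class InternDemo {
    // Builds "123456" at run time, as in the example above, so it starts out on the heap
    static String buildAtRuntime() {
        return new String("123") + new String("456");
    }

    public static void main(String[] args) {
        String s = buildAtRuntime();
        String t = new StringBuilder("123").append("456").toString();
        System.out.println(s == t);                   // false: two distinct heap objects
        System.out.println(s.equals(t));              // true
        System.out.println(s.intern() == t.intern()); // true, because s.equals(t)
        System.out.println(s.intern() == "123456");   // true: the literal resolves to the pooled String
    }
}
```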

Where are strings cached and how are they organized?

In HotSpot VM, cached strings are recorded in a global table called StringTable, which is similar in structure and implementation to Java's HashMap or HashSet: it is a hash table that resolves collisions by separate chaining. You can think of it simply as a HashSet<String>. Note that it stores only references to strings, not the string instances themselves. In general, when we say a string is in the string constant pool, we mean that StringTable holds a reference to it; conversely, if it is not in the pool, StringTable holds no reference to it.
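As a rough mental model, the lookup-or-add behavior of StringTable can be imitated in Java. SimpleInternPool is hypothetical and much simpler than HotSpot's table: it uses a ConcurrentHashMap that maps each string to itself (a HashSet cannot hand back the instance it already holds), and it keeps strong references, whereas entries in the real table can be reclaimed by the garbage collector once nothing else refers to them:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of StringTable's lookup-or-add logic
final class SimpleInternPool {
    // Maps each pooled string to itself so that a lookup returns the stored instance
    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();

    // Returns the pooled String equal to s, adding s itself if none exists yet
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        SimpleInternPool pool = new SimpleInternPool();
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(pool.intern(a) == a); // true: a is added
        System.out.println(pool.intern(b) == a); // true: b finds a
        System.out.println(pool.size());         // 1
    }
}
```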

The String objects themselves are stored in a different area. In Java 6, String objects in the String constant pool are stored in the permanent generation (HotSpot VM's implementation of the method area before Java 8); from Java 7 onward, they are stored in the heap.

Java 7 moved these objects to the heap because in Java 6 they were created in the permanent generation, which is usually configured to be fairly small; heavy use of the string cache could therefore cause an OutOfMemoryError in the permanent generation.

When does a String enter the pool of String constants?

A String that the program explicitly caches by calling String.intern() obviously enters the constant pool when intern() is called.

Let’s focus on the two types of values that are implicitly cached (string literals and string constant expressions). There are two main problems:

  1. We never call a String constructor for them, so when are they created?
  2. When do they enter the String constant pool?

Let’s analyze these two problems with the following code example:

public class Main {
    public static void main(String[] args) {
        String str1 = "123" + 123;     // String constant expression
        String str2 = "123456";        // Literal
        String str3 = "123" + 456;     // String constant expression
    }
}

Bytecode analysis

After compiling the code above, we use javap to inspect the class file. To save space, only the relevant parts are shown: the constant pool table and the information for the main method:

Constant pool:
  #1 = Methodref          #5.#23         // java/lang/Object."<init>":()V
  #2 = String             #24            // 123123
  #3 = String             #25            // 123456
   // ... omitted ...
  #24 = Utf8               123123
  #25 = Utf8               123456

 // ... omitted ...

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=4, args_size=1
         0: ldc           #2                  // String 123123
         2: astore_1
         3: ldc           #3                  // String 123456
         5: astore_2
         6: ldc           #3                  // String 123456
         8: astore_3
         9: return
      LineNumberTable:
        line 7: 0
        line 8: 3
        line 9: 6
        line 10: 9
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      10     0  args   [Ljava/lang/String;
            3       7     1  str1   Ljava/lang/String;
            6       4     2  str2   Ljava/lang/String;
            9       1     3  str3   Ljava/lang/String;

In the constant pool, two kinds of constants relate to strings: CONSTANT_String and CONSTANT_Utf8. A CONSTANT_String constant represents a constant object of type String; its only content is an index into the constant pool, and the entry at that index must be of type CONSTANT_Utf8. A CONSTANT_Utf8 constant stores the actual string contents. For example, entries #2 and #3 in the constant pool are CONSTANT_String constants holding the indexes 24 and 25, and entries #24 and #25 are CONSTANT_Utf8 constants holding the values “123123” and “123456”.

In the method information of a class file, the Code attribute is one of the most important parts: it contains the VM instructions for the method body, the exception table, local variable information, and so on. LocalVariableTable describes the local variables, and Slot can be read as a variable's index in the local variable table. The ldc instruction pushes the item at the specified index of the run-time constant pool onto the operand stack; the astore_<n> instruction pops a reference value off the stack and stores it in the local variable at slot <n>. You can see that all three assignment statements compile to the same instructions:

ldc           #<index>   // First push the String in the constant pool to the stack
astore_<n>   // Then pop the String from the stack and save it to the specified location of the local variable

Operation process analysis

With the code above in mind, let's examine when string literals and string constant expressions are created and cached, following the process from compilation to execution.

First, javac compiles the source code into a class file. During compilation, both kinds of values mentioned above, string literals (such as “123456”) and string constant expressions (such as “123” + 456), are stored in the constant pool of the compiled class file as constants of type CONSTANT_String. Two points to note:

  • The actual value of a string constant expression is computed at compile time and stored in the class file's constant pool. For example, in the source code above, the expression "123" + 123 is represented in the class file's constant pool as 123123, and "123" + 456 as 123456;
  • A string literal or string constant expression with a given value appears only once in the class file's constant pool (there is exactly one CONSTANT_String entry and one CONSTANT_Utf8 entry for it). For example, although two constants in the source above are declared as "123456" and "123" + 456, the class file's constant pool ends up with a single CONSTANT_Utf8 entry with the value 123456 and a single corresponding CONSTANT_String entry.

When the JVM runs and loads the Main class, it creates a runtime constant pool from the class file's constant pool: the contents of the class file's constant pool enter the runtime constant pool in the method area when the class is loaded. Symbolic references in the constant pool are converted into actual values during the resolution phase of class loading. In HotSpot, however, resolution of symbolic references is not necessarily performed as soon as the class is loaded; it can be deferred until the relevant instruction is executed for the first time (see JVMS §5.4.3, Resolution). This is known as lazy or late resolution.

  • For constant entries of primitive types, such as CONSTANT_Integer_info, CONSTANT_Float_info, CONSTANT_Long_info, and CONSTANT_Double_info, the values are copied from the class file's constant pool into the runtime constant pool during class loading, as C++ int, float, long, and double values respectively;
  • CONSTANT_Utf8 entries are converted into Symbol objects (a C++ object inside HotSpot VM) when the class file is parsed during class loading. HotSpot also caches Symbol objects in a SymbolTable (structured like StringTable), so after a class is loaded, SymbolTable should contain a Symbol object for every CONSTANT_Utf8 constant;
  • A CONSTANT_String entry, in contrast, holds a symbolic reference (the index of a CONSTANT_Utf8 constant), so it must be resolved: it is converted into the oop of a java.lang.String object (roughly, the HotSpot VM-level representation of a Java object) and cached in StringTable. As mentioned above, however, CONSTANT_String constants are resolved lazily: not when the class is loaded, but when an instruction that uses them (usually ldc) is executed for the first time.

As noted above, the JVM actually performs resolution when the instruction is first executed. In the bytecode above, the ldc instructions use symbolic references, so resolution is required when an ldc instruction is executed. So what exactly does the ldc instruction do?

The ldc instruction looks up the entry at the specified index in the run-time constant pool and pushes it onto the stack. If the entry has not been resolved yet, it is resolved first, converting the symbolic reference into a concrete value, and then pushed. If the unresolved entry is a String constant, the JVM first looks in the String constant pool for a String object with the same content; if one exists, that object is pushed directly. If not, a new String is created, added to the String constant pool, and pushed onto the stack. Consequently, if the code declares several string literals or string constant expressions with the same content, a String is created only the first time a corresponding ldc instruction executes; on later executions the entry is already resolved and the cached String is pushed directly.
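The consequences of this resolution process are observable from Java code. The sketch below (LdcDemo is an illustrative name) repeats the three assignments from Main: str2 and str3 compile to the same ldc operand and therefore yield one String object, and because ldc and intern() share the String constant pool, interning an equal heap string returns that same object:

```java
final class LdcDemo {
    // The three assignments from the Main example above
    static String[] values() {
        String str1 = "123" + 123; // constant expression, folded to "123123"
        String str2 = "123456";    // literal
        String str3 = "123" + 456; // constant expression, folded to "123456"
        return new String[] {str1, str2, str3};
    }

    public static void main(String[] args) {
        String[] v = values();
        System.out.println(v[1] == v[2]);     // true: same CONSTANT_String entry
        System.out.println(v[0] == "123123"); // true: resolved to the pooled String
        System.out.println(v[1] == new String("123456").intern()); // true: intern() finds the same object
    }
}
```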

To summarize:

  1. At compile time, string literals and string constant expressions in the source code are converted into CONSTANT_String constants in the class file's constant pool.
  2. During class loading, the CONSTANT_String entries in the class file's constant pool are stored in the runtime constant pool, but their contents are still unresolved symbolic references.
  3. During execution, when an ldc instruction runs for the first time and the CONSTANT_String entry in the runtime constant pool has not yet been resolved, it is actually resolved; during resolution a String object is created (unless an equal one is already pooled) and added to the String constant pool.

Cache critical source analysis

As you can see, when resolving a String constant, the ldc instruction follows logic very similar to that of the String.intern() method:

  1. ldc resolving a String constant: look in the String constant pool for a String object with the same content; if one exists, push it onto the stack; otherwise create a new String object, add it to the String constant pool, and push it onto the stack.
  2. String.intern(): look in the String constant pool for a String object with the same content; if one exists, return a reference to it; otherwise add this string itself to the String constant pool and return it.

In fact, inside HotSpot the ldc instruction calls the same internal method as the native method behind String.intern(). Taking the JDK 8 source as an example, here is a brief walkthrough (source location: src/share/vm/classfile/symbolTable.cpp):


// Called by the native method behind String.intern()
// The "oop string" argument is the String on which intern() was called
oop StringTable::intern(oop string, TRAPS)
{
  if (string == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length;
  Handle h_string (THREAD, string);
  jchar* chars = java_lang_String::as_unicode_string(string, length, CHECK_NULL);    // Convert a String to a sequence of characters
  oop result = intern(h_string, chars, length, CHECK_NULL);
  return result;
}

// Called when an ldc instruction resolves a String constant
// The "Symbol* symbol" argument is the Symbol in the run-time constant pool at the ldc instruction's operand (index)
oop StringTable::intern(Symbol* symbol, TRAPS) {
  if (symbol == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length;
  jchar* chars = symbol->as_unicode(length);   // Convert the Symbol object to a sequence of characters
  Handle string;
  oop result = intern(string, chars, length, CHECK_NULL);
  return result;
}

// Both methods call this method
oop StringTable::intern(Handle string_or_null, jchar* name, int len, TRAPS) {
  // Try to find it from the string constant pool
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, name, len, hashValue);

  // Return if found
  if (found_string != NULL) {
    ensure_string_alive(found_string);
    return found_string;
  }

   // ... part of the code omitted ...
   
  Handle string;
  // Try to reuse the original string. If it cannot be reused, a new string is created
  // The implementation here is a bit different in JDK 6. Only when string_or_null already exists in the permanent generation will it be reused
  if (!string_or_null.is_null()) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
  }

  // ... part of the code omitted ...

  oop added_or_found;
  {
    MutexLocker ml(StringTable_lock, THREAD);
    // Add string to StringTable
    added_or_found = the_table()->basic_add(index, string, name, len,
                                  hashValue, CHECK_NULL);
  }
  ensure_string_alive(added_or_found);
  return added_or_found;
}

Case analysis

Note: because the string constant pool was moved from the permanent generation to the heap in Java 7, some code may behave differently on Java 6 than on later versions. The following code was therefore tested on both Java 6 and Java 7. Unless noted otherwise, the results are the same on both versions; where they differ, this is stated explicitly.


final int a = 4;
int b = 4;
String s1 = "123" + a + "567";
String s2 = "123" + b + "567";
String s3 = "1234567";
System.out.println(s1 == s2);
System.out.println(s1 == s3);
System.out.println(s2 == s3);

Results:

false
true
false

Explanation:

  1. In the third line, a is declared final and initialized with a constant, so "123" + a + "567" is a constant expression and is compiled to the literal "1234567". "1234567" is therefore created in the string constant pool, and s1 points to the "1234567" in the string constant pool;
  2. In the fourth line, b is an ordinary variable. "123" and "567" are string literals, so "123" and "567" are first created in the string constant pool; the implicit concatenation through StringBuilder then creates "1234567" on the heap, and s2 points to that heap "1234567";
  3. In the fifth line, "1234567" is a string literal. Because the string constant pool already contains "1234567" at this point, s3 points to the "1234567" in the string constant pool.
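The constant-expression rule behind the third line can be checked directly. In this sketch (the class name ConstantFoldingDemo is hypothetical), c is also final, but its initializer is a method call, so c is not a constant variable and the compiler cannot fold the concatenation.

```java
// Hypothetical demo class; the name is not from the original article.
class ConstantFoldingDemo {
    static boolean[] run() {
        final int a = 4;                     // constant variable: final + constant initializer
        final int c = Integer.parseInt("4"); // final, but NOT a constant variable
        String folded = "123" + a + "567";   // folded by javac into the literal "1234567"
        String runtime = "123" + c + "567";  // concatenated at run time into a new object
        String literal = "1234567";
        return new boolean[] { folded == literal, runtime == literal, runtime.equals(literal) };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // true
    }
}
```

Being final is not enough on its own; the initializer must itself be a constant expression for the variable to take part in compile-time folding.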

String s1 = new String("123");
String s2 = s1.intern();
String s3 = "123";
System.out.println(s1 == s2); 
System.out.println(s1 == s3); 
System.out.println(s2 == s3);

Results:

false
false
true

Explanation:

  1. In the first line, "123" is a string literal, so a "123" object is first created in the string constant pool; the String constructor then creates another "123" object on the heap, and s1 points to the heap "123";
  2. In the second line, the string constant pool already contains "123", so s1.intern() returns it and s2 points to the "123" in the string constant pool;
  3. In the third line, again because the string constant pool already contains "123", s3 points to the "123" in the string constant pool.
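Because == compares references while equals() compares contents, this case is easy to misread. The following sketch (the class name NewStringDemo is hypothetical) puts the relevant comparisons side by side.

```java
// Hypothetical demo class; the name is not from the original article.
class NewStringDemo {
    static boolean[] run() {
        String s1 = new String("123"); // always a new heap object, distinct from the literal
        String s3 = "123";             // the pooled literal
        return new boolean[] {
            s1 == s3,         // false: different objects
            s1.equals(s3),    // true: same characters
            s1.intern() == s3 // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
        System.out.println(r[2]); // true
    }
}
```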

String s1 = String.valueOf("123");
String s2 = s1.intern();
String s3 = "123";
System.out.println(s1 == s2); 
System.out.println(s1 == s3); 
System.out.println(s2 == s3); 

Results:

true
true
true

Explanation: The difference from the previous case is that String.valueOf(Object) returns obj.toString(), and String.toString() returns the string itself, so no new object is created on the heap. s1 therefore also points to the "123" in the string constant pool, and all three variables refer to the same object.
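Whether String.valueOf() creates a new object depends on which overload is chosen. A minimal sketch (the class name ValueOfDemo is hypothetical): with a String argument the call resolves to valueOf(Object) and returns the argument itself, while valueOf(char[]) copies the array into a new String.

```java
// Hypothetical demo class; the name is not from the original article.
class ValueOfDemo {
    static boolean[] run() {
        String literal = "123";
        // A String argument resolves to valueOf(Object), which returns obj.toString() == obj
        String fromObject = String.valueOf(literal);
        // valueOf(char[]) copies the characters into a new String object
        String fromChars = String.valueOf(new char[] { '1', '2', '3' });
        return new boolean[] { fromObject == literal, fromChars == literal, fromChars.intern() == literal };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // true
    }
}
```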


String s1 = new String("123") + new String("456"); 
String s2 = s1.intern();
String s3 = "123456";
System.out.println(s1 == s2); 
System.out.println(s1 == s3); 
System.out.println(s2 == s3);

The above code has different results in Java 6 and Java 7. In Java 6:

false
false
true

Explanation:

  1. In the first line, "123" and "456" are string literals, so "123" and "456" are first created in the string constant pool. The + operator then concatenates them implicitly through StringBuilder, creating "123456" on the heap, and s1 points to the heap "123456";
  2. The second line caches "123456" in the string constant pool. Because in Java 6 the objects in the string constant pool live in the permanent generation, intern() creates a new "123456" there, so there are now two "123456" objects: one on the heap and one in the permanent generation. s2 points to the one in the string constant pool (permanent generation);
  3. In the third line, "123456" is a string literal. Because the string constant pool (permanent generation) already contains "123456" at this point, s3 points to that same "123456" in the permanent generation.

In Java 7:

true
true
true

Explanation: The difference from Java 6 is that in Java 7 the objects in the string constant pool are allocated on the heap. So on the second line, String s2 = s1.intern(); does not create a new String; it simply adds a reference to the object s1 points to into the StringTable. All three variables therefore point to the "123456" in the constant pool, which is the very object created on the heap in the first line.

The fact that s1 == s2 is true in Java 7 also shows that the literal "123456" is resolved lazily. If it were resolved and added to the constant pool when the class is loaded, s1.intern() would return that existing "123456" instead of adding the heap object that s1 points to, and s2 would then be equal to s3 but not to s1.
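The version difference can be observed by running the case as a program. The sketch below (the class name InternOrderDemo is hypothetical) only reports the three comparisons. The exact values depend on the JVM version, and also on whether other code in the same JVM has already put "123456" into the pool, so the comments repeat the results described above rather than promise them. What holds on both versions is that s2 == s3 is true and that s1 == s2 and s1 == s3 agree.

```java
// Hypothetical demo class; the name is not from the original article.
class InternOrderDemo {
    static boolean[] run() {
        String s1 = new String("123") + new String("456"); // "123456" built on the heap
        String s2 = s1.intern();                            // pool entry for "123456"
        String s3 = "123456";                               // literal, resolved lazily on first use
        return new boolean[] { s1 == s2, s1 == s3, s2 == s3 };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        // Reported above: Java 6 gives false/false/true, Java 7 gives true/true/true,
        // assuming "123456" was not already in the pool before run() was called.
        System.out.println(r[0]);
        System.out.println(r[1]);
        System.out.println(r[2]);
    }
}
```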


String s1 = new String("123") + new String("456");
String s2 = "123456";
String s3 = s1.intern();
System.out.println(s1 == s2); 
System.out.println(s1 == s3); 
System.out.println(s2 == s3);

Results:

false
false
true

Explanation:

  1. In the first line, "123" and "456" are string literals, so "123" and "456" are first created in the string constant pool. The + operator then concatenates them implicitly through StringBuilder, creating "123456" on the heap, and s1 points to the heap "123456";
  2. In the second line, "123456" is a string literal. The string constant pool does not contain "123456" yet, so it is created in the string constant pool, and s2 points to it;
  3. In the third line, the string constant pool already contains "123456", so s1.intern() returns that entry and s3 points to the "123456" in the string constant pool.
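A common slip related to this case is calling intern() without using its return value. Since String objects are immutable, intern() cannot turn s1 into the pooled object; it only returns a reference to the pooled one. A minimal sketch (the class name InternReturnValueDemo is hypothetical):

```java
// Hypothetical demo class; the name is not from the original article.
class InternReturnValueDemo {
    static boolean[] run() {
        String s1 = new String("123") + new String("456"); // new object on the heap
        String s2 = "123456";                               // pooled literal, resolved first
        s1.intern();                                        // return value ignored: s1 is unchanged
        boolean beforeReassign = (s1 == s2);
        s1 = s1.intern();                                   // use the returned pooled reference
        boolean afterReassign = (s1 == s2);
        return new boolean[] { beforeReassign, afterReassign };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
    }
}
```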
