As Java programmers, we are all familiar with strings. The String class is used all the time: it is one of the core classes of the Java language, lives in the java.lang package, and is mainly used for string comparison, lookup, concatenation, and so on. So how is it implemented under the hood, and how has String been optimized as the JDK has evolved? Follow this article into its “heart” and lift its mysterious veil.

Basic features of String

To learn more about a class, start with its source code:

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {

    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    // ...
} // taken from JDK 1.8

At least the following points can be seen from the above source code:

  • The String class is declared final, which means it cannot be subclassed.
  • String implements the Comparable interface, which means strings can be compared and ordered.
  • String stores its characters in a char[] array. The value field is private and final, so the reference cannot be reassigned once the string is created, and since String never modifies or exposes that array, its contents cannot change either.
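Because String implements Comparable<String>, strings have a natural ordering: compareTo() compares them lexicographically, character by character (by UTF-16 code unit). The sketch below (the class name CompareDemo is illustrative) shows the sign convention of compareTo() and that a String[] can be sorted without supplying a Comparator.

```java
import java.util.Arrays;

class CompareDemo {
    // Sorts a copy of the input using String's natural (Comparable) ordering
    static String[] sortedCopy(String[] words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy); // no Comparator needed: String implements Comparable<String>
        return copy;
    }

    public static void main(String[] args) {
        // compareTo returns a negative number, zero, or a positive number
        System.out.println("apple".compareTo("banana") < 0); // true
        System.out.println("apple".compareTo("apple") == 0); // true
        // 'Z' (90) sorts before 'a' (97): the comparison is by UTF-16 code unit
        System.out.println("Zebra".compareTo("apple") < 0);  // true
        System.out.println(Arrays.toString(sortedCopy(new String[] {"pear", "apple", "fig"})));
        // [apple, fig, pear]
    }
}
```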

The most important thing about String is its immutability:

  • When you reassign a string variable, a new memory area is used; the original value is not changed.
  • When you concatenate onto an existing string, new memory is likewise allocated; the original value is not changed.
  • When you use the replace() method of the String class to modify a string, new memory is again allocated; the original value is not changed.

In short: once a string is in memory, it cannot be changed!
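The three cases above can be observed directly: concat() and replace() return new String objects, and reassigning a variable only makes it point somewhere else. A minimal sketch (ImmutabilityDemo is an illustrative class name):

```java
class ImmutabilityDemo {
    public static void main(String[] args) {
        String original = "Hello";

        // concat() and replace() return new String objects; the original is untouched
        String concatenated = original.concat(" world");
        String replaced = original.replace('l', 'L');

        System.out.println(original);     // Hello
        System.out.println(concatenated); // Hello world
        System.out.println(replaced);     // HeLLo

        // Reassigning the variable points it at a different object;
        // the "Hello" object itself is never modified
        String alias = original;
        original = original + "!";
        System.out.println(alias);             // Hello
        System.out.println(original);          // Hello!
        System.out.println(alias == original); // false
    }
}
```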

Changes to the storage structure of String

From the very beginning, String has internally used a char array to represent a string, which makes sense, because a string is, after all, a sequence of characters. In JDK 9, however, this changed:

public final class String implements java.io.Serializable, Comparable<String>, CharSequence, Constable, ConstantDesc {
    @Stable
    private final byte[] value;

    private final byte coder;

    private int hash; // Default to 0

    // ...
} // taken from JDK 19

Starting with JDK 9, String stores its value in a byte[] array, and a new member variable, coder, has been added. StringBuilder, StringBuffer, and related classes were adjusted accordingly. Here is the official explanation from JEP 254 (Compact Strings):

Motivation

The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.


In practice, most strings contain only alphanumeric characters, which are covered by Latin-1, where each character takes one byte. Storing such strings in a byte[] therefore roughly halves the memory used by their character data.

If a string contains characters outside the Latin-1 range, it cannot be represented in Latin-1. In that case the JDK uses UTF-16 encoding, which takes up the same space as the old char[] representation. The new member variable coder records whether the current string is encoded as Latin-1 or UTF-16.
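The choice between the two encodings can be sketched as follows. This is a simplified illustration, not the JDK's internal code: CompactStringSketch, chooseCoder and backingArrayBytes are hypothetical names, and the two constants only mirror the role of the coder field. A string whose chars all fit in one byte (0x00 to 0xFF) can use Latin-1; anything else falls back to two bytes per char.

```java
class CompactStringSketch {
    // Hypothetical constants mirroring the role of String's coder field
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Picks LATIN1 only if every char fits into a single byte (0x00..0xFF)
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Bytes needed for the backing array under the chosen coder
    static int backingArrayBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is within Latin-1; "\u4f60\u597d" (two CJK characters) is not
        for (String s : new String[] {"Hello", "caf\u00e9", "\u4f60\u597d"}) {
            System.out.printf("%s -> coder=%d, %d bytes (a char[] would use %d)%n",
                    s, chooseCoder(s), backingArrayBytes(s), s.length() * 2);
        }
    }
}
```

For pure Latin-1 text the sketch needs half the bytes of a char[]; for the CJK example both layouts use the same space, matching the description above.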

String memory allocation

Before JDK 1.7, the string constant pool was stored in the permanent generation; in JDK 1.7 it was moved to the heap. JDK 1.8 removed the permanent generation altogether and implemented the method area with Metaspace, but it did not change the location of the string constant pool, which stays in the heap. Keeping the pool in the heap means GC can clean up strings that are no longer in use. The process involved is quite complicated and will not be described here.

Let’s take a closer look at the string constant pool.

String constant pool

In fact, the idea of pooling is very common in Java. A constant pool acts like a cache provided by the Java system: when identical content already exists in memory, no new copy is created and the existing one is reused, which saves memory. The Java Virtual Machine maintains both runtime constant pools and the string constant pool.
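The pooling idea (look up an equal string and reuse it instead of storing a new copy) can be illustrated with a toy hash table made of an array of buckets holding linked lists. ToyStringTable is a hypothetical class written only for illustration; HotSpot's real string table is implemented natively inside the JVM and is considerably more elaborate.

```java
import java.util.LinkedList;

class ToyStringTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ToyStringTable(int size) {
        buckets = new LinkedList[size]; // fixed number of buckets
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Returns the pooled instance equal to s, adding s if no equal string exists yet
    String intern(String s) {
        // floorMod keeps the index non-negative even for negative hash codes
        LinkedList<String> bucket = buckets[Math.floorMod(s.hashCode(), buckets.length)];
        for (String candidate : bucket) {
            if (candidate.equals(s)) {
                return candidate; // reuse the existing entry
            }
        }
        bucket.add(s);
        return s;
    }

    // Total number of distinct strings stored
    int size() {
        int n = 0;
        for (LinkedList<String> bucket : buckets) {
            n += bucket.size();
        }
        return n;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(16);
        String a = table.intern(new String("Hello"));
        String b = table.intern(new String("Hello"));
        System.out.println(a == b);       // true: the second call returned the first instance
        System.out.println(table.size()); // 1
    }
}
```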

In HotSpot (at least through JDK 8), the string constant pool is a fixed-size hash table (StringTable), which stores its entries as an array of buckets, each holding a linked list. Before a string is stored, its hash value is computed to locate its bucket, which also ensures that the pool never holds two strings with the same contents. You can set the table size with -XX:StringTableSize; since JDK 7u40 (and in JDK 8) the default is 60013. When are strings stored in the constant pool?

  • Strings declared directly in double quotes are stored directly in the constant pool
String s = "Hello";
  • When a string is created with new, its literal is also maintained in the constant pool
String s = new String("Hello");

This means that new String("Hello") creates two objects if the literal has not appeared before: one on the heap and one in the string constant pool.

  • Calling the intern() method of String
/**
 * When the intern method is invoked, if the pool already contains a
 * string equal to this {@code String} object as determined by
 * the {@link #equals(Object)} method, then the string from the pool is
 * returned. Otherwise, this {@code String} object is added to the
 * pool and a reference to this {@code String} object is returned.
 */
public native String intern(); // from JDK 9

In plain terms: when you call intern(), if the pool already contains a string equal to this one (as determined by equals()), a reference to the pooled string is returned. Otherwise, this String object is added to the pool and a reference to it is returned. All string literals and string-valued constant expressions are interned.
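The three cases above can be checked with reference comparisons (==), which tell whether two variables point to the same object. A minimal sketch, with InternDemo as an illustrative class name:

```java
class InternDemo {
    public static void main(String[] args) {
        String literal = "Hello";          // taken from the string constant pool
        String heap = new String("Hello"); // a separate object on the heap
        String constant = "Hel" + "lo";    // constant expression, folded at compile time

        System.out.println(literal == heap);          // false: different objects
        System.out.println(literal.equals(heap));     // true: same contents
        System.out.println(literal == constant);      // true: both refer to the pooled instance
        System.out.println(literal == heap.intern()); // true: intern() returns the pooled instance
    }
}
```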

Concatenation of strings

We all like to use “+” to concatenate strings, but “+” can create many intermediate strings, which takes up memory. That is why StringBuilder or StringBuffer is often recommended for concatenation. However, when a simple “+” solves the problem, many people are unwilling to create yet another object by hand. In fact, the Java developers have kept optimizing “+” to reduce its underlying memory cost. So what actually happens behind a simple “+”?

public class Demo {
    public static void main(String[] args) {
        String s1 = "Hello";
        String s2 = "world";
        String s3 = s1 + s2;
        System.out.println(s3);
    }
}

Compiling this simple piece of code with JDK 8 and disassembling the class file (for example with javap -c) reveals what happens behind the scenes.

We find that the compiler creates a StringBuilder object, calls its append() method for each operand, and then calls StringBuilder's toString() method to produce the result. StringBuilder's toString() creates a new String object, and that object is not added to the constant pool. A simple + takes all these steps, which is why + is often discouraged.
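In source form, the StringBuilder-based bytecode described above corresponds roughly to the hand-written method below. It is an approximation for illustration (concatLikeJdk8 is a hypothetical name); the exact instructions javac emits may differ in detail.

```java
class ConcatDemo {
    // Roughly what javac (JDK 8) generates for: String s3 = s1 + s2;
    static String concatLikeJdk8(String s1, String s2) {
        return new StringBuilder().append(s1).append(s2).toString();
    }

    public static void main(String[] args) {
        String s3 = concatLikeJdk8("Hello", "world");
        System.out.println(s3);                          // Helloworld
        // toString() creates a new String object that is not in the constant pool
        System.out.println(s3 == "Helloworld");          // false
        System.out.println(s3.intern() == "Helloworld"); // true
    }
}
```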

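This has been optimized since JDK 9: instead of using StringBuilder, the compiler emits an invokedynamic instruction whose bootstrap method is StringConcatFactory.makeConcatWithConstants(), so the concatenation strategy is chosen dynamically at run time. Disassembling the same Demo class compiled with JDK 9 or later gives: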

  Code:
    stack=2, locals=4, args_size=1
       0: ldc           #2                  // String Hello
       2: astore_1
       3: ldc           #3                  // String world
       5: astore_2
       6: aload_1
       7: aload_2
       8: invokedynamic #4,  0              // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      13: astore_3
      14: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
      17: aload_3
      18: invokevirtual #6                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      21: return
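To make the invokedynamic step less mysterious, the same bootstrap method can be called by hand through the public java.lang.invoke API (JDK 9+). This is only a sketch for illustration (IndyConcatDemo, twoArgConcat and concat are hypothetical names); in real code javac wires the call site up for you. The recipe string contains one U+0001 character per argument slot, which is how a plain two-argument concatenation is described to makeConcatWithConstants().

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class IndyConcatDemo {
    // Builds a (String, String) -> String concatenation handle, similar to how
    // the invokedynamic call site in the bytecode above is bootstrapped
    static MethodHandle twoArgConcat() throws Exception {
        MethodType type = MethodType.methodType(String.class, String.class, String.class);
        // Each U+0001 character in the recipe marks where one argument is inserted
        CallSite site = StringConcatFactory.makeConcatWithConstants(
                MethodHandles.lookup(), "makeConcatWithConstants", type, "\u0001\u0001");
        return site.getTarget();
    }

    static String concat(String a, String b) {
        try {
            // invokeExact needs the exact static types (String, String)String
            return (String) twoArgConcat().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello", "world")); // Helloworld
    }
}
```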

This makes + more efficient to some extent. Even so, when a large number of concatenations is involved (for example, building a string inside a loop), the append() method of StringBuilder/StringBuffer is far more efficient than +. Of course, Java keeps being updated, optimized, and enhanced, and perhaps one day + will be comparable to append(). That’s all for today’s sharing. Thanks for reading!
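The difference is most visible when concatenating in a loop: with +, each iteration creates a new String and copies everything built so far, while a single StringBuilder grows in place and produces one String at the end. A rough sketch (LoopConcatDemo and its method names are illustrative); the timing printout is only indicative and is no substitute for a proper benchmark harness such as JMH.

```java
class LoopConcatDemo {
    // Each += creates a new String and copies all characters built so far
    static String withPlus(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    // One StringBuilder grows in place; a single String is created at the end
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10_000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println(a.equals(b)); // true
        // Timings vary by machine and JVM; this is not a rigorous benchmark
        System.out.printf("+: %d ms, StringBuilder: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```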