The implementation of a String object

String is one of the most frequently used objects in Java, so its implementation has been optimized repeatedly to improve performance. The optimization has gone through the following stages.

1. Java 6 and earlier

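In Java 6 and earlier, a String is an object that wraps a char array and has four member variables: the char array (value), offset, count, and hash.

A String uses offset and count to locate its characters within the char[] array. This lets several String objects share one array quickly and efficiently, saving memory, but it can also cause memory leaks: substring() returns a new String that shares the original's char[], so even a very short substring keeps the whole original array reachable and prevents it from being garbage-collected.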
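To make the sharing concrete, here is a minimal sketch of the Java 6 layout. Java6StyleString is a hypothetical, simplified class written for illustration, not the actual JDK 6 source; it only models value, offset, and count and a substring() that creates a new view over the same array instead of copying.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the Java 6 String layout (not the JDK source).
final class Java6StyleString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of this string's first character in value
    final int count;    // number of characters in this string

    Java6StyleString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static Java6StyleString of(String s) {
        return new Java6StyleString(s.toCharArray(), 0, s.length());
    }

    // Java 6 style substring(): no copy, only a new offset/count over the same array.
    Java6StyleString substring(int begin, int end) {
        return new Java6StyleString(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] large = new char[1_000_000];
        Arrays.fill(large, 'a');
        Java6StyleString small = new Java6StyleString(large, 0, large.length).substring(0, 3);
        // small holds 3 characters but still references the 1,000,000-char array,
        // so that array cannot be collected while small is reachable.
        System.out.println(small + " backed by " + small.value.length + " chars"); // aaa backed by 1000000 chars
    }
}
```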

2. Java 7 and Java 8

Starting with Java 7, the String class was changed: it no longer has the offset and count fields. As a result, strings take up slightly less memory, and String.substring() no longer shares the char[] but copies only the characters it needs, which eliminates the memory leak this method could cause.
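For comparison, a minimal sketch of the Java 7/8 behavior. Java7StyleString is again a hypothetical illustration, not the JDK source: without offset and count, every instance owns its array, so substring() has to copy the requested range (here with Arrays.copyOfRange).

```java
import java.util.Arrays;

// Hypothetical model of the Java 7/8 layout: no offset/count, each instance owns its array.
final class Java7StyleString {
    final char[] value;

    Java7StyleString(char[] value) {
        this.value = value;
    }

    // Java 7+ style substring(): copies just the requested range into a new array.
    Java7StyleString substring(int begin, int end) {
        return new Java7StyleString(Arrays.copyOfRange(value, begin, end));
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        char[] large = new char[1_000_000];
        Arrays.fill(large, 'a');
        Java7StyleString small = new Java7StyleString(large).substring(0, 3);
        // Only a 3-char array is retained; the large array can be garbage-collected.
        System.out.println(small + " backed by " + small.value.length + " chars"); // aaa backed by 3 chars
    }
}
```

The trade-off is a copy on every substring() call in exchange for never pinning a large array.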

3. Java 9 and later

Why switch from a char[] array to a byte[] array? A char occupies two bytes, so storing a character that fits in a single byte as a char wastes one byte. Java 9 (in the Compact Strings feature, JEP 254) therefore stores such strings with one byte per character, avoiding that waste.

In Java 9, String also maintains a new field, coder, which identifies the encoding and determines how the length is calculated when calling length() or indexOf(). coder takes the value 0 or 1: 0 stands for Latin-1 (a single-byte encoding) and 1 for UTF-16. If the String contains only Latin-1 characters, coder is 0; otherwise it is 1.
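The coder idea can be sketched as follows. CompactStringSketch is a hypothetical class, not the JDK's internal code: it picks Latin-1 when every char is at most 0xFF and UTF-16 otherwise (big-endian here for simplicity, an assumption), and derives the character count by shifting the byte count right by coder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the Java 9+ compact-strings idea; not the JDK's internal code.
final class CompactStringSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value;
    final byte coder;

    CompactStringSketch(String s) {
        if (isLatin1(s)) {
            value = s.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char
            coder = LATIN1;
        } else {
            value = s.getBytes(StandardCharsets.UTF_16BE);   // 2 bytes per char (BMP chars)
            coder = UTF16;
        }
    }

    private static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Length in chars: bytes >> 0 for Latin-1, bytes >> 1 (i.e. / 2) for UTF-16.
    int length() {
        return value.length >> coder;
    }

    public static void main(String[] args) {
        CompactStringSketch latin = new CompactStringSketch("pingtouge");
        CompactStringSketch chinese = new CompactStringSketch("\u5e73\u5934\u54e5"); // three CJK chars
        System.out.println(latin.coder + " " + latin.value.length + " " + latin.length());       // 0 9 9
        System.out.println(chinese.coder + " " + chinese.value.length + " " + chinese.length()); // 1 6 3
    }
}
```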

How a String object is created

1. Through a string literal

String str = "pingtouge";. When a String is created this way, the JVM first checks whether an equal string already exists in the string constant pool. If it does, the JVM returns a reference to it; if not, it creates the String object in the pool and returns that reference. This saves memory by avoiding repeated creation of strings with the same value.

2. Through the String() constructor

String str = new String("pingtouge");. Creating a String this way is more involved and takes two stages. First, at compile time, the literal "pingtouge" is added to the constant pool of the class file, and when the class is loaded the corresponding String is created in the string constant pool. Then, when new is executed, the JVM calls the String constructor with the pooled "pingtouge", creates a new String object in heap memory, and returns a reference to that heap object.

Now that we know the two ways to create strings, let's look at the following code to understand them better. Does str equal str1 here?

  String str = "pingtouge";
  String str1 = new String("pingtouge");
  System.out.println(str == str1);

When String str = "pingtouge"; runs, the JVM searches the constant pool for "pingtouge". It is not there yet, so the JVM creates the string object in the constant pool and returns a reference to it. So str points to the "pingtouge" object in the constant pool.

String str1 = new String("pingtouge"); uses the constructor instead. Based on the creation process described above, str1 receives a reference to a new "pingtouge" object in the heap. Since str references the object in the constant pool and str1 references a different object in the heap, str == str1 is false. Note that == compares references; str.equals(str1) compares contents and returns true.
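A small runnable version of the comparison above (StringCreationDemo is just an illustrative class name). It prints the reference comparison, the content comparison, and, as a preview of intern() discussed later, shows that interning the heap copy yields the pooled instance again.

```java
// Compares a string literal with a String created by the constructor.
final class StringCreationDemo {
    public static void main(String[] args) {
        String str = "pingtouge";              // literal: reference to the pooled instance
        String str1 = new String("pingtouge"); // constructor: a new object on the heap

        System.out.println(str == str1);          // false: two different objects
        System.out.println(str.equals(str1));     // true: same characters
        System.out.println(str == str1.intern()); // true: intern() returns the pooled instance
    }
}
```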

Immutability of String objects

Most of us learned early on that strings are immutable. But how is immutability achieved, and what does Java gain from it? Let's take a brief look, starting with the String source code (JDK 8):

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    // ... constructors and methods omitted
}

As the source shows, the String class is declared final. A final class cannot be subclassed, so String cannot be inherited and no subclass can override its methods to make it behave mutably. This is the first aspect of String's immutability.

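Further down, the char value[] array that stores the characters is both private and final. Note that value is a reference to an array, not a primitive: final only means the reference cannot be reassigned after initialization, while the array's contents could in principle still be changed. What prevents that is that the array is private and String exposes no method that modifies it (methods such as replace() or substring() return a String instead of changing the existing one). This is the second aspect of String's immutability.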
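Because final only pins the reference, a final array's contents can still change. The sketch below uses a hypothetical ImmutableText class (not the String source) to show how keeping the array private and copying it on the way in and on the way out keeps the contents fixed. The JDK 8 String does something similar: String(char[]) and toCharArray() both copy.

```java
import java.util.Arrays;

// Hypothetical class showing why `final` alone is not enough for immutability.
final class ImmutableText {
    private final char[] value; // final: the reference can never be reassigned...

    ImmutableText(char[] chars) {
        // ...but the caller could still change its own array, so keep a private copy
        this.value = Arrays.copyOf(chars, chars.length);
    }

    char[] toCharArray() {
        return Arrays.copyOf(value, value.length); // never hand out the internal array
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        final char[] arr = {'p', 'i', 'n', 'g'};
        arr[0] = 'k'; // allowed: final fixes the reference, not the contents

        ImmutableText text = new ImmutableText(arr);
        arr[1] = 'o';                // does not affect text (the constructor copied)
        text.toCharArray()[2] = 'x'; // does not affect text (the getter copied)
        System.out.println(text);    // king
    }
}
```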

There are three main reasons Java makes strings immutable:

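  • 1. Security of String objects. If strings were mutable, a string could be maliciously modified after it had been checked, for example a file path or network address passed to security-sensitive code.
  • 2. A stable hash value. Because a string's content never changes, its hash code never changes either, so it can be computed once and cached in the hash field, and strings work reliably as keys in containers such as HashMap.
  • 3. It makes the string constant pool possible. Many references can safely share one pooled string only because none of them can change it.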
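Point 2 can be illustrated by what goes wrong with a mutable key. In this sketch (MutableKeyDemo is an illustrative name) an ArrayList is used as a HashMap key and then modified: its hashCode() changes, so the map can no longer find the entry. A String key cannot be changed in place, so "modifying" it produces a new object and the original key keeps working.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shows why stable hash codes matter for HashMap keys.
final class MutableKeyDemo {
    static boolean mutableKeyStillFound() {
        List<String> key = new ArrayList<>();
        key.add("ping");
        Map<List<String>, Integer> map = new HashMap<>();
        map.put(key, 1);
        key.add("tou");              // mutating the key changes its hashCode()
        return map.containsKey(key); // lookup uses the new hash, so the entry is lost
    }

    static boolean stringKeyStillFound() {
        String key = "ping";
        Map<String, Integer> map = new HashMap<>();
        map.put(key, 1);
        String longer = key + "tou"; // "changing" a String creates a new object
        return map.containsKey(key) && !map.containsKey(longer);
    }

    public static void main(String[] args) {
        System.out.println(mutableKeyStillFound()); // false
        System.out.println(stringKeyStillFound());  // true
    }
}
```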

Optimization of String objects

Strings are among the most commonly used Java types, so string manipulation is unavoidable, and using strings incorrectly can hurt performance significantly. So what should we pay attention to when manipulating strings?

Elegant string concatenation

String concatenation is one of the most frequent string operations. Because String objects are immutable, many of us avoid + for concatenation, or instinctively believe it should never be used, on the grounds that every + produces a throwaway object. Is that really the case? Let's experiment by concatenating the following strings with +.

String str8 = "ping" +"tou"+"ge";

How many objects does this code create? Following the intuition above, "ping" would be created first, then "pingtou", and finally "pingtouge": three objects. Is that really so? No: because all the operands are compile-time constants, the compiler folds the concatenation into String str8 = "pingtouge";. Besides optimizing constant concatenation, the compiler also optimizes dynamic concatenation with + to improve performance, as in the following code:

String str = "pingtouge";

for(int i=0; i<1000; i++) {
      str = str + i;
}


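Up to Java 8, the compiler turns the loop into the equivalent of the following code (since Java 9, javac instead emits an invokedynamic call to StringConcatFactory, but each iteration still produces a new String):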


String str = "pingtouge";

for(int i=0; i<1000; i++) {
    str = (new StringBuilder(String.valueOf(str))).append(i).toString();
}

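Both compiler behaviors can be observed by comparing references. In this sketch (ConcatDemo is an illustrative name), the constant expression "ping" + "tou" + "ge" is folded at compile time and, per the Java Language Specification, interned, so it is the very same object as the literal "pingtouge". Concatenation involving a non-constant variable is evaluated at run time and yields a newly created String with equal contents.

```java
// Observes constant folding versus run-time concatenation by comparing references.
final class ConcatDemo {
    static boolean constantsFolded() {
        String folded = "ping" + "tou" + "ge"; // compile-time constant expression
        return folded == "pingtouge";          // same pooled object as the literal
    }

    static boolean runtimeConcatIsNewObject() {
        String prefix = "ping";          // not final, so not a constant variable
        String built = prefix + "touge"; // evaluated at run time
        return built != "pingtouge" && built.equals("pingtouge");
    }

    public static void main(String[] args) {
        System.out.println(constantsFolded());          // true
        System.out.println(runtimeConcatIsNewObject()); // true
    }
}
```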

As you can see, the Java developers have done a lot of optimization here to keep careless code from causing a sharp drop in String performance. Still, this compiler optimization has its limits: although StringBuilder is used for the concatenation, a new StringBuilder instance is created on every iteration of the loop, which also degrades system performance.

Therefore, string concatenation also needs to be optimized at the code level. For dynamic concatenation, use StringBuilder when thread safety is not a concern, to improve performance; when thread safety is required, use StringBuffer, whose methods are synchronized.
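Applied to the loop above, the advice looks like this sketch (BuilderDemo and its method names are illustrative). withPlus() is the original loop; withBuilder() reuses one StringBuilder and calls toString() once at the end; withBuffer() is the same with StringBuffer for code shared across threads. All three produce the same string.

```java
// The loop from above, rewritten to reuse a single builder.
final class BuilderDemo {
    // + in a loop: a new builder and a new intermediate String on every iteration.
    static String withPlus(int n) {
        String str = "pingtouge";
        for (int i = 0; i < n; i++) {
            str = str + i;
        }
        return str;
    }

    // One StringBuilder for the whole loop; not thread-safe, so keep it local.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder("pingtouge");
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Same API as StringBuilder, but each call is synchronized.
    static String withBuffer(int n) {
        StringBuffer sb = new StringBuffer("pingtouge");
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(3));                           // pingtouge012
        System.out.println(withBuilder(1000).equals(withPlus(1000))); // true
    }
}
```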

Using the intern() method wisely

     // Excerpt from java.lang.String (JDK 8); the rest of the Javadoc is omitted
     /**
      * <p>
      * When the intern method is invoked, if the pool already contains a
      * string equal to this {@code String} object as determined by
      * the {@link #equals(Object)} method, then the string from the pool is
      * returned. Otherwise, this {@code String} object is added to the
      * pool and a reference to this {@code String} object is returned.
      * <p>
      */
     public native String intern();

If a string equal to this one (as determined by equals()) already exists in the constant pool, intern() returns a reference to the pooled string. Otherwise, it adds this String object to the pool and returns a reference to it.

A Twitter engineer shared at QCon how they optimized String objects with String.intern(), cutting storage from 20 GB to just a few hundred MB. Let's use an example to see how String.intern() works:

public class InternDemo {
    public static void main(String[] args) {
        // Two `new` expressions create two distinct heap objects
        String str = new String("pingtouge");
        String str1 = new String("pingtouge");
        System.out.println("No intern() method used: " + (str == str1)); // false
        System.out.println("Without intern() method, str: " + str);      // pingtouge
        System.out.println("Without intern() method, str1: " + str1);    // pingtouge

        // intern() returns the same pooled instance for equal strings
        String str2 = new String("pingtouge").intern();
        String str3 = new String("pingtouge").intern();
        System.out.println("Use intern() method: " + (str2 == str3));    // true
        System.out.println("Using intern() method, str2: " + str2);      // pingtouge
        System.out.println("Using intern() method, str3: " + str3);      // pingtouge
    }
}


String.intern() is useful, but use it only where it suits the scenario. The string constant pool is implemented much like a hash table: the more data it stores, the longer lookups take. If too much data is interned, the whole string constant pool becomes overloaded.
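The kind of saving described in the Twitter example comes from many records repeating the same few values. This sketch uses made-up sample data and hypothetical helper names (load, distinctObjects): it loads a list of repeated city names as fresh String objects, as parsing input would, and counts distinct objects by identity. Without intern() every element is its own object; with intern() they collapse to one object per distinct value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical dedup scenario: many records repeat the same few values.
final class InternDedupDemo {
    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(List<String> values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(values);
        return seen.size();
    }

    static List<String> load(int n, boolean intern) {
        String[] cities = {"beijing", "shanghai", "hangzhou"}; // made-up sample data
        List<String> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // new String(...) mimics a freshly parsed value: a new object every time
            String value = new String(cities[i % cities.length]);
            result.add(intern ? value.intern() : value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(load(10_000, false))); // 10000
        System.out.println(distinctObjects(load(10_000, true)));  // 3
    }
}
```

With only three distinct values the pool stays tiny; the overload warning above applies when the set of distinct interned values is itself huge.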

Flexible string splitting

String splitting is one of the most common string operations, and most people use the split() method for it. In most cases split() uses a regular expression. That is not a problem in itself, but regular-expression performance is very unpredictable: a badly written pattern can cause backtracking, which can drive CPU usage up sharply. split() does not use the regular-expression engine in two cases:

  • The argument has length 1 and is not one of the regex metacharacters .$|()[{^?*+\
  • The argument has length 2, the first character is a backslash, and the second character is neither an ASCII digit nor an ASCII letter
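A few calls illustrate the two cases (SplitDemo is an illustrative name). The first two arguments qualify for the fast path described above; the last one shows the main pitfall of split() always taking a regex: a lone "." is the metacharacter "any character", so every character becomes a delimiter and, since trailing empty strings are removed, nothing is left.

```java
import java.util.Arrays;

// split() always takes a regex; the fast path only skips the regex engine.
final class SplitDemo {
    public static void main(String[] args) {
        // "," : one character, not a metacharacter -> fast path
        System.out.println(Arrays.toString("ping,tou,ge".split(",")));   // [ping, tou, ge]
        // "\\." : backslash + non-alphanumeric -> fast path, splits on a literal dot
        System.out.println(Arrays.toString("ping.tou.ge".split("\\."))); // [ping, tou, ge]
        // "." alone is the regex "any character": every char is a delimiter and
        // trailing empty strings are dropped, so the result is empty
        System.out.println("ping.tou.ge".split(".").length);              // 0
    }
}
```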

So if String.indexOf() can meet your splitting requirements, prefer it; use split() only when indexOf() cannot, and watch out for backtracking when you do.
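A minimal sketch of splitting with indexOf(), assuming a single-character delimiter. splitByChar is a hypothetical helper, not a JDK method. Unlike split(), it keeps trailing empty strings, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split on a single char with indexOf(), no regex involved.
final class IndexOfSplitter {
    static List<String> splitByChar(String s, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int pos;
        while ((pos = s.indexOf(delimiter, start)) != -1) {
            parts.add(s.substring(start, pos)); // token before the delimiter
            start = pos + 1;                    // continue after the delimiter
        }
        parts.add(s.substring(start));          // last token (may be empty)
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitByChar("ping,tou,ge", ',')); // [ping, tou, ge]
        System.out.println(splitByChar("ping,,ge,", ','));   // [ping, , ge, ]
    }
}
```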

If anything in this article falls short, please point it out. Let's learn and improve together.


Finally

A small plug: you are welcome to follow the WeChat official account "The technical blog of the flathead brother" and progress together.