The String object is the object we use most often in daily work, yet its performance issues are the ones we most often overlook. As one of the most important data types in Java, String objects typically account for a large share of memory. Using String efficiently can improve the overall performance of a system.

In this article, we will take a closer look at String objects from three aspects: implementation, features, and optimization in practice.

How is a String object implemented

Over successive Java releases, String has been optimized several times to save memory and improve performance. Here is how the implementation has evolved:

  • In Java 6 and earlier, a String wraps a char array and has four main member variables: the char array, offset, count, and hash.
  • In Java 7 and 8, the String class dropped offset and count, which slightly reduced the memory footprint of String objects. At the same time, String.substring() stopped sharing the underlying char[], which fixed the memory leak that this method could cause.
  • Beginning with Java 9, the char[] was replaced by a byte[], and a new field, coder, was added to identify the encoding.

Why change from char[] to byte[]? A char takes up 16 bits (2 bytes), which is wasteful for characters that fit in a single-byte encoding. To save memory, the String class in Java 9 stores its characters in a byte array, using one byte per character whenever the content allows it.

The new coder field tells methods such as length() and indexOf() how to interpret the byte array, since the number of bytes per character depends on the encoding. The coder field takes one of two values: 0 for Latin-1 (single-byte encoding) and 1 for UTF-16. It is 0 if the String contains only Latin-1 characters, and 1 otherwise.
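
The rule for choosing the coder value can be imitated in ordinary code. The following is a minimal sketch, not the JDK's implementation: the class name CoderSketch and the methods fitsLatin1 and payloadBytes are hypothetical, and the check simply tests whether every char is at most 0xFF, which is the condition under which one byte per character is enough.

```java
class CoderSketch {
    // Hypothetical helper: true if every char fits in one byte (Latin-1 range).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Bytes needed for the character data: 1 per char for Latin-1, 2 per char otherwise.
    static int payloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String cjk = "\u4f60\u597d"; // two CJK characters, outside Latin-1
        System.out.println(fitsLatin1(ascii) + " " + payloadBytes(ascii)); // true 5
        System.out.println(fitsLatin1(cjk) + " " + payloadBytes(cjk));     // false 4
    }
}
```

For text that is mostly ASCII, this is where the saving comes from: the character data needs half the bytes it needed in a char array.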

Immutability of String objects

In the String implementation, not only is the String class itself declared final, but the char[] field that holds the characters is also declared final. The final modifier on the class means that String cannot be subclassed, while the char[] field is private and final and no method modifies it, which means the contents of a String object cannot be changed. This property is called String immutability: a string cannot be modified once it has been created.
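
A short sketch makes the effect of immutability visible. The class name ImmutabilityDemo and the method shout are hypothetical names used only for illustration; the String methods called are standard. Every "modifying" method returns a new String and leaves the original untouched.

```java
import java.util.Locale;

class ImmutabilityDemo {
    // Returns an upper-cased copy; the argument itself is never modified.
    static String shout(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String original = "abc";
        String upper = shout(original);
        String joined = original.concat("def");

        System.out.println(original); // abc (unchanged)
        System.out.println(upper);    // ABC
        System.out.println(joined);   // abcdef
    }
}
```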

What are the benefits of String immutability?

  1. It keeps String objects safe. If strings were mutable, they could be maliciously modified.
  2. It guarantees that the hash value never changes once computed, so it can be cached, and a String can be used safely as a key in containers such as HashMap.
  3. It makes the string constant pool possible. There are two ways to create string objects in Java: as a string constant (a literal), or with new.

When we create a string object from a string constant, the JVM first checks whether the object is already in the string constant pool. If it is, the JVM returns a reference to it; otherwise it creates a new string object, saves it in the pool, and returns that reference. This saves memory by avoiding repeated creation of string objects with the same value.

When we create a string with new, as in String str = new String("abc"), the "abc" constant is placed in the class file's constant pool at compile time, and the "abc" String is created in the string constant pool when the class is loaded. When new executes, the JVM calls the String constructor, which references the character data of the pooled "abc" String and creates a new String object in the heap; str then references that heap object.
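
The difference between the two ways of creating a string can be observed with reference comparison. This is a small sketch; the class name PoolDemo and its method names are hypothetical.

```java
class PoolDemo {
    // Two identical literals resolve to the same pooled object.
    static boolean literalsShareInstance() {
        String first = "abc";
        String second = "abc";
        return first == second;
    }

    // new always creates a separate heap object, even for an equal value.
    static boolean newSharesInstance() {
        String literal = "abc";
        String viaNew = new String("abc");
        return literal == viaNew;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance()); // true
        System.out.println(newSharesInstance());     // false
    }
}
```

The values are equal in both cases, so equals() returns true either way; only the object identity differs.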

Optimization of String objects

Now that we have covered how String objects are implemented and what their characteristics are, let's look at what to watch out for when using them in real scenarios.

####1. String concatenation

String concatenation is common in programming. As noted earlier, strings are immutable, so building a string by adding pieces together with + could create several intermediate objects. For example:

String str = "ab" + "cd" + "ef";

In theory, an "ab" object is generated first, then an "abcd" object, and finally an "abcdef" object, which would be inefficient. In practice, however, only one object is generated. Why? If you look at the compiled code, you can see that the compiler optimizes the code as follows:

String str = "abcdef";
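
This compile-time folding can be checked without a decompiler, because a folded constant ends up in the constant pool like any other literal. The sketch below uses hypothetical names (ConstantFoldingDemo and its two methods); it contrasts a constant expression with a concatenation that involves a variable.

```java
class ConstantFoldingDemo {
    // "ab" + "cd" + "ef" is a constant expression, so the compiler folds it
    // into the single literal "abcdef", which lives in the constant pool.
    static boolean foldedIsPooled() {
        String folded = "ab" + "cd" + "ef";
        return folded == "abcdef";
    }

    // part is an ordinary (non-final) variable, so the concatenation happens
    // at run time and produces a new heap object.
    static boolean runtimeConcatIsPooled() {
        String part = "ab";
        String built = part + "cdef";
        return built == "abcdef";
    }

    public static void main(String[] args) {
        System.out.println(foldedIsPooled());        // true
        System.out.println(runtimeConcatIsPooled()); // false
    }
}
```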

That covers the concatenation of string constants. What happens when we concatenate string variables?

String str = "abcdef";
for (int i = 0; i < 1000; i++) {
    str = str + i;
}

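After compiling, we can see that the compiler has optimized this code as well: it uses StringBuilder for the concatenation to make the program more efficient. (This is what compilers generate for Java 8 and earlier; since Java 9, javac emits an invokedynamic call for string concatenation instead.)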

String str = "abcdef";
for (int i = 0; i < 1000; i++) {
    str = (new StringBuilder(String.valueOf(str))).append(i).toString();
}

We can see that the compiler uses StringBuilder even when the + operator is used for string concatenation. However, if you look closely, the optimized code creates a new StringBuilder object on every loop iteration, which also degrades performance.

For concatenation in a loop, we therefore recommend using StringBuilder explicitly. If the builder is shared between threads and the concatenation must be thread-safe, use StringBuffer instead. Because StringBuffer is synchronized and may incur lock contention, it performs worse than StringBuilder.
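
As a sketch of this recommendation, the loop from above can be written with a single explicit StringBuilder. The class name ConcatLoopDemo and the methods withPlus and withBuilder are hypothetical; both methods produce the same text, but withBuilder allocates one builder for the whole loop instead of a new intermediate object per iteration.

```java
class ConcatLoopDemo {
    // Concatenation with +: every iteration builds a brand-new String.
    static String withPlus(int n) {
        String str = "abcdef";
        for (int i = 0; i < n; i++) {
            str = str + i;
        }
        return str;
    }

    // One StringBuilder reused across all iterations.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder("abcdef");
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(3));                          // abcdef012
        System.out.println(withPlus(1000).equals(withBuilder(1000))); // true
    }
}
```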

####2. Use String.intern to save memory

The idea is to call the intern method each time a string value is assigned. If the constant pool already contains a string with the same value, that object is reused and its reference is returned.

String a = new String("abc").intern();
String b = new String("abc").intern();

if (a == b) {
    System.out.print("a==b");
}

This prints a==b. Here is why:

When we create the a variable, calling new String() creates a String in the heap, and the char array in that String references the character data of the string in the constant pool. When intern is then called, the constant pool is searched for a string equal to this object, and since one exists, the reference to the pooled string is returned.

Creating the b variable goes through the same steps: new String() creates another String in the heap, and intern again returns the reference to the pooled "abc".

The two objects created in the heap are no longer referenced and will be garbage collected. So a and b refer to the same object.

If a string object is created at run time, it is created directly in heap memory, not in the constant pool. When intern is called on such a dynamically created string and the pool does not yet contain an equal string, JDK 1.6 creates a copy in the constant pool and returns a reference to that copy, whereas JDK 1.7 and later store a reference to the existing heap string in the pool and return it. If the pool already contains an equal string, intern returns the reference that is already in the pool.
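
The behavior for strings created at run time can be sketched as follows. The class name InternDemo and the helper build are hypothetical; the helper assembles the string with StringBuilder so that it is created in the heap rather than taken from the constant pool.

```java
class InternDemo {
    // Builds the string at run time, so the result is a fresh heap object.
    static String build(String left, String right) {
        return new StringBuilder(left).append(right).toString();
    }

    public static void main(String[] args) {
        String first = build("run", "time");
        String second = build("run", "time");

        System.out.println(first == second);                   // false: two heap objects
        System.out.println(first.equals(second));              // true: same value
        System.out.println(first.intern() == second.intern()); // true: one pooled reference
    }
}
```

Whether first.intern() returns first itself or a separate pooled copy depends on the JDK version and on whether an equal string was already pooled, so the sketch does not rely on that.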

[Figure: summary of String creation and memory allocation]

The important thing about intern is that its use must fit the actual scenario. The constant pool is implemented much like a hash table, so the more entries it holds, the longer lookups can take. If too much data is interned, the string constant pool becomes a burden on the whole system.

####3. How to use the string splitting method

The split() method is commonly used to split strings. It relies on regular expressions for its powerful splitting ability, but regular-expression performance is very unstable: improper use can cause backtracking problems, which may lead to high CPU usage.

In everyday code, String.indexOf() can often be used instead of split() to split a string. If indexOf() cannot meet the requirements and split() must be used, pay attention to backtracking.
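
A minimal sketch of splitting with indexOf() on a single delimiter character, with no regular expression involved. The class name IndexOfSplit and the method splitOnChar are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfSplit {
    // Splits input on every occurrence of delimiter using indexOf/substring only.
    static List<String> splitOnChar(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) != -1) {
            parts.add(input.substring(start, end));
            start = end + 1; // continue after the delimiter
        }
        parts.add(input.substring(start)); // remainder after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("a,b,,c", ',')); // [a, b, , c]
    }
}
```

Note one difference in behavior: this sketch keeps trailing empty pieces ("a,b," yields three parts), while split() with its default limit drops them.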

Finally

To wrap up, here is a quick question: in each pair below, do the two variables refer to the same object?

String str1 = "abc";
String str2 = new String("abc");
String str3 = str2.intern();
// JUnit's assertSame passes only if both arguments are the same object
assertSame(str1, str2);
assertSame(str2, str3);
assertSame(str1, str3);