The implementation of a String object
String is one of the most frequently used classes in Java, so the JDK developers have continually optimized its implementation to improve performance. The optimization process can be summarized in the three stages below.
1. Java 6 and earlier
In these versions, a String object wraps a char array and has four member variables: the char[] array, offset, count, and hash.
A String locates its characters within the char[] array through the offset and count fields. This lets several strings share one array quickly and efficiently, which saves memory, but it can also cause memory leaks: a short string produced by substring() keeps the entire original array reachable, even after the original string is no longer needed.
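The shared-array layout described above can be sketched with a small model class. SharedString and its members are hypothetical names used only for illustration; this is not the real java.lang.String source, just a minimal sketch of the offset/count idea and of why a short substring can keep a large array alive.

```java
// Hypothetical model of the Java 6 layout; not the real java.lang.String.
final class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of the first character used by this string
    final int count;    // number of characters used by this string

    SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedString of(String s) {
        char[] chars = s.toCharArray();
        return new SharedString(chars, 0, chars.length);
    }

    // Shares the backing array instead of copying it, as Java 6 did.
    SharedString substring(int begin, int end) {
        return new SharedString(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = SharedString.of("pingtouge");
        SharedString small = big.substring(0, 4);
        System.out.println(small);                    // ping
        System.out.println(small.value == big.value); // true: same array
        System.out.println(small.value.length);       // 9: whole array stays reachable
    }
}
```

As long as small is reachable, the full nine-character array cannot be garbage collected, which is the leak the text refers to.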
2. Java 7 and Java 8
Starting with Java 7 (from update 7u6), the String class no longer has the offset and count fields. As a result, each String object takes up slightly less memory, and String.substring() no longer shares the char[] but copies the characters it needs, which eliminates the memory leak that this method could cause.
3. Java 9 and later
From Java 9 on, the char[] array is replaced with a byte[] array. Why make this change? A char takes two bytes, so storing a character that fits in a single byte in a char wastes half the space. By storing such strings with one byte per character, the JDK avoids this waste.
Java 9 also adds a new field, coder, which identifies the encoding and is used when computing the string length or calling methods such as indexOf(). The coder field takes one of two values: 0 for Latin-1 (single-byte encoding) and 1 for UTF-16. It is 0 if the string contains only Latin-1 characters, and 1 otherwise.
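The coder decision can be approximated in a few lines. This is a sketch under assumptions: CoderSketch, coderFor, and payloadBytes are hypothetical names, not JDK internals, and the size calculation assumes compact strings are enabled (the default since Java 9).

```java
// Sketch only: mimics the choice between Latin-1 and UTF-16 storage.
// The class and method names are hypothetical, not JDK internals.
final class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns LATIN1 if every char fits in one byte, UTF16 otherwise.
    static byte coderFor(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Approximate size in bytes of the backing byte[] for this string.
    static int payloadBytes(String s) {
        return s.length() << coderFor(s);
    }

    public static void main(String[] args) {
        String latin = "pingtouge";
        String wide = "\u4e2d\u6587"; // two CJK characters
        System.out.println(coderFor(latin));     // 0
        System.out.println(coderFor(wide));      // 1
        System.out.println(payloadBytes(latin)); // 9
        System.out.println(payloadBytes(wide));  // 4
    }
}
```

With a char[] the nine-character Latin-1 string would need 18 bytes of payload, so the byte[] layout halves the storage for such strings.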
How a String object is created
1. Through string constants
For example: String str = "pingtouge";. When a string is created this way, the JVM checks whether the string already exists in the string constant pool. If it does, the JVM returns a reference to it; if not, the String object is created in the constant pool and a reference to it is returned. The advantage of this approach is that it saves memory by avoiding repeated creation of strings with the same value.
2. Through the String() constructor
For example: String str = new String("pingtouge");. Creating a String object this way is more involved and happens in two stages. First, at compile time, the literal pingtouge is added to the class file's constant pool, and the corresponding string is created in the string constant pool once the class is loaded. Then, when new is executed, the JVM calls the String constructor, which refers to the pingtouge string in the constant pool, creates a new String object in heap memory, and returns a reference to that heap object.
Now that we know the two ways to create strings, let's examine the following code to understand them better. Does str == str1 evaluate to true in this code?
String str = "pingtouge";
String str1 = new String("pingtouge");
System.out.println(str == str1);
For String str = "pingtouge", the JVM searches the string constant pool for pingtouge and does not find it. The JVM therefore creates the string object in the constant pool and returns a reference to it, so str refers to the pingtouge object in the constant pool.
String str1 = new String("pingtouge") uses the constructor to create the String object. Based on our understanding of how the constructor works, str1 refers to a separate pingtouge String object on the heap. Since str refers to the object in the constant pool and str1 refers to the object on the heap, str == str1 is false.
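The comparison above can be run as a small program. The class and variable names are chosen for illustration; the sketch contrasts reference comparison (==) with value comparison (equals()).

```java
// Compares literal and constructor-created strings by reference and by value.
final class StringCreationDemo {
    public static void main(String[] args) {
        String literal = "pingtouge";                 // from the string constant pool
        String sameLiteral = "pingtouge";             // same pooled object
        String constructed = new String("pingtouge"); // new object on the heap

        System.out.println(literal == sameLiteral);      // true
        System.out.println(literal == constructed);      // false
        System.out.println(literal.equals(constructed)); // true
    }
}
```

When the goal is to compare contents rather than identity, equals() is the right choice regardless of how the strings were created.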
Immutability of String objects
Ever since we first learned about strings, we have all known that String objects are immutable. How is this immutability achieved, and what does Java gain from it? Let's explore briefly, starting with the String source code (as of Java 8):
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    /** Cache the hash code for the string */
    private int hash;
    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;
}
As the source code shows, the String class is declared final. A final class cannot be subclassed, so no subclass can override String's behavior. This is the first part of String's immutability.
Further down, the char value[] array that stores the characters is declared private and final. For a final reference variable, the reference cannot be reassigned after initialization; because the array is also private and String exposes no method that modifies it, its contents cannot be changed from outside either. This is the second part of String's immutability.
There are three main reasons why the Java designers made strings immutable:
1. It ensures the security of String objects. If strings were mutable, they could be maliciously modified.
2. It ensures that the hash value does not change, so the hash can be cached and strings can serve reliably as keys in containers such as HashMap.
3. It makes the string constant pool possible.
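The effect of immutability is easy to observe. The following is a minimal sketch with illustrative names: methods that appear to modify a string return a new object and leave the original untouched, which is also why a String is a dependable HashMap key.

```java
import java.util.HashMap;
import java.util.Map;

// Shows that String "modifications" produce new objects.
final class ImmutabilityDemo {
    public static void main(String[] args) {
        String original = "pingtouge";
        String replaced = original.replace('p', 'P'); // new String object
        String joined = original.concat("!");         // new String object

        System.out.println(original); // pingtouge (unchanged)
        System.out.println(replaced); // Pingtouge
        System.out.println(joined);   // pingtouge!

        // The key's hash never changes, so a later lookup finds the entry.
        Map<String, Integer> visits = new HashMap<>();
        visits.put(original, 1);
        System.out.println(visits.get("pingtouge")); // 1
    }
}
```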
Optimization of String objects
Strings are one of the most commonly used Java types, so manipulating them is unavoidable. Used carelessly, string operations can make a huge difference to performance. So what should we pay attention to when working with strings?
Concatenate strings elegantly
String concatenation is one of the most frequent string operations. Because String objects are immutable, many of us try to avoid concatenating with + or assume it must never be used, believing that + produces many useless intermediate objects. Is that really the case? Let's do an experiment and concatenate the following strings with +.
String str8 = "ping" +"tou"+"ge";
How many objects does this code create? By the reasoning above, the ping object would be created first, then pingtou, and finally pingtouge, for three objects in total. Is that really so? No: the compiler folds the constant concatenation into String str8 = "pingtouge";, so only one object is created. Besides optimizing constant concatenation, the compiler also optimizes dynamic concatenation with + to improve performance, as in the following code:
String str = "pingtouge";
for(int i=0; i<1000; i++) {
str = str + i;
}
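On Java 8 and earlier, the compiler turns this into code roughly like the following (since Java 9, javac emits an invokedynamic call instead, but a new string is still built on every iteration):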
String str = "pingtouge";
for(int i=0; i<1000; i++) {
str = (new StringBuilder(String.valueOf(str))).append(i).toString();
}
We can see that the JDK developers have done a lot of optimization here to keep String performance from plummeting because of programmer carelessness. Even so, the optimization has a shortcoming: although StringBuilder is used for the concatenation, every loop iteration creates a new StringBuilder instance, which still degrades performance.
Therefore, we should also optimize string concatenation at the code level. For dynamic concatenation, use a single StringBuilder when thread safety is not a concern; when the builder is shared between threads, use StringBuffer instead.
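The advice above can be sketched as follows. ConcatDemo and its method names are illustrative; the example reuses one StringBuilder across the loop, compares the result with the + version, and checks that a constant concatenation is folded into a single pooled literal.

```java
// Contrasts concatenation with + in a loop against a single reused StringBuilder.
final class ConcatDemo {
    // Creates a new intermediate String on every iteration.
    static String buildWithPlus(String prefix, int count) {
        String result = prefix;
        for (int i = 0; i < count; i++) {
            result = result + i;
        }
        return result;
    }

    // Appends into one buffer and creates the final String once.
    static String buildWithBuilder(String prefix, int count) {
        StringBuilder sb = new StringBuilder(prefix);
        for (int i = 0; i < count; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildWithBuilder("pingtouge", 5)); // pingtouge01234
        System.out.println(buildWithPlus("pingtouge", 1000)
                .equals(buildWithBuilder("pingtouge", 1000))); // true

        // Constant expressions are folded at compile time into one pooled literal.
        String folded = "ping" + "tou" + "ge";
        System.out.println(folded == "pingtouge"); // true
    }
}
```

Both methods produce the same text; the difference is only in how many temporary objects are allocated along the way.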
Make good use of the intern() method
/**
 * <p>
 * When the intern method is invoked, if the pool already contains a
 * string equal to this {@code String} object as determined by
 * the {@link #equals(Object)} method, then the string from the pool is
 * returned. Otherwise, this {@code String} object is added to the
 * pool and a reference to this {@code String} object is returned.
 * <p>
 */
public native String intern();
intern() returns a reference to the string in the constant pool if an equal string already exists there. Otherwise, it adds the string to the constant pool and returns a reference to it.
At QCon, a Twitter engineer shared an example of how they used String.intern() to reduce the storage taken by String objects from 20 GB to just a few hundred megabytes. The following code shows the effect of String.intern():
public class InternDemo {
    public static void main(String[] args) {
        String str = new String("pingtouge");
        String str1 = new String("pingtouge");
        System.out.println("Without intern(), str == str1: " + (str == str1)); // false
        System.out.println("Without intern(), str: " + str);
        System.out.println("Without intern(), str1: " + str1);
        String str2 = new String("pingtouge").intern();
        String str3 = new String("pingtouge").intern();
        System.out.println("With intern(), str2 == str3: " + (str2 == str3)); // true
        System.out.println("With intern(), str2: " + str2);
        System.out.println("With intern(), str3: " + str3);
    }
}
Without String.intern(), String objects constructed with the same value are distinct objects with different references. With String.intern(), they resolve to the same object reference, which can save a lot of memory.
String.intern() is useful, but it should be applied with the scenario in mind, because the constant pool is implemented much like a hash table. The more entries the table holds, the longer lookups take as bucket collisions grow. If too much data is interned, the string constant pool becomes overloaded.
Flexible string splitting
Splitting is another common string operation, and most people use the split() method for it. In most cases split() uses a regular expression. That is not a problem in itself, but regular expression performance is unpredictable: a poorly written pattern can cause excessive backtracking, which drives CPU usage up. The split() method does not use regular expressions in two cases:
- The argument has length 1 and is not one of the regex metacharacters . $ | ( ) [ { ^ ? * + \
- The argument has length 2, the first character is a backslash, and the second character is not an ASCII digit or ASCII letter
Prefer String.indexOf() for splitting where possible, and use split() only when indexOf() cannot meet the requirement. When you do use split(), watch out for backtracking in the pattern.
If anything in this article is lacking, corrections are welcome. Let's learn and improve together.
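An indexOf()-based splitter can be sketched as follows. IndexOfSplitter is a hypothetical helper, not a JDK class, and it handles only a single-character delimiter. Unlike String.split(), it keeps trailing empty strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on a single character without regular expressions.
final class IndexOfSplitter {
    static List<String> split(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int end;
        // Walk from delimiter to delimiter, copying the text in between.
        while ((end = input.indexOf(delimiter, start)) != -1) {
            parts.add(input.substring(start, end));
            start = end + 1;
        }
        parts.add(input.substring(start)); // remainder after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("ping,tou,ge", ',')); // [ping, tou, ge]
        System.out.println(split("pingtouge", ','));   // [pingtouge]
    }
}
```

Because no pattern is compiled or matched, the cost is a single linear scan of the input, with no risk of backtracking.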