What is there left to optimize in a String? Are you kidding me?

Don’t panic. Today I will show you a different side of String, from its roots all the way to the fine details.

I will also share an example: through performance tuning, data that would take tens of gigabytes can be stored in a few hundred megabytes of memory.

A String is the type of object we touch every day, but its performance is often overlooked.

To love her is not just to spend time together. You need to understand String’s heart and be the caring type, the kind with “a tiger in the heart that still stops to smell the roses”.

Through the following analysis, we will peel back the layers one by one, get to the heart of the matter, and level up so that your String usage really takes off:

  1. Properties of string objects;
  2. Immutability of String;
  3. Large string construction techniques;
  4. Saving memory with String.intern();
  5. String segmentation techniques;

String internals decoded

To learn more, start with the basics…

The designers of String have made a number of optimizations to save memory and improve String performance:

Java 6 and before

The data is stored in a char[] array, and a String locates its own characters within that array through the offset and count fields.

This makes it quick to locate the data and lets several strings share one array, which saves memory, but it can also lead to memory leaks.

Why might sharing a char array cause a memory leak?

String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

public String substring(int beginIndex, int endIndex) {
    //check boundary
    return  new String(offset + beginIndex, endIndex - beginIndex, value);
}

When substring() is called, a new String object is created, but its value field still refers to the same array in memory, as shown below:

If we use substring() to take only a small piece of a very large string, the large backing array stays referenced for as long as the small substring is alive.

In this case, the large array cannot be reclaimed, which results in a memory leak.

If a program performs many such operations, each keeping a small substring of a large string, it can eventually run out of memory because of the leak.

Java 7 and 8

The offset and count fields were removed, which reduces the memory footprint of each String object.

The substring source code:

public String(char value[], int offset, int count) {
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}

public String substring(int beginIndex, int endIndex) {
    int subLen = endIndex - beginIndex;
    return new String(value, beginIndex, subLen);
}

substring() returns a new String object created with new String(), and that constructor makes a fresh copy of the requested range of the character array with Arrays.copyOfRange().

As shown below:

String.substring() no longer shares the char[] array, which removes the possible memory leak.
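The difference between the two designs can be sketched with two small model classes. SharedSlice and CopiedSlice are hypothetical names invented for this illustration; they only imitate the constructors quoted above and are not JDK classes.

```java
import java.util.Arrays;

public class SubstringSharing {

    /** Java 6 style model: the slice keeps a reference to the parent's whole array. */
    static final class SharedSlice {
        final char[] value;
        final int offset;
        final int count;

        SharedSlice(char[] value, int offset, int count) {
            this.value = value; // no copy: the entire parent array stays reachable
            this.offset = offset;
            this.count = count;
        }

        int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    /** Java 7/8 style model: the slice owns a private copy of just its own range. */
    static final class CopiedSlice {
        final char[] value;

        CopiedSlice(char[] source, int offset, int count) {
            this.value = Arrays.copyOfRange(source, offset, offset + count);
        }

        int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');

        SharedSlice shared = new SharedSlice(big, 0, 5);
        CopiedSlice copied = new CopiedSlice(big, 0, 5);

        // Both slices read the same, but they keep very different amounts of memory alive.
        System.out.println(shared + " retains " + shared.retainedChars() + " chars");
        System.out.println(copied + " retains " + copied.retainedChars() + " chars");
    }
}
```

The copying design pays for an extra array copy on every substring() call, in exchange for letting the large array be collected as soon as the original string is no longer used.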

Java 9

The char[] field was changed to byte[], and a coder field was added.

Why was it changed this way, code guy?

A char takes 2 bytes (16 bits). Using it to store characters that fit in a single-byte encoding wastes half of the space.

To save memory, Java 9 stores string data in a byte[] array, whose elements are 1 byte (8 bits) each.

A thrifty goddess: who wouldn’t love her…

The new coder field tells methods such as length() and indexOf() how to interpret the byte array, because the number of bytes per character depends on the encoding.

The coder value identifies the encoding:

  • 0: Latin-1 (single-byte encoding);
  • 1: UTF-16.
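As a rough sketch of the idea, the following model picks a coder for a string and estimates how many bytes its characters need. coderFor and payloadBytes are hypothetical helpers written for this illustration; the real JDK logic is internal and is not shown here.

```java
public class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Hypothetical helper: Latin-1 is enough only if every char fits in one byte. */
    static byte coderFor(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Hypothetical helper: bytes needed for the characters under the chosen coder. */
    static int payloadBytes(String s) {
        return coderFor(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String chinese = "\u4f60\u597d"; // two Chinese characters, outside Latin-1

        System.out.println(ascii + ": coder=" + coderFor(ascii) + ", bytes=" + payloadBytes(ascii));
        System.out.println(chinese + ": coder=" + coderFor(chinese) + ", bytes=" + payloadBytes(chinese));
    }
}
```

Under this model, a string of plain English text needs half the bytes it would need in a char[] array, while a string containing even one character outside Latin-1 falls back to 2 bytes per char.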

Immutability of String

Now that you know the basics, String has one more attractive trait beyond its looks: the class is declared final, and so is its char array field.

A final class cannot be subclassed. The char[] field is private and final, and String exposes no method that modifies it, so the contents of a String cannot be changed.

Once a String object has been created, it cannot be changed.

The benefits of immutability

Security

When you pass a value to other methods, such as system-level operations, it usually goes through a series of validations first.

If the class were mutable, its internal value could change after validation, which could lead to serious failures or security holes.
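A minimal sketch of this validate-then-use problem, using a mutable StringBuilder in place of a String. The isSafe check and the file names are made up for the illustration.

```java
public class ValidateThenUse {

    /** Hypothetical validation: reject anything that tries to climb out of the directory. */
    static boolean isSafe(CharSequence fileName) {
        return !fileName.toString().contains("..");
    }

    public static void main(String[] args) {
        // A mutable value can change between the check and the use.
        StringBuilder mutableName = new StringBuilder("report.txt");
        boolean safeAtCheck = isSafe(mutableName);
        mutableName.insert(0, "../../secret/"); // modified after validation
        boolean safeAtUse = isSafe(mutableName);
        System.out.println("mutable:   safe at check=" + safeAtCheck + ", safe at use=" + safeAtUse);

        // An immutable String cannot be altered behind the validator's back.
        String immutableName = "report.txt";
        boolean checked = isSafe(immutableName);
        String other = "../../secret/" + immutableName; // a new object, immutableName is untouched
        System.out.println("immutable: safe at check=" + checked + ", still safe=" + isSafe(immutableName));
        System.out.println("the concatenated value is a different object: " + other);
    }
}
```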

High-performance cache

Because a String is immutable, its hash code never changes and can be computed once and cached. This makes strings reliable, fast keys for containers such as HashMap that implement key-value caching.
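To see why a stable hash matters for map keys, here is a small sketch that uses a mutable ArrayList as a HashMap key and compares it with a String key. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {

    /** A mutable key: changing it after put() changes its hash, so the entry gets lost. */
    static String lookupAfterMutation() {
        Map<List<String>, String> cache = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        cache.put(key, "cached value");

        key.add("b"); // the key's hashCode() is now different from the one used at put()
        return cache.get(key);
    }

    /** A String key: "changing" it really creates a new object, the stored key is untouched. */
    static String lookupWithStringKey() {
        Map<String, String> cache = new HashMap<>();
        String key = "a";
        cache.put(key, "cached value");

        key = key + "b"; // key now refers to a new String; the map still holds "a"
        return cache.get("a");
    }

    public static void main(String[] args) {
        System.out.println("mutable key lookup: " + lookupAfterMutation());
        System.out.println("String key lookup:  " + lookupWithStringKey());
    }
}
```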

Implement a string constant pool

The string constant pool is possible only because strings are immutable.

With a string constant pool, creating a string first checks the pool to see whether an equal string has already been created.

If so, no new memory is allocated; a reference to the existing string in the pool is returned instead.

There are two ways to create strings:

  • String str1 = "bytes";
  • String str2 = new String("code");

When a string object is created in code using the first method, the JVM first checks to see if the object is in the string constant pool, and if so, returns a reference to the object.

Otherwise a new string will be created in the constant pool and the reference will be returned.

This saves memory by reducing repeated creation of string objects with the same value.

With the second way, the “code” literal is placed in the class file’s constant structure at compile time, and the “code” string is created in the constant pool when the class is loaded.

When new executes, the JVM invokes the String constructor and creates a separate String object on the heap, which reuses the character data of the “code” string in the constant pool; str2 then refers to the object that was just created on the heap.
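The two creation styles can be told apart with a reference comparison. A minimal sketch (the class and method names are invented for the example):

```java
public class LiteralVsNew {

    /** Two identical literals resolve to the same pooled object. */
    static boolean literalsShareOneObject() {
        String a = "bytes";
        String b = "bytes";
        return a == b;
    }

    /** new String(...) always allocates a separate object on the heap. */
    static boolean newCreatesSeparateObject() {
        String pooled = "bytes";
        String onHeap = new String("bytes");
        return pooled != onHeap && pooled.equals(onHeap);
    }

    public static void main(String[] args) {
        System.out.println("literals share one object:   " + literalsShareOneObject());
        System.out.println("new creates separate object: " + newCreatesSeparateObject());
    }
}
```

This is also why string contents should be compared with equals() rather than ==.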

The diagram below:

What are objects and object references?

str is a variable on the stack. It holds a reference to a String object on the heap; it is not the object itself.

The object occupies a region of memory, and str holds a reference to that memory address.

That is, str is not an object, but a reference to an object.

What exactly does immutable string mean, code guy?

String str = "Java";
str = "Java,yyds";

The first assignment sets str to “Java” and the second sets it to “Java,yyds”, so the value of str does change. Why do I still say that strings are immutable?

This is because str is only a reference to a String, not the object itself.

The real object is still in memory, unchanged.
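A second reference to the original object makes this visible. In the sketch below (names invented for the example), alias keeps pointing at the first object while str is reassigned:

```java
public class ReassignDemo {

    /** Returns what a second reference sees after the first one is reassigned. */
    static String aliasAfterReassign() {
        String str = "Java";
        String alias = str;   // both variables refer to the same object
        str = "Java,yyds";    // str now refers to a different object
        return alias;         // the original object was never modified
    }

    /** Methods that look like modifications return new objects instead. */
    static String originalAfterToUpperCase() {
        String original = "Java";
        String upper = original.toUpperCase(); // upper is a new String, "JAVA"
        return original + "/" + upper;
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterReassign());
        System.out.println(originalAfterToUpperCase());
    }
}
```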

Optimization in practice

Now that we understand how String is implemented and what its properties are, it is time to look deeper into the goddess’s heart and see how to take String usage to the next level in real scenarios.

How to build large strings

Since strings are immutable, do we create multiple objects when we frequently concatenate strings?

String str = "A toad hits a frog." + "Long ugly" + "Flower of play";

You might think this first creates an “A toad hits a frog.” object, then a second object with “Long ugly” appended, and finally a third object with “Flower of play” appended.

In practice, only one object is generated.

Why is that?

The code looks wasteful, but the compiler optimizes it automatically: a concatenation of string literals is folded into a single constant at compile time.
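Whether a concatenation was folded at compile time can be observed with a reference comparison against the equivalent literal. A small sketch, with names invented for the example:

```java
public class ConstantFolding {

    /** Literal + literal is a compile-time constant, so it is the same pooled object as "abc". */
    static boolean foldedAtCompileTime() {
        String s = "a" + "b" + "c";
        return s == "abc";
    }

    /** A non-final variable is not a constant, so this concatenation happens at run time. */
    static boolean builtAtRunTime() {
        String part = "a";
        String s = part + "bc";
        return s != "abc" && s.equals("abc");
    }

    public static void main(String[] args) {
        System.out.println("folded at compile time: " + foldedAtCompileTime());
        System.out.println("built at run time:      " + builtAtRunTime());
    }
}
```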

Look at the following example:

String str = "Little Frog";

for(int i=0; i<1000; i++) {
     str += i;
}

If you compile the code above and then decompile it, you can see that the compiler has optimized this code as well.

The Java compiler (through Java 8) implements string concatenation with StringBuilder to improve efficiency:

String str = "Little Frog";

for(int i=0; i<1000; i++) {
    str = (new StringBuilder(String.valueOf(str))).append(i).toString();
}

Even so, the StringBuilder object is created repeatedly within the loop.

Key point

So when concatenating strings, especially in a loop, I recommend using StringBuilder explicitly to improve performance.

If the concatenation must be thread-safe in multithreaded code, you can use StringBuffer.
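For the loop shown earlier, an explicit StringBuilder version might look like the sketch below. The method name buildWithBuilder is invented for the example; the point is that a single builder is created outside the loop and reused.

```java
public class ExplicitBuilder {

    /** Appends 0..n-1 to the prefix using one StringBuilder for the whole loop. */
    static String buildWithBuilder(String prefix, int n) {
        StringBuilder sb = new StringBuilder(prefix);
        for (int i = 0; i < n; i++) {
            sb.append(i); // no intermediate String is created per iteration
        }
        return sb.toString(); // one final String
    }

    public static void main(String[] args) {
        String result = buildWithBuilder("Little Frog", 1000);
        System.out.println("length = " + result.length());
    }
}
```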

Use intern to save memory

intern()

intern() is a native method. According to its definition, when you call intern(), it returns a reference to the string in the string constant pool if an equal string is already there.

Otherwise, the string is added to the constant pool and a reference to it is returned.

In what cases is intern() appropriate?

Twitter engineers once shared an example of using String.intern(). Every time a status message was posted on Twitter, it generated a piece of location information. Given the size of Twitter’s user base at the time, the servers were estimated to need 20 GB of memory to store this location information.

public class Location {
    private String city;
    private String region;
    private String countryCode;
    private double longitude;
    private double latitude;
}

Since many users share the same location information, such as country, province, and city, this part can be pulled out into a separate class to reduce duplication. The code is as follows:

public class SharedLocation {

  private String city;
  private String region;
  private String countryCode;
}

public class Location {

  private SharedLocation sharedLocation;
  double longitude;
  double latitude;
}

Through optimization, the data storage size is reduced to about 20 GB.

But it’s still pretty big for memory to store this data, so what do we do?

Twitter engineers optimized the storage of String objects by using String.intern(), which reduced the storage size of the highly repetitive location information from 20 GB to a few hundred megabytes.

The core code is as follows:

SharedLocation sharedLocation = new SharedLocation();
// intern() each field so that equal values share one pooled String
sharedLocation.setCity(messageInfo.getCity().intern());
sharedLocation.setCountryCode(messageInfo.getCountryCode().intern());
sharedLocation.setRegion(messageInfo.getRegion().intern());

Here’s a simple example:

String a = new String("abc").intern();
String b = new String("abc").intern();

System.out.print(a==b);

Output: true.

When the class is loaded, a string object with the content “abc” is created in the constant pool.

When the variable a is created, new String() creates a String object on the heap whose char array references the one belonging to the string in the constant pool.

The call to intern() then looks in the constant pool for a string equal to this object and, since one exists, returns a reference to it, which is assigned to a.

When the variable b is created, new String() again creates a String object on the heap whose char array references the one belonging to the string in the constant pool.

The call to intern() again finds the equal string in the constant pool and returns a reference to it, which is assigned to b.

The two objects that were just created on the heap are no longer referenced by anything, so they will be garbage collected.

So a and b refer to the same object.
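The same effect can be measured on a batch of equal strings created at run time. The sketch below counts distinct objects by identity, with and without intern(); the class, the method names, and the sample city are invented for the illustration and are not Twitter’s code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDedup {

    /** Simulates reading the same city name n times from external data. */
    static List<String> loadCities(int n, boolean useIntern) {
        List<String> cities = new ArrayList<>();
        char[] raw = "Shenzhen".toCharArray();
        for (int i = 0; i < n; i++) {
            String city = new String(raw); // a fresh object every time, as with parsed input
            cities.add(useIntern ? city.intern() : city);
        }
        return cities;
    }

    /** Counts distinct objects by reference identity, not by equals(). */
    static int distinctObjects(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern: " + distinctObjects(loadCities(1000, false)) + " objects");
        System.out.println("with intern:    " + distinctObjects(loadCities(1000, true)) + " objects");
    }
}
```

intern() pays off when the set of distinct values is small compared with the number of strings, as with country or city names. For data with mostly unique values it adds lookup cost without saving much.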

Splitting strings the smart way

The split() method uses regular expressions to achieve its powerful splitting ability, but the performance of regular expressions is very unpredictable.

Improper use can cause backtracking problems, which can drive CPU usage very high.

Java’s regular expression engine is an NFA (nondeterministic finite automaton). This kind of engine backtracks while matching characters, and once backtracking occurs, matching can take a long time, minutes or even hours, depending on the number and complexity of the backtracking steps.

So we should be careful with the split() method. In many cases we can use String.indexOf() instead of split() to split strings.
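A minimal sketch of splitting on a single delimiter character with indexOf(). The helper name splitByChar is invented for the example, and unlike String.split() it keeps empty trailing fields.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplit {

    /** Splits input on every occurrence of delimiter, without any regular expression. */
    static List<String> splitByChar(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) != -1) {
            parts.add(input.substring(start, end)); // text between two delimiters
            start = end + 1;                        // continue after the delimiter
        }
        parts.add(input.substring(start));          // the remainder after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitByChar("city,region,,countryCode", ','));
    }
}
```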

Summary and Reflection

We traced the makeup of String through its evolution: its member variables kept changing in order to save memory.

Its immutability makes the string constant pool possible, which reduces repeated creation of identical strings and saves memory.

However, because of this same property, we need to use StringBuilder explicitly when concatenating long strings to improve performance.

On the optimization side, we can save memory with the intern() method, which lets string objects created at run time reuse objects of the same value in the constant pool.

Finally, I have a question for you. Please leave your answer in the comments section. The comment with the most likes will get a book from me.

Three objects are created in three different ways, and then matched in pairs. Are the two objects matched in each group equal? The code is as follows:

String str1 = "abc";
String str2 = new String("abc");
String str3 = str2.intern();
assertSame(str1, str2);
assertSame(str2, str3);
assertSame(str1, str3);

Reply “String” to the official account to get the answer.