In-depth understanding of the JVM (viii) – String constant pool

Preface

  • The string constant pool is also called the StringTable or the String Intern Pool.

  • To reduce the number of strings created in the JVM, the virtual machine maintains a pool of string constants

  • When a string is created from a literal, the JVM first checks the string constant pool. If a string with the same content already exists in the pool, the JVM simply returns a reference to the pooled object. If not, it instantiates the string and places it in the pool.

Version changes

  • In Java 6 and earlier, the string constant pool is stored in the permanent generation.
  • In Java 7, Oracle engineers made a major change to the string pool: they relocated the string constant pool to the Java heap.
    • All strings are stored in the heap, just like any other ordinary object, so you only need to resize the heap when tuning your application.
    • The string constant pool was already widely used, but this change is reason enough to reconsider using String.intern() in Java 7.
  • In Java 8, the permanent generation is replaced by Metaspace; the string constant pool stays in the heap.
  • In JDK 8 and earlier, String stores its data internally in a final char[] value; since JDK 9 it uses a byte[] instead.

Why was the StringTable relocated?

  • The permanent generation is small by default, so it easily runs into OOM.
  • The permanent generation is garbage collected infrequently, yet strings are used heavily.

Basic features of String

  • String: a string, written between a pair of double quotes "".

    String s1 = "hello";              // defined with a literal
    String s2 = new String("hello");  // created with new
  • String is declared final, so it cannot be subclassed; its instances are immutable.

  • String implements the Serializable interface, which means strings support serialization; the Comparable interface, which means strings can be compared and ordered; and the CharSequence interface, which represents a readable sequence of char values.

  • In JDK 8 and earlier, String stores its data internally in a final char[] value; since JDK 9 it uses a byte[] instead.

String immutability

  • When a string variable is reassigned, a new memory region must be used for the new value; the original value cannot be modified in place.
  • When an existing string is concatenated, a new memory region must also be used for the result; the original value cannot be modified in place.
  • When you call String's replace() method to change a character or substring, a new memory region must also be used for the result; the original value cannot be modified in place.
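The three cases above can be checked with a short sketch. The class and method names here are made up for illustration; each method returns true when the original string is left unchanged by the operation.

```java
class ImmutabilityDemo {
    /** Reassignment rebinds the variable; the original object is untouched. */
    static boolean reassignKeepsOriginal() {
        String s1 = "abc";
        String s2 = s1;      // s1 and s2 refer to the same pooled object
        s1 = "hello";        // only s1 is rebound to another object
        return s2.equals("abc") && s1 != s2;
    }

    /** Concatenation produces a new object; the original is untouched. */
    static boolean concatCreatesNewObject() {
        String s1 = "abc";
        String s2 = s1;
        s2 += "def";         // s2 now refers to a new String "abcdef"
        return s1.equals("abc") && s2.equals("abcdef") && s1 != s2;
    }

    /** replace() returns a new object; the original is untouched. */
    static boolean replaceCreatesNewObject() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm'); // new String "mbc"
        return s1.equals("abc") && s2.equals("mbc");
    }

    public static void main(String[] args) {
        System.out.println(reassignKeepsOriginal());
        System.out.println(concatCreatesNewObject());
        System.out.println(replaceCreatesNewObject());
    }
}
```

In all three methods the variable ends up pointing at a different object, while the characters of the original "abc" never change.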

Demonstrating the immutability of String

public class StringExer {
    String str = new String("good"); // "good" is also in the string constant pool
    int anInt = 1;
    char[] ch = {'t', 'e', 's', 't'};

    public void change(String str, char ch[], int i) {
        System.out.println("======" + str); // ======good
        // String is immutable. The parameter str (no this) lives on the stack and is
        // rebound to "test ok" in the string constant pool; the field this.str is untouched.
        str = "test ok";
        System.out.println("str======" + str);           // str======test ok
        System.out.println("this.str======" + this.str); // this.str======good
        ch[0] = 'b';
        anInt = 2;
    }

    public void change2(String str, char ch[], int i) {
        System.out.println("======" + str); // ======good
        // An explicit this assigns to the field of the object.
        this.str = "test ok";
        System.out.println("str======" + str);           // str======good
        System.out.println("this.str======" + this.str); // this.str======test ok
        ch[0] = 'b';
        anInt = 2;
    }

    public void change3(String a, char ch[], int i) {
        System.out.println("======" + str); // ======good
        // The parameter is named a, so str here refers to the field this.str.
        str = "test ok";
        System.out.println("======" + str); // ======test ok
        ch[0] = 'b';
        this.anInt = 2;
    }

    public static void main(String[] args) {
        /* Output: ======good, str======test ok, this.str======good, good, best, 2 */
        StringExer ex = new StringExer();
        ex.change(ex.str, ex.ch, ex.anInt);
        System.out.println(ex.str);   // good
        System.out.println(ex.ch);    // best
        System.out.println(ex.anInt); // 2

        /* Output: ======good, str======good, this.str======test ok, test ok, best, 2 */
        StringExer ex2 = new StringExer();
        ex2.change2(ex2.str, ex2.ch, ex2.anInt);
        System.out.println(ex2.str);   // test ok
        System.out.println(ex2.ch);    // best
        System.out.println(ex2.anInt); // 2

        /* Output: ======good, ======test ok, test ok, best, 2 */
        StringExer ex3 = new StringExer();
        ex3.change3(ex3.str, ex3.ch, ex3.anInt);
        System.out.println(ex3.str);   // test ok
        System.out.println(ex3.ch);    // best
        System.out.println(ex3.anInt); // 2
    }
}

From char[] to byte[]

Motivation for the change: openjdk.java.net/jeps/254

The current implementation of the String class stores characters in a char array, using two bytes (16 bits) per character. Data collected from many different applications shows that strings are a major component of heap usage, and that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, so half of the space in the internal char array of such a String object goes unused.

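JDK 7: String declares private final char value[];

JDK 11: String declares private final byte[] value; together with private final byte coder;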

  1. Data of type char takes up two bytes in the JVM and uses UTF-16 encoding.

The JVM specification describes it as follows:

char, whose values are 16-bit unsigned integers representing Unicode code points in the Basic Multilingual Plane, encoded with UTF-16, and whose default value is the null code point ('\u0000').

So when a char[] represents a String, every character occupies two bytes even if it could be represented in a single byte, and in practice single-byte characters are by far the most common.

  2. Optimization to byte[]

Switching to byte[] is not enough on its own; the key is to also support the ISO-8859-1/Latin-1 encoding (Latin-1 is ISO-8859-1).

The Latin-1 encoding represents each character in a single byte, which saves half the space compared with the two bytes of UTF-16.

The String class has a coder field that indicates whether the encoding is UTF-16 or Latin-1.

  3. Java automatically chooses an encoding, either UTF-16 or Latin-1, based on the contents of the string.
  • With Latin-1 encoding, the following string takes 4 bytes, whereas the original char[] took 8 bytes:
String name = "jack";
  • For the following string, the byte[] still uses UTF-16 encoding, so no space is saved compared with before (Latin-1 supports only a limited character set that does not include Chinese characters, which is why UTF-16 is retained):
String name = "小明";
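The size difference can be estimated without looking inside String. The following is a minimal sketch, not the JDK's implementation: the class and method names are hypothetical, and it assumes that a compact string uses one byte per character exactly when every character is encodable in ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class CompactStringSketch {
    /** True if every character fits in Latin-1, so one byte per character is enough. */
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    /** Estimated bytes of character data for a byte[]-backed string. */
    static int compactBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    /** Bytes of character data for a char[]-backed string (always 2 per char). */
    static int charArrayBytes(String s) {
        return 2 * s.length();
    }

    public static void main(String[] args) {
        String latin = "jack";
        String chinese = "\u5c0f\u660e"; // the two Chinese characters from the example above
        System.out.println(compactBytes(latin) + " vs " + charArrayBytes(latin));     // 4 vs 8
        System.out.println(compactBytes(chinese) + " vs " + charArrayBytes(chinese)); // 4 vs 4
    }
}
```

The Latin-1 string halves its storage, while the string with Chinese characters needs the same number of bytes under both representations.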

Conclusion:

String no longer stores its characters in a char[]; it uses a byte[] plus an encoding flag, which saves space.

String-related classes such as StringBuffer and StringBuilder, as well as the virtual machine itself, were modified accordingly.

String constant pool (StringTable)

  • The string constant pool never stores two strings with the same content.

  • The string constant pool is a fixed-size hashtable. If you put too many strings into the String Pool, hash collisions become frequent and the bucket chains grow long; the immediate effect of long chains is that calls to String.intern() become much slower.

  • -XX:StringTableSize sets the length (number of buckets) of the StringTable.

  • In JDK 6, the StringTable is fixed at a length of 1009, so performance degrades quickly when the constant pool holds too many strings. There is no constraint on the StringTableSize setting.

  • In JDK 7, the default length of the StringTable is 60013. There is no constraint on the StringTableSize setting.

  • In JDK 8, the default value is 60013, and 1009 is the minimum value that can be set. If a smaller value is given, the JVM reports an error and the setting fails.
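To see which StringTableSize a running JVM actually uses, the flag can be read at run time. This is a minimal sketch that assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name is made up, and the value printed depends on the JDK version and on any -XX:StringTableSize passed on the command line.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class StringTableSizeProbe {
    /** Reads the effective StringTableSize flag of the current HotSpot JVM. */
    static long stringTableSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the JVM has no such flag
        return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```

Running it once with no options and once with, for example, -XX:StringTableSize=1009 shows the default and the overridden value respectively.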

String memory allocation

In the Java language there are eight primitive data types and one special type, String. To make these types faster and more memory-efficient at run time, each of them is backed by a constant pool.

A constant pool is like a cache provided at the Java system level. The constant pools of the eight primitive types are managed by the system; the String constant pool is special, and there are two main ways to use it.

  • Strings declared directly in double quotes are stored directly in the string constant pool.

String info = "hello";

  • String objects that are not declared in double quotes can be placed in the pool with the intern() method provided by String.

  • In short: a string assigned from a literal has its value in the string constant pool, whereas a string created with new lives in the heap.
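The difference between the two ways of using the pool shows up in reference comparisons. The sketch below uses made-up class and method names; each method returns true, which follows from the language rules for string literals and intern().

```java
class PoolUsageDemo {
    /** Two identical literals resolve to the same pooled object. */
    static boolean literalsShareOneObject() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    /** new String(...) always creates a separate object in the heap. */
    static boolean newCreatesSeparateObject() {
        String a = "hello";
        String b = new String("hello");
        return a != b && a.equals(b);
    }

    /** intern() maps a heap string back to the pooled object. */
    static boolean internReturnsPooledObject() {
        String a = "hello";
        String b = new String("hello").intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());
        System.out.println(newCreatesSeparateObject());
        System.out.println(internReturnsPooledObject());
    }
}
```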

Proving that strings live in the heap: OOM

import java.util.HashSet;
import java.util.Set;

/**
 * JDK 6:
 * -XX:PermSize=6m -XX:MaxPermSize=6m -Xms6m -Xmx6m
 *
 * JDK 8:
 * -XX:MetaspaceSize=9m -XX:MaxMetaspaceSize=9m -Xms6m -Xmx6m
 */
public class StringTest3 {
    public static void main(String[] args) {
        // The set keeps every interned string reachable so that GC cannot reclaim it.
        Set<String> set = new HashSet<String>();
        // The range of values is sufficient to produce an OOM with a 6MB PermSize or heap.
        long i = 0;
        while (true) {
            set.add(String.valueOf(i++).intern());
        }
    }
}

Using IDEA to show that strings in the constant pool are unique

public class StringTest4 {
    public static void main(String[] args) {
        System.out.println();     // 2121
        System.out.println("1");  // 2122
        System.out.println("2");
        System.out.println("3");
        System.out.println("4");
        System.out.println("5");
        System.out.println("6");
        System.out.println("7");
        System.out.println("8");
        System.out.println("9");
        System.out.println("10"); // 2131
        // The strings "1" through "10" below are not loaded again
        System.out.println("1");  // 2132
        System.out.println("2");  // 2132
        System.out.println("3");
        System.out.println("4");
        System.out.println("5");
        System.out.println("6");
        System.out.println("7");
        System.out.println("8");
        System.out.println("9");
        System.out.println("10"); // 2132
    }
}

String concatenation case analysis

Low-level analysis of string concatenation with "+"

public void test3() {
    String s1 = "a";
    String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2;
    System.out.println(s3 == s4); // false
}

Under the hood, s1 + s2 is executed as follows (the variable s is a name I made up for illustration):

1 StringBuilder s = new StringBuilder();

2 s.append("a")

3 s.append("b")

4 s.toString() (note: roughly equivalent to new String("ab"))

Note: StringBuilder is used since JDK 5.0; before JDK 5.0, StringBuffer was used.
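The four steps can be written out by hand. This is only a sketch of the JDK 8 behavior described above, with a hypothetical class name; newer compilers may translate "+" differently, but the observable result, a newly created String, is the same.

```java
class ConcatDesugarSketch {
    /** Hand-written equivalent of s1 + s2 as described in the four steps. */
    static String concat(String s1, String s2) {
        StringBuilder s = new StringBuilder(); // step 1
        s.append(s1);                          // step 2
        s.append(s2);                          // step 3
        return s.toString();                   // step 4: a new String in the heap
    }

    public static void main(String[] args) {
        String s3 = "ab";
        String s4 = concat("a", "b");
        System.out.println(s3.equals(s4)); // true: same content
        System.out.println(s3 == s4);      // false: s4 is a separate heap object
    }
}
```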

Compile-time optimization versus results determined at run time

  1. Compile-time optimization: the result lives in the string constant pool
  • Literal + literal

String s1 = "a" + "b" + "c";

public void test1() {
    // literal + literal
    String s1 = "a" + "b" + "c"; // optimized by javac at compile time: equivalent to "abc"
    String s2 = "abc";           // "abc" is in the string constant pool; its address is assigned to s2
    /*
     * After compilation, the .class file is equivalent to:
     * String s1 = "abc";
     * String s2 = "abc";
     */
    System.out.println(s1 == s2);      // true
    System.out.println(s1.equals(s2)); // true
}

You can also confirm this by opening the compiled class file directly in IDEA.

  • Constant + constant (or literal)

    public void test4() {
        final String s1 = "a";
        final String s2 = "b";
        String s3 = "ab";
        String s4 = s1 + s2;
        System.out.println(s3 == s4); // true
    }

String concatenation with + does not necessarily use a StringBuilder! If both sides of the + are string literals or constant references, compile-time optimization still applies, and no StringBuilder is involved.

When writing code, it is recommended to use final wherever it is applicable: on classes, methods, and variables of primitive or reference types.
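The effect of final on concatenation can be seen side by side. In this sketch (class and method names are made up), the only difference between the two methods is the final modifier on the operands.

```java
class ConstantFoldingDemo {
    /** final operands form a constant expression, folded to "ab" at compile time. */
    static boolean finalOperandsAreFolded() {
        final String s1 = "a";
        final String s2 = "b";
        String s3 = "ab";
        String s4 = s1 + s2; // compiled as the constant "ab"
        return s3 == s4;
    }

    /** Non-final operands are concatenated at run time into a new object. */
    static boolean variablesCreateNewObject() {
        String s1 = "a";
        String s2 = "b";
        String s3 = "ab";
        String s4 = s1 + s2; // new String in the heap
        return s3 != s4 && s3.equals(s4);
    }

    public static void main(String[] args) {
        System.out.println(finalOperandsAreFolded());   // true
        System.out.println(variablesCreateNewObject()); // true
    }
}
```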

  2. Determined at run time: equivalent to creating a new object in the heap
  • This applies as soon as a variable appears on either side of the + (anything that is not a constant, i.e. not a final string)

String s5 = s1 + "hadoop";

String s6 = "javaEE" + s2;

String s7 = s1 + s2;

public void test2() {
    String s1 = "javaEE";
    String s2 = "hadoop";
    String s3 = "javaEEhadoop";
    String s4 = "javaEE" + "hadoop"; // compile-time optimization
    // If a variable appears on either side of the +, the result is a new String
    // in the heap whose content is javaEEhadoop
    String s5 = s1 + "hadoop";
    String s6 = "javaEE" + s2;
    String s7 = s1 + s2;

    System.out.println(s3 == s4); // true
    System.out.println(s3 == s5); // false
    System.out.println(s3 == s6); // false
    System.out.println(s3 == s7); // false
    System.out.println(s5 == s6); // false
    System.out.println(s5 == s7); // false
    System.out.println(s6 == s7); // false

    // intern(): if javaEEhadoop exists in the string constant pool, return its address in the pool;
    // if it does not, place javaEEhadoop in the pool and return the address of that object.
    String s8 = s6.intern();
    System.out.println(s3 == s8); // true
}

Conclusion

  1. Concatenating constants with constants yields a result in the constant pool; the mechanism is compile-time optimization.
  2. The constant pool never contains two constants with the same content.
  3. As soon as one operand is a variable, the result is in the heap; variable concatenation is implemented with StringBuilder.
  4. If intern() is called on the result of a concatenation, a string that is not yet in the constant pool is actively placed into the pool, and the address of the pooled object is returned.

Using intern()

The intern() method looks up the current string in the string constant pool:

  • If it does not exist, the current string is placed into the constant pool and the address of the pooled string is returned.
  • If it exists, the address of the string in the constant pool is returned.
(The returned address is assigned to myInfo)

String myInfo = new String("I love you").intern();

(The returned address is discarded: the call only tries to place the string in the string constant pool)

new String("I love you").intern();

If String.intern() is called on any string, the instance it returns must be the same instance as the string that appears directly as a constant. Therefore, the expression ("a" + "b" + "c").intern() == "abc" is true.

In plain terms, an interned string guarantees that only one copy of the string exists in memory, which saves memory and speeds up string operations. Note that this copy is stored in the String Intern Pool.
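The guarantee can be verified directly. This sketch uses hypothetical class and method names; all three methods return true because intern() always returns the canonical instance for a given content.

```java
class InternDemo {
    /** The interned string is the very same object as the literal. */
    static boolean internedEqualsLiteral() {
        String myInfo = new String("I love you").intern();
        return myInfo == "I love you";
    }

    /** The expression from the text above. */
    static boolean foldedInternEqualsLiteral() {
        return ("a" + "b" + "c").intern() == "abc";
    }

    /** Two distinct heap strings with equal content intern to one object. */
    static boolean equalContentInternsToSameObject() {
        String x = new StringBuilder("Bei").append("jing").toString();
        String y = new StringBuilder("Beij").append("ing").toString();
        return x != y && x.intern() == y.intern();
    }

    public static void main(String[] args) {
        System.out.println(internedEqualsLiteral());
        System.out.println(foldedInternEqualsLiteral());
        System.out.println(equalContentInternsToSameObject());
    }
}
```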

intern() differences between versions

In JDK 1.6, intern() tries to put the string object into the string pool.

  • If the string is already in the pool, nothing is added, and the address of the existing object in the pool is returned.
  • If not, it makes a copy of this object, puts the copy into the string pool, and returns the address of the object in the pool.

Since JDK 1.7, intern() tries to put the string object into the string pool.

  • If the string is already in the pool, nothing is added, and the address of the existing object in the pool is returned.
  • If not, it copies the reference address of the object in the heap, puts that reference into the string pool, and returns the address referenced in the pool.
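The JDK 1.7+ rule can be probed with a string that is very unlikely to be in the pool already. This is a sketch under stated assumptions: the class name and the "intern-demo-" prefix are made up, and the identity result printed in main is what the description above predicts for HotSpot on JDK 7 or later (on JDK 6 the pool would hold a copy instead). The two helper methods check properties that hold on every version.

```java
class InternVersionSketch {
    /** Builds a string at run time whose content should not yet be in the pool. */
    static String freshString() {
        return new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
    }

    /** On every version, intern() returns a string with the same content. */
    static boolean internPreservesContent() {
        String s = freshString();
        return s.intern().equals(s);
    }

    /** On every version, repeated intern() calls return the same object. */
    static boolean internIsStable() {
        String s = freshString();
        return s.intern() == s.intern();
    }

    public static void main(String[] args) {
        String s = freshString();
        String pooled = s.intern();
        // JDK 7+: the pool stores a reference to s itself, so the same object is expected.
        // JDK 6: the pool stores a copy, so a different object is expected.
        System.out.println("same object as heap string: " + (pooled == s));
    }
}
```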

How many objects does new String("a") create?

Unless otherwise noted, all examples run on JDK 8.

  1. How many objects does new String("a") create?

ldc: the bytecode instruction that puts the string into the string constant pool.

  • Object 1: the String object created in the heap by the new keyword
  • Object 2: the object "a" in the string constant pool
  2. How many objects does new String("a") + new String("b") create?

StringBuilder's toString() executes new String(value, 0, count). Its bytecode contains no ldc instruction.

The bytecode for new String(value, 0, count) differs from that of new String("a"). new String("a") takes an explicit argument "a", which is a string constant, so an "a" is placed in the pool. (Why is an object placed in the pool rather than a reference? Because the ldc instruction for "a" runs before the constructor is called, which is equivalent to the literal "a" already existing before the new.)

  • Object 1: new StringBuilder()
  • Object 2: new String("a")
  • Object 3: "a" in the constant pool
  • Object 4: new String("b")
  • Object 5: "b" in the constant pool
  • Object 6: new String("ab"), created by StringBuilder's toString(). toString() builds the String from the builder's internal array rather than from a literal, so no "ab" is generated in the string constant pool.

intern() scenario analysis

Example 1

public static void main(String[] args) {
    String s = new String("a") + new String("b"); // new String("ab")
    // After the line above, "ab" is not in the string constant pool.
    // jdk6: intern() creates a string "ab" in the string pool.
    // jdk8: intern() does not create "ab" in the string pool; it stores a reference
    //       to new String("ab") in the pool and returns that reference.
    String s2 = s.intern();

    System.out.println(s2 == "ab"); // jdk6: true   jdk8: true
    System.out.println(s == "ab");  // jdk6: false  jdk8: true
    System.out.println(s == s2);    // jdk8: true
}

Note: in jdk8, intern() does not create the string "ab" in the string constant pool; it stores a reference to the new String("ab") object in the heap and returns this reference.

In jdk6, intern() creates a separate "ab" object in the string pool, so s and s2 refer to different objects.

Example 2

public class StringIntern {
    public static void main(String[] args) {
        String s = new String("1");
        s.intern(); // "1" already exists in the string constant pool before this call, so it does nothing
        String s2 = "1";
        System.out.println(s == s2); // jdk6: false  jdk7/8: false

        // s3 records the address of new String("11"); the constant pool contains "1"
        String s3 = new String("1") + new String("1");
        // After the line above, does "11" exist in the string constant pool? No!
        // The next call generates "11" in the string constant pool.
        //   jdk6: creates a new object "11" with a new address.
        //   jdk7: does not create "11" in the pool; stores an address pointing to
        //         new String("11") in the heap.
        s3.intern();
        String s4 = "11"; // s4 records the address of the "11" placed in the pool by the previous line
        System.out.println(s3 == s4); // jdk6: false  jdk7/8: true
    }
}

Example 3

public class StringIntern1 {
    public static void main(String[] args) {
        // new String("11"): s3 points to the object in the heap
        String s3 = new String("1") + new String("1");
        // After the line above, does "11" exist in the string constant pool? No!
        String s4 = "11"; // generates the object "11" in the string constant pool
        // The pool already contains "11", so intern() returns the address of the pooled object
        String s5 = s3.intern();
        System.out.println(s3 == s4); // false
        System.out.println(s5 == s4); // true
    }
}

Meituan: deep analysis of String#intern

An efficiency test

public class StringIntern2 {
    static final int MAX_COUNT = 1000 * 10000;
    static final String[] arr = new String[MAX_COUNT];

    public static void main(String[] args) {
        Integer[] data = new Integer[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_COUNT; i++) {
            arr[i] = new String(String.valueOf(data[i % data.length])); // 5651ms
            // arr[i] = new String(String.valueOf(data[i % data.length])).intern(); // 734ms
        }
        long end = System.currentTimeMillis();
        System.out.println("time spent: " + (end - start));

        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.gc();
    }
}
  1. Without intern()

Time spent: 5651

  2. With intern()

Time spent: 765

In this test, the version that uses intern() runs faster and uses less memory.

Large website platforms need to keep a large number of strings in memory. On social networking sites, for example, many users store the same information, such as Beijing or Haidian District. If these strings are passed through intern(), memory usage drops significantly.

Garbage collection of the string constant pool

/**
 * String garbage collection:
 * -Xms15m -Xmx15m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails
 */
public class StringGCTest {
    public static void main(String[] args) {
        for (int j = 0; j < 100000; j++) {
            String.valueOf(j).intern();
        }
    }
}

Run-time parameters: -Xms15m -Xmx15m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails

Result: GC occurs

String deduplication in G1

  1. Measurements on many Java applications, large and small, yielded the following results:
  • String objects account for 25% of the live data set on the heap
  • Duplicate String objects account for 13.5% of the live data set on the heap
  • The average length of a String is 45
  2. The bottleneck for many large-scale Java applications is memory, and measurements show that in these applications almost 25% of the live data set on the Java heap consists of String objects. Furthermore, almost half of these String objects are duplicates, where duplicate means string1.equals(string2) is true. Duplicate String objects on the heap are simply a waste of memory. This project implements automatic, continuous deduplication of duplicate String objects in the G1 garbage collector to avoid that waste.

  3. The deduplication process

  • While the garbage collector is working, it visits the live objects on the heap. Each visited object is checked to see whether it is a candidate String for deduplication.
  • If so, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes the queue. Processing a queue element means removing it from the queue and then attempting to deduplicate the String it references.
  • A hashtable records all the unique char arrays used by String objects. During deduplication, the hashtable is searched to see whether an identical char array already exists on the heap.
  • If it does, the String object is adjusted to reference that array, which releases the reference to the original array; the original array is eventually reclaimed by the garbage collector.
  • If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
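The hashtable lookup at the heart of this process can be modeled in a few lines. This is an illustrative sketch only, not G1's implementation: the class and method names are made up, a plain HashMap stands in for the collector's internal table, and the caller plays the role of the background thread by swapping in the returned array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class DedupTableSketch {
    /** Wraps a char[] so that it is hashed and compared by content. */
    private static final class Key {
        final char[] chars;

        Key(char[] chars) {
            this.chars = chars;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(chars, ((Key) o).chars);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(chars);
        }
    }

    private final Map<Key, char[]> table = new HashMap<>();

    /**
     * Returns the canonical array for the candidate's content. If an identical
     * array was seen before, that one is returned so the caller can share it;
     * otherwise the candidate is recorded and returned as is.
     */
    char[] deduplicate(char[] candidate) {
        char[] existing = table.putIfAbsent(new Key(candidate), candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        DedupTableSketch table = new DedupTableSketch();
        char[] first = "Beijing".toCharArray();
        char[] second = "Beijing".toCharArray();
        System.out.println(table.deduplicate(first) == first);  // true: first occurrence is recorded
        System.out.println(table.deduplicate(second) == first); // true: duplicate maps to the first array
    }
}
```

Once a String is pointed at the shared array, nothing references the duplicate array any more, so an ordinary garbage collection can reclaim it.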
  4. Command-line options
  • UseStringDeduplication (bool): enables String deduplication. It is disabled by default and must be enabled manually.
  • PrintStringDeduplicationStatistics (bool): prints detailed deduplication statistics.
  • StringDeduplicationAgeThreshold (uintx): String objects that reach this age are considered candidates for deduplication.