Notes source: the complete JVM tutorial from Silicon Valley (millions of views, the most popular on the web), in which Song Hongkang explains the Java Virtual Machine in detail
Updates:
gitee.com/vectorx/NOT…
codechina.csdn.net/qq_35925558…
github.com/uxiahnan/NO…
10. StringTable
10.1. Basic features of String
- String: a string literal is written as a pair of double quotes, e.g. "abc"
- String is declared final and cannot be subclassed
- String implements the Serializable interface: strings support serialization
- String implements the Comparable interface: strings can be compared and ordered
- In JDK 8 and earlier, String stores its data in a final char[] value field; from JDK 9 on, it uses a byte[] instead
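These declarations can be checked on whatever JDK you are running with reflection. The sketch below is illustrative only (`StringTraits` and its methods are hypothetical names): it reports whether `String` is final, which of the interfaces above it implements, and the type of its private `value` field, which is an implementation detail that could change in future JDKs.

```java
import java.io.Serializable;
import java.lang.reflect.Modifier;

// Illustrative helper: inspects how java.lang.String is declared on the running JDK.
class StringTraits {
    static boolean isFinal() {
        return Modifier.isFinal(String.class.getModifiers());
    }

    static boolean isSerializable() {
        return Serializable.class.isAssignableFrom(String.class);
    }

    static boolean isComparable() {
        return Comparable.class.isAssignableFrom(String.class);
    }

    // Type of the private "value" field: "byte[]" on JDK 9+, "char[]" on JDK 8 and earlier.
    static String valueFieldType() {
        try {
            return String.class.getDeclaredField("value").getType().getSimpleName();
        } catch (NoSuchFieldException e) {
            return "unknown"; // the field name is an implementation detail
        }
    }

    public static void main(String[] args) {
        System.out.println("final: " + isFinal());
        System.out.println("Serializable: " + isSerializable());
        System.out.println("Comparable: " + isComparable());
        System.out.println("value field: " + valueFieldType());
        // Comparable gives strings a natural (lexicographic) order
        System.out.println("apple".compareTo("banana") < 0); // true
    }
}
```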
10.1.1. Changes to String's storage structure in JDK 9
JEP 254: Compact Strings (java.net)
Motivation
The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.
Description
We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.
String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.
This is purely an implementation change, with no changes to existing public interfaces. There are no plans to add any new public APIs or other interfaces.
The prototyping work done to date confirms the expected reduction in memory footprint, substantial reductions of GC activity, and minor performance regressions in some corner cases.
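Conclusion: from JDK 9 on, String data is no longer stored in a char[] but in a byte[] plus an encoding flag, which saves space for strings that contain only Latin-1 characters.
public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    @Stable
    private final byte[] value;   // character data, as Latin-1 or UTF-16 bytes
    private final byte coder;     // encoding flag: LATIN1 or UTF16
    // ...
}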
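To make the storage rule concrete, here is a simplified model of the decision JEP 254 describes. It is an illustrative sketch, not the JDK's implementation: `CompactStringModel`, `coderFor` and `backingArrayLength` are hypothetical names, and the model ignores object headers as well as the fact that compact strings can be turned off with -XX:-CompactStrings. A string whose chars all fit in one byte (values up to 0xFF) is stored as Latin-1; anything else falls back to UTF-16.

```java
// Hypothetical model of JEP 254's storage decision; not the real java.lang.String code.
class CompactStringModel {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    // Picks the encoding flag based on the contents of the string.
    static byte coderFor(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16; // at least one char needs two bytes
            }
        }
        return LATIN1;
    }

    // Length of the backing byte[] under this model.
    static int backingArrayLength(String s) {
        return coderFor(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(backingArrayLength("hello"));        // 5 (Latin-1)
        System.out.println(backingArrayLength("h\u00e9llo"));   // 5 (U+00E9 still fits in Latin-1)
        System.out.println(backingArrayLength("\u4f60\u597d")); // 4 (UTF-16)
        // With a char[] (JDK 8), "hello" would need 5 chars = 10 bytes.
    }
}
```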
10.1.2. Basic features of String
String represents an immutable sequence of characters, a property known as immutability.
- When a string variable is reassigned, the new value is written to a newly allocated memory region; the original value is not modified in place.
- When an existing string is concatenated with something else, a new memory region is likewise allocated instead of modifying the original value.
- When replace() is called to change a character or substring, the result is also placed in a new memory region; the original value is left unchanged.
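A small illustration of these three rules: neither concatenation through a second reference nor replace() changes the original object. `ImmutabilityDemo` is just an illustrative class name.

```java
// Demonstrates that "modifying" a String always produces a new object.
class ImmutabilityDemo {
    // Returns {original, result} after appending through a second reference.
    static String[] concatenate() {
        String s1 = "abc";
        String s2 = s1;   // both references point to the same object
        s2 += "def";      // creates a new String; the object s1 refers to is unchanged
        return new String[] {s1, s2};
    }

    // Returns {original, result} after calling replace().
    static String[] replace() {
        String s3 = "abc";
        String s4 = s3.replace('a', 'm'); // returns a new String
        return new String[] {s3, s4};
    }

    public static void main(String[] args) {
        String[] c = concatenate();
        System.out.println(c[0] + " " + c[1]); // abc abcdef
        String[] r = replace();
        System.out.println(r[0] + " " + r[1]); // abc mbc
    }
}
```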
When a string is assigned through a literal (as opposed to new), the value is stored in the string constant pool.
The string constant pool never stores two strings with the same contents.
The String Pool is a fixed-size Hashtable; in JDK 6 its default size is 1009. If too many strings are put into the pool, hash collisions become frequent and the bucket lists grow long, and the direct effect of long lists is that calls to String.intern become slower.
Use -XX:StringTableSize to set the length of the StringTable:
- In JDK 6, the StringTable has a fixed length of 1009, so efficiency drops quickly if the constant pool holds too many strings. StringTableSize can be set without restriction.
- In JDK 7, the default StringTable length is 60013. StringTableSize can still be set without restriction.
- In JDK 8, the default is still 60013, and 1009 is the minimum value StringTableSize can be set to.
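To see which StringTableSize a running JVM actually uses, you can query the flag at run time. This sketch assumes a HotSpot-based JDK, because `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific; the class name `StringTableSizeCheck` is illustrative, and the printed default depends on the JDK version and on any -XX:StringTableSize you pass.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads the StringTableSize flag of the running HotSpot JVM.
class StringTableSizeCheck {
    static long stringTableSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist
        return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
    }

    public static void main(String[] args) {
        // Try e.g.: java -XX:StringTableSize=100009 StringTableSizeCheck
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```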
10.2. Memory Allocation for String
The Java language has eight primitive data types plus a special type, String. To make them faster and more memory-efficient at runtime, all of these types provide the concept of a constant pool.
A constant pool is like a cache provided at the Java system level. The constant pools for the eight primitive types are managed by the system, but the String constant pool is special. It is mainly used in two ways:
- Strings declared directly with double quotes are stored directly in the constant pool.
- A String object that is not declared with double quotes can be put into the pool with the intern() method provided by String. This is covered in detail later.
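The two ways of getting a string into the pool can be compared with reference equality (==). In this illustrative sketch (`PoolDemo` is a hypothetical class name), two identical literals share one pooled instance, `new` creates a separate heap object, and intern() hands back the pooled instance.

```java
import java.util.Arrays;

// Compares references (==) for literals, new String(...) and intern().
class PoolDemo {
    // Returns {a == b, a == c, a == d, a.equals(c)}.
    static boolean[] compare() {
        String a = "hello";             // literal: taken from the string constant pool
        String b = "hello";             // same literal: the same pooled instance
        String c = new String("hello"); // new: a distinct object in the heap
        String d = c.intern();          // intern(): returns the pooled instance
        return new boolean[] {a == b, a == c, a == d, a.equals(c)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, false, true, true]
    }
}
```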
In Java 6 and earlier, the string constant pool was stored in the permanent generation.
In Java 7, Oracle engineers made a major change to the string pool logic by moving the string constant pool into the Java heap:
- All strings are stored in the heap, just like any other ordinary object, so only the heap size needs to be adjusted when tuning an application.
- The string constant pool concept was already widely used, but this change gives us enough reason to reconsider using String.intern() in Java 7.
In Java 8, the permanent generation was replaced by metaspace, and the string constant pool remains in the heap.
Why was the StringTable moved?
Java SE 7 Features and Enhancements (oracle.com)
Synopsis: In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
10.3. Basic operations on String
@Test
public void test1() {
    System.out.println("1"); // 2321
    System.out.println("2");
    System.out.println("3");
    System.out.println("4");
    System.out.println("5");
    System.out.println("6");
    System.out.println("7");
    System.out.println("8");
    System.out.println("9");
    System.out.println("10"); // 2330
    System.out.println("1"); // 2321
    System.out.println("2"); // 2322
    System.out.println("3");
    System.out.println("4");
    System.out.println("5");
    System.out.println("6");
    System.out.println("7");
    System.out.println("8");
    System.out.println("9");
    System.out.println("10"); // 2330
}
The Java Language Specification requires that identical string literals (that is, literals containing the same sequence of Unicode code points) refer to the same String instance.
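This rule holds across class boundaries, not only within one method. In the sketch below, `LiteralHolderA` and `LiteralHolderB` are hypothetical classes that each contain the literal "hello"; both literals resolve to the same pooled instance, while a string assembled at run time is a separate object until it is interned.

```java
// Two unrelated classes that each contain the literal "hello".
class LiteralHolderA {
    static String greeting() { return "hello"; }
}

class LiteralHolderB {
    static String greeting() { return "hello"; }
}

class LiteralIdentityDemo {
    // A string with the same contents, built at run time (a new heap object).
    static String computed() {
        return new StringBuilder("hel").append("lo").toString();
    }

    public static void main(String[] args) {
        System.out.println(LiteralHolderA.greeting() == LiteralHolderB.greeting()); // true
        System.out.println(computed() == LiteralHolderA.greeting());                // false
        System.out.println(computed().intern() == LiteralHolderA.greeting());       // true
    }
}
```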
class Memory {
public static void main(String[] args) {//line 1
int i= 1;//line 2
Object obj = new Object();//line 3
Memory mem = new Memory();//Line 4
mem.foo(obj);//Line 5
}//Line 9
private void foo(Object param) {//line 6
String str = param.toString();//line 7
System.out.println(str);
}//Line 8
}
10.4. String concatenation
- Concatenating constants with constants puts the result in the constant pool; this is optimized at compile time.
- The constant pool never contains two strings with the same contents.
- If either operand is a variable, the result is a new object in the heap. Concatenation with variables is implemented with StringBuilder (with javac 8; since JDK 9, javac emits an invokedynamic call to StringConcatFactory instead, per JEP 280, but the result is still a new heap object).
- If intern() is called on the result of a concatenation, it puts the string into the pool if it is not already there and returns the address of the pooled object.
Example 1
public static void test1() {
    // All operands are constants, so the compiler folds this
    // into the equivalent of: String s1 = "abc";
    String s1 = "a" + "b" + "c";
    String s2 = "abc";
    // s1 and s2 refer to the same string in the string constant pool
    System.out.println(s1 == s2); // true
}
Example 2
public static void test5() {
    String s1 = "javaEE";
    String s2 = "hadoop";
    String s3 = "javaEEhadoop";
    String s4 = "javaEE" + "hadoop";
    String s5 = s1 + "hadoop";
    String s6 = "javaEE" + s2;
    String s7 = s1 + s2;
    System.out.println(s3 == s4); // true: compile-time optimization
    System.out.println(s3 == s5); // false: s1 is a variable, so this cannot be optimized at compile time
    System.out.println(s3 == s6); // false: s2 is a variable, so this cannot be optimized at compile time
    System.out.println(s3 == s7); // false: s1 and s2 are variables
    System.out.println(s5 == s6); // false: s5 and s6 are different object instances
    System.out.println(s5 == s7); // false: s5 and s7 are different object instances
    System.out.println(s6 == s7); // false: s6 and s7 are different object instances
    String s8 = s6.intern();
    System.out.println(s3 == s8); // true: after intern, s8, like s3, points to "javaEEhadoop" in the string constant pool
}
Example 3
public void test6() {
    String s0 = "beijing";
    String s1 = "bei";
    String s2 = "jing";
    String s3 = s1 + s2;
    System.out.println(s0 == s3); // false: s3 points to a heap object, s0 points to "beijing" in the string constant pool
    String s7 = "shanxi";
    final String s4 = "shan";
    final String s5 = "xi";
    String s6 = s4 + s5;
    System.out.println(s6 == s7); // true: s4 and s5 are final, so s6 is determined at compile time
}
- Without final, s1 and s2 are variables, so s1 + s2 in the assignment to s3 is built with new StringBuilder at run time.
- With the final modifier, s4 and s5 are constants, so the concatenation is optimized at compile time. In actual development, use final where possible.
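One caveat: final alone is not enough. A final local variable only counts as a compile-time constant when its initializer is itself a constant expression (JLS §4.12.4 "constant variables"). In this sketch, `FinalConstantDemo` and its helper `identity` are hypothetical names; calling a method in the initializer makes s5 final but not constant, so the concatenation happens at run time.

```java
class FinalConstantDemo {
    // Returns its argument; a method call is never a compile-time constant.
    static String identity(String s) {
        return s;
    }

    static boolean constantCase() {
        final String s4 = "shan";         // constant variable
        final String s5 = "xi";           // constant variable
        String s6 = s4 + s5;              // folded to "shanxi" at compile time
        return s6 == "shanxi";
    }

    static boolean nonConstantCase() {
        final String s4 = "shan";
        final String s5 = identity("xi"); // final, but the initializer is a method call
        String s6 = s4 + s5;              // concatenated at run time: a new heap object
        return s6 == "shanxi";
    }

    public static void main(String[] args) {
        System.out.println(constantCase());    // true
        System.out.println(nonConstantCase()); // false
    }
}
```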
Example 4
public void test3() {
    String s1 = "a";
    String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2;
    System.out.println(s3 == s4); // false
}
The bytecode
Looking at the bytecode for example 4 (as compiled by javac 8), we can see that s1 + s2 actually creates a new StringBuilder object, calls append to add s1 and s2, and then calls toString and assigns the result to s4. Because toString returns a new String object, s3 == s4 is false.
0 ldc #2 <a>
2 astore_1
3 ldc #3 <b>
5 astore_2
6 ldc #4 <ab>
8 astore_3
9 new #5 <java/lang/StringBuilder>
12 dup
13 invokespecial #6 <java/lang/StringBuilder.<init>>
16 aload_1
17 invokevirtual #7 <java/lang/StringBuilder.append>
20 aload_2
21 invokevirtual #7 <java/lang/StringBuilder.append>
24 invokevirtual #8 <java/lang/StringBuilder.toString>
27 astore 4
29 getstatic #9 <java/lang/System.out>
32 aload_3
33 aload 4
35 if_acmpne 42 (+7)
38 iconst_1
39 goto 43 (+4)
42 iconst_0
43 invokevirtual #10 <java/io/PrintStream.println>
46 return
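Written back as Java source, the bytecode above corresponds roughly to the method below. This is a sketch of javac 8's translation (`ConcatDesugared` and `concatLikeJavac8` are illustrative names); newer compilers use invokedynamic instead, but either way the result is a fresh heap object, which is why the reference comparison fails while equals() succeeds.

```java
class ConcatDesugared {
    // Roughly what `s1 + s2` compiles to with javac 8.
    static String concatLikeJavac8(String s1, String s2) {
        return new StringBuilder().append(s1).append(s2).toString();
    }

    public static void main(String[] args) {
        String s3 = "ab";
        String s4 = concatLikeJavac8("a", "b");
        System.out.println(s3 == s4);      // false: toString() returns a new heap object
        System.out.println(s3.equals(s4)); // true
    }
}
```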
Performance comparison of string concatenation operations
public class Test {
    public static void main(String[] args) {
        int times = 50000;

        // String
        long start = System.currentTimeMillis();
        testString(times);
        long end = System.currentTimeMillis();
        System.out.println("String: " + (end - start) + "ms");

        // StringBuilder
        start = System.currentTimeMillis();
        testStringBuilder(times);
        end = System.currentTimeMillis();
        System.out.println("StringBuilder: " + (end - start) + "ms");

        // StringBuffer
        start = System.currentTimeMillis();
        testStringBuffer(times);
        end = System.currentTimeMillis();
        System.out.println("StringBuffer: " + (end - start) + "ms");
    }

    public static void testString(int times) {
        String str = "";
        for (int i = 0; i < times; i++) {
            str += "test";
        }
    }

    public static void testStringBuilder(int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append("test");
        }
    }

    public static void testStringBuffer(int times) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < times; i++) {
            sb.append("test");
        }
    }
}
// Result:
// String: 7963ms
// StringBuilder: 1ms
// StringBuffer: 4ms
In this experiment, concatenating with + on String took about 8000 times as long as StringBuilder.append(), and StringBuffer.append() took about 4 times as long as StringBuilder.append().
As you can see, appending with StringBuilder is far faster than concatenating strings with "+". Each += in the loop allocates a new String (and, under javac 8, a new StringBuilder) and copies all the characters accumulated so far, which costs both time and memory.
In practice, then, when thread safety is not a concern, we should use StringBuilder for operations that require many or large concatenations.
What else can we do to make string building more efficient?
The no-argument StringBuilder constructor uses an initial capacity of 16. If you know in advance roughly how many characters the result will hold, specify the capacity directly with the int constructor to reduce the number of capacity expansions.
/**
 * Constructs a string builder with no characters in it and an
 * initial capacity of 16 characters.
 */
public StringBuilder() {
    super(16);
}

/**
 * Constructs a string builder with no characters in it and an
 * initial capacity specified by the {@code capacity} argument.
 *
 * @param capacity the initial capacity.
 * @throws NegativeArraySizeException if the {@code capacity}
 *         argument is less than {@code 0}.
 */
public StringBuilder(int capacity) {
    super(capacity);
}
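A presized builder in practice: when the final length is known, passing it to the constructor means the internal array never has to grow. `PresizedBuilder.repeat` is a hypothetical helper, and the sketch assumes `piece.length() * times` does not overflow an int.

```java
class PresizedBuilder {
    // Repeats `piece` `times` times, sizing the builder up front so it never has to grow.
    static StringBuilder repeat(String piece, int times) {
        StringBuilder sb = new StringBuilder(piece.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(piece);
        }
        return sb;
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity()); // 16, the default
        StringBuilder sb = repeat("test", 50000);
        System.out.println(sb.length());   // 200000
        System.out.println(sb.capacity()); // 200000: no expansion was needed
    }
}
```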
10.5. Use of intern()
Explanation in the official API documentation
public String intern()
Returns a canonical representation for the string object.
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of The Java™ Language Specification.
Returns:
a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.
intern() is a native method, implemented in the JVM's underlying C/C++ code:
public native String intern();
If a String is not declared with double quotes, you can use String's intern() method: it looks up the current string in the string constant pool and, if it is not there, puts it into the pool.
String myInfo = new String("I love atguigu").intern();
In other words, calling String.intern() on any string returns exactly the same instance as the literal with the same contents. Therefore, the following expression must evaluate to true:
("a" + "b" + "c").intern() == "abc"
In plain terms, interning ensures that there is only one copy of each distinct string in the pool. This saves memory and speeds up some string operations. Note that these values are stored in the String Intern Pool.
Use of intern(): JDK 6 vs JDK 7/8
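public class StringIntern {
    public static void main(String[] args) {
        /*
         * ① String s = new String("1") creates two objects:
         *    a new object in the heap, and
         *    the string constant "1" in the string constant pool (so "1" is already in the pool)
         * ② s.intern() adds nothing, because "1" is already in the string constant pool
         * ③ s2 refers to the address of "1" in the string constant pool
         * s refers to the heap object and s2 to the pooled object, so they differ
         */
        String s = new String("1");
        s.intern();
        String s2 = "1";
        System.out.println(s == s2); // jdk1.6 false, jdk7/8 false

        /*
         * ① String s3 = new String("1") + new String("1")
         *    is equivalent to new String("11"), but the string "11" is NOT created in the constant pool
         * ② s3.intern()
         *    because there is no "11" in the constant pool yet, "11" is put into the pool:
         *    JDK 6 stores a copy of the object (a new object with a new address);
         *    JDK 7/8 stores the address of the heap object that s3 refers to
         * ③ String s4 = "11"
         *    s4 refers to the "11" entry in the constant pool
         */
        String s3 = new String("1") + new String("1");
        s3.intern();
        String s4 = "11";
        System.out.println(s3 == s4); // jdk1.6 false, jdk7/8 true
    }
}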
To summarize the use of String.intern():
In JDK 1.6, intern() tries to put this string object into the string pool:
- If the pool already contains an equal string, nothing is added, and the address of the existing pooled object is returned.
- If not, the object is copied, the copy is put into the pool, and the address of the copy in the pool is returned.
Since JDK 1.7, intern() tries to put this string object into the string pool:
- If the pool already contains an equal string, nothing is added, and the address of the existing pooled object is returned.
- If not, the object's reference address is copied into the pool, and that reference is returned.
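The JDK 1.7+ rule can be observed directly: intern a string whose contents are not yet in the pool, and the pool hands back the very same heap object. This sketch assumes HotSpot on JDK 7 or later (on JDK 6 the first comparison would be false); `InternReferenceDemo` and `freshString` are illustrative names, and the nanoTime suffix is only there to make pre-existing pool entries extremely unlikely.

```java
class InternReferenceDemo {
    // Builds a string at run time whose contents are very unlikely to be pooled already.
    static String freshString() {
        return new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
    }

    public static void main(String[] args) {
        String s = freshString();             // heap object; contents not in the pool yet
        String pooled = s.intern();           // JDK 7+: the pool records a reference to s itself
        System.out.println(pooled == s);      // true on JDK 7+

        String t = new String(s);             // another heap object with the same contents
        System.out.println(t.intern() == s);  // true: the pool already refers to s
        System.out.println(t == s);           // false
    }
}
```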
Exercise 1
Exercise 2
10.5.2. Efficiency test of intern(): the space perspective
In our test, there was a big difference between using intern() and not using it:
public class StringIntern2 {
    static final int MAX_COUNT = 1000 * 10000;
    static final String[] arr = new String[MAX_COUNT];

    public static void main(String[] args) {
        Integer[] data = new Integer[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_COUNT; i++) {
            // arr[i] = new String(String.valueOf(data[i % data.length]));
            arr[i] = new String(String.valueOf(data[i % data.length])).intern();
        }
        long end = System.currentTimeMillis();
        System.out.println("Time taken: " + (end - start) + "ms");

        // Keep the JVM alive so heap usage can be inspected with a profiler
        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
// Results: without intern: 7256ms; with intern: 1395ms
Conclusion: For programs that use a lot of existing strings, especially if there are many strings that are already repeated, using intern() can save memory.
Large web platforms need to keep large numbers of strings in memory. On a social networking site, for example, many users' profiles contain the same place names, such as Beijing or Haidian District. If all of these strings are interned, memory usage drops significantly.
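Why the interned version uses less memory can be shown by counting distinct objects instead of measuring the heap. The sketch fills an array the same way StringIntern2 does, then counts how many separate String objects it holds using an identity-based set. `InternIdentityCount` and `distinctObjects` are illustrative names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternIdentityCount {
    // Fills an array with the values "1".."10" and counts distinct String objects (by identity).
    static int distinctObjects(int count, boolean intern) {
        String[] arr = new String[count];
        for (int i = 0; i < count; i++) {
            String s = new String(String.valueOf(i % 10 + 1));
            arr[i] = intern ? s.intern() : s; // interned copies collapse to one pooled object per value
        }
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(identities, arr);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(100_000, false)); // 100000 separate objects
        System.out.println(distinctObjects(100_000, true));  // 10 shared, pooled objects
    }
}
```

Without intern() every array slot keeps its own object alive; with intern() the temporary objects become garbage and only ten pooled instances remain reachable.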
10.6. Garbage collection of StringTable
public class StringGCTest {
    /**
     * -Xms15m -Xmx15m -XX:+PrintGCDetails
     */
    public static void main(String[] args) {
        for (int i = 0; i < 100000; i++) {
            String.valueOf(i).intern();
        }
    }
}
Output:
[GC (Allocation Failure) [PSYoungGen: 4096K->504K(4608K)] 4096K->1689K(15872K), 0.0581583 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
[GC (Allocation Failure) [PSYoungGen: 4600K->504K(4608K)] 5785K->2310K(15872K), 0.0015621 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 4600K->504K(4608K)] 6406K->2350K(15872K), 0.0034849 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Heap
 PSYoungGen      total 4608K, used 1919K [0x00000000ffb00000,0x0000000100000000,0x0000000100000000)
  eden space 4096K, 34% used [0x00000000ffb00000,0x00000000ffc61d30,0x00000000fff00000)
  from space 512K, 98% used [0x00000000fff00000,0x00000000fff7e010,0x00000000fff80000)
  to   space 512K, 0% used [0x00000000fff80000,0x00000000fff80000,0x0000000100000000)
 ParOldGen       total 11264K, used 1846K [0x00000000ff000000,0x00000000ffb00000,0x00000000ffb00000)
  object space 11264K, 16% used [0x00000000ff000000,0x00000000ff1cd9b0,0x00000000ffb00000)
 Metaspace       used 3378K, capacity 4496K, committed 4864K, reserved 1056768K
  class space    used 361K, capacity 388K, committed 512K, reserved 1048576K
Although 100,000 strings were interned, the young-generation collections (Allocation Failure) kept total heap usage at about 2.3 MB after the last GC, which indicates that interned strings that are no longer referenced are reclaimed by the garbage collector.
10.7. String deduplication in G1
JEP 192: String Deduplication in G1 (java.net)
Motivation
Many large-scale Java applications are currently bottlenecked on memory. Measurements have shown that roughly 25% of the Java heap live data set in these types of applications is consumed by String objects. Further, roughly half of those String objects are duplicates, where duplicates means string1.equals(string2) is true. Having duplicate String objects on the heap is, essentially, just a waste of memory. This project will implement automatic and continuous String deduplication in the G1 garbage collector to avoid wasting memory and reduce the memory footprint.
Note that "duplicate" here refers to String objects in the heap, not in the constant pool, because the constant pool itself never holds duplicates.
Background: tests on a number of Java applications, both large and small, produced the following results:
- Strings make up 25% of the heap's live data set
- Duplicate strings make up 13.5% of the heap's live data set
- The average string length is 45
The bottleneck for many large-scale Java applications is memory. Tests show that in these applications about 25% of the live data set in the Java heap consists of strings, and almost half of those strings are duplicates, meaning string1.equals(string2) == true. Duplicate strings on the heap are simply a waste of memory. This project implements automatic and continuous deduplication of strings in the G1 garbage collector to avoid wasting memory.
Implementation
- While the garbage collector runs, it visits the live objects on the heap. Each visited object is checked to see whether it is a candidate String for deduplication.
- If it is, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes the queue. Processing an element means removing it from the queue and then trying to deduplicate the string it references.
- A hashtable records all the unique char arrays used by strings. When deduplicating, the hashtable is checked to see whether an identical char array already exists on the heap.
- If it does, the String is adjusted to refer to that array, releasing its reference to the original array, which is eventually reclaimed by the garbage collector.
- If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
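The hashtable step above can be modeled in plain Java. This is a hypothetical, single-threaded sketch of the idea, not HotSpot's implementation: `DedupModel`, `FakeString` and `deduplicate` are made-up names, `FakeString` stands in for a String whose backing array reference can be swapped, and the queue and GC age checks are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of G1 string deduplication (JEP 192); not HotSpot code.
class DedupModel {
    // Stand-in for a String object: a mutable reference to its backing array.
    static final class FakeString {
        char[] value;
        FakeString(String s) { this.value = s.toCharArray(); }
    }

    // Key that compares char arrays by content rather than by identity.
    private static final class ArrayKey {
        final char[] chars;
        ArrayKey(char[] chars) { this.chars = chars; }
        @Override public boolean equals(Object o) {
            return o instanceof ArrayKey && Arrays.equals(chars, ((ArrayKey) o).chars);
        }
        @Override public int hashCode() { return Arrays.hashCode(chars); }
    }

    // Table of the unique arrays seen so far.
    private final Map<ArrayKey, char[]> table = new HashMap<>();

    // Processes one candidate: share an existing equal array, or record this one for later.
    void deduplicate(FakeString s) {
        char[] existing = table.putIfAbsent(new ArrayKey(s.value), s.value);
        if (existing != null && existing != s.value) {
            s.value = existing; // the old array is no longer referenced and becomes garbage
        }
    }

    public static void main(String[] args) {
        DedupModel dedup = new DedupModel();
        FakeString a = new FakeString("Beijing");
        FakeString b = new FakeString("Beijing");
        System.out.println(a.value == b.value); // false: two separate arrays
        dedup.deduplicate(a);
        dedup.deduplicate(b);
        System.out.println(a.value == b.value); // true: b now shares a's array
    }
}
```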
Command line options
# Enable String deduplication. It is disabled by default and must be enabled manually.
UseStringDeduplication (bool)
# Print detailed deduplication statistics
PrintStringDeduplicationStatistics (bool)
# String objects that reach this age are considered candidates for deduplication
StringDeduplicationAgeThreshold (uintx)