JVM system learning path series demo code address: github.com/mtcarpenter…

Basic features of String

  • String: a string, written between a pair of double quotes ""
    • String s1 = "str"; // defined as a literal
    • String s2 = new String("str");
  • String is declared final and cannot be inherited.
  • String implements the Serializable interface, which means strings can be serialized, and the Comparable interface, which means strings can be compared.
  • In JDK 8 and earlier, String internally defines final char[] value to store the string data. JDK 9 changed this to byte[].

The motivation given for the change: the current implementation of the String class stores characters in a char array, using two bytes (16 bits) per character. Data collected from many different applications shows that strings are a major component of heap usage, and that most String objects contain only Latin-1 characters. Such characters need only one byte of storage, so half of the space in the internal char arrays of those String objects goes unused. The proposal is to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class stores characters encoded either as ISO-8859-1/Latin-1 (one byte per character) or as UTF-16 (two bytes per character), depending on the contents of the string. The encoding flag indicates which encoding is used. Conclusion: String no longer stores its data in a char[] but in a byte[], which saves some space.

// before
private final char value[];
// after
private final byte[] value;
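To make the "byte array plus encoding flag" idea concrete, here is a minimal sketch. CompactText and its members are hypothetical names invented for this illustration; this is not the JDK's actual String code, only the same storage idea expressed with public APIs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "byte[] plus encoding flag"; not the JDK's String implementation.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per UTF-16 code unit

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        // Latin-1 is usable only if every UTF-16 code unit fits in one byte.
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        this.coder = latin1 ? LATIN1 : UTF16;
        this.value = s.getBytes(latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16BE);
    }

    byte coder() {
        return coder;
    }

    int byteLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, coder == LATIN1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(new CompactText("hello").byteLength());        // 5
        System.out.println(new CompactText("\u4f60\u597d").byteLength()); // 4
    }
}
```

A five-character Latin-1 string takes five bytes here instead of the ten a char array would need, while a string containing non-Latin-1 characters still takes two bytes per code unit.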

String-based classes such as StringBuffer and StringBuilder were modified in the same way.

Immutability of String

  • String: represents an immutable sequence of characters. Immutability, for short.
    • When a string variable is assigned a new value, a new memory area must be designated for that value; the original value cannot be modified in place.
    • When an existing string is concatenated, a new memory area is likewise designated; the original value is not modified.
    • When String's replace() method is called to change a character or substring, a new memory area is likewise designated; the original value is not modified.

When a string is assigned through a literal (as opposed to new), the string value lives in the string constant pool. For example:

import org.junit.Test;

/**
 * Demonstrates the immutability of String.
 *
 * @author shkstart
 * @create 2020  23:42
 */
public class StringTest1 {
    @Test
    public void test1() {
        String s1 = "abc"; // defined as a literal; "abc" is stored in the string constant pool
        String s2 = "abc";
        s1 = "hello";

        System.out.println(s1 == s2); // compares addresses: true before the reassignment, false now

        System.out.println(s1); // hello
        System.out.println(s2); // abc

    }

    @Test
    public void test2() {
        String s1 = "abc";
        String s2 = "abc";
        s2 += "def";
        System.out.println(s2); // abcdef
        System.out.println(s1); // abc
    }

    @Test
    public void test3() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm');
        System.out.println(s1); // abc
        System.out.println(s2); // mbc
    }
}
Output of the three tests:

false
hello
abc
--------
abcdef
abc
--------
abc
mbc

An interview question

public class StringExer {
    String str = new String("good");
    char[] ch = {'t', 'e', 's', 't'};

    public void change(String str, char ch[]) {
        str = "test ok"; // rebinds only the local parameter; ex.str is unaffected
        ch[0] = 'b';     // mutates the array that ex.ch also refers to
    }

    public static void main(String[] args) {
        StringExer ex = new StringExer();
        ex.change(ex.str, ex.ch);
        System.out.println(ex.str); // good
        System.out.println(ex.ch);  // best
    }
}

Note that the string constant pool never stores two strings with the same content.

  • The string pool is a fixed-size hashtable (the StringTable) whose default size was originally 1009. If too many strings are put into the pool, hash collisions become frequent and the bucket linked lists grow long, which directly degrades the performance of String.intern().
  • Use -XX:StringTableSize to set the length of the StringTable.
  • In JDK 6, the StringTable is fixed at a length of 1009 by default, so if the constant pool holds too many strings, efficiency drops quickly. There is no restriction on the value given to StringTableSize.
  • In JDK 7, the default StringTable length is 60013.
  • In JDK 8, 1009 is the minimum value that StringTableSize can be set to.
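The effect of the table length on chain length can be sketched with a small simulation. This is an assumption-laden illustration: StringTableBuckets and longestChain are hypothetical names, and the code uses String.hashCode() with a plain bucket counter rather than the JVM's real StringTable and its hash function.

```java
// Simulation only: counts how many strings land in each bucket of a fixed-size table.
class StringTableBuckets {
    static int longestChain(int buckets, int strings) {
        int[] chain = new int[buckets];
        int longest = 0;
        for (int i = 0; i < strings; i++) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(String.valueOf(i).hashCode(), buckets);
            longest = Math.max(longest, ++chain[index]);
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println("1009 buckets:  " + longestChain(1009, 100_000));
        System.out.println("60013 buckets: " + longestChain(60013, 100_000));
    }
}
```

With 100,000 strings and 1009 buckets, at least one chain must hold 100 or more entries, so every lookup that hits it walks a long list; the larger table keeps the chains much shorter.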

String memory allocation

  • The Java language has eight primitive data types plus one special type, String. To make these types faster and more memory-efficient at run time, they all provide a constant pool concept.
  • A constant pool is like a cache provided at the Java system level. The constant pools of the eight primitive types are coordinated by the system; the String constant pool is special. There are two main ways to use it.
    • String objects declared directly with double quotes are stored directly in the constant pool.
      • For example: String info = "atguigu.com";
    • For String objects not declared with double quotes, use the intern() method provided by String.
  • In Java 6 and earlier, the string constant pool was stored in the permanent generation.
  • In Java 7, Oracle engineers made a major change to the string pool logic: the string constant pool was moved into the Java heap.
    • All strings are stored in the heap like other ordinary objects, so only the heap size needs adjusting when tuning an application.
    • The string constant pool concept was already widely used, and this change is reason enough to reconsider using String.intern() in Java 7.
  • In Java 8, the permanent generation was replaced by the metaspace; string constants remain in the heap.
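The two ways of using the pool can be compared directly. This is a minimal sketch; PoolBasics and compare are names made up for the example.

```java
class PoolBasics {
    // Returns { literal == new object, literal == interned object }.
    static boolean[] compare() {
        String info = "atguigu.com";             // literal: taken from the string constant pool
        String copy = new String("atguigu.com"); // explicit new: a separate object on the heap
        String pooled = copy.intern();           // looks up the pooled instance with equal content
        return new boolean[] { info == copy, info == pooled };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println(result[0]); // false
        System.out.println(result[1]); // true
    }
}
```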

Why was the StringTable moved from the permanent generation to the heap?

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but in the main part of the Java heap (the young and old generations), along with the other objects created by the application. This change results in more data residing in the main Java heap and less in the permanent generation, so heap sizes may need to be adjusted. Most applications will see only relatively small differences in heap usage because of this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

  • The permanent generation is small by default.
  • The permanent generation is garbage-collected infrequently.

Basic operations on String

The Java Language Specification requires that identical string literals (constants containing the same sequence of Unicode code points) refer to the same String instance.

class Memory {
    public static void main(String[] args) {//line 1
        int i = 1;//line 2
        Object obj = new Object();//line 3
        Memory mem = new Memory();//line 4
        mem.foo(obj);//line 5
    }//line 9

    private void foo(Object param) {//line 6
        String str = param.toString();//line 7
        System.out.println(str);
    }//line 8
}


String concatenation

  • Concatenating constants with constants yields a result in the constant pool; this is a compile-time optimization.
  • The constant pool never holds two constants with the same content.
  • As soon as one operand is a variable, the result is in the heap. Variable concatenation is implemented with StringBuilder.
  • If intern() is called on the concatenation result, a string that is not yet in the constant pool is put into the pool, and the address of the pooled object is returned.

import org.junit.Test;

public class StringTest5 {
    @Test
    public void test1() {
        String s1 = "a" + "b" + "c"; // compile-time optimization: equivalent to "abc"
        String s2 = "abc"; // "abc" is in the string constant pool; its address is assigned to s2
        /*
         * In the compiled .class file this becomes:
         * String s1 = "abc";
         * String s2 = "abc";
         */
        System.out.println(s1 == s2); //true
        System.out.println(s1.equals(s2)); //true
    }

    @Test
    public void test2() {
        String s1 = "javaEE";
        String s2 = "hadoop";

        String s3 = "javaEEhadoop";
        String s4 = "javaEE" + "hadoop";// Compile time optimization
        // If a variable appears before or after the concatenation symbol, it is equivalent to a new String() in the heap space. The specific content is the concatenation result: javaEEhadoop
        String s5 = s1 + "hadoop";
        String s6 = "javaEE" + s2;
        String s7 = s1 + s2;

        System.out.println(s3 == s4);//true
        System.out.println(s3 == s5);//false
        System.out.println(s3 == s6);//false
        System.out.println(s3 == s7);//false
        System.out.println(s5 == s6);//false
        System.out.println(s5 == s7);//false
        System.out.println(s6 == s7);//false
        // intern(): checks whether the value javaEEhadoop exists in the string constant pool; if so, returns its address in the pool.
        // If it does not exist, javaEEhadoop is put into the constant pool and the address of that pooled object is returned.
        String s8 = s6.intern();
        System.out.println(s3 == s8);//true
    }

    @Test
    public void test3() {
        String s1 = "a";
        String s2 = "b";
        String s3 = "ab";
        /*
         * Execution details of s1 + s2 (the variable name s is only for illustration):
         * (1) StringBuilder s = new StringBuilder();
         * (2) s.append("a");
         * (3) s.append("b");
         * (4) s.toString()  --> roughly equivalent to new String("ab")
         *
         * Note: StringBuilder is used from JDK 5.0 on; before JDK 5.0, StringBuffer was used.
         */
        String s4 = s1 + s2;
        System.out.println(s3 == s4);//false
    }
    /*
     * 1. String concatenation does not always use StringBuilder. If both sides of the
     *    concatenation operator are string constants or constant references, compile-time
     *    optimization is used instead.
     * 2. When declaring classes, methods, and variables of primitive or reference types,
     *    use final wherever it applies.
     */
    @Test
    public void test4() {
        final String s1 = "a";
        final String s2 = "b";
        String s3 = "ab";
        String s4 = s1 + s2;
        System.out.println(s3 == s4);//true
    }
    // Exercise:
    @Test
    public void test5() {
        String s1 = "javaEEhadoop";
        String s2 = "javaEE";
        String s3 = s2 + "hadoop";
        System.out.println(s1 == s3);//false

        final String s4 = "javaEE"; // s4 is a constant
        String s5 = s4 + "hadoop";
        System.out.println(s1 == s5);//true

    }

    /*
     * Execution efficiency: adding strings with StringBuilder's append() is far more
     * efficient than String concatenation.
     * Details:
     * (1) With StringBuilder's append(), only one StringBuilder object is created from
     *     start to finish. With String concatenation, many StringBuilder and String
     *     objects are created.
     * (2) With String concatenation, the many StringBuilder and String objects take up
     *     more memory, and any GC they trigger costs extra time.
     *
     * Room for improvement: if the total length to be appended is known not to exceed
     * some value highLevel, instantiate the builder with the capacity constructor:
     * StringBuilder s = new StringBuilder(highLevel); // new char[highLevel]
     */
    @Test
    public void test6() {

        long start = System.currentTimeMillis();

        // method1(100000); // 4014 ms
        method2(100000); // 7 ms

        long end = System.currentTimeMillis();

        System.out.println("The time spent is:" + (end - start));
    }

    public void method1(int highLevel){
        String src = "";
        for (int i = 0; i < highLevel; i++) {
            src = src + "a"; // each iteration creates a StringBuilder and a String
        }
// System.out.println(src);

    }

    public void method2(int highLevel){
        // Just create a StringBuilder
        StringBuilder src = new StringBuilder();
        for (int i = 0; i < highLevel; i++) {
            src.append("a");
        }
        // System.out.println(src);
    }
}

From the results above: if a variable appears on either side of the concatenation operator, the result is equivalent to a new String() in heap space whose content is the concatenated value. Calling intern() on that result returns the address of the javaEEhadoop value in the string constant pool if it already exists there.

The underlying principle

Under the hood, the concatenation operation actually uses StringBuilder.

Execution details of s1 + s2:

  • StringBuilder s = new StringBuilder();
  • s.append(s1);
  • s.append(s2);
  • s.toString(); --> similar to new String("ab");

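From JDK 5 on, StringBuilder is used; before JDK 5, StringBuffer was used. (Since JDK 9, javac compiles concatenation to an invokedynamic call rather than explicit StringBuilder calls, but a concatenation involving a variable still produces a new object at run time.)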
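The four steps can also be written out by hand. A minimal sketch, where ConcatDetails and concat are names invented for the illustration:

```java
class ConcatDetails {
    // Hand-written equivalent of s1 + s2 as described in the steps above.
    static String concat(String s1, String s2) {
        StringBuilder s = new StringBuilder();
        s.append(s1);
        s.append(s2);
        return s.toString(); // creates a new String object on the heap
    }

    public static void main(String[] args) {
        String s3 = "ab";
        String s4 = concat("a", "b");
        System.out.println(s3 == s4);          // false: s4 is a new heap object
        System.out.println(s3.equals(s4));     // true: same content
        System.out.println(s3 == s4.intern()); // true: intern() returns the pooled "ab"
    }
}
```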

Comparison of String, StringBuffer, and StringBuilder:

  • String
    • String values are immutable, so every operation on a String produces a new String object, which is inefficient and wastes a lot of limited memory.
    • Immutable.
  • StringBuffer
    • A mutable, thread-safe string manipulation class; operations on the string it holds do not create a new object. Each StringBuffer object has a certain buffer capacity: while the string does not exceed that capacity, no new capacity is allocated; when it does, the capacity is increased automatically.
    • Mutable.
    • Thread-safe.
    • Suited to manipulating strings from multiple threads.
  • StringBuilder
    • A mutable class, and faster.
    • Mutable.
    • Not thread-safe.
    • Suited to manipulating strings from a single thread.
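The thread-safety difference can be observed with a shared buffer. A minimal sketch, with SharedAppend and appendFromThreads as hypothetical names; it only exercises StringBuffer, because the outcome with a shared StringBuilder is not deterministic (characters may be lost or an exception may be thrown).

```java
class SharedAppend {
    // Appends perThread characters from each of the given number of threads to one StringBuffer.
    static int appendFromThreads(int threads, int perThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.append('a'); // append is synchronized in StringBuffer
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10_000)); // 40000
    }
}
```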

Note that when either operand is a variable, the concatenation needs a new StringBuilder; but when the operands are declared final, they are treated as constants and the result comes from the constant pool. So when both sides of the concatenation operator are string constants or constant references, compile-time optimization still applies. In other words, a variable declared final becomes a constant (just as a final class cannot be inherited and a final method cannot be overridden).

  • In development, use final wherever it applies.
public static void test4() {
    final String s1 = "a";
    final String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2;
    System.out.println(s3 == s4);
}
// true
  • Compare the performance of the concatenation operator and append()
    public static void method1(int highLevel) {
        String src = "";
        for (int i = 0; i < highLevel; i++) {
            src += "a"; // each iteration creates a StringBuilder object
        }
    }

    public static void method2(int highLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < highLevel; i++) {
            sb.append("a");
        }
    }

method1: 4005 ms; method2: 7 ms

  • Adding strings with StringBuilder's append() is far more efficient than String concatenation.

Benefits

  • With StringBuilder's append(), only one StringBuilder object is created from start to finish.
  • With String concatenation, many StringBuilder objects are created, plus the String objects created by each toString() call.
  • Because so many StringBuilder and String objects are created, memory usage grows and any GC takes more time.

Room for improvement

  • The code above uses StringBuilder's no-argument constructor, whose default capacity is 16. Whenever that capacity is exceeded, a larger array is allocated and the existing content is copied into it. Specifying a larger initial capacity reduces the number of expansions.
  • So in real development, if the total length to be appended is known not to exceed some threshold, create the builder with the constructor that takes that capacity.
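The effect of presizing can be checked with capacity(). A minimal sketch; BuilderCapacity and fill are names chosen for the example.

```java
class BuilderCapacity {
    // Builds a string of highLevel characters in a builder presized to exactly that length.
    static StringBuilder fill(int highLevel) {
        StringBuilder sb = new StringBuilder(highLevel);
        for (int i = 0; i < highLevel; i++) {
            sb.append('a');
        }
        return sb;
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity()); // 16
        System.out.println(fill(100_000).capacity());       // 100000: the array never had to grow
    }
}
```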

Using intern()

intern() is a native method whose implementation lives in the JVM's native code. The string pool is initially empty and is maintained privately by the String class. When intern() is called, if the pool already contains a string equal to this String object as determined by equals(Object), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned.

If a string object is not declared with double quotes, you can use String's intern() method: it looks up the current string in the string constant pool and, if it is not there, puts it into the pool.

String myInfo = new String("I love atguigu").intern();

That is, for any string on which String.intern() is called, the instance returned must be exactly the same instance as the one obtained from the equal literal. Therefore, the following expression must evaluate to true:

("a" + "b" + "c").intern() == "abc"

In layman's terms, an interned string ensures that only one copy of that string exists in memory, which saves memory and speeds up string operations. Note that this value is stored in the string intern pool.

A space-efficiency test of intern() shows a large difference between using intern() and not using it:

/**
 * Space-efficiency test of intern().
 */
public class String2 {
    static final int MAX_COUNT = 1000 * 10000;
    static final String[] arr = new String[MAX_COUNT];

    public static void main(String[] args) {
        Integer[] data = new Integer[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_COUNT; i++) {
            arr[i] = new String(String.valueOf(data[i % data.length])).intern();
        }
        long end = System.currentTimeMillis();
        System.out.println("The time spent is: " + (end - start));

        try {
            Thread.sleep(1000000); // keep the process alive so that memory usage can be inspected
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Conclusion:

  • For programs that use a large number of strings, especially when many of them are duplicates, using intern() can save memory.

Large websites need to keep huge numbers of strings in memory. On a social networking site, for example, many users store the same address information, such as Beijing or Haidian District. If all of these strings call intern(), memory usage drops significantly.
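The saving can be made visible by counting distinct object identities rather than measuring heap size. A minimal sketch under that simplification; InternSavings and distinctObjects are hypothetical names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternSavings {
    // Creates count strings drawn from only ten distinct values and counts distinct objects.
    static int distinctObjects(int count, boolean useIntern) {
        // An identity-based set compares elements with ==, not equals().
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < count; i++) {
            String s = new String(String.valueOf(i % 10));
            identities.add(useIntern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // 1000
        System.out.println(distinctObjects(1000, true));  // 10
    }
}
```

Without intern(), every element keeps its own String object alive; with intern(), the array would hold references to just ten pooled objects and the rest become garbage.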

Interview questions

How many objects does new String("ab") create?

/**
 * Question:
 * How many objects does new String("ab") create? Looking at the bytecode, there are two.
 * One object: created in heap space by the new keyword.
 * The other object: the "ab" object in the string constant pool. Bytecode instruction: ldc
 *
 * Follow-up:
 * What about new String("a") + new String("b")?
 * Object 1: new StringBuilder()
 * Object 2: new String("a")
 * Object 3: "a" in the constant pool
 * Object 4: new String("b")
 * Object 5: "b" in the constant pool
 *
 * Looking deeper, StringBuilder's toString():
 * Object 6: new String("ab")
 * Note that the toString() call does not create "ab" in the string constant pool.
 *
 * @author shkstart
 * @create 2020 20:38
 */
public class StringNewTest {
    public static void main(String[] args) {
        // String str = new String("ab");
        String str = new String("a") + new String("b");
    }
}

The bytecode for new String("ab") is:

 0 new #2 <java/lang/String>
 3 dup
 4 ldc #3 <ab>
 6 invokespecial #4 <java/lang/String.<init>>
 9 astore_1
10 return

There are two objects:

  • One object is created in heap space by the new keyword.
  • The other is the object in the string constant pool.

How many objects does new String("a") + new String("b") create?

public class StringNewTest {
    public static void main(String[] args) {
        String str = new String("a") + new String("b");
    }
}

The bytecode is

 0 new #2 <java/lang/StringBuilder>
 3 dup
 4 invokespecial #3 <java/lang/StringBuilder.<init>>
 7 new #4 <java/lang/String>
10 dup
11 ldc #5 <a>
13 invokespecial #6 <java/lang/String.<init>>
16 invokevirtual #7 <java/lang/StringBuilder.append>
19 new #4 <java/lang/String>
22 dup
23 ldc #8 <b>
25 invokespecial #6 <java/lang/String.<init>>
28 invokevirtual #7 <java/lang/StringBuilder.append>
31 invokevirtual #9 <java/lang/StringBuilder.toString>
34 astore_1
35 return

Six objects are created:

  • Object 1: new StringBuilder()
  • Object 2: new String("a")
  • Object 3: "a" in the constant pool
  • Object 4: new String("b")
  • Object 5: "b" in the constant pool
  • Object 6: the new String("ab") created by toString()
    • Calling toString() does not create "ab" in the constant pool.

Intern use: JDK6 and JDK7

Consider the following code:

String s = new String("1");  // "1" is already in the constant pool after this line
s.intern(); // tries to put the string into the constant pool; no effect here because "1" already exists
String s2 = "1";
System.out.println(s == s2); // JDK 6: false; JDK 7/8: false

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4); // JDK 6: false; JDK 7/8: true

The output on JDK 7/8 (on JDK 6 both lines print false):

false
true

Why is s == s2 false?

  • s is the object created by new on the heap, and s2 is the object in the constant pool; they are clearly not the same object.

To make the comparison true, assign the return value of intern() back to s:

String s = new String("1");
s = s.intern();
String s2 = "1";
System.out.println(s == s2); // true

In the second case, s3 records the address of a heap object equivalent to new String("11"), and after that line runs there is still no "11" in the constant pool. Executing s3.intern() then puts "11" into the pool. From JDK 7 on, the pool entry is a reference to the heap object, so s4 ends up using s3's address.

Why is s3 == s4 false on JDK 6? Because JDK 6 creates a new "11" object in the constant pool, with a new address, and s4 is given that new address. JDK 7 does not create a new object; the constant pool records a reference to the existing heap object instead.

In JDK 7

String s = new String("1");
s.intern();
String s2 = "1";
System.out.println(s == s2); // false

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4); // true
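The JDK 7 behavior can be checked without depending on what happens to be in the pool already. A minimal sketch, assuming a HotSpot JVM of version 7 or later; InternReference and its methods are hypothetical names, and the UUID is only there to guarantee a string that has never been interned.

```java
import java.util.UUID;

class InternReference {
    // A string built at run time with unique content is not in the pool yet,
    // so intern() is expected to record and return this very heap object (JDK 7+).
    static boolean internReturnsSameObject() {
        String s = new StringBuilder("demo-").append(UUID.randomUUID()).toString();
        return s.intern() == s;
    }

    // "1" is already pooled by the literal, so intern() returns the pooled object, not s.
    static boolean literalAlreadyPooled() {
        String s = new String("1");
        return s.intern() == s;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsSameObject()); // expected true on JDK 7 and later
        System.out.println(literalAlreadyPooled());    // false
    }
}
```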

Extension

String s3 = new String("1") + new String("1");
String s4 = "11";  // "11" is created in the constant pool here
s3.intern();  // finds "11" already in the constant pool, so it changes nothing
System.out.println(s3 == s4); // false

Moving the declaration of s4 up one line changes the outcome: the result is now false.

In JDK 1.6, intern() tries to put the string object into the string pool:

  • If the pool already contains an equal string, nothing is added; the address of the existing pooled object is returned.
  • If not, a copy of the object is made and put into the pool, and the address of the pooled copy is returned.

From JDK 1.7 on, intern() tries to put the string object into the string pool:

  • If the pool already contains an equal string, nothing is added; the address of the existing pooled object is returned.
  • If not, the object's reference address is copied into the pool, and that reference address is returned.

  • In JDK 6, a string "ab" is created in the string constant pool.
  • In JDK 8, "ab" is not created in the string constant pool; instead, the address of the heap object is copied into the pool.
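The code that the "ab" bullets and the results that follow refer to is not shown in the text. A plausible shape for it, offered purely as an assumption, is the sketch below (InternExercise and run are invented names). Whether the second comparison is true depends on the JDK version and on "ab" not having been pooled earlier in the same JVM.

```java
class InternExercise {
    // Returns { s2 == "ab", s == "ab" }.
    static boolean[] run() {
        String s = new String("a") + new String("b"); // heap object; this code has not pooled "ab" yet
        String s2 = s.intern();                       // JDK 6: pooled copy; JDK 7+: reference to s
        return new boolean[] { s2 == "ab", s == "ab" };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```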

So the above result in JDK 6 is:

true
false

In JDK 8:

true
true

Garbage collection for StringTable

/**
 * String garbage collection:
 * -Xms15m -Xmx15m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails
 */
public class StringGCTest {
    public static void main(String[] args) {
        for (int j = 0; j < 100000; j++) {
            String.valueOf(j).intern();
        }
    }
}

G1 string deduplication

Note that "duplicate" here refers to data in the heap, not in the constant pool, because the constant pool itself never holds duplicates.

  • Background: tests on many Java applications, large and small, gave the following results:
    • Strings make up 25% of the heap's live data set.
    • Duplicate strings make up 13.5% of the heap's live data set.
    • The average length of a string is 45.
  • The bottleneck for many large-scale Java applications is memory. Tests show that in these applications about 25% of the live data set in the Java heap consists of String objects, and almost half of those String objects are duplicates, meaning string1.equals(string2) == true. Duplicate String objects on the heap are a waste of memory. This project implements automatic, continuous deduplication of duplicate String objects in the G1 garbage collector to avoid that waste.

Implementation

  • When the garbage collector works, it visits the live objects on the heap. Each visited object is checked to see whether it is a candidate String for deduplication.
  • If so, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes the queue. Processing a queue element means removing it from the queue and then trying to deduplicate the String it references.
  • A hashtable records all the distinct char arrays used by String objects. During deduplication, the hashtable is checked to see whether an identical char array already exists on the heap.
  • If it does, the String object is adjusted to refer to that array, releasing its reference to the original array, which is eventually collected by the garbage collector.
  • If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
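The hashtable lookup at the heart of these steps can be sketched in ordinary Java. This is an illustration of the idea only: DedupTable, Key, and deduplicate are hypothetical names, and the real mechanism runs inside the G1 collector on String's internal array, not in application code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class DedupTable {
    // Wraps a char array so that the map compares by content instead of identity.
    private static final class Key {
        final char[] chars;

        Key(char[] chars) {
            this.chars = chars;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(chars, ((Key) o).chars);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(chars);
        }
    }

    private final Map<Key, char[]> table = new HashMap<>();

    // Returns the array already recorded for this content, or records and returns the candidate.
    char[] deduplicate(char[] candidate) {
        char[] existing = table.putIfAbsent(new Key(candidate), candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        DedupTable dedup = new DedupTable();
        char[] first = "Beijing".toCharArray();
        char[] second = "Beijing".toCharArray();
        System.out.println(dedup.deduplicate(first) == first);  // true: lookup failed, array recorded
        System.out.println(dedup.deduplicate(second) == first); // true: duplicate now shares the first array
    }
}
```

Once the second caller switches to the shared array, its own copy is no longer referenced and can be garbage-collected, which is where the memory saving comes from.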

Command-line options

  • UseStringDeduplication (bool): enables string deduplication. It is disabled by default and must be enabled manually.
  • PrintStringDeduplicationStatistics (bool): prints detailed deduplication statistics.
  • StringDeduplicationAgeThreshold (uintx): String objects that reach this age are considered candidates for deduplication.

You are welcome to follow the public account Shanma Carpenter. I am Xiao Chun, a Java back-end developer who also does a little front-end work, and I share a continuing series of technical articles. If this article helped you, please follow, like, and share. See you next time!