
The java.lang.String class is probably the most commonly used class in Java, but do you really understand how it is implemented? This article walks through its source code, its most important methods, and the reasons it is immutable.

The String class definition

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {}

String is declared final, so no class can extend it. Once a String object is created, the sequence of characters it contains is immutable: none of the class's methods can modify the object for as long as it lives. This deserves special attention, because some methods appear to change the string but actually create and return a new one, as explained below. The class implements three interfaces. Serializable is a marker interface that flags the class as serializable. Comparable<String> provides compareTo, which orders two strings lexicographically by comparing the values of their characters one at a time. CharSequence represents a readable, ordered sequence of characters. Some of the corresponding methods are introduced later.
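As a quick illustration of the point above, here is a minimal sketch (the class name ImmutabilityDemo is made up for this example). It calls a few methods that look as if they modify a string and shows that the original is untouched, then uses compareTo to order two strings.

```java
final class ImmutabilityDemo {
    public static void main(String[] args) {
        String original = "hello";

        // Each call returns a new String; `original` is never modified.
        String upper = original.toUpperCase();
        String joined = original.concat(" world");
        String replaced = original.replace('l', 'L');

        System.out.println(original); // hello
        System.out.println(upper);    // HELLO
        System.out.println(joined);   // hello world
        System.out.println(replaced); // heLLo

        // compareTo is lexicographic: negative, zero, or positive.
        System.out.println("apple".compareTo("banana") < 0); // true
        System.out.println("abc".compareTo("abc"));          // 0
    }
}
```

If a result is not assigned to a variable, it is simply discarded, which is a common source of bugs with methods such as replace and trim.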

The field properties

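/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

/** Serialization version identifier */
private static final long serialVersionUID = -6849794470754667710L;

Internally, a String is backed by an array: a char[] in JDK 8 and earlier (shown above), and a byte[] plus an encoding flag (coder) from JDK 9 onward, which is why some of the methods below call isLatin1().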

Constructors

The String class has many constructors. You can create a String from a string literal, from another String, from an array of characters, from an array of bytes, and so on.

String str1 = "abc"; // A literal; how it differs from the declarations below is discussed later in this article
String str2 = new String("abc");
String str3 = new String(new char[] {'a', 'b', 'c'});

The equals(Object anObject) method

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String aString = (String) anObject;
        if (coder() == aString.coder()) {
            return isLatin1() ? StringLatin1.equals(value, aString.value)
                              : StringUTF16.equals(value, aString.value);
        }
    }
    return false;
}

The String class overrides equals, which compares whether each character that makes up the String is the same, returning true if they are, and false otherwise.
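To make the character-by-character comparison concrete, here is a simplified sketch written against plain char[] arrays. The names EqualsSketch and sameChars are invented for illustration; this is not the JDK source, only the same idea in a few lines.

```java
final class EqualsSketch {
    // Hypothetical helper: true when both arrays hold the same chars in the same order.
    static boolean sameChars(char[] a, char[] b) {
        if (a == b) {
            return true;  // same array (or both null), nothing to compare
        }
        if (a == null || b == null || a.length != b.length) {
            return false; // arrays of different lengths can never be equal
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false; // the first mismatch decides
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameChars("abc".toCharArray(), "abc".toCharArray())); // true
        System.out.println(sameChars("abc".toCharArray(), "abd".toCharArray())); // false
        System.out.println("abc".equals(new String("abc")));                     // true
    }
}
```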

The hashCode() method

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

The String hashCode algorithm is very simple. The real work happens in a for loop over the characters (in the version shown above, that loop lives inside StringLatin1.hashCode and StringUTF16.hashCode), and the result is cached in the hash field. It is calculated as follows:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Here s is the value array in the source code, that is, the array of characters that make up the string, n is its length, and ^ denotes exponentiation. The number 31 appears as a bare literal. Why choose 31 as the multiplier? There are two main reasons:

1. 31 is a moderately sized odd prime, which makes it one of the preferred multipliers for hash codes.

2. Multiplication by 31 can be optimized by the JVM: 31 * i == (i << 5) - i, and a shift plus a subtraction can be cheaper than a multiplication.
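The formula can be checked with a few lines of code. The sketch below (HashSketch and polynomialHash are names invented for this example) evaluates the polynomial with Horner's rule and compares the result with the built-in hashCode(). It also confirms the shift identity for one sample value.

```java
final class HashSketch {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] using Horner's rule.
    // int overflow simply wraps around, as it does in String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 97*31^2 + 98*31 + 99 = 96354
        System.out.println(polynomialHash("abc"));                     // 96354
        System.out.println(polynomialHash("abc") == "abc".hashCode()); // true

        int i = 12345;
        System.out.println(31 * i == (i << 5) - i);                    // true
    }
}
```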

The charAt(int index) method

public char charAt(int index) {
    if (isLatin1()) {
        return StringLatin1.charAt(value, index);
    } else {
        return StringUTF16.charAt(value, index);
    }
}

We know that a string is made up of an array of characters. This method returns a single character of the specified index via the index (array subscript) passed in.
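A short usage sketch (the class name CharAtDemo is made up): valid indexes run from 0 to length() - 1, and anything outside that range raises an IndexOutOfBoundsException.

```java
final class CharAtDemo {
    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.charAt(0));              // h
        System.out.println(s.charAt(s.length() - 1)); // o

        try {
            s.charAt(5); // valid indexes are 0..4
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index 5 is out of range");
        }
    }
}
```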

The intern() method

This is a native method:

public native String intern();

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

What does that mean? When you call intern() on a String, the JVM looks for a string with the same contents in the constant pool. If one exists, the reference from the pool is returned. If not, the string is added to the pool and that pooled reference is returned.

String str1 = "hello"; // A literal creates (or reuses) an object in the constant pool
String str2 = str1.intern();
System.out.println(str1 == str2); // true

String str3 = new String("world"); // The literal "world" lives in the pool; new creates a separate object in the heap
String str4 = str3.intern(); // Returns the pooled "world", not the heap object
System.out.println(str3 == str4); // false

String str5 = str1 + str2; // Concatenating variables creates "hellohello" in the heap only
String str6 = str5.intern(); // Not in the pool yet, so (since JDK 7) the pool records this heap object and returns it
System.out.println(str5 == str6); // true

String str7 = "hello1" + "world1"; // Constants are folded at compile time, so only a pooled object is created
String str8 = str7.intern();
System.out.println(str7 == str8); // true
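One practical consequence is that intern() maps equal strings built at run time onto a single shared reference. A minimal sketch, with a made-up class name (InternDemo) and made-up sample content:

```java
final class InternDemo {
    public static void main(String[] args) {
        char[] chars = {'s', 'k', 'e', 't', 'c', 'h', '-', 'k', 'e', 'y'};

        // Two distinct heap objects with the same contents.
        String a = new String(chars);
        String b = new String(chars);

        System.out.println(a == b);                   // false
        System.out.println(a.equals(b));              // true
        // Both calls return the one canonical instance held by the pool.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```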

The String class has many more methods, which we will not cover one by one. Next, let's look at some interview questions and then take a closer look at String immutability.

Selected interview questions

Analysis of a classic interview question:

public static void main(String[] args) {
    String A = "abc";
    String B = "abc";
    String C = new String("abc");
    System.out.println(A==B);
    System.out.println(A.equals(B));
    System.out.println(A==C);
    System.out.println(A.equals(C));
}

The answers are: true, true, false, true

To understand these results, consider the following picture of memory:

For the first literal, String A = "abc", the JVM checks whether "abc" is already in the constant pool. It is not, so an "abc" object is created in the constant pool and its reference is assigned to A. For the second literal, String B = "abc", the string is found in the constant pool, so the existing reference is assigned directly to B. The third string is created with the new keyword. Because "abc" is already in the constant pool, nothing new is created there. A new object is created in the heap, a reference to that heap object is assigned to C, and the heap object in turn points to the data of the string in the constant pool.

Note that in Object, equals() compares memory addresses, but String overrides equals() to compare contents. It returns true whenever the contents are the same, even if the objects live at different addresses, which is why A.equals(C) returns true.

Note: as the red arrow in the figure shows, when a string object is created with the new keyword and its content already exists in the constant pool, the object created in the heap refers to the string data in the constant pool.

Let’s look at creating an object using an expression containing variables:

String str1 = "hello";
String str2 = "helloworld";
String str3 = str1+"world";// The compiler cannot determine that it is a constant (it creates a String in the heap)
String str4 = "hello"+"world";// The compiler determines it to be a constant and references it directly to the constant pool

System.out.println(str2 == str3); // false
System.out.println(str2 == str4); // true
System.out.println(str3 == str4); // false

str3 contains the variable str1, so the compiler cannot treat the expression as a constant, and a new String is created in the heap at run time. str4 concatenates two constants, so the compiler folds the expression into "helloworld" and it refers directly to the object in the constant pool.
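A related case worth trying is a variable declared final and initialized with a literal. The language treats such a variable as a compile-time constant, so concatenating it behaves like concatenating two literals. A minimal sketch (ConstantFoldingDemo is a made-up name):

```java
final class ConstantFoldingDemo {
    public static void main(String[] args) {
        String pooled = "helloworld";

        String plain = "hello";          // ordinary variable: not a compile-time constant
        final String constant = "hello"; // final with a literal initializer: a constant variable

        String viaPlain = plain + "world";       // concatenated at run time, new heap object
        String viaConstant = constant + "world"; // folded by the compiler into "helloworld"

        System.out.println(pooled == viaPlain);          // false
        System.out.println(pooled == viaConstant);       // true
        System.out.println(pooled == viaPlain.intern()); // true
    }
}
```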

String immutability

The String class is an immutable class in Java.

Simply put, an immutable class is one whose instances cannot be modified after they have been created.

String immutability is an old topic. Strings have been different from their siblings from the moment they were born: like a well-behaved child, the class wears a final hat.

Even the primitive types such as byte, int, short, and long are not in the same league.

If you read the source code comments carefully, you will find this sentence:

Strings are constant; their values cannot be changed after they are created. In other words, a String is destined to be immutable from birth.

First, there is a point that often causes confusion. When final modifies a variable of a primitive type, the variable cannot be reassigned, so its value cannot change. A variable of a reference type, however, only holds a reference, and final only guarantees that this reference will not change: the variable always refers to the same object, but the object itself can still be modified. For example, a final reference to an array must always point to the array it was initialized with, but the contents of that array can be changed freely.
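The array case takes only a few lines to demonstrate. In this sketch (FinalReferenceDemo is a made-up name), the final variable keeps pointing at the same array while the array's contents change:

```java
final class FinalReferenceDemo {
    public static void main(String[] args) {
        final char[] letters = {'a', 'b', 'c'};

        letters[0] = 'z'; // allowed: this changes the array's contents, not the reference
        System.out.println(new String(letters)); // zbc

        // letters = new char[3]; // would not compile: a final variable cannot be reassigned
    }
}
```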

The String class is declared final and its value field is final as well, so we usually regard String objects as immutable. But are they really immutable?

Each string is made up of individual characters, and as we saw in the source code, they are stored in the value character array:

/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

Because value is final, the reference itself cannot be changed, but the real data is the array in the heap that value points to. If that array can be manipulated, the data can still be changed. value is an array, which is a reference type, so its contents are mutable. Even though the field is declared private, it can still be reached and modified through reflection.

import java.lang.reflect.Field;

// Note: this relies on JDK 8, where value is a char[]. Newer JDKs store a byte[]
// instead and increasingly restrict this kind of reflective access to java.lang.
public static void main(String[] args) throws Exception {
    String str = "Hello World";
    System.out.println("str before modification: " + str);
    System.out.println("Memory address of str before modification: " + System.identityHashCode(str));
    // Get the value field of the String class
    Field valueField = String.class.getDeclaredField("value");
    // Change the access permission of the value field
    valueField.setAccessible(true);
    // Get the array referenced by the value field of str
    char[] value = (char[]) valueField.get(str);
    // Change one character in the array referenced by value
    value[3] = '?';
    System.out.println("str after modification: " + str);
    System.out.println("Memory address of str after modification: " + System.identityHashCode(str));
}

Output:

str before modification: Hello World
Memory address of str before modification: 1746572565
str after modification: Hel?o World
Memory address of str after modification: 1746572565

From the two sets of output, we can see that the contents of str have changed while its identity hash code (printed here as a stand-in for the memory address) stays the same, so it is still the same object. In everyday code, however, we almost never use reflection to manipulate strings, so we can treat String as immutable.

The benefits of immutability

First of all, we should look at this from the designer's point of view rather than simply deciding that it is bad or unreasonable:

  • Multiple variables can reference the same string instance in memory (this is what makes the constant pool possible), avoiding the overhead of creating duplicates.
  • Strings are used heavily in programs, often in security-sensitive places such as class names, file paths, and network addresses. Immutability guarantees that a value cannot be altered after it has been checked.
  • When we use an immutable class, we do not need to worry about who might modify its internal value. With a mutable class, we may have to remember to make a defensive copy of the internal value every time, which costs some performance.
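The last point is easy to see with hash-based collections. The following sketch (MutableKeyDemo is a made-up name, and using a List as a set element is only for illustration) shows how mutating an element after insertion makes it impossible to find, something that cannot happen with a String:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MutableKeyDemo {
    public static void main(String[] args) {
        // A mutable element: its hashCode depends on its current contents.
        Set<List<String>> mutableKeys = new HashSet<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        mutableKeys.add(key);

        key.add("b"); // mutating the element changes its hashCode after insertion
        System.out.println(mutableKeys.contains(key)); // false

        // An immutable element: "modifying" it only produces a new object.
        Set<String> stringKeys = new HashSet<>();
        String name = "a";
        stringKeys.add(name);

        String longer = name.concat("b"); // name itself is still "a"
        System.out.println(longer);                    // ab
        System.out.println(stringKeys.contains(name)); // true
    }
}
```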

Summary

If you are interested, read the String source code yourself; it runs to more than 3,000 lines.

Both new String(...) and concatenation with + on variables create new objects, so avoid building strings with repeated + concatenation, especially in loops. Use StringBuilder or StringBuffer instead.
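As a minimal sketch of that advice (BuilderDemo and joinNumbers are hypothetical names), the loop below appends to a single StringBuilder and creates the final String only once at the end:

```java
final class BuilderDemo {
    // Hypothetical helper: joins the numbers 0..count-1 with commas.
    static String joinNumbers(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i); // appends in place, no intermediate String objects
        }
        return sb.toString(); // the only String created here
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(5)); // 0,1,2,3,4
    }
}
```

StringBuilder is the usual choice for single-threaded code, while StringBuffer offers the same operations with synchronized methods.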

Thank you for reading. If you found this article helpful, your likes and support would be much appreciated. Thank you!