Dig deeper into Java Strings

In the year since I started writing Java, I have mostly just solved whatever problems I ran into. I have not taken the initiative to learn the features of the language or to read the JDK source code in depth. Since I have decided to make a living with Java, I need to take it seriously: give up some gaming time and study it systematically and in depth.

String is one of the most commonly used classes in Java programming and one of the most basic classes provided by the JDK. So I decided to start by digging deeper into the String class.

Class definitions and class members

When you open the String source code in the JDK, you should first look at the definition of the String class.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence

Uninheritable and immutable

Anyone who has written Java knows that when the final keyword modifies a class, the class cannot be inherited. So String cannot be subclassed. At this point we might wonder why String's designers made it uninheritable. I found related questions and discussions on Zhihu, and I think the first answer makes it very clear: as the most fundamental reference type in Java, String is designed to be immutable, and final prohibits inheritance so that no subclass can break that immutability.

Making a class immutable takes more than marking it final. As the source code shows, String is essentially a wrapper around an array of characters. The array is private, and String provides no method that modifies it, so once initialization is complete, the String object cannot be changed.
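The same recipe can be sketched with a small class of our own. This is only an illustration of the idea, not the JDK's code: ImmutableText and its methods are hypothetical names, and the defensive copies are my assumption about what it takes to keep callers away from the private array.

```java
import java.util.Arrays;

// Hypothetical class for illustration; this is not how java.lang.String is written.
final class ImmutableText {                 // final: no subclass can add mutating behavior
    private final char[] value;             // private and final: never exposed or reassigned

    ImmutableText(char[] source) {
        // Defensive copy, so later changes to the caller's array do not leak in.
        this.value = Arrays.copyOf(source, source.length);
    }

    char charAt(int index) {
        return value[index];
    }

    char[] toCharArray() {
        // Hand out a copy, never the internal array itself.
        return Arrays.copyOf(value, value.length);
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        char[] chars = {'j', 'a', 'v', 'a'};
        ImmutableText text = new ImmutableText(chars);
        chars[0] = 'l';                     // mutate the caller's array
        text.toCharArray()[1] = 'x';        // mutate a returned copy
        System.out.println(text);           // prints "java"
    }
}
```

Neither write reaches the wrapped array, which is the property the paragraph above describes for String.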

Serialization

As the class definition above shows, String implements the Serializable interface, so it supports serialization and deserialization. What is serialization of Java objects? I believe many Java novices like me have this question. The following paragraph from an article on Java serialization and deserialization explains it well.

The Java platform allows us to create reusable Java objects in memory, but in general these objects exist only while the JVM is running; that is, their lifetime is no longer than the JVM's. In a real-world application, however, you might want to save (persist) a specified object after the JVM stops running and read the saved object back later. Java object serialization helps us do this. When an object is saved with Java object serialization, its state is stored as a sequence of bytes that can later be assembled back into an object. It is important to note that object serialization stores the object's "state", that is, its member variables. Thus, object serialization does not care about static variables in a class. Besides persisting objects, object serialization is also used with RMI (remote method invocation) and when passing objects over a network. The Java serialization API provides a standard mechanism for handling object serialization and is easy to use.
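A round trip through the serialization API might look like the sketch below. SerializationDemo, Counter and roundTrip are hypothetical names made up for this illustration, and the bytes are kept in memory rather than written to a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    // Hypothetical class used only for this illustration.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        static int created = 0;   // static: class state, not written to the stream
        int value;                // instance state: written to the stream

        Counter(int value) {
            this.value = value;
            created++;
        }
    }

    // Serialize an object to bytes, then rebuild a copy from those bytes.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Counter copy = (Counter) roundTrip(new Counter(42));
        System.out.println(copy.value);               // 42: instance state survived
        System.out.println(roundTrip("hello world")); // hello world: String is Serializable
    }
}
```

The copy is a distinct object rebuilt from the bytes; only its instance field travelled through the stream.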

In the String source code, we can also see a class member definition that supports serialization.

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

serialVersionUID is the serialization version number. During deserialization, Java uses this UID to check whether the class described in the byte stream is consistent with the local class. If it is, deserialization can proceed; if not, an InvalidClassException is thrown.
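The version number a class carries can be read back at run time through java.io.ObjectStreamClass. A minimal sketch, where SerialVersionUidDemo and uidOf are hypothetical names:

```java
import java.io.ObjectStreamClass;

class SerialVersionUidDemo {
    // Returns the version number that serialization associates with a serializable class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Matches the constant declared in the String source.
        System.out.println(uidOf(String.class)); // -6849794470754667710
    }
}
```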

serialPersistentFields is far less common; presumably it relates to which class members take part in serialization. To figure out what the field means, I googled it and found only a brief description of the ObjectStreamField class in the JDK documentation: "A description of a Serializable field from a Serializable class. An array of ObjectStreamFields is used to declare the Serializable fields of a class." In other words, the class describes one serializable field of a serializable class, and an array of them declares which fields of a class are serialized. That still did not tell me how it is actually used. Looking at the definition again, I guessed that, like serialVersionUID, it works through a specific, reserved field name, so I searched directly for the keyword serialPersistentFields and found its purpose. Default serialization can be customized in two ways: the transient keyword and the static field serialPersistentFields. transient marks fields that are not serialized by default, while serialPersistentFields lists the fields that are. If both are used, transient is ignored. I tested it myself, and it worked.
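An experiment along those lines could look like the sketch below. PersistentFieldsDemo and User are hypothetical names; the field is declared private static final with the reserved name, as the serialization mechanism expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class PersistentFieldsDemo {
    // Hypothetical class used only for this illustration.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;

        // Only the fields listed here take part in default serialization.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("name", String.class)
        };

        transient String name; // listed above, so the transient modifier is ignored
        int age;               // not listed, so it is left out of the stream

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serialize to an in-memory buffer and read the object back.
    static User roundTrip(User user) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(user);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (User) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        User copy = roundTrip(new User("Tom", 30));
        System.out.println(copy.name); // Tom: restored although the field is transient
        System.out.println(copy.age);  // 0: never written, so it keeps the default value
    }
}
```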

Now that we know what serialPersistentFields does, the question is: if this static field defines the class members involved in serialization, why is the array's length zero in String? I searched for an explanation but did not find a clear one. The Javadoc on the field hints that String is special-cased in the serialization stream protocol, which may be why it declares no fields, but if anyone knows the full answer, I would love to hear it.

Can be sorted

The Comparable interface has only one method, public int compareTo(T o). Implementing it means the class supports sorting: you can sort a list or an array of its objects with methods such as Collections.sort or Arrays.sort.
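As a quick illustration of that natural ordering (NaturalOrderDemo and sortedCopy are hypothetical names), note that compareTo works on character values, so uppercase letters sort before lowercase ones:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NaturalOrderDemo {
    // Sorts a copy of the input using String's own compareTo.
    static String[] sortedCopy(String... words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy("pear", "Banana", "apple")));
        // [Banana, apple, pear]

        List<String> list = new ArrayList<>(Arrays.asList("pear", "Banana", "apple"));
        Collections.sort(list);
        System.out.println(list); // [Banana, apple, pear]

        System.out.println("apple".compareTo("banana") < 0); // true
    }
}
```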

We can also see a static member in String:

    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();
    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }

        /** Replaces the de-serialized object. */
        private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
    }

As the source code above shows, this static member is an instance of a class that implements the Comparator interface, and it compares two strings while ignoring case.

So what are the differences and connections between Comparable and Comparator? And why does String support both?

Comparable is implemented inside the class itself, and a class can implement it only once, which gives the class a single natural ordering. Comparator is implemented outside the class, which lets you add more orderings without changing the class itself. We can write our own Comparator for String as well. See the article on the distinction between Comparable and Comparator.

String's reason for supporting both is straightforward. Implementing Comparable gives the class a standard ordering, and the Comparator exposed as a public static member covers the most common extra requirement, sorting while ignoring case. Any other ordering we have to implement ourselves.
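Both cases can be tried side by side. In this sketch, ComparatorDemo, sortedCopy and BY_LENGTH are hypothetical names; BY_LENGTH stands in for "some other requirement" that String does not provide out of the box.

```java
import java.util.Arrays;
import java.util.Comparator;

class ComparatorDemo {
    // A hypothetical ordering defined outside String: shorter strings first,
    // ties broken by String's natural order.
    static final Comparator<String> BY_LENGTH = (a, b) -> {
        int byLength = Integer.compare(a.length(), b.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    };

    // Sorts a copy of the input with the given ordering.
    static String[] sortedCopy(Comparator<String> order, String... words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "Banana", "apple", "fig"};

        // The comparator that String itself provides.
        System.out.println(Arrays.toString(sortedCopy(String.CASE_INSENSITIVE_ORDER, words)));
        // [apple, Banana, fig, pear]

        // Our own comparator, written without touching String.
        System.out.println(Arrays.toString(sortedCopy(BY_LENGTH, words)));
        // [fig, pear, apple, Banana]
    }
}
```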

Class methods

The methods of String can be broadly divided into the following categories.

Constructors
Functional methods
Factory methods
The intern method

As for a walkthrough of String's methods, this article already covers them well enough, so I will not repeat it here. However, the last item, the intern method, is worth studying.

Intern method

String constant pool

As one of Java's most basic types, String can be created from a literal, as in String s = "hello". You can also create a String with new, but I rarely see it written that way. Over time I got used to the literal form without realizing how much is going on behind it. The following code shows the difference.

public class StringConstPool {
    public static void main(String[] args) {
        String s1 = "hello world";
        String s2 = new String("hello world");
        String s3 = "hello world";
        String s4 = new String("hello world");
        String s5 = "hello " + "world";
        String s6 = "hel" + "lo world";
        String s7 = "hello";
        String s8 = s7 + " world";
        
        System.out.println("s1 == s2: " + String.valueOf(s1 == s2) );
        System.out.println("s1.equals(s2): " + String.valueOf(s1.equals(s2)));
        System.out.println("s1 == s3: " + String.valueOf(s1 == s3));
        System.out.println("s1.equals(s3): " + String.valueOf(s1.equals(s3)));
        System.out.println("s2 == s4: " + String.valueOf(s2 == s4));
        System.out.println("s2.equals(s4): " + String.valueOf(s2.equals(s4)));
        System.out.println("s5 == s6: " + String.valueOf(s5 == s6));
        System.out.println("s1 == s8: " + String.valueOf(s1 == s8));
    }
}
/* output
s1 == s2: false
s1.equals(s2): true
s1 == s3: true
s1.equals(s3): true
s2 == s4: false
s2.equals(s4): true
s5 == s6: true
s1 == s8: false
*/

As the output shows, every equals comparison returns true, because String's equals compares values. (The default equals in Object compares references; String overrides it.) == compares the references of two objects and returns true only if they refer to the same object. s1 == s2: false and s2 == s4: false show that new always creates a new object and returns a new reference. s1 == s3: true shows that objects created from the same literal share the same reference.

s5 == s6 is really the same case as s1 == s3 in the JVM's eyes, because this kind of simple operation on constants is already carried out at compile time. We can disassemble the class file with javap to see what the compiler produced.

➜ ~ javap -c StringConstPool.class
Compiled from "StringConstPool.java"
public class io.github.jshanet.thinkinginjava.constpool.StringConstPool {
  public io.github.jshanet.thinkinginjava.constpool.StringConstPool();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String hello world
       2: astore_1
       3: return
}

It doesn't matter if you can't read the bytecode, because the comments already make it clear.

The case of s1 == s8 is slightly more complicated. s8 is computed from a variable, so its value cannot be determined at compile time. Java has no operator overloading, so the JDK source code gives no clue about how + works on strings. When in doubt, decompile: let's look at the bytecode to see whether the compiler is involved here as well.

public class io.github.jshanet.thinkinginjava.constpool.StringConstPool {
  public io.github.jshanet.thinkinginjava.constpool.StringConstPool();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String hello
       2: astore_1
       3: new           #3                  // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      10: aload_1
      11: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      14: ldc           #6                  // String  world
      16: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: invokevirtual #7                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      22: astore_2
      23: return
}

As the disassembly shows, concatenation involving a String variable is compiled into StringBuilder calls: s8 = s7 + " world" is equivalent to new StringBuilder().append(s7).append(" world").toString(). StringBuilder is a mutable class that joins the two strings with append and produces a new String object with toString, so it is easy to see why s1 == s8 is false. (This is what the compiler used here emits. Since JDK 9, javac compiles string concatenation to an invokedynamic call instead, but the result is still a new object created at run time.)
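The difference between run-time and compile-time concatenation can be checked directly. In this sketch, ConcatDemo and its method names are hypothetical, and the third method is an addition of mine that the examples above do not cover: when the variable is declared final and initialized with a literal, it counts as a constant, so the compiler folds the concatenation just as it does for two literals.

```java
class ConcatDemo {
    // Concatenation with an ordinary variable: evaluated at run time.
    static boolean variableConcatIsPooled() {
        String s1 = "hello world";
        String s7 = "hello";
        String s8 = s7 + " world";      // a new object, not the pooled literal
        return s1 == s8;
    }

    // The explicit StringBuilder form of the same concatenation.
    static boolean builderConcatIsPooled() {
        String s1 = "hello world";
        String s7 = "hello";
        String s8 = new StringBuilder().append(s7).append(" world").toString();
        return s1 == s8;
    }

    // Concatenation with a final variable initialized from a literal.
    static boolean constantConcatIsPooled() {
        String s1 = "hello world";
        final String s7 = "hello";      // a compile-time constant
        String s8 = s7 + " world";      // folded by the compiler into "hello world"
        return s1 == s8;
    }

    public static void main(String[] args) {
        System.out.println(variableConcatIsPooled()); // false
        System.out.println(builderConcatIsPooled());  // false
        System.out.println(constantConcatIsPooled()); // true
    }
}
```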

The behavior above comes from the string constant pool. Allocating string objects, like any other objects, costs time and space, and strings are among the most heavily used objects in a program. To improve performance and reduce memory footprint, the JVM introduces a string constant pool. When an object is created from a literal, the JVM first checks the pool: if an equal string is already there, it returns a reference to it; if not, it creates the object and puts it in the pool. Because strings are immutable, the JVM does not have to worry about one of several variables that share an object changing its state. The global string constant pool, created with the runtime instance, has a table that always holds a reference to each string object in the pool, so these objects are not garbage collected.

Intern method

So far we have not actually reached the main topic, the intern method. What does intern do? First, take a look at the source code.

    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class {@code String}.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this {@code String} object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this {@code String} object is added to the
     * pool and a reference to this {@code String} object is returned.
     * <p>
     * It follows that for any two strings {@code s} and {@code t},
     * {@code s.intern() == t.intern()} is {@code true}
     * if and only if {@code s.equals(t)} is {@code true}.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java&trade; Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();

In the Oracle JDK, intern is declared with the native keyword and has no Java body, which means the implementation lives in native code. According to the comment, if the pool already contains a string equal to this one, the method returns the string from the pool; otherwise it adds this string to the pool and returns a reference to it. The comment already tells us what the method does, but let's confirm it with an example.

public class StringConstPool {
    public static void main(String[] args) {
        String s1 = "hello";
        String s2 = new String("hello");
        String s3 = s2.intern();
        System.out.println("s1 == s2: " + String.valueOf(s1 == s2));
        System.out.println("s1 == s3: " + String.valueOf(s1 == s3));
    }
}
/* output
s1 == s2: false
s1 == s3: true
*/

It is easy to see that intern links an ordinary string object to the constant pool.
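The rule stated in the Javadoc, that s.intern() == t.intern() holds exactly when s.equals(t), can also be checked with strings built at run time. InternDemo and sameAfterIntern are hypothetical names for this sketch.

```java
class InternDemo {
    // True when both strings map to the same pooled object.
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = new StringBuilder("hel").append("lo").toString();

        System.out.println(literal == built);          // false: built is a separate object
        System.out.println(literal == built.intern()); // true: intern returns the pooled one
        System.out.println(sameAfterIntern(built, new String("hello"))); // true
        System.out.println(sameAfterIntern("hello", "world"));           // false
    }
}
```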

Of course, the implementation principles and best practices of intern are also worth understanding. The Meituan technical team's in-depth analysis of String#intern covers them in detail.
