Original address: mp.weixin.qq.com/s/Pub_K7PSC…

A question often asked by interviewers is: are objects in Java always created on the heap?

The interviewer then discusses “escape analysis” with the candidate, by which the JVM supposedly allocates instance objects on the “stack.” In fact, this statement is not very rigorous: HotSpot, at least, does not actually store objects on the stack!

What is escape analysis?

First, escape analysis is an analysis technique used by the Java just-in-time (JIT) compiler when it compiles hot bytecode. Escape analysis determines whether an object created in one method can be accessed by other methods or threads.

If the analysis shows that an object is not accessible by other threads, it may be possible to make some deep optimizations to it during compilation, which are explained later.

When running a Java program, you can turn escape analysis on or off with the following JVM flags.

Enable escape analysis: -XX:+DoEscapeAnalysis

Disable escape analysis: -XX:-DoEscapeAnalysis

Principle of escape analysis

escape.hpp in the HotSpot source defines the escape states of an object after escape analysis (path: src/share/vm/opto/escape.hpp).

1. GlobalEscape

That is, the object escapes the scope of the current method or thread. Typical cases:

  • The object is assigned to a static variable;
  • The object is the return value of the current method;
  • The class overrides the finalize() method, in which case every instance of that class is in the global escape state (so, for performance, do not override finalize() unless absolutely necessary).
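The first two cases can be sketched as follows (hypothetical examples; class and member names are mine, not from the source):

```java
// Hypothetical GlobalEscape examples: in both cases the instance outlives
// or leaves the method that created it.
public class GlobalEscapeExamples {
    static Object cache;                  // escape target: a static variable

    static void staticEscape() {
        cache = new Object();             // the object is now globally reachable
    }

    static Object returnEscape() {
        Object o = new Object();
        return o;                         // escape via the method's return value
    }
}
```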

2. Parameter Escape (ArgEscape)

That is, the object is passed as a method argument, or is referenced by an argument, but is not accessed by any other method or thread outside that call.
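A sketch of ArgEscape (hypothetical names, not from the source):

```java
// Hypothetical ArgEscape sketch: sb is passed to a callee that only reads it,
// so it is never stored anywhere another method or thread could reach.
public class ArgEscapeExample {
    static int length(StringBuilder sb) {          // sb is read, never published
        return sb.length();
    }

    static int caller() {
        StringBuilder sb = new StringBuilder("abc");
        return length(sb);                         // escapes only as an argument
    }
}
```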

3. NoEscape

The object does not escape the method at all, and can be further optimized by the Java just-in-time compiler.
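For contrast, a hypothetical NoEscape sketch (names are mine):

```java
// Hypothetical NoEscape sketch: p is created, used, and discarded entirely
// within sum(), so the JIT may scalar-replace it instead of heap-allocating it.
public class NoEscapeExample {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int sum() {
        Point p = new Point(1, 2);   // never leaves this method
        return p.x + p.y;
    }
}
```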

Optimization of escape analysis

After escape analysis, if the escape state of an object is GlobalEscape or ArgEscape, the object must be allocated to the “heap” memory, but for NoEscape objects, this is not necessarily the case, and there are several optimizations.

1. Lock elimination

Take the following code for example.
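The code sample is missing from this copy of the article; here is a minimal sketch consistent with the description (the method name lockElimination() comes from the article, the body is assumed):

```java
// Sketch: a is created and locked entirely inside the method, so the
// synchronization can be elided by the JIT.
public class LockEliminationDemo {
    public String lockElimination() {
        Object a = new Object();           // a never escapes this method
        StringBuilder sb = new StringBuilder();
        synchronized (a) {                 // lock on a non-escaping object
            sb.append("guarded work");
        }
        return sb.toString();
    }
}
```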

In the lockElimination() method, the object a is never accessed by any other method or thread, so a is a non-escaping object. That makes synchronized (a) meaningless, because every thread locks on its own private a. The JVM therefore optimizes the code by removing the synchronization entirely.
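The optimized version is likewise missing from this copy; conceptually, the JVM compiles the lockElimination() method as if the synchronization were not there at all (hypothetical sketch):

```java
// Sketch of the method after lock elimination: the synchronized block on the
// non-escaping object a has been removed, the rest is unchanged.
public class LockEliminationOptimized {
    public String lockElimination() {
        StringBuilder sb = new StringBuilder();
        sb.append("guarded work");         // same work, no monitor enter/exit
        return sb.toString();
    }
}
```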

There is another classic use scenario for lock elimination: StringBuffer.

StringBuffer is a thread-safe class for building strings efficiently, instead of repeatedly concatenating immutable String objects. Internally, it synchronizes all of its append() methods.
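The JDK source excerpt shown in the original is missing here. In JDK 8, for instance, the method is declared as public synchronized StringBuffer append(String str). The synchronized modifier can be confirmed reflectively (the helper name is mine):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Checks whether StringBuffer.append(String) carries the synchronized modifier.
public class StringBufferSyncCheck {
    public static boolean isAppendSynchronized() throws NoSuchMethodException {
        Method m = StringBuffer.class.getDeclaredMethod("append", String.class);
        return Modifier.isSynchronized(m.getModifiers());
    }
}
```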

However, many scenarios do not need this layer of thread safety, so Java 5 introduced the unsynchronized StringBuilder class as an alternative. StringBuilder’s append() methods do not carry the synchronized modifier.
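Again the source excerpt is missing from this copy; the same reflective check (helper name mine) shows that StringBuilder.append(String) is not synchronized:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Checks that StringBuilder.append(String) does NOT carry the synchronized modifier.
public class StringBuilderSyncCheck {
    public static boolean isAppendSynchronized() throws NoSuchMethodException {
        Method m = StringBuilder.class.getDeclaredMethod("append", String.class);
        return Modifier.isSynchronized(m.getModifiers());
    }
}
```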

The thread calling the append() method of StringBuffer must acquire the object’s internal lock (also known as the monitor lock) to access the method and must release the lock before exiting the method. StringBuilder doesn’t need to do this, so it performs better than StringBuffer — at least at first glance.

However, since “escape analysis” was introduced in the HotSpot virtual machine, the lock is removed automatically when a synchronized method is invoked on a non-escaping StringBuffer object. We can measure the effect of lock elimination on StringBuffer with a small benchmark.
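The benchmark source is missing from this copy; here is a sketch consistent with the getString() method and the “Total cost” output described below (the class name is taken from the printed command; the loop count and strings are assumptions):

```java
// Benchmark sketch: sb is local to getString() and never escapes, so its
// synchronized append() calls are candidates for lock elimination.
public class TestLockEliminate {
    public static String getString(String s1, String s2) {
        StringBuffer sb = new StringBuffer();  // local only: a NoEscape object
        sb.append(s1);
        sb.append(s2);
        return sb.toString();                  // the String escapes, sb does not
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10_000_000; i++) {
            getString("aaaaa", "bbbbb");
        }
        System.out.println("Total cost: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```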

A StringBuffer in the getString() method is a local variable inside the method and is not returned to the caller as a method return value, so it is a “NoEscape” object.

Executing the above code results in the following:

java TestLockEliminate
Total cost: 720 ms

We can turn off the lock elimination optimization with the -XX:-EliminateLocks flag and re-run the code:

java -XX:-EliminateLocks TestLockEliminate

As you can see, with lock elimination disabled, performance drops and the run takes longer.

2. Object allocation elimination

In addition to lock elimination, the JVM also optimizes object allocation elimination for NoEscape objects. Object allocation elimination refers to the conversion of objects that should be allocated in the heap to be allocated in the stack. At first glance, it sounds incredible, but we can prove it with a case study.

For example, the following code creates two EscapeTest objects, t1 and t2, in a loop of 10 million iterations.
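The EscapeTest source is missing from this copy; here is a sketch matching the description (the field is an assumption; the loop count and the System.in.read() pause come from the text):

```java
// Sketch: neither t1 nor t2 leaves the loop body, so with escape analysis
// enabled the JIT may eliminate most of these heap allocations.
public class EscapeTest {
    private final int value;

    EscapeTest(int value) { this.value = value; }

    int getValue() { return value; }

    public static void main(String[] args) throws java.io.IOException {
        for (int i = 0; i < 10_000_000; i++) {
            EscapeTest t1 = new EscapeTest(i);  // NoEscape
            EscapeTest t2 = new EscapeTest(i);  // NoEscape
        }
        System.in.read();  // block here so the heap can be inspected with jps/jmap
    }
}
```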

Use the following command to execute the above code

java -Xms2g -Xmx2g -XX:+PrintGCDetails -XX:-DoEscapeAnalysis EscapeTest

With escape analysis turned off by -XX:-DoEscapeAnalysis, the program blocks at System.in.read(). Use jps and jmap to inspect the EscapeTest instances in memory.

As can be seen, there are 20 million EscapeTest instances in heap memory at this point (10 million each from t1 and t2).

The GC log shows no collection events, yet the Eden area is 96% occupied: every EscapeTest object was allocated on the “heap.”

If we change the execution command to the following:

java -Xms2g -Xmx2g -XX:+PrintGCDetails -XX:+DoEscapeAnalysis EscapeTest

This enables “escape analysis.” Inspecting the EscapeTest instances again, only about 300,000 remain in the heap.

Once again no GC collection occurs, and the Eden area is only 8% occupied, indicating that most EscapeTest objects were not created on the heap but were, in effect, allocated on the “stack” instead.

Note:

Some readers may wonder: if escape analysis is enabled and NoEscape objects are allocated on the stack, why are there still more than 300,000 objects in the heap? This is because the JDK here runs in mixed mode, which java -version confirms.

In HotSpot, the interpreter and the compiler work side by side; “mixed mode” means the two are used together. When a program starts, the interpreter executes the bytecode (while recording profiling data such as method invocation counts and loop iteration counts) to save compilation time. From this profile, hot code is discovered and then compiled and optimized by the JIT compiler; escape analysis is one of those optimizations. Objects allocated while the code was still running in the interpreter, before the compiled version took over, are the ones left on the heap.

3. Scalar substitution

Earlier I mentioned that, after escape analysis, objects in the NoEscape state are allocated on the stack. In practice, this is not entirely accurate. Allocating objects directly on the stack is difficult: it would require rewriting a large amount of the JVM’s heap-first allocation code. So HotSpot does not implement direct stack allocation, and instead uses a compromise called “scalar substitution.”

First, understand scalars and aggregates: primitive values and object references can be regarded as scalars, because they cannot be decomposed further. A quantity that can be decomposed further is an aggregate.

An object is an aggregate: it can be decomposed into scalars by breaking its member variables out into separate local variables. This is “scalar substitution.” Done this way, if an object does not escape, it need not be created on the “heap” at all; a few scalars on the stack or in registers stand in for the object, saving memory and improving application performance.

For example, here are two ways to calculate the sum:
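The two methods are missing from this copy; here is a sketch using the method names from the article (the shape of MutableWrapper is assumed):

```java
// Sketch: sumPrimitive() works on a plain scalar; sumMutableWrapper()
// allocates a wrapper object per iteration that never escapes the loop body.
public class SumDemo {
    static class MutableWrapper { int value; }

    public static long sumPrimitive() {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += i;
        }
        return total;
    }

    public static long sumMutableWrapper() {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            MutableWrapper w = new MutableWrapper(); // NoEscape: scalar-replaceable
            w.value = i;
            total += w.value;
        }
        return total;
    }
}
```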

At first glance, the sumPrimitive() method looks much simpler than the sumMutableWrapper() method, so it must be much faster.

But it turns out that the two methods are about equally efficient. Why? In the sumMutableWrapper() method, the MutableWrapper object does not escape, so there is no need to create a real MutableWrapper object on the “heap” at all. The Java just-in-time compiler optimizes it using scalar substitution.
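The optimized form is likewise missing from this copy; here is a hypothetical sketch of what scalar substitution leaves behind, with the wrapper’s field surviving only as the local scalar value:

```java
// Sketch of sumMutableWrapper() after scalar substitution: the MutableWrapper
// object is gone, and its field has become the plain local variable `value`.
public class ScalarReplacedSum {
    public static long sumMutableWrapper() {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            int value = i;       // the wrapper's field, now just a scalar
            total += value;
        }
        return total;
    }
}
```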

On closer inspection, the value in the optimized code above is itself just an intermediate variable, which further optimization also eliminates, leaving:

```java
total += i;
```

In other words, a large chunk of Java source code boils down to a simple one-line operation when it is actually executed. That is why sumPrimitive() and sumMutableWrapper() are almost equally efficient.