As an object-oriented, cross-platform language, Java has always made topics like objects and memory relatively difficult, so even a Java beginner should be more or less familiar with the JVM. JVM knowledge is a must for every Java developer, as well as for job interviews.

In the JVM memory structure, the two most discussed areas are heap memory and stack memory. Many books and articles on the Internet describe the differences between them, roughly as follows:

1. The heap is an area of memory shared by threads, and the stack is an area of memory exclusively owned by threads.

2. The heap mainly stores object instances, and the stack mainly stores various basic data types and object references.

However, I can responsibly tell you that neither of these two conclusions is completely correct.

In my previous article, “Java Heap Memory Is Shared by Threads!”, I explained that heap memory is not entirely thread-shared. This article discusses the second conclusion.

Object memory allocation

In the Java Virtual Machine Specification, the heap is described as follows:

In the Java virtual machine, the heap is a runtime memory area that is shared among threads, from which memory for all class instances and array objects is allocated.

Interviewer: Are you sure? In a previous article, we introduced how Java objects are allocated on the heap: mainly in the Eden area, with allocation attempted in the TLAB first if TLAB is enabled; in a few cases, objects may be allocated directly in the old generation. The allocation rules are not one hundred percent fixed; they depend on which garbage collector is currently in use and on the virtual machine's memory-related parameter settings.

However, the following principles are generally followed:

  • Objects are allocated in the Eden area first
    • Allocation takes precedence in Eden, and if Eden does not have enough space, a Minor GC is triggered
  • Large objects go directly to the old generation (see the sketch after this list)
    • Java objects that require a large amount of contiguous memory will be allocated directly in the old generation if the memory they require exceeds the value of -XX:PretenureSizeThreshold.
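As a concrete sketch of the second rule (the class name here is hypothetical, and note that -XX:PretenureSizeThreshold only takes effect with the Serial and ParNew collectors):

// Run with: -Xms20M -Xmx20M -Xmn10M -XX:+UseSerialGC
//           -XX:+PrintGCDetails -XX:PretenureSizeThreshold=3145728
public class PretenureTest {
    private static final int _1MB = 1024 * 1024;

    public static void main(String[] args) {
        // 4 MB exceeds the 3 MB threshold, so this array should be
        // allocated directly in the old generation, skipping Eden
        byte[] allocation = new byte[4 * _1MB];
    }
}

The GC log printed by -XX:+PrintGCDetails should show roughly 4 MB occupied in the old generation while Eden stays mostly empty.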

However, although the virtual machine specification contains such a requirement, virtual machine vendors may optimize the memory allocation of objects in their implementations. The most typical example is the maturation of JIT technology in the HotSpot virtual machine, which makes the rule that objects allocate memory on the heap less absolute.

In Understanding the Java Virtual Machine, the author makes a similar point: JIT technology has made it less absolute that objects allocate memory on the heap. But the book does not expand on what JIT is or what JIT optimization does. So let's take a closer look.

JIT technology

As we all know, javac compiles Java source code into Java bytecode, and the JVM's interpreter then reads the bytecode in and translates it, instruction by instruction, into the corresponding machine instructions. This is what a traditional JVM interpreter does. Obviously, such interpreted execution is bound to be much slower than directly executing native binary code. To solve this efficiency problem, just-in-time (JIT) compilation was introduced.

With the JIT, a Java program is still initially interpreted by the interpreter, but when the JVM detects that a method or block of code runs particularly frequently, it considers it “hot spot code.” The JIT then compiles this hot code into machine code specific to the local platform, optimizes it, and caches the compiled machine code for future use.

Hot spot detection

As mentioned above, to trigger the JIT, you first need to identify hot code. At present, the main way to identify hot code is hot spot detection, and the HotSpot virtual machine mainly adopts counter-based hot spot detection.

Counter-based hot spot detection: a virtual machine that takes this approach sets up a counter for each method (or even each block of code) and counts the number of times it is executed. If the count exceeds a threshold, the method is considered hot, triggering JIT compilation.
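One hedged way to observe this (class and method names below are hypothetical) is to run a hot loop with -XX:+PrintCompilation, which makes HotSpot log each method as it gets compiled:

// Run with: java -XX:+PrintCompilation HotMethodTest
// and watch for a log line mentioning HotMethodTest::square once
// its invocation counter crosses the compiler's threshold.
public class HotMethodTest {
    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += square(i); // called often enough to become "hot"
        }
        System.out.println(total);
    }

    private static long square(int n) {
        return (long) n * n;
    }
}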

Compiler optimization

The JIT performs hot spot detection to identify hot code, and in addition to caching the compiled machine code, it optimizes the code. Among these optimizations, some important ones are escape analysis, lock elimination, lock coarsening, method inlining, null check elimination, type check elimination, common subexpression elimination, and so on.

Escape analysis in these optimizations is relevant to this article.

Escape analysis

Escape Analysis is one of the most advanced optimization techniques in the Java virtual machine. It is a cross-function, global data-flow analysis algorithm that can effectively reduce synchronization overhead and heap allocation pressure in Java programs. Through escape analysis, the HotSpot compiler can figure out the scope of a new object's reference and decide whether or not to allocate it on the heap.

The basic behavior of escape analysis is to analyze an object's dynamic scope: when an object is defined in a method, it may be referenced by external methods, for example by being passed elsewhere as a call parameter; this is called method escape.

For example:

public static String createStringBuffer(String s1, String s2) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb.toString();
}

sb is a variable internal to the method, and the code above does not return it directly, so the StringBuffer cannot be accessed or changed by other methods; its scope is confined to the method. We can say that this variable has not escaped from the method.
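By contrast, a slightly modified version (a sketch for illustration) returns the StringBuffer itself instead of the result of toString(); now the reference is handed to the caller, and the object escapes the method:

public static StringBuffer createStringBuffer(String s1, String s2) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb; // the reference leaves the method: sb escapes
}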

With escape analysis, we can determine whether a variable in a method is likely to be accessed or changed by another thread. Based on this feature, the JIT can do some optimizations:

  • Synchronization elision
  • Scalar replacement
  • Stack allocation

For synchronization elision, you can refer to the introduction of lock elimination in my previous article “In-depth Understanding of Multithreading (5) — Java Virtual Machine Lock Optimization Techniques.” This article mainly analyzes scalar replacement and stack allocation.
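Still, a minimal sketch of synchronization elision is worth showing here: if a lock object provably never escapes its method, no other thread can ever contend on it, so the JIT can strip the synchronization away entirely.

public static void elide() {
    Object lock = new Object();      // the lock object never escapes this method
    synchronized (lock) {            // no other thread can ever reach 'lock',
        System.out.println("work");  // so the JIT may remove the lock entirely
    }
}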

Scalar replacement and stack allocation

We said that after escape analysis, the JIT may optimize an object if it does not escape the method body. The biggest result of this optimization is that it may change the principle that Java objects are allocated memory on the heap.

There are many reasons why objects are allocated on the heap, but the key one relevant to this article is that the heap is shared by threads, so that objects created by one thread can be accessed by other threads.

So, if we create an object inside a method body and it doesn’t escape, is it necessary to allocate the object to the heap?

It is not, because the object is not accessed by other threads and its lifetime is confined to a single method call, which eliminates the need to allocate memory on the heap and garbage-collect it later.

So, with escape analysis, if an object does not escape out of the method, what can be done to optimize it so that it no longer has to be allocated on the heap?

The answer is allocation on the stack. In HotSpot, true on-stack allocation is not actually implemented; the same effect is achieved via scalar replacement.

So let's focus a little bit on what scalar replacement is, and how allocation on the stack is achieved through it.

Scalar replacement

A Scalar is a quantity which cannot be broken down into smaller quantities. Primitive data types in Java are scalars. In contrast, data that can be decomposed is called aggregates. Objects in Java are aggregates because they can be decomposed into other aggregates and scalars.

In the JIT stage, if escape analysis finds that an object cannot be accessed by the outside world, JIT optimization breaks the object up into the member variables it contains and replaces it with them. This process is called scalar replacement.

public static void main(String[] args) {
    alloc();
}

private static void alloc() {
    Point point = new Point(1, 2);
    System.out.println("point.x=" + point.x + "; point.y=" + point.y);
}

// a nested class of the same outer class, so its private fields
// are accessible from alloc()
static class Point {
    private int x;
    private int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

In the code above, the point object does not escape the alloc method, so it can be broken down into scalars. Instead of creating a Point object, the JIT will use the two scalars int x and int y directly.

private static void alloc() {
   int x = 1;
   int y = 2;
   System.out.println("point.x="+x+"; point.y="+y);
}

As can be seen, after escape analysis finds that the Point object does not escape, the aggregate Point is replaced by two scalars.

Through scalar replacement, an object is replaced by its member variables. The memory that would otherwise be allocated on the heap is no longer needed; the member variables can instead be allocated in the local variable table on the method stack.
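In HotSpot, scalar replacement has its own switch, -XX:+EliminateAllocations, which is on by default. One hedged way to isolate its effect in the experiment below is to keep escape analysis on while turning scalar replacement off:

-Xmx4G -Xms4G -XX:+DoEscapeAnalysis -XX:-EliminateAllocations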

Experimental proof

Talk Is Cheap, Show Me The Code

No Data, No BB;

So let's do an experiment to see whether escape analysis works, whether stack allocation actually happens, and what the advantages of stack allocation are.

Let’s look at the following code:

public static void main(String[] args) {
    long a1 = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        alloc();
    }
    long a2 = System.currentTimeMillis();
    System.out.println("cost " + (a2 - a1) + " ms");
    // sleep so the process stays alive long enough to inspect with jmap
    try {
        Thread.sleep(100000);
    } catch (InterruptedException e1) {
        e1.printStackTrace();
    }
}

private static void alloc() {
    User user = new User();
}

static class User {
}

The code is simple: it creates one million User objects in a for loop.

We define the User object inside the alloc method and do not reference it outside the method. That is, the object does not escape from alloc. After JIT escape analysis, its memory allocation can be optimized.

We specify the following JVM parameters and run them:

-Xmx4G -Xms4G -XX:-DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError 

-XX:-DoEscapeAnalysis disables escape analysis.
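Putting it together, the full run might look like this (assuming the test class is named StackAllocTest, as in the jmap output below):

java -Xmx4G -Xms4G -XX:-DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError StackAllocTest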

After the program prints cost XX ms, and before it exits, we use the jmap command to see how many User objects are currently in the heap:

➜  ~ jmap -histo 2809

 num     #instances         #bytes  class name
----------------------------------------------
   1:           524       87282184  [I
   2:       1000000       16000000  StackAllocTest$User
   3:          6806        2093136  [B
   4:          8006        1320872  [C
   5:          4188         100512  java.lang.String
   6:           581          66304  java.lang.Class

As you can see from the jmap output above, a total of 1,000,000 StackAllocTest$User instances were created in the heap.

With escape analysis turned off (-XX:-DoEscapeAnalysis), although the User object created in the alloc method does not escape outside the method, it is still allocated in heap memory. In other words, if there were no JIT optimization and no escape analysis technique, this is how it would always work: all objects allocated in heap memory.

Next, we turn on escape analysis and execute the above code.

-Xmx4G -Xms4G -XX:+DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError 

After the program prints cost XX ms, and before it exits, we again use the jmap command to see how many User objects are currently in the heap:

➜  ~ jmap -histo 2859

 num     #instances         #bytes  class name
----------------------------------------------
   1:           524      101944280  [I
   2:          6806        2093136  [B
   3:         83619        1337904  StackAllocTest$User
   4:          8006        1320872  [C
   5:          4188         100512  java.lang.String
   6:           581          66304  java.lang.Class

As you can see from the output above, with escape analysis enabled (-XX:+DoEscapeAnalysis), there are only about 80,000 StackAllocTest$User objects in heap memory. That is, after JIT optimization, the number of objects allocated in heap memory dropped from 1,000,000 to about 80,000.

Besides verifying the object count with jmap as above, the reader can also try shrinking the heap and re-running the code, then compare the number of GCs: with escape analysis enabled, the number of GCs during the run drops significantly, because many heap allocations have been optimized into stack allocations.
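For example (a sketch; the exact heap sizes are arbitrary), the two runs could be compared like this:

-Xmx15m -Xms15m -XX:-DoEscapeAnalysis -XX:+PrintGC
-Xmx15m -Xms15m -XX:+DoEscapeAnalysis -XX:+PrintGC

With escape analysis off, the GC log shows a stream of collections; with it on, far fewer appear.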

Escape analysis is immature

In the previous example, when escape analysis was turned on, the number of heap objects went from 1,000,000 to about 80,000, but it did not go to zero, indicating that the JIT does not apply this optimization in all cases.

Papers on escape analysis were published as early as 1999, but it was not implemented in HotSpot until JDK 1.6, and the technology is still not fully mature.

The fundamental reason is that there is no guarantee that the performance gains from escape analysis will exceed its cost. Scalar replacement, stack allocation, and lock elimination can all be done after escape analysis, but the analysis itself requires a series of complex computations and is a relatively time-consuming process.

An extreme example would be running escape analysis only to find that no object is non-escaping; in that case, the time spent on the analysis is simply wasted.

Although this technique is not very mature yet, it is a very important part of just-in-time compiler optimization.

Conclusion

Normally, objects are allocated memory on the heap; that is what the virtual machine specification requires. But as compiler optimization techniques have matured, actual implementations have come to differ somewhat from the specification.

For example, with JIT optimization, the HotSpot virtual machine performs escape analysis on objects, and if an object is found not to escape its method, it may be allocated on the stack via scalar replacement instead of being allocated memory on the heap.

Therefore, the statement that objects always allocate memory on the heap is not entirely true.

Finally, we'll leave you with a thought question: we've talked about TLAB before, and we've talked about stack allocation today. What do you think are the similarities and differences between these two optimizations?