Lifting the veil on Java memory management

Preface

Compared with high-performance languages such as C and C++, Java has a feature their programmers envy: automatic memory management. It might seem that Java programmers no longer need to care about memory, or even know anything about it. But is that really the case? For us Android programmers in particular, memory is a weak spot: once a complicated memory leak or out-of-memory problem shows up, it is a nightmare. A general understanding of Java memory management has therefore become an essential skill for a qualified Android programmer, and that stays true with the newer Kotlin, which also runs on the JVM. Let’s take this opportunity to lift the veil together.

Object

Java is an object-oriented programming language, and one saying has spread to every corner of the community: everything is an object. Java memory management can therefore be understood as the creation and release of objects. So what exactly is an object? A boyfriend? A girlfriend? (In Chinese, the word for “object” also means “partner”.) Or something else? And how do objects relate to memory? That is a lot of questions, so let’s take them one at a time.

Tip 1: This article uses HotSpot, the Java heap (the most common memory area), and ordinary Java objects as its examples. Tip 2: If you have already read In-Depth Understanding of the Java Virtual Machine, feel free to hit the close button in the top-right corner. If you have forgotten it, read on!

Concept

A boyfriend or girlfriend can be understood as an object. Objects exist in the real world, just like your mom and dad. Alongside them there is an abstract concept, the class, which is an abstraction of objects. That’s about it. I feel like I’m back in a college lecture… oh my.

Objects and Memory

Creation

What does a programmer do without a partner? Just new one. It’s that simple: tall, short, thin, or plump, you can have whatever you want. Becoming a programmer is the one thing I will never regret in this life, even if my head does feel a little chilly.

new creates an object, but what does that process actually look like? When the JVM encounters a new instruction, it first checks whether the instruction’s operand can locate a symbolic reference to a class in the constant pool, and whether the class that reference represents has already been loaded, resolved, and initialized. If not, the class loading process has to run first. Once the class loading check passes, the blueprint of the object is ready, but a blueprint alone is not an object: the JVM still has to allocate memory for it. So how is that done?

Allocation

Allocating memory for objects resembles many real-world situations, parking for example. Some car parks have, say, 100 spaces in a row: whoever arrives first takes the first space, and everyone after parks in order, one after another. This kind of allocation is called “bump the pointer”. In other car parks you may park wherever you like, as long as the car fits. This kind of allocation is called a “free list”. In both cases we use our eyes to see which spaces are free, so how does the JVM “see”? With bump-the-pointer, a pointer marks the boundary between used and free memory, and allocating an object simply moves that pointer forward by the object’s size. With a free list, the JVM maintains a list of the available memory blocks (the free slots). Which one is used depends on whether the free memory in the heap is contiguous.
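
To make the two strategies concrete, here is a minimal sketch in Java. It is a toy model rather than HotSpot code: addresses are plain ints, and the names AllocationSketch, BumpAllocator, and FreeListAllocator are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two allocation strategies; addresses are plain ints. */
class AllocationSketch {

    /** Bump the pointer: used memory on one side, free memory on the other. */
    static class BumpAllocator {
        private final int capacity;
        private int top = 0; // boundary between used and free memory

        BumpAllocator(int capacity) { this.capacity = capacity; }

        /** Returns the start address of the new block, or -1 if it does not fit. */
        int allocate(int size) {
            if (top + size > capacity) return -1;
            int address = top;
            top += size; // move the pointer past the new object
            return address;
        }
    }

    /** Free list: the allocator records the free blocks and takes the first that fits. */
    static class FreeListAllocator {
        private final List<int[]> freeBlocks = new ArrayList<>(); // each entry is {start, length}

        void addFreeBlock(int start, int length) {
            freeBlocks.add(new int[] {start, length});
        }

        /** Returns the start address of the new block, or -1 if no free block is big enough. */
        int allocate(int size) {
            for (int i = 0; i < freeBlocks.size(); i++) {
                int[] block = freeBlocks.get(i);
                if (block[1] >= size) {
                    int address = block[0];
                    block[0] += size; // shrink the free block from the front
                    block[1] -= size;
                    if (block[1] == 0) freeBlocks.remove(i);
                    return address;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(100);
        System.out.println(bump.allocate(16)); // 0
        System.out.println(bump.allocate(24)); // 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(40, 8);
        list.addFreeBlock(80, 32);
        System.out.println(list.allocate(16)); // 80, because the block at 40 is too small
    }
}
```

The bump allocator does a single addition per allocation, which is why it is attractive whenever the free memory is one contiguous region.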

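Readers who are sensitive to concurrency will certainly ask: how can each object still get its own correct position when several threads allocate at the same time? There are generally two solutions. The first is to let only one car park at a time, so that the next car starts parking only after the previous one has finished; the JVM does this by synchronizing the allocation, using CAS with retry on failure to keep the pointer update atomic. The second is to reserve an area for each thread in advance, known as a thread-local allocation buffer (TLAB). For example, if A, B, and C have reserved area A, they can always park in area A without caring about the other areas (an area here belongs to a thread). If they invite their friend D and area A is full, then sorry: a new area has to be reserved first, and only that step needs synchronization. So creating an object is not an atomic operation. Remember that.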
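
The two approaches to concurrent allocation can be sketched as follows. This is a simplified illustration under my own assumptions, not the JVM’s implementation: CasBumpAllocator and ThreadLocalBuffer are hypothetical names, and the buffer here simply gives up when it is full instead of requesting a new one.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentAllocationSketch {

    /** Option 1: a shared pointer advanced with compare-and-swap, retried on failure. */
    static class CasBumpAllocator {
        private final int capacity;
        private final AtomicInteger top = new AtomicInteger(0);

        CasBumpAllocator(int capacity) { this.capacity = capacity; }

        int allocate(int size) {
            while (true) {
                int current = top.get();
                int next = current + size;
                if (next > capacity) return -1;
                if (top.compareAndSet(current, next)) return current; // this thread won the race
                // another thread moved the pointer first, so read it again and retry
            }
        }
    }

    /** Option 2: a thread reserves a private buffer once, then allocates in it without synchronization. */
    static class ThreadLocalBuffer {
        private int top;
        private final int end;

        ThreadLocalBuffer(CasBumpAllocator shared, int bufferSize) {
            int start = shared.allocate(bufferSize); // only this step touches the shared pointer
            if (start < 0) throw new IllegalStateException("shared heap exhausted");
            this.top = start;
            this.end = start + bufferSize;
        }

        int allocate(int size) {
            if (top + size > end) return -1; // a real VM would reserve a new buffer here
            int address = top;
            top += size;
            return address;
        }
    }

    /** Two threads each allocate 1000 blocks of 8 bytes; returns the final shared pointer. */
    static int runDemo() {
        CasBumpAllocator heap = new CasBumpAllocator(1_000_000);
        Runnable task = () -> { for (int i = 0; i < 1000; i++) heap.allocate(8); };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return heap.allocate(0); // a zero-byte allocation just reads the current pointer
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 16000: no allocation was lost or handed out twice
    }
}
```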

Layout

We now know where to park, but how do we park? Some people park nose-in, some park sideways, some reverse in. In the same way, how is an object laid out in memory? It is divided into three parts: the object header (Header), the instance data (Instance Data), and the alignment padding (Padding).

Let’s go through the three only briefly, because this part is very conceptual.

The object header contains two kinds of information. The first is the runtime data of the object itself, such as its hash code, GC generational age, lock state flags, the lock held by a thread, the biased thread ID, and the biased timestamp. The second is the type pointer, a pointer to the object’s class metadata, which the virtual machine uses to determine which class the object is an instance of. If some of these terms are unfamiliar, I may explain them in a follow-up article (after all, I am still learning too). If you are curious, look up the relevant material; for now, just remember them as concepts.

Instance data is easier to understand. It is the useful information the object actually stores, that is, the contents of the fields of various types defined in the program code.

Alignment padding does not always exist; it merely acts as a placeholder. HotSpot’s memory management requires the starting address of every object to be a multiple of 8 bytes, which in other words means that the size of every object must be a multiple of 8 bytes. According to the book, the object header is already a multiple of 8 bytes in size, so when the instance data is not, alignment padding is needed to make up the difference.
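
The arithmetic behind the padding is simple enough to sketch. The sizes below (a 16-byte header, one int field, one byte field) are hypothetical numbers chosen for illustration; real header and field sizes depend on the JVM and its settings.

```java
class AlignmentSketch {
    static final int ALIGNMENT = 8;

    /** Rounds size up to the next multiple of 8 (the bit trick works because 8 is a power of two). */
    static int alignUp(int size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
    }

    /** Number of padding bytes needed after the header and the instance data. */
    static int padding(int headerBytes, int instanceDataBytes) {
        int raw = headerBytes + instanceDataBytes;
        return alignUp(raw) - raw;
    }

    public static void main(String[] args) {
        // Assumed sizes: 16-byte header, an int (4 bytes) and a byte (1 byte) of instance data.
        System.out.println(padding(16, 5)); // 3, so the object occupies 24 bytes
        System.out.println(padding(16, 8)); // 0, already a multiple of 8
    }
}
```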

Access

After you have parked and finished your business, you have to drive home, which means finding your car again. How? You remember where you parked, right? You remember your license plate, right? So how do we get to our objects in memory? Let’s look at a pair of pictures:

The first picture shows handle access. Its obvious advantage is stability: if the object is moved, only the pointer inside the handle has to be updated, and the reference itself stays unchanged. The second shows direct pointer access. Its obvious advantage is speed, because the extra hop through the handle is skipped. HotSpot, the virtual machine discussed in this article, uses the latter.
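
The trade-off between the two access styles can be mimicked with a small model. This is only an analogy under my own assumptions: addresses are ints, and HandlePool and relocateDirect are invented names, not anything HotSpot exposes.

```java
class ObjectAccessSketch {

    /** Handle access: a reference stores a handle index, and the handle stores the address. */
    static class HandlePool {
        private final int[] addresses = new int[16];
        private int count = 0;

        int createHandle(int objectAddress) {
            addresses[count] = objectAddress;
            return count++;
        }

        /** One extra lookup on every access. */
        int resolve(int handle) { return addresses[handle]; }

        /** When the collector moves an object, one update fixes every reference. */
        void relocate(int handle, int newAddress) { addresses[handle] = newAddress; }
    }

    /** Direct pointers: every reference holds the address, so each one must be rewritten. */
    static int relocateDirect(int[] references, int oldAddress, int newAddress) {
        int updated = 0;
        for (int i = 0; i < references.length; i++) {
            if (references[i] == oldAddress) {
                references[i] = newAddress;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        HandlePool pool = new HandlePool();
        int handle = pool.createHandle(100);
        int refA = handle, refB = handle; // two references share one handle
        pool.relocate(handle, 500);       // the object moves from address 100 to 500
        System.out.println(pool.resolve(refA) + " " + pool.resolve(refB)); // 500 500

        int[] directRefs = {100, 100, 200};
        System.out.println(relocateDirect(directRefs, 100, 500)); // 2
    }
}
```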

Reclamation

What if the car is totaled? Buy a new one, of course (smirk). So how do we know whether an object is dead? Two algorithms come first. The first is reference counting, which is easy to understand: give each object a counter with an initial value of 0, add 1 whenever something references the object, and subtract 1 whenever a reference becomes invalid; a counter of 0 means the object is dead. The second is reachability analysis, which is also easy to understand: start from the GC Roots and follow references downward. If some path leads from the GC Roots to an object, the object is still alive; otherwise it is dead. In the picture below, objects 5, 6, and 7 are dead:

So which objects can serve as GC Roots?

Objects referenced from the virtual machine stack (the local variable table in a stack frame)
Objects referenced by static class fields in the method area
Objects referenced by constants in the method area
Objects referenced by JNI (that is, by Native methods) in the native method stack
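
Reachability analysis is, at heart, a graph traversal that starts from the roots. The sketch below is a minimal illustration with a made-up Node class standing in for heap objects; a real collector works on raw memory, not on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {

    /** Stand-in for a heap object: a name plus the objects it references. */
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Returns every node that can be reached from the given roots. */
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> marked = new HashSet<>(); // Node keeps identity-based equals and hashCode
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {
                pending.addAll(node.refs); // visit a node's references only the first time it is seen
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b); // root -> a -> b
        c.refs.add(d); // c and d refer to each other,
        d.refs.add(c); // but nothing reachable from a root refers to them
        Set<Node> live = reachable(Arrays.asList(a));
        System.out.println(live.contains(b)); // true
        System.out.println(live.contains(c)); // false
    }
}
```

Note that c and d keep each other’s reference count above zero, yet the traversal never reaches them, so they count as dead.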

HotSpot uses reachability analysis. Why not reference counting? Because reference counting has a hard time with objects that refer to each other in a cycle. For example:

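public class ReferenceCountingGC {
    public Object instance = null;

    public static void main(String[] args) {
        ReferenceCountingGC objA = new ReferenceCountingGC();
        ReferenceCountingGC objB = new ReferenceCountingGC();
        objA.instance = objB; // objA refers to objB
        objB.instance = objA; // objB refers to objA
        objA = null;
        objB = null; // both objects are now unreachable, yet each counter would still be 1
    }
}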

Which raises the question: is an unreachable object really dead? Not necessarily. An object has to be marked at least twice before it is truly declared dead. The first mark happens when the object is found to be unreachable. If it does not override finalize(), or its finalize() has already been called by the virtual machine, it is treated as garbage and can be reclaimed (if an expert can confirm this, please do). The remaining objects are placed in a queue called the F-Queue, and the GC later marks them a second time. An object can save itself while finalize() runs by re-attaching itself to any object on the reference chain. It is best to forget that this method exists: it is expensive to run, its behavior is uncertain, and the order in which objects are finalized cannot be guaranteed. Effective Java also recommends avoiding it.

That’s pretty much it for the simple analysis of objects. You think that’s the end? Too naive.

What are all those terms, such as the virtual machine stack, the method area, and the Java heap?

Runtime data area

As is customary: no picture, no talk!

Seeing this picture, I’m sure you know what I’m about to do… I don’t want to either, since writing it out is pure exposition. How embarrassing.

Program counter

The program counter is a small region of memory that can be viewed as a line-number indicator for the bytecode being executed by the current thread. Basic functions such as branching, looping, jumping, exception handling, and thread resumption all depend on it. As the picture shows, it is thread-private: each thread has its own independent program counter, and they do not affect one another. It is also the only area for which the Java Virtual Machine Specification defines no OutOfMemoryError condition.

Java virtual machine stack

The virtual machine stack describes the memory model of Java method execution: every method invocation creates a stack frame that stores the local variable table, the operand stack, dynamic linking information, the method exit, and so on. The life of each method, from invocation to completion, corresponds to one stack frame being pushed onto and popped off the virtual machine stack. Careful readers will have noticed that the local variable table already appeared in the pictures of the object access section. The important point is that when a method is entered, the amount of local variable space it needs in its frame is already fully determined. In other words, the memory required by the local variable table is fixed at compile time.

The Java Virtual Machine Specification defines two error conditions for this area. A StackOverflowError is thrown if a thread requests a stack depth greater than the virtual machine allows. If the virtual machine stack can be extended dynamically (as most Java virtual machines currently do, although the specification also permits fixed-size stacks), an OutOfMemoryError is thrown when the stack needs to grow but not enough memory can be obtained.
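
The first of these errors is easy to provoke. The sketch below recurses without a base case until the stack is exhausted; the class and method names are my own, and the depth it reports depends on the JVM, the thread’s stack size (for example the -Xss option), and the frame size, so no particular number should be expected.

```java
class StackDepthSketch {
    private static int depth = 0;

    private static void recurse() {
        depth++;
        recurse(); // every call pushes one more stack frame
    }

    /** Recurses until the thread's stack is exhausted and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has unwound back to this frame, so execution can continue normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed at depth " + measureDepth());
    }
}
```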

Native method stack

The native method stack plays a role very similar to that of the virtual machine stack. The difference is that the virtual machine stack serves the execution of Java methods (that is, bytecode), while the native method stack serves the Native methods the virtual machine uses. It throws the same errors as the Java virtual machine stack.

Java heap

You can assume that almost all object instances are allocated on the heap. Why only almost? Because of an optimization technique (escape analysis): if an object can never be accessed by other methods or threads, why not simply allocate it on the stack?

According to the Java Virtual Machine Specification, the Java heap may sit in physically discontiguous memory as long as it is logically contiguous. An OutOfMemoryError is thrown when the heap has no memory left to complete an allocation and cannot be expanded any further.

Method area

The method area, like the Java heap, is a memory area shared by all threads. It stores data such as class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler. Like the Java heap, it does not need contiguous memory and may be either fixed-size or expandable; in addition, an implementation may choose not to garbage-collect it at all. When the method area cannot satisfy an allocation request, an OutOfMemoryError is thrown.

That nearly completes the runtime data area, but there is one more concept, direct memory, which came into use with NIO in JDK 1.4; look it up if you are interested. I’m sure you have noticed that every area (except the program counter) can run out of memory.
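
For the curious, direct memory is what NIO’s ByteBuffer.allocateDirect hands out. The small sketch below contrasts it with an ordinary heap buffer; DirectMemorySketch and roundTrip are illustrative names of my own.

```java
import java.nio.ByteBuffer;

class DirectMemorySketch {

    /** Writes a value into a direct (off-heap) buffer and reads it back. */
    static int roundTrip(int value) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(Integer.BYTES);
        offHeap.putInt(0, value);
        return offHeap.getInt(0);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[] inside the Java heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // backed by native memory outside the heap
        System.out.println(onHeap.isDirect());  // false
        System.out.println(offHeap.isDirect()); // true
        System.out.println(roundTrip(42));      // 42
    }
}
```

Direct buffers are not limited by the size of the Java heap, but they still consume the machine’s memory, so they can also end in an OutOfMemoryError.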

Garbage collection algorithms

The Java heap described above is arguably the largest piece of memory the virtual machine manages, and it is the GC’s most frequent haunt, hence the nickname “GC heap”. GC, as the name implies, is garbage collection, and it is one of Java’s great advantages: memory that is no longer used can be reclaimed automatically. And since it is garbage collection, it needs its tools; even a street sweeper uses a broom.

Mark-sweep algorithm

As the name says, the objects that need to be reclaimed are marked first, and then all marked objects are cleared in one go. It is the most basic collection algorithm, and the algorithms introduced below all improve on it. Besides being inefficient, it has one serious problem: it leaves behind a large number of discontiguous memory fragments. Given what we just said about OutOfMemoryError in the Java heap, a later allocation of a large object can easily fail to find enough contiguous memory and trigger another garbage collection early, or even go straight to OOM. The figure shows the execution process.
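
The fragmentation problem shows up even in a toy model. In the sketch below the heap is just an array of slots, and the boolean array marks the objects that must be kept (the survivors found by reachability analysis), which is an inversion of the wording above made only to keep the code short. All names are hypothetical.

```java
import java.util.Arrays;

class MarkSweepSketch {

    /** Sweep phase: frees every slot whose object is not marked as live. */
    static String[] sweep(String[] heap, boolean[] live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!live[i]) result[i] = null; // the slot becomes free but stays where it is
        }
        return result;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        boolean[] live = {true, false, true, false, true, false};
        String[] after = sweep(heap, live);
        System.out.println(Arrays.toString(after)); // [A, null, C, null, E, null]
        System.out.println(largestFreeRun(after));  // 1
    }
}
```

Three slots are free after the sweep, but no two of them are adjacent, so an object that needs two slots cannot be placed.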

Copying algorithm

This algorithm is easy to understand. It divides the available memory into two halves and uses only one of them at a time. When it is time to collect, the surviving objects are copied to the other half, and then the original half is cleaned up in one go. Efficiency improves greatly, but the fatal weakness is that the usable memory is cut in half.

The figure shows the execution process of the copying algorithm.
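
In the same toy style, with an array of slots standing in for one half of the memory, the copying step might look like this. The names are made up, and the boolean array again marks the survivors.

```java
import java.util.Arrays;

class CopyingSketch {

    /** Copies the live objects from the "from" half into a fresh "to" half of the same size. */
    static String[] copyLive(String[] from, boolean[] live) {
        String[] to = new String[from.length];
        int next = 0;
        for (int i = 0; i < from.length; i++) {
            if (live[i]) to[next++] = from[i]; // survivors end up packed together
        }
        return to; // the whole "from" half can now be treated as free
    }

    public static void main(String[] args) {
        String[] from = {"A", "B", "C", "D"};
        boolean[] live = {false, true, false, true};
        System.out.println(Arrays.toString(copyLive(from, live))); // [B, D, null, null]
    }
}
```

The cost of the copy is proportional to the number of survivors, not to the size of the half being collected.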

Mark-compact algorithm

The copying algorithm is efficient in theory, but think about it: if you have 100 objects and 98 of them survive, you have to copy 98 of them, and in the extreme case where all 100 survive, you have to copy every one, which is unacceptable. The mark-compact algorithm takes a different route and also fixes the memory fragmentation caused by mark-sweep: it first moves the surviving objects toward one end and then directly cleans up the memory beyond the end boundary. The figure shows the execution process.
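
A sliding compaction over the same kind of toy heap can be sketched as follows. As before, the array of slots and the names are assumptions for illustration only.

```java
import java.util.Arrays;

class MarkCompactSketch {

    /** Slides the live objects toward index 0 in place and returns the new boundary. */
    static int compact(String[] heap, boolean[] live) {
        int free = 0; // next slot that a live object will be moved into
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) heap[free++] = heap[i];
        }
        Arrays.fill(heap, free, heap.length, null); // everything past the boundary is reclaimed
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        boolean[] live = {true, false, true, false, true, false};
        int boundary = compact(heap, live);
        System.out.println(Arrays.toString(heap)); // [A, C, E, null, null, null]
        System.out.println(boundary);              // 3
    }
}
```

Unlike the copying algorithm, no second half is needed, and unlike mark-sweep, the free space ends up as one contiguous run.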

Generational collection algorithm

From the analysis above, the copying algorithm is best suited to short-lived objects, while the other two algorithms are better suited to objects that live to a ripe old age. The region holding the former is called the young generation, and the region holding the latter is called the old generation. The generational algorithm simply applies a different algorithm to each: one for the young generation and another for the old generation.

So here is the question: how do objects become old? In other words, how do they get into the old generation? First, a special case: large objects go directly into the old generation. Then the normal path: object A is first allocated in the Eden space of the young generation. When Eden does not have enough room for an allocation, a Minor GC is performed. If object A is still alive afterwards and fits into a Survivor space, it is moved there and its age counter is set to 1. Every time object A survives another Minor GC, its age increases by 1, and when the age reaches MaxTenuringThreshold it is promoted to the old generation (applause). This is not absolute, of course: if the total size of all objects of the same age in a Survivor space exceeds half of that Survivor space, objects of that age or older can be promoted directly.
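
The aging rules just described can be modeled in a few lines. This sketch follows the wording of the text literally and is not HotSpot’s actual bookkeeping: Obj, surviveMinorGc, and effectiveThreshold are invented names, and the exact boundary conditions of a real collector are not modeled.

```java
class TenuringSketch {

    /** Stand-in for a young-generation object. */
    static class Obj {
        int age = 0;
        boolean inOldGen = false;
    }

    /** Applies one Minor GC to an object that survives it. */
    static void surviveMinorGc(Obj obj, int maxTenuringThreshold) {
        if (obj.inOldGen) return; // a Minor GC does not touch the old generation
        obj.age++;
        if (obj.age >= maxTenuringThreshold) obj.inOldGen = true; // promoted
    }

    /** Counts how many Minor GCs a fresh object survives before it is promoted. */
    static int minorGcsUntilPromotion(int maxTenuringThreshold) {
        Obj obj = new Obj();
        int collections = 0;
        while (!obj.inOldGen) {
            surviveMinorGc(obj, maxTenuringThreshold);
            collections++;
        }
        return collections;
    }

    /**
     * Dynamic age rule as worded in the text: the smallest age whose objects together
     * take up more than half of the Survivor space becomes the promotion age.
     * sizeByAge[i] is the total size of the surviving objects of age i.
     */
    static int effectiveThreshold(long[] sizeByAge, long survivorSize, int maxTenuringThreshold) {
        for (int age = 1; age < sizeByAge.length && age < maxTenuringThreshold; age++) {
            if (sizeByAge[age] > survivorSize / 2) return age;
        }
        return maxTenuringThreshold;
    }

    public static void main(String[] args) {
        System.out.println(minorGcsUntilPromotion(3)); // 3
        long[] sizeByAge = {0, 10, 60, 5};             // age 2 alone fills 60 of 100 units
        System.out.println(effectiveThreshold(sizeByAge, 100, 15)); // 2
    }
}
```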

That’s about all for this article, which leaves one key question open: when and how does the garbage collector actually collect? There is an awesome-sounding term for it: “Stop The World”.

Closing remarks

First of all, In-Depth Understanding of the Java Virtual Machine (2nd edition) is a really good book. I have never had the chance to meet its author, a true master, and this is not an advertisement; those who have read it will know. Second, all the content of this article comes from the book, and a few passages are even quoted verbatim. These are my notes on the second part of the book: Automatic Memory Management Mechanisms. Much of it is conceptual, a bit like asking why the Earth is called the Earth: these are established facts. Still, we Android programmers had better have a general understanding of them, and not everyone has read the book (or reads it after buying it). So I am sharing this article, which includes some of my own understanding. If there are mistakes, I will correct them promptly. It is best to buy the book and read it carefully; consider this article a brick tossed out to attract jade!

Learning a little every day is a good thing. When you learn, the subject has usually already been summarized by your predecessors; your job is to understand it and make it your own (restate it in your own words while keeping the essence intact), which you might even call exploring. As the saying goes, a good memory is no match for a worn-out pen. My teachers must have said it too, but back then not a word of it reached my ears.

Finally, thanks to those who have supported me all the time!

Here, I wish you all a happy New Year in advance!

Links

GitHub: github.com/crazysunj/

Blog: crazysunj.com/