Preface

Thanks to the automatic memory management of the Java virtual machine, Java programmers do not need to write matching delete/free code for every new operation, which makes memory leaks and out-of-memory errors less likely. However, because memory control has been handed over to the virtual machine, a leak or an out-of-memory error can be hard to troubleshoot if you do not understand how the virtual machine uses memory.

This article summarizes Java’s memory management and its four reference types.

Java memory management


Java memory management comes down to allocating and freeing objects. In Java, memory is allocated by the program and released by the garbage collector (GC). This simplifies the programmer’s work but adds work for the JVM, which is one of the reasons Java programs can run more slowly.

To release objects correctly, the GC must track the state of every object, including its allocation, the references it holds, the references made to it, and assignments. The purpose of this tracking is to release objects accurately and promptly, and the basic rule is that an object can be released once it is no longer referenced.

1. Java memory allocation strategy

A running Java program uses three memory allocation strategies: static allocation, stack allocation and heap allocation. The corresponding memory areas are the static storage area (method area), the stack and the heap.

  • Static storage area (method area): mainly stores static variables. This memory is set aside when the class is loaded and normally remains for the entire run of the program.

  • Stack area: when a method executes, the local variables in its body (primitive values and references to objects) are created on the stack, and the memory they hold is freed automatically when the method returns. Because stack allocation is built into the processor’s instruction set, it is very efficient, but the capacity available is limited.

  • Heap area: also known as dynamic memory allocation. It usually refers to memory created with new while the program is running, that is, object instances. This memory is reclaimed by the Java garbage collector once it is no longer in use.

Here is an example to illustrate in detail:

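public class Sample {
    // Instance fields are stored on the heap as part of each Sample object.
    int s1 = 0;
    // Left unassigned: initializing it with new Sample() would recurse until
    // a StackOverflowError. Any object assigned to it lives on the heap.
    Sample mSample1;

    public void method() {
        // Locals: the int value and the reference live in this method's stack frame.
        int s2 = 1;
        // The Sample object that mSample2 points to is allocated on the heap.
        Sample mSample2 = new Sample();
    }

    public static void main(String[] args) {
        // mSample3 is a local reference on the stack; its object is on the heap.
        Sample mSample3 = new Sample();
        mSample3.method();
    }
}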

In the Sample class, the local variable s2 and the reference variable mSample2 live on the stack, but the object that mSample2 points to lives on the heap. The object that mSample3 points to is stored on the heap together with all of its member variables, s1 and mSample1, while the reference variable mSample3 itself is stored on the stack.

Conclusion:

  • Primitive values and references held in local variables are stored on the stack, and the objects they refer to are stored on the heap. Because local variables belong to a method, their life cycle ends with the method.

  • Member variables are all stored on the heap (primitive values, references, and the objects those references point to). They belong to the class, and instances of the class are ultimately created with new.

2. Java garbage collector

The Java heap and the static storage area (method area) differ from the stack. The implementation classes of an interface may need different amounts of memory, and the branches of a method may too, so only at run time do we know which objects will be created. This memory is allocated and reclaimed dynamically, and it is this portion of memory that the garbage collector focuses on.

2.1 Determining whether an object is alive

The heap holds almost all object instances in the Java world, and the first thing the garbage collector needs to do before collecting the heap is to determine which of these objects are “alive” and which are “dead.”

Reference counting algorithm

A reference counter is added to each object. It is incremented each time a reference to the object is created and decremented each time a reference becomes invalid. An object whose counter is 0 can no longer be used.

Reference counting is simple to implement and cheap to evaluate, and it is a good algorithm in most cases. However, mainstream Java virtual machines do not use it to manage memory, mainly because it has trouble handling circular references between objects.
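The circular-reference problem can be made concrete with a toy counter. The sketch below is only an illustration, not how any JVM works; RefCountDemo, Node, count and pointTo are hypothetical names invented for it. Two nodes point at each other, the program drops its own references, and both counters still read 1.

```java
// Toy model of reference counting; every name here is hypothetical.
class RefCountDemo {
    static class Node {
        int count;    // number of references currently pointing at this node
        Node field;   // the single outgoing reference this node may hold

        // Point this node's field at target and bump target's counter.
        void pointTo(Node target) {
            field = target;
            target.count++;
        }
    }

    // Build a -> b -> a, then drop the two references held by the program.
    static int[] countsAfterDroppingExternalRefs() {
        Node a = new Node();
        Node b = new Node();
        a.count++;        // external reference held by the program
        b.count++;        // external reference held by the program
        a.pointTo(b);
        b.pointTo(a);
        a.count--;        // the program drops its reference to a
        b.count--;        // the program drops its reference to b
        return new int[] { a.count, b.count };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingExternalRefs();
        // Neither counter reaches 0, so a pure reference counter never frees the pair.
        System.out.println("a=" + counts[0] + ", b=" + counts[1]); // a=1, b=1
    }
}
```

Nothing outside the pair can reach a or b any more, yet a counter-based collector would keep both alive indefinitely.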

Reachability analysis algorithm

The mainstream implementations of mainstream commercial languages (Java, C#) use reachability analysis to decide whether an object is alive. The basic idea is to start from a set of objects called “GC Roots” and search downward along references. The path travelled is called a reference chain. When no reference chain connects an object to any GC Root, the object is unreachable and can be treated as unusable.

In the Java language, objects that can be used as GC Roots include the following:

  • The object referenced in the virtual machine stack (the local variable table in the stack frame)
  • The object referenced by the class static property in the method area
  • The object referenced by the constant in the method area
  • Objects referenced by JNI (commonly referred to as Native methods) in the Native method stack
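A minimal sketch of the search from GC Roots, assuming a toy object graph in which each object simply lists the objects it references. ReachabilityDemo, Obj and reachableFrom are hypothetical names; a real collector walks actual heap objects rather than a structure like this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy reachability analysis over a hand-built object graph (hypothetical names).
class ReachabilityDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Follow every reference chain from the roots and collect what is reached.
    static Set<Obj> reachableFrom(List<Obj> roots) {
        Set<Obj> seen = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (seen.add(current)) {          // first visit: queue its references
                pending.addAll(current.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root");
        Obj child = new Obj("child");
        Obj x = new Obj("x");
        Obj y = new Obj("y");
        root.refs.add(child);
        x.refs.add(y);   // x and y reference each other,
        y.refs.add(x);   // but no chain connects them to a root

        Set<Obj> live = reachableFrom(List.of(root));
        System.out.println(live.contains(child)); // true
        System.out.println(live.contains(x));     // false
    }
}
```

The x/y pair is the same cycle that defeats reference counting. Here it is simply never visited, so it counts as dead.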

2.2 Garbage collection algorithm

2.2.1 Mark-sweep algorithm

The most basic collection algorithm is mark-sweep, which, as its name suggests, has two phases:

  • Mark all objects that need to be reclaimed

  • Reclaim all marked objects in one pass once marking is complete

It is called the most basic collection algorithm because later algorithms build on the same idea and improve on its shortcomings. It has two main disadvantages:

  • Efficiency: neither the marking phase nor the sweeping phase is efficient

  • Space: sweeping leaves behind a large number of non-contiguous memory fragments

With too much fragmentation, a program that later needs to allocate a large object may fail to find enough contiguous memory and be forced to trigger another garbage collection early.
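The fragmentation effect can be seen on a toy heap. This sketch assumes a heap modelled as an int array in which 0 means a free slot, and a mark array that records survivors rather than garbage (the two formulations are interchangeable for this purpose). MarkSweepDemo, sweep and largestFreeRun are hypothetical names.

```java
import java.util.Arrays;

// Toy sweep phase on an array-based heap (hypothetical names and layout).
class MarkSweepDemo {
    // Each slot holds a non-zero object id, or 0 if free.
    // marked[i] is true when the mark phase found slot i reachable.
    static int[] sweep(int[] heap, boolean[] marked) {
        int[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = 0;   // reclaim in place; survivors do not move
            }
        }
        return result;
    }

    // Length of the longest run of contiguous free slots.
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6};
        boolean[] marked = {true, false, true, false, true, false};
        int[] after = sweep(heap, marked);
        System.out.println(Arrays.toString(after)); // [1, 0, 3, 0, 5, 0]
        System.out.println(largestFreeRun(after));  // 1
    }
}
```

Half of the heap is free after the sweep, but an object needing two adjacent slots still cannot be placed.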

2.2.2 Copying algorithm

To solve the efficiency problem, a collection algorithm called “copying” emerged. It divides the available memory into two halves of equal size and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up in a single pass.

Each collection therefore reclaims a whole half, and allocation no longer has to deal with fragmentation. Memory can be handed out sequentially just by moving the heap-top pointer, which is simple to implement and efficient to run. The price is that usable memory is cut in half.
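A minimal sketch of the two halves and the heap-top pointer, assuming non-zero integer ids stand in for objects and that the set of live ids is handed in from outside. CopyingDemo, allocate and collect are hypothetical names, not part of any JVM API.

```java
import java.util.Arrays;
import java.util.Set;

// Toy semispace heap with bump-pointer allocation (hypothetical names).
class CopyingDemo {
    int[] from;    // half currently used for allocation; 0 means a free slot
    int[] to;      // half kept empty until the next collection
    int top = 0;   // heap-top pointer: index of the next free slot in "from"

    CopyingDemo(int halfSize) {
        from = new int[halfSize];
        to = new int[halfSize];
    }

    // Allocation only advances the pointer; returns the slot index, or -1 if the half is full.
    int allocate(int id) {
        if (top == from.length) {
            return -1;
        }
        from[top] = id;
        return top++;
    }

    // Copy survivors into the empty half, wipe the old half, then swap roles.
    void collect(Set<Integer> liveIds) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (liveIds.contains(from[i])) {
                to[newTop++] = from[i];
            }
        }
        Arrays.fill(from, 0);   // the whole old half is reclaimed at once
        int[] tmp = from;
        from = to;
        to = tmp;
        top = newTop;
    }

    public static void main(String[] args) {
        CopyingDemo heap = new CopyingDemo(4);
        for (int id = 1; id <= 4; id++) {
            heap.allocate(id);
        }
        System.out.println(heap.allocate(5));            // -1, the half is full
        heap.collect(Set.of(2, 4));
        System.out.println(Arrays.toString(heap.from));  // [2, 4, 0, 0]
        System.out.println(heap.allocate(5));            // 2
    }
}
```

Only the two survivors are touched during collection, and the free space that follows them is contiguous.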

2.2.3 Mark-compact algorithm

When the object survival rate is high, the copying algorithm has to perform many copy operations and becomes inefficient. More importantly, if you do not want to waste 50 percent of the space, you need extra space as a guarantee for the extreme case in which every object in the used memory survives. For these reasons the copying algorithm generally cannot be used directly in the old generation.

To suit the characteristics of the old generation, the “mark-compact” algorithm was proposed. Its marking phase is the same as in mark-sweep, but instead of reclaiming dead objects in place, it moves all surviving objects toward one end and then clears the memory beyond the boundary.
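The compaction step can be sketched on a toy array heap, again assuming non-zero ids for objects, 0 for free slots, and a mark array that records survivors. MarkCompactDemo and compact are hypothetical names. Real collectors must also update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;

// Toy compaction phase on an array-based heap (hypothetical names and layout).
class MarkCompactDemo {
    // Slide marked objects toward index 0 in place, then clear everything
    // past the boundary. Returns the boundary (index of the first free slot).
    static int compact(int[] heap, boolean[] marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                heap[boundary++] = heap[i];   // boundary <= i, so nothing live is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, 0);
        return boundary;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6};
        boolean[] marked = {true, false, true, false, true, false};
        int boundary = compact(heap, marked);
        System.out.println(Arrays.toString(heap)); // [1, 3, 5, 0, 0, 0]
        System.out.println(boundary);              // 3
    }
}
```

Unlike the copying approach, no second half is reserved: survivors are moved within the same region, at the cost of moving every one of them.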

2.2.4 Generational collection algorithm

Commercial virtual machines today generally use a “generational collection” algorithm for garbage collection. It contains no new idea; it simply divides memory into several regions according to how long objects live. The Java heap is usually divided into a young generation and an old generation, so that each can use the collection algorithm best suited to its characteristics.

In the young generation, each collection finds that most objects have died and only a few survive, so the copying algorithm is chosen: the collection costs only the copying of the few survivors. In the old generation, objects have a high survival rate and there is no extra space to act as a guarantee, so a mark-sweep or mark-compact algorithm must be used.
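One way to see how a running JVM has divided its heap is to list its heap memory pools through the standard java.lang.management API. This is only a sketch; HeapPoolsDemo and heapPoolNames are hypothetical names. The pool names depend on the JVM and the collector in use: on HotSpot with G1 they are typically along the lines of "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen", but that is an assumption about a particular configuration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Lists the heap memory pools reported by the running JVM.
class HeapPoolsDemo {
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {   // skip non-heap pools such as the code cache
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JVM and collector, so no expected output is shown.
        heapPoolNames().forEach(System.out::println);
    }
}
```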

Java reference types


Before JDK 1.2, Java defined a reference in the traditional way: if the value stored in a reference-typed variable is the starting address of another block of memory, that variable is said to hold a reference to that block. Under this definition an object is either referenced or not referenced, which leaves no way to describe objects that are of little use but a pity to discard.

We would like to describe objects that stay in memory while there is enough space but can be discarded if memory is still tight after garbage collection. Caches in many systems fit this scenario.

Starting with JDK 1.2, Java expanded the concept of a reference into four kinds: strong reference, soft reference, weak reference and phantom reference. Their strength decreases in that order.

  • Strong reference: the ordinary kind of reference found throughout program code, such as Object obj = new Object(). As long as a strong reference exists, the garbage collector will never reclaim the referenced object.

  • Soft reference: describes objects that are useful but not essential. Before the system throws an out-of-memory error, objects reachable only through soft references are added to the collection scope for a second collection. If that collection still does not free enough memory, the out-of-memory error is thrown.

  • Weak reference: also describes non-essential objects, but is weaker than a soft reference. An object reachable only through weak references survives only until the next garbage collection. When the collector runs, it reclaims such objects whether or not memory is currently sufficient.

  • Phantom reference: also known as a ghost reference or virtual reference, this is the weakest kind. The existence of a phantom reference has no effect on an object’s lifetime, and the object cannot be obtained through it. Its only purpose is to deliver a system notification when the object is reclaimed by the collector.
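The four kinds map onto classes in the standard java.lang.ref package. The sketch below shows only the behaviour that can be relied on; ReferenceTypesDemo and observeWhileStronglyHeld are hypothetical names. Each reference gets its own target, because an object that is still softly reachable would not have its weak reference cleared. What happens after System.gc() depends on the JVM, so that part prints a result without promising one.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // Checks that hold while each target is still strongly reachable.
    static boolean[] observeWhileStronglyHeld() {
        Object cached = new Object();     // strong references to three separate targets
        Object key = new Object();
        Object resource = new Object();

        SoftReference<Object> soft = new SoftReference<>(cached);
        WeakReference<Object> weak = new WeakReference<>(key);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(resource, queue);

        return new boolean[] {
            soft.get() == cached,    // a soft reference still returns its target
            weak.get() == key,       // a weak reference still returns its target
            phantom.get() == null    // a phantom reference never returns its target
        };
    }

    public static void main(String[] args) {
        for (boolean ok : observeWhileStronglyHeld()) {
            System.out.println(ok);  // true, three times
        }

        // The target below is reachable only through the weak reference.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();  // only a request; the JVM decides whether and when to collect
        // Often true on HotSpot after the call above, but this is not guaranteed.
        System.out.println("weak target cleared: " + (weak.get() == null));
    }
}
```

The ReferenceQueue passed to the PhantomReference is where the notification described above arrives: once the collector determines that the target is only phantom reachable, the reference object is placed on the queue and can be retrieved with poll() or remove().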

Finally, here’s a picture to summarize the differences:


References

In-depth Understanding of the Java Virtual Machine