A good Java programmer should understand how the garbage collector (GC) works, how to optimize GC performance, and how to interact with the GC in the limited ways available. Some applications, such as embedded and real-time systems, have strict performance requirements, and the performance of the whole application can only be improved by making memory management more efficient across the board. This article first briefly introduces how GC works, then discusses several key GC issues in depth, and finally offers some Java programming suggestions for improving program performance from the GC’s point of view.



Fundamentals of GC


Java’s memory management is really the management of objects, including their allocation and release. A programmer allocates an object with the new keyword. To release an object, the programmer only needs to set every reference to it to null so that the program can no longer access it; such an object is called “unreachable”. The GC is responsible for reclaiming the memory of all unreachable objects. From the GC’s side, once an object is created, the GC starts tracking its address, size, and usage. In general, the GC records and manages all objects in the heap as a directed graph, which is how it determines which objects are “reachable” and which are “unreachable”. When the GC determines that some objects are unreachable, it is responsible for reclaiming their memory.
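The directed-graph view of reachability can be illustrated with a toy model. The sketch below is not how any real JVM stores its heap; `Node`, `ReachabilityDemo`, and `reachable` are hypothetical names made up for this illustration. It simply marks every node that can be reached from a set of roots, which is the property the GC relies on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a heap object: a name plus its outgoing references.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class ReachabilityDemo {
    // Returns every node that can be reached from the roots by following references.
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references too
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // root -> a -> b
        c.refs.add(d);   // c and d refer to each other,
        d.refs.add(c);   // but nothing reachable from a root refers to them
        Set<Node> live = reachable(Arrays.asList(a));
        for (Node n : Arrays.asList(a, b, c, d)) {
            System.out.println(n.name + (live.contains(n) ? " reachable" : " unreachable"));
        }
        // prints: a reachable, b reachable, c unreachable, d unreachable (one per line)
    }
}
```

Note that c and d still refer to each other, yet both are unreachable, because reachability is judged from the roots and not from reference counts.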


However, to ensure that GC can be implemented on different platforms, the Java specification deliberately leaves much of the GC’s behavior unspecified. For example, it gives no clear rules on important questions such as which collection algorithm to use or when to collect. As a result, different JVM implementers often choose different algorithms, which creates a lot of uncertainty for Java programmers. This article examines several issues related to how GC works, in an effort to reduce the negative impact of this uncertainty on Java programs.



Incremental GC



GC is typically implemented in the JVM by one thread or a group of threads, which, just like the user program, occupy heap space and consume CPU time while running. While the GC is running, the application stops. As a result, when a GC run takes a long time, users notice the Java program stalling. On the other hand, if the GC runs too briefly, the reclamation rate may be too low: many objects that should have been collected are not, and they continue to occupy a lot of memory.


Therefore, a GC design has to trade off pause time against reclamation rate. A good GC implementation lets users choose the settings they need. For example, devices with limited memory are very sensitive to memory usage and want the GC to reclaim memory thoroughly, even if that slows the program down.


Other applications, such as real-time online games, cannot tolerate long pauses. Incremental GC reduces the impact of GC on the user’s program by using a suitable collection algorithm to split one long pause into many shorter ones. Although incremental GC may be less efficient than an ordinary GC in overall throughput, it reduces the program’s maximum pause time. The HotSpot JVM shipped with the Sun JDK supports incremental GC but does not use it by default. To enable it, add the -Xincgc option when running the Java program. This option belongs to older HotSpot releases and is no longer available in current JDKs.


HotSpot’s incremental GC is implemented with the Train GC algorithm. The basic idea is to group (layer) all objects in the heap according to when they were created and how they are used, to place frequently used, related objects in the same group, and to keep adjusting the groups as the program runs. When the GC runs, it always reclaims the oldest (least recently accessed) objects first, and if an entire group is garbage, it reclaims the whole group at once. In this way each GC run reclaims only a portion of the unreachable objects, which keeps the program running smoothly.
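The trade-off between pause time and reclamation rate is easier to judge with numbers from the running JVM. As a rough sketch, the standard `java.lang.management` API reports how many collections each collector has run and the approximate accumulated time spent in them. The class name `GcStatsDemo` and the allocation loop are assumptions made up for this illustration, and the reported time is an approximation that does not necessarily equal pure pause time for every collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsDemo {
    // Sums the approximate collection time (ms) over all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Produce some short-lived garbage so the collectors have work to do.
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalGcTimeMillis());
    }
}
```

The collector names and the numbers printed depend on the JVM and on the GC options it was started with, so no fixed output is shown.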



The finalize method in detail



finalize is a method of the Object class, declared with the protected access modifier. Since every class is a subclass of Object, user classes can easily access it. Because finalize calls are not chained automatically, the chain has to be implemented by hand, so the last statement of a finalize method is usually super.finalize(). This produces a bottom-up sequence of finalize calls: a class releases its own resources first and then lets its parent class release its resources.
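The manual chaining described above can be sketched as follows. `BaseResource`, `ChildResource`, and `FinalizeChainDemo` are hypothetical names, and the demo calls finalize directly only to make the call order visible; in a real program the JVM decides if and when it runs. Putting super.finalize() in a finally block is a common precaution so the parent still runs if the subclass cleanup throws. finalize is deprecated in recent JDKs, hence the suppressed warnings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@SuppressWarnings({"deprecation", "removal"}) // finalize is deprecated in recent JDKs
class BaseResource {
    // Records the release order so the demo can print it (illustration only).
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<String>());

    @Override
    protected void finalize() throws Throwable {
        LOG.add("BaseResource released");
        super.finalize();
    }
}

@SuppressWarnings({"deprecation", "removal"})
class ChildResource extends BaseResource {
    @Override
    protected void finalize() throws Throwable {
        try {
            LOG.add("ChildResource released"); // release this class's own resources first
        } finally {
            super.finalize();                  // then let the parent class release its resources
        }
    }
}

@SuppressWarnings({"deprecation", "removal"})
class FinalizeChainDemo {
    // Runs the chain once by hand and returns the order in which resources were released.
    static List<String> releaseOrder() {
        BaseResource.LOG.clear();
        try {
            new ChildResource().finalize();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return new ArrayList<>(BaseResource.LOG);
    }

    public static void main(String[] args) {
        System.out.println(releaseOrder());
        // prints [ChildResource released, BaseResource released]
    }
}
```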



According to the Java Language Specification, the JVM guarantees that an object is unreachable before its finalize method is called, but it does not guarantee that the method will ever be called. The specification also guarantees that finalize is run at most once per object.



Many Java beginners think of this method as the equivalent of a C++ destructor and release many objects and resources in it. This is not a good approach, for three reasons. First, to support finalize, the GC has to do a lot of extra work for every object that overrides it. Second, after finalize runs, the object may have become reachable again, so the GC has to check its reachability a second time. Both of these reduce GC performance. Third, because the time at which the GC calls finalize is not defined, the time at which resources are released this way is not defined either.



finalize is usually used to release important resources that are hard to control, such as I/O handles and data connections, whose release is critical to the application as a whole. Even in these cases, the program itself should be the primary manager of the resources, including their release, with finalize acting only as a supplement. The result is a double-insurance mechanism, as opposed to relying on finalize alone to release resources.
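One way to picture the double-insurance idea is a resource class whose close() is the primary release path and whose finalize merely calls close() as a fallback. This is a minimal sketch under that assumption; `ManagedConnection` and its members are hypothetical names, and the "resource" is only a flag and a counter standing in for a real connection.

```java
@SuppressWarnings({"deprecation", "removal"}) // finalize is deprecated in recent JDKs
class ManagedConnection implements AutoCloseable {
    private boolean closed = false;
    private int releaseCount = 0; // how many times the resource was really released

    // Primary release path: called explicitly, or implicitly by try-with-resources.
    @Override
    public synchronized void close() {
        if (closed) {
            return;           // idempotent: a second call does nothing
        }
        closed = true;
        releaseCount++;
        System.out.println("connection released");
    }

    synchronized boolean isClosed() { return closed; }

    synchronized int releaseCount() { return releaseCount; }

    // Safety net only: if the program forgot to call close(), the GC may call this.
    @Override
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }
}

class DoubleInsuranceDemo {
    public static void main(String[] args) {
        try (ManagedConnection c = new ManagedConnection()) {
            System.out.println("using connection");
        } // close() runs here, regardless of what the GC does later
    }
}
```

Because close() is idempotent, a later finalize call on an already closed object releases nothing a second time.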



The following example shows that an object can become reachable again after its finalize method has been called, and that finalize runs only once for a given object.


    class MyObject {
        Test main; // back-reference to the Test object

        public MyObject(Test t) {
            main = t; // save the Test object
        }

        protected void finalize() {
            main.ref = this; // make this object reachable again
            System.out.println("This is finalize"); // shows that finalize runs only once
        }
    }

    class Test {
        MyObject ref;

        public static void main(String[] args) {
            Test test = new Test();
            test.ref = new MyObject(test);
            test.ref = null; // MyObject is now unreachable, so finalize will be called
            System.gc();
            System.runFinalization(); // added here: gives pending finalizers a chance to run before the check
            if (test.ref != null) System.out.println("My Object is alive");
        }
    }

A typical run prints:

    This is finalize
    My Object is alive


Note that although the MyObject instance becomes reachable again inside finalize, finalize will not be called the next time the object is collected, because finalize runs at most once per object.



How does the program interact with the GC



Java 2 enhanced memory management by adding the java.lang.ref package, which defines three reference classes: SoftReference, WeakReference, and PhantomReference. With these classes, a programmer can interact with the GC to some extent and help it work more efficiently. Objects referred to only through these classes have a reachability strength that lies between reachable and unreachable.



Creating such a reference is easy. For example, to create a soft reference, first create an object and refer to it through an ordinary (strong) reference. Then create a SoftReference that refers to the object. Finally, set the ordinary reference to null. The object is now referred to only by the soft reference, and we call it a softly referenced object.



The main feature of a soft reference is that it behaves almost like a strong reference: the memory is reclaimed only when memory runs short, so while memory is plentiful the object is usually not collected. In addition, all soft references are guaranteed to have been cleared before the JVM throws an OutOfMemoryError. Soft references are therefore well suited to caches, for example of frequently used images, making the fullest use of memory without causing an OutOfMemoryError. Pseudocode for this usage is shown below.


    Image image = new Image(); // create the Image object
    ...
    // use image
    ...
    // After using image, wrap it in a soft reference and release the strong reference
    SoftReference<Image> sr = new SoftReference<>(image);
    image = null;
    ...
    // Next use
    image = sr.get();
    if (image == null) {
        // The GC reclaimed image because memory was low, so it must be reloaded
        image = new Image();
        sr = new SoftReference<>(image);
    }
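The same pattern can be turned into a small runnable cache. This is only a sketch: `SoftCache`, its loader function, and the byte array standing in for image data are assumptions introduced for illustration. The one API used from the JDK is `java.lang.ref.SoftReference`, whose `get()` returns null once the referent has been cleared.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose values are held only through soft references.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader; // reloads a value when it is missing or was cleared

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never loaded, or cleared by the GC under memory pressure: load again.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}

class SoftCacheDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        SoftCache<String, byte[]> cache = new SoftCache<>(name -> {
            loads[0]++;
            return new byte[1024]; // stands in for decoded image data
        });
        byte[] first = cache.get("logo.png");
        byte[] second = cache.get("logo.png");
        // While `first` is strongly held, the soft reference cannot be cleared.
        System.out.println(first == second); // prints true
        System.out.println(loads[0]);        // prints 1
    }
}
```

The caller never sees whether a value came from the cache or was reloaded, which is what allows the GC to drop entries freely when memory is short.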


The main difference between weak and soft references is that the GC uses an algorithm to decide whether to reclaim a softly referenced object, whereas a weakly referenced object is always reclaimed. Weakly referenced objects are therefore collected more easily and more quickly. Although the GC always reclaims weakly referenced objects when it runs, a group of such objects with complex relationships often takes several GC runs to reclaim completely. Weak references are often used in Map structures to refer to objects that hold large amounts of data: once the strong reference to such an object is set to null, the GC can reclaim its space quickly.
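For the Map use case, the JDK provides `java.util.WeakHashMap`, which holds its keys through weak references (its values are held strongly). The sketch below uses it to attach large payloads to owner objects; `Attachments` and `WeakMapDemo` are hypothetical names. An entry stays in place as long as the owner is strongly reachable and becomes eligible for removal once the owner is dropped, although exactly when that happens depends on the GC.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper that attaches bulky data to owner objects without keeping the owners alive.
class Attachments {
    private final Map<Object, byte[]> data = new WeakHashMap<>();

    void attach(Object owner, byte[] payload) { data.put(owner, payload); }

    byte[] get(Object owner) { return data.get(owner); }

    int size() { return data.size(); }
}

class WeakMapDemo {
    public static void main(String[] args) {
        Attachments attachments = new Attachments();
        Object owner = new Object();
        attachments.attach(owner, new byte[1 << 20]);

        System.gc();
        // The owner is still strongly referenced here, so its entry must survive.
        System.out.println("while owner is held: " + (attachments.get(owner) != null)); // prints true

        owner = null; // drop the only strong reference to the key
        System.gc();  // only a request; the entry usually disappears, but this is not guaranteed
        System.out.println("entries after dropping owner: " + attachments.size());
    }
}
```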



Phantom references are used less often and mainly assist the finalize method. A phantom-reachable object is one that has been finalized and is unreachable, but has not yet been reclaimed by the GC. Such an object can help finalize with some late-stage cleanup work. Overriding the clear() method of Reference makes the resource-reclamation mechanism more flexible.
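A phantom reference is normally paired with a `java.lang.ref.ReferenceQueue`: when the referent is about to be reclaimed, the reference is placed on the queue and the program can run its late cleanup. The following is a minimal sketch; `ResourcePhantom`, `PhantomDemo`, and the resource name are made up for illustration, and whether the reference is enqueued within the one-second wait depends on the GC.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical phantom reference that remembers which resource needs late cleanup.
class ResourcePhantom extends PhantomReference<Object> {
    final String resourceName;

    ResourcePhantom(Object referent, ReferenceQueue<Object> queue, String resourceName) {
        super(referent, queue);
        this.resourceName = resourceName;
    }
}

class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object owner = new Object();
        ResourcePhantom phantom = new ResourcePhantom(owner, queue, "temp-file-1");

        // A phantom reference never hands out its referent.
        System.out.println(phantom.get()); // prints null

        owner = null;  // drop the only strong reference
        System.gc();   // only a request; enqueueing is not guaranteed to happen promptly

        Reference<?> ref = queue.remove(1000); // wait up to one second
        if (ref instanceof ResourcePhantom) {
            System.out.println("late cleanup for " + ((ResourcePhantom) ref).resourceName);
            ref.clear();
        } else {
            System.out.println("not enqueued yet (GC timing is not guaranteed)");
        }
    }
}
```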



Some Java coding suggestions



Based on how GC works, there are a few techniques that help the GC run more efficiently and better match the needs of your application. Here are some programming tips.



1. The most basic recommendation is to release references to objects you no longer need as early as possible. Most of the time programmers use temporary variables, and these references are dropped automatically when the variable goes out of scope. Special care is needed with complex object graphs such as arrays, queues, trees, and graphs, whose elements refer to one another in complicated ways. The GC is generally less efficient at collecting such structures. If the program allows it, set unused references to null as early as possible; this speeds up the GC’s work.



2. Use finalize as little as possible. The finalize method is an opportunity Java gives programmers to release objects or resources, but it increases the GC’s workload, so resources should be reclaimed through finalize as rarely as possible.



3. If the program uses certain images frequently, hold them through soft references. This keeps the images in memory for as long as possible so the program can reuse them, without causing an OutOfMemoryError.



4. Pay attention to collection data types, including arrays, trees, graphs, linked lists, and other data structures that are more complex for the GC to reclaim. Also watch global and static variables. These variables tend to keep holding references that are no longer needed (dangling references), which wastes memory.



5. When the program is idle or waiting for a while, the programmer can call System.gc() manually to suggest that the GC run, but the Java Language Specification does not guarantee that the GC will actually execute. Using incremental GC can shorten the pauses of a Java program.
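The first tip is easiest to see in a container that manages its own array. In the sketch below, `SimpleStack` is a hypothetical class written for this illustration. Without the line that sets the vacated slot to null, the array would keep referring to popped objects, and the GC could not reclaim them even though the program no longer uses them.

```java
import java.util.Arrays;

class SimpleStack {
    private Object[] elements = new Object[4];
    private int size = 0;

    void push(Object o) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow when full
        }
        elements[size++] = o;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object top = elements[--size];
        elements[size] = null; // drop the obsolete reference so the object can be collected
        return top;
    }

    int size() { return size; }

    // Exposed only so the demo can show that the vacated slot was cleared.
    boolean slotIsCleared(int index) { return elements[index] == null; }
}

class StackDemo {
    public static void main(String[] args) {
        SimpleStack stack = new SimpleStack();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());            // prints b
        System.out.println(stack.slotIsCleared(1)); // prints true
    }
}
```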
