First, technical background you have to understand it
JVM garbage collection: X When it comes to garbage collection (GC), most people think of the technology as a companion to the Java language. In fact, GC is older than Java, with dynamic memory allocation and garbage collection techniques being used in Lisp as early as 1960. Experts in the design and optimization of the C++ language should take note
Second, which memory needs to be reclaimed
We all know that the MEMORY structure of a JVM consists of five regions: program counters, virtual machine stacks, local method stacks, heap areas, and method areas. Among them, the program counter, virtual machine stack, local method stack three areas are born and die with the thread, so the memory allocation and reclamation of these areas are deterministic, there is no need to think too much about reclamation, because when the method ends or the thread ends, the memory will naturally follow the reclamation. The Java heap and method areas are different, different! This part of memory allocation and collection is dynamic, which is what the garbage collector needs to focus on.
Before the garbage collector can collect the heap and method areas, it must first determine which objects in these areas can be collected, and which can not be collected for the time being, which will use the algorithm to determine whether the object is alive! (The interviewer asks you a lot, right?)
####2.1 Reference counting algorithm
2.1.1 Algorithm analysis
Reference counting was an early strategy in the garbage collector. In this approach, there is a reference count for each object instance in the heap. When an object is created, the instance of the object is assigned to a variable whose count is set to 1. When any other variable is assigned a reference to this object, the count increases by 1 (a = b, then the counter of the object instance referenced by B +1), but when a reference to an object instance expires or is set to a new value, the object instance’s reference counter decreases by 1. Any object instance that references a counter of 0 can be garbage collected. When an object instance is garbage collected, the reference counter of any object instances it references decreases by one.
2.1.2 the pros and cons
Advantages: Reference counting collectors can be quickly executed, interwoven with program execution. This is good for real-time environments where programs need to be free from prolonged interruptions. Disadvantages: Circular references cannot be detected. If the parent object has a reference to the child object, the child object in turn references the parent object. Thus, their reference count can never be zero.
2.1.3 isn’t that boring? Let’s write a code to calm things down
This code is used to verify that the reference counting algorithm cannot detect circular references. Object1 and object2 are assigned to null, meaning that object1 and object2 point to objects that cannot be accessed, but because they refer to each other, their reference counter is not zero, and the garbage collector will never reclaim them.
2.2 Accessibility analysis algorithm
Accessibility analysis algorithm from the graph theory is introduced into discrete mathematics, program all the reference relationship as a diagram, starting from a node GC ROOT, find the corresponding reference node, found after this node, continue to look for reference from the node, when all the reference node to find after, the rest of the nodes is considered not by reference to the node, That is, useless nodes. Useless nodes are considered recyclable objects.
In the Java language, objects that can be used as GC Roots include the following:
A) objects referenced in the virtual machine stack (local variable table in the stack frame); B) objects referenced by class static attributes in the method area; C) objects referenced by constants in the method area; D) Objects referenced by JNI (Native method) in the Native method stack.
####2.3 How much do you know about references in Java
Whether the number of references is determined by reference counting algorithm or whether the reference chain of an object is reachable by reachability analysis algorithm, determining whether an object is alive or not is related to “reference”. In The Java language, references are divided into four types: strong reference, soft reference, weak reference, and virtual reference. The strength of these four kinds of references gradually decreases.
Strong reference
References to objects such as Object obj = new Object() are common in program code. As long as strong references exist, the garbage collector will never reclaim the referenced Object.
Soft references
A term used to describe objects that are useful but not necessary. Objects associated with soft references are listed in the collection scope for a second collection before the system is about to run out of memory. An out-of-memory exception is thrown if there is not enough memory after this collection.
A weak reference
It is also used to describe non-essential objects, but it is weaker than soft references, and objects associated with weak references only survive until the next garbage collection occurs. When the garbage collector works, objects associated only with weak references are reclaimed regardless of whether there is currently enough memory.
Phantom reference
Also called ghost quote or phantom quote, it is the weakest type of quote.
The existence of a virtual reference does not affect the lifetime of an object, nor can an object instance be obtained through a virtual reference. Its purpose is to receive a system notification when the object is reclaimed by the collector.
Don’t be intimidated by the concept, and don’t worry about it. The purpose of this list is to show that both reference-counting algorithms and reachability analysis algorithms are based on strong references.
2.4 Last struggle of the subject before death (retrieval)
Even in the reachability analysis algorithm, unreachable objects are not necessarily dead. At this time, they are temporarily in the stage of “probation”. To truly declare an object dead, at least two marking processes are needed.
** If an object does not have a chain of references connected to GC Roots after reachability analysis, it will be marked for the first time;
** Second tag: ** The first tag is followed by a filter to see if the object needs to execute the Finalize () method. If the reference chain is not re-established in the Finalize () method, it will be marked a second time.
The object marked successfully the second time will actually be recycled, and if the object reassociates the reference chain in the Finalize () method, it will escape the collection and live on. The apes can still keep up, heh heh.
####2.5 How to determine whether recycling is required in method area
Apes, the method store content needs to be recycled judgment is not the same. The main contents of the method area are: discarded constants and useless classes. Deprecated constants can also be determined by reference reachability, but for useless classes, the following three conditions must be met:
All instances of the class have been reclaimed, that is, there are no instances of the class in the Java heap; The ClassLoader carrying this class has been reclaimed. The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.
In terms of the principle of class loading, which was the main character in ali’s interview, the interviewer asked me, for example, “Can I define String?” The answer was no, because the JVM performs parental delegation when it loads a class
Common garbage collection algorithms
The mark-clearance algorithm uses GC Roots to scan and mark the surviving objects. After marking, unmarked objects in the whole space are scanned for recycling, as shown in the figure below. The mark-clear algorithm does not need to move objects, but only processes the non-viable objects, which is extremely efficient in the case of a large number of viable objects. However, because the mark-clear algorithm directly reclaims the non-viable objects, it will cause memory fragmentation.
####3.2 Copying Algorithms
Replication algorithm is proposed to overcome the overhead of handle and solve the problem of memory fragmentation. It at the beginning of the heap is divided into more than one object plane and the free surface, the program from the object surface space for object allocation, when the object is full, the garbage collection is from the root set based on copying algorithm (GC Roots) active objects in the scan, and copy each event object to the free surface (making active objects between the memory of the no free hole). So the free plane becomes the object plane, the original object plane becomes the free plane, and the program allocates memory in the new object plane.
####3.3 Mark-Compact Algorithm
The mark-collation algorithm marks objects in the same way as the mark-clear algorithm, but the clearing is different. After the space occupied by the non-viable objects is reclaimed, all the viable objects are moved to the left free space and the corresponding Pointers are updated. Mark-tidy algorithm is based on mark-clear algorithm, but also carries on the object movement, so the cost is higher, but it solves the problem of memory fragmentation. See the following figure for the specific process:
####3.4 Generational Collection algorithm
The generational collection algorithm is currently used by most JVM garbage collectors. Its core idea is to divide memory into several different regions according to the life cycle of the object. Under normal conditions, the reactor is divided into the Tenured Generation and the Young Generation, and beyond the reactor is the Permanet Generation. The characteristics of the old generation are that only a small number of objects need to be recycled in each garbage collection, while the characteristics of the new generation are that a large number of objects need to be recycled in each garbage collection. Therefore, the most suitable collection algorithm can be adopted according to the characteristics of different generations.
3.4.1 Recycling algorithms for Young Generation (Recycling is mainly dominated by Copying)
- A) All newly generated objects are first placed in the young generation. The goal of the younger generation is to collect objects with short life spans as quickly as possible.
- B) New generation memory is divided into one Eden zone and two survivor zones (survivor0,survivor1) in a ratio of 8:1:1. One Eden zone and two Survivor zones (generally). Most objects are generated in the Eden zone. The Eden zone and survivable objects in survivable zone are copied to a survivable zone in survivable zone. Then the Eden zone is emptied. When the survivable zone in survivable zone is full, the Eden zone and survivable objects in survivable zone are copied to another survivable zone in survivable zone. Then empty Eden and the survivor0 zone, at which point the survivor0 zone is empty, then swap the survivor0 zone with the Survivor1 zone, leaving the Survivor1 zone empty, and repeat.
- C) If survivable objects in survivor1 are insufficient to store Eden and survivable objects in Survivor0, survivable objects are directly stored to the old age. If the old generation is also Full, a Full GC(Major GC) will be triggered, which means that both the new generation and the old generation will collect.
- D) Cenozoic GC is also called MinorGC, and MinorGC occurs with a high frequency (it may not be triggered until Eden is full).
3.4.2 Recycling algorithm of Old Generation (Mainly mark-compact)
A) Objects that survive N garbage collections in the young generation are put into the old generation. Therefore, you can think of the tenured generation as holding objects with long life cycles. B) The memory is much larger than that of the new generation (approximately 1:2). When the memory of the old generation is Full, Major GC is triggered, i.e., Full GC, which occurs at a low frequency and has a longer survival time and a high survival rate.
3.4.3 Recycling algorithm of Permanent Generation
Used to store static files, such as Java classes, methods, etc. Persistent generation has no significant impact on garbage collection, but some applications may generate or call classes dynamically, such as Hibernate, etc. In such cases, a large persistent generation space needs to be set up to store the classes added during the run. Persistent generation is also known as method region. For specific recycling, see section 2.5 above.
4. Common garbage collectors
The following is a picture of all the collectors that the HotSpot VIRTUAL machine contains.
Serial collector (replication algorithm) A new generation of single-threaded collectors, marking and cleaning are single-threaded, the advantage is simple and efficient. This is the default GC mode at the client level. You can use -xx :+UseSerialGC to specify the GC mode forcibly.
Serial Old collector single-threaded collector, an older version of the Serial collector.
The ParNew collector (stop-copy algorithm) is a new generation of collectors, which can be considered a multi-threaded version of the Serial collector and performs better than Serial on multi-core CPUS.
The Parallel collector, which seeks high throughput and efficient CPU utilization. Throughput is typically 99% and throughput = user thread time /(user thread time +GC thread time). Suitable for scenarios with low requirements on interaction, such as background applications. -xx :+UseParallelGC specifies the number of threads to be collected. -xx :ParallelGCThreads=4
The Parallel Collector, an older version of the Parallel Collector, is throughput first.
CMS(Concurrent Mark Sweep) collector (mark-clean algorithm) is a high concurrency, low pause, pursuit of the shortest GC recovery pause time, high CPU usage, fast response time, short pause time, multi-core CPU pursuit of high response time choice.
5. When is GC triggered (one of the most common interview questions)
Because objects are processed in generations, garbage collection regions and times are different. There are two types of GC: Scavenge GC and Full GC.
# # # # 5.1 Scavenge GC
Normally, when a new object is created and Eden fails to apply for space, the Scavenge GC is triggered to clean up the Eden exploiture, and the surviving objects are moved to the Survivor zone. Then the two zones of Survivor are collated. In this way, GC is performed on the Eden area of the young generation without affecting the old generation. Since most objects start from Eden, and Eden is not allocated very large, GC in Eden will occur frequently. Therefore, a fast and efficient algorithm is generally needed to make Eden free as soon as possible.
# # # # 5.2 Full GC
Clean up the entire heap, including Young, Tenured, and Perm. Full GC is slower than Scavenge due to the need to recycle the entire heap, so minimize the amount of Full GC you do. A large part of the process of tuning the JVM is tuning the Full GC. There are several possible reasons for Full GC:
A) Tenured generations are full; B) Persistent generation (Perm) is full; C) system.gc () is displayed and called; D) Dynamic changes of Heap allocation strategies in each domain after the last GC;
Welcome Java engineers who have worked for one to five years to join Java architecture development: 277763288
Group provides free Java architecture learning materials (which have high availability, high concurrency, high performance and distributed, Jvm performance tuning, Spring source code, MyBatis, Netty, Redis, Kafka, Mysql, Zookeeper, Tomcat, Docker, Dubbo, multiple knowledge architecture data, such as the Nginx) reasonable use their every minute and second time to learn to improve yourself, don’t use “no time” to hide his ideas on the lazy! While young, hard to fight, to the future of their own account!