There has been discussion in the community about whether the JVM really needs GC. Some propose a GC that only handles memory allocation and performs no actual memory reclamation, so that the JVM stops only once the available Java heap is exhausted.
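The idea of a collector that only allocates can be sketched with a toy heap. This assumes the heap is modeled as a plain byte capacity. Allocation just bumps a pointer, and nothing is ever reclaimed. `NoOpHeap` is hypothetical and is not how HotSpot implements allocation.

```java
/** Toy allocation-only "collector": it hands out memory but never reclaims it. */
final class NoOpHeap {
    private final long capacity; // total heap size in bytes (hypothetical model)
    private long top = 0;        // bump pointer: offset of the next free byte

    NoOpHeap(long capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new object, or throws once the heap is full. */
    long allocate(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (top + size > capacity) {
            // No reclamation is ever attempted: running out of heap simply ends the program.
            throw new OutOfMemoryError("heap exhausted after " + top + " bytes");
        }
        long start = top;
        top += size;
        return start;
    }

    long used() {
        return top;
    }

    public static void main(String[] args) {
        NoOpHeap heap = new NoOpHeap(1024);
        System.out.println(heap.allocate(512)); // 0
        System.out.println(heap.allocate(256)); // 512
        try {
            heap.allocate(512);                  // only 256 bytes are left
        } catch (OutOfMemoryError e) {
            System.out.println(e.getMessage());  // heap exhausted after 768 bytes
        }
    }
}
```

For reference, JDK 11 ships an experimental allocation-only collector called Epsilon (JEP 318). You can enable it with `-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`.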
To answer that, you need to understand why GC is needed in the first place. As the business an application handles grows larger and more complex, and as its user base increases, it becomes impossible to keep the application running without GC. At the same time, collectors that frequently cause stop-the-world (STW) pauses cannot keep up with real-world requirements, so people keep trying to optimize GC.
What the community needs, and what the industry aims for, is to minimize disruption to the normal execution of applications. Oracle released the G1 GC in JDK7 to reduce the likelihood of application pauses. This article looks at what the G1 GC does.
Remember Doraemon? The drawer of Nobita's desk is actually a time tunnel, so let's borrow Doraemon's time machine and travel back to 1998. On December 8 of that year, the second generation of the Java platform (Java 2, JDK 1.2) was released. To support enterprise applications, the Java HotSpot Virtual Machine (hereafter HotSpot) was officially released on April 27, 1999, as the stage on which Java programs run. Starting with JDK1.3, HotSpot became the default virtual machine of the Sun JDK.
The Serial GC came with JDK1.3.1. It was the first GC, and it was only the beginning. J2SE1.4 was released on February 26, 2002. Parallel GC and Concurrent Mark Sweep (CMS) GC followed with JDK1.4.2, and Parallel GC became HotSpot's default GC after JDK6.
HotSpot has many garbage collectors. If someone asks you how Serial GC, Parallel GC, and Concurrent Mark Sweep (CMS) GC differ, remember these rules of thumb:
- If you want to minimize memory footprint and parallelism overhead, choose Serial GC;
- If you want to maximize application throughput, choose Parallel GC;
- If you want to minimize GC interruptions or pauses, choose CMS GC.
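You can select one of these collectors with a command-line flag and then ask the running JVM which one it uses. The flags are `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, `-XX:+UseConcMarkSweepGC`, and `-XX:+UseG1GC`. Note that CMS was deprecated in JDK 9 and removed in JDK 14. The query goes through the standard `java.lang.management` API. The class name `ShowCollectors` is only for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which collectors the running JVM uses; bean names vary by collector and JDK version. */
final class ShowCollectors {
    /** Names of the collector MXBeans registered by the current JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Number of collections and accumulated collection time (ms) since JVM start.
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

For example, `java -XX:+UseParallelGC ShowCollectors.java` typically reports beans named `PS Scavenge` and `PS MarkSweep`. Single-file launch requires JDK 11 or later. Serial GC typically reports `Copy` and `MarkSweepCompact`, and G1 reports `G1 Young Generation` and `G1 Old Generation`.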
So why release Garbage First (G1) GC when we already have the three powerful collectors above? The reason is the one given at the start. Applications handle ever larger and more complex businesses for ever more users, and collectors that frequently stop the world cannot keep up with those demands.
Why is it called Garbage First (G1)?
G1 is a parallel collector that divides the heap into a number of independent Regions. Each Region can belong to the old or the young generation, and the Regions that make up a generation need not be physically contiguous.
Concurrent background threads mark the old-generation Regions, and their main job is to find unreferenced objects. As a result, G1 learns that some Regions hold more garbage (unreferenced objects) than others.
Actually reclaiming that garbage still requires pausing the application, because otherwise there is no way to keep the application from interfering. During the pause, G1 concentrates on the Regions with the most garbage. It empties them in a fraction of the time and leaves behind completely free Regions.
Hence the name: G1 is called Garbage First because it collects the Regions with the most garbage first.
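The "most garbage first" ordering can be sketched as follows. This assumes each Region's used and live byte counts are already known from marking. `GarbageFirstOrder` and its `Region` class are hypothetical. HotSpot's actual selection also weighs the estimated cost of copying, not garbage alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "garbage first": rank candidate regions by how much garbage they hold. */
final class GarbageFirstOrder {
    static final class Region {
        final int id;
        final long usedBytes; // bytes allocated in the region
        final long liveBytes; // bytes that marking found still reachable

        Region(int id, long usedBytes, long liveBytes) {
            this.id = id;
            this.usedBytes = usedBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return usedBytes - liveBytes;
        }
    }

    /** Returns a copy of the candidates, ordered from most to least garbage. */
    static List<Region> garbageFirst(List<Region> candidates) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Long.compare(b.garbageBytes(), a.garbageBytes()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Region> candidates = List.of(
                new Region(1, 1024, 900),  // 124 bytes of garbage
                new Region(2, 1024, 100),  // 924 bytes of garbage
                new Region(3, 1024, 500)); // 524 bytes of garbage
        for (Region r : garbageFirst(candidates)) {
            // prints regions 2, 3, 1 in that order
            System.out.println("region " + r.id + ": " + r.garbageBytes() + " bytes of garbage");
        }
    }
}
```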
The G1 GC is a compacting collector designed to reclaim as much garbage as possible. It uses incremental, parallel, stop-the-world pauses and achieves compaction by copying live objects. It also uses concurrent, multi-phase marking to help shorten the marking, remark, and cleanup pauses. Minimizing pause time is one of its design goals.
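The copying (evacuation) step behind this compaction can be sketched with a toy model. It assumes marking has already flagged each object as live or dead, and that a Region holds a fixed number of objects. The names `Evacuation`, `Obj`, and `REGION_CAPACITY` are hypothetical. HotSpot works on raw memory and also updates references to moved objects, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy evacuation: copy live objects out of sparse regions into densely packed new regions. */
final class Evacuation {
    static final int REGION_CAPACITY = 4; // objects per region; tiny and hypothetical, for clarity

    static final class Obj {
        final String name;
        final boolean live; // result of marking

        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    /** Copies live objects into fresh regions; afterwards every source region can be freed whole. */
    static List<List<Obj>> evacuate(List<List<Obj>> sourceRegions) {
        List<List<Obj>> destinations = new ArrayList<>();
        List<Obj> current = new ArrayList<>();
        for (List<Obj> region : sourceRegions) {
            for (Obj o : region) {
                if (!o.live) {
                    continue; // garbage is never copied
                }
                if (current.size() == REGION_CAPACITY) { // destination full: start another one
                    destinations.add(current);
                    current = new ArrayList<>();
                }
                current.add(o); // survivors end up packed next to each other
            }
        }
        if (!current.isEmpty()) {
            destinations.add(current);
        }
        return destinations;
    }

    public static void main(String[] args) {
        List<List<Obj>> sources = List.of(
                List.of(new Obj("a", true), new Obj("b", false), new Obj("c", true), new Obj("d", false)),
                List.of(new Obj("e", false), new Obj("f", true), new Obj("g", false), new Obj("h", false)));
        List<List<Obj>> result = evacuate(sources);
        System.out.println(result.size() + " destination region(s)"); // 1 destination region(s)
        for (Obj o : result.get(0)) {
            System.out.print(o.name + " "); // prints: a c f
        }
        System.out.println();
    }
}
```

In this example, two half-empty source Regions shrink to one packed Region, and both sources become completely free. That is how copying gives G1 compaction without a separate defragmentation pass.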
The G1 collector was introduced in JDK1.7 and is intended to replace the CMS collector in the long term. Its garbage collection strategy is quite different from that of earlier collectors. In terms of generations, G1 is still a generational collector. It distinguishes the young generation from the old generation, and the young generation still consists of Eden and Survivor Regions.
Overall, G1 uses an entirely new region-based partitioning scheme, with the following features:
- Parallelism: during collection, multiple GC threads can work at the same time, making effective use of multi-core processors;
- Concurrency: G1 can perform some of its work while the application runs, so in general the application is not completely blocked for the whole collection cycle;
- Generational collection: G1 is still a generational collector, but it manages both the young and the old generation itself, whereas earlier collectors each work on only one of them;
- Compaction: G1 moves objects during collection. CMS, by contrast, only marks and sweeps objects in place and must defragment the heap after several GCs. G1 copies live objects in every collection, which reduces fragmentation;
- Predictability: because the heap is partitioned, G1 can collect just a subset of the Regions. This limits the scope of each collection and keeps global pauses well under control.
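The predictability point can be sketched as choosing candidate Regions until a predicted pause time would exceed a target. This is similar in spirit to G1's real `-XX:MaxGCPauseMillis` goal (200 ms by default). The linear cost model (copy time proportional to live bytes), the class `PauseBudget`, and its parameters are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy collection-set selection that stays within a pause-time budget. */
final class PauseBudget {
    /**
     * @param liveBytes     live bytes of each candidate region, best candidate first
     * @param bytesPerMs    assumed copying speed (hypothetical linear cost model)
     * @param pauseTargetMs pause budget for this collection
     * @return indices of the candidate regions chosen for this pause
     */
    static List<Integer> chooseRegions(long[] liveBytes, double bytesPerMs, double pauseTargetMs) {
        List<Integer> chosen = new ArrayList<>();
        double predictedMs = 0;
        for (int i = 0; i < liveBytes.length; i++) {
            double cost = liveBytes[i] / bytesPerMs; // predicted time to copy this region's survivors
            if (predictedMs + cost > pauseTargetMs) {
                break; // would exceed the budget: leave the rest for a later pause
            }
            predictedMs += cost;
            chosen.add(i);
        }
        return chosen;
    }

    public static void main(String[] args) {
        long[] liveBytes = {100, 200, 400, 800}; // predicted costs: 10, 20, 40, 80 ms
        System.out.println(chooseRegions(liveBytes, 10.0, 50.0)); // [0, 1]
    }
}
```

With a 50 ms budget, only the first two Regions fit, for a predicted 30 ms. The third would push the prediction to 70 ms, so it waits for a later pause.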
With G1, the heap layout moved from the traditional contiguous design to discrete chunks of memory. By introducing Regions, G1 makes the heap a collection of discrete Regions. Strictly speaking, the heap is not discontinuous. Its continuity shifts from physical to logical, which G1 achieves by allocating Regions dynamically. A Region has no fixed role: it can be assigned to Eden, Survivor, the old generation, the humongous (large object) area, the free area, and so on. The more fixed the layout, the more rigid it becomes.
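This flexibility can be sketched by modeling the heap as an array of Region roles, where any Region can switch roles over time. The `Role` names mirror the areas listed above, and `RegionRoles` is hypothetical. In HotSpot a humongous object can also span several adjacent Regions, which this toy ignores.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;

/** Toy heap in which each region's role is assigned dynamically instead of being fixed. */
final class RegionRoles {
    enum Role { FREE, EDEN, SURVIVOR, OLD, HUMONGOUS }

    /** Counts how many regions currently play each role (roles with no regions are omitted). */
    static Map<Role, Integer> census(Role[] heap) {
        Map<Role, Integer> counts = new EnumMap<>(Role.class);
        for (Role role : heap) {
            counts.merge(role, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Role[] heap = new Role[6];
        Arrays.fill(heap, Role.FREE);     // every region starts out free
        heap[0] = Role.EDEN;
        heap[3] = Role.EDEN;              // Eden regions need not be adjacent
        heap[5] = Role.OLD;
        System.out.println(census(heap)); // {FREE=3, EDEN=2, OLD=1}

        heap[0] = Role.FREE;              // a collection empties region 0...
        heap[0] = Role.HUMONGOUS;         // ...and it is reused for a large object
        System.out.println(census(heap)); // {FREE=3, EDEN=1, OLD=1, HUMONGOUS=1}
    }
}
```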
Market forces keep eliminating outdated industries and give limited resources to more competitive, more profitable companies. Silicon Valley, too, keeps weeding out those who fall behind and drawing in fresh talent from around the world. Over more than half a century, it has built a culture of excellence in order to survive. GC plays a similar role: it eliminates garbage and preserves valuable assets.
G1 normally reclaims garbage by collecting as many heap Regions as it can during its stop-the-world collection pauses. The one exception is the Cleanup step of the concurrent marking cycle. If G1 finds during Cleanup that a Region consists entirely of garbage, it reclaims that Region immediately and adds it to the free-Region list (implemented as a linked list) for later use. Releasing these Regions therefore does not have to wait for the next collection pause. It happens right away, with the Cleanup step acting as a final sweep. This is a major difference between G1 and earlier generations of collectors.
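The Cleanup shortcut can be sketched as follows. This assumes marking has already produced a live-byte count for each occupied old Region. `CleanupStep` is hypothetical, and `java.util.LinkedList` stands in for HotSpot's internal free-Region list.

```java
import java.util.LinkedList;

/** Toy Cleanup step: regions that marking found entirely dead go straight to the free list. */
final class CleanupStep {
    /**
     * @param liveBytes live bytes found by marking in each occupied region (index = region id)
     * @param freeList  free-region list; fully dead regions are appended to it
     * @return number of regions reclaimed
     */
    static int cleanup(long[] liveBytes, LinkedList<Integer> freeList) {
        int reclaimed = 0;
        for (int region = 0; region < liveBytes.length; region++) {
            if (liveBytes[region] == 0) {
                freeList.addLast(region); // reclaimed now, without waiting for an evacuation pause
                reclaimed++;
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        LinkedList<Integer> freeList = new LinkedList<>();
        long[] liveBytes = {512, 0, 128, 0};              // regions 1 and 3 hold only garbage
        System.out.println(cleanup(liveBytes, freeList)); // 2
        System.out.println(freeList);                     // [1, 3]
        System.out.println(freeList.pollFirst());         // 1: the allocator takes from the head
    }
}
```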
The G1 GC garbage collection cycle consists of three main types, with Full GC as a fallback:
- Young generation collection
- Multi-phase concurrent marking cycle
- Mixed collection cycle
- Full GC
During a young generation collection, G1 pauses the application threads and moves live objects from the young generation into Survivor Regions, old Regions, or both. During a mixed collection, G1 also moves live objects out of old Regions into free Regions, which then become part of the old generation.
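The young-collection decision (copy to Survivor or promote to old) can be sketched with a fixed age threshold. The threshold here equals the default `-XX:MaxTenuringThreshold` of 15, but HotSpot adjusts the actual threshold dynamically up to that maximum. `YoungCollection` and `Obj` are hypothetical, and liveness is assumed to be known already.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy young collection: copy survivors to Survivor space or promote them to the old generation. */
final class YoungCollection {
    // Assumption: a fixed threshold equal to the default -XX:MaxTenuringThreshold (15);
    // HotSpot actually adjusts the threshold dynamically up to that maximum.
    static final int TENURING_THRESHOLD = 15;

    static final class Obj {
        final String name;
        final boolean live; // liveness, assumed already determined by tracing
        int age;            // number of young collections survived so far

        Obj(String name, boolean live, int age) {
            this.name = name;
            this.live = live;
            this.age = age;
        }
    }

    /** Empties 'young'; live objects land in 'survivor' or 'old', dead ones are dropped. */
    static void collect(List<Obj> young, List<Obj> survivor, List<Obj> old) {
        for (Obj o : young) {
            if (!o.live) {
                continue; // dead objects are not copied
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);      // promoted to the old generation
            } else {
                survivor.add(o); // stays young for another round
            }
        }
        young.clear(); // the collected young regions are now entirely free
    }

    public static void main(String[] args) {
        List<Obj> young = new ArrayList<>(List.of(
                new Obj("a", true, 0), new Obj("b", false, 3), new Obj("c", true, 14)));
        List<Obj> survivor = new ArrayList<>();
        List<Obj> old = new ArrayList<>();
        collect(young, survivor, old);
        System.out.println(survivor.size() + " survivor, " + old.size() + " promoted"); // 1 survivor, 1 promoted
    }
}
```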
To improve collection efficiency, HotSpot's collectors each use different designs. Partitioning is not a new idea in software design and architecture. Relational databases and columnar databases were among the first to use it to speed up storing and retrieving data, and software architectures widely use the same idea to speed up data exchange and computation.
Where does the idea of partitioning come from? You have probably seen the TV series “Big House”. Its protagonists are the Bai family, a well-known family of physicians in Beijing. Before the household was divided, the head of the family ran the affairs of all three Bai brothers. He looked shrewd but was in fact muddle-headed, or the Bai household would not later have fallen apart. The eldest brother was very honest. The second was timid and effeminate: although he understood what was going on, he never dared to step forward and make decisions. The third was a young scoundrel who stole silver from the household every time he went out to buy medicinal herbs, throwing the accounts into chaos. To keep the family at peace, the eldest brother kept quietly moving silver around so that the head of the family would see balanced accounts. A family that lives together this way will run into trouble sooner or later. It is better to divide the household so that no one has to untangle the family's money. This is the original idea behind a Region.
Let's return to technology and look at how the HBase RegionServer is designed. In HBase, every user data and metadata request is routed to a Region and sent to the RegionServer hosting it, which performs the actual reads and writes. A RegionServer is a service that runs on each worker node of an HBase cluster and is central to the HBase system. It maintains Region state and provides Region management and services. It also interacts with the Master, reports Region load information, and takes part in the Master's distributed coordination and management.
HRegionServer communicates with HMaster and with clients over RPC. It periodically reports its node's load to HMaster, including memory usage and online Regions. In this exchange, HRegionServer is the RPC client and HMaster is the RPC server. HRegionServer also has a built-in RpcServer that handles data updates, reads, and deletes. It also handles Region operations such as Flush, Compaction, Open, Close, and loading files.
The Region is the basic unit of data storage and management in HBase. HBase uses rowkeys to split a table horizontally into multiple HRegions. From HMaster's perspective, each HRegion records its StartKey and EndKey. The StartKey of the first HRegion and the EndKey of the last HRegion are empty. Because rowkeys are sorted, a client can use this information from HMaster to determine quickly which HRegion a given RowKey belongs to. HMaster assigns HRegions to HRegionServers. Each HRegionServer starts and manages its HRegions, communicates with clients, and reads and writes data stored in HDFS. Each HRegionServer can manage about 1000 HRegions simultaneously.
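Because rowkeys are sorted, finding a row's Region is a floor lookup on the Regions' StartKeys. The sketch below uses String keys and a local `TreeMap`. `ToyRegionIndex` is hypothetical and is not the HBase client API; HBase itself compares rowkeys as raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy index from row keys to regions, keyed by each region's StartKey. */
final class ToyRegionIndex {
    private final TreeMap<String, String> byStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        byStartKey.put(startKey, regionName);
    }

    /** The owning region is the one with the greatest StartKey that is <= rowKey. */
    String locate(String rowKey) {
        Map.Entry<String, String> entry = byStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        ToyRegionIndex index = new ToyRegionIndex();
        index.addRegion("", "region-1");           // first region: empty StartKey
        index.addRegion("g", "region-2");
        index.addRegion("p", "region-3");          // last region: runs to the end (empty EndKey)
        System.out.println(index.locate("apple")); // region-1
        System.out.println(index.locate("grape")); // region-2
        System.out.println(index.locate("zebra")); // region-3
    }
}
```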
Now consider partitioning from a software architecture perspective. Take task scheduling as an example, and suppose we have a central scheduling service. As data volume grows, it will eventually become a performance bottleneck, because every request ends up at it. To remove the bottleneck, we can split task scheduling across multiple scheduling services. This raises a question: should every scheduling service process the same source data?
According to a patent published by Huawei, each task scheduling service works on its own share of the source data, split according to the number of scheduling services. With three services, for example, each row of source data is assigned by taking its line number modulo 3. If the number of scheduling services later increases or decreases, the modulo partitioning must be redone based on the number of services at that point.
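The cost of redoing modulo partitioning is easy to quantify. This sketch counts how many rows change owner when the number of services changes. The class `ModPartitioning` is hypothetical, and line numbers start at 0.

```java
/** Toy modulo partitioning of source rows across scheduling services. */
final class ModPartitioning {
    /** Service that owns a row: its line number modulo the number of services. */
    static int owner(long lineNumber, int services) {
        return (int) (lineNumber % services);
    }

    /** Counts rows whose owner changes when the service count goes from 'before' to 'after'. */
    static long movedRows(long rows, int before, int after) {
        long moved = 0;
        for (long line = 0; line < rows; line++) {
            if (owner(line, before) != owner(line, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(owner(7, 3));         // 1
        System.out.println(movedRows(12, 3, 4)); // 9
    }
}
```

Going from three services to four reassigns 9 of every 12 rows. That is why such schemes must repartition most of the data whenever the service count changes.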
Back to G1. In G1, the heap is divided evenly into Regions of equal size. Each Region has an associated Remembered Set (RS). An RS is implemented as a hash table whose entries refer to cards. Every 512 bytes of heap maps to one 1-byte entry (a card) in the Card Table.
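Because each card covers 512 bytes, a card index is just the heap offset shifted right by 9 bits. The sketch below assumes heap positions are given as byte offsets from the heap start. `CardTable` and its CLEAN/DIRTY values are hypothetical and do not match HotSpot's internal encodings.

```java
/** Toy card table: one byte per 512-byte card of heap. */
final class CardTable {
    static final int CARD_SHIFT = 9;              // 2^9 = 512 heap bytes per card
    static final int CARD_SIZE = 1 << CARD_SHIFT;
    static final byte CLEAN = 0;                  // hypothetical encodings; new arrays start CLEAN
    static final byte DIRTY = 1;

    private final byte[] cards;

    CardTable(long heapBytes) {
        // Round up so a partial card at the end of the heap is still covered.
        cards = new byte[(int) ((heapBytes + CARD_SIZE - 1) >>> CARD_SHIFT)];
    }

    /** Maps a heap offset to its card: a shift instead of a division. */
    static int cardIndex(long heapOffset) {
        return (int) (heapOffset >>> CARD_SHIFT);
    }

    /** What a write barrier would do after storing a reference at heapOffset. */
    void markDirty(long heapOffset) {
        cards[cardIndex(heapOffset)] = DIRTY;
    }

    boolean isDirty(int card) {
        return cards[card] == DIRTY;
    }

    int size() {
        return cards.length;
    }

    public static void main(String[] args) {
        CardTable table = new CardTable(1 << 20); // 1 MB heap
        System.out.println(table.size());         // 2048
        table.markDirty(1000);                    // offset 1000 lies in card 1 (bytes 512..1023)
        System.out.println(table.isDirty(1));     // true
        System.out.println(table.isDirty(0));     // false
    }
}
```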
An RS records where references into its Region come from. When another Region updates a reference field that points into this Region, the change is first reflected in one or more cards of the Card Table, and the RS records those cards. By scanning the cards listed in a Region's RS, G1 can find the references into that Region without scanning the whole heap. When a Region fills up, the allocating thread picks a new one. Free Regions are kept in a linked list so that a new Region can be found quickly.
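The following sketches how a "points-into" Remembered Set can be maintained and used. It assumes 1 MB Regions, 512-byte cards, and byte offsets as addresses. `RememberedSets` and `recordReference` are hypothetical. In HotSpot this bookkeeping is driven by write barriers and concurrent refinement threads rather than a direct call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy remembered sets: for each region, the cards elsewhere that hold references into it. */
final class RememberedSets {
    static final long REGION_SIZE = 1L << 20; // hypothetical 1 MB regions
    static final int CARD_SHIFT = 9;          // 512-byte cards

    // region index -> cards (in other regions) holding references into that region
    private final Map<Integer, Set<Long>> rsets = new HashMap<>();

    static int regionOf(long addr) {
        return (int) (addr / REGION_SIZE);
    }

    /** Records a store "field at fromAddr now points to toAddr", as a post-write barrier might. */
    void recordReference(long fromAddr, long toAddr) {
        int from = regionOf(fromAddr);
        int to = regionOf(toAddr);
        if (from == to) {
            return; // references within one region need no RS entry
        }
        rsets.computeIfAbsent(to, k -> new HashSet<>()).add(fromAddr >>> CARD_SHIFT);
    }

    /** Cards to scan for incoming references when collecting 'region', instead of the whole heap. */
    Set<Long> cardsToScan(int region) {
        return rsets.getOrDefault(region, Set.of());
    }

    public static void main(String[] args) {
        RememberedSets rs = new RememberedSets();
        rs.recordReference(100, 200);                   // same region: ignored
        rs.recordReference(2 * REGION_SIZE + 1024, 10); // region 2 -> region 0
        rs.recordReference(2 * REGION_SIZE + 1030, 20); // same card again: deduplicated by the set
        System.out.println(rs.cardsToScan(0));          // [4098]
        System.out.println(rs.cardsToScan(1));          // []
    }
}
```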
A JVM without GC is unthinkable. Rather than blaming GC for application problems, we should keep optimizing how we use it and tuning our applications so that they produce less garbage.
Original: https://blog.csdn.net/yunzhaji3762/article/details/82193351