Compared with other languages, such as C/C ++, we all know that the Java virtual machine for the garbage generated in the program, the virtual machine will automatically help us to remove the management, while the language platform like C/C ++ requires the programmer to manually release the memory.

While this strategy of automatically helping us to recycle garbage takes some of the flexibility out of it, it saves a lot of work for the code writers and improves a lot of security. (As in C/C++, you can run out of memory if you create a large number of objects and forget to release them due to negligence).

What is garbage?

The virtual machine automatically helps us with garbage removal, so what kind of objects can we call garbage objects?

Suppose you create an object

Man m = new Man();
Copy the code

You’re referring to this object with a variable, and obviously with this object, you can use the variable M to manipulate this object, but after a while, you execute

m = null;
Copy the code

And there are no new variables to point to the object you just created. Do you think it’s useful at this point for an object that doesn’t have any variables to point to?

Obviously, for an object that is not pointed to by a variable, it is useless. It can only drift in the heap with the wind.

Therefore, for such an object, we can call it garbage, which will be disposed of by the garbage collector.

How do you know it’s already garbage?

If you write your own code, you probably know when the object should be discarded, and you can always make it junk.

But you are you, and the virtual machine is not that smart. So how does the virtual machine know?

As stated above, an object is garbage when no variable references it. Based on this principle, we can do something like this:

We can set a counter for this object with an initial value of 0. If a variable points to it, the counter increases by 1. If the variable no longer points to it, the counter decreases by 1. Then we can tell that if this counter is 0, it is junk, otherwise it is useful.

For this method, we call reference counting.

Well, let’s start by praising reference counting:

  1. Simple implementation.
  2. Efficient (a problem that can be solved by an if statement is difficult to solve efficiently).

I’m sorry, but here’s the fatal flaw.

In fact, this reference-counting method is very difficult to solve if it encounters objects that reference each other.

Let’s start with some code:

Man m1 = new Man(); Man m2 = new Man(); M1. instance = m2; // Suppose Man has instance m2.instance = m1; m1 = null; m2 = null; System.gc(); // Objects should be recycledCopy the code

M1 and m2 both point to null, so the two objects are useless and should be reclaimed. However, the two objects are not reclaimed because they have an instance of each other.

Is that a fatal flaw?

Therefore, the virtual machine does not use this reference-counting method.

Accessibility analysis

Do we have any other methods besides this method?

The answer is yes, it has to be. This method is known as accessibility analysis. Here’s how it works:

At the beginning of the program, a reference root node (GC Roots) is established and a reference graph is built. When we need to determine who is garbage, we can traverse from the root node. If the node is not traversed, it is garbage, otherwise it is useful. The diagram below:

This method can solve the problem of loop reference to each other, but it is not as efficient as the reference counting method, because you need to traverse the graph.

To summarize the algorithm to determine whether the object is garbage:

  1. Reference counting.
  2. Accessibility analysis.

When to do garbage collection

Some people might think this is a strange question and think it’s not a good idea to recycle when you see garbage. To which I can only say:

  1. You see a little trash in your room and you clean it up? Or do you wait until a certain point in time or when the garbage has accumulated to a certain amount?
  2. The virtual machine is not smart enough to immediately recognize that this object is junk. It has to go through all the objects to figure out which ones are junk.

So, you can’t just have a virtual machine go through all the objects in a few seconds (let’s say a few seconds is a lot of time).

The garbage collector suspends all threads while garbage collection is in progress. The program cannot respond to external requests. (Because if you think about it, when you’re doing accessibility analysis, those references are constantly changing, it’s not so hard).

And the frequent garbage collection, for some programs, is very affect the user experience, for example, you are playing a game, the system always pause, afraid that you are going to delete the game.

So, garbage collection is not triggered until a certain percentage of memory has been used. As for what the proportion is, it is probably artificial.

How to recycle?

When we mark what is garbage and want to recycle, how should we recycle it?

There may be some people feel strange, this is not simple, see it is garbage, not directly recycled.

In fact, this is not unreasonable, simple, direct recycling.

Yes, there is an algorithm that looks at what we’ve flagged as garbage and then recycles it. This algorithm is called the mark-clear algorithm.

The mark-clear algorithm works by marking all the objects that need to be recycled, and then collecting all the marked objects.

But don’t be complacent, because this method, while simple and violent, has a fatal drawback:

After the flag is cleared, a large number of discrete memory fragments will be generated. If there are too many discrete memory fragments, some large objects may not be stored. This will lead to the following two problems:

  1. Some memory is wasted.
  2. If the object is not saved, garbage collection will be triggered again.

Replication algorithm

In order to solve this problem, another algorithm has emerged – the copy algorithm. That is, it divides the available memory into two pieces by capacity. It then uses one block at a time, and when the block is running out, it triggers garbage collection, which copies all surviving objects into another block of memory, and then cleans up the entire block.

That way, there is no fragmentation problem.

We have to give it credit: not only did it solve the problem, but it was simple to implement and efficient to run.

But it also has its drawbacks. The drawbacks are obvious, see? If there are very few surviving objects each time, isn’t another chunk of memory almost unused? So it’s possible that the other half of the memory is almost useless. Memory is so precious, this is a serious problem.

Optimization strategy: I can tell you that there are studies that show that 98% of the objects die overnight, that is to say, very few objects survive each time. If we all know that there are very, very few surviving objects, why do we have a 1:1 ratio? Therefore, HotShot virtual machines are allocated in a 8:1 ratio by default. This way, there won’t be a lot of unused memory problems.

You might say, well, what if I run out of memory at 1/9? Isn’t there room for the living objects? In fact, when you run out of memory, you can borrow it from other places, such as old memory.

Here is an illustration of the new generation and the old age: to put it bluntly, the new generation is the object that has just been created, while the old age is the object that has lived for a long time. In other words, there are some objects that do live longer, for which we allocate additional memory for aging, and for garbage collection, we do not have to look for garbage objects every time, because the probability of these objects being garbage is relatively small.

Here are two other algorithms:

  1. Mark-clean: This algorithm is similar to the mark-clean algorithm, except that after the garbage is removed, the surviving objects are moved in one direction to defragment.
  2. Generation collection algorithm: the so-called generation is to divide objects into the old generation and the new generation, like the above. In the new generation, there are generally more dead objects in each garbage collection, while the old generation is less. Based on this relationship, we can adopt different algorithms to target.

Summarize several algorithms of garbage collection:

  1. Mark-clear algorithm.
  2. Copy algorithm.
  3. Mark-collation algorithm.
  4. Generational collection algorithm.

Finally, there are several garbage collectors

For garbage collection, do you want to run the rest of the program while doing garbage collection? Or do you want to collect all the junk before you execute the rest of the program? While the end result is the same amount of CPU time, there are differences between the two methods.

Here’s a brief overview of several garbage collectors to see which one they use.

(1). Serial collector

Serial, as you can see from the English word, is a single-threaded collector. That is, it must suspend all other threads while it does garbage collection. Obviously, sometimes garbage collection pauses are long, which can be very frustrating for users.

(2).ParNew

This collector is similar to Serial in that it suspends all other threads for garbage collection, but it can work on multiple threads for garbage collection.

8. Parallel avenge

Parallel means parallel. Garbage collection can also be multithreaded, but it is different from ParNew. It tightly controls the ratio of time spent garbage collection to time spent executing other code. Let’s look at a noun: throughput.

Throughput = run user code time/(run user code time + garbage collection time).

That is, the Parallet Scavenge collector controls the throughput, which can be set manually.

The next two collectors are highlighted

** (4).CMS (Concurrent Mark Sweep) collector **

CMS collector is based on the “mark-clear” algorithm, its operation process is more complex than the previous several collectors, the whole process is divided into four steps, including:

  1. CMS Initial Mark
  2. CMS Concurrent Mark
  3. Re-marking (CMS Remark)
  4. CMS Concurrent sweep

The two steps of initial marking and re-marking still need to suspend other threads. But the other two steps can be executed concurrently with other threads. Initial marking only marks the objects that GCRoots can be directly associated with, which is very fast. Concurrent marking refers to the process of GCRoots Tracing (i.e. traversing the whole graph to find out the objects that are not present).

The re-marking stage is to correct the mark record of the part of the object whose mark changes due to the continuous operation of the user program during the concurrent marking period. The pause time of this stage is generally a little longer than the initial marking stage, but much shorter than the concurrent marking time.

Since the collector thread can work with the user thread during the longest concurrent markup and concurrent cleanup, the CMS collector’s memory reclamation process is almost concurrent with the user thread in general.

(5). The G1 collector

This is probably the best collector. The collector has the following characteristics:

  1. Parallelism and concurrency: The G1 takes full advantage of the multi-CPU, multi-core hardware of modern calculators, using concurrent or parallel methods to shorten the advantage of suspending other threads.
  2. Generational collection: It is similar to separating the new generation from the old generation.
  3. Space integration: adopt the characteristics of copy algorithm + mark-integration algorithm to recycle garbage. That is, the whole adopts the mark-collation algorithm and the part adopts the copy algorithm.
  4. Predictable pauses: This is great, that is, it allows the user to explicitly specify that no more than N milliseconds will be spent on garbage collection within a time segment of M milliseconds.

Its execution process is as follows:

  1. Initial tag.
  2. Concurrent tokens.
  3. Final tag.
  4. Filter recycling.

This process is similar to a CMS in that it suspends other threads at the initial tag and the final tag, but the other two processes can be executed concurrently with other threads.

What we said just now the G1 collector strengths, such as predictable pauses, it also makes the screening of recycling, waste of time is predictable pauses, that is to say, the pause time is the users themselves can control, it also makes the general case, at the time of screening of recycling, we will suspend another thread of execution, to use all the time in screening of recycling.

That’s all for now.

After the