Golang’s GC has been criticized since its inception, but over the years, It has improved a lot and become very good.
Here are the milestones of the Golang GC algorithm:
- V1.1 STW
- V1.3 Mark STW, Sweep parallel
- V1.5 three-color labeling
- V1.8 hybrid write barrier
There are three classical GC algorithms: Reference counting, Mark and Sweep, and Copy and Collection.
Golang’s GC algorithm is mainly based on the Mark and Sweep algorithm, and made improvements on this basis. Therefore, this article mainly introduces the Mark and Sweep algorithm. Reference counting and copy and collection can be customized.
Mark And Sweep algorithm
This algorithm has two main steps:
- Mark (Mark phase)
- Remove (Sweep phase)
The first step is to find the reachable object and mark it. The second step is to reclaim unmarked objects.
It’s very simple to do, but there’s one caveat: Mark and Sweep requires the program to pause while executing! Stop the world. Which means the program is stuck somewhere during this time. So the Chinese translation is caton.
Let’s look at the diagram:
Start tag, program pause. The relationship between a program and an object looks like this:
Then the tag starts, and Process finds all its reachable objects and marks them. As shown in the figure below:
After the tag is finished, we can start to clear the unmarked objects:
Then the garbage is removed and it looks like this.
Finally, stop the pause and let the program continue running. The cycle then repeats the process until the end of the process lifecycle.
What’s wrong with the Mark And Sweep algorithm?
This algorithm is very simple, but there are still some problems:
- STW, stop the world; Let the program pause, the program appears to stall.
- The tag requires the entire heap to be scanned
- Clearing data creates heap debris
The most important problem here is that the Mark-and-sweep algorithm pauses the entire program.
How does Go confront this problem?
Three color concurrent labeling
Let’s take a look at the general flow of Golang’s three-color notation.
First: The objects created by the program are marked white.
Gc start: Scan all reachable objects and mark them in gray
Find the reference object in the gray object and mark it as gray, and mark the gray object itself as black
Monitor memory changes in the object and continue with the previous step until the greyed object does not exist
At this point, GC reclaims the white object.
Finally, change all the black objects to white and repeat the above process.
Ok, so that’s the general process. Let’s Go back to the question: How does Go solve the STW (stop the world) problem in the Mark and Sweep algorithm?
How can GC and user logic operate in parallel?
The STW(Stop the World) operation of the Mark and Sweep algorithm means that Runtime freezes all threads. All freezes mean that user logic is suspended. None of the objects will be modified, so it’s perfectly safe to scan.
How does Go shorten this process? The mark and Sweep algorithm consists of two parts of logic: mark and sweep. We know that there are only black and white objects left in Golang’s trichromatic notation. The black object is the object that will be used again after the program is restored. If the black object is not touched and only the white object is cleared, the program logic will not be affected. So: the clear operation and user logic can be concurrent.
Token operations are also concurrent with user logic. User logic often generates objects or changes references to objects. How can tokens and user logic be concurrent?
What does GC do when process generates new objects? Won’t it be messy?
Let’s look at the following diagram. In this state, the process program is regenerated as an object, and we expect it to look like this:
But this is obviously wrong, because following the steps of the three-color notation, the newly generated object A will eventually be cleared, which will affect the program logic.
Golang solves this problem by introducing a write barrier. Write barrier: The write operations before the barrier are perceived by other components of the system before the write operations after the barrier. Generally speaking: during the gc run, you can monitor the memory changes of an object and re-mark the object. (Which is actually a super-short STW, and then tags the object)
In this case, all newly generated objects are gray! As below:
So what does Golang do when the reference to a gray or black object is changed to a white object?
As you can see below, a black object refers to a white object that was once marked.
At this point, the write barrier mechanism is triggered, sending a signal to the GC, which rescan the object and greys it.
Therefore, once GC starts, either the object is created or the reference to the object changes, it greys out first.
References:
- Golang’s Real-time GC in Theory and Practice
- Golang’s realtime garbage collector
- Golang garbage collection profile