Garbage collection is the process of clearing out unused data from limited memory to make room for new data. The challenges are twofold: identifying the data that is no longer useful, and how and how often to clean it up.

While watching Li Bing explain how the V8 engine works, the design optimization of the V8 garbage collection system is more complicated and interesting than expected. Here are some notes I put together after some extra research:

Rubbish that needs to be recycled

First, distinguish between v8’s two types of memory space:

  1. Stack space – call stack. Stores execution context, raw data types (string, number, and so on), and object references.
  2. Heap – The type and value of an object that is stored.

There are other divisions like code space, but I don’t want to go into details that I found inconsistent.

About the pile of wordy two sentences to help understand:

  • A tree structure for rough prioritization.
  • The highest or lowest priority value always remains at the root. You need to adjust the tree structure to add and delete nodes.
  • The maximum number of child nodes is fixed.
  • It actually uses contiguous paging space in storage. That’s why the heap usually has a string representation, as shown below:As with the previous point, in this string you can calculate the position of each node based on the hierarchy and number of child nodes.

So when we talk about garbage collection, we’re talking about cleaning up the heap space.

The overall v8 garbage collection process

V8’s were designed to find that it wasn’t productive to treat all memory The same way, so they were optimized under something called The Generational Hypothesis. The hypothesis classifies the data objects in memory into classes:

  • Immortal objects — only a few
  • And objects that only exist for a short time — most

Accordingly, the heap space is divided into two Spaces:

  • New Space – small space (about 1~8MB can be set and changed), high cleaning frequency, use sub-garbage collector.
  • Old space – Large space, low cleanup frequency, main garbage collector.

The two collectors use different algorithms, but the overall process can be summarized as follows:

  1. Check for live and recyclable objects in the tag space;
  2. Reclaim memory occupied by objects;
  3. Reduce memory fragmentation (remember that computer memory is continuous paging space, and object storage usually takes up contiguous space, too much fragmentation for large objects will not be used).

Minor GC for New areas

The deputy garbage collector uses a method called Scavenger and an Algorithm called Cheney’s Algorithm.

This method splits the new area in half into From space and To space. New objects attempt to write to the object area, and if the area is found to be full, garbage cleaning is triggered.

The specific process is as follows:

Main Garbage Collector for Old areas (Major GC)

The main garbage collector uses mark-sweap:

That is:

  1. Flag – Start at the root to traverse. Objects that still need to be referenced by the call stack and their referenced objects are marked as active, and the rest as garbage. The details are not yet clear.
  2. Clean – Marks empty blocks of memory occupied by objects marked as garbage.
  3. Collate – The remaining surviving objects move to one end and collate together.

So these operations are actually very slow. JS also uses only one main thread, so doing such a slow garbage collection operation will cause any other operations to fail to continue. V8 therefore introduces other optimizations, such as:

  • Incremental recycle – Cut the entire operation into countless small tasks, interspersed with user JS code. (This is similar to React Fibre task scheduling)
  • Concurrent tagging process – tagging with multiple worker threads, avoiding the main thread.
  • Concurrent cleanup and cleanup processes – like tags, use multiple helper threads instead of the main thread.
  • Lazy triggers the garbage cleanup process – delayed cleanup.

summary

It’s not easy to dump a garbage to the program 😄!

In fact, there are many details in the data so far only skim, waiting to be dug.

The resources

  • Memory Management in Programming
  • Visualizing memory management in V8 Engine (JavaScript, NodeJS, Deno, WebAssembly)
  • How browsers work and practice — Garbage collection