Previously: I took advantage of my spare time before Spring Festival to write, from scratch, a JavaScript interpreter that supports ES5 syntax. Its name is ES (see the earlier post: "I wrote a JavaScript interpreter that supports ES5 syntax from scratch").

Today is the last day of the Spring Festival holiday (thank you, company, for giving us an extra day off!). I rushed to write a not-yet-satisfying garbage collection module for ES, and here I'd like to share the reasoning behind the choices I made and some things I learned during implementation. Since there is a lot of content and writing it all as one article would be exhausting, I've split it into two parts. This one covers some basics of garbage collection design; the next will cover some of the more interesting parts of the implementation.

I. Introduction to garbage collection

First, a brief description of what garbage collection is and what the common options are. Friends who already have some understanding of this area can jump directly to Section III. Given my limited ability and the limited space, I can only touch on things briefly. If you want a more systematic introduction, I recommend these two books:

  • The Garbage Collection Handbook
  • Garbage Collection: Algorithms and Implementations

The former covers the theory more comprehensively, while the latter gives a more detailed analysis of the early designs of some classic VMs (after all, the original edition is over 10 years old; when the book mentions V8, it still has to introduce Chrome to readers as "the new browser").

Garbage collection is a strategy for automatically managing dynamically allocated memory, and also a rather flattering name (a bit like calling machine learning "artificial intelligence"…). In some languages, the programmer has to manually manage memory allocated on the heap: in C, you allocate with malloc and release with free; in C++, you allocate with new and reclaim with delete. Manual management is very flexible, but also very complex, and it puts a heavy mental burden on most programmers, because accidentally forgetting to release some segment of memory leads to memory leaks. To make everyone's life easier, many high-level languages let the language's virtual machine/runtime (VM) handle memory allocation and reclamation, freeing our minds for writing business logic. For example, suppose we run a JavaScript function like this:

function order(n) {
  var s = ""
  for (var i = 0; i < n; i++) {
    s += i
  }
  return s
}

The result of order(5) is the string "01234", and the intermediate strings "", "0", "01", "012", and "0123" are also generated during the run. After the function returns, we obviously can no longer access those five strings; such unreachable variables are called garbage. Collection, as the name suggests, is about what to do with them.

There are four basic garbage collection algorithms: mark and sweep, mark and compact, copying GC, and reference counting. All other approaches are combinations and variations of these four. I'll briefly describe what each of them does.

Mark and Sweep

As you can see from the name, mark and sweep has two steps: mark and sweep. The idea is very simple: starting from some root nodes, such as JavaScript's global object, we traverse and mark every variable we can reach; then we go through the heap and reclaim all the unmarked ones. For example, suppose the row below is a heap that, having stored variables 1 to 6, cannot allocate a new variable 7:

|| 1 |  2  |   3   | 4 |   5   | 6 | |
heap start                           heap end
7 cannot fit...

Mark and sweep first traverses to find which variables are still reachable, here 3 and 5, and marks them:

Mark
|| 1 |  2  |---3---| 4 |---5---| 6 | |
heap start                           heap end

Then recycle all unmarked variables:

Sweep
|          |---3---|   |---5---|     |
heap start                           heap end

Then clear the mark for the next collection and put 7 in:

||  7  |   |   3   |   |   5   |     |
heap start                           heap end
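To make the two phases concrete, here is a minimal C++ sketch of mark and sweep (the Object layout and root set are hypothetical; this is not ES's actual code): marking recursively flags everything reachable from the roots, and sweeping walks all allocated objects and reclaims the unflagged ones.

#include <utility>
#include <vector>

// Hypothetical object header: a mark bit plus outgoing references.
struct Object {
  bool marked = false;
  std::vector<Object*> refs;
};

void mark(Object* obj) {
  if (obj == nullptr || obj->marked) return;
  obj->marked = true;                      // mark phase: flag reachable objects
  for (Object* child : obj->refs) mark(child);
}

// `heap` tracks every allocated object; `roots` would be e.g. the global object.
void markAndSweep(std::vector<Object*>& heap, const std::vector<Object*>& roots) {
  for (Object* root : roots) mark(root);
  std::vector<Object*> survivors;
  for (Object* obj : heap) {
    if (obj->marked) {
      obj->marked = false;                 // clear the mark for the next round
      survivors.push_back(obj);
    } else {
      delete obj;                          // sweep phase: reclaim garbage
    }
  }
  heap = std::move(survivors);
}

A production collector would avoid unbounded recursion (using an explicit mark stack) and would reclaim raw heap blocks rather than calling delete, but the two phases are exactly these.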

As you can see, mark and sweep has an obvious property: the collection does not move variables; 3 and 5 stay in their original positions. We call a garbage collection algorithm that does not move variables a non-moving GC. The advantage of not moving is that it is easy to implement, since no pointers need adjusting: whatever address we read data from before a collection, we read from the same address afterwards. The downside is memory fragmentation: if the 7 we want to insert is as large as in the example below, the total free space is big enough, but there is no contiguous block large enough to allocate from.

|          |   3   |   |   5   |     |
heap start                           heap end
|     7     | still cannot fit...

This is exactly the problem the mark and compact algorithm sets out to solve.

Mark and Compact

As you can see from the name, mark and compact also has two steps: the first is marking, same as above; the second is compacting, which squeezes the surviving variables together. Continuing the example above, compaction moves both 3 and 5 to the bottom of the heap:

Compact
||---3---|---5---|                   |
heap start                           heap end

After this compaction, the larger 7 fits:

||   3   |   5   |     7     |       |
heap start                           heap end

Through moving, mark and compact reduces memory fragmentation and improves heap utilization. However, it generally has worse throughput than the other methods, not so much because of the memory copies that come with moving, but because a mark and compact implementation has to traverse the heap more times than mark and sweep, partly since moving forces us to update all references to the moved variables. A common strategy is therefore to collect with mark and sweep most of the time and only invoke mark and compact once memory fragmentation gets bad enough.

Note: for simplicity, only sliding compaction is introduced here; there are other compaction schemes, and interested readers can consult the two books above.
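To see where the extra heap traversals come from, here is a rough C++ sketch of Lisp-2-style sliding compaction, under a hypothetical object layout (objects laid out back to back; again not ES's code): one pass computes forwarding addresses, one pass rewrites references, and one pass actually moves the bytes.

#include <cstddef>
#include <cstring>

// Hypothetical header for a movable object laid out back-to-back in the heap.
struct Obj {
  bool marked;
  size_t size;        // total object size in bytes, including this header
  char* forwarding;   // new address, computed during compaction
  size_t numRefs;
  Obj* refs[1];       // trailing array of outgoing references
};

Obj* next(Obj* o) {   // next object in address order
  return reinterpret_cast<Obj*>(reinterpret_cast<char*>(o) + o->size);
}

void compact(char* heapStart, char* heapEnd) {
  Obj* const first = reinterpret_cast<Obj*>(heapStart);
  char* free = heapStart;
  // Pass 1: assign each marked object its forwarding (new) address.
  for (Obj* o = first; reinterpret_cast<char*>(o) < heapEnd; o = next(o))
    if (o->marked) { o->forwarding = free; free += o->size; }
  // Pass 2: rewrite every reference to point at its target's new address.
  // (Root references would be rewritten the same way.)
  for (Obj* o = first; reinterpret_cast<char*>(o) < heapEnd; o = next(o))
    if (o->marked)
      for (size_t i = 0; i < o->numRefs; i++)
        if (o->refs[i]) o->refs[i] = reinterpret_cast<Obj*>(o->refs[i]->forwarding);
  // Pass 3: slide the bytes down. Processing in address order guarantees a
  // destination never overwrites an object that hasn't moved yet; memmove
  // handles an object overlapping its own new position.
  for (Obj* o = first; reinterpret_cast<char*>(o) < heapEnd; ) {
    Obj* nxt = next(o);               // read the size before the move clobbers it
    if (o->marked) { o->marked = false; std::memmove(o->forwarding, o, o->size); }
    o = nxt;
  }
}

Three full passes after marking, versus mark and sweep's single sweep pass: that is the throughput cost mentioned above.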

Copying GC

Copying garbage collection takes a more counterintuitive approach. It splits the heap into two halves, called from-space and to-space. The VM allocates memory only in to-space; when to-space fills up, it copies the reachable variables into from-space and then swaps from-space and to-space (a "flip"). For example, the initial state is as follows:

tospace   || 1 |  2  |   3   | 4 |   5   | 6 | |
          heap start                           heap end
fromspace |                                    |
7 cannot fit...

Copy the reachable 3 and 5 into from-space:

Copy
tospace   || 1 |  2  |   3   | 4 |   5   | 6 | |
          heap start                           heap end
fromspace ||   3   |   5   |                   |

Swap and empty from space:

Copy
fromspace |                                    |
          heap start                           heap end
tospace   ||   3   |   5   |                   |

Finally, insert 7:

Copy
fromspace |                                    |
          heap start                           heap end
tospace   ||   3   |   5   |  7  |             |

Obviously, the problem with copying GC is that half of the heap space is sacrificed outright. But it is precisely this sacrifice that makes copying GC simple and efficient to implement and gives it some advantages in heap-structure design; when heap space is relatively plentiful, it can perform better. I personally like the copying GC algorithm because it makes very plain that system design is always about trade-offs, and that sometimes simply sacrificing half the memory may be a good idea.
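It's worth a quick look at why the implementation can be so efficient. Below is a compressed C++ sketch of Cheney's algorithm, the classic way to implement copying GC (hypothetical object layout again): the freshly copied region itself serves as a breadth-first work queue, so no recursion, mark stack, or extra traversal is needed.

#include <cstddef>
#include <cstring>

// Hypothetical movable object header; `forwarding` starts out null.
struct Obj {
  Obj* forwarding;    // non-null once the object has been copied
  size_t size;        // total size in bytes, including the header
  size_t numRefs;
  Obj* refs[1];       // trailing array of outgoing references
};

// Copy everything reachable from `roots` out of the full semispace into the
// empty one. Returns the new allocation pointer; the old space can be reused.
char* cheneyCollect(Obj** roots, size_t numRoots, char* emptySpace) {
  char* free = emptySpace;   // allocation pointer in the empty semispace
  char* scan = emptySpace;   // scan pointer trailing behind `free`

  auto copy = [&](Obj* o) -> Obj* {
    if (o == nullptr) return nullptr;
    if (o->forwarding == nullptr) {        // not copied yet
      std::memcpy(free, o, o->size);       // the copy keeps forwarding == null
      o->forwarding = reinterpret_cast<Obj*>(free);
      free += o->size;
    }
    return o->forwarding;
  };

  for (size_t i = 0; i < numRoots; i++) roots[i] = copy(roots[i]);

  while (scan < free) {                    // breadth-first over copied objects
    Obj* o = reinterpret_cast<Obj*>(scan);
    for (size_t i = 0; i < o->numRefs; i++)
      o->refs[i] = copy(o->refs[i]);       // copy children, fix references
    scan += o->size;
  }
  return free;
}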

Reference counting

The three algorithms above share one trait: when heap space runs out, they pause the user program, perform garbage collection, and then resume it. We call this pattern stop-the-world GC.

After all, programmers aren't Stand users (at least not all of them…), so the problem with stop-the-world GC is that there are noticeable pauses during collection, which is a big headache for latency-sensitive applications such as games. To reduce GC latency, incremental GC (collect only a small portion at a time), concurrent GC (let collection run concurrently with the program), and other dark arts have been invented. Those topics have little to do with the "simple" in this article's title, so I won't expand on them; interested readers can check the Handbook and the papers~

Reference counting sidesteps this problem entirely. The idea: give each variable a counter tracking how many places reference it; when the count drops to zero, nothing references it anymore, so it is unreachable and can be reclaimed. Because each variable is counted independently, memory reclamation is spread across the program's entire run time. Reference counting can also be implemented without runtime support; the best example is C++'s smart pointers.

But as you can imagine, there is no such thing as a free lunch, and reference counting has its own problems. The simplest version cannot handle circular references, such as:

a.b = b; b.a = a;

The reference counts of a and b can never drop to zero…
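A minimal C++ illustration of the same trap, using std::shared_ptr (the standard library's reference-counted smart pointer):

#include <memory>

struct Node {
  std::shared_ptr<Node> other;  // strong reference keeps the count up
};

int main() {
  auto a = std::make_shared<Node>();
  auto b = std::make_shared<Node>();
  a->other = b;   // b's count becomes 2
  b->other = a;   // a's count becomes 2
  // When a and b go out of scope, each count drops to 1, never to 0,
  // so both Nodes leak. std::weak_ptr exists precisely to break such cycles.
  return 0;
}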

Another tricky issue is that reference counting turns read operations into write operations (every access updates a counter), which clashes with Linux's copy-on-write mechanism and can cause problems in multi-process scenarios. On top of that, every operation pays for an extra increment or decrement. Of course, thanks to researchers' painstaking efforts there are solutions to some of these problems, but reference counting is generally considered to have relatively poor throughput.

Generational GC

That covers the four basic algorithms. They clearly have different strengths, so is there a way to combine them so that each plays to its strength? There is, and the combination is called generational GC. Generations rest on an important observation: most variables are temporary and do not survive even a single GC (most objects die young). So we can split variables into a young generation and an old generation: each collection first reclaims the young generation, checks whether enough space was freed, and only collects the old generation if not. Meanwhile, variables that have survived a collection or two are promoted from the young space to the old space. It feels a bit like Stalingrad: survive three days at the front and you're made an officer.

With this partition, the basic algorithms can be put to better use. For example, copying GC works well on a relatively empty heap but poorly for long-lived variables, which would get copied over and over across collections; the mark and sweep / mark and compact combination handles old variables better, especially mark and compact, which can squeeze them tightly against the bottom of the heap. So many generational GCs use a copying GC for the young generation and a combination of mark and sweep and mark and compact for the old generation.

Note: some of the more extreme VMs divide the heap into even more generations…
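Putting it together, the allocation entry point of such a two-generation GC might look like this minimal C++ sketch (all interfaces here are hypothetical; it's just the shape of the policy):

#include <cstddef>

// Hypothetical interfaces (declarations only) for a two-generation GC sketch.
struct Space { void* sequentialAllocate(size_t n); };
extern Space youngSpace;
void minorGC();   // copying GC over the young generation; promotes survivors
void majorGC();   // mark and sweep (plus occasional compact) over the old one

// Policy: try the young space, then a minor GC, then a full major GC.
void* allocate(size_t n) {
  if (void* p = youngSpace.sequentialAllocate(n)) return p;
  minorGC();
  if (void* p = youngSpace.sequentialAllocate(n)) return p;
  majorGC();
  return youngSpace.sequentialAllocate(n);  // null here means truly out of memory
}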

II. Heap structure design

Many articles cover these algorithms, but they often leave out a key part of garbage collection design: the design of the heap itself. Since garbage collection means we allocate and reclaim memory ourselves, we naturally can't just malloc a bit of memory for every allocation (perhaps a non-moving GC could… but the performance would presumably be hard to stomach…). For simplicity, we can treat our heap as one big chunk of pre-allocated contiguous memory; heap design then boils down to which data structure we use to manage this chunk, in other words, how we allocate from it. There are really only two options: sequential allocation and free-list allocation.

Sequential allocation

Sequential allocation means handing out memory tightly packed, one allocation right after another. The pseudocode is roughly:

sequentialAllocate(n):
  result = free
  newFree = result + n
  if newFree > heap_end
    return null
  free = newFree
  return result

This allocation is fast, but it doesn't suit mark and sweep, because sweeping leaves lots of holes in the contiguous region and we would need extra bookkeeping to track where those holes are. Copying GC, on the other hand, is a perfect match: each flip leaves an empty space, so we just reset free to the bottom of the new to-space.
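As a concrete illustration, here is the pseudocode above turned into a minimal C++ bump-pointer allocator (a sketch; a real allocator would also round sizes up for alignment):

#include <cstddef>

// A minimal bump-pointer allocator over one contiguous pre-allocated chunk.
class SequentialAllocator {
 public:
  SequentialAllocator(char* start, char* end) : free_(start), end_(end) {}

  void* allocate(size_t n) {
    char* newFree = free_ + n;
    if (newFree > end_) return nullptr;  // heap exhausted: time to GC
    char* result = free_;
    free_ = newFree;
    return result;
  }

  // After a copying GC's flip, simply reset to the bottom of the new to-space.
  void reset(char* start, char* end) { free_ = start; end_ = end; }

 private:
  char* free_;
  char* end_;
};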

Free-list allocation

Mark and sweep is better served by organizing free memory into a linked list. Take the example above:

|    x1    |   3   | x2|   5   |  x3 |
heap start                           heap end

The free blocks x1, x2, and x3 form a free list:

x1 ---> x2 ---> x3

When allocating 7, we walk the free list, find the first node at least as large as 7, and allocate from it. If the node is bigger than needed, the remainder can be split off and put back on the free list. The pseudocode is as follows:

firstFitAllocate(n):
  prev = addressOf(head)
  while true:
    curr = next(prev)
    if curr == null:
      return null
    else if size(curr) < n:
      prev = curr
    else:
      return listAllocate(prev, curr, n)

listAllocate(prev, curr, n):
  result = curr
  if shouldSplit(size(curr), n):
    remainder = result + n
    next(remainder) = next(curr)
    size(remainder) = size(curr) - n

    next(prev) = remainder
  else:
    next(prev) = next(curr)
  return result

Note: first fit is used here; there are also next fit, best fit, and other node-selection strategies.

Searching a linked list has high complexity, so a common optimization is to split the list into several lists by block size; an allocation then only needs to search the list covering the requested size. Furthermore, the lists for smaller blocks can make all of their nodes the same size, which drops the search from O(n) to O(1). This optimization is known as segregated lists (segregated storage). Readers familiar with memory pool design may have noticed that this is very similar to a slab allocator.
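A C++ sketch of the size-class lookup at the heart of a segregated list (the class boundaries here are made-up numbers, exactly the kind of magic constants mentioned later in this post):

#include <cstddef>

// Hypothetical segregated free lists. Each small size class holds
// equal-sized nodes, so allocation is O(1): pop the head of the right list.
struct FreeNode { FreeNode* next; };

constexpr size_t kNumClasses = 4;
// Made-up class boundaries: blocks of <=16, <=32, <=64, <=128 bytes.
constexpr size_t kClassSize[kNumClasses] = {16, 32, 64, 128};

FreeNode* freeLists[kNumClasses];  // one list per size class

void* segregatedAllocate(size_t n) {
  for (size_t c = 0; c < kNumClasses; c++) {
    if (n <= kClassSize[c] && freeLists[c] != nullptr) {
      FreeNode* node = freeLists[c];   // O(1): every node in this list
      freeLists[c] = node->next;       // is exactly kClassSize[c] bytes
      return node;
    }
  }
  return nullptr;  // larger requests would fall through to a general list
}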

At reclamation time, however, a garbage collector's free list differs from a slab allocator's. Under manual memory management, only one node is freed at a time, so you have to think about where the freed node goes in the list and how it merges with its neighbors. In a garbage collection scenario, a large amount of memory is reclaimed at every collection, so the free list can simply be rebuilt from scratch during the sweep, with no node merging to worry about; this also makes it easy to keep the list's node order consistent with the order of the memory behind it.
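A C++ sketch of that rebuild-during-sweep idea, under the hypothetical back-to-back object layout from earlier (and assuming every object is at least as large as a free-list node): walking the heap in address order gives us the list order for free, and adjacent dead objects merge naturally.

#include <cstddef>

// Minimal hypothetical layouts for this sketch.
struct Obj { bool marked; size_t size; };            // object header
struct FreeBlock { FreeBlock* next; size_t size; };  // free-list node

// During sweep, rebuild the free list from scratch while walking the heap
// in address order; runs of adjacent dead objects merge into one block.
FreeBlock* sweepAndRebuild(char* heapStart, char* heapEnd) {
  FreeBlock head{nullptr, 0};
  FreeBlock* tail = &head;
  char* p = heapStart;
  while (p < heapEnd) {
    Obj* o = reinterpret_cast<Obj*>(p);
    if (o->marked) {
      o->marked = false;                 // survivor: clear mark, skip over it
      p += o->size;
    } else {
      char* blockStart = p;              // start of a run of dead objects
      while (p < heapEnd && !reinterpret_cast<Obj*>(p)->marked)
        p += reinterpret_cast<Obj*>(p)->size;
      FreeBlock* node = reinterpret_cast<FreeBlock*>(blockStart);
      node->size = static_cast<size_t>(p - blockStart);
      node->next = nullptr;
      tail->next = node;                 // appending keeps address order
      tail = node;
    }
  }
  return head.next;                      // new free list, already sorted
}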

III. Summary of well-known JavaScript VMs' garbage collection approaches

Having covered the basic garbage collection algorithms and the heap design options, before settling on a design for ES we should look at how our predecessors did it. Here's a quick look at four of the more mainstream JavaScript VMs: V8, JavaScriptCore (JSC), Hermes, and QuickJS.

V8

V8's garbage collection design is the classic one: a generational GC with a copying collector for the young generation and mark and sweep plus mark and compact for the old generation. Many optimizations have been built on top of that, such as concurrent GC. My impression is that the design took its cues from Java's HotSpot VM.

JSC

A notable feature of JSC's garbage collection is that it is a non-moving GC. Objects smaller than 8 KB are handled by mark and sweep over segregated storage; for large objects, the JSC blog says malloc is used directly, though it doesn't say how those are reclaimed (perhaps also with a separate mark and sweep). As the JSC blog post mentions, because JavaScript is not a particularly fast language, GC speed is usually not the bottleneck, and the point of a non-moving GC is to make OSR and all sorts of bytecode/JIT optimizations easier.

Hermes

Hermes is an interpreter Meta built for React Native to replace the JSC it previously used. Hermes's GC design is similar to V8's: a generational GC with two generations, young and old. One interesting thing about Hermes is its mobile focus: it cannot assume gigabytes of contiguous memory the way a PC VM can, so there is a lot of discussion of heap design around it.

QuickJS

QuickJS goes against the GC trend: Fabrice Bellard boldly chose reference counting. His stated reasons are lower memory usage and more deterministic behavior. My guess is that this keeps things lightweight and prepares for embedded applications that need real-time behavior? I dare not presume to know the great master's thoughts…

IV. What was I thinking when I designed GC for ES

Okay, we've finally reached the section that discusses ES's actual design. Design always follows from requirements, and in the short term ES will remain a just-for-fun project with few third-party users. So my main requirement is to learn some classic designs in the process of writing it. Under that premise, I can make some convenient assumptions and don't need to account for the peculiar underlying hardware constraints that Hermes faces.

At the same time, because I'm the only one writing it, I want as few spots that need tuning as possible, so I tried to avoid things like segregated lists with their many magic numbers for dividing memory. The alternative may reach only 80% instead of 95%, but 80% is much easier to get right.

Moreover, I hope the design has some continuity: if I have the energy, I can keep optimizing its structure or algorithms and get to experience more things along the way.

So I decided to implement a V8-style generational GC: the new space uses copying GC with sequential allocation, while the old space uses mark and sweep (plus compact) with free-list allocation. The nice thing about this is that I can implement it bit by bit, in stages, instead of writing thousands of lines up front and then debugging them piece by piece. A rough roadmap:

  1. Implement a standalone copying GC;
  2. Implement a standalone mark and sweep;
  3. Combine 1 and 2 into a generational GC;
  4. Add compact if necessary.

So far I've implemented 1 and 2, and testing shows GC is not a major bottleneck at the moment, so I may postpone 3 a bit and fully implement the generational GC once the performance of the interpreter's other parts catches up. As Go players often say: don't finalize a position too early; leave some aji in the local fight.

I personally have some concerns about the impact of moving GC on JIT OSR mentioned in the JSC blog, but I feel that's still a long way off for me; if it really comes to a head, I can always switch to reference counting. I just really don't want to end up with a design exactly like CPython's…

Before I knew it, this article had reached 9,000 words… I learned a lot implementing steps 1 and 2, and I'll share that in the next article.

Finally, as usual, let me ask for a star~ to support the happy craftsman in the picture below:

zhuzilin/es

Reference

The references are the two books mentioned above and the public documentation and source code of the VMs discussed. Thank you all for your generous sharing!