preface

Hello, I’m Lin Sanxin. Two days ago, I accidentally saw a video about V8 garbage collection mechanism on B website. I was interested to watch it and found it a little difficult to understand. So I wondered if you were as confused as ME about V8 garbage collection mechanism, or if you had seen this knowledge but could not understand it. So, I thought about it for three days, thinking about how I could use the most popular words to talk about the most difficult points.

Common understanding

I’m sure most of you have been asked in interviews, “Tell me about V8 garbage collection.”

At this point, most of you will say something like, “There are two ways to collect garbage, one is by reference, the other is by notation.”

Reference method

It is to judge the number of references to an object. If the number of references is 0, it is reclaimed. If the number of references is greater than 0, it is not reclaimed. Look at the following code

let obj1 = { name: 'Lin SAN Xin'.age: 22 }
let obj2 = obj1
let obj3 = obj1

obj1 = null
obj2 = null
obj3 = null
Copy the code

There is a downside to the reference method. After the following code is executed, both Obj1 and Obj2 are supposed to be reclaimed, but because they reference each other, they each have a reference number of 1, so they are not reclaimed, causing a memory leak

function fn () {
  const obj1 = {}
  const obj2 = {}
  obj1.a = obj2
  obj2.a = obj1
}
fn()
Copy the code

notation

The notation is that reachable objects are marked, and unreachable objects are garbage collected.

So the question is, is unreachable? By what? (The reachable here, not the reachable duck)

Anyway, if you want to judge reachability, you have to say reachability. What is reachability? Start with the pointer to the original root object (window or global) and search down the child node. When the child node is found, it indicates that the reference object of the child node is reachable and marks it. Then the recursive search continues until all the child nodes have been traversed. So it’s not traversed to the node, it’s not marked, and it’s treated as if it’s not referenced anywhere, which proves that it’s an object that needs to be freed and can be collected by the garbage collector.

/ / can reach
var name = 'Lin SAN Xin'
var obj = {
  arr: [1.2.3]}console.log(window.name) / / Lin three hearts
console.log(window.obj) // { arr: [1, 2, 3] }
console.log(window.obj.arr) / / [1, 2, 3]
console.log(window.obj.arr[1]) / / 2

function fn () {
  var age = 22
}
/ / inaccessible
console.log(window.age) // undefined
Copy the code

The general understanding is not enough, because there is more to garbage collection (GC) than just these two algorithms. To learn more about V8 garbage collection, read on!

JavaScript memory Management

In fact, JavaScript memory process is very simple, divided into three steps:

  • 1. Assign toThe userMemory required
  • 2,The userTake that memory and use that memory
  • 3,The userThe memory is no longer needed. Free it and return it to the system

So who are these users? Here’s an example:

var num = ' '
var str = 'Lin SAN Xin'

var obj = { name: 'Lin SAN Xin' }
obj = { name: Fat Man Lin }
Copy the code

Num, STR, obj, obj, obj, obj, num, obj, obj

  • Basic data types: has a fixed size, with values stored inStack memory, which can be accessed directly by the value
  • Reference data type: size is not fixed (attributes can be added),Stack memoryThere’s a pointer in there, pointing toHeap memoryIs accessed by reference

  • Since the size of the underlying data type stored in the stack memory is fixed, the stack memory memory isThe operating system automatically allocates and releases reclaimed
  • Because the size of the heap memory is not fixed, the systemUnable to release the collection automatically, so you need toJS engine to manually free this memory

Why garbage collection

In Chrome, V8 has limited memory usage (about 1.4g /1464MB for 64-bit and about 0.7G/732MB for 32-bit), why limit it?

  • Superficial reason: V8 was originally designed for browsers, so you’re unlikely to encounter scenarios that use a lot of memory
  • Deep reason: LIMITATIONS of V8’s garbage collection mechanism (if cleaning up a lot of memory garbage is time consuming, which causes JavaScript threads to pause execution, performance and application plummets)

Said in front of the stack in the memory, the operating system will automatically for memory allocation and release of memory, and memory in the heap, by JS engines (such as Chrome’s V8) manually release, when we don’t have written in the correct code, will make the garbage collection mechanism of JS engine can’t the right to release the memory (leak), As a result, the browser consumes more and more memory, which degrades JavaScript, applications, and operating system performance.

V8’s garbage collection algorithm

1. Generational recycling

In JavaScript, there are two cases of object lifetime

  • Life cycle is short: after one garbage collection, it is released for recycling
  • It has a long life: after many collections, it still exists and doesn’t leave

So the problem comes, for a short life cycle, recycling is calculated, but for a long life cycle, many recycling can not be recycled, knowing that recycling can not be recycled, but also continue to do recycling useless work, is that not very consumption performance?

For this problem, V8 has done the optimization method of generational recycling, which is generally said: V8 divides the heap into two Spaces, one is called the New generation, the other is called the old generation, the new generation is the place to store the objects with short life cycle, the old generation is the place to store the objects with long life cycle

The Cenozoic is usually only 1-8m large, while the old generation is much larger. For each of these regions, V8 uses a different garbage collector and a different collection algorithm to implement garbage collection more efficiently

  • Auxiliary garbage collector + Scavenge algorithm: Mainly responsible for the new generation of garbage collection
  • Main garbage collector + Mark-sweep && Mark-Compact algorithm: Mainly responsible for garbage collection of the old generation

1.1 the new generation

In JavaScript, the memory allocated for any object declaration will be placed in the new generation first, and since most objects live in memory for a short period of time, a very efficient algorithm is required. In the new generation, the Scavenge algorithm is mainly used for garbage collection. Scavenge is a typical algorithm to sacrifice space for time. It is very applicable to scenarios that take up little space.

The Scavange algorithm divides the new generation of the heap into two parts, called from-space and to-space. It simply copies the live objects from the from-space to the to-space and places them in order. Then, the memory of the inactive objects in from-space is freed. When finished, the from space and to space are interchanged, so that the two areas can be reused in the new generation.

The specific steps are as follows:

  • 1. Mark live and inactive objects
  • 2, copy,from-spaceOf the live object toto-spaceAnd to sort
  • 3, removefrom-spaceThe inactive object in
  • 4, will befrom-spaceandto-spaceTo swap roles so that the nextScavenge algorithmThe garbage collection

So how does the garbage collector know which objects are live and which are not?

Which brings us to one thing: accessibility. What is accessibility? Start with the pointer to the original root object (window or global) and search down the child node. When the child node is found, it indicates that the reference object of the child node is reachable and marks it. Then the recursive search continues until all the child nodes have been traversed. So it’s not traversed to the node, it’s not marked, and it’s treated as if it’s not referenced anywhere, which proves that it’s an object that needs to be freed and can be collected by the garbage collector.

When does the object of the new generation become the object of the old?

In the Cenozoic, there are further subdivisions. It is divided into two regions: the nursery generation and the intermediate generation. When an object is allocated memory for the first time, it is allocated to the nursery generation in the new generation. If the object still exists in the new generation after the next garbage collection, then we move the object to the intermediate generation. After the next garbage collection, if the object is still in the new generation, the secondary garbage collector will move the object into the old generation, a process called promotion

1.2 the old generation

New generation space objects, battle-hardened, left over from the old object, successful promotion in the old generation space, because these objects are after many recycling process but not be recycled, are a group of tenacious vitality, high survival rate of object, so the old generation, recovery algorithm unfavorable use Scavenge algorithm, why, for the following reasons:

  • Scavenge algorithmIs it the replication algorithm, which replicates these highly survivable objects over and over again, which makes no sense, which is extremely inefficient
  • Scavenge algorithmIs space for time algorithm, old generation is a large memory space, if the useScavenge algorithmIt is a waste of space resources.

So the old generation used mark-sweep and Mark-Compact

Mark-Sweep

Mark-sweep is divided into two stages: the marking and the cleaning stage. The former Scavenge algorithm also has marking and cleaning. However, the difference between Mark-Sweep and Scavenge is that the latter needs to be copied and then cleaned. I just do the cleanup.

  • Marking stage: the first scanning of old generation objects is carried out, and the active objects are marked
  • Cleanup phase: The second scan is performed on the old generation objects to clear the unmarked objects, that is, the inactive objects

As I think you can see from the figure above, there is a problem: clearing inactive objects leaves a lot of empty space.

Mark-compact

What’s the harm in mark-Sweep’s garbage collection algorithm leaving a lot of empty spots? If a large object comes in at this time, it needs to allocate a large memory for the object. It first searches for a space in the scattered space, and finds that there is no space suitable for its size, so it has to put it in the last place. This process of searching for space is performance expensive, which is also a disadvantage of Mark-Sweep algorithm

This is when mark-Compact, an enhanced version of Mark-Sweep, was introduced. In addition to Mark-Sweep, the cleaning phase was added. Every time the inactive objects were cleaned, the remaining active objects were cleaned to one side of the memory

2. Stop-the-world

After talking about V8 generational recycling, let’s talk about a problem. The JS engine is used to run JS code, and the JS engine is used for garbage collection. What if the two are in conflict? The answer is that garbage collection takes precedence over code execution, which stops code execution and executes JS code after garbage collection is complete. This process is called total pausing

Because the Cenozoic space is small, and fewer surviving objects, and cooperate with the Scavenge algorithm, the pause time is short. But the old generation is not the same, in some cases when there are more active objects, the pause time will be longer, making the page appear lag phenomenon.

3. The Orinoco optimization

Orinoco is the project name of V8 garbage collector. In order to improve the user experience and solve the complete pause problem, it proposes incremental marking, lazy cleaning, concurrency, and parallel optimization methods.

3.1 Incremental marking

We’ve been emphasizing tag first, clear later, and the incremental markup is optimized during the tag phase. Let me give you a vivid example: there are a lot of garbage on the road, so that passers-by can not walk, need cleaners to clean up before walking. A few days ago, there was less garbage on the road, so the passers-by waited for the cleaners to clean up all the garbage before they passed, but a few days later, the garbage became more and more, the cleaners took too long to clean up, the passers-by could not wait, and said to the cleaners: “you clean a section, I will walk a section, so high efficiency.”

We put the above example, the garbage cleaning process – marking process, passers-by – JS code, one to one to understand. When there is a small amount of garbage, the incremental markup is not optimized, but when there is a certain amount of garbage, the incremental markup is turned on: a point is marked, and the JS code runs for a period, so as to improve efficiency

3.2 Lazy Cleaning

As stated above, incremental marking is only for the marking phase, whereas lazy cleaning is for the cleaning phase. After the increment mark, when it comes time to clean up the inactive object, the garbage collector finds that there is still enough space left for the JS code to run even if it is not cleaned up, so it delays the cleanup, either by letting the JS code execute first, or by only cleaning up part of the garbage, but not all of it. This optimization is called lazy cleaning

The emergence of collation markers and lazy cleanup greatly improves total pausing. But the problem is here: incremental tag is the tag, JS for a period of operation, and that if your front foot just mark an object as active objects, hind feet JS code set this object to the inactive objects, or, in turn, front foot does not mark an object as active objects, hind feet JS code set this object to the active object. In summary: interspersed between markup and code execution, object references may change and markup errors may occur. This requires the use of write barrier techniques to document changes in these reference relationships

3.3 Concurrent (Concurrent)

Concurrent GC allows garbage collection to take place at the same time without suspending the main thread. Both can take place at the same time, with only occasional pauses to allow the garbage collector to do special operations. However, this approach also faces the problem of incremental collection. During garbage collection, as the JavaScript code is executing, references to objects in the heap may change at any time, so write barriers are also required.

3.4 the parallel

Parallel GC allows the main thread and the helper thread to perform the same GC work at the same time, so that the helper thread shares the GC work of the main thread, so that the garbage collection time is equal to the total time divided by the number of participating threads (plus some synchronization overhead).

V8’s current garbage collection mechanism

In 2011, V8 applied the incremental markup mechanism. Until 2018, Chrome64 and Node.js V10 started Concurrent and added Parallel technology on top of Concurrent, which greatly shortened garbage collection time.

Secondary garbage collector

V8 uses the parallel mechanism in the new generation of garbage collection. In the sorting phase, that is, when the active object is copied from the from to space to, multiple helper threads are enabled to organize in parallel. Due to multiple threads to a new generation of heap memory resources of the competition, is likely to have an activity object by multiple threads copy operation problem, in order to solve this problem, V8 in the first thread to copy and copy after the completion of the event object, must be to maintain replicated object pointer forwarding address after this activity, So that other helper threads can find the live object and determine if the live object has been copied.

Main garbage collector

V8 in old generation garbage collection, the Concurrent flag task is enabled if the memory size in the heap exceeds a certain threshold. Each worker thread keeps track of the pointer to the object and the reference to that object for each token. Concurrent tokens are also carried out in the helper process in the background while JavaScript code is executing. When an object pointer in the heap is modified by JavaScript code, Write Barriers technology tracks worker threads as they make concurrent markers.

When concurrent mark complete or dynamically allocated memory to limit of time, the main thread will perform the final mark step quickly, this time the main thread will hang, the main thread will once again to scan the root set to ensure that all the objects are done tag, as auxiliary thread has been tag object, the main thread of the scanning is to check the operation, When the validation is complete, some helper threads will clean up the memory and some helper processes will clean up the memory. Since both are concurrent, it will not affect the execution of the main thread JavaScript code.

The resources

  • Learn more about Chrome V8 garbage collection
  • Do you really know anything about garbage collection

conclusion

Read this and you won’t have to say, “Quote and mark” the next time an interviewer asks you. But can be more comprehensive, more detailed to conquer the interviewer.

There will be a future article on memory leaks in the project, stay tuned!!

Study group please click here, study together, fish together!!