Browser Principles series -V8 engine object storage optimization

Reprint please retain this part of the content, indicate the source. In addition, the front end team of Toutiao is looking forward to your joining us

Before starting this article, let’s consider a question: what is the complexity of accessing an object’s properties in JS? Is O (1)? If it’s O(1), why is it O(1)? Let’s continue with these questions.

How does V8 store JS objects

We know that JS is a dynamic language, which means that in the process of code execution, the type of variables is uncertain, can be changed, very flexible, which is also one of the characteristics of JS language. A problem with this is that when accessing a property of a JS object, we cannot directly calculate the storage location to access from the offset, which is easy to do in static languages. So how does JS store objects?

JS objects are stored in the heap, which is more like a dictionary, with strings as key names. Any object can be used as key values, which can be read and written by key names. However, V8’s implementation of object storage does not fully use dictionary storage, mainly for performance reasons. Because dictionaries are non-linear data structures, query efficiency is lower than linear data structures. V8 uses a complex storage strategy to improve storage and search efficiency. Linear structures are contiguous chunks of memory, such as linear tables and arrays, while nonlinear structures generally occupy discontiguous memory, such as linked lists and trees.

General properties and Sort properties

First let’s look at what general and sort properties are. Here’s a code:

function Foo() {
    this[100] = 'test-100'
    this[1] = 'test-1'
    this["B"] = 'bar-B'
    this[50] = 'test-50'
    this[9] =  'test-9'
    this[8] = 'test-8'
    this[3] = 'test-3'
    this[5] = 'test-5'
    this["A"] = 'bar-A'
    this["C"] = 'bar-C'
}
var bar = new Foo()

for(key in bar){
    console.log(`index:${key}  value:${bar[key]}`)
}
Copy the code

Let’s observe the result of this code:

index:1  value:test-1
index:3  value:test-3
index:5  value:test-5
index:8  value:test-8
index:9  value:test-9
index:50  value:test-50
index:100  value:test-100
index:B  value:bar-B
index:A  value:bar-A
index:C  value:bar-C
Copy the code

Although the properties are set out of order, for example, 100 is first set and then 1 is set, but the output content is very regular, generally reflected in the following two points:

The numeric properties set are printed first, in order of numeric size;
The string attributes will be printed in the same order as before. For example, if we set them in the order of B, A, and C, we will print them in the same order.

This result occurs because the ECMAScript specification defines that numeric attributes should be sorted in ascending order by index value size, and string attributes in ascending order by order of creation. Here we call the numeric properties in the object sort properties, which are called Elements in V8, and the string properties are called general properties, which are called properties in V8.

In V8, in order to effectively improve the performance of storing and accessing these two attributes, two linear data structures are used to store sorted attributes and general attributes respectively, as shown in the following figure:

As shown in the figure: The bar object contains two hidden properties: The elements property refers to the Elements object, where the sort properties are stored in order, and the properties property refers to the properties object, In the Properties object, the general properties are saved in the order they were created.

After splitting the two linear data structures, if an index is performed, V8 completes an index by reading all elements sequentially from the Elements property and then from the properties property.

Fast properties and slow properties

Storing different attributes in elements and properties simplifies the program, but it adds an extra step to finding elements, such as executing the bar.B statement to find the value of B’s property. In V8, the object properties pointed to by the properties property will be found first, and then the property B will be found in the properties object. This method adds an extra step in the search process, thus affecting the efficiency of the element search.

For this reason, V8 adopts a tradeoff strategy to speed up the efficiency of finding properties by storing some general properties directly into the object itself, which we call in-object properties, or fast properties.

However, the number of properties in the object is fixed, the default is 10, although regular properties store an additional layer of indirection, but can be expanded freely. Generally, attributes stored in linear data structures are called “fast attributes”, because attributes can be accessed only through indexes in linear data structures. Although the speed of accessing linear structures is fast, if a large number of attributes are added or deleted from linear structures, the execution efficiency will be very low. This is mainly because of the time and memory overhead. Therefore, if an object has too many attributes, V8 adopts another storage strategy, the “slow attribute” strategy, but the slow attribute objects have independent nonlinear data structures (dictionaries) inside them as property storage containers. All attribute meta-information is no longer stored linearly, but is stored directly in the attribute dictionary.

Let’s put it into practice and look at the following code:

Function Foo(property_num,element_num) {// Add indexable properties for (let I = 0; i < element_num; I++) {this [I] = element ` ${I} - ${I} `} / / add attributes for the let I = 0; i < property_num; I ++) {let PPT = '${I}-property${I}' this[PPT] = PPT} var bar = new Foo(10,10)Copy the code

At this point, the structure in memory is shown as follows:

At this point, the BAR object has 10 general properties, all upgraded to fast and no properties.

We change the argument to call Foo to:

Var bar = new Foo(100,10)Copy the code

At this point, the bar property has more than 10 general properties. Since the fast property has a limit of 10, you can see the properties property.

The data stored in the properties property is not stored linearly, but in a non-linear dictionary format, so the memory layout of the property looks like this:

10 Attributes are stored directly in objects of bar;
Ninety general properties are stored in the properties property as a data structure in a nonlinear dictionary;
The 10 numeric attributes are stored in the Elements property.

Hidden classes

As mentioned above, JS language is a dynamic language, unlike static language, which can query object attribute values by offsets, whether V8 can introduce this high query efficiency method?

The answer is that V8 does. One idea is to make objects in JavaScript static. V8 runs JavaScript on the assumption that objects in JavaScript are static. Specifically, V8 makes two assumptions about each object:

Once the object is created, no new properties are added;
Properties are not deleted after the object is created.

With these two assumptions in place, V8 can deeply optimize objects in JavaScript. How? Specifically, V8 creates a hidden class for each object. The hidden class records some basic layout information about the object, including: all the attributes contained in the object; The offset of each property relative to the object. Going back to the previous screenshot, you can see that each object has a map property under it, which is the hidden class.

With hidden classes, when V8 accesses an attribute in an object, it first looks for the offset of the attribute relative to its object in the hidden class. With the offset and attribute type, V8 can fetch the value of the attribute directly from memory without going through a series of look-up procedures. This greatly improves the efficiency of FINDING objects in V8.

We can combine this code to see how the hidden class works:

let point = {x:100,y:200}
Copy the code

When V8 executes this code, it first creates a hidden class for the Point object. In V8, the hidden class is also called a Map, and each object has a map attribute whose value points to the hidden class in memory. The hidden class describes the layout of the attributes of an object. It mainly contains the names of the attributes and the offsets corresponding to each attribute. For example, the hidden class of the Point object contains the x and Y attributes, with the offsets of x being 4 and Y 8.

In this diagram, the memory layout of point objects is shown on the left and the map of point objects is shown on the right. Once you have the map, when you use point. X again to access the x property, V8 will query the map for the offset of the X property relative to the Point object, and then add the offset to the starting position of the point object to obtain the location of the x property value in memory. So we have this position and we get the value of x, so we don’t have to do a very complicated search.

This is an effort to make a dynamic language static. V8 achieves the efficiency of a static language by introducing hidden classes that mimic the mechanics of a static language like C++. In V8, each object has a Map attribute whose value points to the object’s hidden class. However, if two objects have the same shape, V8 will reuse the same hidden class for them. This has two advantages:

Reducing the number of hidden classes created also indirectly speeds up code execution;
Reduced storage space for hidden classes.

So, when two objects have the same shape, it must satisfy the following two points: 1. 2. Equal number of attributes.

Rebuild the hidden class

About hidden classes before we have two assumptions, after the object creation does not add or delete attributes, but the JS is a dynamic language, in the process of execution, the shape of the object can be changed, if changed the shape of an object, hidden classes with will also change, this means that the V8 to rebuild new hidden object class of the new change, This is a significant cost to V8’s performance. In layman’s terms, adding a new attribute to an object, deleting a new attribute, or changing the data type of an attribute will change the shape of the object, triggering V8 to rebuild a new hidden class for the shape-changed object.

This explains why it is not recommended to use the delete keyword to delete an object’s attributes. Answer a question I’ve had for years

Best practices for object manipulation

We know that frequently changing an object’s property or the data type of its value can cause performance problems with channel reconstruction of hidden classes, so we can infer best practices for manipulating objects.

When initializing objects with literals, try to keep the order of attributes consistent.

Let’s take a look at 🌰 :

// bad
let point = {x:100,y:200};
let point2 = {y:100,x:200};

// good
let point = {x:100,y:200};
let point2 = {x:100,y:200};
Copy the code

Why not recommend the first course? Because the two objects have different shapes, different hidden classes are generated.

Whenever possible, use literals to initialize full object properties once.

V8 needs to reset the hidden class for the object every time it adds an attribute.

Avoid the delete method

Similarly, deleting an object’s attributes causes V8 to rebuild the hidden class.

Inline Cache

First let’s look at a code slice:

function loadX(o) { 
    return o.x
}
var o = { x: 1,y:3}
var o1 = { x: 3 ,y:6}
for (var i = 0; i < 90000; i++) {
    loadX(o)
    loadX(o1)
}
Copy the code

We define a loadX function that takes o and simply returns O.x.

The usual V8 process for getting O.x is this: look for the hidden class of the object **o**, look for the attribute offset of **x** * through the hidden class, and then get the attribute value based on the offset. LoadX is executed repeatedly in this code, so the process for getting O.x needs to be executed repeatedly. Is there any way that we can simplify the process again, preferably find the value of the x property in one step? The answer is yes.

V8 compresses this lookup process with an inline cache strategy to improve object lookup efficiency. So what is inline caching? How exactly does it work?

Inline Cache (IC). When V8 executes a function, it observes key intermediate data at some callsites in the function and caches the data. The next time V8 executes the function, it can use the intermediate data directly, saving the process of retrieving the data again. Therefore, V8 uses IC. Can effectively improve the execution efficiency of some duplicate code.

Let’s take a closer look at the IC workflow using sample code:

IC maintains a FeedBack Vector for each function, which records key intermediate data during the function’s execution. The feedback vector is actually a table structure consisting of a number of items, each called a Slot, into which V8 writes intermediate data executing the loadX function in turn. In the piece of code, return O.x is a call point because it uses objects and attributes. V8 allocates a slot for this call point in the feedback vector of the loadX function. Each slot contains the slot index, slot type, slot state, map address, and attribute offset. When V8 calls loadX again to return o.x, It looks for the offset of the X attribute in the corresponding slot, and V8 can then fetch the VALUE of the O.X attribute directly from memory, greatly improving execution efficiency.

Polymorphism and superstate

Ok, by caching basic information during execution, you can improve the efficiency of the next execution of the function, but this assumes that the shape of the object is fixed for multiple executions. If the shape of the object is not fixed, what does V8 do?

Let’s adjust the loadX function code above. The adjusted code looks like this:

function loadX(o) { 
    return o.x
}
var o = { x: 1,y:3}
var o1 = { x: 3, y:6,z:4}
for (var i = 0; i < 90000; i++) {
    loadX(o)
    loadX(o1)
}
Copy the code

As you can see, objects O and O1 have different shapes, which means that V8 creates different hidden classes for them.

When loadX is executed for the first time, V8 records the hidden class of O in the feedback vector and the offset of attribute X. When loadX is called again, V8 retrieves the hidden class recorded in the feedback vector, compares it with the new o1 hidden class, and finds that it is not a hidden class, so V8 cannot use the offset information recorded in the feedback vector.

In this case, V8 chooses to record the new hidden class in the feedback vector as well, along with the offset of the attribute value. In this case, the first slot in the feedback vector contains two hidden classes and offsets. When V8 executes the O.x statement in the loadX function again, it also looks for the feedback table and finds two hidden classes recorded in the first slot. An additional thing V8 needs to do is to compare the new hidden class to the two hidden classes in the first slot, and if the new hidden class is the same as a hidden class in the first slot, then use the offset of the hit hidden class. What if there are no identical ones? Also add new information to the first slot of the feedback vector.

One slot of a feedback vector can contain information about multiple hidden classes, then:

If a slot contains only one hidden class, we call this state monomorphic.
If a slot contains 2 ~ 4 hidden classes, we call the status as polymorphic;
If there are more than four hidden classes in a slot, this state is called a Magamorphic.

If there is polymorphic or superstate in the feedback vector of loadX, the execution efficiency is definitely lower than that of singleton. For example, when executing to O.x, V8 will query the first slot of the feedback vector and find that there are multiple maps in it. Then V8 will need to remove the hidden class of O. The more hidden classes recorded in the slot, the more times the comparison is performed, which means the lower the execution efficiency.

So we came to the conclusion that we should try to keep singlets, because singlets have better performance than polymorphisms and superstates.

So far we have seen how V8 optimizes the attributes of the objects it accesses. These three optimizations are linked together, and we have to marvel at the wisdom of the designers. Returning to the question at the beginning of this article, what is the complexity of accessing object properties? I think the answer is approximately O(1), but it’s not the same O(1) complexity principle as accessing object properties in static languages.

The last word

Finally, I’d like to advertise that the principles of browsers series has already had four articles, and will soon conclude with V8 Pipelining. I can not avoid mistakes and omissions, I hope you don’t hesitate to give advice.

Review past

Browser Principles series – details of the browser rendering process

Browser principle series -JS execution context details

Browser principles series -JS memory mechanism and garbage collection