V8 underwent a major architectural overhaul, including a redesign of the entire compiler pipeline and much of the garbage collector: TurboFan replaced Crankshaft, and Orinoco brought parallel garbage collection. This article briefly describes the upgrade.

Many in the Node.js community are excited about the recent V8 update, which reworked the entire compiler architecture and much of the garbage collector. TurboFan replaced Crankshaft, Orinoco made garbage collection run in parallel, and there were other improvements besides.

As the V8 development team explained, the new and improved V8 engine that ships with Node.js version 8 means we can write idiomatic, declarative JavaScript without worrying about the performance overhead imposed by compiler shortcomings.

As part of my work at NodeSource, I researched these latest changes: I reviewed blog posts published by the V8 team, read the V8 source code, and built tools to verify specific performance metrics.

For convenience, I have collected this information in the v8-perf repository on GitHub. It is also the basis for my NodeSummit talk this week and this series of blog posts.

Because the upgrade involves so many changes and so much complexity, I'll provide a brief introduction in this article and explore the topic in more detail in future posts in this series.

If you want to dig in right away, head directly to v8-perf (https://github.com/thlorenz/v8-perf).

It is well known that earlier versions of V8 suffered from so-called optimization killers, some of which seemed impossible to fix in the engine. The V8 team also had a hard time implementing new JavaScript language features with good performance characteristics.
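
To make this concrete, here is a sketch of one classic Crankshaft-era optimization killer (my illustration, not from the original post; the exact set of killers varied by V8 version): a function containing a try/catch block could not be optimized by Crankshaft at all, so the common workaround was to extract the hot loop into a separate, killer-free function.

// Under Crankshaft, this whole function was disqualified from
// optimization simply because it contains try/catch.
function sumRisky(arr) {
  let total = 0
  try {
    for (let i = 0; i < arr.length; i++) total += arr[i]
  } catch (e) {
    total = 0
  }
  return total
}

// Workaround: keep the hot loop in its own optimizable function
// and wrap the try/catch around the call instead.
function sumCore(arr) {
  let total = 0
  for (let i = 0; i < arr.length; i++) total += arr[i]
  return total
}

function sumSafe(arr) {
  try {
    return sumCore(arr)
  } catch (e) {
    return 0
  }
}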

The main reason is that the V8 architecture had become very difficult to change and extend. The Crankshaft optimizing compiler had not been designed with a rapidly evolving language in mind, and the lack of separation between the layers of the compiler pipeline was also a problem. In some extreme cases, developers had to write assembly code by hand for all four supported architectures.

The V8 team realized this was not a sustainable system, especially with JavaScript itself evolving quickly and many new language features needing to be added. So they designed a new compiler architecture from scratch, divided into three distinct layers: the front end, the optimization layer, and the back end.

The front end is mainly responsible for generating the bytecode run by the Ignition interpreter, while the optimization layer improves code performance via the TurboFan optimizing compiler. The back end performs lower-level tasks such as machine-level optimization, scheduling, and generating machine code for the supported architectures.

The separation of the back end alone resulted in 29% less architecture-specific code than before, even though the new architecture now supports nine instruction-set architectures.

The main goals of this new V8 architecture include:

  • Smaller performance cliffs

  • Faster startup

  • Better baseline performance

  • Lower memory usage

  • Support for new language features

The first three goals are related to the implementation of the Ignition interpreter, and the fourth goal (lower memory usage) was partially achieved through improvements in that area as well.

First, I’ll focus on architecture and explain it in the context of these goals.

In the past, the V8 team focused on the performance of optimized code and somewhat neglected the performance of interpreted bytecode. This led to dramatic performance cliffs, which made the runtime characteristics of an application very unpredictable overall. An application could be running along happily until something in the code tripped up Crankshaft and caused it to deoptimize, at which point parts of the application could suddenly run up to 100 times slower. To avoid falling off this cliff, developers learned to please the optimizing compiler by writing "Crankshaft script".

However, it turns out that for most web pages the optimizing compiler is not even as important as the interpreter: the code needs to run quickly, and there is no time to wait for optimization to kick in. Worse, since speculative optimization is not cheap, the optimizing compiler even hurt performance in some cases.

The solution was to improve the baseline performance of the interpreter's bytecode. This is achieved by passing the bytecode through inline-optimization stages as it is generated, resulting in highly optimized and small interpreter code that can execute the instructions and interact with the rest of the V8 VM with low overhead.
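
If you'd like to see this Ignition bytecode for yourself, V8 exposes a flag for printing it (a quick sketch; app.js is a placeholder name, and the output format varies between V8 and Node.js versions):

node --print-bytecode ./app.js

Newer V8 versions also accept --print-bytecode-filter=<function name> to limit the output to a single function.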

Because the bytecode is small, memory usage is reduced, and since it runs reasonably fast, further optimizations can be deferred. Additionally, more information can be collected via inline caches before an optimization is attempted, which reduces the cost of deoptimizations and reoptimizations caused by incorrect assumptions about how the code will execute.

Running bytecode instead of TurboFan-optimized code no longer has the harmful effects it had in the past, because the bytecode's performance is much closer to that of optimized code; this means any performance cliffs are much smaller.

With the new V8, most of the time you just need to focus on writing declarative JavaScript and using good data structures and algorithms. However, in the hot code paths of your application, you may want to ensure that it runs at peak performance.

The TurboFan optimizing compiler uses advanced techniques to make hot code run as fast as possible. These techniques include a sea-of-nodes intermediate representation, innovative scheduling, and more; I'll cover them in a future blog post.
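
If you're curious what TurboFan actually builds, V8 can dump its intermediate representation (a sketch under assumptions: app.js is a placeholder, and the flag's behavior and output files differ across V8 versions):

node --trace-turbo ./app.js

This writes turbo-*.json files that can be loaded into Turbolizer, the graph viewer included in the V8 source tree.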

TurboFan relies on input type information gathered via inline caches while functions run through the Ignition interpreter. Using that information, it generates code that is optimal while still handling the various types it has encountered.

The fewer variations of function input types the compiler has to consider, the smaller and faster the generated code will be. Therefore, you can help TurboFan speed up your code by keeping your functions monomorphic, or at least only mildly polymorphic (a short sketch follows the list below):

  • Monomorphic: one input type

  • Polymorphic: two to four input types

  • Megamorphic: five or more input types
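
Here is a minimal sketch of how this plays out (the function and object shapes are my own illustration, not from the original post): the same function stays monomorphic as long as it always sees one object shape, and drifts toward megamorphic as more distinct shapes flow through it.

function getX(point) {
  return point.x
}

// Monomorphic: every call sees the same shape { x, y }
getX({ x: 1, y: 2 })
getX({ x: 3, y: 4 })

// Polymorphic: a second shape { x, y, z } now flows through the same call site
getX({ x: 1, y: 2, z: 3 })

// Megamorphic: five or more distinct shapes; the inline cache gives up
getX({ x: 1 })
getX({ x: 1, a: 2 })
getX({ x: 1, b: 2 })
getX({ x: 1, c: 2 })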

Rather than blindly chasing peak performance, I recommend first getting insight into how the optimizing compiler handles your code, and then examining the cases that result in slower code.

To make this easier, I created Deoptigate, which provides insight into the optimizations, deoptimizations, and monomorphism/polymorphism/megamorphism of your functions.

Let's start with a simple example script, which I'll analyze using Deoptigate.

I defined two vector functions: add and subtract.

function add(v1, v2) {
  return {
    x: v1.x + v2.x
  , y: v1.y + v2.y
  , z: v1.z + v2.z
  }
}

function subtract(v1, v2) {
  return {
    x: v1.x - v2.x
  , y: v1.y - v2.y
  , z: v1.z - v2.z
  }
}

Next, I execute these functions in a hot loop, passing objects of the same type (with the same properties assigned in the same order).

const ITER = 1E3
let xsum = 0
for (let i = 0; i < ITER; i++) {
  for (let j = 0; j < ITER; j++) {
    xsum += add({ x: i, y: i, z: i }, { x: 1, y: 1, z: 1 }).x
    xsum += subtract({ x: i, y: i, z: i }, { x: 1, y: 1, z: 1 }).x
  }
}

Here the add and subtract functions run hot, so they should be optimized accordingly.

Now I execute them again, but this time I pass add objects whose type differs from before, because their properties are assigned in a different order ({ y: i, x: i, z: i }).

The subtract function is passed the same type of objects as before.

for (let i = 0; i < ITER; i++) {
  for (let j = 0; j < ITER; j++) {
    xsum += add({ y: i, x: i, z: i }, { x: 1, y: 1, z: 1 }).x
    xsum += subtract({ x: i, y: i, z: i }, { x: 1, y: 1, z: 1 }).x
  }
}

Run this code and check it using Deoptigate.

node --trace-ic ./vector.js
deoptigate

When we execute our script with the --trace-ic flag, V8 writes the information we need to an isolate-v8.log file. When Deoptigate is then run from the same folder, it processes that file and opens an interactive visualization of the data it contains.

It is a web application, so you can open it in your browser and explore further.

Deoptigate gives us a summary of all files, in our case just vector.js. For each file, it shows the relevant optimizations, deoptimizations, and inline cache information. Here green means no problem, blue indicates minor issues, and red indicates potentially important issues that should be investigated. We can expand the details for a file simply by clicking its name.

The source code of the file is shown on the left, with annotations pointing out potential performance problems. On the right, we can learn more details about each problem. The two views work in tandem: clicking an annotation on the left highlights more details about it on the right, and vice versa.

At a quick glance, we can see that subtract shows no potential problems, but add does. Clicking the red triangle in the code highlights the related deoptimization information on the right. Note the reason: wrong map.

Clicking any of the blue telephone icons reveals more information. Namely, we find that the function became polymorphic. As we can see, this was also due to a map mismatch.

Checking the minor warnings at the top of the page reveals more optimization information, this time including timestamps, for the add function.

We see that add was first optimized after 32ms. At around 40ms, it was passed an input type that the optimized code hadn't accounted for (hence the wrong map), at which point it was deoptimized back to Ignition bytecode while more inline cache information was collected. Shortly afterwards, at 41ms, it was optimized again.

In summary, the add function ultimately executed via optimized code, but that code needed to handle two types of inputs (different maps) and was therefore larger and not as optimal as before.

In contrast, the subtract function was optimized only once, which we can verify by clicking the green triangle next to its function signature.

Some people may wonder why V8 considers objects created via {x, y, z} assignments different from objects created via {y, x, z} assignments, given that they have exactly the same properties, just assigned in a different order.

This is due to the way maps (V8's hidden classes) are created when JavaScript objects are initialized; that will be the subject of another article (I will also explain it in more detail at the NodeSummit conference). So be sure to keep following this blog series.
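
You can observe this yourself with one of V8's internal helpers (a minimal sketch, assuming Node.js is run with --allow-natives-syntax; %HaveSameMap is an unsupported V8 internal that may change between versions, and maps.js is just a placeholder file name):

// run with: node --allow-natives-syntax ./maps.js
const a = { x: 1, y: 2, z: 3 }
const b = { y: 2, x: 1, z: 3 }
const c = { x: 4, y: 5, z: 6 }

console.log(%HaveSameMap(a, b)) // false: same properties, different assignment order
console.log(%HaveSameMap(a, c)) // true: identical shape, shared map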

https://nodesource.com/blog/why-the-new-v8-is-so-damn-fast
