Is postMessage Slow?

No, actually, it depends.

What does “slow” mean? I’ve said it before, and I’ll say it again: if you haven’t measured it, you can’t say it’s slow. And even if you have measured it, the numbers are meaningless without context.

That being said, the fact that people won’t even consider Web Workers because they are worried about the performance of postMessage() means this is worth investigating. My last blog post on this topic received feedback along exactly those lines. Let’s put actual numbers to the performance of postMessage(), see at what point it might blow your budget, and look at what you can do if vanilla postMessage() turns out to be too slow for you.

Are you ready? Let’s get started.

How does postMessage work?

Before we benchmark anything, we need to understand what postMessage() is and which part of it we want to measure. Otherwise we’d end up collecting meaningless data and drawing meaningless conclusions.

postMessage() is part of the HTML spec (not ECMA-262!). As I mentioned in my last blog post on deep copying, postMessage() relies on structured cloning to copy messages from one JavaScript realm to another. If we look closely at the postMessage() specification, we can see that structured cloning is a two-step process:

Structured cloning algorithm

  1. Run StructuredSerialize() on the message in the sending realm.

  2. In the receiving realm, run StructuredDeserialize() on the serialized message, then create and dispatch a MessageEvent carrying the deserialized message on the receiving port.

This is a simplified version of the algorithm that focuses only on the parts that matter for this blog post. It’s technically inaccurate, but it captures the gist. For example, StructuredSerialize() and StructuredDeserialize() aren’t real functions, and so far neither is exposed to JavaScript. So what do these two steps do? For now, you can think of StructuredSerialize() and StructuredDeserialize() as smarter versions of JSON.stringify() and JSON.parse(): smart enough to handle cyclic data structures and built-in types like Map, Set, and ArrayBuffer. But does that smartness come at a cost? We’ll come back to that later.
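
As a quick illustration of that smartness, here’s a small sketch (assuming a worker variable pointing at a Web Worker) of structured cloning handling data that would trip up JSON:

// A cyclic object containing a Map: JSON.stringify() would throw on the
// cycle and turn the Map into {}, but structured cloning handles both.
const message = { label: "example", lookup: new Map([["key", "value"]]) };
message.self = message; // introduce a cycle

worker.postMessage(message);

// In the receiving realm:
//   event.data.lookup.get("key") === "value"
//   event.data.self === event.data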

What the two steps above don’t spell out is that serialization runs in the realm of the message’s sender, while deserialization runs in the realm of the receiver. One more detail: both Chrome and Safari defer running StructuredDeserialize() until you actually access the data property of the message event, while Firefox deserializes before dispatching the event.

Note: Both behaviors are spec-compliant and perfectly valid. I opened a bug with Mozilla, asking if they’d be willing to change their implementation, since deferring puts the developer in control of when to take the “performance hit” of deserializing a big payload.

With all of this in mind, we have to choose what to benchmark. We could measure end to end: how long it takes from sending a message in a worker until it arrives on the main thread. That number, however, captures the sum of serialization and deserialization, which run in two different realms. Remember: the whole motivation for this work is keeping the main thread free and responsive. Alternatively, we could limit the benchmark to Chrome and Safari and measure how long accessing the data property takes, to measure StructuredDeserialize() in isolation, but that would exclude Firefox. And I haven’t found a way to measure StructuredSerialize() on its own, short of running a trace.

Neither of these choices is ideal, but in the spirit of building resilient web apps, I decided to run the end-to-end benchmark to establish an upper bound for the cost of postMessage().
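
Here’s a minimal sketch of what such an end-to-end measurement can look like. This is not the benchmark’s actual harness: buildPayload() is a placeholder, and performance.timeOrigin is used to make timestamps comparable across realms.

// worker.js: stamp the message with an absolute send time
postMessage({
  sentAt: performance.timeOrigin + performance.now(),
  payload: buildPayload(), // placeholder for the object under test
});

// main.js: touching .data forces StructuredDeserialize() in Chrome/Safari
worker.addEventListener("message", event => {
  const { sentAt } = event.data;
  const receivedAt = performance.timeOrigin + performance.now();
  console.log(`End-to-end transfer took ${receivedAt - sentAt}ms`);
});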

Now that we understand postMessage() conceptually and have decided what to measure, I’ll be using microbenchmarks, so be aware of the gap between these numbers and reality.

Benchmark 1: How long does sending a message take?

Depth and breadth range from 1 to 6, and 1000 objects will be generated for each permutation.

This benchmark generates objects with a specific “depth” and “breadth”, each ranging from 1 to 6. For each combination of depth and breadth, 1000 unique objects are sent from a worker to the main thread via postMessage(). The property names of these objects are random 16-digit hexadecimal numbers as strings, and the property values are either a random boolean, a random float, or, again, a random 16-digit hexadecimal number as a string. The benchmark measures the transfer time and calculates the 95th percentile.
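
Here’s a sketch of what such a generator might look like (my reconstruction, not the benchmark’s actual code):

function randomHex(digits = 16) {
  let s = "";
  for (let i = 0; i < digits; i++) {
    s += Math.floor(Math.random() * 16).toString(16);
  }
  return s;
}

function randomLeaf() {
  switch (Math.floor(Math.random() * 3)) {
    case 0:
      return Math.random() < 0.5; // random boolean
    case 1:
      return Math.random(); // random float
    default:
      return randomHex(); // random hex string
  }
}

function generateObject(depth, breadth) {
  const obj = {};
  for (let i = 0; i < breadth; i++) {
    obj[randomHex()] =
      depth <= 1 ? randomLeaf() : generateObject(depth - 1, breadth);
  }
  return obj;
}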

Results

These are the results of running the benchmark in Chrome, Firefox, and Safari, on a 2018 MacBook Pro, a Pixel 3XL, and a Nokia 2.

Note: The benchmark data, along with the code that generates and visualizes it, is available so you can explore the numbers yourself. It’s also the first Python I’ve ever written.

The numbers for the Pixel 3, and especially for Safari, might look a bit suspicious. When Spectre and Meltdown were discovered, all browsers disabled SharedArrayBuffer and reduced the precision of timers, which affects performance.now(), the function I use for measuring. Only Chrome was able to revert some of these changes, since it shipped Site Isolation on desktop. More concretely, browsers clamp the precision of performance.now() to the following values:

  1. Chrome (desktop): 5µs

  2. Chrome (Android): 100µs

  3. Firefox (desktop): 1ms (the clamping can be disabled with a flag)

  4. Safari (desktop): 1ms

The data shows that the complexity of an object strongly correlates with how long it takes to serialize and deserialize it. This shouldn’t be surprising: both serialization and deserialization have to traverse the entire object one way or another. The data further hints that the size of an object’s JSON representation is a good predictor of how long it takes to transfer that object.

Benchmark 2: What makes postMessage slow?

To verify this, I modified the benchmark: I still generate all permutations of depth and breadth from 1 to 6, but in addition, all leaf properties get a string value between 16 bytes and 2KiB in length.

Results

There is a strong correlation between transfer time and the length of the string returned by JSON.stringify(). I think this correlation is strong enough to turn into a rule of thumb: the transfer time of a JSON-compatible object is roughly proportional to the length of its JSON representation. Even more noteworthy, however, is the fact that this correlation only becomes significant for large objects, and by “large” I mean anything over 100KiB. For smaller objects, the variance is high and the correlation weak.
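
In code, the rule of thumb boils down to something like this (stateObject standing in for whatever you’re about to send):

const approxCost = JSON.stringify(stateObject).length;
if (approxCost > 100 * 1024) {
  // Over ~100KiB: expect transfer time to grow roughly linearly with
  // this number; consider patching or a binary format (see below).
}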

Evaluation: Sending a message

Now we have data, but it’s meaningless without context. If we want to draw meaningful conclusions, we need to define “slow”. Budgets are a helpful tool here, and I’ll once again go back to the RAIL guidelines to establish our budgets.

In my experience, one core responsibility of a web worker is, at the very least, managing your app’s state object. The state often changes when a user interacts with your app. According to RAIL, we have 100ms to react to a user interaction. This means that, even on the slowest devices, you can postMessage() objects of up to 100KiB and stay within your budget.

Things change when you have JS-driven animations. Since every frame needs to produce a visual update, the RAIL budget for animations is 16ms. If a message from the worker blocks the main thread for longer than that, we’re in trouble. Looking at the numbers from our benchmarks, anything up to 10KiB poses no risk to the smoothness of your animations. This, by the way, is why we prefer CSS animations and transitions over JS-driven animations that hog the main thread: CSS animations and transitions run on a separate thread and aren’t affected by a blocked main thread.

What if you need to send more data?

In my experience, postMessage() is not the bottleneck for most apps adopting an off-main-thread architecture. I’ll admit, however, that there are setups where your messages are very large or need to be sent at a high frequency. What can you do if vanilla postMessage() is too slow for you?

Patching

State objects, for example, can be quite large in their own right, while often only a few deeply nested properties change. We ran into exactly this in PROXX, our clone of Minesweeper: the state of the game consists of a two-dimensional array that forms the game grid, and each cell stores whether it contains a mine and whether it has been revealed or flagged:

interface Cell {
  hasMine: boolean;
  flagged: boolean;
  revealed: boolean;
  touchingMines: number;
  touchingFlags: number;
}

This means that the biggest grid of 40×40 cells adds up to about 134KiB of JSON. Sending the entire state object on every change is out of the question. Instead, we record the changes and send a patch set. We didn’t use ImmerJS in PROXX, but it’s a library for working with immutable objects that provides a quick way to generate and apply such patches:

// worker.js
immer.produce(stateObject, draftState => {
  // Manipulate `draftState` here
}, patches => {
  postMessage(patches);
});

// main.js
worker.addEventListener("message", ({data}) => {
  state = immer.applyPatches(state, data);
  // React to new state
});

A patch set generated by ImmerJS looks like this:

[
  {
    "op": "remove",
    "path": [ "socials", "gplus" ]
  },
  {
    "op": "add",
    "path": [ "socials", "twitter" ],
    "value": "@DasSurma"
  },
  {
    "op": "replace",
    "path": [ "name" ],
    "value": "Surma"
  }
]

This means that the amount of data you have to transfer scales with the size of your changes, not with the size of your object.

Chunking

As I said before, changes to a state object usually touch only a small number of properties. But not always: PROXX can end up with very big patch sets. The first reveal in Minesweeper can affect up to 80% of the game field, which grows the patch set to around 70KiB. On feature phones, that’s too heavy a burden, especially when we have WebGL animations running alongside the JS.

We asked ourselves an architectural question: can our app support partial updates? A patch set is a collection of patches. Instead of sending the whole patch set at once, you can chunk it into smaller pieces and apply them in sequence: send patches 1-10 in the first message, patches 11-20 in the next, and so on. If you do this well, you can stream your patches, which lets you use whatever patterns you like (or know) from reactive programming.
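
A minimal chunking sketch, reusing the ImmerJS setup from above (CHUNK_SIZE is a made-up knob; the right threshold depends on your budget):

// worker.js: each postMessage() queues a separate task on the main
// thread, so rendering can happen between chunks.
const CHUNK_SIZE = 10;
function sendPatchesChunked(patches) {
  for (let i = 0; i < patches.length; i += CHUNK_SIZE) {
    postMessage(patches.slice(i, i + CHUNK_SIZE));
  }
}

// main.js: same handler as before; each chunk is applied on arrival.
worker.addEventListener("message", ({data}) => {
  state = immer.applyPatches(state, data);
});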

Of course, this can tear or otherwise break your visuals if you’re not careful. However, you are in control of how the patch set is chunked, and you can reorder patches to avoid unwanted effects. For example, you could make sure the first chunk contains all the patches affecting on-screen elements, and spread the remaining patches across several patch sets to give the main thread room to breathe.

We chunk in PROXX: when the user taps a cell, the worker traverses the entire grid and builds a list of cells that need changing. If that list grows past a certain threshold, we send what we have to the main thread, empty the list, and continue traversing the game field. These patch sets are small enough that even on a feature phone the cost of postMessage() is negligible, and we still have enough main-thread budget left to update our game’s UI. The traversal algorithm works outwards from the first cell, so our patches are ordered the same way. If the main thread can only fit one message per frame into its budget (as on the Nokia 8110), the partial updates end up looking like an animation. On a more powerful machine, the main thread keeps processing message events until the JS event loop’s budget runs out.

In PROXX, the chunked patch sets end up looking like an animation, both on low-end phones and on desktop with 6× CPU throttling.

Or JSON?

JSON.parse() and JSON.stringify() are incredibly fast. JSON is a small subset of JavaScript, so the parser has fewer cases to handle, and because they are used so often, they have been heavily optimized. Mathias recently pointed out that you can sometimes reduce the time it takes to parse a large object by wrapping it in JSON.parse(). Maybe we can use JSON to speed up postMessage(), too? Sadly, it seems not.

Comparing the performance of manual JSON serialization with native postMessage() is inconclusive.

There is no clear winner between the two: native postMessage() performs better in the best case and worse in the worst.
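
For clarity, “manual JSON serialization” in this comparison means sending a plain string and parsing it on arrival, along these lines (stateObject is a placeholder):

// worker.js
postMessage(JSON.stringify(stateObject));

// main.js
worker.addEventListener("message", ({data}) => {
  const newState = JSON.parse(data);
  // React to new state
});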

Binary format

Another way to avoid the cost of structured cloning is to not use it at all. Besides structured cloning, postMessage() can also transfer certain types. ArrayBuffer is one of these transferable types. As the name implies, transferring an ArrayBuffer does not involve copying: the sender actually loses access to the buffer, which is then owned by the receiver. Transferring an ArrayBuffer is extremely fast and independent of the buffer’s size. The downside is that an ArrayBuffer is just a contiguous chunk of memory; we’re no longer working with objects and properties. For an ArrayBuffer to be useful, we have to decide ourselves how our data is laid out in it. Doing that dynamically would be costly, but if we know the shape of the data at build time, we can tap into plenty of optimizations that a generic cloning algorithm cannot use.
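
Here’s what a transfer looks like in practice; note the transfer list as the second argument:

const buffer = new ArrayBuffer(1024 * 1024); // 1MiB of raw memory
new Float64Array(buffer)[0] = 123.456; // the layout is entirely up to us

worker.postMessage(buffer, [buffer]); // transferred, not cloned

// The sender has now lost access:
console.log(buffer.byteLength); // 0, the buffer is detached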

One format that taps into these optimizations is FlatBuffers. FlatBuffers has compilers for JavaScript (and other languages) that turn schema descriptions into code, and that code contains functions to serialize and deserialize your data. Even more interestingly: FlatBuffers doesn’t need to parse the entire ArrayBuffer to return a value it contains.

WebAssembly

Everyone loves WebAssembly. One approach is to use WebAssembly to tap into the serialization libraries of other language ecosystems. CBOR, a JSON-inspired binary format, has been implemented in many languages. ProtoBuf and the aforementioned FlatBuffers have wide language support as well.

However, we can be cheekier than that: we can rely on the language’s own memory layout as the serialization format. I wrote a small example in Rust: it defines a State struct (symbolic for whatever your app’s state looks like) with some setter and getter methods, so I can inspect and manipulate the state from JavaScript. To “serialize” the state object, I just copy the chunk of memory the struct occupies. To deserialize, I allocate a new State object and overwrite it with the data handed to the deserialization function. Since I’m using the same WebAssembly module in both cases, the memory layout is identical.

This is just a proof of concept: if your struct contains pointers, you’ll quickly run into undefined behavior, and there’s some unnecessary copying going on as well. Handle with care!

use std::mem::size_of;
use wasm_bindgen::prelude::*;

const NUM_COUNTERS: usize = 8; // example value

#[wasm_bindgen]
pub struct State {
    counters: [u8; NUM_COUNTERS],
}

#[wasm_bindgen]
impl State {
    // Constructors, getters and setters...

    pub fn serialize(&self) -> Vec<u8> {
        let size = size_of::<State>();
        let mut r = Vec::with_capacity(size);
        r.resize(size, 0);
        // Copy the struct's bytes verbatim into the vector.
        unsafe {
            std::ptr::copy_nonoverlapping(
                self as *const State as *const u8,
                r.as_mut_ptr(),
                size,
            );
        };
        r
    }
}

#[wasm_bindgen]
pub fn deserialize(vec: Vec<u8>) -> Option<State> {
    let size = size_of::<State>();
    if vec.len() != size {
        return None;
    }

    let mut s = State::new();
    // Overwrite the fresh struct's bytes with the serialized data.
    unsafe {
        std::ptr::copy_nonoverlapping(
            vec.as_ptr(),
            &mut s as *mut State as *mut u8,
            size,
        );
    }
    Some(s)
}

Note: Ingvar pointed me to Abomonation, a seriously questionable serialization library that even works with pointers. His advice: “Don’t try this!”

The WebAssembly module weighs in at about 3KiB gzipped, most of which comes from memory management and core library functions. The entire state object is sent whenever anything changes, but thanks to the transferability of ArrayBuffers, this is extremely cheap. In other words: this technique has nearly constant transfer time, regardless of the size of the state object. It will, however, be more costly to access the state data. There’s always a tradeoff!
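
On the JavaScript side, using the module above might look like this. This is a sketch assuming wasm-bindgen’s generated bindings, where a returned Vec<u8> surfaces as a Uint8Array; the wasm and state names are placeholders:

// worker.js
const bytes = state.serialize(); // Uint8Array holding the struct's memory
postMessage(bytes, [bytes.buffer]); // transfer the underlying buffer

// main.js
worker.addEventListener("message", ({data}) => {
  const newState = wasm.deserialize(data);
  // newState is a State instance with identical memory contents
});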

This technique also requires that the state struct doesn’t use indirection like pointers, as those values would be invalid once copied into a new WebAssembly module instance. Because of this, you’ll probably struggle to use this approach with higher-level languages. My recommendations are C, Rust, and AssemblyScript, as they give you full control over memory.

SABs & WebAssembly

Caution: This section uses SharedArrayBuffer, which at the time of writing is disabled in all browsers except desktop Chrome. This is being worked on, but no ETA can be given.

Especially from game developers, I’ve heard multiple requests to give JavaScript the ability to share objects across threads. I think this is unlikely to ever be added to the language itself, as it breaks one of the fundamental assumptions of JavaScript engines. However, there is an exception: SharedArrayBuffer (“SABs”). SABs behave exactly like ArrayBuffers, except that, when transferred, one side does not lose access; instead, the handle is cloned and both sides have access to the same underlying chunk of memory. SABs let JavaScript adopt a shared-memory model. For synchronization between realms, there’s Atomics, which provides mutexes and atomic operations.

With SABs, you’d transfer a chunk of memory only once, when your app starts. However, on top of the binary-representation problem, you’d have to use Atomics to prevent one realm from reading the state object while the other is still writing to it, and vice versa. This can have a considerable performance impact.
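
A minimal sketch of that setup, assuming a single Int32Array view shared between the main thread and a worker:

// main.js: share one chunk of memory once, at startup
const sab = new SharedArrayBuffer(1024);
const shared = new Int32Array(sab);
worker.postMessage(sab); // both sides now see the same memory

// worker.js
let shared;
addEventListener("message", ({data}) => {
  shared = new Int32Array(data);
  Atomics.store(shared, 0, 42); // synchronized write
});

// main.js, later: a synchronized read, no postMessage() per update
const value = Atomics.load(shared, 0); // 42 once the worker has written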

As an alternative to using SABs and serializing/deserializing data manually, you can embrace threaded WebAssembly. WebAssembly has standardized support for threads, but it is gated on the availability of SABs. With threaded WebAssembly, you can write code using the exact same patterns you’re used to from threaded programming languages. This, of course, comes at the cost of development complexity, orchestration, and potentially bigger, more monolithic modules that need to be shipped.

Conclusion

Here’s my verdict: even on the slowest devices, you can postMessage() objects of up to 100KiB and stay within your 100ms response budget. If you have JS-driven animations, payloads of up to 10KiB are risk-free. This should be sufficient for most apps. postMessage() does have a cost, but not to the extent that it makes off-main-thread architectures unviable.

If your payloads are bigger than that, you can try sending patches or switching to a binary format. Treating state layout, transferability, and patchability as architectural concerns from the start can help your app run on a wider spectrum of devices. If you feel that a shared-memory model is your best bet, WebAssembly will pave that way in the near future.

As I already hinted at in an earlier blog post about the Actor Model, I strongly believe we can implement performant off-main-thread architectures on the web today. But this requires us to leave the comfort zone of threaded languages and the web’s everything-on-the-main-thread-by-default, and to explore alternative architectures and models that embrace the constraints of the Web and JavaScript. The benefits are worth it.