Links to articles:

  1. Memory management
  2. ArrayBuffers and SharedArrayBuffers

  3. Using Atomics to avoid Race Conditions in SharedArrayBuffers

 

In my last article, I talked about how using SharedArrayBuffers leads to race conditions. This makes it difficult to use SharedArrayBuffers. I do not recommend that application developers use SharedArrayBuffers directly.


But library developers with multithreaded programming experience can use these new underlying apis to create packaged tools. Application developers can then use these tools without directly touching SharedArrayBuffers or Atomics.


Even if you don’t use SharedArrayBuffers and Atomics directly, I think it’s still important to understand how they work. So in this article, I’ll explain what types of race conditions concurrency brings and how Atomics can avoid them.


But first, what are race conditions?

Race conditions: An example you may have seen before

A very simple example of race conditions can occur when you have a variable that needs to be shared between two threads. Suppose one thread wants to load a file, and another thread checks to see if it exists. They share a variable fileExists to communicate.

Initially, fileExists is set to false.

As soon as the code in thread 2 runs first, the file is loaded.

But if the code in thread 1 runs first, it will throw an error to the user indicating that the file does not exist.

But that’s not the problem, it’s not that the file doesn’t exist, the real problem is race conditions.


Many JavaScript developers already encounter such race conditions, even in single-threaded code. You don’t need to understand anything about multithreading to understand that this is a competition.


Even if some race conditions don’t exist in single-threaded code, this can happen when you program with multiple threads that share memory.

Race conditions for different classes and how Atomics works

Let’s explore some of the different types of race conditions that can exist in multithreaded code and how Atomics can avoid them. This doesn’t cover all race conditions, but it should give you an idea of why the API provides the methods it does.

Before we get started, I want to say one more thing: You shouldn’t use Atomics directly, and writing multithreaded code is a known problem. Instead, you should use a reliable library to handle shared memory in multithreaded code.

Don’t ask for it…

1. Race conditions in a single operation

Suppose you have two threads incrementing the same variable. You might think that the end result would be the same, no matter which path goes first.

However, in the source code, the increment variable looks like a separate operation, but when you look at the compiled code, it is not a separate operation.

In the CPU, incrementing a value requires three instructions. The reason is that computers have both long-term and short-term memory.

All threads share long-term memory, but short-term memory — registers — is not shared between threads.


Each thread needs to fetch values from memory into its registers. It can then run the calculation of that value in a register. It then writes the value back from its short-term storage to long-term storage.

If all operations in thread 1 occur first and all operations in thread 2 occur next, we will get the expected result.

But if they are interleaved in time, the value thread 2 pulls into the register is out of sync with the value in memory. This means that thread 2 does not consider the results of thread 1’s calculations. Instead, thread 1 overwrites the value in memory with its own value.

One of the things Atomics operations do is take these “as single operations for humans, but multiple operations for computers, and make them look like single operations for computers.”


This is why they are called Atomics operations. Because they took an operation that would normally have multiple instructions — where the instructions could be paused and resumed — it made them all seem instantaneous, as if it were a single instruction. It’s like an indivisible atom.

With Atomics operations, the incremental code looks a little different.

Now that we’re using atomics.add, the different steps involved in adding variables don’t get confused between threads. Instead, one thread completes its Atomics operation and prevents another thread from starting. The other will then start its own Atomics operation.


Atomics methods that help avoid this race are:

  • Atomics.add
  • Atomics.sub
  • Atomics.and
  • Atomics.or
  • Atomics.xor
  • Atomics.exchange

You’ll notice that this list is rather limited. It doesn’t even include division and multiplication. However, Library developers can create Atomics-like operations for their own needs.

To do this, developers can use Atomics.compareExchange. This way, you can take a value from SharedArrayBuffer, operate on it, and write it back to SharedArrayBuffer only if no other thread updates it after your first check. If another thread updates it, you get the new value and recalculate.

2. Race conditions across multiple operations

These Atomics operations help avoid race conditions during a “single operation.” But sometimes you want to change multiple values on an object (using multiple operations) and make sure that no one else is changing the object at the same time. This means that during each change to the object, the object is locked and cannot be accessed by other threads.


The Atomics object does not provide any direct way to deal with this problem. But it provides a tool to handle this — create a lock.

If the code wants to use the locked data, it must acquire the lock for the data. It can then use the lock to lock other threads. Only when the lock is active can it access or update data.

To set up locks, library authors can use atomics. wait and atomics. wake, as well as others such as Atomics.compareExchange and atomics.store. If you want to know how this works, take a look at this basic lock implementation.

In this case, thread 2 acquires the lock for the data and sets locked to true. This means that thread 1 cannot access the data until thread 2 unlocks.

If thread 1 needs to access the data, it will attempt to acquire the lock. But because the lock is already in use, it cannot operate. The thread will then be in a wait state — so it will be blocked — until the lock is available.

Once thread 2 is complete, it will notify to unlock. The lock notifies one or more waiting threads that it is now available.

A thread on the wait can then acquire the lock and lock the data for its own use.

Libraries that provide locks will use a number of different methods on Atomics objects, the two most important being:

  • Atomics.wait
  • Atomics.wake

3. Race conditions caused by instruction reordering

Atomics deals with a third synchronization problem, which may be surprising.


You may not realize it, but there’s a good chance that the code you’re writing isn’t running in the order you expect it to. Both the compiler and the CPU reorder the code to make it run faster.


Suppose you’ve written some code to count the total. You want to set a flag when the calculation is complete.

To compile this, we need to decide which register to use for each variable. We can then translate the source code into machine instructions.

Most computers divide the process of running instructions into multiple steps. This ensures that all different parts of the CPU are always busy in order to make full use of the CPU.


Here is an example of the steps the directive goes through:

  1. Retrieves the next instruction from memory
  2. Find out what the instruction tells us to do (i.e., decode the instruction) and get the value from the register
  3. An instruction to
  4. Write the result back to the register

This is how an instruction passes through a pipe. Ideally, we want the second directive to follow. Once it gets to phase 2, we need to get the next instruction. The problem is that there is a dependency between instructions 1 and 2.

We can pause the CPU until instruction 1 updates the subTotal in the register. But this will slow down the CPU.


Many compilers and cpus do code reordering to make cpus run more efficiently. They look for other directives that don’t use subTotal or total and move them between these two lines.


This keeps the instruction stream flowing through the pipe, constantly keeping the instruction executed


Because line 3 does not depend on any value in line 1 or line 2, the compiler or CPU considers it safe to reorder like this. When you run in a single thread, no other code will see these values until the entire function completes.


This is not the case when you are running another thread simultaneously on another processor. Another thread does not have to wait until the function completes to see these changes. It sees them almost as soon as it writes back into memory. So you can see that isDone is set before Total.


If you use isDone as a flag to indicate that the total has been calculated and can be used in another thread, this reordering results in race conditions


Atomics tries to address some of these shortcomings. When you write with Atomics, it’s like putting a fence between two parts of your code.

Atomics operations do not reorder relative to each other, nor do other operations move around them. Two operations that are commonly used to force a sort are:

  • Atomics.load
  • Atomics.store


All of the above variables updated atomics. store in the function’s source code are guaranteed to be completed before atomics. store writes its value back into memory. Even if non-Atomics instructions are reordered relative to each other, they are not moved to the source code below atomics.store.


As with all subsequent variables atomics.load in a function, ensure that atomics.load gets its value in the same way. Even if non-Atomics instructions are reordered, they are not moved to Atomics.load above them in the source code.

Note: The while loop I show here is called a spinlock and is very inefficient. If it is on the main thread, it will cause your application to stop running. You don’t want to use it in real code.

Again, these methods are not intended to be used directly in the application code. Instead, the Library uses them to create locks.

conclusion

It is difficult to write multiple threads that share memory. There are many different race conditions for you to pit.

This is why it is not recommended to use SharedArrayBuffers and Atomics directly in your application. Instead, you are recommended to use libraries developed by experienced multithreaded developers and developers who spend time studying memory models.


It’s early days for SharedArrayBuffer and Atomics. Those libraries haven’t been created yet. But these new apis provide a foundation on which to build.