Concurrent operations: the difference between AtomicLong and LongAdder

JDK 1.5 added the AtomicInteger and AtomicLong classes, which provide atomic operations on int and long values for concurrent use.

In a concurrent scenario, we can use AtomicInteger or AtomicLong whenever we need a counter. This avoids explicit locking and complicated code: we only need to call the corresponding encapsulated methods. For example, the atomic increment and decrement operations of these two classes cover most business scenarios.

Useful as they are, these two atomic classes can show significant performance problems when the business scenario is highly concurrent. Why? Let’s walk through the differences between AtomicLong and LongAdder and explain why LongAdder was added after AtomicLong. With that in mind, let’s first look at AtomicLong’s problems.

The problem with AtomicLong

Let’s write an example in which the threads of a thread pool increment a shared counter:

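import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Description: use AtomicLong from 16 threads
 */
public class AtomicLongDemo {

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong(0);
        ExecutorService service = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 100; i++) {
            service.submit(new Task(counter));
        }

        // Stop accepting new tasks and wait for the submitted ones to finish,
        // so the result is complete and the JVM can exit.
        service.shutdown();
        service.awaitTermination(2, TimeUnit.SECONDS);
        System.out.println(counter.get());
    }

    static class Task implements Runnable {

        private final AtomicLong counter;

        public Task(AtomicLong counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            // Atomically add 1 to the shared counter
            counter.incrementAndGet();
        }
    }
}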

This code creates an AtomicLong with an initial value of 0, then creates a pool of 16 threads and submits the same task to the pool 100 times.

The task itself is defined in the Task class at the bottom of the listing. Each task simply calls incrementAndGet() on the AtomicLong, which is an atomic increment. So the whole program starts the counter at zero and runs 100 tasks, each of which adds 1.

The result of this code is, of course, 100. AtomicLong guarantees that incrementAndGet() is atomic, so no thread-safety problem occurs.

But a closer look at what happens inside may surprise you. Let’s simplify the model to a concurrent scenario with only two threads working at the same time, because two threads and many threads behave the same way in principle. As shown in the figure:

In this diagram, each thread runs on its own core, and each core has a local memory for its exclusive use. Below the local memories is the shared memory used by both CPU cores.

The value field inside AtomicLong, which holds the current value, is declared volatile, so every change to it must be made visible to other threads.

This means a flush and a refresh are needed every time the value changes. For example, if ctr starts at 0, as shown, then once core 1 changes it to 1, core 1 first flushes the new value to the shared memory below (the left side of the figure). The local memory of core 2 is then refreshed from shared memory (the right side of the figure). Only after that can core 2 see the change.

When contention is intense, these flush and refresh operations consume a lot of resources, and CAS operations frequently fail and have to be retried.
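To make the failed CAS attempts visible, here is a minimal sketch that increments an AtomicLong with a hand-written compareAndSet loop and counts how often an attempt loses the race. The class CasRetryDemo and its methods are hypothetical names made up for this illustration. The explicit loop only stands in for the retry logic; on some platforms the JVM may compile incrementAndGet() to a single atomic instruction instead of a loop.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; the names are illustrative and not part of the JDK.
final class CasRetryDemo {

    // Increments 'counter' with an explicit CAS loop and returns how many
    // compareAndSet attempts failed because another thread changed the value first.
    static long incrementCountingFailures(AtomicLong counter) {
        long failures = 0;
        while (true) {
            long current = counter.get();                      // volatile read
            if (counter.compareAndSet(current, current + 1)) {
                return failures;                               // CAS succeeded
            }
            failures++;                                        // lost the race, retry
        }
    }

    // Runs 'threads' workers that each perform 'perThread' increments.
    // Returns {final counter value, total number of failed CAS attempts}.
    static long[] run(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        AtomicLong failedAttempts = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long localFailures = 0;
                for (int i = 0; i < perThread; i++) {
                    localFailures += incrementCountingFailures(counter);
                }
                failedAttempts.addAndGet(localFailures);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] {counter.get(), failedAttempts.get()};
    }

    public static void main(String[] args) {
        long[] result = run(8, 200_000);
        System.out.println("final value     = " + result[0]);
        System.out.println("failed attempts = " + result[1]);
    }
}
```

The final value is always threads × perThread, while the number of failed attempts varies from run to run and from machine to machine. Every failed attempt is work that had to be repeated.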

How LongAdder works and what it improves

JDK 8 added the LongAdder class, a utility class for operating on long values. So why add LongAdder when AtomicLong already exists?

Again, let’s use an example. The following code is similar to the previous one, except that the utility class is LongAdder instead of AtomicLong. The other difference is that the result is read with sum() instead of get(). The rest of the logic is the same.

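import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Description: use LongAdder from 16 threads
 */
public class LongAdderDemo {

    public static void main(String[] args) throws InterruptedException {
        LongAdder counter = new LongAdder();
        ExecutorService service = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 100; i++) {
            service.submit(new Task(counter));
        }

        // Stop accepting new tasks and wait for the submitted ones to finish,
        // so the result is complete and the JVM can exit.
        service.shutdown();
        service.awaitTermination(2, TimeUnit.SECONDS);
        System.out.println(counter.sum());
    }

    static class Task implements Runnable {

        private final LongAdder counter;

        public Task(LongAdder counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            // Add 1 to the counter
            counter.increment();
        }
    }
}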

This code also prints 100, and under heavy contention LongAdder is faster than the AtomicLong implementation. Let’s look at why LongAdder is more efficient than AtomicLong at high concurrency.

LongAdder introduces the idea of segmented accumulation. Two internal fields take part in counting: the first is base, a single variable, and the second is a Cell[] array.

base is used when contention is low: the value to add is applied directly to the base variable.

When contention is intense, the Cell[] array comes into play. Each thread then adds to one element of the Cell[] array instead of all threads sharing the same object.

In this way, LongAdder maps different threads to different cells, which reduces the probability of collisions. This is a segmentation idea that improves concurrency, similar to the 16 segments in Java 7’s ConcurrentHashMap.

When contention is fierce, LongAdder computes a hash value for each thread and uses it to assign the thread to a Cell. Each Cell acts as an independent counter and does not interfere with the others, so threads rarely compete with each other while incrementing. This is why the throughput of LongAdder is higher than that of AtomicLong. It is essentially trading space for time: because several counters work at the same time, it takes up more memory.
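The idea of "base plus cells" can be sketched in a few lines. The class below is a deliberately simplified illustration and not the JDK's LongAdder code: StripedCounterSketch is a hypothetical name, the number of cells is fixed up front, and the cell is chosen from the thread's identity hash, which is an assumption of this sketch. The real class creates and grows its cells lazily and pads them to avoid false sharing.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Simplified illustration of "base + cells"; NOT the JDK's LongAdder implementation.
final class StripedCounterSketch {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells;   // fixed size here, purely for simplicity

    StripedCounterSketch(int cellCount) {
        this.cells = new AtomicLongArray(cellCount);
    }

    void add(long delta) {
        long b = base.get();
        // Low contention: a single CAS on base succeeds and we are done.
        if (base.compareAndSet(b, b + delta)) {
            return;
        }
        // Contention detected: fall back to a cell picked by a per-thread hash
        // (assumption of this sketch), so threads mostly update different slots.
        int hash = System.identityHashCode(Thread.currentThread());
        int index = Math.floorMod(hash, cells.length());
        cells.addAndGet(index, delta);
    }

    // Adds up base and every cell, like the summation step described in the text.
    long sum() {
        long total = base.get();
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }

    // Runs 'threads' workers that each add 1 'perThread' times; returns the final sum.
    static long runDemo(int threads, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8, 100_000));
    }
}
```

Every increment lands either in base or in exactly one cell, so the total is still correct; only the place where each update is stored differs between threads.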

So how does LongAdder produce the final count across threads? The answer is in the last step, the sum method. When LongAdder.sum() is called, the values of all cells are added up and base is added to them to form the final total. The code is as follows:

public long sum() {
    Cell[] as = cells; Cell a;
    long sum = base;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}

The logic of this sum method is straightforward. It first takes the value of base, then iterates over all the cells and adds the value of each Cell to form the final total. Because no lock is held while summing, the result may not be exact if updates are still in progress: a Cell may be modified while sum is being computed.
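The effect of this lock-free summation can be observed with a small experiment. The sketch below uses the hypothetical class SumSnapshotDemo to read sum() once while writer threads may still be running and once after all of them have finished. Only the second reading is guaranteed to equal the total number of increments.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class showing that sum() is a snapshot, not an atomic read.
final class SumSnapshotDemo {

    // Returns {sum read while writers may still be active, sum read after all writers finished}.
    static long[] run(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] writers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    adder.increment();
                }
            });
            writers[t].start();
        }

        // Read while updates may still be in flight: some value between 0 and the total.
        long midFlight = adder.sum();

        for (Thread writer : writers) {
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        // All writers are done, so this reading is exact.
        long settled = adder.sum();
        return new long[] {midFlight, settled};
    }

    public static void main(String[] args) {
        long[] result = run(8, 500_000);
        System.out.println("while running: " + result[0]);
        System.out.println("after join:    " + result[1]);
    }
}
```

For a statistics counter that is read occasionally, a slightly stale intermediate value is usually acceptable. If an exact value is needed, read it only after the updating threads have finished.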

So we’ve seen why AtomicLong or AtomicInteger doesn’t perform well at high concurrency, and we’ve also seen LongAdder perform better.

Different usage scenarios

Under low contention, AtomicLong and LongAdder have similar characteristics and similar throughput. Under high contention, however, the expected throughput of LongAdder is much higher. In tests, the throughput of LongAdder was about ten times that of AtomicLong. This comes at a price: LongAdder consumes more space in exchange for its efficiency.
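If you want to get a feel for the difference on your own machine, a rough timing harness like the one below can help. ContentionTimingSketch and timeNanos are hypothetical names, and this is not a rigorous benchmark: there is no JIT warm-up and the numbers depend heavily on core count and JVM version, so treat the output only as an indication. A proper measurement would use a benchmarking framework such as JMH.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical timing harness for illustration only; not a rigorous benchmark.
final class ContentionTimingSketch {

    // Starts 'threads' workers that each call 'increment' 'perThread' times
    // and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(int threads, int perThread, Runnable increment) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment.run();
                }
            });
        }
        long start = System.nanoTime();
        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = 16;
        int perThread = 1_000_000;
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();

        // The return value of incrementAndGet() is simply discarded here.
        long atomicNanos = timeNanos(threads, perThread, atomic::incrementAndGet);
        long adderNanos = timeNanos(threads, perThread, adder::increment);

        System.out.println("AtomicLong: " + atomic.get() + " in " + atomicNanos / 1_000_000 + " ms");
        System.out.println("LongAdder:  " + adder.sum() + " in " + adderNanos / 1_000_000 + " ms");
    }
}
```

Both counters end at the same value; only the elapsed time differs. Varying the number of threads is a simple way to see how the gap changes with contention.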

Now that we have the more efficient LongAdder, is AtomicLong obsolete? Can we replace AtomicLong with LongAdder everywhere? The answer is no; it depends on the scenario.

LongAdder only provides simple methods such as add and increment. It is suited to statistical summation and counting, which are relatively simple scenarios. AtomicLong additionally offers more advanced methods such as compareAndSet, so it can handle more complex scenarios that need CAS and go beyond addition and subtraction.

Conclusion: if the scenario only involves adding and subtracting, we can use the more efficient LongAdder directly. If we need CAS operations such as compareAndSet, we need to use AtomicLong.
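As one example of a scenario that needs CAS, consider a counter that must never exceed a limit. The sketch below, with the hypothetical class BoundedCounterSketch, uses AtomicLong.compareAndSet so that the check and the increment happen as one atomic step. LongAdder has no equivalent operation, but it is still a good fit for the plain success counter used in the demo.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example: a counter with an upper bound, which requires compareAndSet.
final class BoundedCounterSketch {
    private final AtomicLong used = new AtomicLong();
    private final long limit;

    BoundedCounterSketch(long limit) {
        this.limit = limit;
    }

    // Atomically takes one slot if fewer than 'limit' slots are in use.
    boolean tryAcquire() {
        while (true) {
            long current = used.get();
            if (current >= limit) {
                return false;                        // full, give up
            }
            if (used.compareAndSet(current, current + 1)) {
                return true;                         // check and increment succeeded together
            }
            // Another thread changed the value in between; re-read and retry.
        }
    }

    long inUse() {
        return used.get();
    }

    // Lets several threads compete for slots; returns {successful acquisitions, slots in use}.
    static long[] runDemo(int threads, int attemptsPerThread, long limit) {
        BoundedCounterSketch slots = new BoundedCounterSketch(limit);
        LongAdder successes = new LongAdder();       // plain counting: LongAdder is enough
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    if (slots.tryAcquire()) {
                        successes.increment();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] {successes.sum(), slots.inUse()};
    }

    public static void main(String[] args) {
        long[] result = runDemo(8, 1_000, 100);
        System.out.println("successes = " + result[0] + ", in use = " + result[1]);
    }
}
```

However many threads compete, the number of successful acquisitions never exceeds the limit, because a thread only increments the value it has just checked.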
