This question comes from a friend who recently interviewed at ByteDance (and ended up getting the offer). Since many people aren't clear on it, I want to cover it on its own.
All right, let’s get down to business.
What is false sharing?
First of all, as we all know, CPU speed has far outpaced memory speed, so modern CPUs add caches between the two; this is the usual way of bridging the performance gap between different hardware.
Quite simply, introducing a cache inevitably creates cache coherence problems, which is why cache coherence protocols (such as MESI) were introduced. (If you're not familiar with them, I suggest looking them up on Baidu; I won't expand on them here.)
As for the CPU cache itself: the closer it sits to the CPU, the faster it is, the smaller its capacity, and the higher its cost. Caches are generally divided into L1, L2, and L3; ranked by speed, L1 > L2 > L3.
Inside the cache, data is actually stored in units called cache lines. A cache line is typically a power of two in size, usually between 32 and 256 bytes, with 64 bytes being the most common today.
So the cache does not store data one variable at a time; a single cache line usually holds several variables.
A classic example is the difference between arrays and linked lists. An array's memory addresses are contiguous, so when we read one element, the CPU loads the following elements into the cache as well, which improves efficiency; a linked list gets no such benefit. In other words, variables at contiguous memory addresses are very likely to end up in the same cache line.
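As a back-of-envelope check (assuming 8-byte longs and the common 64-byte cache line; the class name here is my own for illustration):

```java
// Assuming a 64-byte cache line: how many adjacent long[] elements
// fit into a single line? (8 bytes per long -> 8 elements per line)
public class CacheLineMath {
    static final int CACHE_LINE_BYTES = 64; // common size, not guaranteed on all CPUs

    public static int longsPerLine() {
        return CACHE_LINE_BYTES / Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(longsPerLine() + " longs share one cache line"); // prints "8 longs share one cache line"
    }
}
```

So a run of up to 8 adjacent long array elements can sit on one line, which is exactly why neighboring elements can interfere with each other under concurrent writes.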
When multiple threads concurrently modify different variables that sit in the same cache line, performance suffers, because only one core can write to that cache line at a time. This problem is called false sharing.
Why can only one thread operate at a time? Let's walk through a concrete example:
Suppose two variables, x and y, sit in the same cache line, and that line is cached by both cores.
Two threads, A and B, running on Core1 and Core2, are modifying x and y respectively.
When thread A modifies x in Core1's cache, the cache coherence protocol invalidates the corresponding cache line in Core2 (which holds y as well), forcing thread B to reload it from main memory.
With such frequent trips to main memory the cache is effectively useless, and performance degrades. That is the false sharing problem.
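A minimal sketch of the scenario above (the field names and iteration count are my own choices, and whether the two fields actually share a cache line depends on the JVM's field layout; the effect shows up as wasted time, not wrong results):

```java
// Two threads, each bumping only its own counter -- but the counters are
// plain adjacent fields, so they may well land on the same cache line
// and ping-pong that line between the two cores.
public class FalseSharingDemo {
    static volatile long x; // written only by thread A
    static volatile long y; // written only by thread B

    static final long ITERATIONS = 5_000_000L;

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { for (long i = 0; i < ITERATIONS; i++) x++; });
        Thread b = new Thread(() -> { for (long i = 0; i < ITERATIONS; i++) y++; });
        long start = System.nanoTime();
        a.start(); b.start();
        a.join();  b.join();
        long ms = (System.nanoTime() - start) / 1_000_000;
        // Correctness is unaffected -- false sharing only costs time.
        System.out.println("x=" + x + " y=" + y + " elapsed=" + ms + "ms");
    }
}
```

Since each field has a single writer, the final counts are always correct; the damage from false sharing is purely to throughput.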
How to avoid it?
Now that you know what false sharing is, how do you avoid it?
Change the way cache lines store data? Don't even think about it.
The only realistic option is padding. Wouldn't it be nice if my variable were the only data in its cache line?
Indeed, there are two common solutions.
Byte padding
Prior to JDK 8, you could avoid false sharing by padding with extra fields, as shown in the following code:
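A minimal sketch of the idiom (the padding field names p1..p5 are illustrative):

```java
// Pre-JDK 8 manual padding: make `value` the only useful data on its
// cache line by surrounding it with filler longs.
public class PaddedLong {
    public volatile long value = 0L;
    // 5 padding longs = 40 bytes; with the 16-byte object header on a
    // 64-bit JVM plus the 8-byte value, the object spans a full 64 bytes.
    public long p1, p2, p3, p4, p5;
}
```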
A cache line is generally 64 bytes, and a long is 8 bytes, so the value plus five padding longs comes to 48 bytes.
In Java, the object header takes 8 bytes on a 32-bit JVM and 16 bytes on a 64-bit JVM, so on a 64-bit JVM the header (16 bytes) plus the value and five padding longs (48 bytes) fills exactly 64 bytes, i.e. one cache line.
@Contended annotation
JDK 8 introduced the sun.misc.Contended annotation (moved to jdk.internal.vm.annotation in JDK 9), which solves false sharing simply by marking a field or class with @Contended.
@Contended makes the JVM add 128 bytes of padding, and for application classes it only takes effect when the JVM option -XX:-RestrictContended is enabled.
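A minimal sketch of the annotation (this exact import compiles only on JDK 8, since JDK 9 moved the annotation to jdk.internal.vm.annotation; remember to run with -XX:-RestrictContended):

```java
import sun.misc.Contended; // JDK 8 location; internal API, use with care

public class ContendedCounter {
    // With -XX:-RestrictContended, the JVM pads around this field so it
    // does not share a cache line with neighboring fields.
    @Contended
    public volatile long value;
}
```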
As you can see, manual padding has to account for the object-header size and the cache-line size, which vary by platform; the @Contended annotation lets the JVM work that out for you, so it's recommended to use the annotation whenever possible.
Although padding solves the false sharing problem, it wastes cache: a value of only 8 bytes now occupies a full 64-byte cache line.
And as we said, caches are small and expensive, so this is a time-versus-space trade-off you have to weigh yourself.
A practical application
Java provides several atomic classes, such as AtomicLong and AtomicInteger, that update variables via CAS; but when the CAS fails they spin and retry endlessly, wasting CPU under contention.
To address this weakness under high concurrency, JDK 8 added the LongAdder class, and its implementation is a practical application of the fix for false sharing.
LongAdder extends Striped64, which maintains a Cell array internally. The core idea is to split contention on a single variable across multiple cells: under multithreading, a thread whose CAS on one Cell fails simply retries on another Cell.
The real key to tackling false sharing lies in the Cell class, which is annotated with @Contended.
As mentioned above, the memory addresses of array elements are contiguous, so several elements often land in the same cache line, which would cause false sharing and hurt performance.
Annotating Cell with @Contended avoids this, so the elements of the Cell array no longer share a cache line.
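In everyday code all of this is transparent; here is a minimal usage sketch (the thread and iteration counts are my own choices):

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    public static long count(int threads, int perThread) throws InterruptedException {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // Contended increments get spread across the internal Cells.
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return counter.sum(); // sums the base value plus all Cells
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(count(4, 100_000)); // prints 400000
    }
}
```

Note that sum() is not an atomic snapshot while writers are running, so LongAdder fits statistics-style counters rather than uses that need a precise instantaneous value.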
OK, that's all for today. I'm Ai Xiaoxian; I still haven't come up with a good slogan, but I'll see you next time.