When using the volatile keyword, there is one more concern to be aware of: false sharing.
Let’s start with the CPU cache model:
Between the CPU and main memory sits a cache, which can be accessed much faster than memory. When the CPU reads data, it first looks in the cache; on a cache miss, it reads the data from main memory. When data is modified, the change is first made in the cache and then synchronized to main memory.
The cache works in units of cache lines: each time the CPU reads data through the cache, it reads a whole cache line.
We know that the volatile keyword guarantees the visibility of variables. It does so through the cache coherence protocol: once a CPU modifies a cache line, that line is invalidated in the other CPUs’ caches, and they have to reload it from memory.
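As a small illustration of that visibility guarantee, here is a minimal sketch in which one thread spins on a volatile flag until another thread clears it. The class VolatileFlag and its methods are hypothetical names, not from the original. Without volatile, the JIT could hoist the read out of the loop and the worker might never stop; with volatile, every iteration re-reads the flag.

```java
// Hypothetical demo class: one thread spins on a volatile flag, another clears it.
public final class VolatileFlag {
    // Written by one thread, read by another; volatile makes the write visible.
    private volatile boolean running = true;

    public void stop() {
        running = false; // volatile write
    }

    // Busy-loops until another thread calls stop(); returns the loop count.
    public long spinUntilStopped() {
        long spins = 0;
        while (running) { // volatile read on every iteration
            spins++;
        }
        return spins;
    }

    // Starts a worker, lets it spin for sleepMillis, stops it, and reports
    // whether it exited within joinMillis.
    public static boolean runDemo(long sleepMillis, long joinMillis) {
        VolatileFlag flag = new VolatileFlag();
        Thread worker = new Thread(flag::spinUntilStopped);
        worker.setDaemon(true); // never keep the JVM alive if the worker hangs
        worker.start();
        try {
            Thread.sleep(sleepMillis);
            flag.stop();
            worker.join(joinMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runDemo(100, 1000));
    }
}
```

runDemo returns true once the worker has observed the write and exited; the worker is a daemon thread only so that a bug in the sketch cannot keep the JVM running.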
A cache line is typically 64 bytes, enough to hold eight long variables (8 bytes each).
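The arithmetic behind “eight long variables per line” can be sketched as follows. This assumes 64-byte lines and, as a simplification, that element 0 of a long[] starts exactly on a line boundary; a real JVM puts an array header in front of the data, so actual boundaries are shifted. The class name CacheLineMath is hypothetical.

```java
// Hypothetical helper: which cache line does a byte offset fall into?
public final class CacheLineMath {
    // Assumption: 64-byte cache lines, the typical size mentioned above.
    public static final int CACHE_LINE_BYTES = 64;
    public static final int LONG_BYTES = Long.BYTES; // 8

    // Index of the cache line containing a given byte offset.
    public static long lineOf(long byteOffset) {
        return byteOffset / CACHE_LINE_BYTES;
    }

    // Assumption: element 0 of the long[] starts exactly on a line boundary.
    public static boolean shareLine(int i, int j) {
        return lineOf((long) i * LONG_BYTES) == lineOf((long) j * LONG_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(CACHE_LINE_BYTES / LONG_BYTES); // 8 longs per line
        System.out.println(shareLine(0, 7));               // true: bytes 0 and 56
        System.out.println(shareLine(7, 8));               // false: bytes 56 and 64
    }
}
```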
Now imagine a set of volatile long variables that are written by several threads at the same time. The variables sit next to each other in memory and end up on the same cache line. Because of the cache coherence protocol, a write to any one of them invalidates the entire cache line, so every other CPU that wants to read a variable on that line has to fetch it from memory again, even if that particular variable never changed. The overhead can be considerable, and this problem is called “false sharing”.
An effective way to avoid false sharing is to pad the cache lines. If each volatile variable has a cache line to itself, modifying one variable no longer invalidates the lines holding the others. This is a space-for-time trade-off.
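A quick way to see both false sharing and the padding fix, without relying on object layout, is to spread per-thread slots across an array. This sketch is not the blog’s code: it swaps in java.util.concurrent.atomic.AtomicLongArray (whose set and get have volatile semantics) and assumes 64-byte lines, so slots eight longs apart can never share a line. StridedCounters, runWriters and slotIndex are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical demo: per-thread slots packed together vs. one slot per cache line.
public final class StridedCounters {
    // Assumption: 64-byte lines, so slots 8 longs (64 bytes) apart never share a line.
    public static final int LONGS_PER_LINE = 8;

    private final AtomicLongArray slots;
    private final int stride;

    // padded = true: one slot per cache line; false: slots packed side by side.
    public StridedCounters(int owners, boolean padded) {
        this.stride = padded ? LONGS_PER_LINE : 1;
        this.slots = new AtomicLongArray(owners * stride);
    }

    public int slotIndex(int owner) {
        return owner * stride;
    }

    public void set(int owner, long value) {
        slots.set(slotIndex(owner), value); // volatile-write semantics
    }

    public long get(int owner) {
        return slots.get(slotIndex(owner));
    }

    // One thread per owner, each writing only its own slot; returns elapsed ns.
    public long runWriters(long iterations) {
        int owners = slots.length() / stride;
        Thread[] threads = new Thread[owners];
        for (int t = 0; t < owners; t++) {
            final int owner = t;
            threads[t] = new Thread(() -> {
                for (long i = 1; i <= iterations; i++) {
                    set(owner, i);
                }
            });
        }
        long start = System.nanoTime();
        for (Thread th : threads) th.start();
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long iterations = 100_000_000L;
        System.out.println("packed: " + new StridedCounters(4, false).runWriters(iterations));
        System.out.println("padded: " + new StridedCounters(4, true).runWriters(iterations));
    }
}
```

Running main prints the elapsed nanoseconds for the packed and padded layouts; the gap will depend on the CPU, core count and JVM, so treat it as an experiment rather than a benchmark.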
The following code, from Martin Thompson’s blog, demonstrates the effect of padding cache lines:
public final class FalseSharing
    implements Runnable
{
    public final static int NUM_THREADS = 4; // change this to vary the thread count
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    // One VolatileLong per thread; each thread writes only its own element
    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static
    {
        for (int i = 0; i < longs.length; i++)
        {
            longs[i] = new VolatileLong();
        }
    }

    public FalseSharing(final int arrayIndex)
    {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception
    {
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException
    {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++)
        {
            threads[i] = new Thread(new FalseSharing(i));
        }
        for (Thread t : threads)
        {
            t.start();
        }
        for (Thread t : threads)
        {
            t.join();
        }
    }

    public void run()
    {
        // Write this thread's volatile field ITERATIONS times
        long i = ITERATIONS + 1;
        while (0 != --i)
        {
            longs[arrayIndex].value = i;
        }
    }

    public final static class VolatileLong
    {
        public volatile long value = 0L;
        public long p1, p2, p3, p4, p5, p6; // padding: comment out this line to compare
    }
}
This code starts four threads at the same time, and each one writes its own volatile variable 500 million times.
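Note the line in VolatileLong that declares p1 to p6: these six long fields are never used. Their purpose is to widen the gap between any two volatile variables so that, as far as possible, they do not land on the same cache line. The six padding fields plus value take 56 bytes, so once the object header is added, each VolatileLong object is roughly a cache line in size or larger.
To compare, comment out that line and look at the running time that is printed.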
With the padding in place the run took 16544754400 ns (about 16.5 s); with the padding commented out it took 43853611600 ns (about 43.9 s). Padding the cache lines made the test roughly 2.7 times faster.
It is worth noting that this padding may not work on JDK 1.7 and later. Some people have reported that JDK 1.7 optimizes away fields that are never used, which makes the code above ineffective. (This also seems to depend on the virtual machine: I could not reproduce the problem on JDK 1.8 with HotSpot.)
I have seen two workarounds for this online.
The first is to defeat the optimization by adding a method that actually uses the padding fields:
public final static class VolatileLong
{
    public volatile long value = 0L;
    public long p1, p2, p3, p4, p5, p6;

    // Reads every padding field so that none of them looks unused
    public long sum() {
        return p1 + p2 + p3 + p4 + p5 + p6;
    }
}
The second approach, used in the Disruptor framework, is to move the padding into a subclass through inheritance, which also gets around the optimization:
public static class VolatileLong
{
    public volatile long value = 0L;
}

// The padding lives in the subclass
public final static class PaddingLong extends VolatileLong {
    public long p1, p2, p3, p4, p5, p6;
}
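PaddingLong above only pads the space after value. One way to pad both sides is to split the padding across a class hierarchy, since HotSpot normally lays out superclass fields before subclass fields; Disruptor’s Sequence class follows a similar three-level pattern. This is a sketch under that layout assumption, not Disruptor’s code, and all class names here are hypothetical.

```java
// Hypothetical names. Assumption: HotSpot lays out superclass fields before
// subclass fields, so p1..p7 precede value and p9..p15 follow it in memory.
public final class HierarchyPadding {
    static class LeftPadding {
        protected long p1, p2, p3, p4, p5, p6, p7; // 56 bytes before value
    }

    static class PaddedValue extends LeftPadding {
        protected volatile long value;
    }

    static class RightPadding extends PaddedValue {
        protected long p9, p10, p11, p12, p13, p14, p15; // 56 bytes after value
    }

    public static final class PaddedCounter extends RightPadding {
        public PaddedCounter(long initial) {
            value = initial; // volatile write
        }

        public long get() {
            return value; // volatile read
        }

        public void set(long newValue) {
            value = newValue; // volatile write
        }
    }

    public static void main(String[] args) {
        PaddedCounter c = new PaddedCounter(0);
        c.set(42);
        System.out.println(c.get()); // 42
    }
}
```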
In addition, Java 8 provides the @Contended annotation (sun.misc.Contended), which pads the annotated field or class onto its own cache line and so avoids false sharing. To use it outside the JDK’s own classes, start the JVM with -XX:-RestrictContended.