This is the first day of my participation in the November Gwen Challenge. Event details: The Last Gwen Challenge of 2021.
The Art of Multiprocessor Programming is an important book on concurrent programming. Based on my own reading and hands-on experience, I would like to share some notes with those who want a deeper understanding of the subject.
Spin locking and contention
1. The TAS and TTAS spin locks
In the previous section we implemented the TASLock and TTASLock spin locks. In TASLock, every compareAndSet call causes a broadcast on the interconnect, which delays all threads, including those not waiting for the lock. Worse, the compareAndSet call forces other processors to discard their cached copies, so every spinning thread encounters a cache miss almost every time and must fetch the new value across the bus. Worse still, when the thread holding the lock tries to release it, the release may be delayed because the interconnect is monopolized by the spinning threads. That is why TASLock performs so poorly.
Let's examine the behavior of the TTASLock when the lock is held by thread A. Thread B takes a cache miss when it reads the lock for the first time and blocks while the value is loaded into its cache. As long as A holds the lock, B keeps rereading the value and hits its cache every time. So while A holds the lock there is no bus traffic, and the spinning does not slow down other threads' memory access. In addition, A's release of the lock is not delayed by the spinning threads.
However, releasing the lock causes a bus storm: thread A writes false to the lock variable, which immediately invalidates the cached copies of all spinning threads. Each of them then takes a cache miss, rereads the new value across the bus, and tries to acquire the lock at roughly the same time.
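For reference, the TTASLock discussed here can be sketched minimally as follows, using AtomicBoolean; this is a textbook-style sketch, and the exact code from the earlier section may differ:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal TTAS (test-and-test-and-set) spin lock sketch.
public class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (true) {
            // First "test": spin on the locally cached value.
            // While the lock is held, this loop generates no bus traffic.
            while (state.get()) {
                Thread.onSpinWait();
            }
            // Then "test-and-set": only now attempt the expensive atomic write.
            if (state.compareAndSet(false, true)) {
                return;
            }
        }
    }

    public void unlock() {
        state.set(false);
    }
}
```

The inner `state.get()` loop is the extra "test" that keeps each waiter spinning on its cache copy, which is exactly why the bus stays quiet until the lock is released.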
2. Exponential Backoff
We often see the term backoff in microservice system design. When a microservice call fails and is retried, the retry usually does not happen immediately but after an interval, and the interval is not fixed: for the same request, the interval grows with the number of retries, most commonly exponentially.
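As a small illustration of that retry pattern (the class and method names here are made up for this example), the delay for each attempt can be computed as the base interval doubled per retry, capped at a maximum:

```java
// Hypothetical illustration of exponential backoff retry delays:
// delay for attempt n = min(base * 2^n, max)
public class BackoffDelays {
    static long delayForAttempt(long baseMillis, long maxMillis, int attempt) {
        long delay = baseMillis << attempt; // base * 2^attempt
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            // prints 100, 200, 400, 800, 1600, 1600 ms
            System.out.println("attempt " + attempt + " -> "
                    + delayForAttempt(100, 1600, attempt) + " ms");
        }
    }
}
```

The cap keeps a long-failing request from backing off forever, which is the same role maxDelay plays in the lock below.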
This design actually stems from low-level software adapting to the hardware. The key notion is contention: multiple threads competing for the same resource, in this case a lock. High contention means a large number of threads compete for the same lock; low contention is the opposite.
In our earlier TTASLock implementation, locking involves two main steps: repeatedly read the lock state, and when it appears free, try to acquire the lock. If a thread goes through the full sequence but fails to acquire the lock because another thread got there first, the lock is probably highly contended. Trying to grab a highly contended resource is something to avoid: the probability of success is small, but the resulting bus traffic is large. It is more efficient for the thread to back off for a while and stay out of the contention.
How long should the thread back off before retrying? A good approach is to make the backoff time grow with the number of retries, because more retries suggest higher contention. Here is a simple scheme:
- Read the lock state.
- When the lock appears free, try to acquire it.
- If acquisition fails, back off for a random period of time.
- Repeat steps 1 to 3. Each time acquisition fails, double the upper bound on the backoff in step 3, up to a fixed maximum maxDelay.
Let’s implement this lock:
```java
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {
    private final long minDelay;
    private final long maxDelay;
    private long current;

    public Backoff(long minDelay, long maxDelay) {
        this.minDelay = minDelay;
        this.maxDelay = maxDelay;
        // start at the minimum delay
        this.current = minDelay;
    }

    public void backoff() {
        // use ThreadLocalRandom to avoid contention on a shared Random
        long delay = ThreadLocalRandom.current().nextLong(1, current);
        // double the upper bound, capped at maxDelay
        current = Math.min(current * 2L, maxDelay);
        try {
            Thread.sleep(delay);
        } catch (InterruptedException e) {
            // ignore
        }
    }
}
```
```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class TTASWithBackoffLock implements Lock {
    private volatile boolean locked = false;
    private final Backoff backoff = new Backoff(10L, 100L);
    private static final VarHandle LOCKED;

    static {
        try {
            // initialize the VarHandle for the locked field
            LOCKED = MethodHandles.lookup()
                    .findVarHandle(TTASWithBackoffLock.class, "locked", boolean.class);
        } catch (Exception e) {
            throw new Error(e);
        }
    }

    @Override
    public void lock() {
        while (true) {
            // spin until the lock appears free
            while ((boolean) LOCKED.getVolatile(this)) {
                Thread.onSpinWait();
            }
            if (LOCKED.compareAndSet(this, false, true)) {
                return;
            } else {
                // failed to acquire a seemingly free lock: back off
                backoff.backoff();
            }
        }
    }

    @Override
    public void unlock() {
        LOCKED.setVolatile(this, false);
    }
}
```
Next, we use JMH to compare the performance of TTASWithBackoffLock against the previously implemented TTASLock:
```java
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// measure the time of a single complete run
@BenchmarkMode(Mode.SingleShotTime)
@Warmup(iterations = 1)
@Fork(1)
// measure 10 times
@Measurement(iterations = 10)
@State(Scope.Benchmark)
public class LockTest {
    private static class ValueHolder {
        int count = 0;
    }

    // test with different numbers of threads
    @Param(value = {"1", "2", "5", "10", "20", "50", "100"})
    private int threadsCount;

    @Benchmark
    public void testTTASWithBackoffLock(Blackhole blackhole) throws InterruptedException {
        test(new TTASWithBackoffLock());
    }

    @Benchmark
    public void testTTASLock(Blackhole blackhole) throws InterruptedException {
        test(new TTASLock());
    }

    private void test(Lock lock) throws InterruptedException {
        ValueHolder valueHolder = new ValueHolder();
        Thread[] threads = new Thread[threadsCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 5000000 / threads.length; j++) {
                    lock.lock();
                    try {
                        valueHolder.count++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
        if (valueHolder.count != 5000000) {
            throw new RuntimeException("something wrong in lock implementation");
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(LockTest.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}
```
The result:
```
Benchmark                         (threadsCount)  Mode  Cnt  Score   Error  Units
LockTest.testTTASLock                          1    ss   10  0.064 ± 0.005   s/op
LockTest.testTTASLock                          2    ss   10  0.138 ± 0.044   s/op
LockTest.testTTASLock                          5    ss   10  0.426 ± 0.100   s/op
LockTest.testTTASLock                         10    ss   10  0.699 ± 0.128   s/op
LockTest.testTTASLock                         20    ss   10  0.932 ± 0.241   s/op
LockTest.testTTASLock                         50    ss   10  1.162 ± 0.542   s/op
LockTest.testTTASLock                        100    ss   10  1.379 ± 0.939   s/op
LockTest.testTTASWithBackoffLock               1    ss   10  0.068 ± 0.008   s/op
LockTest.testTTASWithBackoffLock               2    ss   10  0.080 ± 0.023   s/op
LockTest.testTTASWithBackoffLock               5    ss   10  0.135 ± 0.037   s/op
LockTest.testTTASWithBackoffLock              10    ss   10  0.187 ± 0.072   s/op
LockTest.testTTASWithBackoffLock              20    ss   10  0.200 ± 0.063   s/op
LockTest.testTTASWithBackoffLock              50    ss   10  0.239 ± 0.052   s/op
LockTest.testTTASWithBackoffLock             100    ss   10  0.261 ± 0.042   s/op
```
As the results show, the backoff-based lock performs much better, and the gap widens as the thread count grows.
The backoff-based lock is simple to implement and clearly improves performance. However, the best minDelay and maxDelay values vary across machines and configurations, so it is hard to pick values that work well everywhere.