1. Background

I’ve been looking into some thread-concurrency problems in Seata recently and want to share the experience. To briefly describe the problem: Seata, a distributed transaction framework, has the concept of a global transaction. In most cases a global transaction flows through its states sequentially and there is no concurrency problem, but in some extreme cases multiple threads can operate on the same global transaction and cause us to handle it incorrectly. During the global transaction commit phase we have the following code:

    if (status == GlobalStatus.Begin) {
        globalSession.changeStatus(GlobalStatus.Committing);
    }

Part of the code is omitted here; it simply checks whether the status is Begin and, if so, changes it to Committing.

During the global transaction rollback phase, we have the following code:

    if (status == GlobalStatus.Begin) {
        globalSession.changeStatus(GlobalStatus.Rollbacking);
    }

Again, part of the code is omitted; it checks whether the status is Begin and then changes it to Rollbacking. There is no thread synchronization around either piece of code in Seata, and if the two paths execute at the same time (which usually does not happen, but can in extreme cases), the result can be unpredictably wrong. What we have to do is solve the concurrency problem in this extreme case.
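
To make the race concrete, here is a minimal, self-contained sketch (the class and field names are illustrative, not Seata’s actual ones). Both threads perform the same unsynchronized check-then-act on a shared status, so under an unlucky interleaving both of them can observe Begin, and the final state is simply whichever write lands last:

    import java.util.concurrent.CountDownLatch;

    public class StatusRaceDemo {
        enum Status { BEGIN, COMMITTING, ROLLBACKING }

        static volatile Status status = Status.BEGIN;

        public static void main(String[] args) throws InterruptedException {
            CountDownLatch start = new CountDownLatch(1);
            Thread committer = new Thread(() -> {
                awaitQuietly(start);
                if (status == Status.BEGIN) {       // check
                    status = Status.COMMITTING;     // act, not atomic with the check
                }
            });
            Thread rollbacker = new Thread(() -> {
                awaitQuietly(start);
                if (status == Status.BEGIN) {
                    status = Status.ROLLBACKING;
                }
            });
            committer.start();
            rollbacker.start();
            start.countDown();                      // release both threads at once
            committer.join();
            rollbacker.join();
            // Depending on the interleaving, both threads may have seen BEGIN,
            // which is exactly the inconsistency described above.
            System.out.println("final status = " + status);
        }

        private static void awaitQuietly(CountDownLatch latch) {
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }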

2. Pessimistic locking

For this kind of concurrency problem, the first thing most of us think of is locking. In Java, we generally lock in one of two ways:

  • synchronized
  • ReentrantLock

Locking can be done with either synchronized or ReentrantLock, and the rollback logic can be rewritten as follows:

synchronized:

    synchronized (globalSession) {
        if (status == GlobalStatus.Begin) {
            globalSession.changeStatus(GlobalStatus.Rollbacking);
        }
    }

ReentrantLock:

    reentrantLock.lock();
    try {
        if (status == GlobalStatus.Begin) {
            globalSession.changeStatus(GlobalStatus.Rollbacking);
        }
    } finally {
        reentrantLock.unlock();
    }

This kind of locking is simple and is what Seata’s Go server currently implements. However, it ignores the point made above: perhaps 99.9% of the time there is no contention at all, and only 0.1% of the time does the concurrency problem occur. Even though a pessimistic lock is held only for a short time, that cost is still too much for high-performance middleware, so we introduced optimistic locking.

3. Optimistic locking

When optimistic locking is mentioned, many readers will think of optimistic locking in a database. Imagine implementing the logic above in the database without optimistic locking; we would have pseudo-code like this:

select * from table where id = xxx for update;
if(status == begin){
    //do other thing
    update table set status = rollbacking;
}

The above code, which can be seen in much of our business logic, has two small problems:

  1. The transaction is large. Because we lock the data up front, the query and the update have to sit in one transaction, and any time-consuming logic between them makes that transaction long-lived. Since every transaction occupies a database connection, it is easy to exhaust the connection pool under heavy traffic.
  2. The data is locked for a long time. The row stays locked for the whole transaction, so any other transaction that wants to modify it has to block and wait.

So, to solve these problems in the many scenarios where contention is rare, we use optimistic locking instead. We add a version column to the table to represent the version number, and the code becomes:

select * from table where id = xxx;
if (status == begin) {
    // do other things
    int result = (update table set status = rollbacking, version = version + 1
                  where id = xxx and version = xxx);
    if (result == 0) {
        throw new someException();
    }
}

If the update affects zero rows, another transaction has already changed the row; we can throw an exception here or take some other action.
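
To make the same pattern concrete in Java, here is a minimal JDBC sketch of the version-checked update. The table and column names (global_table, status, version) and the helper class are assumptions made up for illustration, not Seata’s actual schema or API:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Hypothetical helper, for illustration only.
    class OptimisticStatusUpdater {

        /** Returns true if we won the race; false if another transaction bumped the version first. */
        boolean markRollbacking(Connection conn, long id, int expectedVersion) throws SQLException {
            String sql = "UPDATE global_table SET status = 'ROLLBACKING', version = version + 1 "
                       + "WHERE id = ? AND version = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, id);
                ps.setInt(2, expectedVersion);
                return ps.executeUpdate() == 1;   // 0 rows updated means we lost the race
            }
        }
    }

A caller that gets false back can either give up with an exception, as in the pseudo-code above, or re-read the row and retry.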

From this we can see that optimistic locking solves both problems, the large transaction and the long lock time, but the price is that when the update fails we have to throw an exception or take some other remedial action, whereas a pessimistic lock stops the conflict before the work is even done. So optimistic locking should only be used where contention on a given piece of data is low.

3.1 Optimistic locking in code

We have talked about optimistic locking in a database, and many people will ask: how do we implement optimistic locking in plain code, without a database? Those familiar with synchronized will know that JDK 1.6 optimized it by introducing a lock-inflation model:

  • Biased lock: the lock is biased toward a single thread; suitable when one thread holds the lock for a long time.
  • Lightweight lock: if the biased lock fails, CAS spinning is used instead; suitable when threads enter the critical section alternately.
  • Heavyweight lock: if spinning fails, the heavyweight strategy is used and the thread blocks and is suspended.

In this tiered model, the lightweight-lock case, threads entering the critical section alternately, matches our scenario: a global transaction is generally not processed by a single thread from start to finish (it could be optimized into that model, but the design would be more complicated); most of the time different threads take turns entering the transaction logic. So we can borrow the lightweight lock’s idea of CAS spinning to build a code-level spin lock of our own. You might ask why we do not simply use synchronized. In our measurements, a hand-rolled CAS spin lock performed best when threads alternate through the critical section, and synchronized has no timeout mechanism, which makes it awkward to handle abnormal situations.

    class GlobalSessionSpinLock {

        private AtomicBoolean globalSessionSpinLock = new AtomicBoolean(true);

        public void lock() throws TransactionException {
            boolean flag;
            do {
                flag = this.globalSessionSpinLock.compareAndSet(true, false);
            } while (!flag);
        }

        public void unlock() {
            this.globalSessionSpinLock.compareAndSet(false, true);
        }
    }

    // method rollback
    void rollback() throws TransactionException {
        globalSessionSpinLock.lock();
        try {
            if (status == GlobalStatus.Begin) {
                globalSession.changeStatus(GlobalStatus.Rollbacking);
            }
        } finally {
            globalSessionSpinLock.unlock();
        }
    }

Above, we implemented a simple optimistic lock with CAS. It has one small drawback: once there is contention it cannot inflate into a pessimistic lock that blocks and waits, and it has no expiration or timeout, so it may burn a lot of CPU. So we kept optimizing:

    public void lock() throws TransactionException {
        boolean flag;
        int times = 1;
        long beginTime = System.currentTimeMillis();
        long restTime = GLOBAL_SESSION_LOCK_TIME_OUT_MILLS;
        do {
            restTime -= (System.currentTimeMillis() - beginTime);
            if (restTime <= 0) {
                throw new TransactionException(TransactionExceptionCode.FailedLockGlobalTranscation);
            }
            // Pause every PARK_TIMES_BASE times, yield the CPU
            if (times % PARK_TIMES_BASE == 0) {
                // Exponential backoff
                long backOffTime = PARK_TIMES_BASE_NANOS << (times / PARK_TIMES_BASE);
                long parkTime = backOffTime < restTime ? backOffTime : restTime;
                LockSupport.parkNanos(parkTime);
            }
            flag = this.globalSessionSpinLock.compareAndSet(true, false);
            times++;
        } while (!flag);
    }

The code above has the following optimizations:

  • A timeout mechanism is introduced. A lock around a critical section should generally have a timeout, especially in middleware with high performance requirements.
  • A form of lock inflation is introduced. If the lock has not been acquired after a certain number of spins, the thread is parked for parkTime; if it still cannot get the lock, the park time grows with exponential backoff until the timeout is reached (a usage sketch follows below).
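
For completeness, here is a rough sketch of how a caller might use the timed spin lock. The surrounding method is illustrative, not Seata’s actual rollback implementation; the point is only that the timeout surfaces as a TransactionException that the caller has to handle instead of blocking forever:

    // Illustrative usage sketch of GlobalSessionSpinLock with the timeout in place.
    void rollbackWithTimeout() {
        try {
            // May throw TransactionException if the lock cannot be acquired in time.
            globalSessionSpinLock.lock();
            try {
                if (status == GlobalStatus.Begin) {
                    globalSession.changeStatus(GlobalStatus.Rollbacking);
                }
            } finally {
                globalSessionSpinLock.unlock();
            }
        } catch (TransactionException e) {
            // Timed out: another thread held the session lock for too long.
            // Report the failure (or schedule a retry) rather than spinning forever.
        }
    }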

Conclusion

When it comes to concurrency control, there are many ways to reach the same goal; we need to choose the method that fits the scenario and its constraints, and pick the most efficient one. If you remember only one thing from this article, remember to consider optimistic locking when you implement thread-safe code.

Finally, this article is included in the JGrowing concurrent programming section, a comprehensive and excellent Java learning path built by the community. If you want to take part in maintaining an open source project, you are welcome to build it together. The GitHub address is github.com/javagrowing… A little star, please.

If you find this article helpful, your follows and shares are the biggest support you can give me.