Bugs are the natural enemies of programmers, but they are also the ladder by which programmers grow.

Background

Some time ago, a colleague reported that loading a batch of business data into the database was very slow, and that some of the data did not match the partner's records, with occasional discrepancies. Since I enjoy difficult problems, I began to assist with the investigation.

The scene of the accident

After logging in to the log platform, I found the following logs:

com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: 
Deadlock found when trying to get lock

So this was the root cause of the slow loading and the inconsistencies with the data returned by the partner: deadlock. And it was occurring with high probability.

I began to wonder whether the transaction isolation level of the business database was misconfigured. After confirming with colleagues in the infrastructure department, the isolation level turned out to be READ COMMITTED (RC).

The code walkthrough

Walking through the code, I found that data was written with replace into, from multiple threads, because fields other than the unique index field might need to be updated on existing records. That was probably where the problem lay.

Replace into first attempts the insert; when a duplicate-key conflict occurs, it deletes the old record and inserts the new one. From the engine's point of view the operation is therefore not atomic, and it may take multiple row locks.

To simplify the business logic and illustrate the problem, here is a stripped-down DDL for the table, where field b carries the unique key:

CREATE TABLE `c` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`),
  UNIQUE KEY `idx_uk_b` (`b`)
) ENGINE=InnoDB


We use a program to simulate the crime scene, writing three rows from three threads:

@Insert({"replace into c values(NULL,3)"})
void replaceInsertTest1();

@Insert({"replace into c values(NULL,1)"})
void replaceInsertTest2();

@Insert({"replace into c values(NULL,2)"})
void replaceInsertTest3();

Simulate multithreaded writes (flag is a shared volatile boolean used to stop all threads after the first error, and mapper is the MyBatis mapper defined above):

new Thread() {
    @Override
    public void run() {
        while (flag) {
            try {
                mapper.replaceInsertTest1();
            } catch (Exception e) {
                log.error(e.getMessage(), e);
                flag = false;
            }
        }

    }
}.start();

new Thread() {
    @Override
    public void run() {
        while (flag) {
            try {
                mapper.replaceInsertTest2();
            } catch (Exception e) {
                log.error(e.getMessage(), e);
                flag = false;
            }
        }

    }
}.start();

new Thread() {
    @Override
    public void run() {
        while (flag) {
            try {
                mapper.replaceInsertTest3();
            } catch (Exception e) {
                log.error(e.getMessage(), e);
                flag = false;
            }
        }

    }
}.start();

Deadlocks appear.

Cause analysis

The duplicate-key conflict is on the unique index idx_uk_b. In this case, replace into can be broken down into the following steps:

  1. Insert the record into the clustered (primary key) index; this step succeeds.

  2. Insert into the secondary index and check for a conflict on the unique index idx_uk_b. If there is a conflict, undo the clustered-index record inserted in step 1 and go to step 3; if not, go to step 4.

  3. Handle the conflict: use idx_uk_b to locate and lock the conflicting row, delete it, and insert the new record.

  4. Insert the record directly.

The deadlock occurred in the delete + insert of step 3.

We know that MySQL introduces gap locks under the RR isolation level to solve the phantom-read problem. Under the RC isolation level, gap locks are generally not taken and locking degenerates to record locks, which is why RC improves the performance of concurrent writes.

However, in some special scenarios, gap locks can still appear under the RC isolation level.

Making sense of the problem

First of all, we need to know the following two knowledge points:

  • Locking rules for range queries on a unique index

    A range query on a unique index locks records up to the first value that does not satisfy the condition.

    This rule seems unreasonable, almost like a bug: since a unique index guarantees that values cannot be duplicated, in theory only a record lock on the matching row should be needed. In practice, however, InnoDB does access, and lock, the next record.

  • Insert intention locks

    Insert intention locks do not conflict with one another. An insert intention lock is a special kind of gap lock that exists to improve insert concurrency. Before inserting, a transaction checks whether the record after the insert position holds a lock; if it does and that lock covers the gap being inserted into (a gap lock or next-key lock), the insert conflicts and must wait. If the next record holds no such lock, the insert proceeds.
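The compatibility rule described above can be sketched as a small table. This is a simplified model for illustration only, not InnoDB's actual lock implementation; the enum names and the insertMustWait helper are my own:

```java
public class InsertIntentionSketch {

    // Kinds of locks the next record might hold. A simplified model, not
    // InnoDB's real lock table.
    enum LockType { NONE, RECORD, GAP, NEXT_KEY, INSERT_INTENTION }

    // Returns true if an insert into the gap before the next record must wait,
    // given the lock another transaction holds on that next record.
    static boolean insertMustWait(LockType heldOnNextRecord) {
        switch (heldOnNextRecord) {
            case GAP:               // the gap itself is locked
            case NEXT_KEY:          // record lock plus a lock on the preceding gap
                return true;
            case RECORD:            // a pure record lock does not cover the gap
            case INSERT_INTENTION:  // insert intention locks do not conflict with each other
            case NONE:
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(insertMustWait(LockType.NEXT_KEY)); // true
        System.out.println(insertMustWait(LockType.RECORD));   // false
    }
}
```

The key asymmetry is that a plain record lock never blocks an insert into the neighboring gap, while a gap or next-key lock does.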

In addition, according to the MySQL 5.7 Reference Manual, deleting a row takes a next-key lock on the row's index record, which also locks the gap before it and prevents phantom reads.

That is, a next-key lock is an index-record lock plus a gap lock on the gap preceding the index record. If one session has a shared or exclusive lock on record R in an index, another session cannot insert a new index record in the gap immediately before R in the index order.

A deadlock can therefore occur when two sessions each delete a conflicting record through the unique index (the unique index is a B+-tree keyed on field b) and then try to insert a record that sorts before the other session's deleted record in that index, for example a record with a smaller value of b.
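The lock-order inversion above can be sketched with plain Java locks. This is only an analogy: the two ReentrantLocks stand in for the next-key locks taken by each session's delete step, and the timed tryLock stands in for InnoDB's deadlock detection rolling back a victim:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {

    // Returns true if the lock-order inversion was detected.
    static boolean simulate() {
        ReentrantLock lockA = new ReentrantLock(); // next-key lock taken by session 1's delete
        ReentrantLock lockB = new ReentrantLock(); // next-key lock taken by session 2's delete
        AtomicBoolean detected = new AtomicBoolean(false);

        Thread t1 = new Thread(() -> session(lockA, lockB, detected));
        Thread t2 = new Thread(() -> session(lockB, lockA, detected));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return detected.get();
    }

    private static void session(ReentrantLock first, ReentrantLock second,
                                AtomicBoolean detected) {
        first.lock();              // the delete step locks its conflicting row
        try {
            Thread.sleep(100);     // give the other session time to take its own lock
            // the insert step now needs a gap covered by the other session's lock
            if (!second.tryLock(300, TimeUnit.MILLISECONDS)) {
                detected.set(true); // InnoDB would pick a victim and roll it back here
            } else {
                second.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate() ? "deadlock detected" : "no deadlock");
    }
}
```

Each thread holds its own lock while waiting for the other's, which is exactly the cycle the InnoDB deadlock detector reports.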

Conclusion

Now we know the crux of the problem: replace into updates a record via delete + insert, which changes the clustered-index value, since the auto-increment primary key moves forward on every replace. The fix is to avoid creating a brand-new row on update by using insert ... on duplicate key update ... instead. When this statement hits a unique-index conflict, it always updates the existing record in place.

insert into c values(NULL,4) on duplicate key update  b=4
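To see why the choice of statement matters, here is a toy in-memory model contrasting the two behaviors. It is an illustration of the semantics only, not InnoDB itself; the class and method names are my own:

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertModel {
    // Toy model of table c: auto-increment primary key `a`, unique key `b`.
    private long nextA = 1;                                  // next auto-increment value
    private final Map<Integer, Long> byB = new HashMap<>();  // b -> a

    // REPLACE INTO semantics: on a unique-key conflict, delete the old row
    // and insert a new one, consuming a fresh auto-increment value.
    long replaceInto(int b) {
        byB.remove(b);           // delete the conflicting row, if any
        long a = nextA++;
        byB.put(b, a);
        return a;                // primary key of the row now holding this b
    }

    // INSERT ... ON DUPLICATE KEY UPDATE semantics: on conflict, update the
    // existing row in place, keeping its primary key.
    long upsert(int b) {
        Long existing = byB.get(b);
        if (existing != null) {
            return existing;     // row updated in place, primary key unchanged
        }
        long a = nextA++;
        byB.put(b, a);
        return a;
    }

    public static void main(String[] args) {
        UpsertModel m = new UpsertModel();
        System.out.println(m.replaceInto(1)); // 1: fresh insert
        System.out.println(m.replaceInto(1)); // 2: old row deleted, new primary key
        System.out.println(m.upsert(1));      // 2: primary key preserved
    }
}
```

The replace path keeps consuming primary keys and moving rows around in the clustered index, while the upsert path leaves the existing row where it is.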

That solves the problem, and it also sets the stage for the next topic: pagination over large data sets.

I'm Brother Xiao Wang. I hope this post earns a like from your keyboard!