1. Problem description

After the upgrade from MySQL 5.6 to 5.7, Duplicate key errors started occurring on both the master and the read-only (RO) instances.

For example, running "show create table" on one of the affected tables shows an AUTO_INCREMENT value of 1758598 after the migration, while the actual maximum value of the auto-increment column in the migrated table is 1758609.
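A minimal illustration of the symptom (the table name t and column id are placeholders; the numbers are the ones reported above):

    SHOW CREATE TABLE t;      -- definition contains ... AUTO_INCREMENT=1758598 ...
    SELECT MAX(id) FROM t;    -- returns 1758609, larger than the recorded counter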

The user's tables use the InnoDB engine. According to the operations colleagues, similar problems had been encountered before and could be worked around by restarting the instance.

2. Troubleshooting kernel problems

Because the user reported that access was normal on 5.6 and the errors appeared only after switching to 5.7, the first suspicion was a problem in the 5.7 kernel itself. The natural first step was therefore to search the official bug list for a similar issue, to avoid repeating work that had already been done. The search did turn up a similar official bug, briefly introduced below.

Background 1

Auto-increment parameters and data structures in the InnoDB engine

The main parameters are: innodb_autoinc_lock_mode, which controls the locking mode used when obtaining auto-increment values, and auto_increment_increment and auto_increment_offset, which control the step size and the starting offset of the auto-increment column.
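For reference, these settings can be checked directly on an instance (a generic check, not specific to this incident):

    SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
    SHOW VARIABLES LIKE 'auto_increment_increment';
    SHOW VARIABLES LIKE 'auto_increment_offset';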

The main structures involved are: the data dictionary structure (dict_table_t), which holds the table's current autoincrement value and the lock protecting it; the transaction structure, which tracks the number of rows processed within a transaction; and the handler structure, which holds the state for iterating over multiple rows within a transaction.

Background 2

MySQL and InnoDB autoincrement access and modification flow

(1) Saving and restoring the autoincrement value when the data dictionary structure (dict_table_t) is swapped in and out of memory: when dict_table_t is evicted, its autoincrement value is saved to a global mapping table; when the structure is rebuilt, the value is restored by looking it up in that global mapping table. The related functions are dict_table_add_to_cache and dict_table_remove_from_cache_low.

(2) The row import and table truncate paths also update the autoincrement value.

(3) When the handler opens the table for the first time, it queries the maximum value of the auto-increment column in the table and sets the autoincrement value to that maximum plus 1.

(4) The insert path. The call stack that changes the autoinc value is as follows:

    ha_innobase::write_row
      handler::update_auto_increment    // calls the InnoDB interface to obtain an autoincrement value and
                                        // adjusts it according to the current auto_increment variables; also
                                        // sets the value of the next autoincrement column to be handled by
                                        // the current handler
        ha_innobase::get_auto_increment // obtains the current autoincrement value from dict_table_t
        dict_table_autoinc_initialize   // updates the autoincrement value: if the specified value is greater
                                        // than the current value, it is updated
      handler::set_next_insert_id       // sets the autoincrement value of the next row to be processed in
                                        // the current transaction

(5) The update_row path. For an "INSERT INTO t (c1, c2) VALUES (x, y) ON DUPLICATE KEY UPDATE" statement, the autoincrement value is advanced regardless of whether the row hit through the unique index already exists (a small sketch follows the code excerpt below).

The relevant code is as follows:

    if (error == DB_SUCCESS
        && table->next_number_field
        && new_row == table->record[0]
        && thd_sql_command(m_user_thd) == SQLCOM_INSERT
        && trx->duplicates) {
        ulonglong auto_inc;
        ...
        auto_inc = table->next_number_field->val_int();
        auto_inc = innobase_next_autoinc(auto_inc, 1, increment, offset, col_max_value);
        error = innobase_set_max_autoinc(auto_inc);
        ...
    }
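The effect described in point (5) can be observed directly on 5.7. A minimal sketch (table and column names are made up for illustration):

    CREATE TABLE t1 (
        id INT AUTO_INCREMENT PRIMARY KEY,
        c1 INT NOT NULL,
        c2 INT,
        UNIQUE KEY uk_c1 (c1)
    ) ENGINE = InnoDB;

    INSERT INTO t1 (c1, c2) VALUES (1, 10);          -- inserts id = 1, counter moves to 2
    INSERT INTO t1 (c1, c2) VALUES (1, 20)
        ON DUPLICATE KEY UPDATE c2 = VALUES(c2);     -- hits uk_c1, the existing row is updated,
                                                     -- yet the counter is still advanced
    SHOW CREATE TABLE t1;                            -- shows AUTO_INCREMENT=3 although max(id) is still 1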

From the perspective of the user's actual workload, the errors could only involve the insert and update paths.

Bug #76872 / #88321: "InnoDB AUTO_INCREMENT produces same value twice"

(1) Bug overview: if innodb_autoinc_lock_mode is greater than 0 and auto_increment_increment is greater than 1, duplicate key errors can occur when several threads insert into a table after the server restarts.

(2) Cause analysis: InnoDB sets the autoincrement value to max(id) + 1 after a restart. On the first insert, the write_row path calls handler::update_auto_increment to set up the autoinc information.

It first calls ha_innobase::get_auto_increment to obtain the current autoincrement value (max(id) + 1), and then computes the next autoincrement value, next_id, according to the auto_increment parameters. When auto_increment_increment is greater than 1, next_id is generally larger than max(id) + 1.

After handler::update_auto_increment obtains the value returned by the engine layer, it recalculates the auto-increment value of the current row based on the auto_increment parameters. Because the handler layer does take the global parameters into account, the value it computes is next_id, so a row with auto-increment value next_id is inserted.

At the end of write_row, the handler layer sets the table's next autoincrement value based on the current row's value next_id. If another thread is also inserting during the window before write_row has set the table's next autoincrement value, it will obtain the same next_id as its auto-increment value. This is how the duplicate arises.

(3) Solution: take the global auto_increment parameters into account when the engine obtains the autoincrement value internally, so that the value obtained by the first insert thread after a restart is not max(id) + 1 but next_id, and the next autoincrement value is then set based on next_id. Since this process is protected by a lock, no other thread can obtain the same value when it fetches an autoincrement value.

Based on the above analysis, this bug occurs only when innodb_autoinc_lock_mode > 0 and auto_increment_increment > 1. On the affected production instances both parameters are set to 1, so this bug can be ruled out as the cause of the online problem.
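The rule-out can be confirmed with a one-line check on the live instance:

    SELECT @@innodb_autoinc_lock_mode, @@auto_increment_increment;
    -- both return 1 here, so the preconditions of bug #76872 / #88321 are not met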

3. On-site analysis and reproducibility verification

Since the official bug did not match our problem, we had to start from the error phenomenon itself.

(1) First, for each table that reported the error, we collected the current max(id) of the auto-increment column, the table's AUTO_INCREMENT value, and the last update time of the rows concerned, to see whether there was any pattern. The captured information is as follows:

The update time column is the last time the rows with these auto-increment values were inserted or modified.
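A sketch of how this kind of information can be gathered (db, t, id and gmt_modified are placeholders for the user's actual schema):

    SELECT MAX(id) FROM db.t;                            -- real maximum of the auto-increment column
    SELECT AUTO_INCREMENT, UPDATE_TIME
      FROM information_schema.TABLES
     WHERE TABLE_SCHEMA = 'db' AND TABLE_NAME = 't';     -- counter recorded for the table
    SELECT id, gmt_modified
      FROM db.t
     ORDER BY id DESC LIMIT 20;                          -- when the largest ids were last touched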

According to the description of AUTO_INCREMENT handling in the official documentation, an UPDATE statement can change the value stored in an auto-increment column but does not advance the AUTO_INCREMENT counter. Following this idea, I tried to reproduce the user's scenario. The reproduction method is as follows:
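A minimal reproduction sketch on 5.7 (the table t2 and its values are illustrative):

    CREATE TABLE t2 (id INT AUTO_INCREMENT PRIMARY KEY, c INT) ENGINE = InnoDB;
    INSERT INTO t2 (c) VALUES (1), (2), (3);   -- ids 1..3, counter is now 4
    UPDATE t2 SET id = 10 WHERE id = 3;        -- max(id) becomes 10, the counter stays at 4
    SHOW CREATE TABLE t2;                      -- still shows AUTO_INCREMENT=4
    INSERT INTO t2 (c) VALUES (4), (5), (6),
                              (7), (8), (9), (10);
    -- ids 4..9 are allocated normally; when the counter reaches 10 the statement fails with
    -- "Duplicate entry '10' for key 'PRIMARY'", reproducing the user's error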

Also, in the binlog we can see that there is an update operation, as shown in the figure:

However, since the binlog is in ROW format, it is impossible to tell whether this update was caused by a kernel problem or by the user's own update statement. We therefore contacted the customer for confirmation, and the customer was quite sure that they had never updated the auto-increment column. So where did these updated auto-increment values come from?

(2) After analyzing the user's tables and SQL statements, we found three types of tables (hz_notice_stat_sharding, hz_group_stat_sharding, and hz_freeze_balance_sharding), all of which have auto-increment primary keys.

The hz_freeze_balance_sharding table, however, did not hit the autoinc error. Could it be that the user accesses these tables differently?

Capturing the user's SQL statements confirmed it: the first two tables are written with replace into, while the last one is only modified with update statements. Could the replace into statement be causing the problem?

Searching the official bug list again turned up another suspected bug.

Bug #87861: “Replace into causes master/slave have different auto_increment offset values”

The reason:

(1) MySQL actually executes replace into as delete + insert, but under ROW binlog format it is recorded as an update-type event in the binlog. An insert statement updates the autoincrement value along the way; an update statement does not.

(2) replace into is executed as delete + insert on the master, so the autoincrement value there stays correct. After the change is replicated to the slave in ROW format, the slave replays an update operation, which only changes the auto-increment column value in the row and does not advance the autoincrement counter. As a result, max(id) becomes larger than the autoincrement value on the slave. Because ROW-format binlog records all column values for insert operations, replay on the slave does not re-allocate auto-increment ids, so no error is reported there. However, once the slave is promoted to master, insert operations hit the Duplicate key error (see the sketch after this list).

(3) Because the user migrated from 5.6 to 5.7 and then wrote directly on the 5.7 instance, this is equivalent to a slave-to-master switchover, so the error appears.
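A hedged sketch of the mechanism, using a made-up table in the style of the user's sharded tables (hz_demo, uk and val are illustrative names):

    CREATE TABLE hz_demo (
        id  BIGINT AUTO_INCREMENT PRIMARY KEY,
        uk  VARCHAR(32) NOT NULL,
        val INT,
        UNIQUE KEY uk_idx (uk)
    ) ENGINE = InnoDB;

    INSERT INTO hz_demo (uk, val) VALUES ('a', 1);   -- id = 1, master counter moves to 2
    REPLACE INTO hz_demo (uk, val) VALUES ('a', 2);  -- on the master the row ends up with id = 2
                                                     -- and the counter moves to 3
    -- With binlog_format = ROW this replace is logged as an update-type event, so a 5.7
    -- slave replaying it only rewrites id from 1 to 2 and leaves its own counter at 2.
    -- max(id) on the slave is now 2 while its counter is still 2, so the first INSERT after
    -- the slave is promoted allocates id = 2 and fails with
    -- "Duplicate entry '2' for key 'PRIMARY'".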

4. Solutions

Possible solutions on the business side:

(1) Change the binlog format to mixed or statement.

(2) Use insert ... on duplicate key update instead of replace into (a sketch follows below).
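A sketch of both business-side changes (hz_demo follows the hypothetical table above):

    -- Solution (1): switch the binlog format (requires the corresponding privileges and a
    -- careful evaluation of the replication setup)
    SET GLOBAL binlog_format = 'MIXED';

    -- Solution (2): rewrite the replace into
    -- before: REPLACE INTO hz_demo (uk, val) VALUES ('a', 2);
    INSERT INTO hz_demo (uk, val) VALUES ('a', 2)
        ON DUPLICATE KEY UPDATE val = VALUES(val);
    -- the existing row keeps its original id, so max(id) on the slave never jumps past its
    -- autoincrement counter and the duplicate key problem does not arise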

Possible solutions on the kernel side:

(1) When a replace into statement is encountered while the binlog is in ROW format, record a statement-format log event, i.e. write the original statement into the binlog.

(2) In ROW format, record the log events of a replace into statement as a delete event plus an insert event.

5. Notes

(1) Changing the innodb_autoinc_lock_mode and auto_increment_increment parameters can easily lead to duplicate keys, so avoid modifying them dynamically while auto-increment columns are in use.

(2) When an online problem occurs, do on-site analysis first: clarify the failure scenario, the user's SQL statements, the scope of the failure, and other details. At the same time, back up the instance's configuration, binlog, and even the instance data, so that they do not expire or get lost. Only with this information can we accurately match scenarios when searching for official bugs; and if no official bug matches, the preserved clues still allow an independent analysis.

The original link: cloud.tencent.com/developer/a…

