1 Innodb MVCC composition
1. Why Do I Use MVCC to Read Snapshots
Innodb storage engine’s snapshot reads are based on multi-version concurrent control MVCC and Undolog. The MVCC mechanism improves system read and write concurrent performance. lock in share mode, select … for update
- Ideas for improving concurrency
Concurrent data consistency is generally implemented as follows: Lock + multiple data versions To improve concurrent read performance: Use shared locks to ensure concurrent read performance, and use exclusive locks to achieve mutual read/write exclusion to improve concurrent read/write performance
1.2 MVCC related Concepts
InnoDB’s MVCC implementation is based on undo log, using rollback segments to save undo log version snapshot data, using the readView mechanism to determine data visibility, and using the Purge thread to purge old data versions
1.2.1 undolog
- Before the mysql transaction is committed, the old version data before the transaction modification will be stored in the Undo log for the impact of transaction rollback or database crash on the database data to ensure the atomicity of data.
- Undo logs are stored in
Rollback segments
In, divided into:insert undo log
和update undo log
1.2.2 readview
- Readview is primarily used for visibility judgment
- At repeatable Read isolation level, snapshot generation only creates readView snapshot after the first SELECT read operation after the transaction is started
- At the Read Committed isolation level, a Read View snapshot is created for each SELECT read in a transaction
1.2.3 Three implicit fields
field | meaning | Storage location | The size of the |
---|---|---|---|
DB_TRX_ID | Last updated transaction ID (UPDATE,delete, INSERT) | Table data row and cluster index | Six byte |
DB_ROLL_PTR | Roll back the pointer to the previous undolog record to form the undo list | Table data row and cluster index | 7 bytes |
DB_ROW_ID | Data row ID, monotonically increasing | Table data row and cluster index | Six byte |
2 Innodb MVCC implementation (5.6.x)
2.1 Multi-version implementation
2.1.1 Transaction snapshot update process
Assume that the field ID in the user table is a clustered index and the field name is a non-clustered index
- Step 1 Create a record
INSERT INTO `user`(`id`, `name`, `score`) VALUES (9.'zhang3'.60)
Copy the code
As shown in the figure:
The data row generated after insertion has three implicit fields: the row ID identifier for the should row, the latest transaction ID identifier, and the rollback pointer, along with the corresponding business property data
- Step 2 Update the cluster index
update table `user` set `score`= 70 where id=9
Copy the code
As shown in the figure:
Alter table undo log undo log alter table undo log alter table undo log Update DB_TRX_ID and DB_ROLL_PTR in the row where the cluster index is located. 5 Commit a transaction to release the lock
-
Step 3 Update the non-clustered index
update table `user` set `score`= 80 where `name` = 'zhang3' Copy the code
As shown in the figure:
Secondary index update
Alter table undo log undo log alter table undo log alter table undo log Change DB_TRX_ID to 668. 5 Insert a new index row and update the maximum trx_id on the Page. 6 Update the DB_TRX_ID and DB_TRX_ID on the cluster index row DB_ROLL_PTR 7 Commit a transaction to release the lock
-
conclusion
Through multiple updates, old version snapshots are concatenated into a linked list by rolling back Pointers
The Undolog log is also deleted by the Purge thread looking for the rollback log of the oldest transaction ID that is currently active
2.2 ReadView visibility judgment
2.2.1 Visibility judgment process
- The readView structure contains the following properties
field | meaning |
---|---|
creator_trx_id | The transaction ID for creating the view |
trx_ids | An array of active read/write transaction ids is stored in order when a ReadView is created |
low_limit_id | Set to the current maximum transaction ID |
up_limit_id | The minimum value in the m_IDS collection |
- Visibility algorithm
When a transaction is started at the RC or RR level, a readView snapshot is generated. When a data is queried by the SELECT, the trx_ID of the data is compared with the readVEiW data. The visibility algorithm is as follows:
- If you record a row
trx_id
Less thanread_view_t::up_limit_id
, the transaction was committed when the ReadView was created and must be visible - If you record a row
trx_id
Greater than or equal toread_view_t::m_low_limit_id
, the transaction was started after the readView was created and must not be visible - when
trx_id
inup_limit_id
andlow_limit_id
Between, and inread_view_t::trx_ids
Array, the transaction was active when the readView was created and its data changes are not visible to the current view, if not in the active transaction listtrx_ids
In thetrx_id
Changes to visible
2.2.2 RRR Repeatable Read process
- Data preparation
CREATE TABLE `user` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8mb4_bin NOT NULL,
`scores` int(10) NOT NULL,
`status` bigint(20) NOT NULL.PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
BEGIN;
INSERT INTO `user` VALUES (1.'wang5'.33.2);
INSERT INTO `user` VALUES (2.'zhao6'.23.2);
INSERT INTO `user` VALUES (9.'zhang3'.12.1);
COMMIT;
Copy the code
-
Procedure Step 1 Start two new transactions AB
-A transactionset session transaction isolation level repeatable read BEGIN SELECT * from user where id = 1Query the transaction ID of the current sessionselect TRX_ID from INFORMATION_SCHEMA.INNODB_TRX where TRX_MYSQL_THREAD_ID = CONNECTION_ID(); -Transaction Bset session transaction isolation level repeatable read BEGIN SELECT * from user where id = 9Query the transaction ID of the current sessionselect TRX_ID from INFORMATION_SCHEMA.INNODB_TRX where TRX_MYSQL_THREAD_ID = CONNECTION_ID(); Copy the code
After transaction A initiates transaction B, the current session transaction IDS are 710 and 711, respectively. According to the readView mechanism, the readView snapshot of transaction B is as follows
-
Step 2 Transaction A updates the submission score data
-Transaction A UPDATEuser set scores = 112 where id = 1 commit Copy the code
At this time, transaction A will generate the old version of Undolog snapshot, transaction B’s readView snapshot is still as follows:
SELECT * from user where id = 1 and trx_ids are [710, 711]
According to the visibility algorithm, the current transaction ID is found in the transaction active list, so we need to go to undolog to query the version. Trx_id = 709; up_limit_ID = up_limit_id
-
Step 3 Transaction C inserts data
-Transaction Cset session transaction isolation level repeatable read BEGIN INSERT INTO `user`(`name`, `scores`, `status`) VALUES ('li4'.55.1SQL > query transaction id of current sessionselect TRX_ID from INFORMATION_SCHEMA.INNODB_TRX where TRX_MYSQL_THREAD_ID = CONNECTION_ID(); COMMIT Copy the code
SELECT * from user where id > 8 and trx_id = 712
Two records with id 9 and 10 are queried. However, the record with ID 10 trx_id = 712 exceeds the maximum value of readView snapshot, so it is not visible. The undo log also does not have the old version record, so only the record with ID 9 can be queried
2.3 Snapshot Read Source Code Parsing
2.3.1 InnoDB three hidden field source code
-
The dict0dict.cc#dict_table_add_system_columns method is added to the table data row
/** * 添加数据行的隐藏字段 * @param table * @param heap */ void dict_table_add_system_columns( dict_table_t* table, mem_heap_t* heap) { // Add row ID, transaction ID, and rollback pointer in sequence dict_mem_table_add_col(table, heap, "DB_ROW_ID", DATA_SYS, DATA_ROW_ID | DATA_NOT_NULL, DATA_ROW_ID_LEN); dict_mem_table_add_col(table, heap, "DB_TRX_ID", DATA_SYS, DATA_TRX_ID | DATA_NOT_NULL, DATA_TRX_ID_LEN); dict_mem_table_add_col(table, heap, "DB_ROLL_PTR", DATA_SYS, DATA_ROLL_PTR | DATA_NOT_NULL, DATA_ROLL_PTR_LEN); } Copy the code
-
The cluster index adds the row0upD.cc # ROW_upD_index_entry_sys_field method
/** * Set the transaction ID or roll back the pointer to the record row corresponding to the cluster index * @param Entry Data row record * @param index Cluster index * @param Type Type DATA_TRX_ID or DATA_ROLL_PTR * @param val Updated value */
void row_upd_index_entry_sys_field(
dtuple_t* entry,
dict_index_t* index,
ulint type,
ib_uint64_t val)
{
dfield_t* dfield;
byte* field;
ulint pos;
// Query the initial position of the index hidden field
pos = dict_index_get_sys_col_pos(index, type);
// Get a pointer to the corresponding data record row
dfield = dtuple_get_nth_field(entry, pos);
field = static_cast<byte*>(dfield_get_data(dfield));
// Add type
if (type == DATA_TRX_ID) {
trx_write_trx_id(field, val);
} else {
ut_ad(type == DATA_ROLL_PTR);
trx_write_roll_ptr(field, val); }}Copy the code
Additional information is stored on the cluster index, the 6-byte DB_TRX_ID field, which represents the transaction ID of the record that was last inserted or updated. The 7-byte DB_ROLL_PTR field points to the undo log of the rollback segment. The 6-byte DB_ROW_ID is automatically incremented when new data is inserted. InnoDB automatically generates a clustered index containing the DB_ROW_ID field if the table does not have a primary key.
2.3.2 ReadView execution process
-
Readview data structure
struct read_view_t{ / / equal low_limit_id trx_id_t low_limit_no; // Maximum transaction ID trx_id_t low_limit_id; // Minimum transaction ID trx_id_t up_limit_id; // Number of active transactions ulint n_trx_ids; // Active transaction trx_id_t* trx_ids; // The transaction ID of the snapshot created trx_id_t creator_trx_id; UT_LIST_NODE_T(read_view_t) view_list; }; Copy the code
-
The row0sel.cc#row_search_for_mysql method is used to determine the current read or snapshot read judgment process
// If locked, the current read if(prebuilt->select_lock_type ! = LOCK_NONE) {/ / to omit }else { // Snapshot read without lock if (trx->isolation_level == TRX_ISO_READ_UNCOMMITTED) { // Read the uncommitted direct latest record row } else if (index == clust_index) { // If the current index is clustered index // Direct visibility judgment if (UNIV_LIKELY(srv_force_recovery < 5) &&!lock_clust_rec_cons_read_sees( rec, index, offsets, trx->read_view)) { rec_t* old_vers; // If not, use undo log to get the version on the record line err = row_sel_build_prev_vers_for_mysql( trx->read_view, clust_index, prebuilt, rec, &offsets, &heap, &old_vers, &mtr); if(err ! = DB_SUCCESS) {goto lock_wait_or_error; } if (old_vers == NULL) { goto next_rec; } // Assign to the old versionrec = old_vers; }}else { // Non-clustered index visibility judgment ut_ad(!dict_index_is_clust(index)); DB_TRX_ID = DB_TRX_ID = DB_TRX_ID = DB_TRX_ID = DB_TRX_ID = DB_TRX_ID // Secondary index does not contain hidden columns, only the maximum trx_id of the page if (!lock_sec_rec_cons_read_sees( rec, trx->read_view)) { switch (row_search_idx_cond_check( buf, prebuilt, rec, offsets)) { // If the ICP does not meet the conditions and the ultra-scan range is not reached, the next record is obtained to continue the search case ICP_NO_MATCH: goto next_rec; case ICP_OUT_OF_RANGE: // If the condition is not met and the hyperscan scope is not met err = DB_RECORD_NOT_FOUND; goto idx_cond_failed; case ICP_MATCH: // If the ICP matches the record, the cluster index is checked to determine the visibility gotorequires_clust_rec; } ut_error; }}}Copy the code
For non-clustered indexes, because there are no hidden columns in the clustered index, when snapshot reads match the secondary index, lock_sec_rec_cons_read_sees is called to determine whether the trx_id of the last change recorded on the page is smaller than up_limit_id. If the value is smaller than the specified value, the data on the page is visible. Otherwise, the system checks the row in the cluster index and determines the visibility based on the DB_TRX_ID value to find the correct visible version data
- Readview start row0sel.cc#row_search_for_mysql
→
Trx_assign_read_view ` method
trx_assign_read_view(
trx_t* trx)
{
ut_ad(trx->state == TRX_STATE_ACTIVE);
// Return if there is one
if(trx->read_view ! =NULL) {
return(trx->read_view);
}
// If no, create one
if(! trx->read_view) { trx->read_view =read_view_open_now(
trx->id, trx->global_read_view_heap);
trx->global_read_view = trx->read_view;
}
// Returns a snapshot
return(trx->read_view);
}
Copy the code
Trx_assign_read_view → read_view_open_now → read_view_open_now_low
- The closing of the readview
ha_innodb.cc#ha_innobase::external_lock
methods
if (trx->n_mysql_tables_in_use == 0) {
trx->mysql_n_tables_locked = 0;
prebuilt->used_in_HANDLER = FALSE;
// autocommit=1, commit transaction
if (!thd_test_options(
thd, OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)) {
Commit the transaction or mark the end of the SQL statement
if (trx_is_started(trx)) {
innobase_commit(ht, thd, TRUE);
}
// The transaction isolation level is lower than or equal to the RC level
} else if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
&& trx->global_read_view) {
// Close the snapshot
read_view_close_for_mysql(trx); }}/** * transaction isolation level less than or equal to RC level call * @param TRX */
void read_view_close_for_mysql(
trx_t* trx)
{
ut_a(trx->global_read_view);
read_view_remove(trx->global_read_view, false);
mem_heap_empty(trx->global_read_view_heap);
// Snapshot is set to NULL
trx->read_view = NULL;
trx->global_read_view = NULL;
}
Copy the code
RR isolation level: Snapshots are created only after the first SELECT read after the transaction is started and are not deleted until the transaction is committed. For RC isolation level, readView snapshots are created for each SELECT read in the transaction
- Transaction visibility judgment
lock_clust_rec_cons_read_sees
→read_view_sees_trx_id
/** * visibility check process * @param view current transaction readView snapshot * @param trx_id Data row corresponding transaction ID * @return */
bool read_view_sees_trx_id(
const read_view_t* view,
trx_id_t trx_id)
{
// If less than the minimum id of the current transaction
if (trx_id < view->up_limit_id) {
return(true);
// If greater than or equal to the maximum ID of the current transaction snapshot
} else if (trx_id >= view->low_limit_id) {
return(false);
} else {
// If in between
ulint lower = 0;
ulint upper = view->n_trx_ids - 1;
ut_a(view->n_trx_ids > 0);
// Based on the currently active transaction array, find and compare whether trx_id exists in it by dichotomy
do {
ulint mid = (lower + upper) >> 1;
trx_id_t mid_id = view->trx_ids[mid];
if (mid_id == trx_id) {
return(FALSE);
} else if (mid_id < trx_id) {
if (mid > 0) {
upper = mid - 1;
} else {
break; }}else {
lower = mid + 1; }}while (lower <= upper);
}
// The current row is visible if it is not in an active transaction
return(true);
}
Copy the code
3 Reference Materials
Database multi-version implementation insider
Engine features · InnoDB MVCC related implementation
Recommended reading
Guava Cache actual Combat – From scenario to principle analysis
Details of HTTP2.0 and HTTPS protocols
, recruiting
Zhengcaiyun Technology team (Zero) is a passionate, creative and executive team based in picturesque Hangzhou. The team has more than 300 r&d partners, including “old” soldiers from Alibaba, Huawei and NetEase, as well as newcomers from Zhejiang University, University of Science and Technology of China, Hangzhou Electric And other universities. Team in the day-to-day business development, but also in cloud native, chain blocks, artificial intelligence, low code platform system, middleware, data, material, engineering platform, the performance experience, visualization technology areas such as exploration and practice, to promote and fell to the ground a series of internal technical products, continue to explore new frontiers of technology. In addition, the team is involved in community building, Currently, There are Google Flutter, SciKit-Learn, Apache Dubbo, Apache Rocketmq, Apache Pulsar, CNCF Dapr, Apache DolphinScheduler, and Alibaba Seata and many other contributors to the excellent open source community. If you want to change something that’s been bothering you, want to start bothering you. If you want to change, you’ve been told you need more ideas, but you don’t have a solution. If you want change, you have the power to make it happen, but you don’t need it. If you want to change what you want to accomplish, you need a team to support you, but you don’t have the position to lead people. If you want to change the original savvy is good, but there is always a layer of fuzzy window…… If you believe in the power of believing, believing that ordinary people can achieve extraordinary things, believing that you can meet a better version of yourself. If you want to be a part of the process of growing a technology team with deep business understanding, sound technology systems, technology value creation, and impact spillover as your business takes off, I think we should talk. Any time, waiting for you to write something and send it to [email protected]
Wechat official account
The article is published synchronously, the public number of political cloud technology team, welcome to pay attention to