Preface
When designing tables in MySQL, it is recommended not to use UUIDs or non-consecutive, non-repeating snowflake IDs (long and unique) as the primary key. A continuously increasing AUTO_INCREMENT primary key ID is recommended instead.
In this blog we will analyze this problem and explore the internal causes.
This post covers a MySQL program example, a comparison of the index structures built with UUID and auto-increment IDs, and a summary.
1. MySQL and program examples
1.1. To illustrate this problem, let’s first create three tables
user_key_auto (auto-increment key), user_uuid (UUID key) and user_random_key (random snowflake key)
Each table uses a different primary key strategy, while all other fields are identical. We then test the insert speed and query speed of each table:
Note: the random key here refers to an ID calculated by the snowflake algorithm, an 18-digit long value.
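The snowflake generator itself is not part of the test code. As an illustration only, here is a minimal Java sketch of a snowflake-style generator, assuming the commonly cited 64-bit layout: a 41-bit millisecond timestamp relative to a custom epoch, a 10-bit worker ID and a 12-bit per-millisecond sequence. The class name `SnowflakeSketch`, the 2020-01-01 epoch and the worker ID are hypothetical choices, not the generator behind `user_random_key`.

```java
/**
 * Minimal snowflake-style ID generator (illustrative sketch, not production code).
 * Assumed layout: 41-bit timestamp | 10-bit worker ID | 12-bit sequence.
 */
class SnowflakeSketch {
    // Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
    private static final long EPOCH_MS = 1577836800000L;
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            // Same millisecond: bump the sequence; on overflow, spin until the next millisecond.
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        // Timestamp in the high bits, then worker ID, then sequence.
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp sits in the high bits, an epoch a few years in the past yields a long of roughly 18 to 19 decimal digits, and IDs from one generator increase over time. They are not consecutive, though, and IDs from several workers interleave.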
1.2. Theory alone is not enough, so we use Spring's JdbcTemplate to run the insert and query tests directly:
Technical stack: Spring Boot + JdbcTemplate + JUnit + Hutool. The program connects to a dedicated test database, writes the same amount of data to each table in the same environment, and compares the insert times to judge efficiency. To make the test as realistic as possible, all data, such as names, emails and addresses, is randomly generated.
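As a minimal sketch of the random data generation described above, the class below builds random user rows using only `java.util` rather than Hutool, so that it runs standalone. The class `RandomUserRows`, its field types and the city list are hypothetical; they only mirror the column names used in the INSERT statements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator of random user rows for the insert test. */
class RandomUserRows {
    /** One row with the non-key columns shared by all three test tables. */
    static final class UserRow {
        final long userId;
        final String userName;
        final int sex;
        final String address;
        final String city;
        final String email;
        final int state;

        UserRow(long userId, String userName, int sex, String address,
                String city, String email, int state) {
            this.userId = userId;
            this.userName = userName;
            this.sex = sex;
            this.address = address;
            this.city = city;
            this.email = email;
            this.state = state;
        }
    }

    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";
    private static final String[] CITIES = {"Beijing", "Shanghai", "Guangzhou", "Shenzhen"};

    private static String randomWord(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(LETTERS.charAt(rnd.nextInt(LETTERS.length())));
        }
        return sb.toString();
    }

    /** The same seed yields the same rows, so every table can receive identical data. */
    static List<UserRow> generate(int count, long seed) {
        Random rnd = new Random(seed);
        List<UserRow> rows = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String name = randomWord(rnd, 8);
            rows.add(new UserRow(
                    1 + rnd.nextInt(1_000_000),
                    name,
                    rnd.nextInt(2),
                    (1 + rnd.nextInt(999)) + " " + randomWord(rnd, 10) + " Road",
                    CITIES[rnd.nextInt(CITIES.length)],
                    name + "@" + randomWord(rnd, 6) + ".com",
                    rnd.nextInt(2)));
        }
        return rows;
    }
}
```

The fixed seed is a design choice of this sketch: it gives all three tables identical data, which keeps the comparison fair, whereas the original helpers may generate fresh data for each table.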
```java
package com.wyq.mysqldemo;

import cn.hutool.core.collection.CollectionUtil;
import com.wyq.mysqldemo.databaseobject.UserKeyAuto;
import com.wyq.mysqldemo.databaseobject.UserKeyRandom;
import com.wyq.mysqldemo.databaseobject.UserKeyUUID;
import com.wyq.mysqldemo.diffkeytest.AutoKeyTableService;
import com.wyq.mysqldemo.diffkeytest.RandomKeyTableService;
import com.wyq.mysqldemo.diffkeytest.UUIDKeyTableService;
import com.wyq.mysqldemo.util.JdbcTemplateService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.StopWatch;

import java.util.List;

@SpringBootTest
class MysqlDemoApplicationTests {

    @Autowired
    private JdbcTemplateService jdbcTemplateService;
    @Autowired
    private AutoKeyTableService autoKeyTableService;
    @Autowired
    private UUIDKeyTableService uuidKeyTableService;
    @Autowired
    private RandomKeyTableService randomKeyTableService;

    @Test
    void testDBTime() {
        StopWatch stopWatch = new StopWatch("SQL");

        /* auto_increment key: the id column is omitted, MySQL assigns it */
        final String insertSql = "INSERT INTO user_key_auto(user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?)";
        List<UserKeyAuto> insertData = autoKeyTableService.getInsertData();
        stopWatch.start("auto-increment key table task");
        long start1 = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql, insertData, false);
            System.out.println(insertResult);
        }
        long end1 = System.currentTimeMillis();
        System.out.println("auto key: " + (end1 - start1));
        stopWatch.stop();

        /* UUID key: the id column is supplied with each row */
        final String insertSql2 = "INSERT INTO user_uuid(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";
        List<UserKeyUUID> insertData2 = uuidKeyTableService.getInsertData();
        stopWatch.start("UUID key table task");
        long begin = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData2)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql2, insertData2, true);
            System.out.println(insertResult);
        }
        long over = System.currentTimeMillis();
        System.out.println("UUID key: " + (over - begin));
        stopWatch.stop();

        /* random (snowflake) key: the id column is supplied with each row */
        final String insertSql3 = "INSERT INTO user_random_key(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";
        List<UserKeyRandom> insertData3 = randomKeyTableService.getInsertData();
        stopWatch.start("random long key table task");
        long start = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData3)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql3, insertData3, true);
            System.out.println(insertResult);
        }
        long end = System.currentTimeMillis();
        System.out.println("random key: " + (end - start));
        stopWatch.stop();

        System.out.println(stopWatch.prettyPrint());
    }
}
```
1.3. Insert test results
The results show that at about 1 million rows, UUID insertion is already the slowest of the three, and after a further 1.3 million rows are inserted, UUID insert performance drops sharply again.
The ranking is auto_key > random_key > uuid: UUID is the least efficient, and its efficiency drops sharply as the data volume grows. So why does this happen? With that question in mind, let's dig in:
2. Comparing the index structures of UUID and auto-increment IDs
2.1. Internal index structure using an auto-increment ID
Because auto-increment primary key values are sequential, InnoDB stores each record immediately after the previous one. When a page reaches its maximum fill factor (by default InnoDB fills a page to 15/16 of its size, leaving 1/16 free for future modifications):
① The next record is written to a new page. Once data is loaded this way, the primary key pages are filled almost entirely with sequential records, which maximizes the page fill rate and wastes no page space.
② Each new row always goes immediately after the current largest row, so MySQL can locate its position quickly and pays no extra cost computing where it belongs.
③ Fewer page splits and less fragmentation are produced.
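To put "almost filled with sequential records" into numbers, here is a rough estimate in Java, assuming the default 16 KB InnoDB page, the 15/16 fill limit described above, and a hypothetical average row size of 200 bytes. It ignores page headers, per-record overhead and the page directory, so real pages hold somewhat fewer rows.

```java
/** Back-of-the-envelope leaf page estimate for sequential (append-only) inserts. */
class SequentialFillEstimate {
    static final int PAGE_SIZE_BYTES = 16 * 1024;              // default innodb_page_size
    static final int USABLE_BYTES = PAGE_SIZE_BYTES * 15 / 16; // fill limit before a new page is started

    /** Rows that fit on one page for the given (hypothetical) average row size. */
    static int rowsPerPage(int rowSizeBytes) {
        return USABLE_BYTES / rowSizeBytes;
    }

    /** Leaf pages needed when rows arrive in key order and each page is filled to the limit. */
    static long leafPages(long rows, int rowSizeBytes) {
        long perPage = rowsPerPage(rowSizeBytes);
        return (rows + perPage - 1) / perPage; // ceiling division
    }
}
```

With 200-byte rows this gives 76 rows per page, so 1,300,000 sequential rows need 17,106 leaf pages, each filled to the limit except possibly the last.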
2.2. Internal index structure using UUID
Because UUIDs are random, unlike sequentially increasing IDs, the value of a new row is not necessarily greater than the previous primary key, so InnoDB cannot always append the new row at the end of the index. Instead, it has to find the appropriate position for the new row and allocate space there.
This requires many extra operations, and because the data arrives out of order it ends up scattered, which causes the following problems:
① The target page may already have been flushed to disk and evicted from the buffer pool, or may never have been loaded. InnoDB must then find the page and read it from disk into memory before inserting, which causes a lot of random I/O.
② Because writes arrive out of order, InnoDB has to split pages frequently to make room for new rows. A page split moves a large amount of data, and each split modifies at least three pages.
③ Frequent page splits leave pages sparse and irregularly filled. After random values (UUIDs or snowflake IDs) have been loaded into the clustered index (InnoDB's primary key index), you sometimes need to run OPTIMIZE TABLE to rebuild the table and refill the pages, which takes additional time.
Conclusion: with InnoDB, insert rows in increasing primary key order whenever possible, so that new clustering key values are monotonically increasing.
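The contrast between 2.1 and 2.2 can be reproduced with a toy model. The Java sketch below keeps sorted "pages" of at most 100 keys (a hypothetical capacity): an ascending key that arrives at a full last page opens a fresh page, as in 2.1, while a key that lands inside a full page splits it in half and moves the upper half, as in problem ②. It is not InnoDB's real split algorithm and does not model disk I/O; it only counts splits and the resulting fill rate. For reference, the MySQL manual states that sequential inserts leave clustered index pages about 15/16 full, while random-order inserts leave them between 1/2 and 15/16 full.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of clustered-index leaf pages (NOT InnoDB's real algorithm). */
class LeafPageSim {
    static final int CAPACITY = 100; // hypothetical keys per page

    final List<List<Long>> pages = new ArrayList<>(); // pages kept in key order
    int splits = 0;

    LeafPageSim() {
        pages.add(new ArrayList<>());
    }

    void insert(long key) {
        int p = findPage(key);
        List<Long> page = pages.get(p);
        if (page.size() >= CAPACITY) {
            boolean isLastPage = p == pages.size() - 1;
            if (isLastPage && key > page.get(page.size() - 1)) {
                // Ascending insert at the right edge: start a fresh page, move nothing.
                List<Long> fresh = new ArrayList<>();
                fresh.add(key);
                pages.add(fresh);
                return;
            }
            // Insert lands inside a full page: split it and move the upper half.
            List<Long> upper = new ArrayList<>(page.subList(CAPACITY / 2, page.size()));
            page.subList(CAPACITY / 2, page.size()).clear();
            pages.add(p + 1, upper);
            splits++;
            if (key >= upper.get(0)) {
                page = upper;
            }
        }
        int pos = Collections.binarySearch(page, key);
        page.add(pos < 0 ? -pos - 1 : pos, key);
    }

    /** Index of the last page whose first key is <= key, or 0. */
    private int findPage(long key) {
        int lo = 0, hi = pages.size() - 1, ans = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            List<Long> page = pages.get(mid);
            if (!page.isEmpty() && page.get(0) <= key) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }

    double averageFill(long rows) {
        return (double) rows / ((long) pages.size() * CAPACITY);
    }

    static LeafPageSim run(List<Long> keys) {
        LeafPageSim sim = new LeafPageSim();
        for (long k : keys) {
            sim.insert(k);
        }
        return sim;
    }
}
```

Feeding it 100,000 keys in ascending order produces zero splits and exactly 1,000 full pages; feeding it the same keys shuffled (a stand-in for UUID order) produces many splits and pages that are, in this model, typically around two-thirds full.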
2.3. Disadvantages of using an auto-increment ID
So is there no downside at all to using an auto-increment ID? There is; auto-increment IDs have several problems:
① If someone crawls your database, they can infer your business growth from the auto-increment IDs and easily analyze how your business is doing.
② Under highly concurrent workloads, inserting in primary key order causes noticeable lock contention in InnoDB: the upper end of the primary key range becomes a hot spot because all inserts happen there, and concurrent inserts can lead to gap lock contention.
③ AUTO_INCREMENT values are allocated under an auto-increment lock, which can itself become a point of contention and cost some performance. It can be tuned with the innodb_autoinc_lock_mode setting.
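As a toy analogy for point ③: every insert statement must pass through one shared lock to reserve a consecutive block of values, which is why the top of the key range becomes a serialization point. The class `AutoIncLockToy` is hypothetical and simplified (it holds the lock only while reserving values). In MySQL, innodb_autoinc_lock_mode selects between 0 (traditional), 1 (consecutive) and 2 (interleaved); modes 0 and 1 use a table-level AUTO-INC lock for some or all INSERT statements, while mode 2, the default since MySQL 8.0, avoids it at the cost of values from concurrent statements interleaving.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Toy analogy for a shared auto-increment lock; not InnoDB code. */
class AutoIncLockToy {
    private final ReentrantLock autoIncLock = new ReentrantLock();
    private long nextId = 1;

    /** Reserves `rows` consecutive ids for one multi-row statement; returns {first, last}. */
    long[] reserve(int rows) {
        autoIncLock.lock(); // every concurrent inserter queues here: the hot spot
        try {
            long first = nextId;
            nextId += rows;
            return new long[] {first, first + rows - 1};
        } finally {
            autoIncLock.unlock();
        }
    }

    /** Runs several "sessions" that each issue multi-row inserts against one shared counter. */
    static List<long[]> simulate(int sessions, int statementsPerSession, int rowsPerStatement) {
        AutoIncLockToy table = new AutoIncLockToy();
        List<long[]> reserved = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < sessions; s++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < statementsPerSession; i++) {
                    reserved.add(table.reserve(rowsPerStatement));
                }
            });
            threads.add(worker);
            worker.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sessions", e);
        }
        return reserved;
    }
}
```

Running `AutoIncLockToy.simulate(8, 1000, 5)` hands out the values 1 to 40,000 in non-overlapping, gap-free blocks of five, but only one session can reserve at a time.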
3. Summary
This post started from a question, created tables and used JdbcTemplate to test the insert performance of different ID generation strategies on large data volumes, and then analyzed how each ID strategy shapes the MySQL index structure, along with its pros and cons, explaining in depth why UUIDs and random non-repeating IDs lose insert performance.
In practice, following MySQL's official recommendation, it is best to use an auto-increment ID. MySQL is broad and deep, and its internals hold many more points worth optimizing and learning about.