
Suppose we can guarantee that in our database each citizen corresponds to exactly one ID-card number, and that the number is unique. If we need to look up a user's name by ID-card number, the SQL looks like this:

select * from User where id_card = 'xxxxxxxxxxxxxxxxxxxxxxxxx'

Normally we would add an index on id_card. The ID-card number is too long to make a good primary key, so should we create a unique index or a normal index on it?

We analyze the performance impact of normal and unique indexes on query and update operations.

1. What's the difference between a normal index and a unique index?

Create index statement differences

1.1 Normal Index

ALTER TABLE `test`
ADD INDEX `k_title` (`title`) USING BTREE;

1.2 Unique Index

ALTER TABLE `test`
ADD UNIQUE INDEX `un_index_title` (`title`) USING BTREE;

A quick note on USING BTREE:

  • A BTREE index stores data in a B+ tree structure, which greatly speeds up lookups
  • BTREE indexes are well suited to range queries, because leaf entries are stored in key order

BTREE index usage scenarios

  • Full-value match, such as where act_id = '1111_act'

  • The leftmost column of a composite index. For example, with idx_actid_name(act_id, act_name) USING BTREE, the index can be used as long as the query filters on the first column, act_id. If the query only filters on the second column, act_name, the SQL cannot use the composite index (see the sketch after this list).

  • Column-prefix match, such as where act_name like '11_act%'

  • Range match, such as where act_date > '9865123547215' (NOT IN and <> cannot use the index)
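
To make the leftmost-prefix rule concrete, here is a minimal sketch using a hypothetical table act that carries the composite index named above (the table definition and column types are assumptions for illustration):

-- Hypothetical table, used only to illustrate the leftmost-prefix rule
CREATE TABLE act (
  id INT NOT NULL PRIMARY KEY,
  act_id VARCHAR(32) NOT NULL,
  act_name VARCHAR(64) NOT NULL,
  act_date VARCHAR(16) NOT NULL,
  INDEX idx_actid_name (act_id, act_name) USING BTREE
) ENGINE = InnoDB;

-- Can use idx_actid_name: the filter is on the leading column act_id
EXPLAIN SELECT * FROM act WHERE act_id = '1111_act';

-- Cannot use idx_actid_name: the filter only touches the second column act_name
EXPLAIN SELECT * FROM act WHERE act_name = '11_act_name';

Running EXPLAIN on both statements should show the composite index being chosen only for the first one.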

2. Query process

Assume the query is select ID from T where k = 5, where k is an indexed column of table T and ID is its primary key. The query starts at the root of the B+ tree on k and searches down through the internal levels to a leaf node; within the leaf node's data page, the record is then located by binary search.
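
For concreteness, here is a minimal sketch of the table assumed in this example (the column types and the extra column s are assumptions):

-- Minimal sketch of table T; pairs such as (5,500) below refer to (k, id) of these rows
CREATE TABLE T (
  id INT NOT NULL PRIMARY KEY,
  k INT NOT NULL,
  s VARCHAR(16),
  INDEX idx_k (k)  -- the normal-index case; the unique-index case would use UNIQUE INDEX uk_k (k)
) ENGINE = InnoDB;

INSERT INTO T (id, k, s) VALUES (300, 3, 'c'), (500, 5, 'e'), (600, 6, 'f');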

For a normal index, after the first matching record (5,500) is found, the scan continues to the next record, and the next, until it hits the first record that no longer satisfies k=5.

For unique indexes, since the index defines uniqueness, the search stops after the first record that meets the condition is found.

In practice, though, the performance difference between the two is negligible.

The reason is that the InnoDB engine reads and writes data in units of pages. When a record needs to be read, InnoDB does not fetch that single record from disk; it reads the whole page containing it into memory, and each page is 16KB by default. So when a normal index finds the record with k=5, the page holding it is already in memory, and checking the next record is just one more in-memory lookup. Even if k=5 happens to be the last record on the page, so that the next record lives on another page, the extra cost is still very small.
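
If you want to confirm the page size on your own instance, the innodb_page_size variable reports it:

SHOW VARIABLES LIKE 'innodb_page_size';
-- 16384 (16KB) by default; it can only be set when the instance is initialized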

3. Update process

When a data page needs to be updated, InnoDB updates it directly if the page is already in memory. If it is not, InnoDB records the update in the change buffer instead of reading the page from disk first. The next time that data page needs to be accessed, InnoDB reads it into memory and applies the buffered changes for that page.

Although the change buffer sounds like a pure in-memory cache, it is also persisted to disk. The operation that applies the change buffer to data pages is called merge. Besides being triggered by access to the page, merge is also performed periodically in the background and when the database shuts down. Personally, this mechanism feels a lot like a message queue.

Clearly, if update operations can be recorded in the change buffer first, disk reads are reduced and statements execute noticeably faster. In addition, reading a page into memory takes up space in the buffer pool, so deferring the read also avoids occupying memory and improves memory utilization.

The change buffer size can be adjusted dynamically with innodb_change_buffer_max_size; setting it to 50, for example, means the change buffer may occupy at most 50% of the buffer pool.
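
For reference, a sketch of checking and adjusting this setting (the value 50 is just an example):

SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';
SET GLOBAL innodb_change_buffer_max_size = 50;  -- cap the change buffer at 50% of the buffer pool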

What happens when we insert a row? Let's say we insert the record (4,400).
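
In SQL terms, assuming the sketch of table T from the query section, the insert is simply (the value of s is an assumption):

INSERT INTO T (id, k, s) VALUES (400, 4, 'd');  -- (k, id) = (4, 400)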

  • In the first case, the target page to be updated is already in memory.

Unique index: find the position between 3 and 5, check that there is no conflict, insert the value, and the statement finishes;

Normal index: find the position between 3 and 5, insert the value, and the statement finishes.

The cost difference here is negligible, because the unique index only adds one extra in-memory uniqueness check.

  • In the second case, the target page to be updated is not in memory.

Unique index: the data page must be read into memory so the uniqueness check can be made; then the value is inserted and the statement finishes;

Normal index: the update is simply recorded in the change buffer and the statement finishes.

The unique index requires the data page to be read from disk into memory, which involves random I/O, one of the most expensive operations in a database. In this case the normal index therefore gives a significant performance improvement.
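
If you want to watch this mechanism at work, the change buffer (historically called the insert buffer) is reported in the InnoDB status output:

SHOW ENGINE INNODB STATUS\G
-- see the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section,
-- which shows how many buffered operations have been merged or discarded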

The change buffer can only be used with normal indexes, not with unique indexes. But does the change buffer always give normal indexes a performance boost? The answer is no; it depends on the access pattern of the business.

The change buffer caches change operations, so the more changes have accumulated on a page before it is merged, the bigger the benefit. It therefore works best for write-heavy, read-light workloads. If the business reads a row immediately after updating it, the read triggers a merge right away; the change buffer then saves nothing and only adds maintenance cost, so I/O increases and performance gets worse.
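
For workloads like that, MySQL lets you restrict or disable the mechanism through innodb_change_buffering; which operations to buffer is a per-workload decision, so treat the value below as an example:

SHOW VARIABLES LIKE 'innodb_change_buffering';  -- 'all' by default
SET GLOBAL innodb_change_buffering = 'none';    -- disable change buffering entirely
-- other accepted values: inserts, deletes, changes, purges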

Summary

Since unique indexes cannot take advantage of the change buffer optimization, from a performance point of view it is recommended to prefer normal (non-unique) indexes whenever the business can accept them.