When we build indexes, we can choose among full-text indexes, primary key indexes, unique indexes, normal indexes, and so on. The first two are easy to understand and tell apart, and we all know when to use them. But how do we distinguish the last two? How do we choose between a unique index and a normal index? Let’s talk about that today.

1. Preparation

Suppose I have the following table:

CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `age` int(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `username` (`username`),
  KEY `address` (`address`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

The table holds 100,000 rows of simulated data. You can generate the data yourself, so I won’t go into it here.
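For readers who want to follow along, here is one possible way to generate the rows. It is only a sketch: the procedure name fill_user and the generated values are assumptions for illustration, not the data used in this article. DELIMITER is a command of the mysql command-line client, so other clients may need the procedure body submitted differently.

```sql
-- Hypothetical helper, not part of the original article.
DROP PROCEDURE IF EXISTS fill_user;

DELIMITER $$
CREATE PROCEDURE fill_user(IN total INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  -- A single transaction avoids one commit per inserted row.
  START TRANSACTION;
  WHILE i <= total DO
    INSERT INTO `user` (`username`, `address`, `age`)
    VALUES (
      CAST(i AS CHAR),          -- unique value, required by the unique index
      CAST(i % 1000 AS CHAR),   -- assumed: repeated values for the normal index
      i % 100                   -- assumed: arbitrary age
    );
    SET i = i + 1;
  END WHILE;
  COMMIT;
END$$
DELIMITER ;

CALL fill_user(100000);
```

The address column deliberately receives repeated values here so that a lookup on it can match more than one row.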

Besides the primary key, the table has two indexes: username is a unique index, and address is a normal index.

2. The query

2.1 Normal Index Query

Let’s take a look at a query with a normal index first.

Let’s do a simple query:

select * from user where address='1';

Following the explanation in our earlier article (“Index push down, YYDS!”), let’s walk through the query steps here:

  1. The server layer of MySQL first calls the storage engine to locate the first address with a value of 1.
  2. After the storage engine locates the matching entry in the address index, it takes the primary key stored there and goes back to the primary key index (the “back table” lookup) to read the complete row.
  3. The storage engine returns the row it has read to the server layer.
  4. Since address is a normal index rather than a unique index, there may be more than one record whose address is 1. So after the first match, the engine has to keep scanning forward along the one-way linked list inside the leaf node. For each new record it finds, steps 2 and 3 are repeated.
  5. The scan stops when it reaches a record whose address is not 1.

That is our analysis. Now let’s take a look at the execution plan:

The type in the execution plan is ref, which confirms our analysis.
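To reproduce this plan yourself, a statement along the following lines should be enough, assuming the user table above has been created and populated. The estimated row counts in the output will depend on your own data.

```sql
-- address carries a normal (non-unique) index, so an equality
-- lookup may match several rows; check the `type` column of the output.
EXPLAIN SELECT * FROM `user` WHERE `address` = '1';
```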

2.2 Unique Index Query

Let’s look at unique index queries.

Let’s start with an SQL statement:

select * from user where username='1';

The difference from the normal index query is that username is a unique index, so at most one record can have username equal to 1. Once that record is found, the search stops and there is no need to keep scanning.

Let’s look at the query plan:

Unlike the normal index query, the type in this query plan is const.
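The same check can be run for the unique index. This is a minimal sketch that assumes the user table defined earlier:

```sql
-- username carries a unique index, so an equality lookup
-- matches at most one row; check the `type` column of the output.
EXPLAIN SELECT * FROM `user` WHERE `username` = '1';
```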

2.3 Comparison

From the description above, it seems that unique indexes perform better in queries. Is that really the case? Let’s analyze it again.

In theory, unique indexes do perform better in queries, for a simple reason: once a unique index finds a record that meets the condition, it does not need to look any further. A normal index, after finding a matching record, must keep searching until it meets a record that does not match (address not equal to 1). So is the difference significant? To be honest, the advantage is negligible!

Why is that?

  1. With a normal index, the search does have to continue after the first match. But the matching records are stored next to each other, so the engine only needs to keep reading along the one-way linked list between records, which is fast.
  2. InnoDB reads data page by page rather than record by record (16KB per page by default; I gave a general introduction to the 16KB page in the article “What is MySQL’s ‘back table’?”), so continuing to read forward is an in-memory operation and very fast.
  3. There are exceptions. For example, the matching record may happen to be the last one on the current page, in which case a new page has to be loaded. The probability of this is small and can be ignored.

In summary, the difference in lookup efficiency between a unique index and a normal index is negligible.

3. Insert or modify

3.1 Background Knowledge

3.1.1 buffer pool

First, you need to know about the buffer pool.

InnoDB stores data in pages, and each data page is 16KB by default. We can confirm the page size through the innodb_page_size system variable, which reports 16384 bytes:

16384/1024 = 16

That’s exactly 16 kilobytes.
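The statement used to obtain this value is not shown above. One way to read it, assuming a running MySQL server with InnoDB, is the innodb_page_size system variable:

```sql
-- Page size in bytes
SHOW VARIABLES LIKE 'innodb_page_size';

-- The same value converted to kilobytes
SELECT @@innodb_page_size / 1024 AS page_size_kb;
```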

On a disk, the smallest storage unit is a sector, which is 512 bytes. The smallest unit of a file system (e.g. XFS/EXT4) is a block, which is 4KB, so four blocks make up one InnoDB page. When we insert, delete, update, or query data in MySQL, we are operating on data pages, which in plain terms means operating on the disk.

But think about it: if every operation went to disk, there would be a large number of disk I/O operations. On a traditional mechanical disk this also means a lot of random I/O, which is painfully inefficient and seriously hurts MySQL performance.

To solve this problem, MySQL introduced the buffer pool.

The main function of the buffer pool is to cache index and table data so that not every operation requires disk I/O, which speeds up data access.

The size of the buffer pool is held in the innodb_buffer_pool_size system variable, which reports 134217728 bytes here:

134217728/1024/1024 = 128

The default size is 128MB. My MySQL is installed in Docker, so this allocation is on the small side. Generally speaking, if a server runs nothing but a single MySQL service, the buffer pool can be set to 75% to 80% of the server’s memory.
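As a rough sketch, the current size can be inspected and adjusted as follows. The 6GB figure is only an assumed example for a dedicated server with 8GB of memory (75%). Resizing with SET GLOBAL at runtime is supported from MySQL 5.7.5 onward; on older versions the value has to be changed in the configuration file and the server restarted.

```sql
-- Current size in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- The same value converted to megabytes
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Assumed example: 6GB on a dedicated 8GB server
SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;
```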

3.1.2 change buffer

The second thing to understand is the change buffer.

The buffer pool described above speeds up data access, but it does not make inserts, deletes, and updates more efficient. Those operations still require disk I/O, which is just as slow as before.

To solve this problem, MySQL introduced the change buffer. It used to be called the insert buffer, but it was renamed because it no longer serves only inserts: it also works for delete and update operations. The change buffer mainly applies to non-unique indexes. If a field has a unique index, every update has to check uniqueness, so the disk I/O cannot be avoided.

With the change buffer, when we need to change data in the database, the change is first recorded in memory. When the affected page is read later, the recorded changes are merged into the buffer pool. InnoDB then flushes the dirty pages back to disk when the redo log is full, when memory is full, or when the system is otherwise idle. This reduces the disk I/O caused by writes and improves database performance.

Two system variables control the size of the change buffer and which operations use it:

  • innodb_change_buffer_max_size: the maximum size of the change buffer as a percentage of the total buffer pool. The default value is 25% and the maximum value is 50%.
  • innodb_change_buffering: which write operations use the change buffer. The default, all, covers every kind of write; the other values include none, inserts, deletes, changes, and purges.
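Both settings can be read with a single statement. This is a minimal sketch; the values you see depend on your MySQL version and configuration.

```sql
-- Matches innodb_change_buffer_max_size and innodb_change_buffering
SHOW VARIABLES LIKE 'innodb_change_buffer%';
```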

However, both the change buffer and the buffer pool live in memory, where data is not persistent. If MySQL crashes while there are dirty pages, data could be lost, because the changes in memory have not yet been written to disk. Yet in practice we do not run into this problem when using MySQL, so how is it solved? The answer is the redo log, and Songo will write about the redo log later.

3.2 Comparison

After the introduction to the change buffer above, the difference should be clear:

  • For non-unique indexes, if the target page is not already in the buffer pool, the change is simply recorded in the change buffer. This is an in-memory operation and is very quick.
  • For unique indexes, the data page must first be read into memory (this step involves a lot of random I/O, which is inefficient) so the value can be checked for conflicts, and only then can the insert happen.

So, obviously, non-unique indexes have an advantage when it comes to inserts.
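If you want a rough look at whether the change buffer is actually being used on your server, one option is the InnoDB status report. Its output contains a section titled INSERT BUFFER AND ADAPTIVE HASH INDEX, which reports the change buffer size and the merged operations. This is only a way to observe the mechanism, not a benchmark.

```sql
-- Look for the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section in the output
SHOW ENGINE INNODB STATUS;
```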

4. Summary

So for a field that needs to be globally unique, should you use a normal index or a unique index? I find it hard to give universal, precise advice here, because database optimization is rarely absolute and has to be weighed against your actual business. So, whenever and wherever, meet the business requirements first, and discuss database optimization on that basis.

If the business logic can guarantee that the field is unique, you can use a normal index to improve insert/update speed.

However, Murphy’s Law suggests that without a unique index, there is a high probability that dirty values will eventually appear in this field, so you also need to consider how much the business can tolerate dirty values.
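As a sketch of what this trade-off could look like on the user table from this article: the first statement swaps the unique index for a normal one, and the second is a check you could run from time to time to catch duplicate values once the database no longer enforces uniqueness. The index name idx_username is a hypothetical name chosen for illustration.

```sql
-- Replace the unique index on username with a normal index
ALTER TABLE `user`
  DROP INDEX `username`,
  ADD INDEX `idx_username` (`username`);

-- Periodic check: list usernames that appear more than once
SELECT `username`, COUNT(*) AS cnt
FROM `user`
GROUP BY `username`
HAVING COUNT(*) > 1;
```

If the check ever returns rows, the application-level uniqueness guarantee has been broken and the duplicates need to be cleaned up before a unique index can be restored.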