MySQL Advanced series: How to create a more appropriate index

Don’t know the importance of an index when there is more data in the database, and don’t know the importance of an index when there is more data in the database. This article explains how to create efficient and appropriate indexes

1. Avoid expressions when using indexed columns for queries, and put calculations in the business layer rather than the database layer

Below two SQL result is the same, but the two SQL execution plan is not the same, the efficiency of the index in the type is much less than the const actor_id + 4 expression in the where condition affects the execution plan, please refer to explain the meaning of explanation for the type said one

2. Try to use primary key query instead of other indexes, primary key query does not appear back table query.

All of our tables will basically have primary keys, so usually development can use the index index, can use the primary key index index.

3. Use prefix indexes

Most of the time, our indexes are actually strings, and it is inevitable that there will be long strings, which will cause the index to occupy too much, reducing its efficiency. Especially for long columns like BLOb,text, and VARCHar. Instead of using the full value of the field as the index, just take the first half of the field (this part of the selected prefix index is nearly as selective as the entire column). This can greatly reduce the index space, thus improving efficiency, the disadvantage is reduced index selectivity.

Index selectivity: The ratio (#T) of non-duplicate index values to the total number of data table records, ranging from 1/#T to 1. The more selective the index is, the more efficient the query is, because the data is highly differentiated and more rows can be filtered out. Unique indexes have a selectivity of 1 and have the best performance.

For example, in the email field of the employee table of the company, the suffix of the email of a company is the same, such as xxxx@qq.com. In fact, using email as the index is effective only for the XXXX part, because @qq.com is the same, it is meaningless to the index. Obviously, only using XXXX as the index, its selectivity is the same as that of the whole value. But XXXX as an index obviously reduces the index space.

Here we have the employee table as an example (see table structure and data at the end)

Let’s take the email field index as an example:

The email address of this data is actually the mobile phone number +@qq.com as an example, in fact, the first 11 digits are the same. I use the following SQL to see the selectivity of the data (taking the first 10, 11 and 12 bits, respectively).

The selectivity is 1 for 11 prefixes, and does not change as the field length increases
select count(distinct left(email,10))/count(*) as e10, count(distinct left(email,11))/count(*) as e11,      count(distinctleft(email,12))/count(*) as e12 from employee;
Copy the code

From the figure above, we can see that the selectivity of the first 10, 11 and 12 is 0.14, 1.0 and 1.0 respectively. The index selectivity is the highest 1 in the 11th place, so it is unnecessary to use all of them as indexes, which increases the index space.

Create prefix index
alter table employee add key(email(11));
Copy the code

We can also use count to count frequency (the fewer occurrences, the lower the repetition rate and the higher the selectivity)

Find the frequency of prefix occurrences
select count(*) as cnt,left(email,11) as pref from employee group by pref order by cnt desc limit 10;
Copy the code

4. Use index scans to sort

We often have the need to sort, and order by is used, but order by affects performance. It is sorted by loading data into memory. If the amount of data is too large to store, it can only be processed several times. But the index itself is ordered, through the index to complete the sorting is easier.

Scanning the index itself is fast because it only needs to move from one index record to the next, but if the index does not cover all the columns required by the query, you have to query the table for each row scanned, which is basically random I/O. Therefore, reading data in index order is usually slower than sequential full table scans.

Mysql can use the same index to both sort and find rows. Consider creating such an index if you can.

Mysql can use indexes to sort results only if the index column order is exactly the same as the order by clause, and all columns are sorted in the same direction (either backwards or forward). If a query requires multiple tables to be associated, index sort can only be used if the fields in the ORDER BY clause are all the first tables. The ORDER by query also needs to satisfy the left-most prefix of the composite index, otherwise index sort cannot be used either.

In fact, in the development of the main two points:

The fields in the WHERE condition and the fields in the Order BY can be combined indexes and satisfy the leftmost prefix.
The fields in order by must be in the same order. Both DESC and ASC cannot exist.

5. Indexes for union all,in,or can be used, but in is recommended

As above, union all is executed twice, while in and or are executed only once. Also see that the execution plan of OR and in is the same,\

But we’re looking at their execution time. Use set profiling=1 to see the time details, and use show Profiles to see the time details. As can be seen from the figure below, the time of OR is 0.00612000 and that of IN is 0.00022800, which is still a big difference (the test table data is only 200 rows).

Union All: The query is divided into two stages. In fact, there is still a union. It is recommended to use Union all when the union must be used in normal development, because there are additional steps for distinct removal in union. So try to use union all.

6. Range columns can use indexes

Range conditions: >,>=,<,<=,between

A range column can use an index, but a column following a range column cannot use an index (an index can be used for at most one range column)

Select * from age where age>18 and name=” age “; select * from age where name=” age “;

Once I was asked in an interview whether I could walk a certain index or not. I did not notice it and did not answer successfully at ordinary times. This time I did an experiment in person.

7. The cast will scan all tables

I defined the mobile field in the Employee table to be of type VARCHAR and indexed it, using numbers and strings respectively.

Look at the result: the types are different, and only strings use indexes.

If the type of the condition value is not the same as that defined in the table, then mysql will force the type conversion, but the result will not be the index. During the development of the index, we need to input the corresponding type value according to the type defined by ourselves.

8. The data distinction is not high, and the fields that are frequently updated should not be indexed

Index column updates change the B+ tree, and frequent updates can significantly degrade database performance.
Things like gender (only male and female, or unknown) do not filter data effectively.
If the index is more than 80% distinct, use count(distinct(column name)/count(*).

9. Columns that create indexes cannot be null, which may result in unexpected results

That is, the index field should not be empty as far as possible, there may be some unexpected problems, but it is not likely to be empty in the real work, so according to the actual business, try to avoid this situation.

10. When you need to join tables, it is best to have no more than three tables

Table join is actually a circular nesting match of multiple tables, which affects performance. In addition, the data types of fields to be joined must be consistent to improve query efficiency. Write a special article on the principle of table join.

11. Use limit whenever you can.

Limit doesn’t just use pagination, it essentially limits output.

If limit 1 is applied to only one item of the query, the index pointer will stop when it finds the item that matches the query. If there is no limit, the judgment will continue.

However, if the number of pages after 10000 is 5, you need to be careful, it will be 10000 after 10000, 5, very low efficiency. Tip: If the primary key is sequential, you can get paging data directly from the primary key.

12. Limit the number of indexes in a single table to five

Building/maintaining indexes is also expensive and takes up space. More indexes are not always better. Use them wisely.

13. The number of fields in a single composite index cannot exceed five

The more fields, the larger the index, the more storage space it takes up.

Index is not the more the better, and the index does not need to start the build table when they all designed, premature optimization is not efficient indexing instead, need to understand the business, according to relevant business make a SQL to build relevant index after balance, so consider more comprehensive, set up the index of the more effective and efficient.

These are the little details of index optimization that hopefully will help you write whizzy SQL.

supplement

About not being equal to whether to go to the index

Conclusion: Only primary keys move, unique keys and normal indexes do not move.

I create the unique index employee_num and the joint index employee_num+name in the EMPLOYEE table, and the result is the following.

The employee table structure

CREATE TABLE `employee`  (  
`employee_id` bigint(20) NOT NULL AUTO_INCREMENT, 
`employee_num` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'Employee Code',
`name` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'Name of Employee',  
`email` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL COMMENT 'Email', 
`mobile` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL COMMENT 'Mobile phone', 
`gender` tinyint(1) NOT NULL COMMENT 'Gender, 0: male 1: female'.PRIMARY KEY (`employee_id`) USING BTREE, 
INDEX `email`(`email`(11)) USING BTREE,  INDEX `employee_u1`(`employee_num`, `name`) USING BTREE,
UNIQUE INDEX `employee_u2`(`employee_num`) USING BTREE,  INDEX `employee_u3`(`mobile`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 0 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci COMMENT = 'Staff list' ROW_FORMAT = Dynamic;
Copy the code

Employee data is as follows:

INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (10.'001'.Employees' A '.'15500000001@qq.com'.'15500000001'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (11.'002'.Employees' B '.'15500000002@qq.com'.'15500000002'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (12.'003'.Employees' C '.'15500000003@qq.com'.'15500000003'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (13.'004'.'D employees'.'15500000004@qq.com'.'15500000004'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (14.'005'.Employees' E '.'15500000005@qq.com'.'15500000005'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (15.'006'.'F employees'.'15500000006@qq.com'.'15500000006'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (16.'007'.Employees' G '.'15500000007@qq.com'.'15500000007'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (17.'008'.Employees' H '.'15500000008@qq.com'.'15500000008'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (18.'009'.'employees I'.'15500000009@qq.com'.'15500000009'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (19.'010'.Employees' J '.'15500000010@qq.com'.'15500000010'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (20.'011'.'K employees'.'15500000011@qq.com'.'15500000011'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (21.'012'.Employees' L '.'15500000012@qq.com'.'15500000012'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (22.'013'.'M employees'.'15500000013@qq.com'.'15500000013'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (23.'014'.Employees' N '.'15500000014@qq.com'.'15500000014'.1);
Copy the code

Welcome to our official account: Mr. Ji’s Notes

MySQL advanced related more content, such as lock, MVCC, read and write separation, sub-library sub-table, etc.