How can YOU avoid writing slow SQL during development?

This article has participated in the third “topic writing” track at the Diggings Creators Camp. For more details, check out diggings program | Creators Camp.

“Do not know the importance of an index when the library has more data, and do not know the importance of an index when the library has more data. By knowing the tips in this article, you can create efficient and appropriate indexes so that you don’t have to write slow SQL.”

1. When using index columns for query, try not to use expressions, and put the calculation in the business layer rather than the database layer

Actor_id +4 in type is less efficient than const where (actor_id+4 in type)

2. Try to use primary key queries instead of other indexes. Primary key queries do not return to table queries.

All of our tables will have primary keys, so usually develop indexes can use indexes, can use primary key index to use primary key index.

3. Use the prefix index

Most of the time, our indexes are actually strings. Inevitably, there will be long strings, which will cause the index to occupy too much and reduce its efficiency. Especially for long columns like blob,text, varchar. Instead of using the full value of the field as an index, you can just take the first half of the field (the part of the index that is selected is close to the selectivity of the entire column). This greatly reduces the index space and thus improves efficiency, but the disadvantage is that it reduces the selectivity of the index.

Index option: The ratio (#T) of the number of duplicate index values to the total number of table records, ranging from 1/#T to 1. The more selective the index, the more efficient the query, because the data is more distinguishable, more rows can be filtered out. Unique indexes have the selectivity of 1 and are the best performers.

For example, in the email field of the company’s employee table, the suffix of a company’s email is the same, such as [email protected]. In fact, using email as the index is only valid for XXXX, because @qq.com is the same, which is meaningless for the index. Obviously, only XXXX is used as the index, and its selectivity is the same as the whole value. But XXXX as an index obviously reduces the index space.

Let’s use the Employee table as an example (see table structure and data at the end of the article).

Let’s take the email field index as an example:

The email address of this data is actually the mobile phone number [email protected] as an example, in fact, after the first 11 are the same. I use the following SQL to see the selectivity of the data (taking the first 10, 11, and 12 bits, respectively).

-- select 1 for 11 prefixes, add field length, Select count(distinct left(email,10))/count(*) as e10, count(distinct left(email,11))/count(*) as e11, count(distinctleft(email,12))/count(*) as e12 from employee;Copy the code

From the figure above, we can see that the selectivity of the top 10, the top 11 and the top 12 are 0.14, 1.0 and 1.0 respectively. When the index selectivity is at the 11th position, it is the highest 1, so it is not necessary to use all of them as indexes, which increases the index space.

Create prefix index
alter table employee add key(email(11));
Copy the code

We can also use count to calculate the frequency (the fewer the number of occurrences, the lower the repetition rate, and the greater the selectivity)

-- Find the frequency of the prefix
select count(*) as cnt,left(email,11) as pref from employee group by pref order by cnt desc limit 10;
Copy the code

4. Use index scan to sort

We often have the need to sort, use order by, but order BY has a performance impact, it is by loading data into memory to sort, if the amount of data is too large to store, can only be divided into multiple processing. But the index itself is ordered, and it is easier to sort directly through the index.

Scanning the index itself is fast because you only need to move from one index record to the next, but if the index does not cover all the columns required for the query, you have to go back to the table to query the corresponding row for every index record scanned, which is basically random IO. As a result, reading data in index order is generally slower than sequential full table scans.

Mysql can use the same index both for sorting and for finding rows. Consider creating such an index if you can.

Mysql can use an index to sort results only if the index columns are in the same order as the Order BY clause and all columns are in the same order (backwards or forwards). If the query needs to associate multiple tables, index sort can only be used if the fields in the Order BY clause are all from the first table. The order BY query also needs to satisfy the leftmost prefix of the composite index, otherwise index sort cannot be used.

In fact, in the development of the main two points:

Fields in the WHERE condition and fields in order BY can be combined indexes and satisfy the leftmost prefix.
The fields in order BY must be in the same order, and cannot contain desc and ASC.

5. Union all,in,or can use indexes, but in is recommended

As above, union all is executed twice, while in and OR are executed only once. Meanwhile, we can see that the execution plan of OR and in is the same,\

But we’re looking at their execution times. In the figure below, use set profiling=1 to see the detailed time, and use show profiles to see the specific time. As can be seen from the figure below, the time of OR is 0.00612000, while the time of IN is 0.00022800, which is still a big difference (the test table data is only 200 rows).

Union ALL: The query is divided into two stages. In fact, there is a union. It is recommended to use union ALL when the union must be used in daily development, because of the extra deduplicate step in the union. So try to use union all.

6. Range columns can be indexed

Scope conditions: >,>=,<,<=,between

You can use an index for a range column, but not for a column following a range column (an index can be used for a range column at most).

Where age>18 and name=” age “;

Once the interview was asked is not equal to whether can walk a certain index, usually did not pay attention to also did not answer successfully, this time personally do an experiment, about the conclusion see the end of the article.

7. Casts will scan the full table

I defined the Mobile field in the EMPLOYEE table to be of type VARCHar and indexed it, and I looked it up with a number and a string, respectively.

Look at the result: the two types are different, and only strings are indexed.

If the value of the condition is not the same as the type defined in the table, mysql will force the cast, but the result will not be the index. In the index development, we need to input the corresponding type value according to the type defined by us.

8. The data distinction is not high, and the fields that are updated frequently should not be indexed

Index column update will change the B+ tree, frequent update will greatly reduce database performance.
Things like gender (only men and women, or unknown) do not filter data effectively.
Count (distinct(column name))/count(*)

9. The column of the index is not allowed to be null, which may result in unexpected results

That is, the field of the index should not be empty as far as possible, there may be some unexpected problems, but it is not possible not to be empty in the actual work, so according to the actual business to deal with it, try to avoid this situation.

10. When you need to join tables, it is best not to have more than three tables

Table join is actually circular nesting match of multiple tables, which affects performance. In addition, the data types of the fields to be joined must be consistent to improve query efficiency. The principle of table joins behind a dedicated to write a bar.

11. Use limit when you can.

The purpose of limit is not just to use paging, but to limit output.

If you add limit 1 to the index, the index pointer will stop after finding the data that meets the criteria, and will not continue to judge, and will return directly. If there is no limit, it will continue to judge.

However, if the page is used to fetch the 5 limit 10000 and 10005 after 10000, it needs to be careful. It will go through 10000 and fetch the 5 limit, which is very inefficient. Tip: If the primary key is sequential, you can get the paging data directly through the primary key.

12. Try to limit the number of indexes in a single table to 5

Creating/maintaining indexes is also costly and takes up space. More indexes is not always better. Use them wisely.

13. A combined index cannot contain more than five fields

The more fields, the larger the index, and the more storage space it takes.

Index is not the more the better, and the index does not need to start the build table when they all designed, premature optimization is not efficient indexing instead, need to understand the business, according to relevant business make a SQL to build relevant index after balance, so consider more comprehensive, set up the index of the more effective and efficient.

These are the little details of index optimization that will hopefully help you write swish SQL.

supplement

Is not equal to whether to walk the index problem

Conclusion: Only the primary key will walk. Neither the unique key nor the normal index will walk.

I created the unique index employee_NUM and the joint index employee_num+name in the Employee table, and the result is the execution of the following figure.

The employee table structure

CREATE TABLE `employee` (
`employee_id` bigint(20) NOT NULL AUTO_INCREMENT,  `employee_num` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'Employee code', 
`name` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'Employee name', 
`email` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL COMMENT 'E-mail', 
`mobile` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL COMMENT 'Mobile phone',
`gender` tinyint(1) NOT NULL COMMENT 'Gender, 0: male 1: female'.PRIMARY KEY (`employee_id`) USING BTREE, 
INDEX `email`(`email`(11)) USING BTREE,
INDEX `employee_u1`(`employee_num`, `name`) USING BTREE, 
UNIQUE INDEX `employee_u2`(`employee_num`) USING BTREE, 
INDEX `employee_u3`(`mobile`) USING BTREE) ENGINE = InnoDB AUTO_INCREMENT = 0 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci COMMENT = 'Staff Table' ROW_FORMAT = Dynamic;
Copy the code

The employee data is as follows:

INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (10.'001'.Employees' A '.'[email protected]'.'15500000001'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (11.'002'.Employees' B '.'[email protected]'.'15500000002'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (12.'003'.Employees' C '.'[email protected]'.'15500000003'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (13.'004'.'D employees'.'[email protected]'.'15500000004'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (14.'005'.Employees' E '.'[email protected]'.'15500000005'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (15.'006'.'F employees'.'[email protected]'.'15500000006'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (16.'007'.Employees' G '.'[email protected]'.'15500000007'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (17.'008'.Employees' H '.'[email protected]'.'15500000008'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (18.'009'.'employees I'.'[email protected]'.'15500000009'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (19.'010'.Employees' J '.'[email protected]'.'15500000010'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (20.'011'.'K employees'.'[email protected]'.'15500000011'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (21.'012'.Employees' L '.'[email protected]'.'15500000012'.1);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (22.'013'.'M employees'.'[email protected]'.'15500000013'.0);
INSERT INTO `sakila`.`employee`(`employee_id`, `employee_num`, `name`, `email`, `mobile`, `gender`) VALUES (23.'014'.Employees' N '.'[email protected]'.'15500000014'.1);
Copy the code