preface

  • I believe everyone has heard of the index, but how many people will really use it? When I write SQL at work, I really think about how this SQL can use the index and how to improve the efficiency of execution.
  • This article describes several principles of index optimization in detail, as long as you can apply them at any time in your work, you will be able to write the most efficient SQL.
  • The brain map of the article is as follows:

Index optimization rules

1. Fuzzy queries leading to like statements cannot use indexes

select * from doc where title like '%XX';Index cannot be used
select * from doc where title like 'XX%';-- Non-leading fuzzy queries can use indexes
Copy the code
  • Because the page search is strictly prohibited left or all fuzzy, if need to use search engines to solve.

2. Union, IN, or can all match the index. It is recommended to use IN

  1. unionThe index can be hit, and MySQL consumes the least CPU.
select * from doc where status=1
union all
select * from doc where status=2;
Copy the code
  1. inCan hit the index, the query optimization CPU ratiounion allMany, but can be ignored, generally recommended to usein.
select * from doc where status in (1.2);
Copy the code
  1. orThe new version of MySQL can match the index and optimize the CPU ratioinMany, not recommended to use frequentlyor.
select * from doc where status = 1 or status = 2
Copy the code
  1. supplement: Some places say inwhereIn conditionsorThe index will fail, causing a full table scan. This is a mistake:
  • (1) All fields used in the WHERE clause must be indexed;

  • If the amount of data is too small, mysql will not use indexes because full table scan is faster than index search.

  • Index_merge_union =on (index_merge_union=on) is enabled in optimizer_switch.

3. The index cannot be used for negative query

  • Negative conditions are:! =, <>, not in, not exists, not like, etc.

  • For example, the following SQL statement:

select * from doc where status ! = 1 and status ! = 2;
Copy the code
  • Can be optimized for in queries:
select * from doc where status in (0.3.4);
Copy the code

4. The left-most prefix principle of the joint index

  • If (a, b, c) established on the three fields of joint index, then he will automatically build a | | (a, b) (group a, b, c) index.

  • The SQL statement for logging in is as follows:

select uid, login_time from user where login_name=? andpasswd=?
Copy the code
  • Can be established(login_name, passwd)The federated index of. Because there is hardly any in businesspasswdThe single condition query requirements, and there are manylogin_nameSingle condition query requirements, so can be established(login_name, passwd)The union index is not(passwd, login_name).
  1. When creating a joint index, the most distinguished field is on the left
  1. When there is a mixed judgment condition of non-equal sign and equal sign, the column of equal sign is put in front of the index. Such aswhere a>? and b=?, then even ifaThe higher degree of differentiation must also be putbAt the top of the index.
  1. When querying with the left-most prefix, it does not mean that the SQL statement’s WHERE order should be the same as the union index.
  • The following SQL statements can also be hit(login_name, passwd)This joint index:
select uid, login_time from user where passwd=? andlogin_name=?
Copy the code
  • But I still suggestwhereFollow the same order as the joint index. Make it a good habit.
  1. ifindex(a,b,c).where a=3 and b like 'abc%' and c=4.aCan be used,bCan be used,cCan’t use.

5. Do not use the column to the right of the range condition in an index

  • The range conditions are:<, <=, >, >=, betweenAnd so on.
  • An index can be used for at most one range column. If there are two range columns in the query condition, the index cannot be used for all of them.
  • Suppose there is a federated index(empno, title, fromdate), then in the following SQLemp_noYou can use an index, andtitlefrom_dateNo index is used.
select * from employees.titles where emp_no < 10010' and title='Senior Engineer'and from_date between '1986- 01- 01' and '1986- 12- 31'
Copy the code

6, do not do any operation (calculation, function) on the index column, otherwise the index will fail and move to the full table scan

  • For example, the following SQL statement, even ifdateSelect * from table where index is created on:
select * from doc where YEAR(create_time) < = '2016';
Copy the code
  • Can be optimized to value calculation, as follows:
select * from doc where create_time < = '2016-01-01';
Copy the code
  • Consider the following SQL statement:
select * from order where date < =CURDATE ();Copy the code
  • It can be optimized as:
select * from order where date < = 'the 2018-01-24 12:00:00';
Copy the code

7. Casts will scan the full table

  • String types without single quotes will invalidate the index because mysql does its own conversion, which is equivalent to an operation on the index column.
  • ifphoneThe fields arevarcharType, then the following SQL cannot match the index.
select * from user where phone=13800001234
Copy the code
  • It can be optimized as:
select * from user where phone='13800001234';
Copy the code

8. Indexes should not be created for columns with frequent updates and low data differentiation

  • Update will change the B+ tree, and the establishment of indexes for frequently updated fields will greatly reduce the database performance.

  • There is no point in creating indexes for the “gender” attribute, which is not very specific. It does not filter data effectively, and the performance is similar to that of a full table scan.

  • Generally, an index can be created when the distinctness is more than 80%. The distinctness can be calculated using count(distinct(column name))/count(*).

9, use overwrite index to query operation, avoid back table, reduce the use of select *

  • Overwrite index: The number of columns to be queried is the same as that of the created index.
  • The data of the column to be queried can be obtained from the index instead of using the row-locator to obtain the data on the row, that is, the column to be queried must be overwritten by the established index. This accelerates the query.
  • For example, to log in to the service, the SQL statement is as follows:
Select uid, login_time from user where login_name=? and passwd=?
Copy the code
  • Can be established(login_name, passwd, login_time)Because of the union indexlogin_timeIt’s already in the index. It’s being querieduidlogin_timeYou don’t have to go torowTo speed up the query.

Index will not contain NULL columns

  • Any column containing a NULL value will not be included in the index, and any column containing a NULL value in a composite index will not be included in the indexNULLValue, the column is invalid for this composite index. So when we design the database, we try to use itnot nullConstraints and default values.

11. Is NULL, is not NULL Cannot use the index

12. If there is an Order by or group BY scenario, take advantage of the order of the index

  1. order byThe last field is part of the composite index and is placed at the end of the index combination order to avoid file_sort, which can affect query performance.
  • For example, for statementswhere a=? and b=? order by cTo create a federated index(a,b,c).
  1. Index order cannot be used if there is a range lookup in the index, as in WHERE a>10 ORDER BY b;Index,(a,b)Unable to sort.

13. Use short indexes (prefix indexes)

  • The columns are indexed, specifying a prefix length if possible. For example, if you have a CHAR(255) column and the column is within the first 10 or 20 characters, do not index the entire column if it is possible to make the prefix index nearly as distinguished as the full column index. Because short indexes not only improve query speed but also save disk space and I/O operations, reducing index file maintenance overhead. You can use count(distinct leftIndex(column name, index length))/count(*) to calculate the distinctness of the prefix index.

  • The disadvantage is that it cannot be used for ORDER BY and GROUP BY operations, nor for overwriting indexes.

  • However, in many cases, it is not necessary to create an index for the whole field. The index length is determined according to the actual text differentiation.

Optimize excessive paging scenarios with delayed correlation or subqueries

  • MySQL is not skippingoffsetRow, let’s takeoffset+NRows, and then return the offset rows before the discard, and return N rows. When the offset is particularly large, it is very inefficient to either control the total number of pages returned, or do SQL rewriting for pages that exceed a certain threshold.
  • The following is an example: Quickly locate the desired itemidSegment, and then correlate:
selecta.* from1 a,(select id from1 whereLimit100000 conditions,20 ) b where a.id=B.i d;Copy the code

15. Limit 1 can improve efficiency if it is clear that only one result will be returned

  • For example, the following SQL statement:
select * from user where login_name=? ;Copy the code
  • It can be optimized as:
select * from user where login_name=? limit 1
Copy the code
  • He knows that there’s only one result, but the database doesn’t know that, so he tells it to stop the cursor actively.

16. Do not join more than three tables

  • The data types of the fields to be joined must be the same. When multiple tables are queried by association, ensure that the associated fields have indexes.

  • For example, the left join is determined by the left, and the data on the left must be available, so the right is our key point, and the index should be built on the right. Of course, if the index is on the left, you can use right join.

17. It is recommended to control the number of single table indexes within 5

18, SQL performance optimization explain type: at least reach the range level, the requirement is ref level, if it can be consts best

  • Consts: There is at most one matching row (primary key or unique index) in a single table and the data can be read during the optimization phase.

  • Ref: Normal Index is used.

  • Range: The index is retrieved by range.

  • When type=index, all physical files of the index are scanned, and the speed is very slow.

The field with unique characteristics in the business, even if it is a combination of multiple fields, must be built into a unique index

  • Don’t think the unique index affects itinsertSpeed, this speed loss is negligible, but the increase in search speed is obvious. In addition, even if the verification control is very perfect at the application layer, as long as there is no unique index, according to Murphy’s Law, dirty data must be generated.

20. Avoid the following misconceptions when creating indexes

  1. The more indexes you have, the better. Create an index if you think you need a query.

  2. Instead, think indexes consume space and seriously slow down updates and additions.

  3. Resist unique indexes, believing that the uniqueness of the business needs to be solved at the application layer through “look first and insert later” method.

  4. Optimize too early, starting to optimize without knowing the system.

Index selectivity and prefix index

  • Since indexes can speed up queries, why not create indexes whenever the query statement needs them? The answer is no. Although indexes speed up queries, they also come at a cost: the index file itself consumes storage space, and the index increases the burden of inserting, deleting, and modifying records. In addition, MySQL also consumes resources to maintain indexes at runtime, so the more indexes you have, the better. You are not advised to create indexes in the two cases.

  • In the first case, if the table has few records, such as one thousand, two thousand, or even a few hundred records, there is no need to create an index and let the query do a full table scan. As for how many records are too many, this individual has his own opinion. My personal experience is that the dividing line is 2000. If the number of records does not exceed 2000, we can consider not to create an index, and if the number exceeds 2000, we can consider indexing.

  • Another case where indexing is not recommended is when the selectivity of the index is low. Selectivity of an index is the ratio of a non-repeating index (also called Cardinality) to the number of entries (#T) in the table:

Index Selectivity = Cardinality / #T
Copy the code
  • And obviously the options are in the range of(0, 1], the higher the selectivity the greater the index value, which is due toB+TreeThe nature of. For example,employees.titlesTable, ifThe title ‘field is often queried separately, so if it needs to be indexed, let’s take a look at its selectivity:
SELECT count(DISTINCT(title))/count(*) AS Selectivity FROM employees.titles;
+-------------+
| Selectivity |
+-------------+
|      0.0000 |
+-------------+
Copy the code
  • Title is less than 0.0001 selective (the exact value is 0.00001579), so there’s really no need to index it separately.

  • An index optimization strategy related to index selectivity is called prefix indexing, which is to replace the entire column with the prefix of the column as the index key. When the prefix length is appropriate, the selectivity of the prefix index is close to that of the full column index, and the size and maintenance cost of the index file are reduced because the index key is shorter. The following uses the employees.employees table as an example to describe how to select and use prefix indexes.

  • Let’s assume that the employees table has only one index <emp_no>, so if we want to search for a person by name, we have to scan the full table:

EXPLAIN SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido';
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 300024 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
Copy the code
  • It’s obviously inefficient to constantly search for employees by name, so consider indexing. There are two options, build<first_name>or<first_name, last_name>, look at the selectivity of the two indexes:
SELECT count(DISTINCT(first_name))/count(*) AS Selectivity FROM employees.employees;
+-------------+
| Selectivity |
+-------------+
|      0.0042 |
+-------------+
SELECT count(DISTINCT(concat(first_name, last_name)))/count(*) AS Selectivity FROM employees.employees;
+-------------+
| Selectivity |
+-------------+
|      0.9313 |
+-------------+
Copy the code
  • <first_name>‘ ‘<first_name, last_name>Selectivity is good, butfirst_nameandlast_nameThe sum of the length is zero30Is there a way to balance length and selectivity? Consider creating indexes using the first few characters of first_name and last_name, for example<first_name, left(last_name, 3)> ‘
SELECT count(DISTINCT(concat(first_name, left(last_name, 3))))/count(*) AS Selectivity FROM employees.employees; + -- -- -- -- -- -- -- -- -- -- -- -- -- + | Selectivity | + -- -- -- -- -- -- -- -- -- -- -- -- -- + | | + 0.7879 -- -- -- -- -- -- -- -- -- -- -- -- -- +Copy the code
  • The selectivity is good, but it’s still a little far from 0.9313, so add the last_name prefix to 4:
SELECT count(DISTINCT(concat(first_name, left(last_name, 4))))/count(*) AS Selectivity FROM employees.employees;
+-------------+
| Selectivity |
+-------------+
|      0.9007 |
+-------------+
Copy the code
  • The selectivity is ideal, and the index is only of length18, than<first_name, last_name>Short by nearly half, we build this prefix index:
ALTER TABLE employees.employees
ADD INDEX `first_name_last_name4` (first_name, last_name(4));
Copy the code
  • Query by name and compare the results with those before the index was created.
SHOW PROFILES; +----------+------------+---------------------------------------------------------------------------------+ | Query_ID |  Duration | Query | + -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 87 0.11941700 | SELECT * FROM employees. Employees WHERE first_name = 'Eric' AND last_name = 'Anido' | | 90 | | 0.00092400 SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' | +----------+------------+---------------------------------------------------------------------------------+Copy the code
  • The performance improvements are significant, with query speeds up by more than 120 times.

  • A prefix index is both large and fast, but it cannot be used for ORDER BY and GROUP BY operations, and cannot be used for Covering index (i.e. when the index itself contains all the data required for the query, the data file itself is not accessed).

conclusion

  • This article mainly talks about 20 principles of index optimization, hope readers like it.
  • This articleMind mapsandPDF documentIs ready, in need of partners can follow the wechat public number [code ape technology column] reply keywordsThe index optimizationTo obtain.