1. Introduction
1. Leftmost prefix
2. Index condition pushdown
2.1 What is index condition pushdown
Index Condition Pushdown (ICP)
Index condition pushdown (ICP) was introduced in MySQL 5.6 to optimize queries that read rows through a secondary index.
- Without ICP, when a query uses a non-primary key index (also known as a normal index or secondary index), the storage engine locates the rows through the index, reads them from the base table, and returns them to the MySQL server, which then checks whether each row meets the WHERE conditions.
- With ICP, if part of the WHERE condition can be evaluated using only the columns in the index, the MySQL server pushes that part of the condition down to the storage engine. The storage engine evaluates it against each index entry first, and only when the entry satisfies the condition does it read the full row and return it to the MySQL server.
Index condition pushdown can reduce the number of times the storage engine has to access the base table, and also the number of rows the MySQL server receives from the storage engine.
2.2 Case Analysis
Before we start, prepare a user table (user) whose main fields are id, name, age, and address, and create a composite index on (name, age).
Suppose we need to find all users whose name starts with Chen. The SQL statement is as follows:
SELECT * FROM user WHERE name LIKE 'Chen%';
- Following the leftmost prefix principle, this query can use the composite index (name, age), which is generally better than a full table scan.
So what happens if there are other conditions? Suppose we want the users whose name starts with Chen and whose age is 20:
SELECT * FROM user WHERE name LIKE 'Chen%' AND age = 20;
- How is this SQL statement executed? Let's compare MySQL versions before and after 5.6.
Before MySQL 5.6
- Before 5.6 there was no index condition pushdown, so the process is as follows:
InnoDB ignores the age condition and searches by name only: because name is matched by a range (LIKE 'Chen%'), the index cannot be used to seek on age. In the (name, age) index tree it finds two matching entries, with id 2 and id 1. It then goes back to the table (the primary key index) with each id to fetch the full row, so this query needs two table lookups, and the age = 20 check is done afterwards by the server.
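To see why the age condition cannot narrow the index seek, here is a small Python sketch with hypothetical data (the names, ages, ids, and the `prefix_range` helper are illustrative, not from the article). It models the leaf level of a (name, age) index as a sorted list: the entries matching LIKE 'Chen%' form one contiguous range, but inside that range the ages are not sorted, so age can only be checked entry by entry.

```python
from bisect import bisect_left

# Hypothetical leaf entries of a composite (name, age) index: (name, age, id),
# kept sorted by (name, age) the way the B+tree orders them.
index_entries = sorted([
    ("Chen Hao", 25, 2),
    ("Chen Li", 20, 1),
    ("Li Ming", 20, 3),
    ("Wang Fang", 18, 4),
])

def prefix_range(entries, prefix):
    """Return the contiguous run of entries whose name starts with `prefix`.

    LIKE 'prefix%' on the leftmost column maps to a key range: seek to the
    first name >= prefix, then scan while names still start with it.
    """
    names = [name for name, _, _ in entries]
    start = bisect_left(names, prefix)
    end = start
    while end < len(entries) and names[end].startswith(prefix):
        end += 1
    return entries[start:end]

matches = prefix_range(index_entries, "Chen")
ages = [age for _, age, _ in matches]
print(matches)               # both "Chen..." entries, ids 2 and 1
print(ages == sorted(ages))  # False: age is not ordered inside the range
```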
MySQL 5.6 and later
- MySQL 5.6 added index condition pushdown, and the process becomes:
InnoDB does not ignore the age field. It checks age = 20 on the index entries themselves and skips any entry whose age is not 20. As a result, only one entry in the (name, age) index tree matches, and only that id is used to fetch the full row from the primary key index, so a single table lookup is needed.
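The difference in table lookups can be modeled with a toy simulation. This is a sketch under stated assumptions, not InnoDB's actual code path: the `table` dict, the `name_age_index` list, and `run_query` are hypothetical, and each counted "lookup" stands for one trip back to the primary key index.

```python
# Hypothetical primary key index ("the table"): id -> full row.
table = {
    1: {"id": 1, "name": "Chen Li", "age": 20, "address": "Beijing"},
    2: {"id": 2, "name": "Chen Hao", "age": 25, "address": "Shanghai"},
    3: {"id": 3, "name": "Li Ming", "age": 20, "address": "Wuhan"},
}

# Hypothetical secondary index on (name, age): sorted (name, age, id) entries.
name_age_index = sorted((r["name"], r["age"], r["id"]) for r in table.values())

def run_query(prefix, age, use_icp):
    """Model SELECT * FROM user WHERE name LIKE '<prefix>%' AND age = <age>.

    Returns (matching rows, number of lookups back to the primary key index).
    """
    lookups = 0
    result = []
    for name, idx_age, row_id in name_age_index:
        if not name.startswith(prefix):
            continue  # outside the index range (a real range scan never visits these)
        if use_icp and idx_age != age:
            continue  # pushed-down condition rejected inside the storage engine
        lookups += 1  # fetch the full row from the primary key index
        row = table[row_id]
        if row["age"] == age:  # server-side check of the WHERE clause
            result.append(row)
    return result, lookups

for icp in (False, True):
    rows, lookups = run_query("Chen", 20, use_icp=icp)
    print(f"ICP={icp}: ids={[r['id'] for r in rows]}, table lookups={lookups}")
```

Both runs return the same row; only the number of table lookups changes (2 without ICP, 1 with it), which mirrors the two processes described above.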
2.3 Practice
The analysis above is only theoretical, so let's verify it. The author installed MySQL 5.6 and ran EXPLAIN on the statement above, as shown in the following figure:
- In the EXPLAIN output, the Extra column shows Using index condition, which indicates that index condition pushdown was used.
2.4 Summary
- Index condition pushdown on non-primary key indexes can effectively reduce the number of table lookups and significantly improve query efficiency.
- Index condition pushdown is enabled by default. You can disable it for the current session with the following statement. Changing it in the configuration file is not covered here; after all, why disable such a useful feature?
set optimizer_switch='index_condition_pushdown=off';
4. Prefix index
4.1 What is a prefix index
A prefix index (also called a local index) indexes only the leading part of a column's value instead of the whole value. For example, you can index only the first 10 characters of an ID card number.
4.2 Why use prefix indexes
Prefix indexes can effectively reduce the size of the index files, so each index page can hold more index values, which speeds up index lookups. However, prefix indexes have drawbacks: they cannot be used for ORDER BY or GROUP BY, nor can they be used as covering indexes.
4.3 When is it appropriate to use prefix indexes?
Use a prefix index when the string values are long but start to differ within the first few characters. Otherwise, a prefix index is not a good fit. For example, suppose a field is 20 characters long and an index on the whole field has a selectivity of 0.9, but a prefix index on the first 10 characters only reaches 0.5. You would have to keep lengthening the prefix, and by then the advantage of the prefix index is no longer obvious, so there is little point in creating one.
For example
If the column to be indexed is long, an index on the whole value will be large and slow. You can index only the beginning of the column to save index space and improve index efficiency.
- Rule: Reduce duplicate index values
For example, in a region table many area values start with "China". If the prefix index covers only the first 1 to 5 characters, a large number of index values will be repeated. The fewer repeated index values, the more efficient the query.
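A quick way to see the repetition problem is to count how many rows share each prefix. This Python sketch uses a hypothetical handful of area values (not data from the article); `max_repeats` is an illustrative helper that reports the size of the largest group of rows sharing one prefix.

```python
from collections import Counter

# Hypothetical area values; most of them share the leading text "China".
areas = [
    "China Beijing Haidian", "China Beijing Chaoyang", "China Shanghai Pudong",
    "China Shanghai Minhang", "China Guangdong Shenzhen", "Japan Tokyo Minato",
]

def max_repeats(values, prefix_len):
    """Largest number of rows that share one prefix of the given length."""
    counts = Counter(v[:prefix_len] for v in values)
    return max(counts.values())

for n in (1, 5, 9, 16):
    print(n, max_repeats(areas, n))
```

With 1 to 5 characters, five of the six rows collapse onto the same index value; only once the prefix reaches past the shared "China ..." text does each value become (nearly) unique.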
Prefix index test
CREATE TABLE `x_test` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `x_name` varchar(255) NOT NULL,
  `x_time` int(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4145025 DEFAULT CHARSET=utf8mb4;

-- Copy existing rows with a random numeric prefix to generate test data
INSERT INTO x_test (x_name, x_time)
SELECT CONCAT(rand()*3300102, x_name), x_time FROM x_test WHERE id < 30000;
Test data: 2 million rows
- Query without any index
SELECT * FROM x_test WHERE x_name = '1892008.205824857823401.800099203178258.8904820949682635656.62526521254'; -- Query time: 2.253s
- Add a prefix index on the first character
ALTER TABLE x_test ADD INDEX (x_name(1));
SELECT * FROM x_test WHERE x_name = '1892008.205824857823401.800099203178258.8904820949682635656.62526521254'; -- Query time: 3.291s
With a prefix index on only the first character, the query actually takes longer, because a single character produces far too many repeated index values: all two million rows start with a digit, so on average about 200,000 rows share each index value.
- Re-create the prefix index, this time on the first 4 characters
ALTER TABLE x_test ADD INDEX (x_name(4));
-- Run the same query again
SELECT * FROM x_test WHERE x_name = '1892008.205824857823401.800099203178258.8904820949682635656.62526521254'; -- Query time: 0.703s
Indexing the first 4 characters greatly reduces the number of repeated index values, and the query time drops from about 3 seconds to 0.7 seconds.
- All two million rows start with digits, and seven digits (each 0 to 9) give ten million possible combinations.
In other words, if you index the first 7 characters, there will be very few duplicate index values.
ALTER TABLE x_test ADD INDEX (x_name(7));
-- Run the same query again
SELECT * FROM x_test WHERE x_name = '1892008.205824857823401.800099203178258.8904820949682635656.62526521254'; -- Query time: 0.014s (first execution, with no cache)
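The "ten million combinations" argument can be checked with a rough back-of-the-envelope model. This sketch assumes every prefix character is an independent, uniformly random digit, which the real data only approximates (for example, values from rand()*3300102 mostly begin with 1, 2 or 3, and shorter numbers contain a decimal point). `expected_distinct` is an illustrative helper using the standard formula for the expected number of distinct values when drawing uniformly with replacement.

```python
def expected_distinct(rows, combinations):
    """Expected number of distinct keys when `rows` keys are drawn
    independently and uniformly from `combinations` possible values."""
    return combinations * (1 - (1 - 1 / combinations) ** rows)

rows = 2_000_000
for prefix_len in (1, 4, 7):
    combos = 10 ** prefix_len  # assumption: every character is a uniform digit
    distinct = expected_distinct(rows, combos)
    print(f"prefix {prefix_len}: ~{distinct:,.0f} distinct values, "
          f"~{rows / distinct:,.1f} rows per value")
```

Under this model, 1 character gives about 200,000 rows per value (matching the estimate above), and 7 characters give roughly 1.8 million distinct values for 2 million rows: few duplicates, though not zero.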
4.4 Best practices
Sometimes you need to index very long character columns, which makes the index large and slow. Usually you can index just the beginning of the value, which saves a lot of index space and improves index efficiency, but it also lowers the selectivity of the index. Index selectivity is the ratio of the number of distinct index values (the cardinality) to the total number of rows in the table (#T), so it ranges from 1/#T to 1. The more selective an index, the more efficient the query, because a more selective index lets MySQL filter out more rows during the lookup. A unique index has a selectivity of 1, which is the best possible selectivity and performance.
In general, a well-chosen prefix is selective enough to give good query performance. For BLOB, TEXT, or very long VARCHAR columns you must use a prefix index, because MySQL does not allow indexing the full length of these columns.
The trick is to choose a prefix long enough to be highly selective, but not too long (to save space). The prefix should be long enough so that the selectivity of the prefix index is close to that of the entire column of the index. In other words, the prefix’s “cardinality” should be close to the “cardinality” of the full column.
One way to determine a suitable prefix length is to find the list of the most frequent values and compare it with the list of the most frequent prefixes. The following example uses sakila, the official sample database provided by MySQL.
- Download address is as follows: downloads.mysql.com/docs/sakila…
The sakila sample database has no table that suits this demonstration directly, so we generate a demo table from the city table to get enough data:
mysql> select database();
+------------+
| database() |
+------------+
| sakila |
+------------+
1 row in set (0.00 sec)

mysql> create table city_demo (city varchar(50) not null);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into city_demo (city) select city from city;
Query OK, 600 rows affected (0.08 sec)
Records: 600  Duplicates: 0  Warnings: 0

mysql> insert into city_demo (city) select city from city_demo;
Query OK, 600 rows affected (0.07 sec)
Records: 600  Duplicates: 0  Warnings: 0

mysql> update city_demo set city = (select city from city order by rand() limit 1);
Query OK, 1199 rows affected (0.95 sec)
Rows matched: 1200 Changed: 1199 Warnings: 0
mysql>
Because the RAND() function is used, your data will differ from mine, but that does not affect the demonstration.
Start by finding a list of the most common cities:
mysql> select count(*) as cnt, city from city_demo group by city order by cnt desc limit 10;
+-----+--------------+
| cnt | city         |
+-----+--------------+
|   8 | Garden Grove |
|   7 | Escobar      |
|   7 | Emeishan     |
|   6 | Amroha       |
|   6 | Tegal        |
|   6 | Lancaster    |
|   6 | Jelets       |
|   6 | Ambattur     |
|   6 | Yingkou      |
|   6 | Monclova     |
+-----+--------------+
10 rows in set (0.01 sec)
Notice that each of these values appears 6 to 8 times. Now find the most frequent city-name prefixes, starting with three-letter prefixes and then trying four, five, and six:
mysql> select count(*) as cnt, left(city,3) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+------+
| cnt | pref |
+-----+------+
|  25 | San  |
|  15 | Cha  |
|  12 | Bat  |
|  12 | Tan  |
|  11 | al-  |
|  11 | Gar  |
|  11 | Yin  |
|  10 | Kan  |
|  10 | Sou  |
|  10 | Bra  |
+-----+------+
10 rows in set (0.00 sec)

mysql> select count(*) as cnt, left(city,4) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+------+
| cnt | pref |
+-----+------+
|  12 | San  |
|  10 | Sout |
|   8 | Chan |
|   8 | Sant |
|   8 | Gard |
|   7 | Emei |
|   7 | Esco |
|   6 | Ying |
|   6 | Amro |
|   6 | Lanc |
+-----+------+
10 rows in set (0.01 sec)

mysql> select count(*) as cnt, left(city,5) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+-------+
| cnt | pref  |
+-----+-------+
|  10 | South |
|   8 | Garde |
|   7 | Emeis |
|   7 | Escob |
|   6 | Amroh |
|   6 | Yingk |
|   6 | Moncl |
|   6 | Lanca |
|   6 | Jelet |
|   6 | Tegal |
+-----+-------+
10 rows in set (0.01 sec)

mysql> select count(*) as cnt, left(city,6) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+--------+
| cnt | pref   |
+-----+--------+
|   8 | Garden |
|   7 | Emeish |
|   7 | Escoba |
|   6 | Amroha |
|   6 | Yingko |
|   6 | Lancas |
|   6 | Jelets |
|   6 | Tegal  |
|   6 | Monclo |
|   6 | Ambatt |
+-----+--------+
10 rows in set (0.00 sec)
Comparing the results for different prefix lengths shows that with a prefix length of 6, the most frequent prefixes and their counts closely match those of the full values; here the counts are even identical.
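The same comparison can be sketched in Python with `collections.Counter` on a tiny hypothetical city list (not the sakila data). `top_counts` is an illustrative helper: a 3-character prefix lumps different cities together under "San", while at 4 characters the top of the list already resembles the full-value ranking.

```python
from collections import Counter

# Hypothetical sample rows; the article's data is the randomized city_demo table.
cities = ["Garden Grove"] * 3 + ["Escobar"] * 2 + [
    "Santa Fe", "San Jose", "San Juan", "Sandakan",
]

def top_counts(values, n, prefix_len=None):
    """Return the n most frequent values, or value prefixes of prefix_len chars."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return Counter(keys).most_common(n)

print("full:", top_counts(cities, 3))
for k in (3, 4):
    print(f"prefix {k}:", top_counts(cities, 3, k))
```

The prefix length is long enough once the counts at the top of the prefix list stop being inflated compared with the full values.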
A more convenient method is to compute the selectivity of the full column and then find a prefix length whose selectivity comes close to it. The selectivity of the full column is computed as follows:
mysql> select count(distinct city) / count(*) from city_demo;
+---------------------------------+
| count(distinct city) / count(*) |
+---------------------------------+
|                          0.4283 |
+---------------------------------+
1 row in set (0.05 sec)
This is useful for large tables. The selectivity for several prefix lengths can be computed in a single query:
mysql> select count(distinct left(city,3))/count(*) as sel3,
    -> count(distinct left(city,4))/count(*) as sel4,
    -> count(distinct left(city,5))/count(*) as sel5,
    -> count(distinct left(city,6))/count(*) as sel6
    -> from city_demo;
+--------+--------+--------+--------+
| sel3   | sel4   | sel5   | sel6   |
+--------+--------+--------+--------+
| 0.3367 | 0.4075 | 0.4208 | 0.4267 |
+--------+--------+--------+--------+
1 row in set (0.01 sec)
You can see that with a prefix length of 6 the selectivity is 0.4267, which is close to the full-column selectivity of 0.4283.
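The selectivity calculation can also be tried without a MySQL server. This sketch uses Python's built-in `sqlite3` module and a hypothetical eight-row sample. Two adaptations are assumptions of the sketch: SQLite has no LEFT(), so `substr(city, 1, n)` takes its place, and the `* 1.0` factor avoids SQLite's integer division. The 0.95 threshold for "close to the full column" is an arbitrary illustrative choice.

```python
import sqlite3

# Hypothetical sample rows standing in for city_demo.
cities = [
    "Garden Grove", "Garden Grove", "Garden City", "Escobar",
    "Emeishan", "Santa Fe", "San Jose", "Santiago",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_demo (city TEXT NOT NULL)")
conn.executemany("INSERT INTO city_demo (city) VALUES (?)", [(c,) for c in cities])

lengths = (3, 4, 6, 8)
# One selectivity column per prefix length, like sel3..sel6 in the MySQL query.
columns = ", ".join(
    f"COUNT(DISTINCT substr(city, 1, {n})) * 1.0 / COUNT(*)" for n in lengths
)
full, *by_prefix = conn.execute(
    f"SELECT COUNT(DISTINCT city) * 1.0 / COUNT(*), {columns} FROM city_demo"
).fetchone()
selectivity = dict(zip(lengths, by_prefix))

# Shortest tested prefix whose selectivity is within 95% of the full column.
shortest = min(n for n, s in selectivity.items() if s >= 0.95 * full)
print("full column:", full)
print("by prefix length:", selectivity)
print("shortest close prefix:", shortest)
conn.close()
```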
In the example above, we have found the appropriate prefix length, so let’s create the prefix index:
mysql> alter table city_demo add key (city(6));
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select * from city_demo where city like 'Jinch%';
+----+-------------+-----------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table     | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | city_demo | range | city          | city | 20      | NULL |    2 | Using where |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
You can see that the index you just created is being used correctly.
Prefix indexing is an effective way to make indexes smaller and faster, but it has its disadvantages:
MySQL cannot use a prefix index for ORDER BY or GROUP BY, nor can it use a prefix index as a covering index.
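Both limitations follow from the index storing only part of each value. A short Python sketch with hypothetical values and a 6-character prefix, as in city(6): two different cities share the key "Garden", so ordering by the index key cannot decide their full order, and the index alone cannot return the complete value that a covering index would need to supply.

```python
# Hypothetical column values and a 6-character prefix, as in city(6).
values = ["Garden Grove", "Garden City", "Emeishan"]
prefix_len = 6

# A prefix index stores only the first 6 characters of each value.
prefix_keys = [v[:prefix_len] for v in values]

# Ordering by the prefix keeps tied keys in arbitrary (here: input) order,
# which differs from ordering by the full value.
order_by_prefix = sorted(values, key=lambda v: v[:prefix_len])
order_by_full = sorted(values)

print(prefix_keys)                       # two rows share the key "Garden"
print(order_by_prefix == order_by_full)  # False
```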