Here’s a question from today’s group that might seem confusing at first glance:

Environment: MySQL 5.7.25

1. Create a test table

Test_user (test_user, test_user, test_user, test_user, test_user, test_user);

CREATE TABLE `test_memory` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id'.`create_time` datetime NOT NULL COMMENT 'Creation time'
  PRIMARY KEY (`id`))ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Copy the code

2. Create indexes

Then add index cT_index to create_time

create index ct_index on test_user(create_time);
Copy the code

3. Analysis of SQL

Select * from test_user where create_time > ‘2007-07-02 13:07:51’; select * from test_user where create_time > ‘2007-07-02 13:07:51’; The time 2007-07-02 13:07:51 is just an arbitrary value:

mysql> explain select * from test_user where create_time > '2007-07-02 13:07:51'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -----+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -- -- -- -- -- + | | 1 SIMPLE | test_user | NULL | | ALL ct_index | NULL | NULL | NULL | 317844 | | 49.19 Using the where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -----+ 1 row in set, 1 Warning (0.00 SEC)Copy the code

I am surprised to see that the type of the execution plan is ALL: I added an index to the create_time column, and the range query is also based on create_time. I did not use the upper index (if I used the upper index, type is range).

Select * from test_user where create_time < ‘2007-07-02 13:07:51’; select * from test_user where create_time < ‘2007-07-02 13:07:51’; Do the analysis, here greater than sign changed to less than sign:

mysql> explain select * from test_user where create_time < '2007-07-02 13:07:51'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -----+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -- -- -- -- -- + | | 1 SIMPLE | test_user | NULL | | ALL ct_index | NULL | NULL | NULL | 317844 | | 50.00 Using the where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -----+ 1 row in set, 1 Warning (0.00 SEC)Copy the code

Same conclusion, no index. So far, I can’t see any pattern, so I changed the query condition of where clause to test more, and changed the year 2007 to 2017:

mysql> explain select * from test_user where create_time < '2017-07-02 13:07:51'; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -----+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -- -- -- -- -- + | | 1 SIMPLE | test_user | NULL | | ALL ct_index | NULL | NULL | NULL | 317844 | | 50.00 Using the where | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------- -----+ 1 row in set, 1 warning (0.00 SEC) mysql> explain select * from test_user where create_time > '2017-07-02 13:07:51'; +----+-------------+-----------+------------+-------+---------------+----------+---------+------+-------+----------+---- -------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+----------+---------+------+-------+----------+---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | test_user | NULL | range | ct_index | ct_index | | NULL | 30614 | | 100.00 Using index condition | +----+-------------+-----------+------------+-------+---------------+----------+---------+------+-------+----------+---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

Select * from test_user where create_time > ‘2017-07-02 13:07:51’; select * from test_user where create_time > ‘2017-07-02 13:07:51’; Ct_index = range; ct_index = ct_index; ct_index = ct_index; How does it work?

I found that in the explain report of the three SQL where the index failed, the estimated number of rows scanned was 317,844, that is, close to the full table scan of 320,000 rows (or full table scan, but only the estimated value of 317,844 is shown here). The explain report for the only SQL that uses the index shows 30,614 estimated scans. That should be the beginning of the problem. So I went to the MySQL website.

To find the answer

I finally found the answer in chapter 8 optimization

WHERE Clause Optimization 8.2.1.1 WHERE Clause Optimization 8.2.1.1 WHERE Clause Optimization

Each table index is queried, and the best index is used unless the optimizer believes that it is more efficient to use a table scan. Every index in the table is accessed, and the best one is used, unless the optimizer decides that using a full table query is more efficient than using all queries. At one time, a scan was used based on whether the best index spanned more than 30% of the table, but a fixed percentage no longer determines the choice between using an index or a scan. Once, a full table scan was determined by whether more than 30% of the table was detected using the best index, but now this fixed percentage (30%) no longer determines whether to use an index or a full table scan. The optimizer now is more complex and bases its estimate on additional factors such as table size, number of rows, and I/O block size. The optimizer is now more complex and takes into account more factors such as table size, number of rows, IO block size.Copy the code

If you still feel confused after reading my translation, here is a little more plain English:

The index we built does not always work, there is a query optimizer in the middle, which determines whether a query SQL will run faster with the index, if so, run the index, otherwise do a full table scan. In the past, there was a percentage (30%) that determined whether the SQL would run an index or a full table scan, meaning that if there were 100 rows in total and more than 30 rows were queried by the index, it was better not to run the index. But MySQL doesn't do that anymore. It doesn't just use this percentage to decide whether to drop the index.Copy the code

At this point, let’s go back to the ratio of the number of records to the total number of queries in the above four SQL statements:

mysql> select -> (select count(*) from test_user where create_time > '2007-07-02 13:07:51')/(select count(*) from test_user) as '>2007', -> (select count(*) from test_user where create_time < '2007-07-02 13:07:51')/(select count(*) from test_user) as '< 2007', -> (select count(*) from test_user where create_time > '2017-07-02 13:07:51')/(select count(*) from test_user) as '> 2017', -> (select count(*) from test_user where create_time < '2017-07-02 13:07:51')/(select count(*) from test_user) as '< 2017; + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | > 2007 | | < 2007 > 2017 | 2017 | + -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- + | | 0.2484 0.7515 0.0508 0.9492 | | | + -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- + 1 row in the set (0.30 SEC)Copy the code

Select * from test_user where create_time > ‘2017-07-02 13:07:51’; The query results in only 5% of the total data, which the query optimizer considers to be comparable to the value of a full table scan, so it takes the index.

Finally, it is concluded that the index failure is not because the field type is time type, but because the query optimizer will judge the execution plan of SQL and choose the best and fastest query method. When the cost of index run is higher than that of full table scan, it will not execute SQL by index run.