Welcome to this article on MySQL technology shared by the GreatSQL community. If you have any questions or questions you would like to learn, please leave them in the comments section below

MySQL optimizer index selection myth.

Peng Gao contributed to this report.

1. Problem description

Select * from c1; select * from c2; select * from c1; select * from c2;

Select * from c1 WHERE c1 =? AND c2 = ? To view the execution plan using EXPLAIN, the optimizer prefered the C2 column index of type VARCHAR.

He was puzzled. Shouldn’t c1 columns of type INT be chosen, which seem less costly?

2. Problem recurrence

Create test table T1:

[[email protected]]> CREATE TABLE `t1` (
  `c1` int NOT NULL AUTO_INCREMENT,
  `c2` int unsigned NOT NULL,
  `c3` varchar(20) NOT NULL,
  `c4` varchar(20) NOT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `k3` (`c3`),
  UNIQUE KEY `k2` (`c2`)
) ENGINE=InnoDB;
Copy the code

Mysql_random_data_load write 10000 rows of data:

Mysql_random_data_load -h127.0.0.1 - ux-px yejr T1 10000Copy the code

View the execution plan:

[[email protected]]> EXPLAIN SELECT * FROM t1 WHERE
 c2 = 1755950419 AND c3 = 'MichaelaAnderson'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: const
possible_keys: k3,k2
          key: k3
      key_len: 82
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
Copy the code

You can see that the optimizer did select the K3 index instead of the “expected” K2 index. Why?

3. Problem analysis

The reason is simple and crude: the optimizer thinks that the cost of both index selections is the same, but that the first index is selected first.

Alter table t2 alter table t2 alter table t2 alter table t2 alter table t2

[[email protected]]> CREATE TABLE `t2` (
  `c1` int NOT NULL AUTO_INCREMENT,
  `c2` int unsigned NOT NULL,
  `c3` varchar(20) NOT NULL,
  `c4` varchar(20) NOT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `k2` (`c2`),
  UNIQUE KEY `k3` (`c3`)
) ENGINE=InnoDB;
Copy the code

View the execution plan again:

[[email protected]]> EXPLAIN SELECT * FROM t2 WHERE
 c2 = 1755950419 AND c3 = 'MichaelaAnderson'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: const
possible_keys: k2,k3
          key: k2
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
Copy the code

We use EXPLAIN ANALYZE to see the cost comparison between the two execution plans:

EXPLAIN ANALYZE SELECT * FROM T1 WHERE c2 = 1755950419 AND c3 = 'MichaelaAnderson'\G * * * * * * * * * * * * * * * * * * * * * * * * * * * 1 row * * * * * * * * * * * * * * * * * * * * * * * * * * * the EXPLAIN: - > Rows fetched before execution (cost = 0.00.. Rows = 1) 0.00 (the actual time = 0.000. [[email protected]]> EXPLAIN ANALYZE SELECT * FROM T2 WHERE C2 = 1755950419 AND c3 = 'MichaelaAnderson'\G *************************** 1. row *************************** EXPLAIN: -> Rows touchbefore execution (cost=0.00.. Rows = 1) 0.00 (the actual time = 0.000. 0.000 rows = 1 loops = 1)Copy the code

As you can see, the costs are obviously the same.

Using OPTIMIZE_TRACE to check the execution plan, you can also see that the cost of both SQL is the same:

. { "rows_estimation": [ { "table": "`t1`", "rows": 1, "cost": 1, "table_type": "const", "empty": false } ] }, ...Copy the code

Therefore, the optimizer assumes that the selection of any index is the same, depending on which index is higher in order.

This is also supported by the debug trace from the SELECT:

- 1, the T1 table, k3 index in the PRIMARY KEY (` c1 `), UNIQUE KEY ` k3 ` (` c3 `), UNIQUE KEY ` k2 ` (` c2 `) T @ 2: | | | | | | | | opt: (null) : Starting struct T @ 2: | | | | | | | | opt: table: "` t1 `" T @ 2: | | | | | | | | opt: field: "c3" (c3 in the front, so the last use of k3) T @ 2: | | | | | | | | >convert_string T@2: | | | | | | | | | >alloc_root T@2: | | | | | | | | | | enter: root: 0x40a8068 T@2: | | | | | | | | | | exit: ptr: 0x4b41ab0 T@2: | | | | | | | | | <alloc_root 304 T@2: | | | | | | | | <convert_string 2610 T@2: | | | | | | | | opt: equals: "'Louise Garrett'" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: Key_use: optimize= 0 used_tables=0x0 ref_table_rows= 18446744073709551615 keypart_map= 1 T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t1`" T@2: | | | | | | | | opt: field: "c2" T@2: | | | | | | | | opt: equals: "22896242" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: Key_use: optimize= 0 used_tables=0x0 ref_table_rows= 18446744073709551615 keypart_map= 1 T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t1`" T@2: | | | | | | | | opt: field: "c2" T@2: | | | | | | | | opt: equals: "22896242" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: ref_optimizer_key_uses: ending struct T@2: | | | | | | | | opt: (null): PRIMARY KEY (' c1 '), PRIMARY KEY (' c2 '), UNIQUE KEY (' c3 ') T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t2`" T@2: | | | | | | | | opt: field: C2 (c2 in the front so the use of k2 index) T @ 2: | | | | | | | | opt: equals: "22896242" T @ 2: | | | | | | | | opt: null_rejecting: 0 T @ 2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: Key_use: optimize= 0 used_tables=0x0 ref_table_rows= 18446744073709551615 keypart_map= 1 T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t2`" T@2: | | | | | | | | opt: field: "c3" T@2: | | | | | | | | >convert_string T@2: | | | | | | | | | >alloc_root T@2: | | | | | | | | | | enter: root: 0x40a8068 T@2: | | | | | | | | | | exit: ptr: 0x4b41ab0 T@2: | | | | | | | | | <alloc_root 304 T@2: | | | | | | | | <convert_string 2610 T@2: | | | | | | | | opt: equals: "'Louise Garrett'" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: ref_optimizer_key_uses: ending struct T@2: | | | | | | | | opt: (null): ending structCopy the code

4. Question extension

At this point, we can’t help but wonder if the cost of these two indexes is really the same.

Let’s do a simple comparison test with mysqlSLAP:

-- Test 1: Sock --no-drop --create-schema X -i 3 --number-of-queries select mysqlSLap-hlocalhost-uroot-smysqL. sock --no-drop --create-schema X -i 3 --number-of-queries 1000000 -q "set @xid = cast(round(rand()*2147265929) as unsigned); select * from t1 where c2 = @xid" -c 8 ... Average number of seconds to run all queries: 9.483 seconds... -- Test 2: Sock --no-drop --create-schema X -i 3 --number-of-queries select mysqlSLAP-hlocalhost-uroot - smysqL. sock --no-drop --create-schema X -I 3 --number-of-queries 1000000 -q "set @xid = concat('u',cast(round(rand()*2147265929) as unsigned)); select * from t1 where c3 = @xid" -c 8 ... Average number of seconds to run all queries: 10.360 seconds...Copy the code

As you can see, it takes about 7% to 9% more time to run a C3 index than to run a C2 index. (The test results in my environment may vary depending on the environment and data volume.)

The MySQL optimizer still needs to be improved

Test version: GreatSQL 8.0.25 (MySQL 5.6.39 also results).

Enjoy GreatSQL 🙂

This article is published by OpenWrite!