Welcome to this article on MySQL technology shared by the GreatSQL community. If you have any questions or questions you would like to learn, please leave them in the comments section below
MySQL optimizer index selection myth.
Peng Gao contributed to this report.
1. Problem description
Select * from c1; select * from c2; select * from c1; select * from c2;
Select * from c1 WHERE c1 =? AND c2 = ? To view the execution plan using EXPLAIN, the optimizer prefered the C2 column index of type VARCHAR.
He was puzzled. Shouldn’t c1 columns of type INT be chosen, which seem less costly?
2. Problem recurrence
Create test table T1:
[[email protected]]> CREATE TABLE `t1` (
`c1` int NOT NULL AUTO_INCREMENT,
`c2` int unsigned NOT NULL,
`c3` varchar(20) NOT NULL,
`c4` varchar(20) NOT NULL,
PRIMARY KEY (`c1`),
UNIQUE KEY `k3` (`c3`),
UNIQUE KEY `k2` (`c2`)
) ENGINE=InnoDB;
Copy the code
Mysql_random_data_load write 10000 rows of data:
Mysql_random_data_load -h127.0.0.1 - ux-px yejr T1 10000Copy the code
View the execution plan:
[[email protected]]> EXPLAIN SELECT * FROM t1 WHERE
c2 = 1755950419 AND c3 = 'MichaelaAnderson'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: const
possible_keys: k3,k2
key: k3
key_len: 82
ref: const
rows: 1
filtered: 100.00
Extra: NULL
Copy the code
You can see that the optimizer did select the K3 index instead of the “expected” K2 index. Why?
3. Problem analysis
The reason is simple and crude: the optimizer thinks that the cost of both index selections is the same, but that the first index is selected first.
Alter table t2 alter table t2 alter table t2 alter table t2 alter table t2
[[email protected]]> CREATE TABLE `t2` (
`c1` int NOT NULL AUTO_INCREMENT,
`c2` int unsigned NOT NULL,
`c3` varchar(20) NOT NULL,
`c4` varchar(20) NOT NULL,
PRIMARY KEY (`c1`),
UNIQUE KEY `k2` (`c2`),
UNIQUE KEY `k3` (`c3`)
) ENGINE=InnoDB;
Copy the code
View the execution plan again:
[[email protected]]> EXPLAIN SELECT * FROM t2 WHERE
c2 = 1755950419 AND c3 = 'MichaelaAnderson'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: const
possible_keys: k2,k3
key: k2
key_len: 4
ref: const
rows: 1
filtered: 100.00
Extra: NULL
Copy the code
We use EXPLAIN ANALYZE to see the cost comparison between the two execution plans:
EXPLAIN ANALYZE SELECT * FROM T1 WHERE c2 = 1755950419 AND c3 = 'MichaelaAnderson'\G * * * * * * * * * * * * * * * * * * * * * * * * * * * 1 row * * * * * * * * * * * * * * * * * * * * * * * * * * * the EXPLAIN: - > Rows fetched before execution (cost = 0.00.. Rows = 1) 0.00 (the actual time = 0.000. [[email protected]]> EXPLAIN ANALYZE SELECT * FROM T2 WHERE C2 = 1755950419 AND c3 = 'MichaelaAnderson'\G *************************** 1. row *************************** EXPLAIN: -> Rows touchbefore execution (cost=0.00.. Rows = 1) 0.00 (the actual time = 0.000. 0.000 rows = 1 loops = 1)Copy the code
As you can see, the costs are obviously the same.
Using OPTIMIZE_TRACE to check the execution plan, you can also see that the cost of both SQL is the same:
. { "rows_estimation": [ { "table": "`t1`", "rows": 1, "cost": 1, "table_type": "const", "empty": false } ] }, ...Copy the code
Therefore, the optimizer assumes that the selection of any index is the same, depending on which index is higher in order.
This is also supported by the debug trace from the SELECT:
- 1, the T1 table, k3 index in the PRIMARY KEY (` c1 `), UNIQUE KEY ` k3 ` (` c3 `), UNIQUE KEY ` k2 ` (` c2 `) T @ 2: | | | | | | | | opt: (null) : Starting struct T @ 2: | | | | | | | | opt: table: "` t1 `" T @ 2: | | | | | | | | opt: field: "c3" (c3 in the front, so the last use of k3) T @ 2: | | | | | | | | >convert_string T@2: | | | | | | | | | >alloc_root T@2: | | | | | | | | | | enter: root: 0x40a8068 T@2: | | | | | | | | | | exit: ptr: 0x4b41ab0 T@2: | | | | | | | | | <alloc_root 304 T@2: | | | | | | | | <convert_string 2610 T@2: | | | | | | | | opt: equals: "'Louise Garrett'" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: Key_use: optimize= 0 used_tables=0x0 ref_table_rows= 18446744073709551615 keypart_map= 1 T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t1`" T@2: | | | | | | | | opt: field: "c2" T@2: | | | | | | | | opt: equals: "22896242" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: Key_use: optimize= 0 used_tables=0x0 ref_table_rows= 18446744073709551615 keypart_map= 1 T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t1`" T@2: | | | | | | | | opt: field: "c2" T@2: | | | | | | | | opt: equals: "22896242" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: ref_optimizer_key_uses: ending struct T@2: | | | | | | | | opt: (null): PRIMARY KEY (' c1 '), PRIMARY KEY (' c2 '), UNIQUE KEY (' c3 ') T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t2`" T@2: | | | | | | | | opt: field: C2 (c2 in the front so the use of k2 index) T @ 2: | | | | | | | | opt: equals: "22896242" T @ 2: | | | | | | | | opt: null_rejecting: 0 T @ 2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: Key_use: optimize= 0 used_tables=0x0 ref_table_rows= 18446744073709551615 keypart_map= 1 T@2: | | | | | | | | opt: (null): starting struct T@2: | | | | | | | | opt: table: "`t2`" T@2: | | | | | | | | opt: field: "c3" T@2: | | | | | | | | >convert_string T@2: | | | | | | | | | >alloc_root T@2: | | | | | | | | | | enter: root: 0x40a8068 T@2: | | | | | | | | | | exit: ptr: 0x4b41ab0 T@2: | | | | | | | | | <alloc_root 304 T@2: | | | | | | | | <convert_string 2610 T@2: | | | | | | | | opt: equals: "'Louise Garrett'" T@2: | | | | | | | | opt: null_rejecting: 0 T@2: | | | | | | | | opt: (null): ending struct T@2: | | | | | | | | opt: ref_optimizer_key_uses: ending struct T@2: | | | | | | | | opt: (null): ending structCopy the code
4. Question extension
At this point, we can’t help but wonder if the cost of these two indexes is really the same.
Let’s do a simple comparison test with mysqlSLAP:
-- Test 1: Sock --no-drop --create-schema X -i 3 --number-of-queries select mysqlSLap-hlocalhost-uroot-smysqL. sock --no-drop --create-schema X -i 3 --number-of-queries 1000000 -q "set @xid = cast(round(rand()*2147265929) as unsigned); select * from t1 where c2 = @xid" -c 8 ... Average number of seconds to run all queries: 9.483 seconds... -- Test 2: Sock --no-drop --create-schema X -i 3 --number-of-queries select mysqlSLAP-hlocalhost-uroot - smysqL. sock --no-drop --create-schema X -I 3 --number-of-queries 1000000 -q "set @xid = concat('u',cast(round(rand()*2147265929) as unsigned)); select * from t1 where c3 = @xid" -c 8 ... Average number of seconds to run all queries: 10.360 seconds...Copy the code
As you can see, it takes about 7% to 9% more time to run a C3 index than to run a C2 index. (The test results in my environment may vary depending on the environment and data volume.)
The MySQL optimizer still needs to be improved
Test version: GreatSQL 8.0.25 (MySQL 5.6.39 also results).
Enjoy GreatSQL 🙂
This article is published by OpenWrite!