1. Introduction
How many indexes can MySQL use in a single table query?
When I was learning MySQL, I naively thought that I could speed up queries by indexing all the columns involved in the WHERE condition. This was a big mistake. MySQL > select id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id
SELECT id,a,b FROM T WHERE a>100 AND b>200;
Copy the code
The idX_A index stores only the values of column A and id, and cannot determine whether the condition b>200 is valid. Therefore, the idX_A index can only query the table with the ID. Similarly, the IDX_B index stores only the values of column B and id. The condition that a>100 is valid cannot be determined. The idX_B index can only be queried with the ID. As you can see, the biggest overhead is actually the back table operation. The less data matched by the secondary index, the lower the back table overhead. So in theory, the fewer records a>100 and b>200 meet these two conditions, which index MySQL will use. How does MySQL determine the number of records that meet these criteria? Don’t you have to be honest to scan the full table? MySQL uses the method of estimate, based on the statistics of the table or access to a small amount of data in the table to estimate, and calculate the cost of using the two indexes to query respectively, and finally choose the lower cost index solution. How MySQL estimates execution costs is beyond the scope of this article and will be skipped.
We assume that eventually MySQL uses the idX_A index, so the query process actually looks like this:
- InnoDB from
idx_a
The first item in the B+ tree is obtaineda>100
Record, take the id value of the record back to the table query. - Query the complete user record in the table
b>200
If yes, the record is returned to the client. Otherwise, the record is discarded. - InnoDB continue from
idx_a
We get the next one in the B+ treea>100
Repeat the previous process.
With so many indexes, it’s a shame to only use one index per query. Can you use multiple indexes at the same time to complete a query? Index Merge Index Merge Index Merge Index Merge
2. Index Merge
MySQL calls this method of using multiple indexes to perform a single query an index merge. How do we know that the SQL statement we wrote uses index merge? If an index merge is used, the type column should display index_merge, the key column should display all index names, and the Extra column should display what type of index merge is used. As shown below, indexes IDX_A and IDX_B are used to complete the query. The index merge type is Intersection.
table | type | key | Extra |
---|---|---|---|
T | index_merge | idx_a,idx_b | Using intersect(idx_a,idx_b); Using where; Using index |
What? Does index merge have type? Yes, MySQL currently supports three types of index merge:
Index merge type | instructions |
---|---|
Intersection | Merge the intersection of eligible primary key values in multiple secondary indexes |
Union | Select and merge primary key values from multiple secondary indexes |
Sort Union | Rehash and sort the primary key values in multiple secondary indexes, and then set and merge |
Let’s use a concrete example to demonstrate each of the three index merges. Suppose we have table T as follows, where ID is the primary key, and columns A and B are indexed separately.
CREATE TABLE T(
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`a` INT NOT NULL,
`b` CHAR(1) DEFAULT NULL,
KEY `idx_a` (a) USING BTREE,
KEY `idx_b` (b) USING BTREE
)ENGINE=InnoDB AUTO_INCREMENT=1;
Copy the code
You can write a stored procedure, batch insert records into a table, and I’m going to post some code here, which is pretty crude.
CREATE PROCEDURE insertT()
BEGIN
DECLARE i INT DEFAULT 0;
START TRANSACTION;
WHILE i< =10000 do
INSERT INTO T (a, b) VALUES (i,CHAR(rand()*(90- 65.)+65));
SET i=i+1;
END WHILE;
COMMIT;
END;
call insertT();
Copy the code
Columns A and B are normal indexes, and the values are allowed to repeat. You can call the store several times. The final data is: a is repeated up to 10,000, B is repeated between a and Z, and the primary key keeps increasing. Let’s demonstrate this based on the data in this table.
2.1 Intersection computes
SELECT * FROM T WHERE a=1 AND b='A';
Copy the code
For this query, we now know that it can be queried in the following three ways:
- Full table scan to determine whether the two conditions match.
- using
idx_a
The index obtains the ID and queries it back to the table to determine whether condition B is fulfilled. - using
idx_b
The index obtains the ID and queries it back to the table to determine whether condition A is fulfilled.
MySQL supports a fourth query with the Intersection index merge:
- using
idx_a
Index refers to the collection of ids retrieved asid_setA
. - using
idx_b
Index refers to the collection of ids retrieved asid_setB
. - will
id_setA
andid_setB
Let’s take the intersection, let’s call it thetaid_set
. - right
id_set
Query back to the table and return the result to the client.
This process is described in a somewhat problematic way, but the general idea is correct, and the main idea is to help you understand. So the intersection of ids, it doesn’t work that way, because MySQL doesn’t store these sets of ids per se, because it takes up a lot of memory, and we’ll talk about that later.
Intersection index merge Is a method of querying a table by intersecting the primary key values of scanned records from multiple indexes and then returning to the table. EXPLAIN analysis results are as follows:
mysql> EXPLAIN SELECT * FROM T WHERE a=1 AND b='A';
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------------------+
| 1 | SIMPLE | T | NULL | index_merge | idx_a,idx_b | idx_a,idx_b | 4.4 | NULL | 1 | 100.00 | Using intersect(idx_a,idx_b); Using where; Using index |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------------------+
Copy the code
Note that using Intersection to merge indexes is conditional. If all indexes used are secondary indexes, records retrieved through secondary indexes must be sorted by primary key. Why is this a requirement? It mainly has the following two benefits:
- It’s easier to take the intersection of two ordered sets.
- In the case of ordered primary keys, the return table is no longer a simple random I/O, and the return table is more efficient.
Obviously, our query can be merged using the Intersection index. The idX_A index is sorted by a and then by ID. If a=1, the retrieved records are sorted by ID. The idX_B index is sorted by id and then by B. If b=’A’, the retrieved records are sorted by ID. So it fits the bill.
Finally, let’s look at how MySQL takes the intersection of two sets. Assume that idx_a filters ids [1,3,5] and idx_b filters ids [2,3,4]. The process of obtaining the intersection is as follows:
- from
idx_a
Fetch the first record with an ID value of 1. Again fromidx_b
Take the first record, id value is 2, because1 < 2
So the record with ID 1 is discarded. - from
idx_a
Pull out the second record, id value is 3, because2 < 3
, so the record with ID 2 is discarded directly. - from
idx_b
Pull out the second record, id value is 3, because3 = 3
, so 3 is returned to the table for query, and the result is returned to the client. At the same time, the two records with ID 3 are directly discarded. - from
idx_a
Fetch the third record, id value is 5. fromidx_b
Fetch the third record, id value is 4. because4 < 5
Therefore, the record with ID 4 is discarded, and the record with ID 5 is also discarded because both parties have no records. The intersection process ends.
By now you can see why MySQL requires that the records returned by the secondary index be sorted by primary key. This makes the whole intersection process very simple and MySQL does not need to use extra memory to store the set of ids.
2.2 the Union
SELECT * FROM T WHERE a=1 OR b='A';
Copy the code
For this query, we cannot use idX_A OR IDX_B indexes alone, because their conditional relation is OR. Currently, we know only one query method:
- Full table scan, determine whether the two conditions meet one of the client back.
With Union index merge, MySQL can actually use a second query method. The process is as follows:
- using
idx_a
Index refers to the collection of ids retrieved asid_setA
. - using
idx_b
Index refers to the collection of ids retrieved asid_setB
. - will
id_setA
andid_setB
Let’s take the union, let’s call it thetaid_set
. - right
id_set
Query back to the table and return the result to the client.
This process is similar to Intersection, except that Intersection is replaced with union, so it’s easy to understand. The same is not true of the union, but this is just to make sense of it.
In summary, this method is called Union index merge, in which the primary key values of the records scanned from multiple indexes are joined and then queried back to the table. EXPLAIN analysis results are as follows:
mysql> EXPLAIN SELECT * FROM T WHERE a=1 OR b='A';
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- --------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- --------------------------------------+
| 1 | SIMPLE | T | NULL | index_merge | idx_a,idx_b | idx_a,idx_b | 4.4 | NULL | 1016 | 100.00 | Using union(idx_a,idx_b); Using where |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- --------------------------------------+
Copy the code
Similarly, the use of Union index merging is conditional. If all indexes used are secondary indexes, records retrieved through secondary indexes must be sorted by primary key. Why is this a requirement? It mainly has the following two benefits:
- It’s easier to take the union of two ordered sets.
- In the case of ordered primary keys, the return table is no longer a simple random I/O, and the return table is more efficient.
As for why this query can use the Union index, it has already been explained above and will not be repeated here.
Union indexes merge and take Union, much like Intersection. MySQL still doesn’t need to use any extra memory to store these id sets, so you can go through the above process without going into details here.
2.3 Sort the Union
SELECT * FROM T WHERE a=1 OR b> ='Z';
Copy the code
The Union index merge cannot be used for this query because it does not satisfy the condition that the records fetched from the IDX_B secondary index are not sorted by primary key. Therefore, there is only one query method known to us:
- Full table scan, determine whether the two conditions meet one of the client back.
Intersection and Union are used in strict conditions, requiring secondary indexes to extract records in order of primary keys. They cannot be used for this query. If a=1 and b>=’Z’ are used to filter out most of the records, then the query efficiency can be improved.
MySQL wanted to take advantage of these two indexes, so it came up with a solution. Since the primary keys naturally retrieved from the secondary index are not sorted, I will first put them in memory and then use the Union method to query. The process goes like this:
- From the first
idx_b
Extract all qualified records from the index, extract the ID set, first remove and then sort, denoted asid_setB
. - At this time
id_setB
It’s already in order fromidx_a
To retrieve the record ID value in turn, and then go through the normal process of fetching union. - The final ID is consolidated back into the table and the result is returned to the client.
To sum up, this Sort Union index merge is a Sort Union index merge where the primary key values of the records scanned from multiple indexes are sorted and then the query is executed in the same way as a Union index merge. Instead of a Union, there is a manual sorting of primary keys. EXPLAIN analysis results are as follows:
mysql> EXPLAIN SELECT * FROM T WHERE a=1 OR b> ='Z';
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------+
| 1 | SIMPLE | T | NULL | index_merge | idx_a,idx_b | idx_a,idx_b | 4.4 | NULL | 975 | 100.00 | Using sort_union(idx_a,idx_b); Using where |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------+
Copy the code
2.4 Sort Intersection computes
Unfortunately, MySQL does not currently support so-called “Sort Intersection” index merging. We must wonder why there is no Sort Intersection since there is Sort Union. Isn’t that sort by hand and then take the intersection?
I have not found relevant information to explain why it is not supported. I can tell you my understanding. You can think about, what is the nature of the intersection? The general idea is to take two very large sets and turn them into one small set. And what is the nature of the union? The general idea is to take two smaller sets and turn them into one larger set.
Does that make sense? Sorting in memory for two smaller collections is an acceptable overhead. However, the overhead of sorting two large collections in memory is probably more than the overhead of returning to the table, so MySQL might as well just use the “single index + return table” method of query.
3. Summary
Do not naively index all columns involved in the WHERE condition, as this will often make the result worse. Because normally MySQL can only use one index at a time for single-table queries. However, if possible, MySQL can use Index Merge to perform a query using multiple indexes. MySQL supports three types of index merges, namely Intersection, Union, and Sort Union. Intersection, Union, and Sort Union are used in secondary indexes. Intersection and Union are used in strict conditions, requiring records retrieved from secondary indexes to be sorted according to primary keys. Sometimes the criteria are not met, but MySQL really wants to use Index Merge, so it tries to manually Sort in memory itself. This is Sort Union, which only has one more manual Sort procedure than Union. As for why there is no Sort Intersection, the author gives his own reflections. They may not be right, but you can also think about them.