MySQL optimize Index Merge

1. Introduction

How many indexes can MySQL use in a single table query?

When I was learning MySQL, I naively thought that I could speed up queries by indexing all the columns involved in the WHERE condition. This was a big mistake. MySQL > select id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id

SELECT id,a,b FROM T WHERE a>100 AND b>200;
Copy the code

The idX_A index stores only the values of column A and id, and cannot determine whether the condition b>200 is valid. Therefore, the idX_A index can only query the table with the ID. Similarly, the IDX_B index stores only the values of column B and id. The condition that a>100 is valid cannot be determined. The idX_B index can only be queried with the ID. As you can see, the biggest overhead is actually the back table operation. The less data matched by the secondary index, the lower the back table overhead. So in theory, the fewer records a>100 and b>200 meet these two conditions, which index MySQL will use. How does MySQL determine the number of records that meet these criteria? Don’t you have to be honest to scan the full table? MySQL uses the method of estimate, based on the statistics of the table or access to a small amount of data in the table to estimate, and calculate the cost of using the two indexes to query respectively, and finally choose the lower cost index solution. How MySQL estimates execution costs is beyond the scope of this article and will be skipped.

We assume that eventually MySQL uses the idX_A index, so the query process actually looks like this:

InnoDB fromidx_aThe first item in the B+ tree is obtaineda>100Record, take the id value of the record back to the table query.
Query the complete user record in the tableb>200If yes, the record is returned to the client. Otherwise, the record is discarded.
InnoDB continue fromidx_aWe get the next one in the B+ treea>100Repeat the previous process.

With so many indexes, it’s a shame to only use one index per query. Can you use multiple indexes at the same time to complete a query? Index Merge Index Merge Index Merge Index Merge

2. Index Merge

MySQL calls this method of using multiple indexes to perform a single query an index merge. How do we know that the SQL statement we wrote uses index merge? If an index merge is used, the type column should display index_merge, the key column should display all index names, and the Extra column should display what type of index merge is used. As shown below, indexes IDX_A and IDX_B are used to complete the query. The index merge type is Intersection.

table	type	key	Extra
T	index_merge	idx_a,idx_b	Using intersect(idx_a,idx_b); Using where; Using index

What? Does index merge have type? Yes, MySQL currently supports three types of index merge:

Index merge type	instructions
Intersection	Merge the intersection of eligible primary key values in multiple secondary indexes
Union	Select and merge primary key values from multiple secondary indexes
Sort Union	Rehash and sort the primary key values in multiple secondary indexes, and then set and merge

Let’s use a concrete example to demonstrate each of the three index merges. Suppose we have table T as follows, where ID is the primary key, and columns A and B are indexed separately.

CREATE TABLE T(
    `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `a` INT NOT NULL,
    `b` CHAR(1) DEFAULT NULL,
    KEY `idx_a` (a) USING BTREE,
    KEY `idx_b` (b) USING BTREE
)ENGINE=InnoDB AUTO_INCREMENT=1;
Copy the code

You can write a stored procedure, batch insert records into a table, and I’m going to post some code here, which is pretty crude.

CREATE PROCEDURE insertT()
BEGIN
    DECLARE i INT DEFAULT 0;
    START TRANSACTION;
        WHILE i< =10000 do
            INSERT INTO T (a, b) VALUES (i,CHAR(rand()*(90- 65.)+65));
            SET i=i+1;
        END WHILE;
    COMMIT;
END;
call insertT();
Copy the code

Columns A and B are normal indexes, and the values are allowed to repeat. You can call the store several times. The final data is: a is repeated up to 10,000, B is repeated between a and Z, and the primary key keeps increasing. Let’s demonstrate this based on the data in this table.

2.1 Intersection computes

SELECT * FROM T WHERE a=1 AND b='A';
Copy the code

For this query, we now know that it can be queried in the following three ways:

Full table scan to determine whether the two conditions match.
usingidx_aThe index obtains the ID and queries it back to the table to determine whether condition B is fulfilled.
usingidx_bThe index obtains the ID and queries it back to the table to determine whether condition A is fulfilled.

MySQL supports a fourth query with the Intersection index merge:

usingidx_aIndex refers to the collection of ids retrieved asid_setA.
usingidx_bIndex refers to the collection of ids retrieved asid_setB.
willid_setAandid_setBLet’s take the intersection, let’s call it thetaid_set.
rightid_setQuery back to the table and return the result to the client.

This process is described in a somewhat problematic way, but the general idea is correct, and the main idea is to help you understand. So the intersection of ids, it doesn’t work that way, because MySQL doesn’t store these sets of ids per se, because it takes up a lot of memory, and we’ll talk about that later.

Intersection index merge Is a method of querying a table by intersecting the primary key values of scanned records from multiple indexes and then returning to the table. EXPLAIN analysis results are as follows:

mysql> EXPLAIN SELECT * FROM T WHERE a=1 AND b='A';
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------------------+
| id | select_type | table | partitions | type        | possible_keys | key         | key_len | ref  | rows | filtered | Extra                                                  |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------------------+
|  1 | SIMPLE      | T     | NULL       | index_merge | idx_a,idx_b   | idx_a,idx_b | 4.4     | NULL |    1 |   100.00 | Using intersect(idx_a,idx_b); Using where; Using index |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------------------+
Copy the code

Note that using Intersection to merge indexes is conditional. If all indexes used are secondary indexes, records retrieved through secondary indexes must be sorted by primary key. Why is this a requirement? It mainly has the following two benefits:

It’s easier to take the intersection of two ordered sets.
In the case of ordered primary keys, the return table is no longer a simple random I/O, and the return table is more efficient.

Obviously, our query can be merged using the Intersection index. The idX_A index is sorted by a and then by ID. If a=1, the retrieved records are sorted by ID. The idX_B index is sorted by id and then by B. If b=’A’, the retrieved records are sorted by ID. So it fits the bill.

Finally, let’s look at how MySQL takes the intersection of two sets. Assume that idx_a filters ids [1,3,5] and idx_b filters ids [2,3,4]. The process of obtaining the intersection is as follows:

fromidx_aFetch the first record with an ID value of 1. Again fromidx_bTake the first record, id value is 2, because1 < 2So the record with ID 1 is discarded.
fromidx_aPull out the second record, id value is 3, because2 < 3, so the record with ID 2 is discarded directly.
fromidx_bPull out the second record, id value is 3, because3 = 3, so 3 is returned to the table for query, and the result is returned to the client. At the same time, the two records with ID 3 are directly discarded.
fromidx_aFetch the third record, id value is 5. fromidx_bFetch the third record, id value is 4. because4 < 5Therefore, the record with ID 4 is discarded, and the record with ID 5 is also discarded because both parties have no records. The intersection process ends.

By now you can see why MySQL requires that the records returned by the secondary index be sorted by primary key. This makes the whole intersection process very simple and MySQL does not need to use extra memory to store the set of ids.

2.2 the Union

SELECT * FROM T WHERE a=1 OR b='A';
Copy the code

For this query, we cannot use idX_A OR IDX_B indexes alone, because their conditional relation is OR. Currently, we know only one query method:

Full table scan, determine whether the two conditions meet one of the client back.

With Union index merge, MySQL can actually use a second query method. The process is as follows:

usingidx_aIndex refers to the collection of ids retrieved asid_setA.
usingidx_bIndex refers to the collection of ids retrieved asid_setB.
willid_setAandid_setBLet’s take the union, let’s call it thetaid_set.
rightid_setQuery back to the table and return the result to the client.

This process is similar to Intersection, except that Intersection is replaced with union, so it’s easy to understand. The same is not true of the union, but this is just to make sense of it.

In summary, this method is called Union index merge, in which the primary key values of the records scanned from multiple indexes are joined and then queried back to the table. EXPLAIN analysis results are as follows:

mysql> EXPLAIN SELECT * FROM T WHERE a=1 OR b='A';
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- --------------------------------------+
| id | select_type | table | partitions | type        | possible_keys | key         | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- --------------------------------------+
|  1 | SIMPLE      | T     | NULL       | index_merge | idx_a,idx_b   | idx_a,idx_b | 4.4     | NULL | 1016 |   100.00 | Using union(idx_a,idx_b); Using where |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- --------------------------------------+
Copy the code

Similarly, the use of Union index merging is conditional. If all indexes used are secondary indexes, records retrieved through secondary indexes must be sorted by primary key. Why is this a requirement? It mainly has the following two benefits:

It’s easier to take the union of two ordered sets.
In the case of ordered primary keys, the return table is no longer a simple random I/O, and the return table is more efficient.

As for why this query can use the Union index, it has already been explained above and will not be repeated here.

Union indexes merge and take Union, much like Intersection. MySQL still doesn’t need to use any extra memory to store these id sets, so you can go through the above process without going into details here.

2.3 Sort the Union

SELECT * FROM T WHERE a=1 OR b> ='Z';
Copy the code

The Union index merge cannot be used for this query because it does not satisfy the condition that the records fetched from the IDX_B secondary index are not sorted by primary key. Therefore, there is only one query method known to us:

Full table scan, determine whether the two conditions meet one of the client back.

Intersection and Union are used in strict conditions, requiring secondary indexes to extract records in order of primary keys. They cannot be used for this query. If a=1 and b>=’Z’ are used to filter out most of the records, then the query efficiency can be improved.

MySQL wanted to take advantage of these two indexes, so it came up with a solution. Since the primary keys naturally retrieved from the secondary index are not sorted, I will first put them in memory and then use the Union method to query. The process goes like this:

From the firstidx_bExtract all qualified records from the index, extract the ID set, first remove and then sort, denoted asid_setB.
At this timeid_setBIt’s already in order fromidx_aTo retrieve the record ID value in turn, and then go through the normal process of fetching union.
The final ID is consolidated back into the table and the result is returned to the client.

To sum up, this Sort Union index merge is a Sort Union index merge where the primary key values of the records scanned from multiple indexes are sorted and then the query is executed in the same way as a Union index merge. Instead of a Union, there is a manual sorting of primary keys. EXPLAIN analysis results are as follows:

mysql> EXPLAIN SELECT * FROM T WHERE a=1 OR b> ='Z';
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------+
| id | select_type | table | partitions | type        | possible_keys | key         | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------+
|  1 | SIMPLE      | T     | NULL       | index_merge | idx_a,idx_b   | idx_a,idx_b | 4.4     | NULL |  975 |   100.00 | Using sort_union(idx_a,idx_b); Using where |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+- -------------------------------------------+
Copy the code

2.4 Sort Intersection computes

Unfortunately, MySQL does not currently support so-called “Sort Intersection” index merging. We must wonder why there is no Sort Intersection since there is Sort Union. Isn’t that sort by hand and then take the intersection?

I have not found relevant information to explain why it is not supported. I can tell you my understanding. You can think about, what is the nature of the intersection? The general idea is to take two very large sets and turn them into one small set. And what is the nature of the union? The general idea is to take two smaller sets and turn them into one larger set.

Does that make sense? Sorting in memory for two smaller collections is an acceptable overhead. However, the overhead of sorting two large collections in memory is probably more than the overhead of returning to the table, so MySQL might as well just use the “single index + return table” method of query.

3. Summary

Do not naively index all columns involved in the WHERE condition, as this will often make the result worse. Because normally MySQL can only use one index at a time for single-table queries. However, if possible, MySQL can use Index Merge to perform a query using multiple indexes. MySQL supports three types of index merges, namely Intersection, Union, and Sort Union. Intersection, Union, and Sort Union are used in secondary indexes. Intersection and Union are used in strict conditions, requiring records retrieved from secondary indexes to be sorted according to primary keys. Sometimes the criteria are not met, but MySQL really wants to use Index Merge, so it tries to manually Sort in memory itself. This is Sort Union, which only has one more manual Sort procedure than Union. As for why there is no Sort Intersection, the author gives his own reflections. They may not be right, but you can also think about them.

1. Introduction

2. Index Merge

2.1 Intersection computes

2.2 the Union

2.3 Sort the Union

2.4 Sort Intersection computes

3. Summary

Related Posts

Dig deeper into Golang sync.map

Can I catch multiple Java exceptions in the same catch clause? Java Debug notes

King of Concurrency – Platinum 4: Order – Why semaphores are good for synchronization between threads