GROUP BY optimization

The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table in which all rows of each group are consecutive, then use that temporary table to discover the groups and apply the aggregate functions (if any). In some cases, MySQL can do much better and avoid creating temporary tables by using index access.

The most important preconditions for using an index for GROUP BY are that all GROUP BY columns reference attributes of the same index, and that the index stores its keys in order (this is true for a BTREE index, for example, but not for a HASH index). Whether the use of temporary tables can be replaced by index access also depends on which parts of the index are used in the query, the conditions specified for those parts, and the selected aggregate functions.

There are two ways to execute a GROUP BY query through index access, described in detail in the following sections. The first method applies the grouping operation together with all range predicates (if any). The second method first performs a range scan and then groups the resulting tuples.

In MySQL, GROUP BY is used for sorting, so the server can also apply ORDER BY optimizations to grouping. However, relying on implicit or explicit GROUP BY sorting is not recommended. See Section 8.2.1.14, “ORDER BY Optimization”.

  • Loose index scan
  • Tight index scan
Loose index scan

The most efficient way to process GROUP BY is to use an index to retrieve the grouping columns directly. With this access method, MySQL relies on the property of some index types (such as BTREE) that their keys are ordered. This property makes it possible to look up groups in the index without having to consider every key in the index that satisfies all WHERE conditions. Because this access method considers only a fraction of the keys in the index, it is called a loose index scan. When there is no WHERE clause, a loose index scan reads as many keys as there are groups, which may be far fewer than the total number of keys. If the WHERE clause contains range predicates (see the discussion of the range join type in Section 8.8.1, “Optimizing Queries with EXPLAIN”), a loose index scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. This is possible under the following conditions:

  • The query is on a single table.
  • The GROUP BY names only columns that form a leftmost prefix of the index, and no other columns. (If the query has a DISTINCT clause instead of GROUP BY, all distinct attributes must refer to columns that form a leftmost prefix of the index.) For example, if table t1 has an index on (c1,c2,c3), a loose index scan is applicable if the query has GROUP BY c1, c2. It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).
  • The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. That column must be in the index and must immediately follow the columns in the GROUP BY.
  • Any parts of the index other than those from the GROUP BY that are referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of the MIN() or MAX() functions.
  • For columns in the index, full column values must be indexed, not just a prefix. For example, with c1 VARCHAR(20), INDEX (c1(10)), the index uses only a prefix of the c1 values and cannot be used for a loose index scan.

If a loose index scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.
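The skipping behaviour described in this section can be illustrated with a small Python simulation. This is only a sketch, not MySQL's implementation: the sorted list INDEX_KEYS stands in for the ordered keys of an index on (c1, c2, c3), the data values are invented, and the helper name loose_scan_groups is hypothetical. Binary search plays the role of the index lookup that jumps to the first key of the next group.

```python
from bisect import bisect_right

# Hypothetical stand-in for the ordered keys of an index on (c1, c2, c3).
INDEX_KEYS = sorted([
    (1, 10, 'x'), (1, 10, 'y'), (1, 20, 'x'),
    (2, 10, 'x'), (2, 30, 'z'),
    (3, 40, 'x'), (3, 40, 'y'), (3, 40, 'z'),
])

def loose_scan_groups(keys, prefix_len):
    """Return (groups, keys_read) for a GROUP BY on the first prefix_len key parts."""
    # Simulation helper only: project each key onto the grouping prefix so
    # that bisect can search on it. A real index would descend the tree.
    prefixes = [k[:prefix_len] for k in keys]
    groups, keys_read, pos = [], 0, 0
    while pos < len(keys):
        groups.append(prefixes[pos])  # first key of the group identifies it
        keys_read += 1
        # Jump past every remaining key that shares this prefix.
        pos = bisect_right(prefixes, prefixes[pos], lo=pos)
    return groups, keys_read

groups, keys_read = loose_scan_groups(INDEX_KEYS, 1)
print(groups, keys_read, len(INDEX_KEYS))
# prints: [(1,), (2,), (3,)] 3 8
```

With no WHERE clause, the sketch examines one key per group (3) rather than all 8 keys, which is the property the text attributes to a loose index scan.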

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The loose index scan access method can be used for the following queries:

SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;
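To see why MIN() and MAX() fit this access method, consider a query of the form SELECT c1, MIN(c2), MAX(c2) FROM t1 GROUP BY c1. Because the keys are ordered, the first key of each c1 group carries the smallest c2 and the last key carries the largest. The following Python sketch simulates this on an invented key list; KEYS and min_max_c2_per_c1 are hypothetical names, and the code illustrates the idea rather than MySQL's internals.

```python
from bisect import bisect_right

# Hypothetical ordered keys of an index on (c1, c2, c3).
KEYS = sorted([
    (1, 10, 'x'), (1, 10, 'y'), (1, 20, 'x'),
    (2, 10, 'x'), (2, 30, 'z'),
    (3, 40, 'x'), (3, 40, 'y'), (3, 40, 'z'),
])

def min_max_c2_per_c1(keys):
    """Simulate SELECT c1, MIN(c2), MAX(c2) ... GROUP BY c1 on ordered keys."""
    c1_values = [k[0] for k in keys]  # simulation helper for bisect
    result, pos = [], 0
    while pos < len(keys):
        c1 = keys[pos][0]
        end = bisect_right(c1_values, c1, lo=pos)  # one past the group's last key
        # Only the first and last key of the group are inspected.
        result.append((c1, keys[pos][1], keys[end - 1][1]))
        pos = end
    return result

print(min_max_c2_per_c1(KEYS))
# prints: [(1, 10, 20), (2, 10, 30), (3, 40, 40)]
```

An aggregate such as SUM(c2) cannot be answered this way, since it needs every key in the group and not just the two at its boundaries.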

For the reasons given, the following queries cannot be executed with this quick select method:

  • There are aggregate functions other than MIN() or MAX():

    SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
  • Columns in the GROUP BY clause do not form the left-most prefix of the index:

    SELECT c1, c2 FROM t1 GROUP BY c2, c3;
  • The query refers to a part of the key that comes after the GROUP BY part and for which there is no equality with a constant:

    SELECT c1, c3 FROM t1 GROUP BY c1, c2;

    If the query were to include WHERE c3 = const, a loose index scan could be used.
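The rules that exclude these three queries can be written down as a small predicate. The Python function below is a simplified checklist and not the optimizer's actual logic: the name loose_scan_applicable and its parameters are hypothetical, it requires the GROUP BY columns to appear in index order, and it does not model the single-table or full-column-value requirements.

```python
def loose_scan_applicable(index_cols, group_by, aggregates=(),
                          other_refs=(), const_eq=()):
    """Simplified check of the loose index scan rules stated in the text.

    aggregates: (function_name, column) pairs from the select list.
    other_refs: other index key parts the query references.
    const_eq:   columns compared for equality with a constant in WHERE.
    """
    index_cols, group_by = tuple(index_cols), tuple(group_by)
    # Rule 1: the GROUP BY columns form a leftmost prefix of the index.
    if group_by != index_cols[:len(group_by)]:
        return False
    # Rule 2: only MIN()/MAX(), all on one column, and that column is the
    # key part immediately after the GROUP BY columns.
    if any(func not in ('MIN', 'MAX') for func, _ in aggregates):
        return False
    agg_cols = {col for _, col in aggregates}
    if agg_cols:
        next_part = index_cols[len(group_by):len(group_by) + 1]
        if len(agg_cols) != 1 or tuple(agg_cols) != next_part:
            return False
    # Rule 3: every other referenced key part must equal a constant.
    for col in other_refs:
        if col in group_by or col in agg_cols:
            continue
        if col not in const_eq:
            return False
    return True

idx = ('c1', 'c2', 'c3')
print(loose_scan_applicable(idx, ('c1',), aggregates=[('SUM', 'c2')]))     # False
print(loose_scan_applicable(idx, ('c2', 'c3')))                            # False
print(loose_scan_applicable(idx, ('c1', 'c2'), other_refs=['c3']))         # False
print(loose_scan_applicable(idx, ('c1', 'c2'), other_refs=['c3'],
                            const_eq=['c3']))                              # True
```

The last call corresponds to adding WHERE c3 = const to the third query, which makes it eligible again.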

In addition to the MIN() and MAX() references already supported, the loose index scan access method can be applied to other forms of aggregate function references in the select list:

  • AVG(DISTINCT), SUM(DISTINCT), and COUNT(DISTINCT) are supported. AVG(DISTINCT) and SUM(DISTINCT) take a single argument. COUNT(DISTINCT) can have more than one column argument.
  • There must be no GROUP BY or DISTINCT clause in the query.
  • The loose index scan limitations described previously still apply.

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The loose index scan access method can be used for the following queries:

SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;

SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
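The DISTINCT aggregates can use the same group-skipping idea, because each distinct value (or value combination) of a leftmost index prefix appears as one run of adjacent keys. The Python sketch below simulates this on invented data; KEYS and distinct_prefixes are hypothetical names, the data contains no NULL values, and the code is an illustration rather than MySQL's implementation.

```python
from bisect import bisect_right

# Hypothetical ordered keys of an index on (c1, c2, c3), with no NULLs.
KEYS = sorted([
    (1, 10, 'x'), (1, 10, 'y'), (1, 20, 'x'),
    (2, 10, 'x'), (2, 30, 'z'),
    (3, 40, 'x'), (3, 40, 'y'), (3, 40, 'z'),
])

def distinct_prefixes(keys, prefix_len):
    """Yield each distinct leftmost prefix once by skipping over its run."""
    prefixes = [k[:prefix_len] for k in keys]  # simulation helper for bisect
    pos = 0
    while pos < len(prefixes):
        yield prefixes[pos]
        pos = bisect_right(prefixes, prefixes[pos], lo=pos)

count_distinct_c1 = sum(1 for _ in distinct_prefixes(KEYS, 1))
sum_distinct_c1 = sum(p[0] for p in distinct_prefixes(KEYS, 1))
count_distinct_c1_c2 = sum(1 for _ in distinct_prefixes(KEYS, 2))
print(count_distinct_c1, sum_distinct_c1, count_distinct_c1_c2)
# prints: 3 6 5
```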
Tight index scan

A tight index scan can be a full index scan or a range index scan, depending on the query condition.

When the conditions for a loose index scan are not met, it may still be possible to avoid creating temporary tables for GROUP BY queries. If there are range conditions in the WHERE clause, this method reads only the keys that satisfy those conditions. Otherwise, it performs an index scan. Because this method reads all keys in each range defined by the WHERE clause, or scans the whole index if there are no range conditions, it is called a tight index scan. With a tight index scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.

For this method to work, it is sufficient that there is a constant equality condition for every column in the query that refers to a part of the key coming before or in between parts of the GROUP BY key. The constants from the equality conditions fill in any gaps in the search keys, so that complete prefixes of the index can be formed. These index prefixes can then be used for index lookups. If the GROUP BY result requires sorting and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations, because searching with prefixes in an ordered index already retrieves all the keys in order.
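As a rough illustration of a tight index scan, the Python sketch below simulates a query of the form SELECT c2, c3, COUNT(*) FROM t1 WHERE c1 = 'a' GROUP BY c2, c3 against an index on (c1, c2, c3). The constant for c1 completes the index prefix, every key in the resulting range is read, and grouping happens on keys that already arrive in order. KEYS and tight_scan_group are hypothetical names with invented data; this is a sketch of the idea, not MySQL's implementation.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

# Hypothetical ordered keys of a non-unique index on (c1, c2, c3).
KEYS = sorted([
    ('a', 1, 5), ('a', 1, 5), ('a', 1, 7), ('a', 2, 5),
    ('b', 1, 5), ('b', 3, 9),
])

def tight_scan_group(keys, c1_const):
    """Simulate WHERE c1 = c1_const GROUP BY c2, c3 with a row count per group."""
    c1_values = [k[0] for k in keys]  # simulation helper for bisect
    lo = bisect_left(c1_values, c1_const)
    hi = bisect_right(c1_values, c1_const)
    in_range = keys[lo:hi]  # unlike a loose scan, every key in the range is read
    # Within the range the keys are ordered by (c2, c3), so equal groups are
    # adjacent and groupby needs no extra sort or temporary table.
    groups = [(group, sum(1 for _ in rows))
              for group, rows in groupby(in_range, key=lambda k: k[1:3])]
    return groups, len(in_range)

print(tight_scan_group(KEYS, 'a'))
# prints: ([((1, 5), 2), ((1, 7), 1), ((2, 5), 1)], 4)
```

The sketch reads 4 of the 6 keys: only the range selected by c1 = 'a', but all of it, which is the difference from the loose variant.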

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The following queries do not work with the loose index scan access method described previously, but still work with the tight index scan access method.

  • There is a gap in the GROUP BY, but it is covered by the condition c2 = 'a':

    SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
  • The GROUP BY does not begin with the first part of the key, but there is a condition that provides a constant for that part:

    SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;
