There are many types of MySQL indexes that can provide better performance for different scenarios. The b-tree index is the most common type of MySQL index. The b-tree index is the most common type of MySQL index. This article explains in detail the b-tree index of the underlying structure, the use of principles and characteristics. To save you time, the main content of this article is as follows:

  • The underlying structure of the B-tree index
  • Rule for using b-tree indexes
  • Clustering index
  • Differences between InnoDB and MyISAM engine indexes
  • Loose index
  • Cover index

B-tree indexes

B-tree Index Uses b-tree to store data. Different storage engines implement this method differently. B-tree usually means that all values are stored in order and that each leaf page is the same distance from the root. Figure 1 shows an abstract representation of the B-tree index, which gives an overview of how MySQL’s B-tree index works.

The underlying data structure of the B-tree index is usually B+ Tree. The specific data structure and advantages of the B-tree index are not described in detail here. The following figure shows an abstract representation of the B-tree index, which roughly reflects how MyISAM index works.

MySQL can add a B-tree index to a single column, or add a B-tree index to multiple columns of data. The data of multiple columns is combined and stored in the page of the B-tree according to the order in which the index is added. Assume the following table:

CREATE TABLE People (
      last_name    varchar(50)    not null,
      first_name   varchar(50)    not null,
      birthday     date           not null,
      gender       enum('m'.'f')  not null
      key(last_name, first_name, birthday)
);
Copy the code

For each row in the table, the index contains the values of the last_name, first_name, and birthday columns. The following figure shows how the index organizes data storage.

The B-tree index uses the B-tree as the data structure to store data, and the query rules are determined by the data structure. In general, b-tree indexes are suitable for full key value, key value range, and key prefix lookup, where key prefix lookup only applies to lookup by the leftmost prefix. The query rules for b-tree indexes are as follows:

  • Full value matching: Full value matching refers to matching all columns in the index,

  • Match the leftmost prefix: The preceding index can be used to find all people with the last name Allen, using only the first column in the index.

  • Match column prefixes: You can also match only the beginning of a column’s value. For example, the aforementioned index can be used to find all people with last names beginning with J. Here, too, only the first column of the index is used.

  • Matching range values: For example, the index mentioned earlier can be used to find people with last names between Allen and Barrymore. Here too, only the first column of the index is used.

  • Matches exactly one column and ranges exactly another: The index mentioned above can also be used to find all people with the last name Allen and names beginning with the letter K (Kim,Karl, etc.). The value of last_name matches all values, and the value of first_name matches all values.

Because the nodes of the index tree are ordered, in addition to look-ups BY value, the index can also be used for ORDER BY operations (look-ups in ORDER) in a query. If the ORDER BY clause satisfies several of the query types listed above, the index can also satisfy the corresponding sorting requirements.

Here are some restrictions on b-tree indexes:

  • An index cannot be used unless the search starts in the leftmost column of the index. For example, the index in the above example cannot find a person named Bill or the date of a particular birthday, because neither column is the left-most data column.
  • If there is a range query for a column in the query, all columns to the right of it cannot be found using index optimization.

Clustering index

Clustered index is not a separate index type, but a way of data storage. The details depend on how it is implemented, but InnoDB’s clustered index actually holds the B-tree index and rows in the same structure.

When a table has a clustered index, its rows are actually stored in the leaf pages of the index, which means that rows are tightly stored with adjacent key values.

The following figure shows how records in a clustered index are stored. Notice that the leaf page contains all the data rows of the row, but the node page contains only the index columns.

Clustered indexes can help with performance, but they can also cause serious performance problems. Clustering data has some important advantages:

  • Data access is faster, clustered indexes keep indexes and data in the same B-tree, so getting data from a clustered index is usually faster than looking it up in a non-clustered index.
  • Queries that use an overridden index scan can directly use the primary key value in the page node.

If you can take advantage of these advantages when designing tables and queries, you can greatly improve performance. At the same time, clustering indexes also have some disadvantages:

  • The insertion order depends heavily on the insertion order. The fastest way to insert data into an InnoDB table is by primary key order. Avoid random (discontinuous and highly distributed) clustered indexes with primary key values, such as UUID as primary key, and use AUTO_INCREMENT columns instead.
  • Updating clustered index columns is expensive because InnoDB is forced to move each updated row to a new location.
  • Tables based on clustered indexes can suffer from “page splitting” when new rows are inserted, or when the primary key is updated so that rows need to be moved. When a row’s primary key requires that the row be inserted into a full page, the storage engine splits the page into two pages to accommodate the row, a page split operation. Page splitting causes tables to take up more disk space
  • Secondary indexes can be larger than you think, because leaf nodes in secondary indexes contain primary key columns that reference rows
  • Secondary index access requires two index lookups instead of one.

Index difference between InnoDB and MyISAM

The difference in data distribution between clustered and non-clustered indexes, as well as between the corresponding primary and secondary indexes, is often confusing and surprising. The following figure shows the different indexes and data storage methods of MyISAM and InnoDB.

The distribution of MyISAM data is very simple. It is stored on disk in the order in which data is inserted. The primary key index and secondary index leaf nodes store Pointers to corresponding data rows.

In InnoDB, clustered indexes are “just” tables, so they don’t require separate row storage like MyISAM. Each leaf node of the cluster index contains the primary key and all remaining columns (col2 in this case).

InnoDB secondary indexes are very different from clustered indexes. InnoDB secondary index leaves store primary key values instead of “row Pointers” as “Pointers” to rows.

Loose index scan

MySQL does not support loose index scanning, which means that an index cannot be scanned in a discontinuous manner. In general, MySQL needs to define a starting point and an ending point for an index scan. Even if only a few entries are needed, MySQL still needs to scan each entry in the index.

Let’s illustrate this with an example. Suppose we have the following index (a,b) and the following query:

mysql>SELECT * FROM tb1 WHERE b BETWEEN 2 AND 3;
Copy the code

Because the leading field of the index is column A, but only field B is specified in the query, MySQL cannot use the index and can only find matching rows through a full table scan, as shown in the figure below.

Knowing the physical structure of the index, it is not hard to see that there is a faster way to execute the above query. The physical structure of the index (not the API of the storage engine) can scan the range of column B corresponding to the first value of column A, and then skip to the range of column B corresponding to the second value of column A. The following figure shows what this process would look like if MySQL implemented it.

Notice that there is no need to use WHERE clause filtering at this point, because loose index scanning already skips all unwanted records.

Loose index scanning can be used in some special situations after MySQL 5.0, for example, to find the maximum and minimum values of a group in a group query:

mysql> EXPLAIN SELECT actor_id, MAX(film_id)
        -> FROM sakila.film.film_actor
        -> GROUP BY actor_id;
********************************************* 1. row ***********************************
id: 1
select_type: SIMPLE
table: film_actor
type: range
possible_keys: NULL
key: PRIMARY
key_len: 2
ref: NULL
rows: 396
Extra: Using index for group-by
Copy the code

The Extra field in EXPLAIN shows “Using index for group-by”, indicating that loose index scanning will be used here.

Cover index

In addition to being an efficient way to find data, indexes are also a direct way to get column data. MySQL can use indexes to fetch data directly from columns without having to read rows. If an index contains the values of all the fields to be queried, we call it a “cover index.” Overwriting indexes is a very useful tool that can greatly improve performance. SQL queries that only scan the index without returning to the table have many benefits:

  • The number and size of index entries are usually much smaller than the number and size of data rows, so if you only need to read the index, MySQL will greatly reduce the number of data visits.
  • Because indexes are stored in column order, I/ O-intensive range lookups are much less I/O intensive than random reads of each row from disk.
  • Overwrite indexes are especially useful for InnoDB tables because of InnoDB’s clustered indexes. InnoDB’s secondary index holds the primary key of the row in the leaf node. If the secondary key overwrites the query, it avoids a second query on the primary key index.

When you initiate a query that overwrites an Index (also known as an Index overwrite query), you can see the “Using Index” message in the Extra column of EXPLAIN. For example, the table Sakila. inventory has a multi-column index (store_id, film_id). MySQL > alter table alter table alter table alter table alter table alter table alter table alter table alter table

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory
*********************************1.row***************************************
id:1
select_type:SIMPLE
table:inventory
type:index
possible_keys:NULL
key:idx_store_id_film_id
key_len:3
ref:NULL
rows:4673
Extra:Using Index
Copy the code

Subscribe to the latest articles, please follow my wechat official account

Reference:

  • The data structure and algorithm principle behind MySQL index
  • High Performance MySQL