Preface

Indexes exist to make data queries more efficient, much like a book's table of contents. For a database table, an index is effectively its "table of contents."

For the rest of the MySQL series, see the MySQL column.

Common index types

  • Hash table
  • Ordered array
  • Search tree

Hash table

A hash table is a structure that stores data as key-value pairs: given a key, you can look up the corresponding value. The idea is simple: a hash function maps the key to a position in an array, and the value is stored at that position. A hash collision occurs when two different keys are mapped to the same position; in that case the entries at that position are chained into a linked list. During a query, the hash function computes the array position for the key, and the linked list at that position is then traversed in order until the entry whose key matches is found.

The data in a hash table is not stored in key order, so a range query cannot take advantage of the hash structure and has to scan the whole table.

The hash structure is therefore suited to equality-query scenarios, such as those served by NoSQL storage engines.
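As a small, hedged illustration of this trade-off (the table and column names are invented), MySQL's MEMORY storage engine lets you ask for a hash index explicitly:

-- Illustrative table; the MEMORY engine supports explicit hash indexes.
CREATE TABLE kv_cache (
    k INT NOT NULL,
    v VARCHAR(64),
    INDEX idx_k (k) USING HASH
) ENGINE = MEMORY;

-- An equality lookup can use the hash index directly:
SELECT v FROM kv_cache WHERE k = 42;

-- A range query gets no help from the hash index and ends up
-- examining all rows:
SELECT v FROM kv_cache WHERE k BETWEEN 10 AND 20;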

Ordered array

Ordered arrays perform well in both equality and range queries. For an equality query, binary search finds the target element with time complexity O(log N). For a range query, binary search locates the lower bound of the range, and the array is then scanned to the right until the first value outside the range is met, at which point the loop exits.

Considering query efficiency alone, the ordered array is the best data structure. However, inserting a new element means shifting every record that follows it, which is far too expensive. Ordered arrays are therefore only suitable for static storage, i.e., data that is never modified and only queried.

Search tree

The nodes of a binary search tree are ordered: every node in the left subtree is smaller than the root, and every node in the right subtree is larger. In the worst case a binary search tree degenerates and its query performance becomes very poor, so a balanced binary tree performs better. With a binary structure, however, reading each level means loading another data block from disk into memory. To reduce I/O operations, we want to read from disk as little as possible; a balanced multi-way tree solves this: each node has many children, so the tree height is much lower than a binary tree's, fewer data blocks are visited during a query, and lookups become faster.


Clustered index/non-clustered index

As we all know, InnoDB uses the B+Tree data structure, which is a balanced multi-way tree. InnoDB stores table data inside the index itself, ordered by primary key; each index corresponds to one B+Tree in InnoDB. Indexes are divided into primary key indexes and non-primary key indexes.

In InnoDB, a primary key index is also called a clustered index.

In InnoDB, a non-primary key index is also called a secondary index.

What is the difference between a query based on a primary key index and a normal index?

If the WHERE condition in the SQL is on the primary key, only the B+Tree of the primary key index needs to be searched to find the data.

If the WHERE condition in the SQL is on an ordinary index, the B+Tree of that ordinary index is searched first to obtain the primary key value, and then the B+Tree of the primary key index is searched with that value to obtain the required row. This extra lookup is called going back to the table.

Therefore, a query through a non-primary-key index has to search one more tree, so primary key queries should be preferred in the application wherever possible.
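A minimal sketch of the two cases (the column list is an assumption, chosen to match the table T used in the covering-index example below):

CREATE TABLE T (
    id INT PRIMARY KEY,   -- clustered index: leaf nodes store the whole row
    k INT NOT NULL,
    name VARCHAR(16),
    INDEX idx_k (k)       -- secondary index: leaf nodes store the primary key id
) ENGINE = InnoDB;

-- Searches only the primary key B+Tree:
SELECT * FROM T WHERE id = 500;

-- Searches the idx_k B+Tree to get id, then the primary key B+Tree
-- for the full row: one extra lookup, i.e. going back to the table:
SELECT * FROM T WHERE k = 5;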

Why should the primary key be as short as possible and auto-incrementing?

When a new value is inserted into a B+Tree, if it is larger than every value currently in the tree, it can simply be appended. If it falls between existing values, some data has to be moved to make room for it. If the data page it belongs to is already full, the B+Tree algorithm has to allocate a new data page and move part of the data over; this process is called a page split. Conversely, when deletions leave the remaining data in a page poorly utilized, adjacent data pages are merged, which is the reverse of a split.

The reason an auto-incrementing primary key is recommended is that every newly inserted value is larger than the current maximum in the tree, so the insert is a simple append and never triggers a leaf page split.

The primary key should also be as short as possible, because the leaf nodes of every non-clustered (secondary) B+Tree store the primary key value: the shorter the primary key, the smaller the leaf nodes of ordinary indexes, and the less space those indexes occupy.
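A hedged illustration of both points (the table and its columns are invented):

-- The short auto-increment id is always appended at the right-most
-- leaf page, so inserts do not trigger page splits, and every
-- secondary index stays small because its leaf entries store only
-- this 8-byte id:
CREATE TABLE t_order (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    order_no CHAR(32) NOT NULL,
    amount DECIMAL(10, 2),
    INDEX idx_order_no (order_no)
) ENGINE = InnoDB;

-- Making the 32-character order_no the primary key instead would
-- both enlarge every secondary index and cause inserts at random
-- positions, which trigger page splits.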

Covering index

Consider the statement select * from T where k between 3 and 5. What is the execution process of this SQL?

  • k is an ordinary index. The k index tree is traversed to find the lower bound of the range (the first entry with k = 3), the corresponding primary key is taken, and the matching data row is then looked up in the primary key index tree.
  • Back on the k index tree, the next entry is fetched and checked against the query range; if it is still within range, its primary key is taken and the corresponding row is again looked up in the primary key index tree.
  • The search ends when an entry beyond the range is found.

A back-to-table operation is unavoidable here, because the complete rows exist only in the primary key index, so the query has to go back to the primary key index tree for them.

Select id from T where k between 3 and 5

  • Only the k index tree needs to be searched: the value of id can be read directly from its leaf nodes, and the search ends once an entry beyond the range is found. No back-to-table operation is needed.

In this query, the index k already covers what the query needs, so it is called a covering index. Because covering indexes reduce the number of tree searches, they can significantly improve query performance, and using them is a common optimization technique.
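The same idea extends to other columns: if a query also needs a column that is not in the index, a joint index can turn it into a covering query. A sketch, assuming T has an extra column v (an invented name):

-- Put v into the index so that it contains everything the query reads:
ALTER TABLE T ADD INDEX idx_k_v (k, v);

-- k, v and the primary key id are all available in the leaf nodes of
-- idx_k_v, so no back-to-table lookup is needed (EXPLAIN would
-- typically show "Using index"):
SELECT id, v FROM T WHERE k BETWEEN 3 AND 5;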

Leftmost prefix principle

An index built on multiple fields is called a joint index, also known as a composite index.

Suppose there is a joint index on (name, age); the leftmost prefix principle can be illustrated with an example.

Based on the federated index above, there are now the following requirements:

  1. Find all people whose name is "Zhang San"
  2. Find all people whose surname is "Li"
  3. Find all people whose name ends with the character "Shan"
  4. Find all people who are 18 years old

Requirement 1: select * from tuser where name = 'Zhang San'. The condition matches the leftmost field of the (name, age) index, so the joint index is used.

Requirement 2: select * from tuser where name like 'Li%'. The condition is a prefix of the leftmost field, so the index can still be used to locate the first record whose name starts with "Li" and scan from there.

Requirement 3: select * from tuser where name like '%Shan'. The fuzzy match starts with %, so the index cannot be used and a full table scan is needed.

Requirement 4: select * from tuser where age = 18. age is not the leftmost field of the index, so the joint index cannot be used and a full table scan is needed.

To sum up, when creating a joint index, the order of the indexed fields must be chosen carefully; otherwise the index is likely not to be used. For the (name, age) joint index, the conditions that can use the index are listed below (a quick way to verify this with EXPLAIN is sketched after the list):

  • where name = …
  • where name = … and age = …
  • where name = … and age in (…, …)
  • order by name, age
  • where name = … order by age
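A sketch of such a check with EXPLAIN (the index name idx_name_age is an assumption, and the exact plan depends on the data and MySQL version):

ALTER TABLE tuser ADD INDEX idx_name_age (name, age);

-- Matches the leftmost prefix, so the joint index can be used:
EXPLAIN SELECT * FROM tuser WHERE name = 'Zhang San' AND age = 18;

-- age is not the leftmost field, so the joint index cannot be used
-- and the plan typically falls back to a full table scan:
EXPLAIN SELECT * FROM tuser WHERE age = 18;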

There are many other situations in which an index will not be used. Here are some examples (a few concrete statements follow the list):

  • An OR condition where one side is not an indexed field (OR can use indexes only when both fields are indexed)
  • The leftmost fields of a joint index are not included in the condition
  • A fuzzy LIKE query whose pattern starts with %
  • An implicit type conversion is required on the indexed column
  • The indexed column appears inside a calculation in the WHERE clause
  • A function is applied to the indexed column in the WHERE clause
  • NOT, <>, != (not-equal operators tend to trigger a full table scan)
  • The optimizer estimates that a full table scan is faster than using the index
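A few hedged examples of these patterns, assuming a table t with an indexed integer column k and an indexed string column s (invented names); whether the optimizer really abandons the index also depends on the data:

-- Fuzzy query starting with %: the index on s cannot be used:
SELECT * FROM t WHERE s LIKE '%abc';

-- Calculation on the indexed column:
SELECT * FROM t WHERE k + 1 = 100;

-- Function applied to the indexed column:
SELECT * FROM t WHERE LOWER(s) = 'abc';

-- Implicit type conversion: comparing the string column s with a
-- number converts s, which defeats the index:
SELECT * FROM t WHERE s = 123;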

Index condition pushdown

The leftmost prefix rule is explained above. But what happens to the parts of a query condition that the leftmost prefix cannot handle?

Let's use the joint index (name, age) as an example again, and suppose we need to retrieve from the table "all boys whose surname is Zhang and whose age is 10". The SQL statement reads:

select * from tuser where name like 'Zhang%' and age=10 and ismale=1;


According to the leftmost prefix rule, this statement can only use name like 'Zhang%' when searching the index tree, locating the first record whose name starts with "Zhang".

Before MySQL 5.6, from that first record onward, each entry satisfying the name condition had to be looked up in the primary key index one by one (going back to the table) to fetch the data row and compare the remaining field values.

MySQL 5.6 introduced index condition pushdown (ICP): during index traversal, conditions on fields that are contained in the index are evaluated first, records that do not satisfy them are filtered out directly, and the number of back-to-table lookups is reduced.

The following figures show the execution flow before and after MySQL 5.6.

[Figure: execution flow without index condition pushdown]

[Figure: execution flow with index condition pushdown]

In the first figure, which shows the flow without index condition pushdown, InnoDB ignores the age value and simply looks up, one by one in the primary key index, every entry whose name starts with "Zhang". In this example that means going back to the table 4 times.

In the flow with index condition pushdown, InnoDB checks inside the (name, age) index whether age equals 10; entries where it does not are judged right there and skipped. In our example, only the two records ID4 and ID5 need a back-to-table lookup for the remaining check, so the table is accessed only twice.
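A sketch of how to observe this behaviour (the exact EXPLAIN output varies with MySQL version and data): with index condition pushdown enabled, the Extra column typically shows "Using index condition", and the optimizer switch lets you compare the two plans.

-- Index condition pushdown is controlled by an optimizer switch
-- (enabled by default since MySQL 5.6):
SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM tuser
WHERE name LIKE 'Zhang%' AND age = 10 AND ismale = 1;
-- Extra typically shows: Using index condition

SET optimizer_switch = 'index_condition_pushdown=off';
-- Re-running the same EXPLAIN now typically shows only "Using where":
-- every entry matching the name prefix is fetched from the primary
-- key index before age is checked.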