basis

What is an index

An index is a data structure that presorts the values of one or more columns in a relational database.

The role of indexes

Index is a data structure used for data search. Common data structures used for data search include ordered array, binary search tree, hash table, etc. These data structures are faster to search.

Problems faced

Index design faces the following problems:

  • How to query data faster;
  • Whether range queries are supported;
  • How to reduce the occupancy of space;
  • How to reduce the number of interactions between disks? Because data is stored on disks, the interaction between disks and data is slow.

implementation

B-Tree/B+Tree

B+Tree B+Tree B+Tree

Before we explain B tree and B+ tree, let us consider a problem, the common ordered array, binary search tree query speed is log(n), and we know MySQL InnoDB engine, the default use of B+ tree as an index structure, why?

We know that memory is insecure and volatile storage, while disk is the persistent storage medium. And, because of the price factor, disk is the primary storage for data, and most of the data is actually on disk, not in memory. If you use an ordered array, even though the search speed is log(n), every time you look up the data, you have to make log(n) disk accesses on the disk, which is very slow. The same problem exists with binary trees. Most of the data is stored on disk, and multiple accesses to disk space are inefficient. Now, what about B trees?

The structure of b-tree is shown in the figure above. It can be seen that B-tree is a multi-fork search tree. A node can have more than two nodes. Compared with binary tree, the biggest advantage of B tree is that the middle layer can be stored in memory, and the height of B tree is lower than that of binary tree when the amount of data is the same, which can reduce the search process and speed up the storage. Therefore, B-trees are often used in the design of databases and file systems.

The figure above shows the structure of B+ tree. B+ tree is a structure similar to B tree. The differences between B+ tree and B tree are as follows:

  • The non-leaf nodes of the B tree also have complete row data. The B+ tree only has data on the leaf nodes and only has keys on the non-leaf nodes. The advantage is that the number of keys stored in the memory is larger and the search is faster.
  • The leaf of a B+ tree contains all of the keys, not necessarily all of the keys.
  • The leaf nodes of the B+ tree are connected through a linked list, and the range query is faster.

A hash index

MySQL hash indexes, as shown in the figure above, are implemented based on hash tables and use the zipper method to handle data conflicts. The retrieval efficiency of hash indexes is very high, but hash indexes also have some disadvantages:

  • Hash indexes cannot process range queries, and range queries are slow.
  • Hash indexes cannot avoid sorting;
  • Hash indexes cannot be queried using partial index keys. In a combination index scenario, multiple index keys need to be combined to calculate the hash value.
  • Table scanning is inevitable with hash indexes. Even if data is located using hash, the original data needs to be compared to find the correct data.
  • Hash index Performance is not necessarily higher than that of a B-tree index when a large number of hash values are equal.

Due to the disadvantages mentioned above, even though the search speed of hash index is fast, it is not necessarily better than B+Tree when used.

The index classification

The index type

The primary key index

The value of a column or combination of columns (fields) in a database table uniquely identifies each row in the table. This column is called the primary key of the table. In a database, when a primary key is defined, a primary key index is automatically created, which is a specific type of unique index.

Normal index

The most basic index type, with no restrictions such as uniqueness.

The only index

A unique index is one in which no two rows are allowed to have the same index value.

Clustered index/non-clustered index

Clustered index: also called clustered index. The physical order of rows in a table is the same as the logical (index) order of key values. A table can contain only one clustered index.

Non-clustered index: also called non-clustered index. In a non-clustered index, the physical order of the records in a database table can be different from the index order. There can only be one clustered index in a table, but each column in the table can have its own non-clustered index.

Joint index

A federated index is an index that consists of multiple fields. Of these, only the first field is ordered. The other fields are ordered only if the preceding fields are equal, but as a whole, they are unordered.

Performance optimization

Back to the table

In MySQL, the primary key index is automatically created for the primary key, and all data is contained on the primary key index. For all other non-primary key indexes, non-primary key indexes contain only the index key and primary key data. Therefore, after a primary key is queried on a non-primary key index, data needs to be queried on the primary key index using the primary key. This process is called “back table”.

Cover index

As mentioned above, when the select statement queries the content in the non-primary key index already contains, does not need to enter the primary key index to search, is called “overwrite index”.

An index pushdown

When multiple conditions are juxtaposed and contained in the index data, MySQL uses “index push” to match the index, filtering to reduce data and speed up the query.

The leftmost prefix matches

The left-most prefix matching principle is an important principle applied to the joint index. That is, the query will be matched from left to right in the order that the joint index is created. It can be understood through the following features.

  • For federated indexes, MySQL will keep matching to the right until it encounters a range query (>, <, between, like) and stops matching. Such as a = 3 and c > b = 4 and 5 and d = 6, if the building is (a, b, c, d) the order of the index, so d is to use less than the index, but if the building is (a, b, d, c) of this order index, then there is no problem, and a, b, You can switch the order of d at will.
  • = and in can be out of order, such as a = 3 and b = 4 and c = 5 to create (a, b, c) index can be in any order.
  • If the index order is (a, b), the query condition where b = 5 cannot use the index, which best reflects the left-most matching feature.