Index type
-
B + tree
-
Why B+ tree and not B tree?
Let’s first look at the structural differences between B trees and B+ trees
-
- You can see:
- B trees have satellite data (a row of data in a data table) on each node, whereas B+ trees only have satellite data on leaf nodes. This means that the same size of disk sector, B+ tree can store more leaf nodes, disk I/O fewer times; It also means that the search efficiency of B+ tree is more stable, and the fastest time complexity of B tree data query is O(1).
- Each node of a B tree appears only once, and all nodes of a B+ tree appear in leaf nodes. All leaf nodes of a B+ tree form an ascending linked list suitable for range-range lookup, whereas B trees do not.
-
What is the difference between MyISAM and InnoDB’s B+ tree index implementation (clustered index and non-clustered index)?
First, you need to understand clustered and non-clustered indexes.
- Clustering index
In a clustered index, the leaf page contains all the data for the row, and the node page value contains the index column. InnoDB aggregates data by primary key. If no primary key is defined, a unique non-empty index column is selected instead. If there is no such index, InnoDB implicitly defines a primary key as the cluster index.
In clustered indexes, there are secondary indexes in addition to primary key indexes. Leaf nodes in secondary indexes do not store row Pointers, but primary key values and use them as Pointers to rows. This means that the storage engine needs to find the primary key value of the leaf node in the secondary index, and then find the corresponding row in the clustered index based on this value, also known as “back table”. Of course, you can override indexes to avoid back tables, or InnoDB’s adaptive indexes can reduce such rework.
PS: Each leaf node in the cluster index contains not only the complete row of data, but also the transaction ID and rollback pointer for the transaction and MVCC.
- Non-clustered index
The primary key index and secondary index of a non-clustered index are not structurally different in that both store “row Pointers” to the physical address of the data on leaf nodes.
Advantages and disadvantages of clustered indexes
-
advantages
- Keep related data together (such as a user ID that aggregates all of a user’s mail), otherwise each data read may result in a disk IO
- Data access is faster, indexes and data are kept in the same B+ tree, and it is usually faster to get data in a clustered index than to look it up in a non-clustered index
- Override queries make direct use of primary key values in page nodes
-
disadvantages
- If all data can be placed in memory, sequential access is no longer necessary and clustered indexes have no advantage
- The insert speed depends on the insert order, random inserts will split the page, causing holes, and rebuild the TABLE with the OPTIMIZE TABLE
- Maintaining index changes with every insert, update, or delete can be costly
- Secondary indexes can be larger than expected because the node contains primary key columns that reference rows
-
The hash index
Hash indexes are implemented based on hash tables and are valid only for queries that exactly match all columns of the index, meaning that hash indexes are suitable for equivalent queries.
Concrete implementation: for each row of data, the storage engine will calculate a hash code for all index columns, hash index will store all the hash code in the index, at the same time in the hash table to store Pointers to each row of data.
In MySQL, only the Memory engine explicitly supports hash indexes, and of course the Memory engine also supports B-tree indexes.
PS: The Memory engine supports non-unique hash indexes and resolves conflicts by storing multiple pointer records with the same hash in a linked list.
- Adaptive hash index
InnoDB notices that when certain index values are used very frequently, it creates a hash index in memory on top of the B+ tree index, which gives the B+ tree index some of the benefits of hash indexes, such as fast hash lookups.