preface

In the previous article on MySQL indexes, we did not mention clustered indexes and non-clustered indexes. In this article, we will take a look at what they are and their advantages and disadvantages.

Clustered index & non-clustered index

The difference between a clustered index and a non-clustered index is whether a leaf node holds a whole row of records


InnoDB uses clustered indexes for primary keys, MyISAM uses non-clustered indexes for both primary keys and secondary indexes.


The following diagram illustrates the difference between a clustered index table (InnoDB) and a non-clustered index (MyISAM) :



Clustered index and non-clustered index


1. To
Non-clustered index tableTable data and indexes are stored in two parts, and there is no difference between primary and secondary indexes. B+ tree is used as the storage structure of the index, all nodes are indexes, and leaf nodes store the data corresponding to the index + index.
For 2.
Cluster index tableFor example (left), table data is stored along with primary keys, primary key index leaves store row data (containing primary key values), and secondary index leaves store row primary key values. B+ tree is used as the storage structure of the index. Non-leaf nodes are all index keywords, but the keywords in non-leaf nodes do not store the specific content or content address of the corresponding record. The data on the leaf node is the primary key and the specific record (data content).


Advantages of clustered indexes

1. When you need to retrieve a certain range of data, using clustered indexes is also better than using non-clustered indexes.
2. It is theoretically faster to find the target data through a clustered index than a non-clustered index, because the non-clustered index will have one more target record addressing, namely one more I/O, when it locates the corresponding primary key.
3. Queries that use overridden index scans can directly use primary key values in page nodes.


Disadvantages of clustered indexes

1.
Insertion speed depends heavily on insertion order, the fastest way is to insert according to the order of primary keys. Otherwise, pages will split and performance will be seriously affected. Therefore, for InnoDB tables, we generally define an incremented ID column primary key.
2.
Updating the primary key is expensive because it causes the updated row to move. Therefore, for InnoDB tables, we generally define primary keys as not updatable.
3.
Secondary index access requires two index lookups, the first to find the primary key value and the second to find the row data based on the primary key value.
The leaf nodes of secondary indexes store primary keys rather than row Pointers (non-clustered indexes store Pointers or addresses). This reduces the maintenance of secondary indexes in the event of row movement or data page splitting, but allows the secondary indexes to take up more space.
4.
Inserting new values with clustered indexes is much slower than inserting new values with non-clustered indexesAs insert to ensure the primary key can’t repeat, determine the primary key can’t repeat, adopt the way under the different index can make a big performance gap of clustering index to iterate through all the leaf nodes, the clustering index also judge all leaf nodes, but the cluster index of leaf nodes in addition to a primary key value and record, record a much bigger than the size of the primary key. This can result in expensive I/O for clustered indexes to determine whether the primary key carried by the new record is duplicate.