Preface

I have read a lot of blog posts about indexes, and they all say pretty much the same thing. Even so, I still did not understand some of the concepts, such as B-tree indexes, hash indexes, and unique indexes. There are probably many people like me who do not really understand B-tree, B+Tree, and so on, and who end up giving interview answers that miss the question.

What is the index?

Indexes are data structures that help MySQL efficiently retrieve data.

What does an index do?

It improves the efficiency of data queries.

An index is a fast, ordered lookup data structure. Indexes affect both the lookups in a WHERE clause and the sorting in an ORDER BY clause.

Classification of indexes

1. By storage structure: B-tree index (B-tree or B+Tree index), hash index, full-text index, and R-tree index.

2. By application level: normal index, unique index, composite index.

3. By the relationship between the physical order of the data and the logical (index) order of the key values: clustered index, non-clustered index.

Classification 1 describes the form in which the index is stored, while classification 2 is made according to how the index is used; the two are divisions at different levels. When people say "index type", they usually mean the application-level division.

It is like classifying phones: by system they are Android phones and iOS phones, and by brand they are Huawei, Apple, and OPPO phones.

  • Normal index: the basic index type, with no uniqueness constraint on the indexed values. In its simplest form it covers a single column, and a table can have multiple single-column indexes.
  • Unique index: the values of the indexed column must be unique, but NULL values are allowed.
  • Composite index: an index that contains multiple columns.
  • Clustered index: not a separate type of index, but a way of storing data. The details depend on the implementation; InnoDB’s clustered index stores the B-tree index (technically B+Tree) and the data rows in the same structure.
  • Non-clustered index: any index that is not a clustered index.
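
To make these types concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in rather than MySQL, so the table name student, the column names, and the index names are all my own assumptions, and MySQL may differ in syntax and details. It creates a normal, a unique, and a composite index, then checks that the unique index accepts several NULL values but rejects a duplicate non-NULL value.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the article itself is about MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, class_id INTEGER)"
)

# Normal index: no uniqueness constraint.
conn.execute("CREATE INDEX idx_name ON student (name)")
# Unique index: duplicate non-NULL values are rejected.
conn.execute("CREATE UNIQUE INDEX uk_email ON student (email)")
# Composite index: one index over several columns.
conn.execute("CREATE INDEX idx_class_name ON student (class_id, name)")

conn.execute(
    "INSERT INTO student (name, email, class_id) VALUES ('Ann', 'ann@example.com', 1)"
)
# Two rows with a NULL email are both accepted by the unique index.
conn.execute("INSERT INTO student (name, email, class_id) VALUES ('Bob', NULL, 1)")
conn.execute("INSERT INTO student (name, email, class_id) VALUES ('Cat', NULL, 2)")


def insert_duplicate_email():
    """Try to reuse an existing email; return True if the unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO student (name, email, class_id) "
            "VALUES ('Dan', 'ann@example.com', 2)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = insert_duplicate_email()
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

After this runs, duplicate_rejected is True and row_count is 3: the two NULL emails went in, and the repeated address did not.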

The underlying implementation of indexes

MySQL’s default storage engine, InnoDB, does not let you create hash indexes explicitly; its ordinary indexes are B-tree (technically B+Tree) indexes. For frequently accessed data, InnoDB transparently builds an adaptive hash index on top of the B-tree index, which can significantly improve search efficiency. This is transparent to clients: it happens implicitly, and clients cannot create such an index themselves.
The rest of this section sets storage engines aside and looks only at the implementation in the abstract.

Hash index

A hash index is implemented on top of a hash table and is only useful for queries that exactly match all columns of the index. For each row of data, the storage engine computes a hash code over all the indexed columns. The hash index stores these hash codes in a hash table, together with a pointer to each row.
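
The description above can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the rows, the column layout, and the helper names build_hash_index and lookup are hypothetical, and a list position stands in for the row pointer. It shows why a lookup works only when every indexed column is supplied.

```python
# Hypothetical table: list position = row pointer, row = (last_name, first_name, age)
rows = [
    ("Smith", "Alice", 30),
    ("Jones", "Bob", 25),
    ("Smith", "Carol", 41),
]


def build_hash_index(rows, key_columns):
    """Map hash(key columns) -> list of (key, row pointer); the list absorbs collisions."""
    index = {}
    for pointer, row in enumerate(rows):
        key = tuple(row[c] for c in key_columns)
        index.setdefault(hash(key), []).append((key, pointer))
    return index


def lookup(index, key):
    """Exact-match lookup: hash the full key, then verify it against each candidate."""
    candidates = index.get(hash(key), [])
    return [pointer for stored_key, pointer in candidates if stored_key == key]


# Index on (last_name, first_name), i.e. columns 0 and 1.
name_index = build_hash_index(rows, key_columns=(0, 1))

full_match = lookup(name_index, ("Smith", "Carol"))  # all indexed columns supplied
partial_match = lookup(name_index, ("Smith",))       # only the first column supplied
```

Here full_match is [2], while partial_match is empty, because the hash of ("Smith",) has nothing to do with the hash of ("Smith", "Carol"). The same reasoning explains why a hash index cannot help with range conditions or sorting.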

A B-tree index speeds up data access because the storage engine does not need to perform a full table scan to find the data; the data is distributed among the nodes of the tree.

B+Tree is an improved version of the B-tree and is the storage structure used by database indexes. All the data is on the leaf nodes, and sequential access pointers are added, with each leaf node pointing to the address of the adjacent leaf node. For a range query, a B+Tree only needs to locate the two boundary nodes and then traverse the leaves between them, whereas a B-tree has to walk through the tree itself to reach every node in the range, so the B+Tree is more efficient.
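
Here is a rough Python sketch of how linked leaf nodes serve a range query. It is a simplification under my own assumptions: the Leaf class, build_leaves, range_scan, and the page size of 4 are all hypothetical, and a binary search over the first key of each leaf stands in for the descent through the internal nodes.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    """A leaf page: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build_leaves(sorted_keys, page_size):
    """Cut sorted keys into fixed-size leaves and link each leaf to its neighbour."""
    leaves = [
        Leaf(sorted_keys[i:i + page_size])
        for i in range(0, len(sorted_keys), page_size)
    ]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Return the keys in [low, high] and the number of leaves visited."""
    # Stand-in for the root-to-leaf descent: find the leaf that covers `low`.
    first_keys = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(first_keys, low) - 1, 0)
    leaf, result, visited = leaves[start], [], 0
    while leaf is not None:
        visited += 1
        for key in leaf.keys[bisect_left(leaf.keys, low):]:
            if key > high:
                return result, visited  # passed the upper bound: stop
            result.append(key)
        leaf = leaf.next  # follow the sequential pointer, no trip back up the tree
    return result, visited


leaves = build_leaves(list(range(1, 21)), page_size=4)  # keys 1..20 in 5 leaves
keys, visited = range_scan(leaves, 6, 13)
```

With these inputs, keys holds 6 through 13 and visited is 3: one search to find the starting leaf, then two hops along the leaf pointers.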

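Example: suppose there is a student table with id as the primary key.

Implementation in the MyISAM engine (secondary indexes are implemented in the same way): the leaf nodes of the index store the address of the data row.

Implementation in InnoDB: the leaf nodes of the primary key index store the complete rows, and the leaf nodes of a secondary index store the primary key value.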
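
The difference between the two engines can be modeled roughly in Python. This sketch assumes the commonly described layouts: a MyISAM index entry points at the row's address in a separate data file, while InnoDB keeps the rows inside the primary key index and has secondary indexes store the primary key value. Plain dicts stand in for the B+Trees, and the sample rows and the name index are hypothetical.

```python
# Hypothetical student rows: (id, name, age)
students = [(15, "Tom", 20), (18, "Amy", 22), (20, "Bob", 21)]

# MyISAM-style: rows live in a separate data file; every index maps key -> row address.
myisam_data_file = list(students)  # list position = row address
myisam_primary = {row[0]: addr for addr, row in enumerate(myisam_data_file)}
myisam_by_name = {row[1]: addr for addr, row in enumerate(myisam_data_file)}


def myisam_find_by_name(name):
    """One index lookup, then read the row at the stored address."""
    return myisam_data_file[myisam_by_name[name]]


# InnoDB-style: the primary key index holds the rows themselves (clustered),
# and a secondary index holds the primary key value instead of an address.
innodb_clustered = {row[0]: row for row in students}
innodb_by_name = {row[1]: row[0] for row in students}


def innodb_find_by_name(name):
    """Secondary index gives the primary key; the clustered index then gives the row."""
    primary_key = innodb_by_name[name]
    return innodb_clustered[primary_key]
```

Both myisam_find_by_name("Amy") and innodb_find_by_name("Amy") return (18, "Amy", 22), but the InnoDB path goes through two indexes. That second lookup is the cost of a secondary index in this layout.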





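Questions

Q: Why do indexes default to a B-tree (B+Tree) structure instead of a hash table, a binary tree, or a red-black tree?

Hash: locating a single key is fast, but the keys are not stored in order, so range queries and sorting get no help from the index and the I/O cost is high.

Binary tree: the height of the tree is uneven and it cannot balance itself, so search efficiency depends on the data (the tree height), and the I/O cost is high.

Red-black tree: each node has at most two children, so the height of the tree grows quickly with the data volume, and the I/O cost is high.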
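
A quick calculation shows why tree height matters. The sketch below assumes each level of the tree costs one disk read, and the figures of 10 million keys and 500 keys per B+Tree page are numbers I picked for illustration, not measurements. The helper min_height is hypothetical.

```python
def min_height(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys."""
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


NUM_KEYS = 10_000_000

# A perfectly balanced binary tree; a red-black tree can be somewhat taller.
binary_height = min_height(NUM_KEYS, fanout=2)

# A B+Tree whose pages each hold an assumed 500 keys.
btree_height = min_height(NUM_KEYS, fanout=500)
```

Under these assumptions binary_height is 24 and btree_height is 3, so a lookup touches about 24 nodes in the best binary case against 3 pages in the B+Tree.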

Q: Why is it officially recommended to use an auto-increment primary key as the index?

This follows from the characteristics of B+Tree. Auto-increment primary key values are sequential, so page splits are kept to a minimum during insertion, and even when a page does have to split, only a small part of it is affected. It also reduces data movement, because every insert goes at the end. The idea is to reduce the frequency of page splits and data movement.

Inserting sequential data:



Inserting non-sequential data:
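
The two cases can be compared with a small simulation. This is a toy model, not InnoDB's actual algorithm: it assumes a page holds only 4 keys, that a key larger than everything in a full last page simply starts a new page, and that any other insert into a full page splits it in half. The function insert_keys is hypothetical.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # assumed tiny page size so that splits happen often


def insert_keys(keys):
    """Insert keys into sorted pages; return (page splits, existing keys moved)."""
    pages = [[]]
    splits = moved = 0
    for key in keys:
        # Find the page whose key range should contain `key`.
        first_keys = [page[0] for page in pages if page]
        i = max(bisect_right(first_keys, key) - 1, 0)
        page = pages[i]
        if len(page) == PAGE_CAPACITY:
            if i == len(pages) - 1 and key > page[-1]:
                pages.append([key])  # append at the end: nothing moves
                continue
            # Insert into the middle of a full page: move the upper half out.
            splits += 1
            half = PAGE_CAPACITY // 2
            new_page = page[half:]
            del page[half:]
            moved += len(new_page)
            pages.insert(i + 1, new_page)
            if key > new_page[0]:
                page = new_page
        moved += len(page) - bisect_right(page, key)  # keys shifted to make room
        insort(page, key)
    return splits, moved


sequential = insert_keys(range(1, 201))

shuffled_keys = list(range(1, 201))
random.Random(42).shuffle(shuffled_keys)  # fixed seed for a repeatable run
shuffled = insert_keys(shuffled_keys)
```

In this model sequential comes out as (0, 0): no splits and no moved keys. The shuffled run reports a non-zero number of both, which matches the idea that ordered keys reduce splitting and movement.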





Finally

You are welcome to follow my official account [programmer chasing wind]. New articles and the materials I have collected will be posted there.