An Index is a data structure that helps MySQL to efficiently retrieve data from a database. I’m not going to generalize about indexes being like dictionaries, but how to really understand indexes.
According to the characteristics of the column field information, a data structure is used to store these characteristics, which is the index, as shown in the figure above: We need to store the Value of the second column through the binary lookup tree. Each node exists in the form of key-vlaue, and the Value is regarded as Key and the physical address as Value. Then, as long as the characteristics of the binary lookup tree, the corresponding physical address can be found quickly, and the data can be retrieved.
The common indexes of a database are Btree, B+Tree, Hash index, etc. Today, we mainly discuss Btree and B+Tree
B Tree
Let’s look at the definition of Btree:
- The root node contains at least two children
- Each node in the tree contains a maximum of m children (m>=2)
- Every node except the root node and leaf node has at least CeiL (m/2) children, where CeiL means upper limit
- All leaf nodes are in the same layer
Suppose that each non-terminal node contains n keyword information, where
- Ki (i=1.. N) is the keyword, and the keywords are sorted in ascending order K(i-1) < Ki
- The number of keywords n must be [Ceil (m / 2) -1] <= N <= m-1
- Pointers to non-leaf nodes: P[1], P[2]… P [M]; Where P[1] points to the subtree whose keyword is less than K[1], P[M] points to the subtree whose keyword is greater than K[m-1], and other P[I] points to the subtree whose keyword belongs to (K[i-1], K[I])
It looks very convoluted, but it’s actually quite easy to understand with the following illustration:
Due to the characteristics of B-tree, the algorithm for retrieving data by key in B-tree is very intuitive: binary search is performed from the root node first, and data of the corresponding node is returned if it is found; otherwise, recursive search is performed on the node pointed to by the pointer of the corresponding interval until the node is found or the null pointer is found. The former is found successfully and the latter fails.
However, the insertion and deletion of new data records will destroy the properties of the B-tree. Therefore, the Tree needs to be split, merged, and transferred to maintain the properties of the B-tree. The insertion and deletion of new data records will not be discussed here.
B+Tree
B+Tree is a variant of B Tree. There are many variants of B-tree, among which B+Tree is the most common. For example, MySQL widely uses B+Tree to achieve its index structure. The definition of a B+ tree is basically the same as that of a B tree, except that:
- Non-leaf nodes have the same number of subtree Pointers and keywords
- The subtree pointer of the non-leaf node P[I] points to the subtree of the keyword value [K[I], K[I +1])
- Non-leaf nodes are only used as indexes, and data is stored in leaf nodes
- All leaf nodes have a chain pointer to the next leaf node
B+Tree is more suitable for indexing
1. The disk read and write cost of B+Tree is lower
B+Tree does not contain Pointers to keywords, that is, it stores index information instead of data. As a result, a larger number of keywords can be stored in a B+Tree. As a result, more keywords need to be searched for when a Tree reads data into the memory at a time
2. The query efficiency of B+Tree is stable
B + Tree due to the internal node is not directly points to the file content, but only in the leaf node keyword indexing, so any keyword search must walk a road from the root node to leaf node, all the keyword query path length is the same, lead to each data query efficiency also is almost the same, so the query efficiency is more stable
3. B+Tree is more conducive to database scanning
B Tree improves disk I/O performance but does not solve the problem of element traversal efficiency, while B+Tree only needs to traverse leaf nodes to solve the scan of all keyword information, so it is very beneficial for the range query frequently used by the database
This is why B+Tree is chosen as the dominant index data structure
A Hash index
Some database storage engine also supports a Hash of the data structure as its index Hash structure, you must have been very familiar with, is according to the operation of Hash function, it only through a positioning can find the data you need, than B + Tree index, need from the root node to the leaf node, and then to leaf node can access to the data we need, Hash indexes are theoretically more efficient than B+Tree indexes because they may require multiple IO accesses.
But Hash also has some very obvious downsides:
1, can only meet “=”, “IN”, cannot use the range query
2. Cannot be used to avoid data sorting operations
3, cannot use partial index building to query
Table scan cannot be avoided
5. When a large number of Hash values are equal, the performance is not necessarily higher than that of b-tree indexes