1. Why use indexes?

  • Indexes can greatly reduce the amount of data that the storage engine needs to scan.
  • An index can turn random IO into sequential IO.
  • Indexes help us avoid using temporary tables for grouping, sorting, and so on.

2. Think about 🤔. What are the underlying data structures of indexes and what are their advantages and disadvantages?

Common data structures for indexes are as follows: 1. Hash structure. 2, B+Tree structure

Index structure

advantages

disadvantages

Hash structure

Amount of data equivalent in hours High query efficiency

1, index could not complete sort. 2. No interval query. 3. Some indexes cannot be used. 4. A large number of Hash value conflicts cause performance problems.

B + Tree structure

1. Reduce the amount of scanned data. 2. Turn random IO into sequential IO. 3. Disadvantages of Hash

Take up physical space

3. Why 🤔 is a B+Tree?

1, Binary Search Tree: (Binary Search Tree) Disadvantages: The height of the Tree is not constrained, resulting in high query efficiency and time complexity O(n).

Binary search tree

Balance Binary Search Tree (AVL Tree) Improved query complexity (the height difference between the left and right subtrees cannot be greater than 1), but the tree height ==I/O times. Even if the left and right subtrees are leveled, I/O problems caused by the height cannot be received. Moreover, each disk block (node/page) is too small to make full use of the I/O data exchange feature.

Balanced binary trees

3, B-tree structure (multi-path balanced Tree) : disadvantages:

Multipath balanced tree

A b-tree of order M is defined as a tree of order N (m=n+1). If a node has at most N keys, the node can have at most N +1 child nodes. A b-tree of order M has the following characteristics: Keywords (n), path/order (m), degree () 1. Each node has at most m child nodes. 2. Each node except root node and leaf node has at least Ceil(M /2) children. 3. If the root node is not a leaf node, there must be at least two children. 4. All leaf nodes are in the same layer and do not contain other keyword information. 5. Number of keywords n: ceil(m/2)-1 <= N <= m-1 6. Ki (I =1,... N) indicates the keyword in ascending order. 7. Pi (I = 1,... N) is the pointer to the child root node. The keywords of all nodes of the subtree pointed to by P(i-1) are smaller than ki, but larger than K (i-1) 8. Each non-terminal node contains n keywords (P0,P1... Pn, k1,... Kn)Copy the code

Level (m) : P1, P2, P3 Keyword (n) : N <= M-1 Height: xx Figure: Level =3, keyword =2. The default minimum disk block size of mysql is 16 KB. The size of an INT ID as a keyword is 4 bytes +4 bytes. Therefore, number of keywords = Disk block space/ID: The maximum number of keywords =(16 x 1024)/(4+4)=2048, the degree <= path <=2048+1=2049. Check the size of the mysql page: show variables like ‘innodb_page_size’;

4, B+Tree structure

Disadvantages:

Differences between B+Tree and B-tree:

1. B+ node keyword search adopts closed interval. 2. The B+ non-leaf node does not store data related information, but only the references of keywords and child nodes. 3. The data corresponding to the B+ keyword is saved in the leaf node. 4, B+ leaf nodes are sequentially arranged, and adjacent nodes have a sequential reference relationship.Copy the code

The advantage of B + Tree:

The B+ tree is a variant of the B- tree (PLUS version) multi-way absolutely balanced search tree, which has the advantages of the B- tree. B+ tree sweep library, table stronger ability. B+ trees have more disk read and write capabilities. B+ trees are more capable of sorting. B+ tree query efficiency is more stable (depending on your opinion, wisdom).Copy the code

Author: biudefu links: www.jianshu.com/p/171ba693f…