Origin of the problem:

InnoDB, Msyql, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB, InnoDB

Ask questions:

!!!!!!!!! B minus B Tree B minus B Tree

WTF, what is b-tree? Not so what is B+ -tree? What’s the difference? Isn’t our usual binary search tree Log (n) time? Isn’t that good?

Start solving problems:

Preliminary knowledge

Disk I/o

· The system reads disks by reading out the basic unit of disks — blocks. Bits of data on the same disk block are read at once. Disk read IO is a mechanical action that takes about 100,000 times longer than memory read. Therefore, disk I/O speed is a major indicator of index performance.

1. Review the Binay Search Tree

! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/f38e1c505c6641799a45cb1ff97ca152~tplv-k3u1fbpfcp-zoom-1.image )

As shown in the figure, is a binary search tree whose time complexity is O(Log(n)). Now we want to perform a search, so consider the best case scenario, search 0009, that is, search the root node with the keyword BST, read IO once; The worst-case scenario is obviously the deepest bottom leaf node of the tree (N disk IO for N depth).

For example, to index 0010, four disk IO operations are required, which is a lot less than other data structures.

! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/1507e2d7dc314452981df3a416b5fe36~tplv-k3u1fbpfcp-zoom-1.image )
! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/aba95352ea274818908be90b479e40e7~tplv-k3u1fbpfcp-zoom-1.image )
! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/6f815a095e06402ebf65b587c9400c34~tplv-k3u1fbpfcp-zoom-1.image )
! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/a9f97531c4e74c1e997a10039a4607e7~tplv-k3u1fbpfcp-zoom-1.image )

Can it be optimized and how:

· If optimization is to optimize the BST worst case

· The worst case is determined by the depth of the tree, that is, it is order N BST, so we have to compress it

2. From the previous thinking, we can derive b-tree:

B-tree – Balanced multi-path search Tree

· B-tree is a balanced search Tree designed for external storage devices such as disks.

· Data in b-tree structure enables the system to efficiently find the disk block where the data resides. To describe a B-tree, first define a record as a binary group [key, data]. Key is the key value of the record, corresponding to the primary key value in the table, and data is the data in a row except the primary key. The key values are different for different records. (A key is a unique and immutable marker of an index, and the content is stored in data.)

An M-level B-tree has the following characteristics:

Each Tree has a number of attribute concepts, and the ones highlighted here are the ones we use, which are relevant to our understanding of b-trees.

1. Each node has a maximum of M children.

  1. Each node except the root node and leaf node has at least Ceil(m/2) children.
  2. If the root node is not a leaf node, there are at least 2 children
  3. All leaf nodes are in the same layer and contain no other keyword information
  4. Each non-terminal node contains n keywords (P0,P1… Pn, k1,… Kn)
  5. The number of keywords n is ceiL (m/2)-1 <= N <= M-1
  6. Ki (I = 1,… N) indicates the keyword in ascending order.
  7. Pi (I = 1,… N) is the pointer to the child root node. The keywords of all nodes of the subtree pointed to by P(i-1) are less than ki, but greater than K (i-1).
! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/da291f68c67845fa94896ac9dde4c437~tplv-k3u1fbpfcp-zoom-1.image )

In the figure above, this is a B-tree. Purple is Key, yellow is data, and blue is pointer. If you look carefully, you can find disk blocks

Each node occupies the disk space of a disk block. A node has two keywords in ascending order and three Pointers to the child root node. The Pointers store the address of the disk block where the child node resides. The three scope fields divided into two keywords correspond to the scope fields of the data of the subtree pointed to by the three Pointers. For the root node, the keywords are 17 and 35. The data range of the subtree to which the P1 pointer points is less than 17, the data range of the subtree to which the P2 pointer points is 17-35, and the data range of the subtree to which the P3 pointer points is greater than 35.

Example Lookup 60

Simulate the process of finding keyword 60:

Locate disk block 1 based on the root node and read it into memory.

[disk I/O operation 1] find the pointer P3 to disk block 1 when keyword 60 is greater than 17,35. Locate disk block 4 based on pointer P3 and read it into memory.

[disk I/O operation 2] find pointer P1 of disk block 3 when keyword 60 is less than 65,87. Locate disk block 9 according to P2 pointer and read into memory.

Find keyword 60 in the keyword list of disk block 9. Analyzing the above procedure, we found that three disk I/O operations and three memory lookups were required.

Since the keyword in memory is an ordered table structure, dichotomy lookup can be used to improve efficiency. Three disk I/O operations affect the efficiency of the entire B-tree search. B-tree enables each disk I/O to obtain data from the memory, improving the query efficiency.

!!!!!!!!! : pay attention to more than the BST before on each disk page index comparison, but because of disk pages have been read disk IO operation read the memory, so the memory IO operations than disk IO operation time saving a lot of is not an order of magnitude can be neglected, so the disk IO operations are still the most important key index of the performance.

Obviously, this time we did three disk I/OS, one less than the previous BST. Imagine that in a complex B-tree index, there are many fewer disk I/OS, so B-tree is better for indexing.

Can you optimize it again?

Iphone6/7/8 upgrade is iphone6+/7+/8+, yes add + (plus), it seems to do so.

B + – Tree:

B+Tree is an optimization based on B-Tree to make it more suitable for external storage index structure. InnoDB storage engine uses B+Tree to achieve its index structure.

The key-data pair of the b-tree is stored in disk, and then read into memory by disk IO operation, so in disk page, if the data is very large, if it is big data !!!!

If the data is large, the disk page cannot be loaded, which will reduce the number of keys on a page, or increase the depth of the B-tree, which will still increase the disk IO query, computer science elders have a bold idea, so I put all the data in the Tree leaf node?

In B+Tree, all data record nodes are stored on leaf nodes of the same level in order of key value size, but not on leaf nodesonlyStores key information, which greatly increases the number of keys stored on each node and reduces the B+Tree height.

B+Tree is different from B-tree in the following aspects:

Non-leaf nodes only store key-value information. There is a chain pointer between all leaf nodes. Data records are stored in leaf nodes.

B + – Tree diagram

! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/ecf8549efbf14e38afeae16e5608cf2c~tplv-k3u1fbpfcp-zoom-1.image )

There are usually two head Pointers on a B+Tree, one to the root node and the other to the leaf node with the smallest keyword, and there is a chain-ring structure between all the leaf nodes (that is, data nodes). Therefore, there are two kinds of lookup operations on B+Tree: a range and paging lookup on the primary key, and a random lookup starting from the root node.

Clustered indexes, or scope look-ups, start at the following nodes.

!!!!!!!!! Difference between B-tree and B+ -tree :

· The disk read and write of B+Tree is lower, because non-leaf nodes can store more index keys, and key indexes are more concentrated in the same layer, thus reducing the disk I/O read and write times.

· B+Tree has more stable query efficiency, since the final node pointing to the file content is the index of the keyword in the leaf node, so any key search only has to go through the index path from the root node to the leaf node, the path is similar (depth), so it is more stable (at the bottom in the best and worst case).

Mysql is a relational database, so it often accesses an index according to an interval. The leaf nodes of the B+ tree set up a chain pointer in order to enhance the interval access.

One last question:

Most importantly, why does Mysql use B/B+ trees to implement indexes?

· Mysql is a disk-based database. Indexes are stored on disk in the form of index files

· The process of indexing involves disk I/O consumption, which is several orders of magnitude better than memory I/O consumption. That is, the disk I/O consumption involved in indexing should be minimized when searching for keywords.

Additional questions:

Why does the operating system, database, or Mysql database engine use B+ -tree more?

· Because b-tree still does not solve the keyword traversal, so the time complexity of keyword traversal on B-tree is very high, but B+ -tree is not the same, can directly traversal leaf nodes, support range index, because OS DB DB ENGINE often use range index. So B+ -tree is better.

Now first share some MySQL some learning structure diagram, because there are too many pictures, so xiaobian first arrange these pictures, other drops if necessary

Receive way: forward + attention, private xiaobian [learn] can be immediately obtained

! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/2fc14c87355843c38c068080e9f17e51~tplv-k3u1fbpfcp-zoom-1.image )
! High performance Mysql to give up B+-Tree](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/e479b36764d242a3834ef9f46a161189~tplv-k3u1fbpfcp-zoom-1.image )