Today is February 3, 2020, the first “winter vacation” after graduation, I am sitting in the yard basking in the sun, doing nothing and writing a blog

This article addresses a problem

Mysql uses b+ tree as storage structure. Why not binary trees, red black trees, B trees?

We first construct an application scenario, we have 1kW of data need to be stored in a table, so how can we design to make the query speed as fast as possible

The visual data structures for our experiments are obtained from the following website: www.cs.usfca.edu/~galles/vis…

Ok, let’s see how the binary tree can store 1kw of data. Suppose I have a table, and the table has only one field in it, and it’s incrementing. Let’s see what happens with the binary tree

So, we see that in this case, the binary tree directly degenerates into a linked list, and if we want to find 5 records, we need to search 5 times, and n records need to search n times, which is O(n), not enough for our needs

Let’s look at red black trees

We see red and black tree is characteristic of a self balancing, at the expense of insert performance is solved the problem of the degenerate into a linked list, but with the increase of record tree, the tree height will increase, so, we want to find 1 kw data, still want to look for many times, corresponding to mysql on every read a tree node needs a IO, Is there a better way?

Ok, let’s look at the B tree

The obvious difference is that each node stores multiple data. In fact, the logic is very simple. If you want to keep the height fixed, you need to work horizontally

What’s the difference between a B + tree and a B tree? Use a picture from the Internet to illustrate

Specifically, mysql

  1. There are redundant indexes for easy lookup
  2. Only store data on leaf nodes, 16K memory (mysql innodb_page_size default size) can hold more data, lower height, faster query
  3. Leaf node added a two-way linked list, convenient range query

Therefore, we can understand why mysql uses B + tree as the underlying data structure, 1kW of data, the index application is reasonable, maybe 3 or 4 IO can locate the record.