Why are most database indexes implemented using B+ trees? This involves complex theoretical knowledge of data structures, operating systems, computer storage hierarchies and so on, but don’t worry, this article will give you the answer in 20 minutes.

This article is the last in a series of database indexing articles that includes the following four articles:

  1. What is a database index? Xinhua Dictionary to help you understand

  2. Database index penetration – in-depth

  3. 20 minutes database index design practice – practice

  4. Why is database indexing implemented with B+ tree? –

This series covers a series of knowledge of database index from theory to practice, one-stop solution from understanding to understanding of the whole process, I believe that each article can bring you more in-depth experience.

Why use B+ trees?

You may have heard of an example in math class where the best way to find a particular number in a bunch of sorted numbers is a technique called binary search. The specific process is to find the number in the middle of these numbers, and then compare the target number is greater than or less than this number; Then continue to look for the first half or the second half of the number based on the result.

This is similar to a binary tree in a data structure. A binary tree is a structure where each node in a tree can have at most two children, whereas a B+ tree can have N children per node.

We won’t go into binary trees here, but we need to know that balanced binary trees are the most efficient data structures in memory.

However, most indexes in common databases are implemented by B+ tree. Why use B+ trees instead of binary trees to index databases when binary trees are the most efficient?

Computer storage hierarchy

The storage structure in the computer is divided into several parts, from top to bottom can be roughly divided into registers, cache, main memory, auxiliary memory. Main memory, or what we call memory; Secondary storage, also known as external storage, is common on disks and SSDS, which can be used to store files. In this storage structure, each level of storage is much slower than the previous level, so the faster the program accesses the data in the upper level of storage, the faster it will be.

Those who have programming experience know that the basic operation in the process of program operation is memory, and the access to data in external storage often needs to write some file reading and writing code to achieve. This is precisely because the CPU’s calculation speed is much faster than the storage I/O speed (input/output speed), because the CPU has to wait for the next batch of data to come in after each calculation. The shorter the wait, the faster the computer will run.

Therefore, for database indexes, because of the large amount of data, it is basically saved in external storage, so the cost of database reading an index node is very large. In the case of the same amount of data, we can know that the more values contained in a single node of the B+ tree, the fewer the total number of nodes needed in the tree, so that the number of nodes needed to visit a data query is less.

If you are not familiar with B+ trees, you can find the answer in this article – database index through.

If we think of binary trees as special B+ trees (B+ trees with only one value per node and two Pointers before and after), then we can conclude that ** Because a B+ tree has more values in its nodes (multiple values) than a binary tree (one value), fewer nodes are needed to query in a B+ tree. ** Then if the cost per read is the same, since total cost = number of reads * cost per read, we can prove that the query cost of a B+ tree is much lower than that of a binary tree.

Node read cost

But we know that reading more data is definitely more expensive, so why is it better to index a database using B+ trees than binary trees? This requires some further operating system knowledge to explain.

In modern operating systems, the units used to read data from external memory to memory are generally called “pages”, and each read requires an integer number of “pages”, not half a page or 0.8 pages. The size of a page depends on the operating system. A typical page size is 4KB=4096 bytes. So whether we want to read 1 byte or 2KB, we end up reading a full 4KB page, so the cost of reading a node depends on the number of pages that need to be read.

In this case, if the size of a node is less than the size of a page, then a portion of the time is spent reading data we don’t need at all (data outside the node), which is a lot of time wasted in binary trees; If the size of a node is larger than a page, even if it is an integer multiple of a page, then we may find the pointer in the middle of a node that we need to go to the next node, so that the data after the pointer is read for nothing, and if we do not need the data, we may not have to read a few pages.

So, in summary, database indexing using B+ trees with nodes exactly the size of a page of the operating system is the most efficient option.