The article directories

  • 1. The nature of indexes
    • (a) why database index can not use binary search tree?
    • (2) why red black tree is not suitable for database index?
    • (3) clustered index and non-clustered index
  • MySQL index implementation (Abstract)
    • (1) MyISAM index implementation
    • InnoDB index implementation:

1. The nature of indexes

Indexes are ordered data structures that help MySQL efficiently retrieve data.

If there is no index for the table shown above, when we perform the query:

select * from table1 where Col2 = 23;
Copy the code

I’m going to compare it from beginning to end, and then I’m going to find it, and I’m going to look it up seven times, and it’s very slow.

That’s where indexes come in. They help you quickly find the elements in a particular column.

Suppose we create an index for Col2:

And assume that our index is a binary sort tree (real database base does not use binary sort trees, here is a simple demonstration example).

You can see that this is a binary sort tree, and the time complexity is similar to binary search. You can skip half of the data each time.

When we execute again:

select * from table1 where Col2 = 23;
Copy the code
  • 1. Find the left ⬅️ subtree of 34;
  • Select * from the right ➡️ subtree of 22;
  • 3, compare to 23, equal to 23, return the result of this row.

And you can see that there were three searches.

If there’s a lot of data, there’s actually an advantage.

(a) why database index can not use binary search tree?

According to the above demonstration, it is also possible to look at binary search tree, it is also quite fast. But why isn’t it appropriate to use it at the bottom of a database? This is a common interview question.

We can demonstrate: www.cs.usfca.edu/~galles/vis…

Let’s suppose we index Col1 and then insert the binary search tree in order: 1, 2, 3, 4, 5, 6, 7;

And you can see it degenerates into a linked list.

When we query 7, the time complexity becomes a singly linked list.

From big to small:

The summary is as follows:

  • If the bottom of the database uses binary search tree, it will degenerate into a single linked list when the data is extreme, so it is not appropriate;

Imagine if we were to use the index data structure of a binary search tree for the increment column. That’s the extreme. It’s all on one side.

(2) why red black tree is not suitable for database index?

A red-black tree is also called a binary balanced tree

Red-black trees should be familiar to Java developers as the underlying data structure in HashMap in JDK8 uses red-black trees.

Red-black trees are used in the JDK, so why not index data structures in databases?

So again, let’s say we add Col1 to the index of the red-black tree.

The process is dynamically demonstrated as follows:

If we execute:

select * from table1 where Col1 = 7;
Copy the code

Dynamic demonstration is as follows:

As you can see, it took us four queries to find it. There is still a larger effect than before adding this index, at least not all scans.

Conclusion:

As you can see, almost every insertion adjusts the binary tree, keeping the height balanced. If the data volume is very large, it is also very time consuming, so red black trees are not suitable.

(3) clustered index and non-clustered index

Before answering this question, let’s take a look at the storage mode of Mysql underlying data files. Here we compare MyISAM and InnoDB engines.

1. MyISAM engine

As shown in the figure above, the MyISAM engine table stores three file formats on the hard disk: “.frm “, “.myd “, “.myi “;

  • FRM: indicates the frame of the entire data table, that is, the structure of the table.
  • “.myd “: D stands for data, and this file is stored data.
  • “.myi “: I stands for Index, and this file is the stored Index.

You can see that the table structure, data, and index are separated. This is nonaggregate.

2. InnoDB engine

Unlike MyISAM, InnoDB only stores two types of files on the hard disk:.frm and.ibd.

  • FRM: table structure/frame.
  • Ibd: stores table indexes and data.

You can see that the InnoDB engine stores both data and indexes in a single file, which is called a clustered index.

MySQL index implementation (Abstract)

In MySQL, indexes are implemented in the storage engine layer. Different storage engines implement indexes in different ways. Let’s discuss the index implementation of MyISAM and InnoDB storage engines.

(1) MyISAM index implementation

MyISAM engine uses B+Tree as the index structure, and the data field of the leaf node stores the address of the data record. The principle diagram of MyISAM index is as follows.

Select Col1 as Primary key from MyISAM; select Col1 as Primary key from MyISAM;

You can see that MyISAM’s index file only holds the address of the data record.

In MyISAM, there is no difference in structure between primary and Secondary keys, except that the primary index requires a unique key, while Secondary keys can be repeated.

If we create a secondary index on Col2, the structure of this index is shown below. Also a B+Tree, data field holds the address of the data record. Therefore, the index retrieval algorithm in MyISAM is to search the index according to THE B+Tree search algorithm. If the specified Key exists, the value of its data field is extracted, and then the corresponding data record is read with the value of the data field as the address.

InnoDB index implementation:

InnoDB also uses B+Tree as an index structure, but the implementation is quite different from MyISAM.

The first big difference is that InnoDB’s data files are themselves index files.

MyISAM index files and data files are separate, and the index file only holds the address of the data record.

In InnoDB, the table data file itself is an index structure organized by B+Tree, and the data field of the leaf node of this Tree holds complete data records. The key of this index is the primary key of the table, so the InnoDB table data file itself is the primary index.

Below is a diagram of the main index (which is also a data file) of InnoDB. You can see that the leaf node contains the complete data record. This kind of index is called a clustered index. InnoDB requires tables to have primary keys (MyISAM does not have primary keys) because InnoDB data files themselves are aggregated by primary keys. If this is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key. MySQL automatically generates an implied field for the InnoDB table as the primary key. This field is 6 bytes long and of type long integer.



The second difference from MyISAM index is that InnoDB’s secondary index data field stores the value of the primary key of the corresponding record rather than the address. In other words, all InnoDB secondary indexes refer to the primary key as the data field. The following figure shows a secondary index defined on Col3. The ASCII code of English characters is used as the comparison criterion. Clustered index implementations make searching by primary key very efficient, but secondary index searches require retrieving the index twice: first, retrieving the secondary index for the primary key, and then using the primary key to retrieve records from the primary index.



Understand the different storage engines index is implemented for the proper use of and optimize the index are very helpful, know the InnoDB index after implementation, for example, it is easy to understand why not suggest using long field as the primary key, because all the auxiliary index reference primary index, long primary index will make auxiliary index becomes too large. For example, using non-monotonic fields as primary keys is not a good idea in InnoDB, because InnoDB data files themselves are a B+Tree. Non-monotonic primary keys cause data files to be constantly split and adjusted to maintain B+Tree features when new records are inserted, which is inefficient. Using an increment field as a primary key is a good choice.