Talk about the underlying principles of indexing in MySQL

One, foreword

To have a lot of readers to me recently some interview questions, and generally I’ll give him a Lao zhou of e-books, but some readers feedback interview topic I think is really a classic, and Lao zhou in the series of articles written, in line with the attitude of responsible for readers, I wrote a few I think with more classic interview questions, let us in this classic problem is no longer back eight-part essays, You dive into the underlying principles and data structures. You’ll feel comfortable answering the question no matter which company asks it.

Today this article Tencent an interview question: Say MySQL in the underlying principles of index, no matter which company first, if Lao zhou is the interviewer, MySQL, I probably will use this way to see, because it can examine if you understand the nature of MySQL, index optimization, and now many people easily index the underlying is what I believe that there are quite many people are not particularly understand.

It doesn’t matter, follow Lao Zhou’s lead, this one we completely get MySQL index.

What is the index?

An index is a structure that sorts the values of one or more columns in a database table. Using an index can speed up the query of specific data in a database.

The purpose of an index is to improve query efficiency, just like a table of contents of a book. With a 1000-page book, if you want to find a single point of knowledge quickly, is that too hard to do without the help of a table of contents? Similarly, for a database table, the index is its “table of contents.”

Classification of indexes

Common index types are: primary key index, unique index, normal index, full text index, composite index

3.1 Primary Key Index

That is, the primary index is created based on the primary key pk_clolum (length). No duplicates and no null values are allowed.

ALTER TABLE 'table_name' ADD PRIMARY KEY pk_index('col');Copy the code

3.2 Unique Index

The value of the column used to build the index must be unique, allowing null values.

ALTER TABLE 'table_name' ADD UNIQUE index_name('col');Copy the code

3.3 Common Indexes

The basic index type in MySQL that allows the insertion of duplicate and null values in the column where the index is defined.

ALTER TABLE 'table_name' ADD INDEX index_name('col');Copy the code

3.4 Full-text Index

Full-text index type FULLTEXT, built with columns of large text objects, supports full-text lookup of values on the columns that define the index, allowing the insertion of duplicate values and null values in the columns of these indexes. Full-text indexes can be created on columns of type CHAR, VARCHAR, or TEXT. Only the MyISAM storage engine in MySQL supports full-text indexing.

ALTER TABLE 'table_name' ADD FULLTEXT INDEX ft_index('col');Copy the code

3.5 Composite Indexes

An index created on a combination of fields in a table is used only if the left field of those fields is used in a query condition. The left-most prefix rule is followed when using composite indexes, and values in multiple columns of composite indexes are not allowed to have empty values.

ALTER TABLE 'table_name' ADD INDEX index_name('col1','col2','col3');Copy the code

Note that it is important to follow the leftmost prefix rule!! The columns most commonly used for retrieval or sorting are placed at the left, descending in order. The combined index is equivalent to creating three indexes col1, COL1COL2, col1COL2COL3, while COL2 or COL3 cannot use the index.

Common models of indexes

Indexes appear to improve query efficiency, but there are many ways to achieve indexes, so the concept of index model is introduced here. There are many data structures that can be used to improve read and write efficiency. Here I will introduce you to a few common data structures, which are hash tables, ordered arrays and search trees. Search tree can be divided into binary search tree, balanced binary tree, B tree, B+ tree.

There are two ways to implement indexes in MySQL: BTREE and HASH, which depend on the storage engine of the table. MyISAM and InnoDB storage engines only support BTREE indexes, while MEMROY/HEAP storage engines support HASH and BTREE indexes.

The index model that Zhou said here is not only for MySQL ha, common several will be described, so that you have a more global understanding of the way to achieve index.

4.1 a hash table

Hash table is a kind of key-value storage structure. As long as we input the value to be searched (key), we can find its corresponding value (value). The idea of hashing is very simple, you put the value in an array, you use a hash function to convert the key to a certain location, and then you put the value in that location in the array.

When multiple key values are converted into hash functions, the same value may occur. One way to handle this is to pull out a linked list. Have you noticed that a HashMap is actually such a data structure?

Suppose you are maintaining a table of user ids and names, and need to find the corresponding name based on the user ID. The corresponding hash index would look like this:

User_id2 and user_id3 hash to 4. This is a hash conflict, but it doesn’t matter. When you want to query user_ID2 name, hash first to get 4, iterate sequentially to find name2.

It’s important to note that the value of user_IDn is not incremented, so one of the nice things about this is that it’s very quick to add users, so you just append later; But there are disadvantages, because the hash index is not ordered, so the interval query is very slow.

Therefore, the hash table structure is suitable for scenarios where only equivalent queries are available, such as Memcached and other NoSQL engines.

4.2 Ordered Array

The hash table above is not suitable for range queries, and ordered arrays perform very well in both equivalent and range query scenarios.So let’s look at the time of ordered arrays for the equivalent query and the time of range query, for example if you want to look up the name of user_id2, you can get it quickly using dichotomy, and this time is order log of N. For example, if you want to query between user_id2 and user_id5, use binary lookup to find user_id2 (if no user_id2 exists, then find the first user greater than user_id2), and iterate right. Exit the loop until the first one is greater than user_id5.

The query scenario is great, but ordered arrays also have the disadvantage of being a bit cumbersome when updating data. For example, if you insert a user in the middle, all the records after that user move backwards, which is expensive.

So, ordered array indexes are only good for static storage engines, like if you want to store all the population information of a city in a given year in history, historical data that can’t be modified.

4.3 binary tree

A binary tree is a tree in which no node can have more than two sons. Binary tree is also the most classical tree structure, the following trees are based on the improvement of binary tree.

Let’s first look at the characteristics of binary trees:

The average time complexity of binary trees is order log(N).
Each node can have a maximum of two child nodes

General binary tree:

Worst-case binomial tree, chain case:Implementation:

Since a binary tree node can have at most two child nodes, we can save links directly to them. The declaration of a tree node is structurally similar to the declaration of a double-linked list, in that the node is a structure consisting of information about an element plus two references (left and right) to other nodes. As follows:

public class BinaryNode {
    Object element; // The data in the mode
    BinaryNode left; // Left child
    BinaryNode right; // Right child
}
Copy the code

There are many online said that the common index data structure is binary tree, in fact, not rigorous enough, the application of binary tree in the index is mainly binary search tree and the subsequent evolution of balanced binary tree, B tree, B+ tree, etc.. Binary trees have many important applications that have nothing to do with search, such as compiler design.

4.4 Binary search tree

Search tree ADT (Abstract Data Type), also known as binary search tree, is an application of Data structure used in index.

Let’s take a look at the characteristics of binary search trees:

The average time complexity of binary search tree is order log(N).
Each node can have a maximum of two child nodes
The value of the left child node is less than the value of this node, and the value of the right child node is greater than the value of this node.

Let’s take a look at the following structure:

Because of the binary search tree, we know very quickly that the left-hand graph is a binary search tree; The middle tree is not, because a node of 7 appears to the left of the root node of 6, which does not meet the characteristics of the third point of a binary tree. The tree on the right is a chained binary search tree.

Implementation:

A binary lookup tree requires all items to be sorted. Using the Java language as an example, we can implement the Comparable interface to achieve this sort property. Two items in the tree can always be compared using the compareTo method. The structure is as follows:

public class BinarySearchTree<T extends Comparable<? super T>> {
    private static class BinaryNode<T> {
        T element; // The data in the mode
        BinaryNode<T> left; // Left child
        BinaryNode<T> right; // Right child

        BinaryNode(T t) {
            this(t, null.null);
        }

        BinaryNode(T t, BinaryNode<T> lt, BinaryNode<T> rt) {
            this.element = t;
            this.left = lt;
            this.right = rt; }}// ...
}
Copy the code

4.5 Balanced binary search tree

Balanced binary search Tree is also called AVL Tree (Adelson-Velsky-Landis Tree), the emergence of balanced binary search Tree is to solve the above binary search Tree may appear chain scene.

Let’s first look at the characteristics of balanced binary search tree:

The average time complexity of binary search tree is order log(N).
Each node can have a maximum of two child nodes
The height difference between the left and right subtrees of each node is no more than 1

Let’s take a look at the following structure:

The AVL tree on the left has a height difference of <=1 between the two subtrees of any node; The one on the right is not an AVL tree, because the height difference between the two subtrees of 7 is 2 (root 2 is 3, root 8 is 1).

Implementation:

public class AVLTree<T extends Comparable<? super T>> {
    private AVLTreeNode<T> rootNode; // root node

    private static class AVLTreeNode<T> {
        T element; // The data in the mode
        BinaryNode<T> left; // Left child
        BinaryNode<T> right; // Right child
        int height; // Height

        AVLTreeNode(T t) {
            this(t, null.null);
        }

        AVLTreeNode(T t, BinaryNode<T> lt, BinaryNode<T> rt) {
            this.element = t;
            this.left = lt;
            this.right = rt;
            this.height = 0; }}// ...
}
Copy the code

A balanced binary search tree can be rotated so that the height difference between the left and right subtrees of each node is no more than 1, which avoids the linkage problem. The DISK I/O size is determined by the tree height. The more data there is, the more I/O times there are, and the slower the I/O times are.

4.6 BTree

The above balanced binary search tree may be because when the data volume is relatively large, the tree height will be high, and the IO times will be more, which will affect the performance. Therefore, B tree is to solve the problem of large data can also make I/O few trees.

BTree is a balanced search multi-fork tree, let the degree of the tree be 2d (d>1), and the height be h, then BTree must satisfy the following conditions:

Each leaf node has the same height, equal to h.
Each non-leaf node is composed of N-1 keys and n pointer points, where D <=n<=2d, key and point are separated from each other, and both ends of the node must be keys.
Leaf node Pointers are null
Non-leaf nodes are binary groups [key, data], where key represents the key as the index and data is the data in the row where the key value resides.

The BTree structure is as follows:

BTree can use binary search, search complexity is H * O(log(N)), generally speaking, the height of the tree is very small, generally about 3, so BTree is a very efficient search structure.

4.7 B + Tree

B+Tree is a variant of BTree. Let d be the degree of the Tree and H be the height of the Tree. The differences between B+Tree and BTree mainly lie in:

Non-leaf nodes in a B+Tree do not store data, only key values.
The leaf node of B+Tree has no pointer. All key values appear on the leaf node, and the key value stored by key corresponds to the physical address of data data.
Each non-leaf node of B+Tree consists of n key values and N pointer points

The structure of B+Tree is as follows:

4.8 B+Tree with Sequential Access Pointers

Generally, the B+Tree structure used in database system or file system is optimized on the basis of classical B+Tree, adding sequential access Pointers.

As shown in the figure above, a B+Tree with sequential access Pointers is formed by adding a pointer to each leaf node of the B+Tree that points to adjacent leaf nodes. The purpose of this optimization is to improve the performance of interval access. If you want to query all data records with keys from 18 to 49, after finding 18, you can access all data nodes at one time by traversing the sequence of nodes and Pointers, which greatly improves the interval query efficiency.

MySQL > select * from B+Tree

Generally, indexes themselves are too large to be stored in memory, so they are often stored on disk as index files. Therefore, the disk I/O consumption of index lookups is several orders of magnitude higher than memory access, so the most important indicator of a data structure as an index is the progressive complexity of disk I/O operations during lookups. In other words, indexes should be structured to minimize the amount of disk I/O needed to access a lookup.

5.1 Disk I/O and Prefetch

Mentioned above to access the disk, then here first a brief introduction of disk IO and pre-reading, disk read data by mechanical movement, every time the time spent reading data can be divided into seek time, rotational delay, transmission time three sections, seek time refers to the magnetic arm moved to specify the time required to track, the mainstream disk generally under 5 ms; Rotation delay is the disk speed we often hear, for example, a disk 7200 revolution, means that it can rotate 7200 times per minute, that is, one second can rotate 120 times, the rotation delay is 1/120/2 = 4.17ms; Transfer time refers to the time it takes to read or write data from or to the disk, usually in the order of a few tenths of a millisecond and negligible compared to the first two. So the time to access a disk, namely the time to access a disk IO, is about 5+4.17 = 9ms, which doesn’t sound bad, but considering that a 500-mips machine can execute 500 million instructions per second, because instructions rely on electrical properties, In other words, one IO can execute 400,000 instructions, and the database is used for hundreds of millions or even tens of millions of levels of data. The time of 9 milliseconds each time is obviously a disaster. The following is a comparison of computer hardware delay for your reference:Considering that disk I/O is a very expensive operation, computer operating systems have been optimized to read not only the current disk address data, but also adjacent data into the memory buffer during an I/O, because the principle of local prefetch tells us that when a computer accesses the data of an address, Adjacent data will also be accessed quickly. Each IO read is called a page. The specific data size of a page depends on the operating system, and it is usually 4K or 8K. In other words, when we read the data in a page, we actually only have one IO. This theory is very helpful for the data structure design of the index.

5.2 Performance Analysis of B-/+ Tree Indexes

As mentioned above, disk I/O counts are generally used to evaluate index structures. Based on the definition of the B-tree, a maximum of H nodes need to be accessed in a search. The designers of the database system took advantage of disk prefetch by setting the size of a node equal to one page so that each node could be fully loaded with only one I/O. In order to achieve this, the following techniques are also needed to implement the B-tree in practice:

Each time a new node is created, it requests space for a page directly. This ensures that a node is physically stored on a page, and computer storage allocation is page-aligned, so that a node needs only one I/O.

In b-tree, a search requires a maximum of H-1 I/ OS (resident memory of the root node). The progressive complexity is O(h)=O(logdN)O(h)=O(logdN). In general practice, the degree D is a very large number, usually more than 100, so h is very small (usually no more than 3). (H represents the height of the tree & d represents the degree of the tree, that is, the maximum degree of each node in the tree)

In summary, using B-tree as index structure is very efficient.

In red-black trees, the H is significantly deeper. Because nodes (parent and child) that are logically close may be physically far away and cannot take advantage of locality, the I/O progressive complexity of red-black Tree is also O(h), and the efficiency of red-black Tree is significantly lower than that of B-tree.

B+Tree is more suitable for storing external indexes. From the above analysis, it can be seen that the larger d is, the better the index performance is, and the upper limit of the outgo degree depends on the size of key and data in the node:

dmax=floor(pagesize/(keysize+datasize+pointsize))
Copy the code

Floor means round down. Since the data field is removed from the nodes in B+Tree, the nodes can have a larger outage and better performance.

MySQL storage engine

MyISAM and InnoDB storage engine are both B+Tree index structures, but there are some differences in their implementation.

6.1 MyISAM

MyISAM engine uses B+Tree as the index structure, and the data field of the leaf node stores the address of the data record. Here is the MyISAM index schematic:

Select Col1 as Primary key from MyISAM; select Col1 as Primary key from MyISAM; You can see that MyISAM’s index file only holds the address of the data record. In MyISAM, there is no difference in structure between primary and Secondary keys, except that the primary index requires a unique key, while Secondary keys can be repeated. If we create a secondary index on Col2, the structure of this index is as follows:

Also a B+ tree, the data field holds the address of the data record. Therefore, the index retrieval algorithm in MyISAM is to search the index according to THE B+Tree search algorithm. If the specified Key exists, the value of its data field is extracted, and then the corresponding data record is read with the value of the data field as the address.

MyISAM’s index is also called “non-clustered” to distinguish it from InnoDB’s clustered index.

6.2 the InnoDB

InnoDB also uses B+Tree as an index structure, but the implementation is quite different from MyISAM.

The first big difference is that InnoDB’s data files are themselves index files. MyISAM index files and data files are separate, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+Tree, and the data field of the leaf node of this Tree holds complete data records. The key of this index is the primary key of the table, so the InnoDB table data file itself is the primary index.

Above is a diagram of the main index (which is also a data file) of InnoDB. You can see that the leaf node contains the complete data record. This kind of index is called a clustered index. InnoDB requires tables to have primary keys (MyISAM does not have primary keys) because InnoDB data files themselves are aggregated by primary keys. If this is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key. MySQL automatically generates an implied field for the InnoDB table as the primary key. This field is 6 bytes long and of type long integer.

The second difference from MyISAM index is that InnoDB’s secondary index data field stores the value of the primary key of the corresponding record rather than the address. In other words, all InnoDB secondary indexes refer to the primary key as the data field. For example, here is a secondary index defined on Col3:

The ASCII code of English characters is used as the comparison criterion. Clustered index implementations make searching by primary key very efficient, but secondary index searches require retrieving the index twice: first, retrieving the secondary index for the primary key, and then using the primary key to retrieve records from the primary index.

6.3 the problem

InnoDB stores data only on the primary key leaf node of the index tree, but does not store data in other index trees.

It’s actually very simple, because InnoDB needs to save storage space. There may be many indexes in a table, and InnoDB will create index trees for each indexed field. If the index tree for each field stores specific data, the index data files for the table will become very large (extremely redundant). In terms of saving disk space, it’s not really necessary to store specific data in every field index tree, and it’s worth saving a lot of disk space at the expense of less query performance by taking this seemingly “redundant” step.

6.4 Comparison between MyISAM and InnoDB

InnoDB supports transactions, MyISAM does not. This is one of the big reasons MySQL changed its default storage engine from MyISAM to InnoDB.
InnoDB supports foreign keys, while MyISAM does not. Converting an InnoDB table with foreign keys to MYISAM will fail.
InnoDB is a clustered index, MyISAM is a non-clustered index. The files in the clustered index are stored on the leaf node of the primary key index, so InnoDB must have a primary key, which is very efficient. But secondary indexes require two queries, first to the primary key and then to the data through the primary key. Therefore, the primary key should not be too large, because if the primary key is too large, the other indexes will be too large. While MyISAM is a non-clustered index, data files are separated and indexes hold Pointers to data files. Primary and secondary indexes are separate.
InnoDB does not store the exact number of rows in a table. Select count(*) from table requires a full table scan. MyISAM uses a variable to hold the number of rows in the entire table, which can be read quickly.
InnoDB’s minimum lock size is row lock, MyISAM’s minimum lock size is table lock. An update statement locks the entire table, blocking all other queries and updates, and thus limiting concurrent access. This is one of the main reasons MySQL changed its default storage engine from MyISAM to InnoDB.