What is the index?

A database index is a sorted data structure in a database management system to assist in quick queries to update the data of tables in a database. Indexes are usually implemented using B trees and variants of B+ trees (mysql’s common index is B+ trees).

The benefits of creating indexes

(1) You can create indexes to improve system performance during query

② By creating unique indexes, the uniqueness of each row in a database table can be guaranteed

③ When grouping and sorting clauses are used for data retrieval, the time of grouping and sorting in query can be reduced

Disadvantages of creating indexes

① Creating and maintaining indexes takes time, and the time increases with the amount of data

(2) Indexes need to occupy physical space. If clustering indexes are to be established, more space is needed

③ It takes time to add, delete, and modify data in a table because indexes are maintained dynamically

Which columns should be indexed on

① On columns that need to be searched frequently

② on the column that is the primary key

③ Often used in connected columns, these columns are mainly some foreign keys, can speed up the connection

④ columns that often need to be searched by scope

⑤ columns that often need to be sorted

⑥ is often used in the column above the WHERE clause

Which columns should not be indexed on

Select columns that are rarely used in the query

② For columns with few data values. For example, the gender column of the personnel table, the bit data type column

③ For columns defined as text,image. Because of the large amount of data in these columns ④ when the modification performance is much greater than the search performance. Because when indexes are added, search performance is improved, but modification performance is reduced

Hashes are faster than trees, so why should the index structure be a tree?

For hashes, such as HashMap, the average time complexity of query/insert/modify/delete is O(1); Trees, such as balanced binary search trees, the average time complexity of query/insert/modify/delete is O(lg(n)); A hash index is faster than a tree index, so why is the index structure designed to be a tree? It is true that hashed indexes are faster for SQL requests with single-row queries, but for grouped, sorted, and compared SQL queries, the time complexity of hashed indexes degrades to O(n), while the “ordered” nature of trees can still maintain O(log(n)) efficiency. Why do database indexes use B+ trees?

Binary tree

When the data volume is large, the height of the tree is high, but the query is slow when the data volume is large. Each node stores a record, which may result in many disk IO for a query;

B tree

Is no longer characterized by binary search, but m-fork search, leaf nodes and non-leaf nodes store data. B-trees were created as data structures to implement indexes because they exploit the “locality principle” perfectly. What is the locality principle? The logic of locality principle is as follows :(1) memory read and write blocks, disk read and write slowly, and much slower; (2) Disk prefetch: Disk read and write does not read on demand, but prefetch on page. Data is read one page at a time and more data is loaded each time. If the data to be read in the future is in this page, disk I/O can be avoided and efficiency can be improved. Voiceover: Typically, a page of data is 4K for an operating system and 16K for MySQL. (3) The principle of locality: software design should try to follow the “data read concentration” and “use a data, the probability of using its nearby data”, so that disk prefetch can fully improve disk IO;

Why are B-trees good for indexing?

(1) Due to the bifurcation of M, the height can be greatly reduced; (2) Each node can store J records. If the node size is set to page size, such as 4K, the prefetch feature can be fully utilized to greatly reduce disk IO;

B + tree

(1) Non-leaf nodes no longer store data, and data is only stored on leaf nodes of the same layer; Voice-over: In a B+ tree, the path from the root to each node is the same length, which is not the case in a B tree. (2) Between leaves, a linked list is added to obtain all nodes, and middle-order traversal is no longer needed; These improvements make B+ trees have better characteristics than B trees :(1) range search, after locating min and Max, the middle leaf node is the result set, without middle order backtracking; Voice-over: Range queries are used a lot in SQL, and this is the biggest advantage of B+ trees over B trees. (2) The leaf node stores the actual record line, which is relatively close and suitable for disk storage with large data volume; Non-leaf node storage record PK, used for query acceleration, suitable for memory storage; (3) If the non-leaf node does not store the actual record, but only stores the KEY of the record, then the B+ tree can store more indexes with the same memory;