Tips: It’s long and not very well formatted

Comrades who think it is worth reading it, I suggest that they continue reading it when they are free

Thank you to Mr. Qingshan of Gupo College for literacy

  • version

    At present, the commonly used version of enterprise development: 5.5, 5.6, 5.7

    5.5 released in 2010, Innodb became mysql’s default storage engine from this release, with commit/rollback, ACID compatibility, row-level locking, etc

    5.6, released in 2012, began support for full-text indexing and optimization for EXPLAIN statements (support for INSERT UPDATE delete statements)

    5.7, released in 2015, is optimized for performance and ease of use

    Mysql 8.0 is now officially launched as MySql8.0, which is 2 times faster than 5.7, with a number of improvements and performance optimizations

  • InnoDB indexing principle

    What is a database index? A database index is a sorted data structure in a database management system to assist in rapid query and update of data in a database table

The indexStore index field 并 Indicates the location of the service data on the disk

= = = = = = = = = = = = = = = = = = = = Innodb storage structure index beginning = = = = = = = = = = = = = = = = = = =

Guess: Querying data from a table without an index requires a row scan of the table

What kind of storage structure is used at the bottom of the index to achieve fast query?

===> Binary Search Tree

2. Binary search tree requires that the nodes of the left subtree be smaller than the parent node, and the nodes of the right subtree be larger than the parent node 3. The data inserted and queried by binary search tree is faster than the convenience of row by row. However, there is a problem that the efficiency of search is related to depth. In extreme cases (inclined tree, only left node or right node), the complexity will degenerate to O (n), as shown in the figure

Conclusion: Binary search tree has serious defects and is not suitable for storage structure of index

So is there a tree that’s less of a depth difference between the left and right subtrees, that’s more balanced?

Balanced Binary Search Trees (AVL Tree)

1. Balanced binary tree On the basis of binary search tree, the depth absolute value of the left and right subtrees cannot exceed 1 2. Balance is achieved by turning left or right

If AVL balanced binary tree is used to store the index, each tree node stores one row of the index, as shown in the figure (right subtree error, ignore).

Index data is also stored in disks. If you want to search for service data based on an index, you must load the index into the memory first and then search the index in the memory for data comparison.

At the operating system level, loading data from disk to memory follows the principle of locality, which states that when a CPU accesses memory, both instructions and data, the accessed storage units tend to be clustered in a small contiguity area.Copy the code

Loading data from disk to memory takes time. To minimize I/O operations, the operating system generally adopts the prefetch mode. The ** prefetch length is usually pages (page, 4kb; Pages in Innodb storage engine are integer multiples of 16KB) **.

In the figure above, each loading of index data in a node of AVL tree into memory takes IO, but the key value + data disk address + child node reference is almost impossible to be 16KB, which is a waste of space. Moreover, in the era of mechanical hard disks, each disk IO takes about 10ms to address. The more AVL tree nodes are accessed, the more time consumed, and the efficiency of millions of data will be unpredictably low.

Conclusion: If we use AVL balanced binary tree to store index efficiency is not good enough, I/O times increase with tree depth, and tree nodes waste space.

To solve the problem of AVL balanced binary tree, is there a tree whose depth is not so deep that each tree node can store more index data?

Balanced Tree ===> Balanced Tree

1. The contents of nodes in B tree are the same as those in AVL tree, but one node in B tree stores multiple [key values + data disk addresses + child node references]. 2. The number of subtrees possessed by nodes is called Degree. Assuming that there are N key values in each node, Degree = N + 1 3.AVL ensures balance by left-rotation/right-rotation, while B-tree maintains balance by splitting/merging 4. With the same amount of data, the depth of B tree is far less than that of AVL tree, and nodes with B number store more index dataCopy the code

So far, if the storage structure of index B is used as the index, it is quite necessary. The tree depth is not particularly large, which means the IO times are reduced, and each node does not waste space by storing only one index data like AVL tree. It may only need to interact with disk for three or two times to find the data to be sought. MySql uses B as its index structure.

In fact, Mysql, including many file systems and other database systems (oracle,sqlserver, etc.) use an evolved version of B Tree: B+Tree

===> (B+Tree)

The degree of B+ tree (number of subtrees) is equal to the number of keys stored by nodes. 2. A node of B+ tree is a page in Innodb engine =16K. This means that all data lookups must take the same number of IO to reach the leaf node before they can be retrieved. At the same time, non-leaf nodes can store more index data ([key value + child node reference]). 3.B+ tree leaf nodes add Pointers to adjacent leaf nodes, forming an ordered linked list structure, improving the efficiency of full-table sequential scanning or range query (compared with range query, Instead of searching down from the root node each time, this is done through an ordered linked list of leaf nodes.)Copy the code

B+Tree compared with B Tree:

  • B+Tree can solve any problem that B Tree can solve
  • B+Tree Enables the library to scan tables
  • B+Tree The disk read and write capability is higher
  • B+Tree has better sorting capability
  • B+Tree efficiency is more stable

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Assume that the size of a record is 1KB and the primary key bigint is 8bytes

In Innodb, Pointers (Pointers to subtrees) have 6bytes, so each index unit :[key + pointer = 14bytes]

With the page size of 16KB in Innodb storage engine, non-leaf nodes can store index cells (Number of node keys)= (16 * 1024) / 14 = 16384/14 = 1170

Degree (number of subtrees)= number of node keys = 1170

, if the tree depth is 2, the retrieval of 2000W data can be completed, as shown in the figure:

Innodb data storage

2. Ibd stores data and indexes (in Innodb, indexes are data and data are indexes). All data is stored in the leaf node of the tree with primary key index B+, which organizes data storage by primary key indexCopy the code

B+ tree leaves store business data in Innodb, while in MyIsam they store business data addresses

Clustered index (clustered index)

1. The logical order of index keys and values is consistent with the physical storage order of complete data. This kind of index is clustered index 2. In Innodb, primary key indexes are clustered indexes, and all other indexes are non-clustered indexesCopy the code

Secondary index (secondary index)

1. The leaf node of the secondary index stores the key value of the primary key index. 2.Copy the code

The dispersion of the columns

Column_name :count(distinct(column_name)):count(*)Copy the code

Fields with low discretization (duplicates to too many columns, such as gender) do not need to be indexed, even though the Optimizer ** may decide not to use index retrieval

Union index – leftmost matching principle

alter tabel t add INDEX `comidx_a_b_c` (a,b,c);
Copy the code

where a=? .

where a=? and b=?

where a=? and b=? and c=?

where c=? and a=? and b=? (acb, cba,The Optimizer optimizes SQL)

You can use a federated index in all of these cases

The first brother cannot die, the middle brother cannot be broken

Cover index

When the column of the query is contained in the used index, it is directly obtained from the index, and does not need to return to the table, which improves efficiency. This is called overwriting index

When the EXPLAIN statement is used to query the SQL statement execution plan, if “Using Index” appears in the extra field in the explain statement execution result, the SQL statement overwrote the Index

Index basic usage precautions

Index column length should not be too long. The more the index is, the better. Too many indexes will affect the performance of the updated table (new, delete, update). Index Condition Pushdown (ICP) : NOT Like,! =, <>, NOT IN, index may fail try to specify the query field, use select *Copy the code

Whether the index is used is ultimately determined by the Optimizer. In practice, it is recommended to check and optimize SQL statements by combining the query plan of the Explain statement to view SQL statements

= = = = = = = = = = = = = = = = = = = Innodb storage structure * * * * index finished = = = = = = = = = = = = = = = = = = =