This is the 23rd day of my participation in Gwen Challenge
Search “Program Yuan Xiaozhuang” on wechat, and you will get a full set of Python full stack tutorials, as well as the constantly updated e-books and interview materials prepared by Xiaozhuang
preface
In the table query operation we mentioned the concept of an index, this article will introduce the index implementation principle and a simple introduction to B+ tree. The MySQL index is often asked in interviews
The index profile
An index is like a table of books. It can help us locate the data we need quickly. It can optimize the query function.
Introduction of B + tree
Tree graph is a kind of data structure, which is composed of N (N >=1) finite nodes to form a set with hierarchical relationship. It’s called a tree because it looks like an upside-down tree, meaning it has roots up and leaves down. Each node has zero or more children; Nodes that have no parents are called root nodes; Each non-root node has one and only one parent; In addition to the root, each child can be divided into multiple disjoint subtrees.
B+ tree is a binary search tree, and then evolved from balanced binary tree, B tree, can control the height of the tree to control the I/O times of searching data. The number of I/OS depends on the level of the B + number.
Index implementation Principle
In the database of B + tree height, be in commonly 2 to 4 layer is to find a record only need at most 2 to 4 IO operations, database B + tree index is divided into clustering index and auxiliary index, aggregation index and auxiliary index, its internal is the form of a B + tree, which is balanced, different is the leaf node to store data.
Clustering index
The building premise of cluster index
MySQL’s InnoDB engine uses the primary key column as the clustered index column when creating a table, similar to the title of a book catalog. If no primary key column is specified when creating a table, innoDB automatically selects a unique column as the clustered index column. If neither primary key column is specified, innoDB will generate a hidden clustered index.
The role of clustering indexes
With clustered index, the inserted rows in the same area will be stored in disk in the order of the primary key value, which provides conditions for sequential IO. When primary key column is used as the query condition, it can help us quickly find data. Theoretically, when B+ tree is three layers, It takes only 3 IO to lock data pages in leaf nodes based on primary key columns.
Clustering index building process
Leaf node: The data page (page, 16KB) where the data rows are located. Since the clustering index stores the data in order, the data page of the leaf node is generated directly, and the data of the leaf node is the complete data of each record.
Branch node: the range of primary key values is extracted as the branch node and the range of leaf nodes is saved.
Root node: Generates root nodes based on branch nodes and saves the range of branch nodes.
Secondary index
The role of secondary indexes
A secondary index is an index that uses ordinary columns and needs to be artificially created, similar to a subheading under a big heading in a book catalog. The secondary index is used to optimize queries that use columns other than the non-clustered index column as conditions. But the disadvantage of secondary index is when using auxiliary index query data due to secondary index leaf node is only part of the data, sometimes need to be back to the table, operation is also need through the auxiliary index of the query results to complete data is obtained by clustering index line, it is important to note that create secondary index, to insert the data in the table speed will slow down, Each index requires disk space, and the more indexes, the more disk space is required.
Secondary index building process
Leaf node: take out the cluster index column and the secondary index column, sort according to the secondary index column, and generate the data page of the leaf node, which is partial data and only contains the data of the secondary index and the cluster index.
Branch node: The range of the secondary index column is extracted as a branch node to save the range of the leaf node.
Root node: generates the root node according to the branch node, points to the data page of the branch node, and saves the scope of the lower branch node.
The type of secondary index
Secondary index is divided into single column index and joint index, here is a detailed introduction to the joint index. A joint index is a secondary index composed of multiple non-primary key columns. For example, to create a joint index of (A, B, c), it is equivalent to creating a, AB, and ABC indexes.
Using combined index need to follow the principle of the most left, in the generated nodes, only the following as a node, therefore must include the following query conditions, thus establish joint index must choose duplicate values when the least listed as the following, and as long as the query conditions when the query contains most of the following would go joint index. SQL > select * from associative index;
-- syndication index full coverage:
select * from t1 where a= b= c= / a in b in c in /b= a= c=
-- Partial coverage:
select * from t1 where a= b= / a= / a= c=
select * from t1 where a= b< = > = like and c=(select ab index only)-- Not covered:
select * from t1 where b= c=
Copy the code
Back to the table operation
Table rows in MySQL are eventually stored on many pages. Innodb storage engine organizes table rows into contiguous pages of each section according to clustered index. These contiguous pages become the leaf nodes of clustered index, i.e. the leaf nodes of clustered index store the original table data. So a back table operation is a back cluster index.
Constructing auxiliary index, is the primary key and auxiliary index column value according to the auxiliary index column sorting building auxiliary index B tree structure, when using auxiliary index as a condition of the query, will first scan auxiliary index of B + tree, if secondary index can completely cover our query results back to the table don’t need for operation, if you can’t completely cover, The primary key value obtained can only be scanned back to the cluster index (back table) to obtain the desired result.
The index operation
With all this said, how do you view and create indexes? Look at the following SQL:
Select * from index;
descTable name;-- PRI Cluster index MUL auxiliary index UNI unique index
show index fromThe name of the table.-- More detailed index information
Create index ();
alter tableThe name of the tableaddIndex idx_na(column name);-- Single-column index,idx_na is just the name of the index
alter tableThe name of the tableaddThe index idx_n_c_ (column name1And the column name2); -- Syndication index
alter tableThe name of the tableaddIndex idx_d(column name (5)); -- Prefix index built from the first five characters
Drop index
alter tableThe name of the tabledrop index idx_na;
Copy the code
Index specification
In order to make the index more efficient, when creating the index, we must consider which fields to create the index and what type of index to create. Therefore, index design is very important in the development of the project. The following lists some indexes:
Create table must have a primary key column, generally an unrelated column;
SQL > create secondary index where, order by, group by, join on;
③ Limit the number of indexes. More indexes is not better.
④ You can use percona Toolkit to delete unused or rarely used indexes.
⑤ If an index is added to a table with a lot of data, it is recommended to perform the operation during off-peak hours.
⑥ Try not to create an index on a column whose value is frequently updated, which may cause index failure.
Not going to the index
Although sometimes create clustered index also create secondary index, but sometimes after the query speed is still very slow, the reason is very likely to be the query statement did not go index, the following xiaohuang also lists some do not go index of the case of the query, for your reference:
(1) There is no query condition, or the query condition is not indexed;
② The result set of the query exceeds 15-30% of the total number of rows in the table, and the optimizer feels that there is no need to go to the index, which is related to the pre-read ability and parameters of the database;
(3) Index failure, statistical data is not true, the index has the ability to maintain, for the table content changes more frequently, statistical information is not accurate, there may be index failure, generally delete reconstruction.
④ The query condition uses the function on the index column, or the index column to perform operations, including (+, -, *, /,! Etc.);
(5) Implicit conversion leads to index invalidation, which should be paid attention to. This is a common mistake in development. Check whether the data type is consistent with the data type defined in the table.
⑥< >, not in index (secondary index, separate > < in May go index may not go, try to combine the business to add limit,or and IN use different conditions to test, choose which specific scheme).
Mysql > select * from ‘%_’ where ‘%_’ = ‘%_’ where ‘%_’ = ‘%_
Execution plan analysis
When an SQL statement is executed, the parser generates the execution plan and the optimizer selects the execution plan. Can the developer see the execution plan of an SQL statement? The answer is yes, through execution plans, and by analyzing execution plans, developers can optimize SQL statements as needed.
The execution plan is the final execution plan obtained by the optimizer according to the built-in cost calculation algorithm. The execution plan can be viewed through the following SQL statement:
-- Either way
desc sqlStatements; explainsqlStatements;+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | SIMPLE | city | NULL | ALL | NULL | NULL | NULL | NULL | 4046 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
Copy the code
The developer can then analyze the results according to the above execution plan. How to analyze the results depends on the meaning of each field
Table: table involved in the execution plan. For multi-table queries, it helps to determine the slow table for optimization.
Type: indicates the query type, including full table scan and index scan.
Mysql > select * from '%&'; mysql > select * from '%&'; mysql > select * from '%&'; Index <range<ref<eq_ref<const(system) index<range<ref<eq_ref<const(system) index<range<ref<eq_ref<const(system) index<range >= <= like in or between and For id queries not in! Eq_ref: in a multi-table join, the join condition of an undriven table is primary key or unique key const(system): cluster index equivalence queryCopy the code
Possible_keys: Possible_keys: All possible indexes, including all indexes related to this query
Key: indicates the last selected index
If there are more than one condition clause in the query, use the joint index. Single-column index does not take effect. select * from city where countrycode='CHN' order by population; Index (Countrycode, population) does not work Copy the code
Key_len: the coverage length of the federated index. It evaluates the application length of the federated index. For federated indexes such as index(a, B, c), key_len can help determine how many parts of the federated index the query missed.
Key_len Calculation rule: in the case of full coverage: key_len = A length + B length + C length The length depends on the data type and character set, indicating the maximum value stored in a column. There is no constraint. Not NULL, a single byte is required to store whether the column value is non-empty
Stored value length of numeric column:
Number type/Non-null (not NULL) is no tinyint 1 1 + 1 int 4 4 + 1 bigint 8 8 + 1 For example, a utF8 character set contains a maximum of 3 bytes. Varchar requires 1-2 bytes to store the length of bytes
Character type/Non-null (not null) is no char(10) 3 * 10 3 * 10 + 1 varchar(10) 3 * 10 + 2 3 * 10 + 2 + 1
Rows: Rows to be scanned for this query
-Blair: Extra information
Using filesort: indicates that file sorting is used in this query, indicating that indexes are not used in the sorting operation
conclusion
The article was first published on the wechat public account Program Yuan Xiaozhuang, and synchronized with nuggets and Zhihu.
The code word is not easy, reprint please explain the source, pass by the little friends of the lovely little finger point like and then go (╹▽╹)