An Index is a data structure that helps MySQL obtain data efficiently. Database query is one of the most important functions of the database, we all hope that the speed of the query data can be as fast as possible, so the designer of the database system will optimize from the point of view of the query algorithm, this article to do a systematic combing of the index, I hope to help you.
Index classification can be carried out from many angles, the following are divided into three dimensions respectively from data structure, physical storage and business logic.
B+ tree index (O(log(n))) B+ tree index (O(log(n)
(2) Hash index can only meet the requirements of “=”,”IN” and “<=>” query, and cannot use range query. Its retrieval efficiency is very high, and the index can be located at one time, unlike b-tree index, which needs to access the page node from the root node to the branch node for many IO access. The FULLTEXT index is now supported by MyISAM and InnoDB engines
(4) R-tree index is used to create SPATIAL index for GIS data types
(1) Body content is sorted and stored according to a specific dimension, which is clustered index. The Innodb storage engine stores rows in the order of clustered index dimensions. Innodb tables are also called index tables. Because row records can be sorted by only one dimension, a table can have only one clustered index. (2) Non-clustered index indexes are described by binary tree data structure. Clustered indexes can be understood as follows: the leaf nodes of the index are data nodes. A leaf node that is not a clustered index is still an index node but has a pointer to the corresponding data block. Non-clustered index Index items are stored sequentially, but the corresponding contents of index items are stored randomly. Here’s an example:
create table student ( id INT UNSIGNED AUTO_INCREMENT, name VARCHAR(255), PRIMARY KEY(id), KEY(name) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; In this table, the primary key ID is the clustered index of the table, and the primary key name is the non-clustered index. Each row in the table is stored sorted by clustered index ID; Name =’Arla’; name=’Arle’; name=’Arle’; The name index table node is sorted by name and retrieves the primary key for each row of data. The clustered index table, sorted by primary key ID, retrieves the true contents of each row of data.
A primary key index is a special type of unique index that does not allow empty values
(2) Common index or single-column index (3) Multi-column index (compound index) A compound index refers to an index created on multiple fields. The index is used only when the first field is used in the query condition. Follow the leftmost prefix set when using composite indexes
(4) Unique index or non-unique index (5) Spatial index A spatial index is an index built on fields of spatial data types, which are GEOMETRY, POINT, LINESTRING, and POLYGON in MYSQL.
MYSQL is extended with the SPATIAL keyword to enable the creation of SPATIAL indexes using the same syntax used to create regular index types. Columns that create a spatial index must be declared NOT NULL. A spatial index can only be created in a table whose storage engine is MYISAM.
Second, the index creation way CREATE TABLE table_name [col_name data type] [unique | fulltext | spatial] [index | key] index_name [asc | desc] Unique | fulltext | spatial for optional parameters, respectively, says that the index and full-text index and spatial index; Col_name is the column for which the index is to be created. This column must be selected from multiple defined columns in the data table. Index_name Specifies the name of the index. This parameter is optional. If this parameter is not specified, col_name is the index value by default. Length is an optional parameter, indicating the length of the index. The index length can be specified only for a string field. Create table table_name(id int(11), name vARCHar (20), sex Boolean, INDEX(id) ); View table structure
show create table table_name; You can make the EXPLAIN statement to see if the index is being used
Explain select * from table_name where id = 1\G (2) create table index2(id int unique, name varchar(20), unique INDEX index_2(id asc) ); A full-text index can only be created on a char, vARCHar, or text field. Also, only the MyISAM storage engine supports full-text indexing.
create table idnex3( id int, info varchar(20), FULLTEXT INDEX index3_info(info) )ENGINE=MyISAM; Create table index4(id int, Subject varchar(255), index index4_st(subject(10))); Note here that the length of the subject is 255, but the index index4_st is only 10. The purpose of this is to improve the speed of the query. For character data, you can query only the preceding characters instead of all the information.
Create table index5(id int, name varchar(20), sex char(4), index index5_ns(name.sex)); create table index5(id int, name varchar(20), sex char(4), index index5_ns(name.sex)); Here we can see that the index_NS index has been created on the name and sex fields.
Create an index named index7_id on ids in table example0().
create index index7_id on example0(id); Create UNIQUE index index_name on table_name(name); Create FULLTEXT index index_name on table_name(info); Create INDEX index_name ON table_name(name(10)); Create INDEX index_name ON table_name(name,sex); Alter table create index indx_name; alter table create index indx_name
alter table table_name ADD INDEX index_name(name(20)); Alter table table_name ADD UNIQUE INDEX index_name(id); Alter table table_name ADD FULLTEXT INDEX index_name(info); Alter table table_name ADD INDEX index_name(name(4)); Alter TABel table_name ADD INDEX index_name(name.sex); DROP INDEX index_name ON table_name; At present, most database systems and file systems use B-tree or its variant B+Tree as the index structure, so how to maintain the index Tree?
Lookup is a very important concept in data structures and algorithms.
1. To search one by one; Simple implementation; Slow binary search: ordered; Simple; Requirements to be ordered, insert very slow HASH lookup: fast query; Take up space; Not very good for storing large-scale data binary lookup trees: inserts and queries are fast (log(n)); Unable to store large-scale data, complexity degradation balanced tree: to solve the BST degradation problem, the tree is balanced; When the nodes are very many, the tree is still very high multi-path search tree: one father multiple children node (degree); B-tree a b-tree is a multi-path search Tree (not binary) :
Define any non-leaf node to have at most M sons; And M > 2; The number of sons of the root node is [2, M]; The number of sons of non-leaf nodes except root nodes is [M/2, M]. Each node stores at least M/2-1 (rounded) and at most M-1 keywords; (at least 2 keywords) Number of keywords of non-leaf node = number of Pointers to son -1; Keywords of non-leaf nodes: K[1], K[2]… [M, K – 1); And K[I] < K[I +1]; Pointers to non-leaf nodes: P[1], P[2]… P [M]; Where P[1] points to the subtree whose keyword is less than K[1], P[M] points to the subtree whose keyword is greater than K[m-1], and other P[I] points to the subtree whose keyword belongs to (K[i-1], K[I]). All leaf nodes are located in the same layer; Each k corresponds to a data. For example :(M=3) is equivalent to a 2-3 tree. A 2-3 tree is a tree where each node has either 2 children and 1 data element, or 3 children and 2 data elements, while a leaf node has no children and 1 or 2 data elements.
BTree_Search(node, key) { if(node == null) return null; foreach(node.key) { if(node.key[i] == key) return node.data[i]; if(node.key[i] > key) return BTree_Search(point[i]->node); } return BTree_Search(point[i+1]->node); } data = BTree_Search(root, my_key); (1) The characteristic keyword set of B-tree is distributed in the whole tree; Any keyword appears and only appears in one node; The search may end at a non-leaf node; Its search performance is equivalent to a binary search in the whole set of keywords. Automatic hierarchical control; (2) Self-control of B-tree Each internal node in b-tree will contain a certain number of key values. Typically, the number of key values is selected between D and 2D. In practice, key values take up most of the space in a node. Factor 2 guarantees that nodes can be split or combined. If an internal node has 2d keys, the process of adding a key to the node will split the 2d key with 2 D keys and add the key to the parent node. Each split node requires a minimum number of keys. Similarly, if an internal node and its neighbor both have D keys, a key will be deleted by merging it with the neighbor. Deleting this key results in the node having D-1 keys; The merge with the neighbor adds d keys, plus one from the parent of the neighbor node. The result is a fully populated 2D key value.
(3) The construction process of the B-tree the following is the sequence of insertion into the B-tree
1 7 8 8 6 3 6 21 5 15 15 6 32 23 45 65 7 8 65 4
The number of subtree Pointers and keywords of non-leaf nodes is the same. Subtree pointer P[I] of non-leaf nodes, pointing to the subtree whose keyword value belongs to [K[I], K[I +1]) (b-tree is the open interval); add a chain pointer to all leaf nodes; all keywords appear in leaf nodes; inner nodes do not store data, but only store keys as follows :(M=3)
(1) Characteristics of B+ : all keywords appear in the linked list of leaf nodes (dense index), and the keywords in the linked list are exactly ordered; Impossible to hit on non-leaf nodes; The non-leaf node is equivalent to the index of the leaf node (sparse index), and the leaf node is equivalent to the data layer storing (keyword) data. More suitable for file indexing systems; (2) The construction process of B+ tree is as follows: insert into B+ tree in sequence
1 7 8 8 6 3 6 21 5 15 15 6 32 23 45 65 7 8 65 4
Therefore, the disk I/O consumption of index lookups is several orders of magnitude higher than memory access, so the most important indicator of a data structure as an index is the progressive complexity of disk I/O operations during lookups. In other words, indexes should be structured to minimize the amount of disk I/O needed to access a lookup.
Suppose each disk block can store exactly one node of the B-tree (exactly two file names). A BTNODE node represents one disk block, and a subtree pointer is the address of another disk block.
(1) Simulate B+ tree search process, let’s simulate the search process of file 29:
(2) the query efficiency of B+-tree is more stable because the non-endpoint is not the node that ultimately points to the file content, but the index of the keyword in the leaf node. So any keyword lookup must take a path from root to leaf. The length of all keyword query paths is the same, resulting in the same query efficiency of each data.
Index query is an important record query method in the database, whether to enter the index and set up an index on those fields should be combined with the actual database system to consider the query requirements, the following gives some general principles in practice:
Index fields that are often used as filters; SQL > create index on GROUP BY, ORDER BY; It is unnecessary to build indexes on fields with fewer different values, such as gender fields; Avoid indexing for frequently accessed columns; Build indexes on the columns (primary/external) used for the join; Building compound indexes on frequently accessed columns, but it should be noted that the order of building compound indexes should be determined according to the frequency of use. Non-clustered indexes are created by default, but clustered indexes are best considered if they contain a finite number (not very many) of unique columns; Conduct a wide range of queries; Making full use of indexes can reduce the number of I/0 table scans and effectively avoid searching the entire table. Data columns commonly used in WHERE clauses; Order by, group by, distinct; If a composite index is created, the index fields must be in the same order as those following the keywords; otherwise, the index will not be used. For columns that are rarely involved in a query, do not index columns with a high number of duplicate values; Do not index columns of data types defined as text, image, and bit; Avoid indexing for frequently accessed columns; Limit the number of indexes on a table. For a table with a large number of update operations, the number of indexes should not exceed 3 and at most 5. Indexes improve access speed, but too many indexes can affect data update operations. For composite indexes, the index is built based on the frequency of field occurrences in the query condition. In a composite index, records are first sorted by the first field. For records with the same values in the first field, the system sorts the records by the values in the second field, and so on. Therefore, the composite index can be used only when the first field of the composite index appears in the query condition. Therefore, the system can maximize the use of the composite index and play the role of the index by placing the field with high application frequency in front of the composite index. A single index scan can only be used with conditions that use operators in the indexed field AND index operator class, AND those conditions are joined by AND.
If there is an index on (a, b), then conditions like WHERE a = 5 AND b = 6 can use the index, but conditions like WHERE a = 5 OR b = 6 cannot use the index directly.
A query like WHERE x =42 OR x = 47 OR x = 53 OR x = 99 can be decomposed into four independent scans on X, each scan using a condition, and finally the results of these scans OR are grouped together to produce the final result.
As another example, if we have separate indexes on x AND y, a query like WHERE x = 5 AND y = 6 can be decomposed into several clauses that use separate indexes, AND then the results can be combined to produce the final result.
If there is an OR in a condition, even if there is an index in the condition, it will not be used (this is why the or is rarely used). For multi-column indexes, not the first part (the first part), the index will not be used. If mysql estimates that a full table scan is faster than an index, do not use the index. If mysql estimates that a full table scan is faster than an index, do not use the index. An index on two or more columns is called a composite index.
For composite indexes: Mysql uses fields in the index from left to right. A query can use only a portion of the index, but only the leftmost portion. For example, the index is the key index (a,b,c). A, can support a | b | a, b, c three combinations are looking for, but does not support b, c to look up. An index is useful when the leftmost field is a constant reference.
So when creating a composite index, you should carefully consider the order of the columns. Composite indexes are useful when searching for all or only the first few columns in an index; A composite index is not useful when a search is performed only on any of the following columns.
How to check the use of the index here record two ways, respectively
Show status like ‘Handler_read%’; You may note:
Handler_read_key: The higher this value is, the better. The higher this value is, the number of times the index is queried. Handler_read_rnd_next: The higher this value is, That query inefficient + — — — — — — — — — — — — — — — — — — — — — — – + — — — — — — — — — — — — — — + | Variable_name Value | | + — — — — — — — — — — — — — — — — — — — — — — – + — — — — — — — — — — — — — — + | Handler_read_first | 153 | | Handler_read_key | 364 | | Handler_read_next | 425 | | Handler_read_prev | 598 | | Handler_read_rnd | 605 | | the Handler_read_rnd_next | 860571 | + — — — — — — — — — — — — — — — — — — — — — — – + — — — — — — — — — — — — — – + 6 rows in the set (0.00) SEC) ———————————————— By analyzing these values, we can view the current index usage:
Handler_read_first: the number of times the first item in the index is read. If it is high, it indicates that the server is performing a lot of full index scans; For example, SELECT col1 FROM foo, assuming col1 has an index (the lower the value, the better). Handler_read_key: If the index is working, this value represents the number of times a row is read by the index value, while a lower value indicates a modest performance improvement because the index is not used very often (a higher value is better). Handler_read_next: Number of requests to read the next line in key order. This value is increased if you query indexed columns with range constraints or if an index scan is performed. Handler_read_prev: Number of requests to read the previous row in key order. ORDER BY… ORDER BY… DESC. Handler_read_rnd: Number of requests to read a line based on a fixed position. This value is higher if you are executing a large number of queries and need to sort the results. You may be using a lot of queries that require MySQL to scan the entire table or your join is not using keys correctly. This high value implies inefficiency and should be remedied by indexing. Handler_read_rnd_next: Number of requests to read the next line in the data file. This value is higher if you are doing a lot of table scans. This usually means that your table index is incorrect or that a written query does not utilize the index. 2. Check the schemA_unused_INDEXES library for useless indexes in the SYS library.
root@localhost [sys]>select * from schema_unused_indexes; +——————-+————-+————+ | object_schema | object_name | index_name | +——————-+————-+————+ | sysbench_testdata | sbtest1 | k_1 | | sysbench_testdata | sbtest10 | k_10 | | sysbench_testdata | sbtest3 | k_3 | | sysbench_testdata | sbtest4 | k_4 | | sysbench_testdata | sbtest5 | k_5 | | sysbench_testdata | sbtest6 | k_6 | | sysbench_testdata | sbtest7 | k_7 | | sysbench_testdata | sbtest8 | k_8 | | Sysbench_testdata | sbtest9 | k_9 | + — — — — — — — — — — — — — — — — — — – + — — — — — — — — — — — — – + — — — — — — — — — — — — + 9 rows in the set (0.00 SEC) EXPLAIN shows how mysql uses indexes to process select statements and join tables. It helps to choose better indexes and write better queries.
Create a new table,
CREATE TABLE IF NOT EXISTS article (id int(10) unsigned NOT NULL AUTO_INCREMENT, author_id int(10) unsigned NOT NULL, category_id int(10) unsigned NOT NULL, views int(10) unsigned NOT NULL, comments int(10) unsigned NOT NULL, title varbinary(255) NOT NULL, content text NOT NULL, PRIMARY KEY (id) ); Execute the query,
EXPLAIN SELECT author_id FROM article WHERE category_id = 1 AND comments > 1 ORDER BY views DESC LIMIT 1
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: article type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 Extra: Using where; Using filesort 1 row in set (0.00 SEC) type is ALL Using filesort appears in Extra, which is the worst case scenario.
Type: this is the important column that shows what type of connection is used. The best to worst join types are const, eq_reg, ref, range, indexHE, and ALL possible_keys: shows possible indexes that can be applied to this table. If empty, there is no possible index. You can select an appropriate statement from the WHERE statement for the relevant domain
Key: indicates the actual index. If NULL, no index is used. In rare cases, MYSQL will select an index that is underoptimized. In this case, USE INDEX (indexName) in the SELECT statement to force the USE of an INDEX or IGNORE INDEX (indexname) to force MYSQL to IGNORE INDEX key_len: the length of the INDEX used. Ref: Shows which column of the index is being used, if possible, a constant rows: The number of rows that MYSQL considers necessary to check to return the request data Extra: Additional information about how MYSQL parses the query. We’ll discuss this in Table 4.3, but the bad examples you can see here are Using temporary and Using filesort, which means MYSQL cannot use indexes at all, resulting in slow retrieves. Including (from left to right, from worst to best) : | | All index | range | ref | eq_ref | const, system | null |
The system table has only one row: the SYSTEM table. This is a special case of the const join type const: the maximum value of a record in the table matches the query (the index can be a primary key or a unique index). Since there is only one row, this value is actually a constant, because MYSQL first reads this value and treats it as a constant. It uses ref when the query uses all of the index primary or unique keys: this join type occurs only when the query uses keys that are not unique or primary keys or parts of those types (for example, using the leftmost prefix). For each row union of the previous table, the entire record is read from the table. This type depends heavily on how many records are matched against the index – the less the better range: This join type uses the index to return rows in a range, such as what happens when you look up something using > or < index: ALL: This join type does a full scan of each previous record union (better than ALL, because indexes are generally smaller than table data). ALL: This join type does a full scan of each previous record union, which is generally bad and should be avoided as much as possible
Distinct: once MYSQL finds a joint match for a row, it does Not search. Not exists: MYSQL optimizes LEFT JOIN. Once it finds a row matching the LEFT JOIN standard, it no longer searches Range Checked for each Record (index map:#). MYSQL checks which index to use and uses it to return rows from the table. This is one of the slowest connections Using indexes. Using filesort: When you see this, the query needs to be optimized. MYSQL takes additional steps to discover how to sort the rows returned. It sorts all rows based on the connection type and the row pointer that stores the sort key value and matching condition for all rows. Column data is returned from a table that only uses the information in the index without reading the actual action. This occurs when all columns requested for the table are part of the same index. In this case, MYSQL needs to create a temporary table to store the results. This usually happens when ordering BY on different sets of columns, not GROUP BY. Using WHERE uses the WHERE clause to restrict which rows will match the next table or be returned to the user. This can happen if you do not want to return ALL rows in the table and the join type is ALL or index, or if the query has a problem explaining the different join types (in order of efficiency)
Pay attention to xiaobian, follow-up update more dry goods!