The index
Indexes are data structures that help mysql efficiently retrieve data. The database index is like the table of contents of a book, which can speed up the query of the database.
Advantages: Improve the speed of database retrieval, reduce the IO access times of database; Sort data by index, the index (single column, combined) columns will be automatically sorted; Where index columns are processed at the storage layer, index push down (ICP); Overwrite index, select column is index, do not return table.
Disadvantages: Occupies disk space; Add, delete and modify operations need to maintain the index; Mysql not only stores data, but also stores or modifies index data.
classification
Single index
An index contains only one column, but a table can have multiple single-column indexes.
Normal index
The basic index type in mysql, which has no restrictions and allows the insertion of duplicate and null values into columns in the defined index.
The only index
The values in an index must be unique. The difference from a primary key index is that the column values in an index can be null.
The primary key index
Special index type that does not allow null values. Innodb storage engine must have a primary key index. If you do not specify a primary key index when creating a table structure, unique columns are used as primary key indexes by default. If no unique column exists, the pseudo-column is automatically generated as the primary key index. Therefore, it is better to specify the primary key index in the table structure of InnoDB storage engine. It is better not to use uUID as the primary key, which is not easy to sort and takes up space.
Composite index
Create an index for multiple fields in a table. The index is used only if the leftmost prefix condition is met during query.
The full text indexing
Full-text indexes, available only in myISAM storage engine, can only be used on CHAR,VARCHAR, and TEXT fields. It is not recommended because it takes up more physical space and reduces the maintainability of modifications.
use
Create indexes
# single-column index - plain index
create index index_name on table(column(length));
alter table table_name add index index_name (column(length)) using btree;
# single-column index - unique index
create unique index index_name on table(column(length));
alter table table_name add unique index index_name(column(length)) using btree;
# Single-column index - full-text index
create fulltext index index_name on table(column(length));
alter table table_name add fulltext index_name(column(length)) using btree;
# composite index
alter table demeter add index index_title_time(title(50), time(10)) using btree;
Copy the code
Remove the index
drop index index_name on table;
Copy the code
View index
show index from table_name;
Copy the code
The principle of
The storage structure of the index
Storage structure
- Indexes are implemented in storage engines, and different storage engines use different indexes
- Myisam and InnoDB storage engines only support B+TREE indexes
- The Memory/Heap storage engine supports hash and BTREE indexes
B tree and B plus tree
B tree is defined
B tree is also called balanced multipath search tree. A b-tree of order M (meaning a node has at most m subtrees)
- Each node in the tree contains at most m children (m>=2)
- Every node except root and leaf has at least [ceil(m / 2)] children (ceil is an upper bound function)
- If the root is not a leaf, there are at least two children
B tree here
Based on the hierarchy shown above, how many IO will occur when 29 data is queried:
-
Find disk block 1 and load it into memory (one IO)
-
If 29 is greater than 17 and less than 35, it is the disk block pointed to by P2. Find disk block 3 and load it into memory (one IO)
-
If 29 is greater than 26 and smaller than 30, it belongs to the disk block pointed to by the P2 pointer of the current disk block. Find disk block 8 and load it into memory (one IO)
-
A matching 29 was found in disk block 8
Therefore, three I/O operations are performed during the search, greatly improving the data search performance.
B + tree here
A B-tree is a multi-fork balanced lookup tree designed for disks or other storage devices. The following is an example diagram of its data structure:
- The height of a B tree is usually 2-4. The height of a B tree directly affects the I/O read and write times.
- If it’s a three-tier structure — it can support up to 20 gigabytes of data; If it’s a four-tier structure — the data supported can reach tens of T.
The difference between B tree and B+ tree:
- Whether a non-leaf node stores data, B tree stores data, B+ tree does not store data.
- B tree both non-leaf and leaf nodes store data.
- Only leaf nodes of a B+ tree store data, and the data is stored on a row, and the data is pointed to, that is, in order.
Non-clustered index (MyISAM)
The index and data are stored in two files (index file and data file). The leaf node on the primary key index stores the address of the data and obtains the corresponding data through this address.
The primary key index
Secondary index
Clustered Index (InnoDB)
Indexes and data are stored in a file, and the primary key index leaves store the complete data.
The primary key index
Secondary index
The secondary index leaf node only stores the value of the current index column and the value of the primary key. If the select column does not meet the requirements, the primary key index will be searched again (back table).
failure
-
Left-most prefix rule
-
Do not perform calculations or use functions on the index
-
Use overwrite indexes whenever possible
-
Do not use unequal (! =, <>), which causes the primary key index to use the range index, and the secondary index to be invalidated.
-
The is not NULL statement cannot be used for primary key indexes, which results in a full table scan
-
Index fields use like and do not start with wildcards
-
Index field string types are quoted
-
Do not use OR for index fields
Summary (assuming index(a, b, c))
Where clause Whether the index is in use where a = 3 Using the -a where a = 3 and b = 5 Use -a, b where a= 3 and b = 5 and c = 4 Use -a, b, c Where b = 3 or b = 3 and c = 4 or c = 4 Do not use where a = 3 and c = 5 Using the -a where a =3 and b > 4 and c = 5 Use -a, b where a =3 and b like ‘haha%’ and c = 4 Use -a, b, c where a =3 and b like ‘%haha’ and c = 4 Using the -a where a =3 and b like ‘%haha%’ and c = 4 Using the -a where a =3 and b like ‘h%aha%’ and c = 4 Use -a, b, c
The execution plan
Parameters that
# Explain has 10 columns of information, respectively:Id, select_Type, table, type, possible_keys, key, key_len, ref, rows, ExtraCopy the code
id
- Each SELECT statement is automatically assigned an identifier
- According to the operation sequence of the table, there are the following cases
- The ids are the same: the ids are executed from top to bottom
- Different ids: A larger ID indicates a higher priority
- A different simultaneous existence with the same ID
- Id null: indicates that this is a result set and does not need to be queried
select_type
The query type is used to distinguish common query, joint query, and sub-query.
-
simple
Represents a simple SELECT query that does not require a union or subquery.
-
primary
A select that requires either a union operation or a subquery, located in the outermost unit query.
-
subquery
All subqueries except those contained in the FROM clause.
-
dependent subquery
Similar to the dependent Union, this subQuery query is affected by external table queries
-
union
Select * from select_type; select * from select_type; select * from select_type
-
dependent union
As with union, it appears in a union or union all statement, but this query is affected by an external query
-
union result
The result set containing the union, in the union and Union all statements, because it does not participate in the query, the ID field is null
-
derived
Subqueries that appear in the FROM clause are also called derived tables, which in other databases may be called inline views or nested select
table
- Displays the query table name, or alias if used
- If no table operation is involved, NULL is displayed
- Angle brackets (<>) indicate a temporary table, followed by N, which is the ID of the planned execution
type
# get worse from left to rightSystem, const, eq_ref, ref, fulltext, ref_OR_NULL, unique_subquery, index_subquery, range, index_MERGE, index, ALLCopy the code
Indexes are available for ALl types except ALl. At a minimum, use range level indexes.
-
system
A table has only one row of data or an empty table.
-
const
When using a unique or primary key index that returns the equivalent WHERE condition that the record must be a row, type is usually const.
-
eq_ref
Keywords: connect field primary key or unique index. This type is commonly used in relational queries, where each row of the previous table matches the row of the later table.
-
ref
For non-unique indexes, use ** equivalent (=) ** to query for non-primary keys. Or a query using the left-most prefix rule index.
-
fulltext
Note that full text index is a high priority. If both full text index and normal index exist at the same time, mysql preferentially uses full text index regardless of the cost
-
ref_or_null
Similar to the REF method, but with the addition of null comparisons. Not much actually
-
unique_subquery
Used for in-form subqueries in WHERE that return unique values that do not duplicate values
-
index_subquery
For in subqueries that use a secondary index or a list of IN constants, subqueries that may return duplicate values can be de-duplicated using the index
-
range
Index range scanning, common in queries using the >,<,is NULL,between,in,like operators, etc
-
index_merge
Select * from ref_or_NULL; select * from ref_or_NULL; select * from ref_or_NULL
-
index
Keyword: Condition is the node that appears in the index tree. There may not be an exact index match. Index full table scan, which scans the index from beginning to end, is common for queries that do not need to read data files using index columns, or queries that can be sorted or grouped using indexes.
-
all
The full table scans the data files and then filters them at the Server layer to return the records that meet the requirements
possible_keys
Possible indexes for this query
key
This query is really the index used
key_len
- The length of the index used to process the query
- Single-column index: The entire index length
- Multi-column index: Count as many as you use (here you can analyze multi-column index usage based on this value)
- This value only counts index usage for where conditions, not group by and order BY.
ref
- Use constant equivalence queries, where const is displayed
- Join query, the execution plan of the driven table shows the associated fields of the driven table
- Expressions or functions, or implicit conversions, may appear as func
rows
Shows the estimated number of rows scanned in the execution plan, not exact (InnoDB is not exact, myISam is exact, because innoDB uses MVCC)
Extra
-
no tables used
Query without from or from dual; Join queries that use the not in() subquery or not exists operator are called anti-join queries, that is, a join query queries the inner table first and then the outer table, whereas an anti-join query queries the outer table first and then the inner table
-
using fifilesort
-
This occurs when the index is not available for sorting. This is common in order by and group by statements
-
Note MySQL uses an external index sort instead of reading data in index order.
-
A sort operation in MySQL that cannot be done using an index is called “file sort”.
-
-
using index
The query data can be obtained directly through the index without going back to the table.
- A SELECT query using a Covering Index avoids accessing rows of a table
- If Using Where is also present, the index is used to perform the lookup index key value
- If Using Where is not present at the same time, the index is used to read data rather than perform lookup actions
-
using temporary
Indicates that temporary tables are used to store intermediate results.
-
MySQL uses temporary tables for query results order by and group by
-
Temporary tables can be memory temporary tables or disk temporary tables, but cannot be identified in the execution plan. You need to view the status variables used_tmp_table and used_tmp_disk_table.
-
Distinct, using the distinct keyword (index field) in the SELECT section
-
-
using where
- Indicates that not all records returned by the storage engine meet the search criteria and need to be filtered at the server layer
- Query conditions are divided into restricted conditions and check conditions. Before 5.6, the storage engine can only scan data based on the restricted conditions and return data, and then the Server layer filters data based on the check conditions and returns data that truly matches the query. 5.6.x supports THE ICP feature, which can push the inspection conditions down to the storage engine layer. Data that does not meet the inspection conditions and restrictions will not be read directly, which greatly reduces the number of records scanned by the storage engine. The extra column shows the using index condition.
Usage scenarios
Need to index
- The primary key automatically creates a unique index.
- Fields that are frequently queried should be indexed.
- In a multi-table association, the associated fields should be indexed, and indexes should be created on both sides.
- Query sorted fields should be indexed.
- Overwrite indexes do not need to be returned to the table, as appropriate.
- For statistics or grouping fields, indexes should be created.
You don’t need indexes
- Tables have fewer records and do not need to create indexes (which are also expensive).
- Frequently updated fields do not need to create indexes (indexes need to be maintained).
- You do not need to create indexes to query infrequently used fields.