Index: A data structure used by a storage engine to quickly locate records, analogous to the table of contents of an 800-page book
5.1 Index Type
5.1.1 B-tree: the most common index. If the index type is not specified, the default index is B-tree. Most mysql engines support b-tree, except Archive.
The depth of the TREE is related to the amount of data in the table.Copy the code
Limitations: 1. An index cannot be applied if the search does not start in the leftmost column of the index. 3. If there is a range query on a column, all the columns to the right of the range query cannot be found using index optimization.Copy the code
5.1.2 Hash index: Implemented based on hash table, only queries that accurately locate all columns can locate data. A hash code is produced for each row of records. Hash index stores all calculated hash codes in the index, and at the same time stores Pointers to each record in the hash table (rainbow table). Only Memory engines support hash indexes. Classic “star “schema for data warehouses
1. Hash index stores hash codes and row Pointers, not field values. 2. Range lookup cannot be performed. Partial partial index matching is not supported. Hash conflict exists. If a hash conflict occurs, the storage engine iterates through all data queriesCopy the code
InnoDB has a special function: adaptive hash. When the engine notices that some index values are frequently referenced, it will create another hash index in memory based on the B-tree index to improve query efficiency. Users can choose to open or close the index.
When storing a large number of long data such as urls, you can consider using hash functions such as CRC32 (10 digits) to process and store urls, saving disk space, indirectly reducing the existence of disk pages, and reducing the index hierarchy. Be careful not to use a demanding hash function such as MD5, because the resulting hash code will be longer than intended.
When the amount of data is large, it is easy to have hash conflicts, which cannot be satisfied only according to hash functions, as shown below.
5.1.3 Spatial Data Index
MyISAM engine supports spatial data index, which can be used as geographic data storage. Different from the B-tree index, the prefix is not required for the query, and the query can be combined with any dimension. You must use Mysql’s GIS functions to maintain the data.
5.1.3 Full-text index
A full-text index is a special kind of index that looks up the keywords of the text, not the value of the comparison index.
5.2 Advantages of indexes
- 1. Greatly reduces the amount of data scanned by the server
- 2. Help the server avoid sorting and temporary tables
- 3. Change random I/OS to sequential I/OS
Note: Indexes are useful only when the data volume reaches a certain level. For small tables, a full table scan is a quick and easy approach
5.3 High-performance Index Policies
5.3.1 Separate columns
Independent columns mean that indexed columns cannot be part of an expression or arguments to a function
5.3.2 Prefix index and index selectivity
Sometimes you need to index very long character columns, which makes indexing very slow and can be solved in two ways
- The first is hash processing
- You can index the beginning part of the character
Disadvantages:
- Mysql cannot use prefix indexes to perform order by and group BY operations
- Can’t do overwrite scan
5.3.3 Multi-column indexes
If you see index merges using Explain, you need to check yourself
- When multiple columns are and and or, it takes a lot of CPU and memory resources to execute, and the optimizer discounting this time in the cost of the query, resulting in an underestimation of the execution plan even as fast as a full table scan
5.3.4 Selecting an appropriate index sequence
Place the most selective column first.
The data volume of A is less than that of a, and the identification is high.
5.3.5 Cluster Index
This is a type of index storage, not an index type, that holds the B-tree index and rows in the same structure.
- A table can have only one clustered index
- Storage engines are responsible for implementing indexes, so not all storage engines support clustered indexes
- This maximizes IO – intensive performance, but if the data is in memory, the order of access is irrelevant and clustered indexes have no advantage
- Insertion speed validation depends on insertion order
- Updating columns in a clustered index is costly because InnoDB is forced to move each updated row to a new location
- Tables based on clustered indexes may face page splitting “” when new rows are inserted, or when primary key updates need to move rows
- Clustered indexes can cause slow full table scans, especially when traffic is sparse, or data storage is discontinuous due to page splitting
- The secondary index is larger because the leaf node of the secondary index contains the primary key column of the reference row
- Secondary indexes require two index lookups (back to table)
When sequential primary keys can be problematic: In high concurrency, sequential primary key inserts in Innodb can cause significant contention, with the upper bound of the primary key becoming hot. And large insertion concurrent insertion will lead to a gap lock contention. Another hot competition is the AUTO_CREATEMENT lock mechanism.
5.3.4 Overwriting an index
Index design is usually based on the column after the WHERE condition. This is only part of index optimization, but when considering global index query, you need to consider whether the data can be queried directly through the index, so that there is no need to return to the table operation.
- Not all indexes can be covered indexes. An overwrite index must store the value of an index column. A hash index, a spatial index, and a full-text index do not store the value of an index column. Mysql can only use a B-tree index to overwrite an index
5.3.7 Using index scan for sorting
Mysql has two ways of sorting
1. Perform sorting operations
2. Scan by index sequence
Mysql can only use an index to sort columns in the same order as the order by clause, and all columns are sorted in the same direction.
5.3.8 Compressing Indexes (MyISAM)
MyISAM uses prefix compression to reduce index size so that indexes take up less disk space. By default, only strings can be compressed, or integers can be compressed through parameter Settings. Compressing the index size can make the footprint smaller, but can make some operations slower. Because the compressed prefix of each value depends on the preceding value, MyISAM lookups cannot use binary lookups in the index and can only be scanned from scratch. Positive order is still ok, reverse order is very bad.
5.3.9 Redundant and duplicate Indexes
Mysql allows multiple indexes to be created on the same column. Duplicate indexes need to be maintained separately, and the optimizer has to refer to each index individually when optimizing queries, which affects performance
5.4 Index case studies
- Supports multiple filtering criteria
- Avoid multiple range queries
- Optimization of sorting
5.5 Maintaining Indexes and Tables
1. Find and repair the damaged table
2. Update index statistics: mysql’s query optimizer uses two apis to learn about the distribution of index values in the storage engine to determine how to use indexes.
- Records_in_range () gets the approximate number of records in this range by passing two boundary values to the storage engine.
- Info (), which returns various types of data, including the cardinality of the index
3. Reduce index and data fragmentation: B-tree indexes may be fragmented, which slows down the query speed.
- Line of fragmentation
- Interrow fragmentation: Pages that are logically ordered, or rows that are not stored sequentially on disk, interrow fragmentation has a big impact on operations such as full table scans and clustered index scans,
- Free space fragmentation: The index data page has a large amount of free space, which causes InnoDB to avoid short row fragmentation. Innodb moves and shrinks short rows and rewrites them to a fragment