First, a classic service end test questions, see you can

Select * from table where a=1 group by b order by C; If each field has a single-column index, does the index take effect? If it’s a composite index, can you tell us a few things? “

This article is a knowledge review of MySQL index, including some concepts of index, the structure of B tree, and the principle of index and some knowledge of index strategy, wish good

Review of index basis

What is the index

  • MYSQL officially defines an Index as: an Index is a data structure that helps MYSQL obtain data efficiently. Therefore, the essence of an Index is a data structure

  • The purpose of index is to improve the efficiency of query, can be analogous to dictionary, train station table, book catalog, etc.

  • In addition to the data itself, the database also maintains a data structure that satisfies a specific lookup algorithm. These data structures refer to (point to) the data in a way that enables the implementation of advanced lookup algorithms on top of these data structures. This data structure is called an index.

    In fact, there are many common index models, hash table, ordered array, all kinds of search trees can achieve index structure

    Here is an example of a possible indexing approach (binary search tree)

    On the left side of the image above is a simple student score table with only two columns: STUDENT ID and score (the far left is the physical address of the data).

    Such as we want to quickly check result of the student, by building a to the right of the binary search tree when indexing, index node is the result data, node pointer to the corresponding data record physical address, so that you can use binary search within a certain complexity to obtain to the corresponding data, thus the rapid information retrieval the eligible students.

  • Indexes themselves are also very large and cannot be stored in memory. They are usually stored on disk in the form of index files

advantage

  • Indexes greatly reduce the amount of data that the server needs to scan (improve data retrieval efficiency)
  • Indexes help servers avoid sorting and temporary tables (reducing the cost of sorting data and reducing CPU consumption)
  • Indexes convert random I/O to sequential I/O (reduces database IO costs)

disadvantage

  • An index is also a table that holds the primary key and index fields and points to records in the entity table, so it is also memory intensive
  • While indexes greatly speed up queries, they slow down the speed of updating tables, such as INSERTS, UPDATES, and DELETES. MySQL will update the index file every time it updates a column that has been added to the index

The index classification

Below is a table the club’s official website: dev.mysql.com/doc/refman/…

Let’s look at the classification of indexes from three perspectives

Logically

  • Primary key index: A primary key index is a special unique index that does not allow empty values
  • Plain index or single-column index: Each index contains a single column. A table can have multiple single-column indexes
  • Multi-column index (compound index, combined index) : A compound index is an index created on multiple fields. The index is used only when the first field is used in the query condition.
  • Unique index or non-unique index
  • Full-text full-text index: It looks up keywords in Text rather than directly comparing values in the index
  • Spatial index: A spatial index is an index of a field of a spatial data type

Data structure perspective

  • Hash index: Converts database field data into a fixed-length Hash value using the Hash algorithm and stores the row pointer of the data in the corresponding position of the Hash table. If a Hash collision occurs, it is stored as a linked list under the corresponding Hash key. The Memory engine supports non-unique Hash indexes. If a Hash collision occurs, multiple records will be stored in a linked list in the same Hash entry. Few databases use Hash indexing. Currently, Memory engines and NDB engines support Hash indexing.

    The disadvantages are that only equivalence comparison queries, such as =, in(), do not support range lookup, such as where ID > 10, can not be sorted.

  • B+ tree index (more on this later)

From a physical storage perspective

  • Clustered Index

  • Non-clustered index (also called secondary index)

    Both clustered and non-clustered indexes are B+ tree structures

MySQL > select * from ‘MySQL’

Indexes can have many structural types, which can provide better performance for different scenarios.

The first thing to understand is that indexes are implemented at the storage Engine level, not the server level. Not all storage engines support all index types. Even if multiple storage engines support an index type, their implementation and behavior may differ.

MySQL does not use Hash to index MySQL.

I’m going to say, sorry, MySQL also uses Hash indexes, and the Memory storage engine supports Hash indexes. The Hash structure is more suitable for scenarios where there are only equivalent queries

Why not use a binary search tree? This is very simple, there are only two numbers on the cross of a binary tree, if there is too much data, how many layers.

Disk I/o

Before introducing index structures, let’s take a look at disk IO and prefetch

The disk reads data by mechanical movement, and the time spent on each read can be divided into three parts: seek time, rotation delay and transmission time

  • The seek time refers to the time required for a magnetic arm to move to a specified track. For mainstream disks, it is usually less than 5ms.
  • The rotation delay is what we often hear about the rotation speed of a disk. For example, if a disk is 7200 revolutions, it can rotate 7200 times per minute. That is to say, it can rotate 120 times per second1/120/2 = 4.17 ms;
  • Transfer time refers to the time it takes to read or write data from or to the disk, usually in the order of a few tenths of a millisecond and negligible compared to the first two.

So the time to access a disk, namely the time to access a disk IO, is about 5+4.17 = 9ms, which doesn’t sound bad, but considering that a 500-mips machine can execute 500 million instructions per second, because instructions rely on electrical properties, In other words, one IO can execute 400,000 instructions, and the database is used for hundreds of millions or even tens of millions of levels of data. The time of 9 milliseconds each time is obviously a disaster. The following is a comparison of computer hardware delay for your reference:

Considering that disk I/O is a very expensive operation, computer operating systems have been optimized to read not only the current disk address data, but also adjacent data into the memory buffer during an I/O, because the principle of local prefetch tells us that when a computer accesses the data of an address, Adjacent data will also be accessed quickly. Each IO read is called a page. The specific data size of a page depends on the operating system, and it is usually 4K or 8K. In other words, when we read the data in a page, we actually only have one IO. This theory is very helpful for the data structure design of the index.

There shouldn’t be a data structure that can keep disk I/OS down to a very small order of magnitude each time the data is looked up, and the B+ tree comes into being.

B tree in my heart

MySQL InnoDB index uses B+ tree instead of B tree

B-Tree == B-Tree = B-Tree = B-Tree = B-Tree = B-Tree

Take a quick look at the Wikipedia overview:

In a B-tree, internal (non-leaf) nodes can have a variable number of children (the number range is predefined). When data is inserted or removed from a node, the number of its children changes. Internal nodes may be merged or separated in order to stay within a predetermined number. Because the number of child nodes has a certain allowable range, b-trees do not need to be rebalanced as often as other self-balanced lookup trees, but some space may be wasted because the nodes are not fully filled. The upper and lower bounds on the number of child nodes are set for a particular implementation. For example, in a 2-3 B tree (often referred to as a 2-3 tree), each internal node can have only 2 or 3 child nodes.

Each internal node in a B tree contains a number of keys that divide the subtree of the node. For example, if an internal node has three children (subtrees), then it must have two keys: A1 and A2. All the values of the left subtree must be less than A1, all the values of the middle subtree must be between A1 and A2, and all the values of the right subtree must be greater than A2.

In cases where accessing node data takes much longer than processing it, b-trees have many advantages in alternative implementations because the overhead of accessing nodes is spread over multiple operations on inner nodes. This usually occurs when the node is stored in secondary storage such as hard disk storage. By maximizing the number of children of inner nodes, the height of the tree is reduced and the overhead of accessing nodes is reduced. Also, there are fewer attempts to rebalance the tree. The maximum number of child nodes depends on the amount of information each child node must store, and the size of a full disk block or similar capacity in secondary storage. Although 2-3 trees are easier to interpret, in practice, B-trees use secondary memory and require a large number of child nodes to improve efficiency.

B+ tree is a variant of B tree. In B+ tree structure, all data are stored on leaf nodes, and leaf nodes are connected together by Pointers to form a data linked list to speed up the retrieval efficiency of adjacent data.

Recommend a data structure visualization site: www.cs.usfca.edu/~galles/vis…

,13,15,16,20,23,25,30,23,27 will [11] with B and B + tree storage, look at the structure

B trees are different from B+ trees

B-tree and B+Tree are both balanced search trees designed for external storage devices such as disks.

keywords b-tree B + tree note
Maximum branch, minimum branch Each node has at most m branches (subtrees), at least ⌈m/2⌉ (intermediate nodes) or 2 branches (root nodes but not leaf nodes). With the left Order m corresponds to the largest branch
The relationship between n keywords and branches The branch is equal to n plus 1 Branches are equal to n There is no
Number of keywords (B+ tree keywords more) Greater than or equal to ⌈m/2⌉-1 Is less than or equal to m-1 Greater than or equal to ⌈m/2⌉ less than or equal to m B+ tree keyword number more, + reflected in the place.
Leaf nodes are similar Elements in each node are not equal and are arranged in ascending order. All the leaf nodes are in the same layer. With the left There is no
The leaves are different Contains no information Leaf nodes contain information and Pointers to records. There is no
Relationships between leaf nodes There is no There is a pointer on the B+ tree to the leaf node with the smallest keyword, and all the leaf nodes are linked into a linear linked list There is no
Non-leaf node A key corresponds to the storage address of a record Serves only as an index There is no
Storage structure The same With the left There is no

Why do WE use B plus trees

With disk IO and B-tree concepts in mind, it was a natural progression. The less DISK I/O times, the higher the query efficiency. And the number of I/OS depends on the height of the B+ tree

Let’s use the InnoDB storage engine to illustrate.

When the system reads data from disks to memory, the basic unit is disk blocks. The data in the same disk block is read at a time instead of what is needed.

InnoDB storage engine has the concept of pages, the smallest unit of disk management. The default page size in the InnoDB storage engine is 16KB. You can use the innodb_page_size parameter to set the page size to 4K, 8K, or 16K. In MySQL, you can run the following command to check the page size: show variables like ‘innodb_page_size’;

The storage space of a system disk block is usually not that large, so InnoDB uses several contiguous disk blocks to achieve a page size of 16KB each time it requests disk space. InnoDB reads data from disk to disk on a page basis. If each piece of data on a page helps locate data records, this will reduce disk I/O times and improve query efficiency.

For example

Indexes are used to quickly query data, and MySQL rows can be large

Take scope search as an example to take a simple look, B Tree structure query [10-25] data (starting from the root node, random search for the same reason, but I draw only 2 layers of the graph, persuasive is not so obvious)

  1. Load root node, first node element 15, greater than 10
  2. Find 11,13 by loading the left child address of the root node.
  3. Reload the root node, find the data of the middle node 16,20
  4. Load the root node again, 23 is less than 25, load the right child node again, find 25, end

B+ tree range lookup is easy, the data is all under the bottom leaf node, and it is linked, I just need to find the first one and walk through it (regardless of page splitting and other issues).

answer

MySQL > select * from B+ tree;

B+Tree is an optimization based on B-Tree, making it more suitable for implementing external storage index structure.

Each node in a B+ tree stores data, whereas only leaf nodes in a B+ tree store data. Therefore, the height of a B+ tree is higher and I/OS are more frequent when the same amount of data is searched. Database indexes are stored on disk. When there is a large amount of data, the entire index cannot be loaded into memory. Instead, each disk page (corresponding node of the index tree) can be loaded one by one. The B+ tree is further optimized in MySQL: the leaf node is a bidirectional linked list, and the head node and tail node in the linked list are also pointed to circularly.

Each node in the B-tree diagram must contain not only the key value but also the data value. The storage space of each page is limited. If the data on each node (that is, a page) is large, the number of keys that can be stored is small. If the data on each page is large, the b-tree depth is large, which increases the disk I/O times and affects the query efficiency. In a B+Tree, all data record nodes are stored on the leaf nodes at the same layer according to the order of key values, instead of the non-leaf nodes only storing key values. This greatly increases the number of key values stored on each node and reduces the height of the B+Tree.

The number of I/OS depends on the height of B+ h. Assume that the data in the current data table is N and the number of data items in each disk block is M, then h=㏒(m+1)N. When the amount of data N is constant, the larger M is, the smaller H is. The size of a disk block is the size of a data page. The size of a disk block is fixed. If the space occupied by data items is smaller, the number of data items is larger and the height of the tree is lower. That’s why each data item, or index field, should be as small as possible; for example, int is 4 bytes, less than half of bigInt’s 8 bytes. This is why B+ trees require real data to be placed in leaf nodes rather than inner nodes, where the number of data items in a disk block drops dramatically, causing the tree to grow taller. When the data item is equal to 1 it will degenerate into a linear table.

MyISAM and InnoDB index principle

MyISAM primary key index and secondary index structure

MyISAM engine index files and data files are separate. The data fields of the leaf nodes of the MyISAM engine index structure do not store the actual data records, but the addresses of the data records. Index files are separated from data files, and such indexes are called “non-clustered indexes.” The PRIMARY index of MyISAM is not very different from the secondary index. The PRIMARY key index is a unique non-empty index named PRIMARY.

The term “clustering” refers to the compact storage of rows of data with adjacent key values

In MyISAM, indexes (including leaf nodes) are stored in a separate.myi file, and leaf nodes store the physical address offsets of the data (access by offsets is random and fast).

A primary index is a primary key index. The key value cannot be repeated. Secondary indexes are normal indexes that may have duplicate key values.

The process of finding data by index: first find the index node from the index file, get the file pointer of the data, and then locate the specific data by the file pointer in the data file.

Secondary indexes are similar.

InnoDB primary key index and secondary index structure

The data field of the leaf node of the Index structure of the InnoDB engine stores the actual data records (for the primary index, this is where all data records in the table are stored; In other words, InnoDB’s data files themselves are primary key index files. Such indexes are called “clustered indexes”. Each table can have only one clustered index.

Primary key index:

As we know, InnoDB index is a clustered index, and its index and data are stored in the same.idb file. Therefore, its index structure stores both index and data in the same tree node. As shown in the figure below, the lowest leaf node has three rows of data corresponding to id, name and score data items in the data table.

Innodb indexes are divided into leaf nodes and non-leaf nodes. The non-leaf nodes are stored separately in the index section like the contents of xinhua Dictionary, while the leaf nodes are arranged sequentially in the data section.

InnoDB data files can be shred by table (just turn on innodb_file_per_table), shred and stored in XXX.ibd, not in xxx.ibData.

As of MySQL 5.6.6, the default value is ON.

Extension point: It is recommended to set this value to ON. Storing a form as a single file is easier to manage, and when you don’t need the table, the system drops the file with the drop table command. In a shared tablespace, the space is not reclaimed even if the table is dropped.

When you delete half of the data from the largest table, the size of the table file remains the same

Secondary (non-primary key) indexes:

In this example, we create a secondary index for the name column in the secondary table. Its index structure is very different from that of the primary key index. The lowest leaf node has two rows of data, the string in the first row is the secondary index and is sorted by ASCII code.

This means that a conditional search for the name column requires two steps:

  1. Name is retrieved on the secondary index and the corresponding primary key is obtained by reaching the leaf node.
  2. Use the primary key to perform the corresponding retrieval operation on the primary index

This is called a “back table query”

InnoDB index structure points to note

  1. Data files are themselves index files
  2. The table data file itself is an index structure file organized by B+Tree
  3. Clustered index middle nodes contain complete data records
  4. InnoDB tables must have primary keys, and integer increment primary keys are recommended

InnoDB storage structure as we described above, index and data are stored together, whether primary key index or secondary index, when the search is to find the index node to obtain the corresponding data, if we do not explicitly specify the index column in the design of table structure. MySQL selects columns from the table that do not duplicate data to create an index. If there is no matching column, MySQL automatically generates an implied field for the InnoDB table as the primary key. This field is 6 bytes long and of type integer.

Index policy

In which case you need to create an index

  1. The primary key automatically creates a unique index

  2. Fields frequently used as query criteria

  3. The foreign key relationship is used to index the fields associated with other tables in the query

  4. Single key/composite index selection problem, who? High concurrency tends to create composite indexes

  5. A sorted field in a query that is significantly faster through index access

  6. Statistics or grouping fields in the query

When not to create an index

  1. Too few table records
  2. A watch that is often added, deleted, or modified
  3. Table fields with repetitive, evenly distributed data should only be indexed for the columns that are most frequently queried and sorted (indexing doesn’t make much sense if a data class contains too many duplicates).
  4. Frequently updated fields are not suitable for index creation (add IO burden)
  5. Indexes are not created for fields that are not needed in the WHERE condition

Efficient index

From High Performance MySQL

Separate columns

MySQL does not use indexes if the columns in the query are not independent. “Independent column” means that the index cannot be part of an expression and cannot be a parameter to a function.

Such as:

EXPLAIN SELECT * FROM mydb.sys_user where user_id = 2;
Copy the code

Select * from sys_user where user_id is the primary key and user_id is the primary key index.

This query uses the PRIMARY KEY to optimize the query.

EXPLAIN SELECT * FROM mydb.sys_user where user_id + 1 = 2;
Copy the code

The result:

The prefix index

The prefix index is actually the first few characters of the text (the specific characters are specified in the index creation) to build the index, so that the established index takes up less space, so the query is faster.

ALTER TABLE table_name ADD KEY(column_name(prefix_length));
ALTER TABLE table_name ADD index index_name(column_name(prefix_length));
Copy the code

For long columns such as blob, TEXT, or long VARCHar columns, prefix indexes must be used. MySQL does not allow indexes to the full length of these columns.

So the problem is choosing a prefix of the right length, prefix_length. Prefix is too short, selectivity is too low, prefix is too long, index takes too much space.

In the example above, two different indexes also execute the following statement

select id.name,email from user where emai='[email protected]' 
Copy the code

The common index IDx_email finds the records that meet the conditions, and then returns to the primary key index to retrieve data. The prefix index finds zhangs for several times, and then returns to the primary key index to retrieve data for comparison, and scans data rows for several times.

If the prefix index idx_pre_email(7) is built from the first 7 bytes, only one line is scanned.

So using a prefix index with a defined length can save space without adding too much extra query cost.

To determine the appropriate length for prefixes, find a list of the most common values and compare them to the most common prefix columns.

Prefix indexes are an effective way to make indexes smaller and faster, but there are drawbacks: MySQL cannot use prefix indexes for ORDER BY and GROUP BY, and cannot use prefix indexes for “overwriting”.

A common scenario is to use prefix indexes for very long hexadecimal unique ids.

How to index such data as id number?

  • Use reverse order storage: If you store your ID number backwards, you can write it this way every time you query it

    select field_list from t where id_card = reverse('input_id_card_string');	
    Copy the code
  • ** Uses the hash field. ** You can create an integer field on the table to hold the id check code and create an index on this field.

    alter table t add id_card_crc int unsigned.add index(id_card_crc);
    - the query
    select field_list from t where id_card_crc=crc32('input_id_card_string') and id_card='input_id_card_string'
    Copy the code

Cover index

Covering Index (Covering Index, or Index Covering, which is usually referred to as ** does not need to return to the table operation **

  • MySQL can use the index to return columns from the select list, instead of reading the data file again based on the index. In other words, the query column is overwritten by the created index.

  • An index is an efficient way to find rows, but a typical database can also use an index to find data for a column, so it doesn’t have to read the entire row. After all, index leaf nodes store the data they index, and when you can read the index to get the data you want, you don’t need to read rows. An index that contains (overwrites) data that meets the query result is called an overwritten index.

  • The judgment standard

    Using Explain, which can be determined by the output extra column. For an index overwrite query, which is displayed as using index, the MySQL query optimizer determines whether there is an index overwrite query before executing the query

Multi-column index (composite index, union index)

Concatenated index: an index consisting of multiple columns, for example, create index idx_EMP on EMP (COL1, COL2, COL3…). , we call idx_EMP index a composite index.

Creating a single column index on multiple columns does not improve MySQL query performance in most cases. Both single-column indexes are bad choices for the following query WHERE condition:

SELECT user_id,user_name FROM mydb.sys_user where user_id = 1 or user_name = 'zhang3';
Copy the code

Before MySQL 5.0, MySQL used full table scan for this query, unless rewritten as two query UNION.

MySQL 5.0 and later introduced a strategy called “index merge” where queries can scan with both single-column indexes and merge results. This algorithm has three variants: union of OR conditions, intersection of AND conditions, AND intersection of the first two cases. The index merge strategy is sometimes the result of optimization, but more often it is a sign of poorly built indexes on a table:

  1. When a server intersects multiple indexes (multiple AND conditions), it usually means that you need a multi-column index with all related columns, rather than multiple independent single-column indexes.

  2. When a server joins multiple indexes (multiple OR conditions), it usually consumes a lot of CPU and memory resources for algorithm caching, sorting, and merging operations. This is especially true when some of these indexes are less selective and need to merge large amounts of data returned from the scan.

  3. If you see index merges in Explain, you should take a good look at the query and table structures to see if they are already optimal.

Left-most prefix rule

There is an important concept in the composite index: the leading column, in the example above, col1 is the leading column. We can use “where col1 =? , you can also use “where col1 =? And col2 =?” , such constraints use indexes, but “where col2 =?” The query will not use the index. So the combined index is used by the constraint only if it contains a lead column.

Here’s an example:

When the data items of the B+ tree are composite data structures, such as (name,age,sex), the B+ tree will build the search tree in the order from left to right. For example, when the data is retrieved, such as (zhang,20,F), the B+ tree will preferentially compare the name to determine the direction of the next search. If the name is the same, then age and sex are compared in turn to obtain the retrieved data. However, when data such as (20,F) without name comes, B+ tree does not know which node to search next, because name is the first comparison factor when the search tree is established, and search must be conducted according to name first to know where to search next. For example, when (three,F) data to search, B+ tree can use name to specify the search direction, but the next field age is missing, so we can only find the data whose name is equal to three, and then match the data whose gender is F, this is a very important property, that is, the left-most matching feature of the index.

As you can see, the index entries are sorted by the fields that appear in the index definition.

When your logical requirement is to find all the people whose name is “Bob”, you can quickly locate to ID = 2 and iterate backwards to get all the results you need.

If you want to look up all people whose names start with “B”, your SQL statement will say “where name like ‘B %’ “. In this case, you can also use the index to find that the first record that meets the condition is ID=2, and then iterate back until the condition is not met.

As you can see, indexes can be used to speed up retrieval as long as the left-most prefix is satisfied, not just the entire definition of the index. The leftmost prefix can be the leftmost N field of a union index or the leftmost M characters of a string index.

Then there is the problem of how to arrange the field order in the index when creating a federated index.

The evaluation criteria here is the reuse capability of the index. Since the left-most prefix can be supported, there is generally no need to create a separate index on a when there is already a joint index (a,b). Therefore, the first rule of thumb is that if you can maintain one less index by adjusting the order, that order is often the preferred one.

An index pushdown

Last time we talked about satisfying the left-most prefix rule, the left-most prefix can be used to locate records in an index. Now, you might ask, what about the parts that don’t fit the leftmost prefix?

Let’s take the federated index (name,age,sex) as an example. If you now have a demand: retrieve all the boys in the table whose names start with B and whose age is 19. So, the SQL statement looks like this:

mysql> select * from tuser where name like 'B %' and age=19 and sex=F;
Copy the code

You already know the prefix index rule, so this statement searches the index tree using only “B” to find the first record whose ID = 2. Of course, this is not bad; it’s better than a full table scan. (The composite index satisfies the leftmost match, but the match stops when it encounters an unequal judgment.)

And then what?

Of course, whether the other conditions are met.

Before MySQL 5.6, you could only return a table from ID = 2. Find the rows on the primary key index and compare the field values.

The index condition pushdown feature introduced in MySQL 5.6 can be used to determine the fields contained in the index during index traversal and filter out the records that do not meet the conditions to reduce the number of table returns.

The optimization of index push-down on non-primary key indexes can effectively reduce The Times of back to the table and greatly improve the efficiency of query

Use index scans for sorting

MySQL has two ways to generate ordered results: by sorting or scanning by index. If the type column in explain is index, then MySQL is Using index scanning to sort results. That is using the overwrite index query).

Scanning the index itself is fast because you only need to move from one index record to the next, but if the index does not cover all the columns required by the query, you have to query the entire row back to the table for each index record scanned, which is basically random I/O. Therefore, reading data in indexed order is generally slower than sequential full table scans, especially in I/O intensive workloads.

MySQL can use the same index for both sorting and row lookup, so if possible, it is best to design the index for both tasks as much as possible.

MySQL > select * from table where order by is the same as order by; select * from table where order by is the same as order by; The order by clause has the same restriction as the lookup query. The order BY clause must meet the requirement of the leftmost prefix of the index. Otherwise, MySQL will need to perform the sort operation and cannot use the index sort.

Compress (prefix compression) indexes

MyISAM uses prefix compression to reduce the size of indexes so that more indexes can fit into memory, which can greatly improve performance in some cases.

By default, only strings can be compressed, but integers can also be compressed through parameter Settings.

MyISAM compresses each index block by completely saving the first value in the block, and then comparing the other values to the first value to get the number of bytes with the same prefix and the remaining different suffixes, which can be stored.

For example, if the first value in the index block is “Perform” and the second value is “performance”, the prefix of the second value is compressed to store something like “7,ance”. MyISAM uses similar prefix compression for line Pointers.

Compressed blocks use less space at the cost of potentially slower operations. Because the compressed prefix of each value depends on the preceding value, MyISAM lookup cannot use binary lookup in the index block and can only scan from scratch. Positive scanning is good, but reverse scanning — ORDER BY DESC, for example — is not so good. All operations to find a row in a block require scanning half an index block on average.

Tests show that for CPU-intensive applications, compressing indexes makes MyISAM several times slower at index lookups because scans require random lookups. The reverse scan of compressed indexes is even slower. Compressing indexes requires a tradeoff between CPU memory resources and disk. Compressing an index may require as little as a tenth of the disk space, and in I/O intensive applications, the benefits for some queries can outweigh the costs.

You can specify the PACK_KEYS parameter in the CREATE TABLE statement to control how index compression is performed.

Duplicate and redundant indexes

MySQL allows multiple indexes to be created on the same column, intentionally or unintentionally. The purpose is not clear ~

Duplicate indexes are indexes of the same type that are created in the same order on the same columns. You should avoid creating duplicate indexes in this way and remove them as soon as they are discovered.

There are some differences between redundant and duplicate indexes. If an index (A,B) is created, creating index (A) is redundant because it is just A prefixed index of the previous index. So indexes (A,B) can also be used as indexes (A) (this redundancy is only for b-tree indexes). But if you create index (B,A) again, it is not redundant, and neither is index (B), because B is not the left-most prefix of index (A,B). In addition, other different types of indexes (such as hash indexes or full-text indexes) are not redundant indexes of the B-tree index, regardless of the overridden index column.

Unused index

In addition to redundant and duplicate indexes, there may be some indexes that the server will never use. Such indexes are completely redundant and should be considered for deletion. There are two tools to help locate unused indexes:

  1. In Percona Server or Mariadb, first turn ON the userstat=ON server variable, which is turned off by default, and then let the server run for a while. The frequency of each index can be found by querying information_schema.index_statistics.

  2. Using the Pt-index-Usage tool in the Percona Toolkit, the tool reads query logs, explains each query in the log, and prints a report on indexes and queries. This tool not only finds out which indexes are unused, You can also see the execution plan for the query.

4. Index optimization

Causes of slow SQL execution

  1. Hardware problems. For example, the network speed is slow, the memory is insufficient, the I/O throughput is small, and the disk space is full

  2. No index or index failure

  3. Too much data (separate databases and tables)

  4. Server tuning and parameter setting (adjust my.cnf)

The index optimization

  1. Full value matches my favorite
  2. (a,b) (a, C) (a,b,c) (a,b, C)
  3. Not doing anything on the index column (calculation, function, (automatic or manual) type conversion) will cause the index to fail and move to a full table scan
  4. The storage engine cannot use the column to the right of the range condition in the index
  5. Minimize select * by using overridden indexes (queries that access only the index (the index column is the same as the query column)
  6. Is null,is not NULL also cannot use the index
  7. like "xxxx%"You can use indexes,like "%xxxx"Like “% XXX %” Like begins with a wildcard (‘% ABC… Index failure becomes a full table scan operation,
  8. The index of a string without single quotation marks is invalid
  9. Use the index merge technique sparer. index merge is used to optimize index merge.
  10. <, <=, =, >, >=, BETWEEN, IN available to index, <>, not IN,! = does not work, which results in full table scan

Principles of index building

  1. A = 1 and b = 2 and C > 3 and d = 4; a = 2 and C > 3 and d = 4; D (a,b,d,c); d (a, B,d,c); d (a, B,d);

  2. = and in can be out of order, such as a = 1 and b = 2 and c = 3. Create (a,b,c) indexes in any order. MySQL’s query optimizer will help you optimize them into a form that can be recognized by the indexes.

  3. Try to select columns with high distinction as indexes. The formula of distinction is count(DISTINCT Col)/count(*), which indicates the proportion of fields that are not duplicated. The larger the ratio is, the fewer records we scan. So one might ask, is there any empirical value to this ratio? This value is also difficult to determine in different application scenarios. Generally, we require join fields to be above 0.1, that is, 10 records are scanned per item on average.

  4. From_unixtime (create_time) = ‘2014-05-29’; from_unixtime(create_time) = ‘2014-05-29’; from_unixtime(create_time) = ‘2014-05-29’; Obviously the cost is too great. The statement should be create_time = unix_TIMESTAMP (‘ 2014-05-29 ‘).

  5. Expand indexes as much as possible, do not create new ones. For example, if you want to add (a,b) to a table that already has an index of A, you only need to modify the original index.

I have a public account “JavaKeeper”

I also have a GitBook github.com/JavaKeeper

Reference

  • Zh.wikipedia.org/wiki/B%E6%A…
  • medium.com/@mena.meseh…
  • www.javatpoint.com/b-tree
  • Blog.csdn.net/Abysscarry/…
  • MySQL tutorial 45
  • High Performance MySQL