Introduction

Avoid SELECT * when you only need some of the columns; list the specific columns after SELECT instead. One reason is to reduce the amount of data transferred.

There is a more important reason: with SELECT *, it is virtually impossible to use a covering index.

A query that could have been answered from a covering index alone then has to read full rows, which causes random I/O and back-table queries (extra lookups into the table).


Index principles

B-tree and B+Tree exploration tour


Index types

Indexes are implemented at the storage engine level, not at the server level. Not all storage engines support all index types, and even when several engines support the same index type, their implementations and behavior may differ.

B-tree indexes

B-tree indexes are the most common type of index and are built on a B-tree data structure (or a variant of it). The basic idea of a B-tree is that all values of the indexed columns are stored in sorted order, and every leaf node is the same distance from the root. This makes a B-tree well suited to searching for a range of values, and it can return data directly in sorted order (ORDER BY). When an index covers multiple columns, however, the order of the columns matters a great deal. InnoDB and MyISAM both support B-tree indexes: InnoDB uses a B+Tree variant, while MyISAM uses prefix compression to make its indexes smaller, at some cost in performance.
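Why sorted keys give range lookups and ordered output almost for free can be sketched with a toy model. Assumption: the leaf level is modeled as a Python list of (key, row_id) pairs kept in key order, and binary search (bisect) stands in for walking down from the root; this is an illustration, not InnoDB's page format.

```python
from bisect import bisect_left, bisect_right

# Toy leaf level: (key, row_id) pairs kept sorted by key (illustration only).
leaf_entries = sorted([(42, "r7"), (5, "r1"), (17, "r3"), (23, "r4"), (8, "r2"), (31, "r5")])
keys = [k for k, _ in leaf_entries]

def range_lookup(lo, hi):
    """Return row ids with lo <= key <= hi, already in key order."""
    start = bisect_left(keys, lo)   # first position with key >= lo
    end = bisect_right(keys, hi)    # first position with key > hi
    # The matching entries are contiguous, so no extra sort is needed.
    return [row_id for _, row_id in leaf_entries[start:end]]

print(range_lookup(8, 30))  # ['r2', 'r3', 'r4']
```

Because matching keys sit next to each other in order, both the range predicate and ORDER BY on the key are satisfied by one contiguous read.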

Hash indexes

A hash index is built on a hash table, so it supports only exact (equality) lookups; it cannot serve range lookups or sorting. Any range lookup or ORDER BY therefore needs extra work in the server layer. In MySQL, only the Memory engine supports explicit hash indexes (they are non-unique, and too many hash collisions hurt lookup performance). Although the Memory engine also supports B-tree indexes, its default index type is hash.
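The contrast between an equality probe and a range predicate can be seen in a toy model. Assumption: a Python dict stands in for the engine's hash table, and the row data is hypothetical; this does not reflect how the Memory engine actually lays out memory.

```python
from collections import defaultdict

# Hypothetical rows: row_id -> name. A dict stands in for the hash table.
rows = {"r1": "alice", "r2": "bob", "r3": "carol", "r4": "bob"}

hash_index = defaultdict(list)
for row_id, name in rows.items():
    hash_index[name].append(row_id)  # non-unique: equal keys share one entry

def exact_lookup(name):
    # Equality needs only one hash probe.
    return hash_index.get(name, [])

def range_lookup(lo, hi):
    # Hash order says nothing about key order: every key must be examined,
    # and the matches must be sorted afterwards (the "extra work").
    matching = sorted(k for k in hash_index if lo <= k <= hi)
    return [rid for key in matching for rid in hash_index[key]]

print(exact_lookup("bob"))           # ['r2', 'r4']
print(range_lookup("alice", "bob"))  # ['r1', 'r2', 'r4']
```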

Custom hash index

This builds a pseudo hash index on top of a B-tree. It is not the same as a real hash index, because the lookup still happens in a B-tree, but the index stores a hash of the value instead of the value itself. At query time you must apply the same hash function explicitly in the WHERE clause and, since different values can produce the same hash, also compare the original value to weed out collisions.
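A minimal sketch of the idea, assuming a hypothetical url table where an indexed url_crc column would hold CRC32(url). Here zlib.crc32 plays the role of the hash function and a sorted list of (crc, row_id) pairs plays the role of the B-tree; in SQL the equivalent condition would combine url_crc = CRC32('...') with url = '...'.

```python
import zlib
from bisect import bisect_left

# Hypothetical table: row_id -> url.
urls = {1: "http://www.mysql.com", 2: "http://www.example.com/a", 3: "http://www.example.com/b"}

# The "B-tree" stores small fixed-size hash keys instead of long strings.
crc_index = sorted((zlib.crc32(u.encode()), rid) for rid, u in urls.items())

def find_url(url):
    h = zlib.crc32(url.encode())
    i = bisect_left(crc_index, (h,))  # first entry whose crc >= h
    matches = []
    while i < len(crc_index) and crc_index[i][0] == h:
        rid = crc_index[i][1]
        # Different URLs can share a CRC32 value, so compare the real value too.
        if urls[rid] == url:
            matches.append(rid)
        i += 1
    return matches

print(find_url("http://www.mysql.com"))  # [1]
```

The index stays compact because it holds 32-bit integers, while the extra equality check on the original value keeps the result correct when hashes collide.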

Full-text index

It is used primarily to find keywords in text rather than to compare values directly with the values in the index. A full-text index is very different from other indexes: it works more like a search engine than like simple parameter matching in a WHERE clause. A column can have both a full-text index and a B-tree index without conflict. Full-text indexes are queried with MATCH() ... AGAINST(), not with the usual LIKE in a WHERE clause.

Spatial (R-tree) index

In the MySQL versions the book covers, only the MyISAM engine supports it, and not very well, so you can usually ignore it. (InnoDB has supported spatial indexes since MySQL 5.7.)

Other index types

Many third-party storage engines use other data structures to store their indexes.


Indexing strategies

Proper creation and use of indexes are fundamental to high-performance queries.

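Isolated columns

An isolated column means the indexed column is not part of an expression and is not an argument to a function; otherwise MySQL cannot use the index for it. For example, WHERE actor_id + 1 = 5 cannot use an index on actor_id, while the equivalent WHERE actor_id = 4 can.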
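A toy illustration of why an expression on the column defeats the index. Assumptions: a hypothetical actor_id index is modeled as a sorted Python list, and the "expression" case is modeled as an opaque predicate that has to be evaluated on every key, since MySQL does not rewrite the expression for you.

```python
from bisect import bisect_left

# Hypothetical index on actor_id: sorted keys 1..10000 (illustration only).
index = list(range(1, 10001))

def lookup_isolated(value):
    """WHERE actor_id = value: binary search, O(log n) comparisons."""
    i = bisect_left(index, value)
    return i < len(index) and index[i] == value

def lookup_expression(predicate):
    """WHERE f(actor_id) ...: the predicate is opaque, so every key is tested."""
    examined, hits = 0, []
    for key in index:
        examined += 1
        if predicate(key):
            hits.append(key)
    return hits, examined

print(lookup_isolated(4))                       # True
print(lookup_expression(lambda k: k + 1 == 5))  # ([4], 10000)
```

Both forms find the same row, but only the isolated form can use the index's ordering.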

Prefix indexes

Sometimes you need to index a long character column, which makes the index large and slow. One strategy is the custom hash index described earlier; another is to index only the first few characters of the column, which can save a lot of index space and make the index more efficient, at the cost of lower index selectivity. Prefix indexes are an effective way to make indexes smaller and faster, but they have drawbacks: MySQL cannot use a prefix index for ORDER BY or GROUP BY, nor for a covering index scan.

Create a prefix index (here on the first 5 characters of name):

ALTER TABLE table_name ADD KEY(name(5));

Multi-column indexes

The most common mistakes with multi-column indexes are creating a separate index for every column, or creating a multi-column index with the columns in the wrong order. Starting with version 5.0, MySQL has an index merge strategy that can, to some extent, combine several single-column indexes on a table to locate rows. It covers unions for OR conditions, intersections for AND conditions, and unions of intersections that combine the two.
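The union and intersection cases of index merge can be sketched with set operations. Assumptions: two hypothetical single-column indexes (actor_id and film_id) are modeled as dicts from value to the set of primary keys having that value; real index merge reads row ids from each B-tree in order and merges them.

```python
# Hypothetical single-column indexes: value -> set of primary keys.
idx_actor = {1: {10, 11, 12}, 2: {13}}
idx_film = {1: {11, 20}, 3: {12, 21}}

def merge_or(actor, film):
    # WHERE actor_id = ? OR film_id = ?  -> union of the two row-id sets
    return idx_actor.get(actor, set()) | idx_film.get(film, set())

def merge_and(actor, film):
    # WHERE actor_id = ? AND film_id = ? -> intersection of the two row-id sets
    return idx_actor.get(actor, set()) & idx_film.get(film, set())

print(sorted(merge_or(1, 1)))   # [10, 11, 12, 20]
print(sorted(merge_and(1, 1)))  # [11]
```

An intersection like this often hints that a single multi-column index on both columns would serve the query better.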

Choosing the right index column order

The correct order depends on the queries that use the index, and you also need to think about how best to satisfy sorting and grouping. A common rule of thumb is to put the most selective column first (leftmost) in the index. Keep in mind, though, that performance depends not only on the overall selectivity (cardinality) of the columns but also on the specific values in the query conditions, that is, on how those values are distributed.
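The rule of thumb can be sketched by computing each column's selectivity on sample rows and ordering columns from most to least selective. Assumptions: the customer_id/staff_id sample is hypothetical; in MySQL the same ratio comes from SELECT COUNT(DISTINCT col) / COUNT(*) FROM t.

```python
# Hypothetical sample rows for a query filtering on customer_id and staff_id.
rows = [
    {"customer_id": c, "staff_id": s}
    for c, s in [(1, 1), (2, 1), (3, 2), (4, 1), (5, 2), (6, 1), (1, 2), (7, 1)]
]

def selectivity(col):
    # Distinct values / total rows, i.e. COUNT(DISTINCT col) / COUNT(*).
    return len({r[col] for r in rows}) / len(rows)

def suggested_order(cols):
    # Rule of thumb: most selective column first.
    return sorted(cols, key=selectivity, reverse=True)

print(selectivity("customer_id"), selectivity("staff_id"))  # 0.875 0.25
print(suggested_order(["staff_id", "customer_id"]))         # ['customer_id', 'staff_id']
```

This only looks at overall cardinality; as noted above, the actual values a query uses can change the picture.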

Selectivity and cardinality matter, but other factors, such as sorting, grouping, and range conditions in the WHERE clause, can also have a significant impact on query performance.

Clustered indexes

A clustered index is not a separate index type but a way of storing data. In InnoDB, rows are clustered by primary key. If the table defines no primary key, InnoDB uses a unique NOT NULL index instead; if there is no such index, InnoDB defines a hidden primary key and clusters on that. Also note that InnoDB clusters records only within a page, so pages with adjacent key values may be stored far apart.

The advantages of clustered indexes are as follows:

  • You can keep related data together.
  • Faster data access.
  • Queries that use a covering index scan can use the primary key values stored in the leaf nodes directly.

Disadvantages of clustered indexes:

  • Clustering gives the biggest improvement to I/O-bound workloads; if all the data fits in memory, the access order matters little and clustering offers no real advantage.
  • Insert speed depends heavily on insertion order. Inserting in primary key order is fastest, which is why an auto-increment INT column is the usual choice of primary key for InnoDB tables.
  • Updating clustered index columns is expensive, because InnoDB has to move each updated row to a new location.
  • If new rows are not inserted in primary key order, page splits may occur.
  • Clustered indexes can make full table scans slower, especially when rows are sparse or the data is stored non-contiguously because of page splits.
  • Secondary (non-clustered) indexes can be larger than you expect, because their leaf nodes contain the primary key columns of the rows they reference.
  • Secondary index access requires two index lookups instead of one.

For details, see pages 164-170 of the original book.

Covering indexes

A covering index is an index that contains (covers) the values of all the columns a query needs. An index can therefore be covering for some queries and not for others. In InnoDB, a covering index is essentially a secondary index that meets this condition.

Covering indexes are a very useful tool and can greatly improve performance, because they avoid the second lookup back to the table that a query through a non-primary-key index otherwise needs. The benefits are:

  • Index entries are usually much smaller than full rows, so if MySQL only needs to read the index, it accesses far less data.
  • Indexes are stored in key order, so an I/O-bound range query needs much less I/O reading the index than fetching each row with a random read from disk.
  • Covering indexes are especially helpful for InnoDB because of its clustered index: a covering secondary index avoids the extra lookup through the primary key.
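Whether an index covers a query boils down to a set check. Assumptions: InnoDB-style secondary indexes whose leaves also store the primary key columns (so those count as available), and a hypothetical table with PRIMARY KEY (id) and KEY (name, age); the function name is illustrative, not a MySQL API.

```python
def is_covering(index_cols, pk_cols, needed_cols):
    """True if every column the query touches is available in the index leaf.

    Assumes InnoDB-style secondary indexes, whose leaves also hold the
    primary key columns.
    """
    available = set(index_cols) | set(pk_cols)
    return set(needed_cols) <= available

# Hypothetical table: PRIMARY KEY (id), KEY (name, age)
print(is_covering(["name", "age"], ["id"], ["id", "name", "age"]))  # True
print(is_covering(["name", "age"], ["id"], ["name", "email"]))      # False
```

"Needed" columns include everything in the SELECT list, WHERE, ORDER BY, and so on; any one missing column forces a back-table lookup.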

Using index scans for sorting

MySQL can produce ordered results in two ways: with a sort operation, or by scanning an index in order.

Queries that cannot use the index for sorting:

  • The ORDER BY uses two different sort directions.
  • The ORDER BY references a column that is not in the index.
  • The ORDER BY columns do not form a leftmost prefix of the index (they break the leftmost-prefix rule).
  • The WHERE clause has a range condition on an index column that comes before the ORDER BY columns.
  • The ORDER BY uses columns from a table other than the first table in a join.
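The single-table rules listed above can be approximated in a small checker. This is a simplification under stated assumptions, not MySQL's optimizer logic: the index is all-ascending, order_by is a list of (column, direction) pairs, and index columns fixed by equality conditions in WHERE (eq_cols) may be skipped; a column with a range condition is simply not in eq_cols, so it cannot be skipped. The example index (rental_date, inventory_id, customer_id) is hypothetical.

```python
def can_sort_with_index(index_cols, order_by, eq_cols=()):
    """Rough check: can an in-order index scan satisfy this ORDER BY?"""
    if len({direction for _, direction in order_by}) > 1:
        return False  # mixed ASC/DESC
    pos = 0
    for col, _ in order_by:
        # Skip index columns pinned to a constant by WHERE col = const.
        while pos < len(index_cols) and index_cols[pos] != col and index_cols[pos] in eq_cols:
            pos += 1
        if pos == len(index_cols) or index_cols[pos] != col:
            return False  # column not in index, or leftmost prefix broken
        pos += 1
    return True

idx = ["rental_date", "inventory_id", "customer_id"]
print(can_sort_with_index(idx, [("rental_date", "ASC"), ("inventory_id", "ASC")]))  # True
print(can_sort_with_index(idx, [("inventory_id", "ASC")], eq_cols={"rental_date"}))  # True
print(can_sort_with_index(idx, [("inventory_id", "ASC")]))  # False
```

The last call models a range condition (or no condition) on rental_date: the leading column is not constant, so sorting by inventory_id alone cannot follow index order.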

Packed (prefix-compressed) indexes

MyISAM uses prefix compression to shrink its indexes so that more of them fit in memory, which can greatly improve performance in some cases. MyISAM compresses each index block by storing the block's first value in full, then storing each subsequent value as the number of leading bytes it shares with the preceding value plus the remaining suffix that differs. MyISAM applies similar prefix compression to row pointers.

Compressed blocks use less space, at the cost of potentially slower operations. Because each value's compressed prefix depends on the value before it, MyISAM cannot use binary search within an index block and has to scan from the beginning of the block.
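A minimal sketch of this kind of prefix (front) compression, assuming one index block is a sorted list of strings. It is not MyISAM's on-disk byte format; it only shows why lookups must decode values sequentially.

```python
import os

def pack(sorted_values):
    """First value stored whole, then (shared_prefix_len, suffix) vs. the previous value."""
    packed, prev = [], ""
    for v in sorted_values:
        n = len(os.path.commonprefix([prev, v]))
        packed.append((n, v[n:]))
        prev = v
    return packed

def contains(packed, target):
    """Each value can only be rebuilt from the one before it: no binary search."""
    prev = ""
    for n, suffix in packed:
        prev = prev[:n] + suffix
        if prev == target:
            return True
    return False

block = pack(["perform", "performance", "performed"])
print(block)  # [(0, 'perform'), (7, 'ance'), (7, 'ed')]
```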

Redundant and duplicate indexes

Duplicate indexes are indexes of the same type created on the same columns in the same order. (Indexes of different types on the same columns do not count as duplicates.)

Redundant indexes are different: if there is an index on (A, B), another index on (A) is redundant because it is a leftmost prefix of the first. Redundant indexes are not needed in most cases, and you should try to extend an existing index rather than add a new one. Sometimes, however, a redundant index is needed for performance, because extending an existing index can make it so large that it slows down other queries that use it.
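A small sketch of spotting redundancy by the leftmost-prefix rule. Assumptions: the index names and column tuples are hypothetical, index types are ignored, and the InnoDB subtlety that secondary indexes implicitly end with the primary key is not modeled.

```python
def redundant_indexes(indexes):
    """Return (redundant, covered_by) pairs where one index's columns are a
    leftmost prefix of another's. Identical column lists show up both ways."""
    found = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name != other and other_cols[:len(cols)] == cols:
                found.append((name, other))
    return found

print(redundant_indexes({"idx_a": ("a",), "idx_ab": ("a", "b"), "idx_b": ("b",)}))
# [('idx_a', 'idx_ab')]
```

idx_b is not flagged: (b) is not a leftmost prefix of (a, b), so it serves queries that idx_ab cannot.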

Unused indexes

Besides redundant and duplicate indexes, some indexes may never be used at all. Such indexes are pure overhead, and dropping them is recommended.

Indexes and locking

Indexes let queries lock fewer rows. Although InnoDB's row locks are efficient and use very little memory, locking rows still has a cost. Locking more rows than necessary also increases lock contention and reduces concurrency.

InnoDB can place shared (read) locks on secondary indexes, but exclusive (write) locks require access to the primary key. This rules out using a covering index and can make SELECT ... FOR UPDATE much slower than LOCK IN SHARE MODE or a non-locking query.


Glossary

Secondary indexes

Why does a query through a secondary index need a second lookup? Because of what the secondary index stores as its row pointer: its leaf nodes do not hold the physical address of the row, but the row's primary key. So unless the query is covered by the index, MySQL takes the primary key value stored in the secondary index leaf and performs a second lookup in the clustered index.

A back-table query happens when a query through a secondary index cannot get all the columns it needs from that index: MySQL first finds the primary key in the secondary index, then looks up the clustered index to fetch the rest of the row. This process is called a back-table query.
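The two-step lookup can be sketched with a toy InnoDB-like layout. Assumptions: the clustered index is a dict from primary key to full row, the secondary index on a hypothetical name column is a sorted list of (name, primary_key) pairs, and the data is made up; this is not the real page format.

```python
from bisect import bisect_left

# Clustered index: primary key -> full row.
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "bob", "age": 41},
}
# Secondary index on name: stores (name, primary key), not a row address.
secondary_name = sorted((row["name"], pk) for pk, row in clustered.items())

def select_by_name(name, columns):
    """Return (rows, back_table_lookups) for SELECT columns ... WHERE name = ?"""
    result, lookups = [], 0
    i = bisect_left(secondary_name, (name,))  # 1st lookup: secondary index
    while i < len(secondary_name) and secondary_name[i][0] == name:
        _, pk = secondary_name[i]
        if set(columns) <= {"name", "id"}:
            result.append({c: (name if c == "name" else pk) for c in columns})  # covered
        else:
            lookups += 1                       # 2nd lookup: back to the clustered index
            row = clustered[pk]
            result.append({c: row[c] for c in columns})
        i += 1
    return result, lookups

print(select_by_name("bob", ["age"]))  # ([{'age': 25}, {'age': 41}], 2)
```

Asking only for name and id needs no back-table lookups, because both values live in the secondary index leaf.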

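In EXPLAIN output, Using index condition in the Extra column means index condition pushdown is applied to the secondary index before going back to the table, Using where means the server filters the rows returned by the storage engine, and Using index means the query is covered by the index.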

Page splits

Page merge and split in InnoDB

Index selectivity

Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#T), so it ranges from 1/#T to 1. The more selective an index, the more efficient the query, because a more selective index lets MySQL filter out more rows in a lookup. A unique index has a selectivity of 1, the best possible selectivity and performance.

So how do you find a balance between prefix indexes and index selectivity?

  • Choose a prefix long enough to give high selectivity, but not too long (to save space).
  • Calculate the selectivity of the full column and choose a prefix whose selectivity is close to it.

For a worked example, see pages 154-157 of High Performance MySQL.

Clustering

Clustering means that rows with adjacent key values are stored close together.


Leftmost matching principle

When you use a composite (joint) index in MySQL, the leftmost prefix matching principle applies: matching starts from the leftmost column of the index.

Example: create a composite index on columns col1, col2, and col3:

CREATE INDEX test_col1_col2_col3 ON test (col1, col2, col3);

The index test_col1_col2_col3 effectively serves as three indexes: (col1), (col1, col2), and (col1, col2, col3).

SELECT col1, col2, col4 FROM test WHERE col1 = '1' AND col2 = '2' AND col4 = '4';

Following the leftmost prefix matching principle, the query above uses the (col1, col2) part of the index to match rows; col4 is not in the index, so its condition is checked afterwards.
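The rule can be sketched as a tiny function that walks the index columns from the left and stops at the first column without an equality condition. Simplifying assumption: only equality conditions are considered (a range condition would let MySQL use that column but stop matching right after it).

```python
def usable_prefix(index_cols, eq_cols):
    """Leading index columns usable for equality conditions (leftmost prefix rule)."""
    used = []
    for col in index_cols:
        if col not in eq_cols:
            break  # the first gap stops the match
        used.append(col)
    return used

index_cols = ["col1", "col2", "col3"]
print(usable_prefix(index_cols, {"col1", "col2", "col4"}))  # ['col1', 'col2']
print(usable_prefix(index_cols, {"col2", "col3"}))          # []
```

The second call shows why a condition that skips col1 cannot seek into this index at all.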


Index condition push down (ICP)

Index condition pushdown (ICP) is an optimization introduced in MySQL 5.6. While traversing an index, MySQL first evaluates the conditions on columns contained in the index and filters out entries that do not match, which reduces the number of lookups back to the table.

From the official MySQL manual:

The goal of ICP is to reduce the number of full-record reads and thereby reduce IO operations. For InnoDB clustered indexes, the complete record is already read into the InnoDB buffer. Using ICP in this case does not reduce IO.


Take a (name, is_del) composite index and a query for users whose name starts with Zhang that keeps only rows that have not been deleted. Without ICP, InnoDB finds every index entry whose name starts with Zhang, goes back to the table for each one, and only then filters on is_del. With ICP, InnoDB checks is_del inside the (name, is_del) index itself and skips logically deleted entries without going back to the table. In this example, four back-table lookups become two.

(Figures: execution flow without index condition pushdown; execution flow with index condition pushdown.)
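The difference in back-table lookups can be reproduced with a toy simulation. Assumptions: the (name, is_del, primary key) index entries are hypothetical, the table is a dict keyed by primary key, and the name-prefix test stands in for name LIKE 'Zhang%'; it mirrors the four-versus-two counts described above, not InnoDB internals.

```python
# Hypothetical (name, is_del, pk) entries of a (name, is_del) index, in index order.
index_entries = sorted([
    ("Zhang San", 0, 1), ("Zhang Si", 1, 2), ("Zhang Wu", 0, 3),
    ("Zhang Liu", 1, 4), ("Li Si", 0, 5),
])
table = {pk: {"name": name, "is_del": d} for name, d, pk in index_entries}

def run(icp):
    """Return (matching pks, back-table lookups) for name LIKE 'Zhang%' AND is_del = 0."""
    result, lookups = [], 0
    for name, is_del, pk in index_entries:
        if not name.startswith("Zhang"):  # index range on the name prefix
            continue
        if icp and is_del != 0:           # ICP: evaluate is_del inside the index
            continue
        lookups += 1                      # back to the table for the full row
        if table[pk]["is_del"] == 0:      # server-side WHERE filter
            result.append(pk)
    return result, lookups

print(run(icp=False))  # ([1, 3], 4)
print(run(icp=True))   # ([1, 3], 2)
```

Both runs return the same rows; pushdown only changes how many full rows are fetched.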

Note:

  • For InnoDB, index condition pushdown applies only to secondary indexes: the leaf nodes of InnoDB's primary key index already hold the full rows, so pushdown there would not reduce full-row reads.
  • Index condition pushdown is generally useful when the selected columns are not all in the composite index (so the query must go back to the table) and the WHERE clause has several conditions whose columns all belong to the composite index.

MySQL enables index condition pushdown by default. To turn it off, or back on:
SET optimizer_switch = 'index_condition_pushdown=off';
SET optimizer_switch = 'index_condition_pushdown=on';


SQL Example Learning


Conclusion

Practice is the mother of knowledge. High Performance MySQL is highly recommended!




Personal note

This post is the author's study notes on High Performance MySQL and will be removed upon any infringement claim. If you use it elsewhere, please credit the source!