This is the second day of my participation in Gwen Challenge

Preface


In interviews you will probably be asked how to optimize queries, and index-related strategies, such as how to choose which columns to index, are a key topic.

Here’s a detailed look at some of these indexing strategies from the book High Performance MySQL.

Independent columns

MySQL cannot use an index on a column unless the column is isolated (independent) in the query.

An “independent column” is one that is not part of an expression and not an argument to a function.

For example, suppose the student table has an index idx_age on the age column, and we query by age:

select * from student where age=20;

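MySQL can use idx_age for this query.

However, if the query is written like this:

select * from student where age+1=20;

idx_age is not used, because age is now part of an expression. MySQL does not solve age+1=20 for age, so write the condition as age=19 instead.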
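To see why the expression matters, here is a toy model in Python (not how InnoDB actually stores indexes): the index is a sorted list of (age, row_id) pairs. An equality condition on the bare column can use binary search to jump to the matching entries, while a condition hidden inside an expression has to be tested against every entry. The data and function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy model, not InnoDB internals: idx_age is a sorted list of (age, row_id).
ages = {1: 18, 2: 19, 3: 19, 4: 20, 5: 21}  # hypothetical row_id -> age
idx_age = sorted((age, rid) for rid, age in ages.items())


def seek_equal(index, value):
    """WHERE age = value: binary search jumps straight to the matching run."""
    lo = bisect_left(index, (value,))
    hi = bisect_right(index, (value, float("inf")))
    return [rid for _, rid in index[lo:hi]]


def scan_with_expression(index, predicate):
    """WHERE age + 1 = 20: the column is hidden inside an expression, so each
    entry is read and tested. Returns (matching row_ids, entries examined)."""
    matches = [rid for age, rid in index if predicate(age)]
    return matches, len(index)


# Rewriting age + 1 = 20 as age = 19 turns the full scan into a seek.
print(seek_equal(idx_age, 19))
print(scan_with_expression(idx_age, lambda age: age + 1 == 20))
```

Both calls find the same rows, but the scan has to examine every entry, which is exactly the cost MySQL pays when the column is not independent.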

Prefix index and index selectivity

Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (N), so it ranges from 1/N to 1.

For example, if 10 people have 5 different occupations, the selectivity of the occupation column is 5/10, or 1/2.

The higher the selectivity, the more rows MySQL can filter out with the index, and the more efficient the query. A unique index has a selectivity of 1, the best possible.
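The same calculation that count(distinct col)/count(*) performs can be sketched in Python. The occupation values below are hypothetical data for the 10-people example; like SQL's COUNT(DISTINCT), the sketch leaves NULLs (None) out of the distinct count but still counts them in the total.

```python
def selectivity(values):
    """count(distinct col) / count(*): distinct non-NULL values over all rows."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty table")
    distinct = {v for v in values if v is not None}  # COUNT(DISTINCT) skips NULLs
    return len(distinct) / len(values)


# Hypothetical data: 10 people with 5 different occupations.
occupations = ["teacher", "doctor", "teacher", "nurse", "chef",
               "doctor", "pilot", "chef", "nurse", "teacher"]
print(selectivity(occupations))  # 0.5
```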

Indexing only the first part (a prefix) of a column saves index space and can make the index more efficient, but it lowers the index's selectivity.

“How do I calculate the selectivity of the full column?”

For example, to calculate the selectivity of the age column in the student table:

select count(distinct age)/count(*) from student;

In the example in High Performance MySQL, the full column's selectivity is about 0.031, and a prefix whose selectivity comes close to that figure is considered good enough.

“How do I create prefix indexes?”

For example, to add a prefix index on the name column of student that indexes only the first n characters:

alter table student add key(name(n));

For BLOB, TEXT, or very long VARCHAR columns, you must use a prefix index, because MySQL does not allow indexing these columns to their full length.

To choose a suitable prefix length, calculate the selectivity of the full column, then pick a prefix whose selectivity is close to it. The prefix should be long enough to be selective, but short enough to save space.

“How do I determine n?”

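First, look at how the 2-character prefixes of the remark column are distributed:

select count(1) as count,left(remark,2) pre from student group by pre order by count desc;

Then calculate the selectivity of that prefix length:

select count(distinct left(remark,2))/count(*) from student;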

Try several prefix lengths in one query and compare them with the selectivity of the full column, for example:

select
count(distinct remark)/count(*) as full_col,
count(distinct left(remark,2))/count(*) as pre2,
count(distinct left(remark,3))/count(*) as pre3,
count(distinct left(remark,4))/count(*) as pre4,
count(distinct left(remark,5))/count(*) as pre5
from student;
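The search over prefix lengths can also be automated. This Python sketch mirrors count(distinct left(col, n))/count(*) (string slicing counts characters, as LEFT() does) and returns the shortest prefix whose selectivity is within a chosen tolerance of the full column's. The tolerance, the function names, and the sample remarks are assumptions for illustration, not values from the book.

```python
def full_selectivity(values):
    """count(distinct col) / count(*), ignoring NULLs in the distinct count."""
    return len({v for v in values if v is not None}) / len(values)


def prefix_selectivity(values, n):
    """count(distinct left(col, n)) / count(*)."""
    return len({v[:n] for v in values if v is not None}) / len(values)


def shortest_good_prefix(values, tolerance=0.01, max_len=255):
    """Smallest n whose prefix selectivity is within `tolerance` of the full
    column's selectivity, or None if no n up to max_len qualifies.
    tolerance and max_len are arbitrary illustrative choices."""
    values = list(values)
    if not values:
        raise ValueError("need at least one row")
    target = full_selectivity(values)
    for n in range(1, max_len + 1):
        if target - prefix_selectivity(values, n) <= tolerance:
            return n
    return None


# Hypothetical remark values.
remarks = ["good student", "good at math", "good at music", "late often",
           "late twice", "quiet", "quiet and kind", "active"]
for n in (2, 6, 10):
    print(n, prefix_selectivity(remarks, n))
print(shortest_good_prefix(remarks))
print(shortest_good_prefix(remarks, tolerance=0.2))
```

A looser tolerance gives a shorter (cheaper) prefix at the cost of some selectivity, which is the same trade-off the SQL comparison above asks you to judge by eye.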

Multi-column indexes

MySQL's index merge strategy is sometimes a useful optimization, but more often it means the table's indexes are poorly designed:

1. When the server intersects multiple indexes (usually for AND conditions), it usually means you need one multi-column index containing all the relevant columns, rather than several independent single-column indexes.

2. When the server unions multiple indexes (usually for OR conditions), buffering, sorting, and merging the results can use a lot of CPU and memory, especially if some of the indexes are not very selective.

For example, create a composite index on the age and name columns of the student table:

alter table student add key idx_age_name(age,name);

A query that filters on age can now use this index.

However, a query that filters only on name cannot use it to look up rows.

This is the leftmost prefix rule: MySQL can use a multi-column index for lookups only on a leftmost prefix of its columns.

“What is the leftmost prefix?”

For example, for an index on (a,b,c), the leftmost prefixes are (a), (a,b), and (a,b,c), so conditions on those columns can use the index. Conditions on b, c, or (b,c) alone cannot. For a condition on a and c, only the a part can use the index, because b is skipped.
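The leftmost prefix rule follows from how a multi-column index is sorted. In this toy Python model, the (a, b, c) index is a sorted list of tuples: a leftmost prefix selects one contiguous run that binary search can find, while a condition on b alone matches entries scattered through the whole index. The data and helper names are hypothetical, and a real B-tree descends its pages rather than building a key list.

```python
from bisect import bisect_left, bisect_right

# Toy composite index on (a, b, c): entries sorted by a, then b, then c.
index_abc = sorted([
    (1, "x", 10), (1, "y", 20), (2, "x", 30),
    (2, "y", 40), (3, "x", 50), (3, "z", 60),
])


def seek_prefix(index, prefix):
    """Condition on a leftmost prefix, e.g. (a) or (a, b): the matches form one
    contiguous run, found by binary search on the leading len(prefix) columns.
    (The key list is built here only to keep the sketch short.)"""
    k = len(prefix)
    keys = [entry[:k] for entry in index]
    return index[bisect_left(keys, prefix):bisect_right(keys, prefix)]


def scan_column(index, position, value):
    """Condition on a non-leading column alone, e.g. b = 'x': the matches are
    scattered across the index, so every entry must be checked."""
    return [entry for entry in index if entry[position] == value]


def seek_a_filter_c(index, a, c):
    """WHERE a = ? AND c = ?: only a narrows the range; c is tested per entry
    because b, which sits between them, is not constrained."""
    return [entry for entry in seek_prefix(index, (a,)) if entry[2] == c]


print(seek_prefix(index_abc, (2,)))
print(seek_prefix(index_abc, (2, "y")))
print(scan_column(index_abc, 1, "x"))
print(seek_a_filter_c(index_abc, 3, 60))
```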

Select the appropriate index column order

The correct order depends on the queries that use the index and how best to meet the sorting and grouping requirements.

In a multi-column B-tree index, entries are sorted by the leftmost column first, then by the second column, and so on. The index can therefore be scanned in ascending or descending order to satisfy ORDER BY, GROUP BY, and DISTINCT clauses that match the column order exactly.

Rule of thumb:

Place the most selective column first in the index. This works well when there is no sorting or grouping to consider.
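Applying the rule of thumb amounts to computing each candidate column's selectivity and sorting by it. Here is a minimal Python sketch with made-up student rows; it deliberately ignores sorting, grouping, and the real query workload, which also matter when choosing the order.

```python
def column_selectivity(rows, column):
    """count(distinct column) / count(*) over a list of dict rows."""
    values = [row[column] for row in rows]
    return len({v for v in values if v is not None}) / len(values)


def order_by_selectivity(rows, columns):
    """Rule of thumb only: most selective column first. Ignores ORDER BY,
    GROUP BY, and how often each column is actually queried."""
    return sorted(columns, key=lambda col: column_selectivity(rows, col), reverse=True)


# Hypothetical student rows.
students = [
    {"name": "Li", "age": 20, "gender": "F"},
    {"name": "Wang", "age": 21, "gender": "M"},
    {"name": "Zhang", "age": 20, "gender": "M"},
    {"name": "Liu", "age": 22, "gender": "F"},
    {"name": "Chen", "age": 21, "gender": "F"},
    {"name": "Zhao", "age": 22, "gender": "M"},
]
print(order_by_selectivity(students, ["gender", "age", "name"]))
```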

Clustered indexes

A clustered index is not a separate type of index but a way of storing data. In InnoDB, the clustered index stores the B-tree index and the rows together in the same structure.

When a table has a clustered index, its rows are stored in the index's leaf pages, so rows with adjacent key values are stored close together. A row cannot be stored in two places at once, so a table can have only one clustered index.

Leaf pages contain the full rows, while non-leaf (node) pages contain only the indexed columns.

InnoDB clusters the data by the primary key. If no primary key is defined, InnoDB uses the first unique index whose columns are all NOT NULL instead. If there is no such index, InnoDB creates a hidden row ID and clusters the table on that.

Advantages

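(1) Related data can be kept together, so reading rows with nearby key values needs fewer disk reads.

(2) Data access is faster. The index and the data live in the same B-tree, so once the clustered index finds a key, the row is already there and no separate table lookup is needed.

(3) Queries that use covering index scans can use the primary key values stored in the leaf nodes directly.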

Disadvantages

(1) Clustering helps most in I/O-bound workloads. If all the data fits in memory, the order in which it is accessed matters much less, and clustering brings little benefit.

(2) Insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If you load rows out of primary key order, it is best to reorganize the table with OPTIMIZE TABLE afterward.

(3) Updating clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.

(4) Tables built on a clustered index are subject to page splits when new rows are inserted, or when updating a row's primary key forces the row to move. When a row's key value means it must go into a page that is already full, the storage engine splits the page into two to make room, so the table uses more storage space.

(5) Full table scans can be slower, especially when rows are sparsely packed or stored non-sequentially because of page splits.

(6) Secondary (nonclustered) indexes can be larger than you expect, because their leaf nodes contain the primary key columns of the rows they reference.

(7) Accessing a row through a secondary index requires two index lookups. A secondary index leaf node does not store a pointer to the row's physical location; it stores the row's primary key value. InnoDB first finds that primary key in the secondary index, then uses it to look up the row in the clustered index.
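The two-step lookup can be sketched with a toy Python model of the InnoDB layout: the clustered index keeps the full row next to each primary key, and a secondary index on name keeps only (name, primary key) pairs. The table contents and function names are hypothetical, and real InnoDB pages are far more complex.

```python
from bisect import bisect_left

# Toy model, not the real InnoDB page format.
# Clustered index: sorted by primary key, full row stored with each key.
clustered = [
    (1, {"id": 1, "name": "Ann", "age": 20}),
    (2, {"id": 2, "name": "Bob", "age": 21}),
    (3, {"id": 3, "name": "Cid", "age": 20}),
]
primary_keys = [pk for pk, _ in clustered]

# Secondary index on name: (name, primary key) pairs, no row data.
idx_name = sorted((row["name"], pk) for pk, row in clustered)


def find_by_pk(pk):
    """One lookup: the clustered index leaf already holds the whole row."""
    i = bisect_left(primary_keys, pk)
    if i < len(primary_keys) and primary_keys[i] == pk:
        return clustered[i][1]
    return None


def find_by_name(name):
    """Two lookups: the secondary index yields primary keys, then each key
    is looked up again in the clustered index to fetch the row."""
    rows = []
    i = bisect_left(idx_name, (name,))
    while i < len(idx_name) and idx_name[i][0] == name:
        rows.append(find_by_pk(idx_name[i][1]))
        i += 1
    return rows


print(find_by_pk(2))
print(find_by_name("Cid"))
```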

(Figures: the primary key index and a secondary index)

Covering indexes

An index that contains (covers) the values of all the columns a query needs is called a covering index.

A query that only needs to read the index, without going back to the table, benefits in several ways:

(1) Index entries are usually much smaller than full rows, so reading only the index greatly reduces the amount of data accessed.

(2) Indexes are stored in order of their column values, so an I/O-bound range query needs far less I/O than fetching each row from a random location on disk.

(3) Some storage engines, such as MyISAM, cache only indexes in memory and rely on the operating system to cache the data. Reading the data then requires a system call, which can cause performance problems; a covering index avoids it.

(4) Covering indexes are especially useful for InnoDB tables because of the clustered index. InnoDB secondary indexes store the row's primary key values in their leaf nodes, so if a secondary index covers the query, the second lookup in the primary key index is avoided. Because the primary key values are in the leaf nodes, a secondary index can still cover a query that also selects the primary key columns.

When a query is answered by a covering index, EXPLAIN shows “Using index” in the Extra column.

For example, with the composite index on (age, name) created above, a query that selects only those two columns, such as select age,name from student where age=20, is covered by the index.
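Whether an InnoDB secondary index covers a query comes down to a set check, since its leaf entries hold the indexed columns plus the primary key. This is a minimal Python sketch; the id primary key on student is an assumption, and needed_columns must include every column the query touches (SELECT list, WHERE, ORDER BY, and so on).

```python
def is_covering(index_columns, primary_key, needed_columns):
    """True if an InnoDB secondary index alone can answer the query.
    Its leaf entries hold the indexed columns plus the primary key columns,
    so those are the only columns available without going back to the table."""
    available = set(index_columns) | set(primary_key)
    return set(needed_columns) <= available


age_name_index = ("age", "name")
pk = ("id",)  # assumed primary key of student

# select age, name from student where age=20
print(is_covering(age_name_index, pk, {"age", "name"}))
# select id, age, name from student where age=20
print(is_covering(age_name_index, pk, {"id", "age", "name"}))
# select age, name, remark from student where age=20
print(is_covering(age_name_index, pk, {"age", "name", "remark"}))
```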

Use index scans for sorting

MySQL can produce ordered results in two ways:

With a sort operation, or by scanning an index in order. If the type column in EXPLAIN shows index, MySQL is using an index scan to sort.

Scanning the index itself is fast, because it just moves from one index entry to the next. But if the index does not cover all the columns the query needs, MySQL has to go back to the table for the corresponding row each time it reads an index entry. That is mostly random I/O, so reading data in index order can be slower than a sequential full table scan.

MySQL can use the same index both to sort and to find rows.

An index can be used to sort results only if the order of its columns exactly matches the ORDER BY clause and all columns are sorted in the same direction (all ASC or all DESC).

If the query joins multiple tables, the index can be used for sorting only if every column in the ORDER BY clause comes from the first table.

The ORDER BY clause has the same limitation as lookups: it must use a leftmost prefix of the index. Otherwise, MySQL has to perform a sort.

There is one exception: ORDER BY does not need a leftmost prefix if the leading columns are fixed to constants. If the WHERE clause or a JOIN clause specifies constants for those columns, they fill the gap in the index prefix.

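For example, suppose table T has a multi-column index on (A, B, C). This query can use the index for sorting, because A is fixed to a constant:

select * from T where A='1' order by B,C

These queries can also sort with the index (leftmost prefix):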

select * from T where A='1' order by B desc


select * from T where A > '1' order by A desc,B desc 

“Queries that cannot use the index for sorting:”

Different sorting directions:

select * from T where A = '1' order by A desc,B asc

The ORDER BY clause references a column that is not in the index:

select * from T order by A desc,D desc

Not a leftmost prefix (B is skipped):

select * from T where A='1' order by C desc

A range condition, or an IN list, on a column before the ORDER BY columns:

select * from T where A>'1' order by B,C

select * from T where A='1' and B in ('a','b') order by C desc
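The sorting rules above can be condensed into a small Python check. This is a simplified model of the rules as stated here (same direction, leftmost prefix, leading columns pinned to constants may be skipped), not the MySQL optimizer, which handles more cases; the function name and argument format are made up.

```python
def can_sort_with_index(index_columns, order_by, constants=frozenset()):
    """Simplified check of the ORDER BY rules described above.

    index_columns: columns of the B-tree index in order, e.g. ("A", "B", "C")
    order_by:      list of (column, "ASC" or "DESC") pairs
    constants:     columns fixed to a single constant by WHERE/JOIN (col = const).
                   Range conditions and IN lists do NOT count as constants.
    """
    if not order_by:
        return True
    # Rule 1: every ORDER BY column must use the same direction.
    if len({direction.upper() for _, direction in order_by}) > 1:
        return False
    # Rule 2: ORDER BY must follow the index order as a leftmost prefix,
    # except that index columns pinned to a constant may be skipped.
    pos = 0
    for column, _ in order_by:
        while (pos < len(index_columns)
               and index_columns[pos] != column
               and index_columns[pos] in constants):
            pos += 1
        if pos == len(index_columns) or index_columns[pos] != column:
            return False
        pos += 1
    return True


idx = ("A", "B", "C")
print(can_sort_with_index(idx, [("B", "ASC"), ("C", "ASC")], {"A"}))   # A='1' order by B,C
print(can_sort_with_index(idx, [("A", "DESC"), ("B", "DESC")]))        # A>'1' order by A desc,B desc
print(can_sort_with_index(idx, [("A", "DESC"), ("B", "ASC")], {"A"}))  # mixed directions
print(can_sort_with_index(idx, [("C", "DESC")], {"A"}))                # B skipped
```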

When “Using where” appears in the Extra column of EXPLAIN, the server applies the WHERE condition to the rows after the storage engine returns them.

EXPLAIN shows type range for both range conditions and IN lists. However, with a range condition MySQL cannot use any index columns after the range column for lookups, whereas an IN list (a set of equality conditions) does not have that limitation. As the last example shows, an IN list still prevents using the index to sort on the later columns.

Redundant and duplicate indexes

Duplicate indexes are indexes of the same type created on the same columns in the same order.

A redundant index is an index whose columns are a leftmost prefix of an existing multi-column index. For example, if there is already an index on (A, B, C), also creating an index on (A) or (A, B) is redundant, because the (A, B, C) index can serve the same lookups.
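Spotting duplicate and redundant indexes is mechanical enough to sketch in Python. The index names below are hypothetical, every index is assumed to be an ordinary B-tree index of the same type, and the result is a list of candidates to review rather than indexes to drop blindly: for example, a UNIQUE index on (A) enforces a constraint that an index on (A, B, C) does not.

```python
def review_indexes(indexes):
    """Flag duplicate and redundant index candidates.

    indexes: dict of index name -> tuple of column names (all assumed to be
    ordinary B-tree indexes of the same type). Returns {name: reason}.
    """
    findings = {}
    # Compare against longer indexes first so the reason names the widest one.
    others = sorted(indexes.items(), key=lambda item: (-len(item[1]), item[0]))
    for name, cols in sorted(indexes.items()):
        for other, other_cols in others:
            if other == name:
                continue
            if other_cols == cols:
                if name > other:  # flag only one index of each duplicate pair
                    findings[name] = f"duplicate of {other}"
                    break
            elif other_cols[:len(cols)] == cols:
                findings[name] = f"redundant: leftmost prefix of {other}"
                break
    return findings


indexes = {
    "idx_abc": ("A", "B", "C"),
    "idx_abc_copy": ("A", "B", "C"),
    "idx_ab": ("A", "B"),
    "idx_a": ("A",),
    "idx_b": ("B",),
}
for name, reason in review_indexes(indexes).items():
    print(name, "->", reason)
```

idx_b is not flagged: (B) is not a leftmost prefix of (A, B, C), so a query filtering only on B still needs it.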

Unused indexes

Indexes that are never used only add overhead and should be dropped.

Indexes and locking

Indexes let queries lock fewer rows.

Locking rows has overhead, and locking more rows than needed increases lock contention and reduces concurrency.