
Mysql > select * from ‘Mysql’;

Preface

In the previous section, we covered MySQL clustered indexes and the rules MySQL uses to match a query against an index. The most important of these is the leftmost-prefix matching rule, from which many of the other rules are derived, so it deserves close attention. The remaining rules only need to be understood.

Learning Content:

  1. How to design indexes
  2. How to avoid common index design pitfalls
  3. How to make your queries use an index 100% of the time

Suggestions for building indexes

Here are some tips for everyday index design:

  • Index the fields that are frequently queried or sorted
  • Fields with more distinct values benefit more from an index
  • Prefer indexing fields with small types, such as tinyint, char, and so on
  • Use an auto-increment primary key instead of a UUID whenever possible
  • Do not design too many indexes
  • If a range condition is used, the index columns after it generally cannot be used to narrow the search, so place the range column at the far right of the index.
  • Only the first range condition can use the index; a second one cannot, so it is best to have only one range condition per query
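The last two tips can be sketched as follows. This is a minimal illustration, assuming a hypothetical `orders` table; the table, column, and index names are not from the original.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE orders (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- auto-increment key instead of a UUID
    user_id    BIGINT UNSIGNED NOT NULL,
    status     TINYINT         NOT NULL,                 -- small type
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id),
    KEY idx_user_status_created (user_id, status, created_at)  -- range column last
) ENGINE = InnoDB;

-- Equality on the leading columns, a single range condition on the last column:
-- all three index columns can take part in narrowing the search.
SELECT id, status, created_at
FROM orders
WHERE user_id = 42
  AND status = 1
  AND created_at >= '2024-01-01';

-- Two range conditions: the index search is bounded by user_id and the range on status;
-- the condition on created_at cannot narrow the index range any further.
SELECT id, status, created_at
FROM orders
WHERE user_id = 42
  AND status > 0
  AND created_at >= '2024-01-01';
```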

Index usage issues:

Paging and sorting

In the previous section we discussed that many pagination and sorting queries cannot use an index, because a composite index must be matched starting from its leftmost column.

Case study:

For example, when we query by province, city, and gender, different requests filter on different combinations of fields, so in many cases the leftmost-prefix rule cannot be applied.

Solutions:

It is better to place commonly filtered fields, such as province, city, and gender, at the far left of the composite index. The index can then include other fields after them, so that most queries can filter their WHERE conditions directly through the index tree.

Suggestion: when designing the index, order its fields so that queries match them from left to right with equality conditions, and let the last field handle the range condition. That way the whole query can use the index.
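A minimal sketch of this layout, assuming a hypothetical `user_profile` table (the names and values are illustrative, not from the original):

```sql
-- Hypothetical table for illustration only.
CREATE TABLE user_profile (
    id       BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
    province VARCHAR(20)      NOT NULL,
    city     VARCHAR(20)      NOT NULL,
    gender   TINYINT          NOT NULL,
    age      TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    -- Equality-filtered fields on the left, the range field (age) last.
    KEY idx_province_city_gender_age (province, city, gender, age)
) ENGINE = InnoDB;

-- Matches the index from left to right; only the last column is a range.
EXPLAIN
SELECT id
FROM user_profile
WHERE province = 'Zhejiang'
  AND city = 'Hangzhou'
  AND gender = 1
  AND age BETWEEN 18 AND 25;

-- Skips province, so the leftmost-prefix rule does not apply and
-- MySQL cannot seek directly into the index tree for this condition.
EXPLAIN
SELECT id
FROM user_profile
WHERE city = 'Hangzhou'
  AND gender = 1;
```

Comparing the two EXPLAIN results on real data is a quick way to check whether a query actually benefits from the index order.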

MySQL Execution Plan

What are the execution costs?

The first thing to understand is the cost of an execution plan. When estimating cost, MySQL assigns 0.2 to checking whether one row satisfies the conditions (CPU cost) and 1.0 to reading one data page from disk into memory (I/O cost). These constants are configurable and can differ between MySQL versions.

Run SHOW TABLE STATUS LIKE 'table name' to see the table statistics. For InnoDB, Rows is an estimate. The relevant values are:

  • Rows: the number of records in the table
  • Data_length: the size of the clustered index in bytes

How to calculate page count:

  • Data_length divided by 1024 gives the size in KB; dividing that by 16 (the default page size is 16KB) gives the number of pages

  • I/O cost = number of data pages * 1.0 + adjustment value; CPU cost = number of rows * 0.2 + adjustment value
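The same arithmetic can be sketched as a query against information_schema.tables, which exposes the same statistics as SHOW TABLE STATUS. This assumes the default 16KB page size and the 0.2 / 1.0 constants quoted above, uses `t1` as a placeholder table name, and leaves out the adjustment values, so it only approximates what the optimizer computes.

```sql
-- Rough full-table-scan cost estimate for table t1 in the current database.
-- Assumptions: 16KB pages, CPU cost 0.2 per row, I/O cost 1.0 per page,
-- adjustment values ignored.
SELECT
    table_name,
    table_rows                                        AS estimated_rows,
    data_length / 1024 / 16                           AS data_pages,
    data_length / 1024 / 16 * 1.0                     AS io_cost,
    table_rows * 0.2                                  AS cpu_cost,
    data_length / 1024 / 16 * 1.0 + table_rows * 0.2  AS full_scan_cost
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 't1';
```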

Index access cost estimation

  1. First estimate how many ranges the condition covers, for example a range condition on NAME. A single range is generally treated as costing one data page read.
  2. Assume the secondary index is estimated to return 100 records. Multiplying by 0.2 gives 20, which is the CPU cost of reading the secondary index.
  3. The secondary index then requires a back-to-table operation: for each matching entry, MySQL goes back to the clustered index to look up the full row.

Common optimization methods:

Constant replacement

Take the query select * from t1 join t2 on t1.x1 = t2.x1 and t1.id = 1. Because t1.id = 1 identifies a single row through the primary key, the optimizer can read that row first and replace t1.x1 with the constant value it holds. The join then effectively becomes a search in t2 for rows whose x1 equals that constant.
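Conceptually, the rewrite looks like the sketch below. It assumes `id` is the primary key of t1; the literal 'value_from_step_1' is a placeholder for whatever the first lookup returns, not a real value.

```sql
-- The original join from the text.
EXPLAIN
SELECT *
FROM t1
JOIN t2 ON t1.x1 = t2.x1 AND t1.id = 1;

-- What the optimizer does conceptually:
-- 1) read the single t1 row through the primary key
SELECT x1 FROM t1 WHERE id = 1;

-- 2) treat that x1 value as a constant when searching t2
SELECT * FROM t2 WHERE x1 = 'value_from_step_1';  -- placeholder constant
```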

Subqueries

A query with a subquery is generally executed as two statements: the subquery first, then the outer query using its result. For a query such as select * from t1 where x1 in (select x2 from t2 where x3 = xxx), the execution plan will be optimized to execute the subquery first, i.e. select x2 from t2 where x3 = xxx, and write all of its rows to a temporary table. This is also called a materialized table, meaning that the intermediate result set is materialized.

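Semi-join optimization

The optimizer may convert such an IN subquery into a semi-join, which can be written conceptually as select t1.* from t1 semi join t2 on t1.x1 = t2.x2 and t2.x3 = xxx. SEMI JOIN is not syntax you can type in MySQL; it describes the internal strategy. With a semi-join, a row of t1 is returned once if any row of t2 satisfies the ON condition, no matter how many rows match.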
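A small sketch of the statements involved, using the t1 and t2 tables from the text with 'xxx' as a placeholder value. Which strategy the optimizer picks (materialization, a semi-join strategy, or something else) depends on the MySQL version and the table statistics, so treat the comments as things to look for rather than guaranteed output.

```sql
-- IN subquery: check EXPLAIN for a MATERIALIZED row or a semi-join
-- strategy in the plan; the optimizer may choose either.
EXPLAIN
SELECT *
FROM t1
WHERE x1 IN (SELECT x2 FROM t2 WHERE x3 = 'xxx');

-- A hand-written query with the same semi-join meaning:
-- each t1 row appears at most once, however many t2 rows match.
SELECT t1.*
FROM t1
WHERE EXISTS (
    SELECT 1
    FROM t2
    WHERE t2.x2 = t1.x1
      AND t2.x3 = 'xxx'
);
```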

The relationship between execution plans and SQL statements: simple single-table queries can usually use an index, but SQL that involves statistics, aggregation, or functions often cannot, which slows down the whole query.

Performance indicators:

Here are the basic access methods:

  1. An equality query on the primary key uses the CONST access method.

  2. For a secondary index lookup to be CONST, the index must be UNIQUE. However, if the condition is IS NULL, the REF method is still used.

  3. In addition, if the selected columns all belong to a composite index but the WHERE condition does not match its leftmost prefix, MySQL can still scan the leaf nodes of that index directly.

  4. REF_OR_NULL: used for conditions such as select * from table where name = x or name IS NULL

  5. An ordinary (non-unique) INDEX is queried by REF, for example an equality condition on the leading columns of INDEX(NAME, AGE).

  6. The RANGE access method is used for range queries.

  7. When the data you want can be obtained from the secondary index alone, without going back to the clustered index, the access method is called INDEX. INDEX requires traversing a secondary index, but because the secondary index is small, traversal performance is not bad.

Now let's pause for a moment and think about it. Const, ref, and range all locate data by descending the index tree level by level with binary search, so their performance is generally very high. Index is a bit slower: it traverses the leaf nodes of a secondary index tree, which is certainly slower than a binary search through the index tree, but it is still better than a full table scan.
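To see these access methods side by side, one option is to run EXPLAIN against a small test table. The `person` table below is hypothetical, and the trailing comments name the access method the text associates with each query. The type column you actually get depends on the data and the MySQL version.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE person (
    id     BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
    id_no  CHAR(18)         NOT NULL,
    name   VARCHAR(50)      NULL,
    age    TINYINT UNSIGNED NOT NULL,
    remark VARCHAR(200)     NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_id_no (id_no),
    KEY idx_name_age (name, age)
) ENGINE = InnoDB;

EXPLAIN SELECT * FROM person WHERE id = 1;                         -- const (primary key)
EXPLAIN SELECT * FROM person WHERE id_no = '110101199001011234';   -- const (unique index)
EXPLAIN SELECT * FROM person WHERE name = 'Tom';                   -- ref (ordinary index)
EXPLAIN SELECT * FROM person WHERE name = 'Tom' OR name IS NULL;   -- ref_or_null
EXPLAIN SELECT * FROM person WHERE name >= 'A' AND name < 'B';     -- range
EXPLAIN SELECT name, age FROM person WHERE age = 30;               -- index (scan of idx_name_age)
```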

Driving table and driven table

Driving table: the table that the join filters first, usually the table listed first.

Driven table: the table whose rows are looked up using the data found in the driving table, which is why it is called the driven table.

Driving rules

Nested-loop rule: assume that 10 rows are found in the driving table. The matching rows in the driven table are found through fields of the driving table, so the driven table is searched once for every row found in the driving table.

For example, if the driving table yields 10 rows, the driven table is scanned in full 10 times.
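As a rough sketch, assume t1 is filtered first and acts as the driving table. The column t1.x2, the value 'xxx', and the index name are hypothetical, and the optimizer itself decides which table actually drives the join. Adding an index on the driven table's join column is a common follow-up rather than something stated in the text.

```sql
-- t1 is filtered by its own condition, then each remaining row
-- triggers one lookup in t2 on the join column.
EXPLAIN
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t1.x1 = t2.x1
WHERE t1.x2 = 'xxx';

-- With no usable index on t2.x1, each lookup is a full scan of t2.
-- An index on the join column lets each lookup go through the index tree instead.
CREATE INDEX idx_t2_x1 ON t2 (x1);
```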

The EXPLAIN plan

Basic field format:

First, you need to understand the basic format of EXPLAIN

  1. id: each SELECT has an id. A complex query that involves several tables, such as one with JOIN or IN, produces several rows in the plan; tables joined within the same SELECT share one id

  2. select_type: the type of the query

  3. table: the table name

  4. partitions: the partitions the query touches, for partitioned tables

  5. type: the access method of the query, for example const (a constant lookup, typically through the primary key), index (a scan of a secondary index), and ALL (a full table scan)

  6. possible_keys: related to type; lists the indexes that could be used for this query

  7. key: the index actually chosen from the available options

  8. key_len: the length of the index that is used

  9. ref: the value or column matched against the index during equality matching

  10. rows: an estimate of how many rows will be read through the index or by other means

  11. filtered: the estimated percentage of rows that remain after filtering by the search conditions

  12. Extra: additional information. It matters less than the fields above, apart from a few values such as Using filesort.

Here is a simple example:

explain select * from (select x1, count(*) as cnt from t1 group by x1) as _t1 where cnt > 10;

In the query result, the row for the subquery has the select_type DERIVED.

DERIVED: indicates that the result of the subquery, here the group aggregation on x1, is materialized into an internal temporary table, and the outer query then searches that temporary table.

About query levels:

  1. const: typically for primary key queries
  2. ref: queries based on a secondary index
  3. eq_ref: in a join query, the driven table is looked up through its primary key or a unique index, one row per lookup
  4. ref_or_null: like ref, but NULL values in the secondary index are matched as well
  5. index_merge: a single query may fetch data through multiple indexes and then merge the results
  6. range: a range query on an index

Using filesort

This appears when sorting, especially in paged sorting queries, and should be avoided: a sort that cannot use an index is very slow, because the rows have to be sorted in memory or, for larger result sets, in temporary files on disk.
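A minimal sketch of the difference, assuming a hypothetical `article` table (all names here are illustrative). Whether Using filesort actually shows up in the Extra column depends on the data and the MySQL version.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE article (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    category_id  INT             NOT NULL,
    published_at DATETIME        NOT NULL,
    title        VARCHAR(200)    NOT NULL,
    PRIMARY KEY (id),
    KEY idx_category_published (category_id, published_at)
) ENGINE = InnoDB;

-- The sort column follows the equality column in the index,
-- so rows can be read in index order without a separate sort.
EXPLAIN
SELECT id, published_at
FROM article
WHERE category_id = 3
ORDER BY published_at DESC
LIMIT 20;

-- title is not part of the index, so the matching rows have to be sorted
-- after they are read; this is where Using filesort is expected.
EXPLAIN
SELECT id, title
FROM article
WHERE category_id = 3
ORDER BY title
LIMIT 20;
```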

Using temporary

Similar to filesort: MySQL builds a temporary table to hold intermediate results, which becomes expensive when the amount of data is large.

Conclusion

The key point: use one or two composite indexes over several fields to cover more than 80% of your queries, then use one or two auxiliary indexes to cover the remaining 20% of atypical queries. With more than 99% of your queries making full use of an index, you can ensure their speed and performance.

Other

Table design for "logged in within the last 7 days"

Filtering on users who logged in within the last 7 days is a fairly common small requirement. The simplest approach is not a range query on the login time, but an extra flag field that records whether the user logged in within the last 7 days, refreshed periodically by a scheduled task. If you want such a query to use an index, you can design a composite index as follows: (province, city, sex, hobby, character, does_login_in_latest_7_days, age). When searching, add the equality condition does_login_in_latest_7_days = 1 and keep age as the only range condition, so that the whole query can use the index.
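One way this design could look is sketched below. It assumes a hypothetical `users` table that already has the columns province, city, sex, hobby, character, age, and a last_login_time DATETIME column; the table name, last_login_time, the index name, and the sample values are all assumptions. CHARACTER is a reserved word in MySQL, so the column name is quoted with backticks.

```sql
-- Flag column that replaces a range condition on the login time.
ALTER TABLE users
    ADD COLUMN does_login_in_latest_7_days TINYINT NOT NULL DEFAULT 0;

-- Run periodically by a scheduled task to refresh the flag.
UPDATE users
SET does_login_in_latest_7_days =
    IF(last_login_time >= NOW() - INTERVAL 7 DAY, 1, 0);

-- Composite index from the text: equality fields first, age (the range field) last.
ALTER TABLE users
    ADD KEY idx_user_search
        (province, city, sex, hobby, `character`, does_login_in_latest_7_days, age);

-- Every condition except age is an equality, so the query matches
-- the index from left to right and ends with a single range.
SELECT id
FROM users
WHERE province = 'Zhejiang'
  AND city = 'Hangzhou'
  AND sex = 1
  AND hobby = 'reading'
  AND `character` = 'outgoing'
  AND does_login_in_latest_7_days = 1
  AND age BETWEEN 20 AND 30;
```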

A case study for designing auxiliary indexes

For queries that the main composite index cannot serve, add an auxiliary index, for example a secondary index that speeds up a specific sorting or filtering operation.

Final words

This last part on indexes is closely tied to the execution plan. The best way to optimize MySQL queries is to learn to read the MySQL EXPLAIN plan, which is a very powerful tool.