MySQL index invalidation

preface

This note is learned from the advanced MySQL database in Silicon Valley

DQL, DML, DDL statements are not included in this course. If necessary, please refer to the video mysql_Basics

My mysql version is

Server version: 5.5.62-log MySQL Community Server (GPL)

Course introduction

This course is aimed at Java developers, not fully suited to operations and database engineers

1. Introduction of mysql architecture

2. Index optimization analysis

3. Query interception analysis

Mysql Logical Architecture

Mysql is a bit different from other databases in that its architecture can be used and used well in many different scenarios

This is mainly reflected in the architecture of the storage engine

Plug-in storage engine architecture separates query processing from other system tasks and data storage and extraction

In this architecture, you can select an appropriate storage engine based on service requirements

This section describes the architecture level and functions of mysql

1. Connection layer

At the top layer are some client and connection services, including local SOCK communication and most TCP/IP-like communication implemented based on client/server tools

Mainly complete some similar connection processing, authorization and authentication, and related security schemes

On this layer, the concept of thread pools is introduced to provide threads for clients that are securely accessed through authentication.

SSL – based secure links can also be implemented at this layer. The server also validates that it has operational permissions for each client that is securely connected

2. Service layer

The second layer architecture mainly completes the core service functions, such as SQL interface, and completes the cache query, SQL analysis and optimization and the execution of part of the built-in functions. All cross-storage engine functions are also realized in this layer, such as procedures and functions.

In this layer, the server parses the query and creates the corresponding internal parse tree, and performs the corresponding optimization such as determining the order of the query table, whether to utilize the index, and finally generates the corresponding execution operations.

In the case of a SELECT statement, the server also queries the internal cache. If the cache space is large enough, it can greatly improve system performance in an environment where a large number of read operations are handled.

3. Engine layer

Storage engine layer, storage engine is really responsible for the storage and extraction of data in MySQL, server through API communication with storage engine. Different storage engines have different functions, so you can select them according to your actual needs.

The MyISAM and InnoDB engines that are used a lot will be discussed later

4. Storage layer

The data storage layer stores data on file systems running on raw devices and interacts with storage engines.

Introduction to Storage Engines

The storage engine is responsible for storing and extracting data from mysql

We use SQL command to view

mysql> show engines;
mysql> show variables like '%storage_engine%'
Copy the code

The show engines command writes out the current engine and a brief description of it

The show variables like ‘%storage_engine%’ command shows innoDB as the default engine

You can see that the most common **InnoDB engine in our current version is the default **, which is described as

Supports transactions, row-level locking, and foreign keys

Namely: support for transactions, row locks, and foreign keys

Contrast MyISAM with InnoDB

Compare the item MyISAM InnoDB
The main foreign key Does not support support
The transaction Does not support support
Line table lock Table locks. Operating on even one record locks the entire table,Not suitable for high concurrency! Row locks. Lock one row, not the other rows,Suitable for highly concurrent operations
The cache Only indexes are cached, not real data Caching not only indexes but also real data requires a lot of memory, and memory size has a decisive impact on performance
Table space small big
concerns Performance, The transaction,
The default installation Y Y

Talk about SQL

Execution order

For the SQL we developers write, yes

select distinct
	<select_list>
from
	<left_table> <join_type>
join <right_table> on <join_condition>
where
	<where_condition>
group by
	<group_by_list>
having
	<having_condition>
order by
	<order_by_condition>
limit
	<limit_number>
Copy the code

For mysql services, parsing starts from

from
	<left_table>
on <join_condition>
<join_type> join <right_table>
where
	<where_condition>
group by
	<group_by_list>
having
	<having_condition>
select
distinct
	<select_list>
order by
	<order_by_condition>
limit
	<limit_number>
Copy the code

7. Join theory

  • == inner join == : inner is used to look up intersection, that is, common data in multiple tables

  • == left join == : Left join overwrites the left table, and can check the left table and intersection, or exclude the data existing in both the left and right tables

  • == right join == : Right join overwrites the right table. You can check the right table and the intersection or exclude the data that exists in both the left and right tables

  • Select * from outer select * from outer select * from outer select * from outer select * from outer select * from outer

    • Note: MySQL does not support FULL OUTER JOIN, but we can union the results of left and right joins

    • The MySQL UNION operator is used to join the results of more than two SELECT statements into a result set

      Multiple SELECT statements delete duplicate data (you can use UNION ALL to query duplicate data)

      SELECT name FROM name UNION [ALL] SELECT name FROM name ORDER BY name;

Talk about the index

What is an index?

The official definition of an index is: an index is a data structure that helps mysql obtain data efficiently

Can be understood as: a kind of ordered efficient lookup data structure

Whether innoDB engine in mysql is B tree or B+ tree is controversial on the Internet. Let’s check the mysql development manual

As you can see, it’s mentioned in the development manual

Everyone has seen a B-tree and knows that the entries in the root page point to the leaf pages. (I indicate those pointers with vertical ‘|’ bars in the drawing.) But sometimes people miss the detail that leaf pages can also point to each other (I indicate those pointers with a horizontal two-way pointer ‘<–>’ in the drawing). This feature allows InnoDB to navigate from leaf to leaf without having to back up to the root level. This is a sophistication which you won’t find in the classic B-tree, which is why InnoDB should perhaps be called a B+-tree instead.

Everyone has seen a B tree and knows that entries in the root page point to leaf pages. Vertical (I was in a diagram “|” said the pointer.) But sometimes people overlook the fact that pages can also point to each other (I’ve represented these Pointers in the graph as horizontal bi-directional Pointers “<–>”). This feature allows InnoDB to navigate from one leaf to another without backing up to the root level. This is a complexity not found in a classic B-tree, which is why InnoDB should be called a B+ tree.

Mysql indexes also use B+ trees

Mysql > create index BTREE (BTREE)

The purpose of an index is to improve the efficiency of queries. It can be likened to a catalogued book or dictionary

If there is no index, a full table scan will be performed, which is very inefficient when the data volume is large!

Advantages and disadvantages of indexes

advantages

  • Fast query, improve retrieval efficiency, reduce I/O costs
  • Sorting data reduces the cost of sorting data and CPU consumption

disadvantages

  • An index is also a file. The table holds primary keys and index fields and points to records in the entity table, so index columns also take up space
  • Although indexing greatly improves retrieval efficiency, it doesReduce the efficiency of updating tables
    • When updating a table, mysql not only saves the data, but also updates the index file with each update
  • Indexes can be a factor in improving efficiency, but if mysql has a large number of tables, you need to spend time researching indexes or optimizing queries that best meet your business needs

Indexes are usually too large to be stored in memory, so they are often stored on disk as index files

Since indexing is about sorting and finding, it affects where lookup and order by sorting in SQL

The index classification

Single value index

  • An index contains only one column
  • There can be multiple single-valued indexes in a table

This is applicable when a single column query is in high demand

The composite index

A composite index is a multi-column index created by combining multiple columns. It follows the left-most prefix rule when used

Left-most prefix rule: Indexes are used only when the first field of a composite index is used in a query condition

The only index

The value of the indexed column must be unique, allowing null values

Define the index as UNIQUE with the keyword UNIQUE

The primary key index

A primary key index is a special unique index, and a null value ** is not allowed

The basic grammar

create

create [unique] index Indicates the index nameonTable name (column name (length));alter tableThe name of the tableaddIndex [index name] (column name (length))Copy the code
1,PRIMARYKEY (primary KEY index)ALTER TABLE table_name ADD PRIMARY KEY ( column )

2,UNIQUE(Unique index)ALTER TABLE table_name ADD UNIQUE (column)

3INDEX(common INDEX)ALTER TABLE table_name ADD INDEX index_name ( column )

4, FULLTEXT(FULLTEXT index)ALTER TABLE table_name ADD FULLTEXT ( column )

5, multi-column indexALTER TABLE table_name ADD INDEX index_name ( column1, column2, column3 )
Copy the code

delete

drop index [indexName] on tableName;
Copy the code

To view

show index from tableName
Copy the code

Index structure

BTree index

B tree here

B + tree here

The light blue block in the background is what we call a disk block

You can see that each disk block contains several data items (dark blue blocks) and Pointers (yellow blocks),

For example, disk block 1 contains item 17 and 35, and Pointers P1, P2, and P3

B tree features: Each node has data and Pointers

  • The node ordering
  • A node can hold more than one meta, and the meta is sorted
  • Each node stores keys and data

B+ tree features: Data only exists in leaf nodes

  • It has the characteristics of a B tree

  • All Pointers on non-leaf nodes are index Pointers. Leaf nodes store all elements and indexes in a sorted order

  • There are Pointers between the leaf nodes

    • Sequential access Pointers were added, with each leaf node adding a pointer to adjacent leaf nodes
  • Only leaf nodes store data, and leaf nodes contain all indexes and data for the tree

In the B + tree

Real data exists in the leaf nodes, namely 5,8,9,10,15,18,20,26,27,28,30,33,35,38,50,56,60,63,65,73,79,80,85,88,90,96,99

Non-leaf nodes do not store real data, but only the data items that guide the search direction. For example, 5, 28 and 65 do not really exist in the data table

A Hash index

Because of the structure of the hash index, it is directly used in each query, unlike b-tree. Therefore, hash indexes are more efficient than B-tree indexes. However, hash also has the following disadvantages:

  • Since it stores hash values, only <=> and in operations are supported
  • The hash index cannot be sorted by manipulating the index. This is because the hash value calculated during storage is not necessarily the same as the stored one. Therefore, the hash index cannot be sorted
  • There is no way to index parts in a composite
  • A full table scan cannot be avoided because the memory table supports hash indexes with non-unique values, that is, different index keys that may have the same hash value
  • Hash indexes become inefficient when there are many identical hash values

Full – text index

Full text is only supported by myISam in mysql, and only char, vARCHar, and text data types are supported in mysql

Full-text is used to replace like “%***%”

R – Tree indexes

R-tree is rarely used in mysql. It supports only the geometry data type. Only myISAM, BDB, InnoDB, NDB, and Archive support this type of storage engine.

Compared with B-tree, R-tree has the advantage of range search.

Whether an index should be created

Which situations need to be created

  • The primary key automatically creates a unique index, the default name isPRIMARY
  • Fields frequently used as query criteria
  • A foreign key field associated with another table as a query condition
  • Single key index/compound index? Compound indexes are recommended for high concurrency
  • A sorted field in a query that is accessed by index to speed sorting
  • Select * from group by; select * from group by; select * from group by

Which cases do not need to be created

  • The table has too few entries, which may cause index invalidation

  • If a field is frequently updated, the index is updated at the same time, consuming performance and reducing efficiency

  • Fields not used in the WHERE condition

  • A watch that is often added, deleted, or modified

  • With many duplicate columns of data, indexing is not very effective

    • If A table has 100,000 rows, A field A has only T and F values, and the probability of distribution of each value is about 50%, then the index of such A table A field will not improve the query speed of the database.

      Index selectivity refers to the ratio of the number of different values in the index column to the number of records in the table. If there are 2000 records in a table with 1980 different values in the index column, the index selectivity is 1980/2000=0.99

      The closer the selectivity of an index is to 1, the more efficient the index will be

Performance optimization analysis: THE SQL performance is slow

Server hardware performance bottlenecks: TOP,free,iostat, and vmstat to view the system performance status

  • CPU bottleneck:
    • CPU saturation occurs when data is loaded into memory or read from disk
  • IO bottleneck:
    • Disk I/O bottlenecks occur when much more data is loaded than there is memory

Use IP addresses instead of domain names for database paths to avoid DNS resolution problems

Explain the use of

What is explain?

Using the Explain keyword, you can simulate the mysql statement optimizer to execute optimized SQL statements to see how mysql is optimized to analyze performance bottlenecks in SQL statements or table structures

Id, type, key, Rows Extra

Explain how to use?

Explain + SQL statement

explain select * from t_user;
Copy the code

explain select * from t_user where uid=1;
Copy the code

The read order of the table

It is not written in the same order, because mysql does overwrite sort (similar to Java instructions reordering).

Table Specifies the name of the table

Id is the execution level

  • Execute from top to bottom with the same ID
  • A table with a larger ID is executed earlier

Derived2

indicates that the table is a virtual table derived from the table whose id is 2.

The type of data read operation

Select_type is the query type

Complex queries used to distinguish common queries, federated queries, subqueries, and so on

  • simple
    • A simple SELECT query, with no subqueries or unions
  • primary
    • If the query contains any complex subparts, the outermost query is marked as PRIMARY
  • subquery
    • A subquery followed by select or WHERE
  • derived
    • Temporary virtual tables DERIVED from subqueries contained in the FROM list will be marked as DERIVED
  • union
    • If the second select appears after the union, it is marked as union
    • If UNION is included in the subquery of the FROM clause, the outer select is marked as DERIVED
  • union result
    • Select to get the result from the UNION table

Type is the access type

Shows which type is used for the query

From best to worst

System >const>eq_ref>ref>range>index>ALL

In general, make sure the query is at least at the range level and preferably at ref

  • system
    • The table has only one row (equal to the system table), which is a const column
  • const
    • **const is used to compare a primary key with a unique cluster index.
    • Because only one row of data is matched, MySQL can quickly convert the query to a constant if the primary key is placed in the WHERE list
  • eq_ref
    • A unique index scan in the case of a multi-table query in which only one record in the table matches each index key
    • Common for federated queries with primary key or unique index scans
  • ref
    • A non-unique index scan that returns all rows matching a single value
    • It is also essentially an index access that returns all rows that match a single value
    • It may find multiple rows that match the criteria, so it should be a mixture of lookup and scanning
  • range
    • Only rows of a given range are retrieved, using an index to select rows. The key column shows which index is used
    • Where statement between, <, >, in, etc
    • This range scan index scan is better than a full index scan because it only needs to start at one point in the index and end at another point without scanning the whole index (if the index is added)
  • index
    • Full Index Scan: The Index type is different from ALL. Only the Index tree is traversed
    • This is usually faster than ALL because index files are usually smaller than data files
    • That is, both all and Index are read from the full table, but Index is read from the Index, while all is read from the hard disk
  • ALL
    • Walk through the table to find matching rows

Which indexes can be used and which indexes are actually used

Possible_keys displays possible references to one or more indexes in the table

If there is an index on the field involved in the query, the index will be listed, but the query may not be actually referenced!

Keys indicates the actual index used. If null, no index is used

If an overridden index is used in the query, it only appears in the key list

If an index contains (or overwrites) the values of all the fields that need to be queried, it is called a ‘overwrite index’. That is, you only need to scan the index without returning to the table

EXPLAIN select * from T_user where uname=’yyf’ and uid=2

Where uid is the default primary key index and uname is the created normal index

Possible_keys is PRIMARY and index_user_uname, indicating two possible indexes

Keys = PRIMARY

Key_len indicates the number of bytes used in the index

This column is used to calculate the length of the index used in the query. With no loss of accuracy, the shorter the length, the better

Key_ len displays the value for the maximum possible length of the index field, not the actual length used

That is, key_ len is computed from the table definition, not retrieved from the table

References between tables

Ref shows which column of the index is used

A const, if possible

Which columns or constants are used to find values on index columns

How many rows per table are queried by the optimizer

Rows roughly estimates the number of rows to read to find the desired record based on table statistics and index selection

In the case of a full table scan, this value is large and inefficient

Extra contains additional information that is important but not suitable for display in other columns

  • Using filesort
    • Note mysql uses an external index sort for data, rather than reading data in the order of the indexes in the table.
      • Filesort uses the corresponding sorting algorithm to sort the obtained data in memory
    • A sort operation in MySQL that cannot be done using an index is called “file sort”
    • There are actually two algorithms for filesort implementation in MySQL
      • Double sort: the sort field and the row pointer information that can directly locate the row data are retrieved according to the corresponding conditions, and then sorted in sort buffer
      • Single-way sort: Sort all the fields that meet the criteria in sort buffer at one time.
  • Using temporary
    • Temporary tables are used to hold intermediate results. MySQL uses temporary tables when sorting query results.
  • USING index
    • The corresponding select operation using a Covering Index, avoiding access to the table row, efficiency is good
    • Using WHERE indicates that the index is used to perform a key lookup.
    • If using WHERE is not present at the same time, the index is used to read data rather than perform lookup actions.
  • Using where
    • Indicates where filtering is used
  • using join buffer
    • Connection caching is used
  • impossible where
    • The value of the WHERE clause is always false and cannot be used to get any tuples
  • select tables optimized away
    • Optimize MIN/MAX operations based on indexes in the absence of the GROUPBY clause
    • Or for MyISAM storage engine optimization COUNT(*) operations, you don’t have to wait until the execution phase to perform the calculation, the optimization is done during the query execution plan generation phase.
  • distinct
    • Optimize the distinct operation to stop finding the same value once the first matching tuple is found

Mysql stored procedure create loop

Here we write a mysql loop insert data for testing

Because mysql can’t do a while loop like Java, you need to create and call a stored procedure

A stored procedure is a code snippet that executes when semicolons are encountered during mysql execution. How do you change the semicolons and keep the code executing? Delimiter

Delimiter tells the mysql interpreter that the command is finished and ready to execute.

By default, delimiter is a semicolon; Execute when a semicolon is encountered.

The // symbol tells mysql to wait until it encounters //

delimiter //# define the identifier as a double slashdrop procedure if existstest; Delete the test stored procedure if it existscreate procedureTest () # create a stored procedure with no parameters named testbegin
    declare i int default 0; Declare the variable and assign the value while I< 1000Do # ends the loop when I is greater than10Break out of the while loopinsert into test(username,balance) values("while",i); Select * from testset i = i + 1; # loop once, I plus oneendwhile; # End the while loopend
//# end definition statement delimiter; # change back to the original identifiercalltest(); Call the stored procedureselect * from test order by id desc limit 0.1000; Select * from testCopy the code

We create a table and insert 100,000 pieces of data into the table for testing

CREATE TABLE `t_index` (
  `id` int unsigned zerofill NOT NULL AUTO_INCREMENT,
	`uid` VARCHAR(255) not NULL,
  `username` varchar(255) not NULL,
	`password` VARCHAR(255) not null,
  `balance` int unsigned zerofill not NULL,
	`views` int not null.PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

delimiter //
drop procedure if exists test;
create procedure test()
begin
    declare i int default 0;
    while i < 100000 do
        insert into t_index(uid,username,`password`,balance,views)
				values (i,"while","while",i,i);
        set i = i + 1;
    end while;
end
//
delimiter ;
call test();
select * from t_index order by id desc limit 0.1000;
Copy the code

Select query statement poorly written

What are the possible causes of long SQL execution times and long wait times?

So let’s analyze it

Unneeded records were queried

  • Select * from table_name; select * from table_name; select * from table_name;
  • Select * from table where LIMIT 1 = 1 and LIMIT 1 = 1

Return all columns for multiple table associations

  • The association of multiple tables, such as some meaningless columns, can also affect performance.

Always pull out all columns

  • Queries such as SELECT *, which fetch all columns, do not necessarily achieve index overwrite optimizations, but trigger various back-table queries, incurring additional IO, memory, and CPU consumption on the server.

Query the same data repeatedly

  • For this type of query that returns the same result every time, in fact, it is enough to check once, save the result in Redis such cache, relieve MySQL pressure.

Small tables drive large tables

That is, operate on the small dataset first, and then operate on the large dataset

How to choose in and exists?

Check the internal table in exists

Therefore, in is preferred if a subset of in is small, and exists is used conversely

  • The exists:
    • Put the data of the master query into the sub-query for conditional verification, and determine whether the data result of the master query can be retained according to the verification result (TRUE or FALSE)
    • Since only TRUE/FALSE is returned, either select *, select 1, or anything else in the subquery will work, and fields will be ignored when executed
  • In:
    • The in query is executed only once. It caches all the queried data and then checks if the queried fields are present in the cache. If so, it adds the queried data to the result set until all the queried results are iterated
Select * from B where B = 1select * from A where id in (select id fromB) select * from Bselect * from A where exists (select 1 from B where B.id = A.id)
Copy the code

Our principle is to operate small first, so when B is smaller than A, we should operate table B first

B = in B = in B = in

When the data volume of A is smaller than that of B, we should operate table A first

If exists exists, select A from the exists table

Order by sort FileSort

Order by sentence, try to use index sort, namely index sort, avoid using filesort sort, namely filesort

Sort on index columns as much as possible (default ASC sort)

Keep in mind that the order by index is strictly in order

There are two conditions for sorting using index:

  • The order by statement uses the left-front index principle
  • The combination of WHERE clause and order by clause condition columns satisfies the left-front index principle

When will filesort come?

  • Not indexed

  • Do not follow the leftmost prefix rule

  • Inconsistencies follow the default ASC order (that is, one uses ASC and the other uses DESC) and the index is invalid

    • Of course, as long as all are ASC or all are DESC, it will not be invalid

Do the reverse of filesort

There are actually two algorithms for filesort implementation in MySQL

  • Double sort (before Mysql4.1) :
    • Scan the disk twice and finally get the data. It is time-consuming and inefficient
    • Take the row pointer to the data and the ORDER BY column, sort them, then scan the sorted list and reread the corresponding data output from the list according to the values in the table
    • Read the sort fields from disk, sort them in sort_buffer, and fetch other fields from disk
  • Single-way sort (after mysql4.1) :
    • Sort_buffer is to take out all the fields that meet the condition at one time, sort them according to the order BY column in sort_buffer, and then scan the sorted fields for output
    • It is faster and avoids a second read of the data. And it turns random IO into sequential IO, but it uses more space because it saves each row inIn the memorysorting
      • If the problem is not solved once, it is necessary to perform IO operations in batches to retrieve data, which becomes inefficient
        • Increase the sort_buffer_size parameter to increase the buffer size
        • Increases the max_LENGTH_FOR_sort_data parameter

Increased order by speed

Use indexes, of course, and try not to invalidate them

It is best not to use SELECT *, which results in a query with too much data

  • When total field size less than max_length_for_sort_data, Query and sort field is not a TEXT | BLOB, will use the improved algorithm, single channel sorting, or with the old algorithm, multi-channel sorting, So you can increase the max_LENGTH_FOR_sort_data parameter
  • Both algorithms may exceed the size of sort_buffer. After that, TMP files will be created for merge sorting, resulting in multiple I/ OS. However, single-path sorting is more risky, so the sort_buffer_SIZE parameter should be increased

Just a quick summary

create index test on test(a,b,c);

Create federated indexes for fields A, B, and C

Follow the leftmost rule: indexes can be used

  • order by a
  • order by a,b
  • order by a,b,c
  • order by a DESC , b DESC , c DESC
  • where a = const ORDER BY b , c
  • where a=const and b=const ORDER BY c
  • Where a=const and b>=const ORDER BY b, c

Index of the failure

  • ORDER BY a DESC, B ASC, c DESC
  • ORDER BY b, c
  • Where a=const ORDER BY c
  • Where a=const ORDER BY b, d
  • where a in (…) ORDER BY b, c

Group by is similar to Order BY

The essence of group by is to sort and then group the index according to the left-most prefix principle

Where takes precedence over HAVING, so don’t write in HAVING if you can write in WHERE

Seven conditions of index failure

Index failure is when an index is created but not used

Take a look at the blog “Index failures and precautions” for a more comprehensive summary

The following are possible index failures

Index invalid formula:The number of models is fastest by air

  • Mode: fuzzy query LIKE starts with %
  • Type: data type error
  • Number: Use internal functions for index fields
  • NULL: Index column is NULL
  • Operation: The index column performs four operations
  • Most: The composite index does not start the search at the left of the index column
  • Fast: Full table lookup is expected to be faster than indexing

1.Separate references to non-first position index columns in a composite index

Compound indexes follow the left-most prefix principle

That is, the index is used only when the first field of the composite index is used in the query condition.

Therefore, the order of index columns in a composite index is critical. An index cannot be used unless the search starts in the leftmost column of the index.

If there is INDEX(a,b,c), the INDEX can be used when the condition is A or A,b or a,b,c, but the INDEX will not be used when the condition is B, C.

Therefore, when creating a composite index, we should place the columns most commonly used as constraints at the far left, descending in order

It’s actually going to look it up in the index file. Type is index

2,Perform operations on indexed columns

Operations include (+, -, *, /,! , < > and! =, %, or, in, exist, etc.), causing index invalidity

Or causes index invalidation: use union

Note that OR also invalidates indexes, so either perform the operation in business or use the union in mysql

Union is used to combine the results from multiple SELECT statements into a result set

When using union, mysql deletes duplicate records from the result set

When using union all, mysql returns all records and is more efficient than using union

select * from t_index where uid ='921930193014' or balance = 499010Change # ↑ to ↓, useunion

select * from t_index where uid ='921930193014'
UNION 
select * from t_index where balance = 499010
Copy the code

Select * from test where id-1=9;

Select * from test where id=10;

Attention!!!!!

SQL > not in, not exists, <>! = Field indexes that do not run range are invalid

<, >, <=, >=, this is judged according to the actual query data. If the overall scan speed is faster than the index speed, the index is not moved

The range query field of type range and the prime following it are invalid

What are range type query criteria and their following field indexes invalid?

Let’s look at an example

T_index table has 140W data structure as follows

CREATE TABLE `t_index` (
  `id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `uid` varchar(255) NOT NULL,
  `username` varchar(255) NOT NULL,
  `password` varchar(255) NOT NULL,
  `balance` int(10) unsigned zerofill NOT NULL,
  `views` int(11) NOT NULL.PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1400001 DEFAULT CHARSET=utf8;
Copy the code
CREATE index index_tindex_username_balance_views on t_index(username,balance,views)

EXPLAIN
select * from t_index where username = 'while' and balance > = 499008 ORDER BY views DESC limit 1
Copy the code

SQL > create index username,balance,views3

As can be seen, although 3 field index is used, the query type is still range level, and extra is using filesort external file sorting, not using views index sorting, of course, the efficiency will be lower

Type becomes range, which is tolerable. Using filesort in Extra is still unacceptable.

But we already have the index, why not use index sort?

This is because the way the BTree index works,

To sort the username

If the same username is encountered, then balance is sorted. If the same balance is encountered, then views is sorted

When the balance field is in the middle of the joint index,

The balance > 499008 condition is a range value.

MySQL cannot use the index to retrieve the following views, that is, the range query field and its prime citation are invalid

So how do we solve this problem? Remove the index of the range condition field, i.e., subtract the balance index column

Add only the indexes of username and views

DROP index index_tindex_username_balance_views on t_index

CREATE index index_tindex_username_views on t_index(username,views)

EXPLAIN
select * from t_index where username = 'while' and balance > = 499008 ORDER BY views DESC limit 1
Copy the code

You can see that type is changed to ref, ref displays const constants, and extra loses the using filesort

The optimizer optimizes based on index order

Of course, the mysql optimizer will be optimized according to the index order

(mysql does not fully obey the conditional order of SQL statements, and executes the optimal order according to the index to achieve the optimal result)

Let’s look at a table that has fields A, B, C, d

And create multiple column indexes for ABCD simultaneously

CREATE TABLE `abcd` (
  `id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `a` varchar(255) NOT NULL,
  `b` varchar(255) NOT NULL,
  `c` varchar(255) NOT NULL,
  `d` varchar(255) NOT NULL.PRIMARY KEY (`id`),
  KEY `i` (`a`,`b`,`c`,`d`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8; Perform several more inserts to test the datainsert into abcd(a,b,c,d) values('a'.'b'.'c'.'d');

CREATE index i on abcd(a,b,c,d)
Copy the code

Let’s take a look at the execution of the following four SQL statements

1. Normal order A-B-C-D

a='a' and b='b' and c='c' and d='d'

EXPLAIN
select * from abcd where a='a' and b='b' and c='c' and d='d'
Copy the code

If type is ref, the index is normally used, and ref displays 4 const values, meaning that all 4 of them are indexed

D, C, B, A

d='d' and c='c' and b='b' and a='a'

EXPLAIN
select * from abcd where d='d' and c='c' and b='b' and a='a'
Copy the code

It is the same as the normal order because the optimizer adjusts the order to a- B – C – D consistent with the index

A, b, C, d, 3

If we put d in front of C and use the range index, will c’s index be invalidated?

The result: no

EXPLAIN
select * from abcd where a='a' and b='b' and d> ='d' and c='c'
Copy the code

We can see that ref has three more const, because ref shows which column of the index is used, so we can get

Where a=’a’ and b=’b’ and c=’c’ and d>=’d’

A, B, c, d, c, d, c, d

A, b, C, d, 4

If the order is still a-B-C-D, then C uses the range query, and D uses the index?

Mysql > select * from ‘c’; mysql > select ‘d’; mysql > select * from ‘c’

So there should only be 2 const ref’s left, a and b, so let’s see what happens

EXPLAIN
select * from abcd where a='a' and b='b' and c> ='c' and d='d'
Copy the code

C > ref (c, c); c > ref (c, c); d > ref (c, c)

5, a, B, D and c

Go straight to code

EXPLAIN
select * from abcd where a='a' and b='b' and d='d' ORDER BY c
Copy the code

How many indexes are used at this point? Two, three or four?

The answer is three

Strictly speaking, the c index is used to sort rather than find, so there is no count

(If c is not indexed, it should display using filesort, see Example 6)

And this const is the constant of a and b

6. A — B — C — D

EXPLAIN
select * from abcd where a='a' and b='b' ORDER BY d
Copy the code

C is broken in the middle, so of course D doesn’t go to the index, so only A and B go to the index

So using Filesort is displayed in Extra

7, A, b, C, D

Where a=’a’ ORDER BY B,c uses several indexes.

Three. A, B and C are used

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

How many indexes are used in the following SQL?

1, only A is used, c and B are not used, because they do not conform to the order, the order is strictly followed in order by

So Using Filesort

EXPLAIN
select * from abcd where a='a' ORDER BY c,b
Copy the code

3,Apply internal functions to indexes

In this case, a function-based index should be established

select * from template t where ROUND(t.logicdb_id) = 1;

The index should be ROUND(t.logicdb_id).

4,Type error

For example, if the field type is VARCHAR, the WHERE condition is number

Example: The template_id field is of vARCHAR type.

Select * from template t where t.template_id = 1

Select * from template t where t.template_id = ‘1’

It is worth noting that the invalidation condition for this type error does not mean that the index is invalidated by type

For example, the uid of t_user is of type int and the id of student is of type varchar

Examine whether the following two statements use indexes

# failure EXPLAINselect * from student where id = 1;
Copy the code

# primary key index EXPLAINselect * from t_user where uid = '1';
Copy the code

Another word:

If the vARCHAR query condition is a digit, the vARCHAR will become a string and truncate digits for query

Int implicitly converts a number to a string, so the index is used

Varchar in mysql is not used to compare equals, but instead is a string interception

Select * from ‘where id= 1’ where id= 1

It is ok to

Select * from ‘123 ‘where id=’123’;

It’s also possible

Where id = ‘xx123’ where id = ‘xx123

Where uid = ‘1’ where uid = ‘1’ where uid = ‘1’ where uid = ‘1’

5,If MySQL expects a full table scan to be faster than an index, do not use the index

6,Fuzzy query with like starts with %, index invalid

Like “%aaa%” does not use index

Like “aaa%” can use the index

7,Index columns are not restricted to NULL

Columns that contain NULL values will not be included in the index

Any column in a composite index that contains a NULL value is invalid for the composite index

Too many associated queries JOIN: Index from the table

It could be a flaw in the developer’s design when writing SQL

It could also be the result of a forced need in the business

Take left Join left join as an example

In a left join, the left table accesses all the data, so we should index the right table

Select * from a left join b on a.id = b.id; select * from a left join B on A.id = B.id; select * from a left join B on A.id = B.id

Right join The same method is used to index the condition columns of the left table

The same is true for multi-table queries

  • Minimize the total number of NestedLoop loops in Join statements: always drive large result sets with small ones
  • The inner loop of NestedLoop is optimized first
  • Ensure that Join condition fields on the driven table in the Join statement are indexed
  • If you cannot ensure that the Join condition fields of the driven table are indexed and the memory resources are sufficient, do not be too generous with the JoinBuffer Settings

So the conclusion is: build indexes from tables

Slow SQL

Here we have a rough guide, if you need to students can go to in-depth understanding

Slow query open and capture

By default, slow query logging is disabled for the MySQL database. You need to set this parameter manually.

Of course, it is generally not recommended to enable this parameter unless it is necessary for tuning, because slow query logging can have a significant or minor performance impact.

Slow query logs can be written to a file

Enable the slow query log and set the threshold. For example, the SQL that exceeds 10 seconds is slow and captured

View slow query logs

You can view whether the log file is enabled and the log file path

show VARIABLES like '%slow_query_log%';
Copy the code

Check the slow SQL threshold time. The default time is 10 seconds

Mysql determines that the threshold is triggered when the time is greater than long_query_time, not greater than or equal to it

show VARIABLES like '%long_query_time%'
Copy the code

Enable slow query configuration

# Enable slow query logsset global show_query_log = 1; # Change the file pathset global slow_query_log_file ="If-slow. log" # change the time thresholdset global long_query_time = 10
Copy the code

After this function is enabled, it takes effect only for the current database and becomes invalid after mysql is restarted

If the threshold is changed, you need to re-establish a query session to query the new threshold

If you want to make it permanent, you can add it to the my.ini configuration file

  • Enabling Slow Query
  • Specify the log file and its path
  • Slow SQL query time threshold
slow-query-log=1
slow_query_log_file="IF-slow.log"
long_query_time=10
Copy the code

Explain + slow SQL analysis

We set the threshold to 3, which is set global long_QUERy_time = 3

Then execute select sleep(4) to make the SQL sleep for 4 seconds and log it as slow SQL

Then open log to view

# Time: 211118 12:07:44
# User@Host: root[root] @  [127. 0. 01.]
# Query_time: 4003336.  Lock_time: 0000000. Rows_sent: 1  Rows_examined: 0
use test1;
SET timestamp=1637208464;
select SLEEP(4);
Copy the code

You can see a lot of data

  • The time when the record was generatedTime
  • SQL execution timeQuery_time
  • Lock timeLock_time
  • The row record returnedRows_sent
  • Number of rows scannedRows_examined
  • Table TEST1 is useduse test1;
  • SQL statement executedselect SLEEP(4);

Install perl

To execute mysqlDumpslow.pl (which is a Perl program), download the Perl compiler.

Download it at pan.baidu.com/s/1i3GLKAp

Is activePerl_5.16.2.3010812913.msi, step by step after installation, add bin environment variable path

Go to the bin directory of mysql and run the command

C:\Program Files\MySQL\MySQL Server 5.5\bin>perl mysqldumpslow.pl --help
Copy the code

If the parameters can be displayed, the installation succeeds

C:\Program Files\MySQL\MySQL Server 5.5\bin>perl mysqldumpslow.pl --help
Usage: mysqldumpslow [ OPTS... ] [ LOGS... ]

Parse and summarize the MySQL slow query log. Options are

  --verbose    verbose
  --debug      debug
  --help       write this text to standard output

  -v           verbose
  -d           debug
  -s ORDER     what to sort by (al, at, ar, c, l, r, t), 'at' is default
                al: average lock time
                ar: average rows sent
                at: average query time
                 c: count
                 l: lock time
                 r: rows sent
                 t: query time
  -r           reverse the sort order (largest last instead of first)
  -t NUM       just show the top n queries
  -a           don't abstract all numbers to N and strings to 'S' -n NUM abstract numbers with at least n digits within names -g PATTERN grep: only consider stmts that include this string -h HOSTNAME hostname of db server for *-slow.log filename (can be wildcard), default is '*', i.e. match all -i NAME name of server instance (if using mysql.server startup script) -l don't subtract lock time from total time
Copy the code

-s, sort, c,t, L,r, and AC,at,al,ar are query times respectively. Time, lock time, returns record sort. Plus a is in reverse order.

– t, top n. To keep up with the numbers is to figure out the top number of minus g. And regular expressions.

Run commands to view logs

mysqldumpslow.pl -r -s c -a -t   >C:\ProgramData\MySQL\MySQL Server 5.5\data\IF-slow.log
Copy the code

End and spend

Because this article mainly talks about index failure, so the other chapters of this article are briefly skipped, this article is over, thank you for watching!