Index is similar to the bibliographic index of university library, which can improve the efficiency of data retrieval and reduce the IO cost of database. MySQL performance starts to decline around 3 million records, although official documentation says 500~800w records, so indexing is necessary for large data volumes. MySQL provides Explain, which displays detailed information about SQL execution and allows for index optimization.
1. Causes of slow SQL execution
1. Hardware problems. For example, the network speed is slow, the memory is insufficient, the I/O throughput is small, and the disk space is full.
2. There is no index or the index is invalid. In Internet companies, dbAs lock tables in the middle of the night and re-index them because when you delete data, the tree structure of the index is incomplete. So Internet companies do fake deletions of their data. One is to do data analysis, and two is not to break the index.)
3. Too much data (separate databases and tables)
4. Server tuning and parameter setting (adjust my.cnf)
Second, when analyzing the reasons, we must find a breakthrough point
1. Enable slow query logs and set thresholds (for example, SQL queries that exceed 3 seconds are slow). After running in the production environment for one day, check which SQL queries are slow.
2.Explain and slow SQL analysis. Examples include poorly written SQL statements, missing or invalid indexes, too many associated queries (sometimes due to design flaws or unnecessary requirements), and so on.
3.Show profiles are execution details that go a step further than Explain, showing what was done for each SQL execution and how many seconds it took.
4. Ask the DBA or O&M to tune the parameters of the MySQL server.
What is an index?
An Index is a data structure that helps MySQL obtain data efficiently. We can simply understand it as: a kind of data structure that can be quickly found and sorted. Mysql index has two main structures: B+Tree index and Hash index. Index (B+Tree) index (B+Tree) The index is shown below:
The outermost light-blue disk block 1 contains data 17 and 35 (dark blue) and Pointers P1, P2, and P3 (yellow). P1 indicates disk blocks smaller than 17, P2 indicates disk blocks between 17 and 35, and P3 indicates disk blocks larger than 35. The real data is in the cotyledons, the bottom layer 3, 5, 9, 10, 13… Non-leaf nodes do not store real data, but only the data items that guide the search direction, such as 17 and 35.
Search process: for example, to search for 28 data items, first load disk block 1 into memory, an I/O occurs, and use binary search to determine the P2 pointer. Then it is found that 28 is between 26 and 30, and disk block 3 is loaded into memory through the address of P2 pointer, and a second I/O occurs. Find disk block 8 in the same way, and a third I/O occurs.
The reality is that the B+Tree at the top 3 levels can represent millions of data, millions of data with only three I/ OS instead of millions of I/ OS, and the time improvement is huge.
4. Explain analysis
After the foreshadowing is completed, enter the practical operation part. First, insert the data required by the test:
CREATE TABLE `user_info` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL DEFAULT ”,
`age` INT(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name`)
)ENGINE = InnoDB DEFAULT CHARSET = utf8;
INSERT INTO user_info (name, age) VALUES (‘xys’, 20);
INSERT INTO user_info (name, age) VALUES (‘a’, 21);
INSERT INTO user_info (name, age) VALUES (‘b’, 23);
INSERT INTO user_info (name, age) VALUES (‘c’, 50);
INSERT INTO user_info (name, age) VALUES (‘d’, 15);
INSERT INTO user_info (name, age) VALUES (‘e’, 20);
INSERT INTO user_info (name, age) VALUES (‘f’, 21);
INSERT INTO user_info (name, age) VALUES (‘g’, 23);
INSERT INTO user_info (name, age) VALUES (‘h’, 50);
INSERT INTO user_info (name, age) VALUES (‘i’, 15);
CREATE TABLE `order_info` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` BIGINT(20) DEFAULT NULL,
`product_name` VARCHAR(50) NOT NULL DEFAULT ”,
`productor` VARCHAR(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_product_detail_index` (`user_id`, `product_name`, `productor`)
)ENGINE = InnoDB DEFAULT CHARSET = utf8;
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, ‘p1’, ‘WHH’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, ‘p2’, ‘WL’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, ‘p1’, ‘DX’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (2, ‘p1’, ‘WHH’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (2, ‘p5’, ‘WL’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (3, ‘p3’, ‘MA’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (4, ‘p1’, ‘WHH’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (6, ‘p1’, ‘WHH’);
INSERT INTO order_info (user_id, product_name, productor) VALUES (9, ‘p8’, ‘TE’);
Initial experience of the effect of implementing Explain:
There are three columns possible_keys, key, and KEY_len for index usage. Next, we’ll explain them from left to right.
1.id
— Same id, execution order from top to bottom
explain select u.*,o.* from user_info u,order_info o where u.id=o.user_id;
— Different ids, the larger the value, the earlier the execution
explain select * from user_info where id=(select user_id from order_info where product_name =’p8′);
2.select_type
You can see the following types of execution instances of id:
-
SIMPLE: indicates that the query does not contain UNION query or subquery
-
PRIMARY: indicates that this query is the outermost query
-
SUBQUERY: The first SELECT in a SUBQUERY
-
UNION: Indicates that this query is the second or subsequent query to the UNION
-
DEPENDENT UNION: The second or subsequent query statement in the UNION, depending on the external query
-
The UNION RESULT
-
DEPENDENT SUBQUERY: The first SELECT in a SUBQUERY, DEPENDENT on the external query. That is, subqueries depend on the results of outer queries.
-
DERIVED: DERIVED, representing the SELECT (subquery FROM the clause) FROM the DERIVED table
3.table
Table represents the tables involved in the query or derived tables:
explain select tt.* from (select u.* from user_info u,order_info o where u.id=o.user_id and u.id=1) tt
The <derived2> with id 1 is derived from the u and O tables with ID 2.
4.type
The Type field is important because it provides an important basis for determining whether a query is efficient. Through the type field, we can determine whether the query is a full table scan or an index scan, etc.
The common values of type are:
-
System: There is only one data in the table. This type is a special const type.
-
Const: An equivalent query scan for a primary key or unique index that returns at most one row of data. A const query is very fast because it only reads once. For example, the following query uses the primary key index, so type is const: explain select * from user_info where id = 2;
-
Eq_ref: This type is typically found in multi-table JOIN queries and means that for each result of the previous table, only one row of the subsequent table can be matched. In addition, the comparison operation of query is usually =, which indicates high query efficiency. For example, explain select * from user_info, order_info where user_info.id = order_info.user_id;
-
Ref: This type typically occurs in join queries with multiple tables, for non-unique or non-primary key indexes, or for queries that use the left-most prefix rule index. For example, in this example, a query of type ref is used: explain select * from user_info, order_info where user_info.id = order_info.user_id AND order_info.user_id = 5
-
Range: indicates that the index range query is used to obtain some data records in the table by the index field range. This type usually appears IN =, < >, >, > =, <, < =, IS NULL, the < = >, BETWEEN, IN () operation. Explain select * from user_info where id between 2 and 8;
-
Index: Full index scan, similar to ALL except that ALL scans ALL tables, whereas index scans only ALL indexes without data. The index type usually occurs when the data to be queried can be retrieved directly from the index tree without scanning the data. When this is the case, the Extra field shows Using Index.
-
ALL: indicates a full table scan. This type of query is one of the least high-performance queries. In general, our queries should not have all-type queries, because such queries can be a disaster for database performance in the case of large data volumes. If a query is an All-type query, it can generally be avoided by adding indexes to the corresponding fields.
In general, the performance relationship between different types of type is as follows:
ALL < index < range ~ index_merge < ref < eq_ref < const < system
The ALL type is the slowest in the same query condition because it is a full table scan. An index query is not a full table scan, but it scans ALL indexes and is therefore slightly faster than an ALL query. The latter types all use indexes to query data, so they can filter some or most of the data, so the query efficiency is relatively high.
5.possible_keys
It represents the index that mysql may use when querying. Note that even though some indexes appear in Possible_keys, it doesn’t mean that the index will actually be used by mysql. The key field determines which indexes are used in the mysql query.
6.key
This field is the actual index used by mysql in the current query. For example,possible_keys means how many people could be there while key means how many people could actually be there. When we do not build indexes:
explain select o.* from order_info o where o.product_name= ‘p1′ and o.productor=’whh’;
create index idx_name_productor on order_info(productor);
drop index idx_name_productor on order_info;
Create compound index before query:
7.key_len
Indicates the number of bytes of the index used by the query optimizer. This field evaluates whether the composite index is fully used.
8.ref
This shows which column of the index is used and, if possible, is a constant. Ref = ref type = ref type = ref type
9.rows
“Rows” is also an important field. The mysql query optimizer estimates the number of rows that SQL needs to scan to find the result set based on the statistics. This value is a very intuitive indication of the efficiency of SQL. Compare this to the example in key, where rows is 9 without indexing and 4 after indexing.
10.extra
A lot of additional information in Explain will be displayed in the Extra field. Common ones are the following:
-
Using filesort: indicates that mysql requires additional sorting operations. Using filesort is recommended to remove this query because it consumes large CPU resources.
-
Using index: overwrites the index scan, indicating that the query can find the required data in the index tree without scanning the table data files, usually indicating good performance.
-
Using temporary: A query uses temporary tables, which are used for sorting, grouping, or join of multiple tables. Optimization is recommended because the query efficiency is not high.
-
Using WHERE: Table names are filtered using where.
5. Optimization cases
explain select u.*,o.* from user_info u LEFT JOIN order_info o on u.id=o.user_id;
Execute result, type has ALL, and no index:
Select * from ‘ALL’, ‘ref’ and ‘rows’ from’ 9 ‘to’ 1 ‘;
There is a general rule is: the left link index is added to the right table above, right link index is added to the left table above.
Do you need to create an index?
Indexes can speed up queries very efficiently, but slow down table updates. The index is actually a table that holds primary keys and index fields and points to records in the entity table, so index columns also take up space.
I am an ordinary programmer, the level is limited, the article inevitably has mistakes, welcome to sacrifice their precious time readers, on the content of this article, my purpose is just hope to help readers.