MySQL index optimization and query optimization
Refer to the link: www.bilibili.com/video/BV1iq…
1. Overview of MySQL optimization
What dimensions are available for database tuning? In short:
- Index failure or under-use of indexes -- create or adjust indexes
- Too many join queries (design flaw or business necessity) -- SQL optimization
- Server tuning and various parameter settings (buffers, thread count, etc.) -- adjust my.cnf
- Too much data -- split databases and tables (sharding)
Although there are many techniques for SQL query optimization, they can broadly be divided into physical query optimization and logical query optimization.
- Physical query optimization works through indexes and table join methods; the key skill is using indexes well.
- Logical query optimization improves efficiency through equivalent SQL transformations. Frankly speaking, another way of writing the same query may simply be faster.
Steps to analyze slow SQL:
1. Enable and capture slow queries: turn on the slow query log, set a threshold (for example, anything over 5 seconds counts as slow SQL), and capture those statements.
2. Analyze the captured slow SQL with EXPLAIN.
3. Use SHOW PROFILE to inspect the SQL's execution details and life cycle inside MySQL.
4. Tune the MySQL database server parameters.
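Step 1 can be done dynamically at runtime. A minimal sketch (the 5-second threshold matches the example above; settings made this way are lost on restart):

```sql
-- Turn on the slow query log for the running server
-- (add the same options under [mysqld] in my.cnf to persist them)
SET GLOBAL slow_query_log = 'ON';

-- Statements taking longer than 5 seconds count as slow SQL
SET GLOBAL long_query_time = 5;

-- Check the current settings
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
```

Note that `long_query_time` is also a session variable: already-open connections keep their old value until they reconnect.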
2. Single table index analysis
2.1. Sorting optimization
Question: we index the WHERE condition fields, but why should the ORDER BY fields be indexed too?
Optimization suggestions:
- In SQL, indexes can be used in both the WHERE and ORDER BY clauses: in WHERE to avoid full table scans, and in ORDER BY to avoid FileSort sorting. Of course, in some cases a full table scan or a FileSort is not necessarily slower than using an index, but in general we want to avoid both to improve query efficiency.
- Try to complete the ORDER BY with an index. Use a single-column index if WHERE and ORDER BY use the same column; otherwise use a composite index.
- If an index cannot be used, tune the FileSort mode.
Data preparation
DROP TABLE IF EXISTS `article`;
CREATE TABLE IF NOT EXISTS `article`(
`id` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
`author_id` INT(10) UNSIGNED NOT NULL COMMENT 'the author id',
`category_id` INT(10) UNSIGNED NOT NULL COMMENT 'category id',
`views` INT(10) UNSIGNED NOT NULL COMMENT 'Times viewed',
`comments` INT(10) UNSIGNED NOT NULL COMMENT 'Note of reply',
`title` VARCHAR(255) NOT NULL COMMENT 'title',
`content` VARCHAR(255) NOT NULL COMMENT 'Body content'
) COMMENT 'articles';
INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES(1, 1, 1, 1, '1', '1');
INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES(2, 2, 2, 2, '2', '2');
INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES(3, 3, 3, 3, '3', '3');
INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES(1, 1, 3, 3, '3', '3');
INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES(1, 1, 4, 4, '4', '4');
Case: among articles with category_id = 1 and comments > 1, find the id and author_id of the one with the most views.
1. Write SQL statements and view SQL execution plans.
# 1. SQL statement
SELECT id,author_id FROM article WHERE category_id = 1 AND comments > 1 ORDER BY views DESC LIMIT 1;

# 2. SQL execution plan
mysql> EXPLAIN SELECT id,author_id FROM article WHERE category_id = 1 AND comments > 1 ORDER BY views DESC LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: article
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00
Extra: Using where; Using filesort  # filesort occurs; this SQL needs optimization
1 row in set, 1 warning (0.00 sec)
2. Conclusion: type is ALL, the worst case; Extra also shows the worst case, Using filesort. Optimization is required: create the composite index idx_article_ccv (ccv stands for category_id, comments, views).
CREATE INDEX idx_article_ccv ON article(category_id,comments,views);
3. View the current index.
4. View the execution plan of the current SQL statement after the index is created.
After creating the composite index idx_article_ccv, we find that the full table scan problem is solved (type = range; the index is used through category_id and comments in turn), but the ORDER BY still does not sort by index: MySQL still shows Using filesort.
This is because of how a B+Tree index works: records are sorted by category_id first; where category_id is equal, by comments; and where comments are equal, by views. Since comments > 1 is a range condition, MySQL cannot use the index for the views column behind it. In other words, index columns after a range query are invalidated.
5. To verify, change the range condition into an equality: SELECT id,author_id FROM article WHERE category_id = 1 AND comments = 1 ORDER BY views DESC LIMIT 1; then look at the execution plan again.
With comments = 1, the views column can be used by the ORDER BY; with comments > 1, it cannot. Hence the conclusion: index columns after a range condition are invalidated.
6. Drop the index idx_article_ccv and create a new composite index idx_article_cv that leaves out the range column (cv stands for category_id, views).
/* create index idx_article_cv */
CREATE INDEX idx_article_cv ON article(category_id,views);
View the current index
7. With the index now being idx_article_cv, look at the SQL execution plan again.
The results are very satisfactory.
Conclusions:
- When both indexes exist, MySQL automatically selects the optimal one (in this example, MySQL selects idx_article_cv). However, as the amount of data changes, the selected index may change too.
- When you must choose between indexing the range-condition field and the GROUP BY/ORDER BY field, first look at how many rows the filter removes: if the filter eliminates enough data and there is not much left to sort, put the index on the range field first; and vice versa.
3. Index analysis of two tables (join queries)
Data preparation
DROP TABLE IF EXISTS `class`;
DROP TABLE IF EXISTS `book`;
CREATE TABLE IF NOT EXISTS `class`(
`id` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
`card` INT(10) UNSIGNED NOT NULL COMMENT 'classification'
) COMMENT 'Category of Goods';
CREATE TABLE IF NOT EXISTS `book`(
`bookid` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
`card` INT(10) UNSIGNED NOT NULL COMMENT 'classification'
) COMMENT 'books';
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into class(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
insert into book(card) values(floor(1+(rand()*20)));
The SQL execution plan for the two table join queries
1. SQL execution plan without index creation.
Both tables show type = ALL, i.e. full table scans. For a LEFT JOIN, which table should the index go on? Let's try it boldly!
2. Create an index on the left table (book).
Create index idx_book_card
/* Create index on book table */
CREATE INDEX idx_book_card ON book(card);
View the SQL execution plan with the idx_book_card index on the book table.
3. Drop the index on the book table and create an index on the right table (class).
Create index idx_class_card
/* Create index on class table */
CREATE INDEX idx_class_card ON class(card);
View the SQL execution plan with idx_class_card index in the class table
As you can see, for a LEFT JOIN it is better to create the index on the right table (ref beats index); for a RIGHT JOIN, on the left table. In short: index the driven table.
4. Index analysis of three tables
Data preparation
DROP TABLE IF EXISTS `phone`;
CREATE TABLE IF NOT EXISTS `phone`(
`phone_id` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
`card` INT(10) UNSIGNED NOT NULL COMMENT 'classification'
) COMMENT 'mobile phone';
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));
SQL optimization for three-table join queries
1. Without any indexes, view the SQL execution plan.
2. We find Using join buffer in the plan, and both driven tables are scanned in full. From the two-table experience (a LEFT JOIN needs the index on the right table), try adding indexes to the book and phone tables.
/* Create index on book table */
CREATE INDEX idx_book_card ON book(card);
/* Create index on phone table */
CREATE INDEX idx_phone_card ON phone(card);
View the SQL execution plan again.
The last two rows are both type ref and the total rows estimate is much better. So indexes are best placed on the fields that are queried (joined) frequently.
5. Conclusions
JOIN statement optimization:
- Ensure that the JOIN field of the driven table is indexed.
- The data types of the joined fields must be identical.
- In a LEFT JOIN, choose the small table as the driving table and the large table as the driven table, to reduce the number of outer loops.
- In an INNER JOIN, MySQL automatically chooses the table with the smaller result set as the driving table; trust the MySQL optimization strategy.
- Prefer direct multi-table joins over subqueries where possible (fewer queries).
- Subqueries are not recommended; split the subquery into multiple simple queries, or replace it with a JOIN.
- Derived tables cannot be indexed.
- Minimize the total number of nested loops in a JOIN statement: always let the small result set drive the large one.
- Optimize the inner loop of the nested loop first.
- Ensure that the JOIN condition fields on the driven table are indexed.
- When the driven table's JOIN condition fields cannot be indexed and memory is sufficient, do not be too stingy with the join buffer settings.
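The join buffer point can be made concrete with block nested-loop arithmetic. A toy model (the function and numbers are illustrative, not MySQL internals): with a join buffer holding B driving-table rows, the driven table is scanned once per buffer load instead of once per driving row.

```python
import math

def driven_table_scans(driving_rows: int, join_buffer_rows: int) -> int:
    """Full scans of the driven table under a block nested-loop join."""
    return math.ceil(driving_rows / join_buffer_rows)

# 10,000 driving rows: no buffering vs. a buffer holding 1,000 rows
print(driven_table_scans(10_000, 1))      # 10000 scans: plain nested loop
print(driven_table_scans(10_000, 1_000))  # 10 scans: a larger join buffer pays off
```

This is why, when the driven table cannot be indexed, a bigger join buffer directly cuts the number of driven-table scans.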
6. Subquery optimization
- MySQL has supported subqueries since version 4.1. A subquery nests one SELECT inside another: the result of the inner SELECT serves as a condition of the outer SELECT. Subqueries let a single statement perform what would logically take several steps.
- Subqueries are an important MySQL feature that lets us express complex queries in one SQL statement, but their execution efficiency is not high. The reasons:
  - To execute a subquery, MySQL creates a temporary table for the inner query's result; the outer query then reads from that temporary table, which is destroyed afterwards. This consumes significant CPU and I/O resources and produces many slow queries.
  - The temporary table holding the subquery's result set has no indexes, whether it lives in memory or on disk, so query performance suffers.
  - The larger the result set the subquery returns, the bigger the impact on query performance.
- In MySQL, JOIN queries can replace subqueries. A join needs no temporary table, is faster than a subquery, and performs even better when indexes can be used.
- Conclusion: try NOT to use NOT IN or NOT EXISTS; use LEFT JOIN xxx ON xx WHERE xx IS NULL instead.
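The rewrite can be sanity-checked on any SQL engine. A minimal sketch using Python's sqlite3 (table and column names invented for illustration; with no NULLs in the subquery column, both forms return the same rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, deptno INT NOT NULL);
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY);
    INSERT INTO dept(deptno) VALUES (10), (20);
    INSERT INTO emp(id, deptno) VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

# Subquery form: employees whose department does not exist
not_in = conn.execute(
    "SELECT id FROM emp WHERE deptno NOT IN (SELECT deptno FROM dept) ORDER BY id"
).fetchall()

# Equivalent LEFT JOIN ... IS NULL form: unmatched rows have NULL join columns
left_join = conn.execute(
    "SELECT e.id FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno "
    "WHERE d.deptno IS NULL ORDER BY e.id"
).fetchall()

print(not_in)     # [(3,), (4,)]
print(left_join)  # [(3,), (4,)]
```

Note that NOT IN behaves surprisingly when the subquery column contains NULLs (it then returns no rows at all), which is one more reason to prefer the LEFT JOIN ... IS NULL form.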
8. Prefer covering indexes
8.1. What is a covering index?
An index is an efficient way to find rows, but the database can also use an index to fetch a column's data directly, so it does not have to read the entire row. After all, the index's leaf nodes store the data of the indexed columns; when the desired data can be obtained by reading the index alone, there is no need to read the row. An index that contains all the data needed to satisfy the query result is called a covering index.
It is a form of non-clustered composite index that includes all columns used in the SELECT, JOIN, and WHERE clauses of the query (that is, the indexed fields are exactly the fields the query touches).
In short: the index columns plus the primary key cover all the columns queried between SELECT and FROM.
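The effect is easy to observe in SQLite, whose planner labels covering-index access explicitly in its query plan; in MySQL's EXPLAIN the equivalent signal is Using index in the Extra column. A sketch (SQLite wording, not MySQL output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, k INT NOT NULL, name TEXT)")
conn.execute("CREATE INDEX idx_k ON test(k)")

def plan(sql):
    # Concatenate the detail column of EXPLAIN QUERY PLAN output
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# id and k are both in the index (secondary index entries carry the primary key),
# so the base row never has to be read: a covering index
covered = plan("SELECT id, k FROM test WHERE k = 5")

# name is not in the index, so the base row must be fetched as well
not_covered = plan("SELECT name FROM test WHERE k = 5")

print(covered)
print(not_covered)
```

The first plan mentions COVERING INDEX; the second uses the index only to locate rows and still touches the table.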
8.2. Advantages and disadvantages of covering indexes
Benefits:
- Avoids the secondary lookup against the clustered index (back-to-table lookup) in InnoDB.
- Can turn random I/O into sequential I/O, improving query efficiency.
Disadvantages:
- Maintaining index fields always has a cost, so there are trade-offs when creating redundant indexes to support covering indexes. Weighing them is the job of a business DBA or business data architect.
9. Plain indexes vs. unique indexes
For performance, should you choose a unique index or a plain index? What is the basis for the choice? Suppose we have a table with primary key id and a field k with an index on it, and assume no values of k are duplicated. The table's DDL is:
mysql> create table test(
id int primary key,
k int not null,
name varchar(16),
index (k)
)engine=InnoDB;
Suppose rows R1 through R5 have (id, k) values (100,1), (200,2), (300,3), (500,5), and (600,6) respectively.
9.1. Query process
Suppose we execute select id from test where k=5 through the index tree:
- For a plain index, after finding the first record that satisfies the condition, (500,5), the search continues to the next record, stopping at the first record that does not satisfy k=5.
- For a unique index, since the index guarantees uniqueness, the search stops as soon as the first matching record is found.
How big is the resulting performance difference? The answer: minimal.
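The difference between the two lookups can be sketched with an ordered list standing in for the index's leaf records (a toy model, not InnoDB code):

```python
import bisect

# (k, id) pairs as they would sit, ordered by k, in an index leaf page
leaf = [(1, 100), (2, 200), (3, 300), (5, 500), (6, 600)]
keys = [k for k, _ in leaf]

def lookup_normal(k):
    """Plain index: keep scanning until the first record with a different key."""
    i = bisect.bisect_left(keys, k)
    steps, hits = 0, []
    while i < len(leaf) and leaf[i][0] == k:
        steps += 1                      # examine this record, then move on
        hits.append(leaf[i][1])
        i += 1
    steps += 1                          # one extra step to see the non-matching record
    return hits, steps

def lookup_unique(k):
    """Unique index: stop right after the first match."""
    i = bisect.bisect_left(keys, k)
    if i < len(leaf) and leaf[i][0] == k:
        return [leaf[i][1]], 1
    return [], 1

print(lookup_normal(5))   # ([500], 2): one match plus one peek at the next record
print(lookup_unique(5))   # ([500], 1): stops immediately
```

Since records are read page by page, that extra peek almost never costs an additional page read, which is why the difference is negligible.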
9.2. Update process
To explain the impact of plain vs. unique indexes on update performance, we first need to introduce the change buffer.
When a data page needs to be updated and it is already in memory, it is updated directly. If the page is not in memory, InnoDB caches the update in the change buffer (without compromising data consistency), so the page need not be read from disk. The next time a query accesses that page, the page is read into memory and the change buffer operations for it are applied, which guarantees the correctness of the data logic.
Applying the operations in the change buffer to the original data page to obtain the latest result is called merge. Besides queries that access the page triggering a merge, background threads merge periodically, and merges also run during database shutdown.
If an update can be recorded in the change buffer first, disk reads are reduced and the statement executes noticeably faster. Moreover, reading data into memory occupies the buffer pool, so this mechanism also avoids occupying memory and improves memory utilization.
However, updates to unique indexes cannot use the change buffer; in fact only plain indexes can. Why? Consider InnoDB's process for inserting a new record with k=4, say (400,4), into this table:
- If the target page is in memory: a unique index finds the position between (300,3) and (500,5), checks for conflicts, then inserts; a plain index simply finds the position and inserts. The difference is negligible.
- If the target page is not in memory: a unique index must read the page from disk to verify uniqueness before inserting; a plain index just records the insert in the change buffer and returns. Avoiding that random disk read is where the big speedup comes from.
9.3. Application scenarios of the change buffer
- How to choose between a plain index and a unique index? In query capability there is no real difference; the main consideration is the impact on update performance. Therefore it is recommended to choose plain indexes whenever possible.
- In practice, the combination of plain indexes and the change buffer noticeably speeds up updates on tables with large data volumes.
- If every update is immediately followed by a query of the same record, turn the change buffer off. In other cases, the change buffer improves update performance.
- Since unique indexes cannot use the change buffer optimization, from a performance standpoint it is recommended to prefer non-unique indexes if the business allows it. But what if the business cannot guarantee uniqueness?
  - First, business correctness comes first. Our premise was that the business code already guarantees no duplicate writes. If the business cannot guarantee that, or simply requires the database to enforce the constraint, then there is no choice but a unique index.
  - In that case this section is still useful: if you hit a large number of slow inserts with a low memory hit ratio, it gives you a troubleshooting lead.
- In some archive-repository scenarios, however, plain indexes are worth considering. For example, online data is kept for only half a year while historical data goes to an archive. At that point the archiving process already guarantees no unique key conflicts, so to improve archiving efficiency you can change the archive table's unique indexes to plain indexes.
10. Other query optimization strategies
10.1. Small tables drive large tables
Optimization principle: For MySQL databases, small tables always drive large tables. That is, small data sets drive large data sets
/* For example, nested for loops help understand "small table drives large table".
   Think of the outer loop as database connections and the inner loop as queries
   executed over that connection. */

// Small table outside: 5 connections, 1000 queries each
for (int i = 1; i <= 5; i++) {
    for (int j = 1; j <= 1000; j++) {
        // probe the large table
    }
}
// ~~~~~~~~~~~~~~~~~~~~
// Large table outside: 1000 connections, 5 queries each
for (int i = 1; i <= 1000; i++) {
    for (int j = 1; j <= 5; j++) {
        // probe the small table
    }
}
With the small loop on the outside, the database handles only 5 connections for the 5000 operations; with 1000 on the outside, there are 1000 database connections, wasting resources and increasing overhead. That is why small tables should drive large tables.
10.2. EXISTS and IN
Speaking of small tables driving large tables, another related topic inevitably comes up: IN vs. EXISTS.
EXISTS:
- Syntax:
SELECT .... FROM tab WHERE EXISTS(subquery);
- This syntax can be understood as: each row of the outer query is fed into the subquery for condition verification, and the verification result (true or false) decides whether that row of the outer query is kept.
Tips:
- Since EXISTS(subquery) only returns true or false, the SELECT * in the subquery can just as well be SELECT 1 or SELECT X; there is no difference.
- The actual execution of an EXISTS subquery may be optimized rather than compared row by row as we might imagine; if efficiency matters, run real tests to compare.
- An EXISTS subquery can often be replaced by a condition expression, another subquery, or a JOIN; which is optimal must be analyzed case by case.
Case demonstration:
1. When table B's data set is smaller than table A's, IN performs better than EXISTS.
select * from tb_emp_bigdata A where A.deptno in (select B.deptno from tb_dept_bigdata B);
- Table B (tb_dept_bigdata): 100 rows
- Table A (tb_emp_bigdata): 5000 rows
- Query time with in: 0.019 s
Convert the above SQL to exists:
SELECT * from tb_emp_bigdata A WHERE EXISTS(SELECT 1 from tb_dept_bigdata B WHERE B.deptno=A.deptno );
The query time with exists: 0.095 s.
Comparison shows that when table B's data set is smaller than table A's, in beats exists. The data sets here are small, so the difference in query time is modest.
2. When table A's data set is smaller than table B's, EXISTS performs better than IN.
select * from tb_dept_bigdata A where A.deptno in (select B.deptno from tb_emp_bigdata B);
The query time with in: 0.02 sec
Convert the above SQL to exists:
select * from tb_dept_bigdata A where exists (select 1 from tb_emp_bigdata B where B.deptno=A.deptno);
The query time with exists: 0.00 sec
Since the data volume is small, the contrast is not stark.
3. Conclusions (specifically about in vs. exists):
- IN suits a small table in the subquery; EXISTS suits a large table in the subquery. Mnemonic: in for small, exists for big.
- For exists: select ... from table where exists(subquery); each row of the outer query is fed into the subquery for condition verification, and the result (true or false) decides whether the outer row is kept.
10.3. ORDER BY optimization
Data preparation
CREATE TABLE `talA`(
  `age` INT,
  `birth` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO `talA`(`age`) VALUES(18);
INSERT INTO `talA`(`age`) VALUES(19);
INSERT INTO `talA`(`age`) VALUES(20);
INSERT INTO `talA`(`age`) VALUES(21);
INSERT INTO `talA`(`age`) VALUES(22);
INSERT INTO `talA`(`age`) VALUES(23);
INSERT INTO `talA`(`age`) VALUES(24);
INSERT INTO `talA`(`age`) VALUES(25);
/* Create index */
CREATE INDEX idx_talA_age_birth ON `talA`(`age`, `birth`);
Case: SQL analysis. Check whether each ORDER BY below produces filesort.
/* 1. No filesort: age is the leftmost index column */
EXPLAIN SELECT * FROM `talA` WHERE `age` > 20 ORDER BY `age`;
/* 2. No filesort: (age, birth) follows the index order */
EXPLAIN SELECT * FROM `talA` WHERE `age` > 20 ORDER BY `age`, `birth`;
/* 3. Using filesort: birth alone skips the leftmost column */
EXPLAIN SELECT * FROM `talA` WHERE `age` > 20 ORDER BY `birth`;
/* 4. Using filesort: (birth, age) is not the index order */
EXPLAIN SELECT * FROM `talA` WHERE `age` > 20 ORDER BY `birth`, `age`;
/* 5. Using filesort */
EXPLAIN SELECT * FROM `talA` ORDER BY `birth`;
/* 6. Using filesort: birth is not a leftmost prefix of the index */
EXPLAIN SELECT * FROM `talA` WHERE `birth` > '2020-08-04 07:42:21' ORDER BY `birth`;
/* 7. No filesort: ORDER BY age uses the leftmost index column */
EXPLAIN SELECT * FROM `talA` WHERE `birth` > '2020-08-04 07:42:21' ORDER BY `age`;
/* 8. Using filesort: mixed ASC/DESC directions (before MySQL 8.0) */
EXPLAIN SELECT * FROM `talA` ORDER BY `age` ASC, `birth` DESC;
1. In the ORDER BY clause, try to sort with an index; avoid Using filesort.
MySQL supports two sorting methods: FileSort and Index. Index is efficient: MySQL scans the index itself to produce the order. FileSort is inefficient.
2. ORDER BY uses an index when:
- the ORDER BY statement uses the leftmost prefix of the index, or
- the combination of the WHERE clause columns and the ORDER BY clause columns satisfies the leftmost prefix of the index.
3. Conclusion: sort in index-column order as much as possible, following the optimal-left-prefix principle used when the index was built. An ORDER BY clause defaults to ascending order.
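Which ORDER BY clauses can ride on the index is also observable in SQLite, where a sort the index cannot satisfy shows up as USE TEMP B-TREE FOR ORDER BY (SQLite's analogue of MySQL's Using filesort; a sketch, not MySQL output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talA (age INT, birth TIMESTAMP NOT NULL)")
conn.execute("CREATE INDEX idx_talA_age_birth ON talA(age, birth)")

def needs_sort(sql):
    # True when the planner falls back to an explicit sort step
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return any("TEMP B-TREE" in row[-1] for row in rows)

# Leftmost-prefix order: the index already delivers rows in (age, birth) order
print(needs_sort("SELECT * FROM talA WHERE age > 20 ORDER BY age, birth"))  # False
# birth alone is not a leftmost prefix, so an explicit sort is required
print(needs_sort("SELECT * FROM talA ORDER BY birth"))                      # True
```

The same leftmost-prefix reasoning applies; only the plan wording differs between engines.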
4. If the ORDER BY columns are not index columns, Using filesort is produced.
When sorting cannot use an index column, FileSort falls back to one of two algorithms: two-pass (double-way) sort and single-pass (single-way) sort.
1. Two-pass sort: before MySQL 4.1, sorting was two-pass, which literally means scanning the disk twice to get the data: first read the row pointers and the ORDER BY columns and sort them; then scan the sorted list and, using the pointers in it, read the corresponding rows for output. In short: fetch the sort fields from disk, sort in the buffer, then fetch the other fields from disk.
Producing one batch of data thus scans the disk twice, and I/O is notoriously expensive, so after MySQL 4.1 an improved algorithm was introduced: single-pass sort.
2. Single-pass sort: read all the columns the query needs from disk once, sort them by the ORDER BY column in the buffer, then scan the sorted list for output. It is faster because it avoids reading the data a second time and turns random I/O into sequential I/O, but it uses more memory, since it keeps every full row in the sort buffer.
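The two algorithms' I/O patterns can be sketched in a few lines of Python (a toy model of the idea, not InnoDB code; the 'disk' dict stands in for row storage):

```python
# Toy table on 'disk': rowid -> full row
disk = {
    1: {"age": 21, "name": "c"},
    2: {"age": 18, "name": "a"},
    3: {"age": 25, "name": "b"},
}

def two_pass_sort(order_col):
    """Pass 1: read (sort key, rowid) pairs and sort them.
    Pass 2: re-read the full rows by pointer (random I/O)."""
    pairs = sorted((row[order_col], rid) for rid, row in disk.items())
    return [disk[rid] for _, rid in pairs]

def single_pass_sort(order_col):
    """Read every needed column once and sort entirely in the sort buffer."""
    rows = list(disk.values())
    return sorted(rows, key=lambda r: r[order_col])

print(single_pass_sort("age"))  # rows ordered by age: 18, 21, 25
```

Both produce the same order; the difference is that single-pass touches the 'disk' once but holds whole rows in memory.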
Since single-pass came later, its overall efficiency is better than two-pass.
However, single-pass has a problem: if the sort buffer is too small to hold all the columns read from disk, it must sort in chunks and merge, and can end up performing worse than two-pass.
Tuning strategies for the single-pass algorithm:
- Increase the sort_buffer_size setting.
- Increase the max_length_for_sort_data setting.
To speed up ORDER BY:
- SELECT * is a big no-no with ORDER BY; select only the fields you actually need. The impact:
  - When the total size of the queried fields is less than max_length_for_sort_data and the sort field is not a TEXT|BLOB type, the single-pass algorithm is used; otherwise the two-pass algorithm is used.
  - With either algorithm the data can exceed the capacity of the sort_buffer; when it does, tmp temporary files are created for merge sorting, causing extra I/O. The risk is greater for single-pass, so increase the sort_buffer_size setting.
- Try increasing sort_buffer_size: whichever algorithm is used, raising this parameter improves efficiency. Size it to the system's capacity, of course, since it is allocated per process.
- Try increasing max_length_for_sort_data: raising it increases the probability that the single-pass algorithm is used. Set too high, however, the probability that the total data exceeds sort_buffer_size also rises, with the obvious symptoms of high disk I/O activity and low CPU utilization.
10.4. GROUP BY optimization
GROUP BY uses indexes in almost the same way as ORDER BY; GROUP BY can use an index even without any filter condition.
When index columns cannot be used, increase the max_length_for_sort_data and sort_buffer_size settings. WHERE is more efficient than HAVING: any condition that can be written in WHERE should not be written in HAVING. Reduce the use of ORDER BY; where the business allows, skip sorting, or push sorting to the application side.
ORDER BY, GROUP BY, and DISTINCT statements are CPU-hungry, and a database's CPU resources are extremely precious. For SQL containing ORDER BY, GROUP BY, or DISTINCT, keep the result set filtered by the WHERE conditions within 1000 rows; otherwise the SQL will be slow.
- GROUP BY is essentially sort-then-group, following the optimal left prefix of the index it was built with.
- When index columns cannot be used and Using filesort appears, increase the max_length_for_sort_data and sort_buffer_size settings to improve performance.
- WHERE executes before HAVING; any condition that can go in the WHERE clause should not be put in HAVING.
10.5. Using indexes for ORDER BY sorting
- MySQL supports two sorting methods: Using filesort and Index (scanning an ordered index).
- MySQL can use the same index for both sorting and lookups; create indexes that serve both purposes.
- MySQL 8 relaxes the direction requirement: with descending indexes, even a reversed sort order need not produce Using filesort.

/* Assume the table has a composite index index(a, b, c) */

/* Can use the index for sorting: */
ORDER BY a, b;
ORDER BY a, b, c;
ORDER BY a DESC, b DESC, c DESC;               /* all columns in the same direction */
WHERE a = 'Ringo' ORDER BY b, c;
WHERE a = 'Ringo' AND b = 'Tangs' ORDER BY c;
WHERE a = 'Ringo' AND b > 2000 ORDER BY b, c;

/* Cannot use the index for sorting: */
ORDER BY a ASC, b DESC, c DESC;  /* mixed sort directions */
WHERE g = const ORDER BY b, c;   /* the leftmost column a is missing */
WHERE a = const ORDER BY c;      /* b is skipped */
WHERE a = const ORDER BY a, d;   /* d is not part of the index */
WHERE a IN (...) ORDER BY b, c;  /* for sorting, multiple equality values (a=1 or a=2) count as a range query */
10.6. Other
COUNT(*) vs. COUNT(specific field) efficiency
MySQL offers three ways to count the rows of a table: SELECT COUNT(*), SELECT COUNT(1), and SELECT COUNT(column). In InnoDB, COUNT(*) and COUNT(1) are handled the same way and show no performance difference; COUNT(column) must additionally evaluate the column and skips NULL values.
About SELECT *
In table queries, it is recommended to name the fields explicitly instead of using * as the field list; prefer SELECT <field list>. Reasons:
- During parsing, MySQL has to consult the data dictionary to expand * into all column names in order, which costs resources and time.
- A covering index cannot be used.
The impact of LIMIT 1 on optimization
For an SQL statement that would scan the whole table, if you are certain there is only one result row, adding LIMIT 1 stops the scan as soon as one result is found, speeding up the query.
If the field already has a unique index, the query is resolved through the index and does not scan the whole table anyway, so LIMIT 1 is unnecessary.
Use COMMIT more
Wherever possible, use COMMIT as often as you can in your programs. Performance improves and resource demand shrinks thanks to what COMMIT releases. Resources released by COMMIT:
- Undo-segment information used to roll back data
- Locks acquired by the program's statements
- Space in the redo / undo log buffer
- The internal overhead of managing the three resources above
11. How should the primary key of Taobao's database be designed?
Let's discuss a practical question: how should Taobao's database design its primary keys?
Some wrong answers still circulate on the Internet year after year and have even become so-called MySQL gospel. One of the most obvious errors concerns MySQL primary key design.
Most people answer confidently: use an 8-byte BIGINT as the primary key, not INT. Wrong!
Such an answer stays at the database level without thinking about primary keys from the business perspective. Should the primary key be an auto-increment ID? In 2022, using auto-increment as the primary key might not even earn a passing grade in architecture design.
11.1. Problems with auto-increment IDs
Auto-increment ID primary keys are easy to understand, and almost all databases support auto-increment types, although implementations differ. Besides being simple, auto-increment IDs have the following problems:
- Poor reliability. Auto-increment IDs suffer from the counter-rollback (backtracking) issue, which was only fixed in the latest MySQL 8.0 versions.
- Poor security. Exposed interfaces make the corresponding information easy to guess. For example, with a /User/1/ interface it is trivial to guess user ID values and the total number of users, and easy to crawl data through the interface.
- Poor performance. Auto-increment IDs perform poorly because they must be generated on the database server.
- Extra interaction. The application needs an additional call such as last_insert_id() to learn the value that was just inserted, which costs an extra network round trip. In a massively concurrent system, one more SQL statement is one more unit of overhead.
- Local uniqueness. Most importantly, an auto-increment ID is only locally unique: it is unique within the current database instance, not globally unique across servers. For today's distributed systems, this is a nightmare.
11.2. Business fields as the primary key
To uniquely identify a member's information, we need to set a primary key on the member information table. How should we choose it to achieve our goals? Here we first consider using a business field as the primary key. The table data is as follows:
In this table, which field is appropriate?
Selecting the card number (cardno) seems appropriate, because it cannot be empty, is unique, and can be used to identify a member record:
mysql> CREATE TABLE demo.membermaster
-> (
-> cardno CHAR(8) PRIMARY KEY, -- Primary key of member card number
-> membername TEXT,
-> memberphone TEXT,
-> memberpid TEXT,
-> memberaddress TEXT,
-> sex TEXT,
-> birthday DATETIME
-> );
Query OK, 0 rows affected (0.06 sec)
Different member card numbers correspond to different members; the field "cardno" uniquely identifies a member. If this were always true — if card numbers and members corresponded one to one — the system could work normally.
But in reality, member card numbers may be reused. For example, Zhang San moves away because of a job change, stops spending at the store, and returns his membership card; he is no longer a member of the store. Not wanting to leave the card number idle, the merchant issues card number "10000001" to Wang Wu.
From the system-design point of view, this change only modifies the member information for card number "10000001" in the member information table, and does not affect data consistency: after the modification, every module of the system obtains the revised member information. There is no situation where "some modules see the old member information and some see the new, causing inconsistency within the system". So at the information-system level, it is fine.
But at the business level — for the people using the system — there is a big problem, and it affects the merchant.
For example, we have a sales flow table (trans) that records every sale. On December 1, 2020, Zhang San bought a book at the store for 89 yuan, so the system has a flow record of Zhang San's purchase, as follows:
Next, let’s check the member sales record as of December 1st, 2020:
mysql> SELECT b.membername,c.goodsname,a.quantity,a.salesvalue,a.transdate
-> FROM demo.trans AS a
-> JOIN demo.membermaster AS b
-> JOIN demo.goodsmaster AS c
-> ON (a.cardno = b.cardno AND a.itemnumber=c.itemnumber);
+------------+-----------+----------+------------+---------------------+
| membername | goodsname | quantity | salesvalue | transdate |
+------------+-----------+----------+------------+---------------------+
| Zhang San  | book      | 1.000    | 89.00      | 2020-12-01 00:00:00 |
+------------+-----------+----------+------------+---------------------+
1 row in set (0.00 sec)
After membership card "10000001" is reissued to Wang Wu, we update the member information table. Running the same query now gives:
mysql> SELECT b.membername,c.goodsname,a.quantity,a.salesvalue,a.transdate
-> FROM demo.trans AS a
-> JOIN demo.membermaster AS b
-> JOIN demo.goodsmaster AS c
-> ON (a.cardno = b.cardno AND a.itemnumber=c.itemnumber);
+------------+-----------+----------+------------+---------------------+
| membername | goodsname | quantity | salesvalue | transdate |
+------------+-----------+----------+------------+---------------------+
| Wang Wu    | book      | 1.000    | 89.00      | 2020-12-01 00:00:00 |
+------------+-----------+----------+------------+---------------------+
1 row in set (0.01 sec)
The result now says: Wang Wu bought a book on December 1, 2020, spending 89 yuan. Clearly wrong! Conclusion: never use the member card number as the primary key.
Selecting the member's phone number or ID number. Can the member's phone number be the primary key? No. In practice, mobile phone numbers are also recovered by carriers and reissued to other people.
What about the ID number? It seems ok. Because id cards are never duplicated, there is a one-to-one correspondence between the id number and a person. But the problem is that the ID number is private and customers may not be willing to give it to you. Forcing members to register their ID numbers would drive many customers away. In fact, the customer phone also has this problem, which is why we allow the id number and phone number to be empty when designing the membership information form.
Therefore, it is recommended to avoid using business-related fields as primary keys. As project designers and engineers, none of us can predict which business fields will be duplicated or reused over the project's entire life cycle as business requirements change.
Lesson: a common mistake many people make when starting out with MySQL is to use business fields as the primary key, assuming they understand the business needs — but reality often surprises, and the cost of changing a primary key later is very high.
11.3. Taobao's primary key design
In Taobao's e-commerce business, the order service is a core business. So how is the primary key of Taobao's order table designed? Is it an auto-increment ID? Open Taobao and look at the order information:
As you can see from the figure above, the order number is not an increment ID! Let’s take a look at the above four order numbers in detail:
1550672064762308113
1481195847180308113
1431156171142308113
1431146631521308113
The order numbers are 19 digits long, the last 5 digits of each order are the same (08113), and the first 14 digits are monotonically increasing.
A bold guess: Taobao's order ID design is probably
Order ID = time + deduplication field + last 6 digits of the user ID
Such a design can be globally unique and extremely friendly to distributed system queries.
11.4. Recommended primary key design
Non-core business: an auto-increment ID as the primary key of the corresponding table is fine — for example alarms, logs, and monitoring information.
Core business: the primary key design should at least be globally unique and monotonically increasing. Global uniqueness guarantees uniqueness across systems; monotonic increase is desirable so that inserts do not hurt database performance. The simplest such primary key design is recommended here: UUID.
UUID features: Globally unique, occupying 36 bytes, data is out of order, and insert performance is poor.
Understanding UUIDs:
Why is a UUID globally unique? Why does a UUID occupy 36 bytes? Why is a UUID out of order? The composition of a MySQL UUID is as follows:
UUID = time + UUID version (16 bytes) - clock sequence (4 bytes) - MAC address (12 bytes)
Take the UUID e0ea12d4-6473-11eb-943c-00155dbaa39d as an example:
Why are UUID’s globally unique?
In a UUID, the time part occupies 60 bits. What is stored is not a TIMESTAMP like 1582-10-15 00:00:00.00, but the number of 100 ns intervals elapsed since 1582-10-15. The UUID therefore stores time more precisely than TIMESTAMP, reducing the probability of duplication in the time dimension to once per 100 ns. The clock sequence avoids the possibility of repeated time values if the clock is rolled back. The MAC address is globally unique.
Why does a UUID take up 36 bytes?
UUIDs are stored as strings, including the meaningless "-" separators in the design, so a total of 36 bytes is required.
Why are UUID’s randomly unordered?
Because the UUID layout puts the low bits of the time first, and that part of the data changes constantly, the resulting values are out of order.
Transform UUID
If the high and low parts of the time are swapped, the value becomes monotonically increasing over time. MySQL 8.0 can swap the time-low and time-high fields when storing, which turns the UUID into an ordered UUID.
MySQL 8.0 also addresses the space occupied by the UUID string: it removes the meaningless "-" characters and saves the value in a binary type, reducing the storage space to 16 bytes.
MySQL 8.0 provides the uuid_to_bin function to do this, and the corresponding bin_to_uuid function for the reverse conversion:
SET @uuid = UUID();
SELECT @uuid, uuid_to_bin(@uuid), uuid_to_bin(@uuid, TRUE);
uuid_to_bin(@uuid, TRUE) converts the UUID into an ordered UUID. Globally unique plus monotonically increasing — isn't that exactly the primary key we want!
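A minimal sketch of using an ordered UUID as a primary key (the `orders` table and its columns are hypothetical; requires MySQL 8.0):

```sql
-- Store the ordered UUID as BINARY(16): 16 bytes instead of 36.
CREATE TABLE orders (
    id BINARY(16) PRIMARY KEY,
    user_id BIGINT NOT NULL,
    amount DECIMAL(10,2) NOT NULL
);

-- uuid_to_bin(..., TRUE) swaps the time fields so keys insert in order.
INSERT INTO orders (id, user_id, amount)
VALUES (uuid_to_bin(UUID(), TRUE), 12345, 89.00);

-- Convert back to the readable form when querying.
SELECT bin_to_uuid(id, TRUE) AS order_id, user_id, amount FROM orders;
```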
11.5. Ordered UUID performance test
How does a 16-byte ordered UUID compare with the 8-byte auto-increment ID in performance and storage space? Let's run a test: insert 100 million rows, each row occupying 500 bytes, with 3 secondary indexes. The final results are as follows:
As the figure above shows, inserting 100 million ordered UUIDs is the fastest, and in real business, ordered UUIDs can be generated on the application side, further reducing the number of SQL round trips.
In addition, although an ordered UUID is 8 bytes longer than an auto-increment ID, it only increased storage by about 3 GB in the test, which is acceptable.
In today’s Internet environment, database designs with auto-increment ids as primary keys are highly discouraged. A globally unique implementation like ordered UUID is more recommended.
In a real service system, the primary key can also add service and system attributes, such as the user’s tail number and equipment room information. This primary key design is even more challenging for the architect.
What if you are not on MySQL 8.0?
Manually assign the primary key field!
For example, consider designing the primary key of each branch store's membership table. Because data generated on each store's machine needs to be merged, duplicate primary keys could occur.
In the headquarters MySQL database, you can have a management information table and add a field to this table to record the maximum value of the current membership number.
When adding a member, a store first obtains the current maximum value from the headquarters MySQL database, adds 1 to it, and uses the result as the new member's ID; at the same time it writes the new maximum member number back to the management information table in the headquarters database.
In this way, whenever any store adds a member, it operates on the same field of the same table in the headquarters MySQL database, which solves the problem of member-number conflicts between stores.
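A sketch of that scheme (table and column names are illustrative): the headquarters database keeps a management table holding the current maximum member number, and each store claims the next number inside a transaction:

```sql
-- Headquarters management table.
CREATE TABLE manage_info (
    id INT PRIMARY KEY,
    max_member_id BIGINT NOT NULL
);
INSERT INTO manage_info VALUES (1, 0);

-- A store registers a new member: increment, then read back the claimed value.
START TRANSACTION;
UPDATE manage_info SET max_member_id = max_member_id + 1 WHERE id = 1;
SELECT max_member_id FROM manage_info WHERE id = 1;  -- use this as the new member ID
COMMIT;
```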
12. Index failure
Data preparation
CREATE TABLE `staffs`(
    `id` INT(10) PRIMARY KEY AUTO_INCREMENT,
    `name` VARCHAR(24) NOT NULL DEFAULT '' COMMENT 'name',
    `age` INT(10) NOT NULL DEFAULT 0 COMMENT 'age',
    `pos` VARCHAR(20) NOT NULL DEFAULT '' COMMENT 'job',
    `add_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Entry Time'
) COMMENT 'Staff Record';
INSERT INTO `staffs`(`name`,`age`,`pos`) VALUES('Ringo', 18, 'manager');
INSERT INTO `staffs`(`name`,`age`,`pos`) VALUES('Joe', 20, 'dev');
INSERT INTO `staffs`(`name`,`age`,`pos`) VALUES('bill', 21, 'dev');
/* Create index */
CREATE INDEX idx_staffs_name_age_pos ON `staffs`(`name`,`age`,`pos`);
1. Index failure
One of the most effective ways to improve performance in MySQL is to design proper indexes for tables. Indexes have a critical impact on query speed because they provide efficient access to data and speed up queries.
- Index can be used to quickly locate a record in a table, so as to improve the speed of database query and improve database performance.
- If the query does not use an index, the query statement scans all records in the table. In the case of large data volume, this query speed will be slow.
B+ trees are used (by default) to build indexes in most cases. However, indexes for spatial column types use R-trees, and MEMORY tables also support hash indexes.
Ultimately, the optimizer decides whether to use an index. What does it base the decision on? Cost — it is a cost-based optimizer, not rule-based or semantics-based: whatever is cheapest wins. In addition, whether an SQL statement uses an index depends on the database version, the data volume, and the data selectivity.
Here are some cases of index failure:
- Full value matching is my favorite.
- Follow the best left prefix rule.
- Doing anything on an index column (calculations, functions, automatic or manual type conversion) invalidates the index and causes a full table scan.
- Index fields to the right of a range condition are invalidated.
- Try to use covering indexes (queries that only access the index, where the index columns match the queried columns) and reduce `SELECT *`.
- MySQL cannot use an index with `!=` or `<>`, which causes a full table scan. `is null` and `is not null` also cannot use indexes.
- `like` patterns starting with a wildcard (`'%abc...'`) invalidate the index and become full table scans (unless a covering index applies).
- A string without single quotation marks around it invalidates the index.
- Use `or` sparingly: joining conditions with it can invalidate the index.
2. Full value matching is my favorite
/* Create a composite index */
CREATE INDEX idx_staffs_name_age_pos ON `staffs`(`name`,`age`,`pos`);
1. The name field of the idx_staffs_name_age_pos index is used
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo';
2. The name and age fields of the idx_staffs_name_age_pos index are used
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo' AND `age` = 18;
3. The name, age, and pos fields of the idx_staffs_name_age_pos index are used
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo' AND `age` = 18 AND `pos` = 'manager';
Conclusion: "full value matching is my favorite" means the queried fields match the index fields, in order! (key_len grows as the match becomes more precise.)
Note: the order of the conditions in the SQL query does not have to match the index order; the optimizer reorders them automatically without affecting the result.
3. The best left prefix rule
case
1. Index not used; full table scan (type = ALL)
EXPLAIN SELECT * FROM `staffs` WHERE `age` = 18 AND `pos` = 'manager';
2. Index not used; full table scan (type = ALL)
EXPLAIN SELECT * FROM `staffs` WHERE `pos` = 'manager';
3. The name field of the idx_staffs_name_age_pos index is used; pos cannot use the index because age is skipped
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo' AND `pos` = 'manager';
concept
Best left prefix rule: if the query fields do not line up with the index fields from the left, the index is underused or even invalidated!
Reason: with a composite index you must follow the best left prefix rule — when the index covers multiple columns, the query must start from the leftmost column of the index and must not skip columns in the middle.
Conclusion: the filter conditions must be satisfied in the order the index was created. Once a field is skipped, that field and every field after it in the index cannot be used.
Mnemonic: the leading brother must not die; the middle brothers must not break.
The Alibaba Java Development Manual: an index file has the B-tree leftmost-prefix matching feature; if the leftmost value is not determined, the index cannot be used.
4. No calculations on index columns
case
/* There are two ways to query the record where name = 'Ringo'. */
# 1. Compare the field directly with a value
mysql> SELECT * FROM `staffs` WHERE `name` = 'Ringo';
+----+-------+-----+---------+---------------------+
| id | name | age | pos | add_time |
+----+-------+-----+---------+---------------------+
| 1 | Ringo | 18 | manager | 2020- 08- 03 08:30:39 |
+----+-------+-----+---------+---------------------+
1 row in set (0.00 sec)
# 2. Use a MySQL built-in function on the column
mysql> SELECT * FROM `staffs` WHERE LEFT(`name`, 5) = 'Ringo';
+----+-------+-----+---------+---------------------+
| id | name | age | pos | add_time |
+----+-------+-----+---------+---------------------+
| 1 | Ringo | 18 | manager | 2020- 08- 03 08:30:39 |
+----+-------+-----+---------+---------------------+
1 row in set (0.00 sec)
The two SQL statements return the same result, but do they differ in execution efficiency?
Analyze performance by analyzing the execution plans of the two SQL statements.
Thus, a calculation on an index column invalidates the index.
Mnemonic: no calculations on index columns.
5. Everything after a range is invalid
case
/* Full value matching: the name, age, and pos fields of the idx_staffs_name_age_pos index are used */
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo' AND `age` = 18 AND `pos` = 'manager';
/* name and age are used; pos, which comes after the range condition, is invalidated */
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Joe' AND `age` > 18 AND `pos` = 'dev';
View the execution plans of the SQL above.
As you can see, the field with the range condition still uses the index, but the index fields after the range are invalidated.
Mnemonic: everything after a range is invalid.
6. Use covering indexes as much as possible
When writing SQL, avoid SELECT *.
/* No covering index is used */
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo' AND `age` = 18 AND `pos` = 'manager';
/* A covering index is used */
EXPLAIN SELECT `name`, `age`, `pos` FROM `staffs` WHERE `name` = 'Ringo' AND `age` = 18 AND `pos` = 'manager';
Mnemonic: never write * in the query.
7. Inequality (!= or <>) sometimes fails
/* A covering index is still used */
EXPLAIN SELECT `name`, `age`, `pos` FROM `staffs` WHERE `name` != 'Ringo';
/* Index invalidated; full table scan */
EXPLAIN SELECT * FROM `staffs` WHERE `name` != 'Ringo';
8. LIKE: put the percent sign on the right
/* Index failure full table scan */
EXPLAIN SELECT * FROM `staffs` WHERE `name` LIKE '%ing%';
/* Index failure full table scan */
EXPLAIN SELECT * FROM `staffs` WHERE `name` LIKE '%ing';
/* Use index range query */
EXPLAIN SELECT * FROM `staffs` WHERE `name` LIKE 'Rin%';
Mnemonic: with LIKE, put the percent sign on the right.
If you must lead with % and still want to keep the index usable, write the SQL with a covering index.
/* A covering index is used */
EXPLAIN SELECT `id` FROM `staffs` WHERE `name` LIKE '%in%';
/* A covering index is used */
EXPLAIN SELECT `name` FROM `staffs` WHERE `name` LIKE '%in%';
/* A covering index is used */
EXPLAIN SELECT `age` FROM `staffs` WHERE `name` LIKE '%in%';
/* A covering index is used */
EXPLAIN SELECT `pos` FROM `staffs` WHERE `name` LIKE '%in%';
/* A covering index is used */
EXPLAIN SELECT `id`, `name` FROM `staffs` WHERE `name` LIKE '%in%';
/* A covering index is used */
EXPLAIN SELECT `id`, `age` FROM `staffs` WHERE `name` LIKE '%in%';
/* A covering index is used */
EXPLAIN SELECT `id`,`name`, `age`, `pos` FROM `staffs` WHERE `name` LIKE '%in';
/* A covering index is used */
EXPLAIN SELECT `id`, `name` FROM `staffs` WHERE `pos` LIKE '%na';
/* Index invalidated; full table scan (add_time is not in the index) */
EXPLAIN SELECT `name`, `age`, `pos`, `add_time` FROM `staffs` WHERE `name` LIKE '%in';
Mnemonic: a covering index rescues double-sided wildcards.
Extension: the Alibaba Java Development Manual [mandatory]: left-fuzzy or fully fuzzy page searches are strictly prohibited; if needed, solve it with a search engine.
9. Strings need single quotes (type conversion)
/* A covering index is used */
EXPLAIN SELECT `id`, `name` FROM `staffs` WHERE `name` = 'Ringo';
/* A covering index is used (index scan) */
EXPLAIN SELECT `id`, `name` FROM `staffs` WHERE `name` = 2000;
/* Index invalidated; full table scan */
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 2000;
With name = 2000, MySQL performs an implicit type conversion between the string column and the number, so the index on name cannot be used for lookups.
Mnemonic: put single quotes around strings.
10. Primary key insertion order
If another record with a primary key value of 9 is inserted, it will be inserted as follows:
But this data page is already full, how to plug in again? We need to split the current page into two pages and move some records from this page to the newly created page. What does page splitting and record shifting mean? Means: Performance loss! So if we want to avoid this unnecessary performance loss, it is best to increase the primary key values of the inserted records successively so that this performance loss does not occur.
So for the person_info table, let the storage engine generate the primary key via AUTO_INCREMENT instead of inserting primary key values manually.
CREATE TABLE person_info(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    birthday DATE NOT NULL,
    phone_number CHAR(11) NOT NULL,
    country VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name_birthday_phone_number (name(10), birthday, phone_number)
);
Our custom primary key column ID has the AUTO_INCREMENT attribute, and the storage engine will automatically fill in the increment primary key value for us when the record is inserted. Such primary keys take up less space and write sequentially, reducing page splitting.
11. A non-indexed column on either side of OR invalidates the index
Reduce the use of OR.
Use UNION ALL or UNION instead:
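A sketch with the `staffs` table: when one side of the OR cannot use an index, the whole statement typically degrades to a full table scan; splitting it with UNION ALL lets each branch be optimized separately (use UNION instead if rows matching both branches must be deduplicated):

```sql
-- `age` is not the leading column of idx_staffs_name_age_pos,
-- so this OR typically forces a full table scan.
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo' OR `age` = 20;

-- Rewritten: the first branch can use the index on `name`.
EXPLAIN SELECT * FROM `staffs` WHERE `name` = 'Ringo'
UNION ALL
SELECT * FROM `staffs` WHERE `age` = 20;
```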
12. Use UTF8MB4 for database and table character sets
Using utf8mb4 uniformly (supported since 5.5.3) gives better compatibility, and a unified character set avoids garbled text caused by character-set conversion. Comparing columns with different character sets requires a conversion first, which invalidates indexes.
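A sketch of unifying the character set (the database name is illustrative; note that CONVERT TO rewrites existing column data):

```sql
-- Set the default character set for new objects in the database.
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

-- Convert an existing table, including its column data.
ALTER TABLE `staffs` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```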
13. Index related topics
Assume a composite index (a, b, c):

| Where clause | Index used? |
| --- | --- |
| where a = 3 | Y, uses a |
| where a = 3 and b = 5 | Y, uses a and b |
| where a = 3 and b = 5 and c = 4 | Y, uses a, b, and c |
| where b = 3 / where b = 3 and c = 4 / where c = 4 | N, the leading a field is missing |
| where a = 3 and c = 5 | Uses a, but not c, because b is broken |
| where a = 3 and b > 4 and c = 5 | Uses a and b, but not c, because c comes after the range |
| where a = 3 and b like 'kk%' and c = 4 | Y, uses a, b, and c |
| where a = 3 and b like '%kk' and c = 4 | Only uses a |
| where a = 3 and b like '%kk%' and c = 4 | Only uses a |
| where a = 3 and b like 'k%kk%' and c = 4 | Y, uses a, b, and c |
14. Interview question analysis
Data preparation
/* Create table */
CREATE TABLE `test03`(
    `id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    `c1` CHAR(10),
    `c2` CHAR(10),
    `c3` CHAR(10),
    `c4` CHAR(10),
    `c5` CHAR(10)
);
/* Insert data */
INSERT INTO `test03`(`c1`,`c2`,`c3`,`c4`,`c5`) VALUES('a1','a2','a3','a4','a5');
INSERT INTO `test03`(`c1`,`c2`,`c3`,`c4`,`c5`) VALUES('b1','b22','b3','b4','b5');
INSERT INTO `test03`(`c1`,`c2`,`c3`,`c4`,`c5`) VALUES('c1','c2','c3','c4','c5');
INSERT INTO `test03`(`c1`,`c2`,`c3`,`c4`,`c5`) VALUES('d1','d2','d3','d4','d5');
INSERT INTO `test03`(`c1`,`c2`,`c3`,`c4`,`c5`) VALUES('e1','e2','e3','e4','e5');
/* Create a composite index */
CREATE INDEX idx_test03_c1234 ON `test03`(`c1`,`c2`,`c3`,`c4`);
The title
/* Question: analyze which index fields each of the following SQL statements uses */
/* 1. Full value matching: uses the index fields c1, c2, c3, c4 */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c3` = 'a3' AND `c4` = 'a4';
/* 2. Uses c1, c2, c3, c4: the optimizer reorders the conditions */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c4` = 'a4' AND `c3` = 'a3';
/* 3. Uses c1, c2, c3, c4: the order of conditions in the WHERE clause does not matter */
EXPLAIN SELECT * FROM `test03` WHERE `c4` = 'a4' AND `c3` = 'a3' AND `c2` = 'a2' AND `c1` = 'a1';
/* 4. Uses c1, c2, c3: c4 is invalidated because it comes after the range condition on c3 */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c3` > 'a3' AND `c4` = 'a4';
/* 5. Uses c1, c2, c3, c4: the optimizer reorders so the range condition on c4 comes last */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c4` > 'a4' AND `c3` = 'a3';
/* 6. Uses c1 and c2 for lookup and c3 for sorting: c4 is not used (compare key_len with query 7) */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c4` = 'a4' ORDER BY `c3`;
/* 7. Uses c1 and c2 for lookup and c3 for sorting: key_len shows only c1 and c2 are used for filtering */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' ORDER BY `c3`;
/* 8. Uses c1 and c2 for lookup: sorting by c4 skips c3 and produces Using filesort */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' ORDER BY `c4`;
/* 9. Uses c1 for lookup and c2, c3 for sorting: no filesort (c5 is not in the index) */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c5` = 'a5' ORDER BY `c2`, `c3`;
/* 10. Uses c1 for lookup: ORDER BY c3, c2 reverses the index order and produces Using filesort */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c5` = 'a5' ORDER BY `c3`, `c2`;
/* 11. Uses c1 and c2 for lookup and c2, c3 for sorting: no filesort */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' ORDER BY c2, c3;
/* 12. Uses c1 and c2 for lookup and c2, c3 for sorting: no filesort (c5 is not in the index) */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c5` = 'a5' ORDER BY c2, c3;
/* 13. Uses c1 and c2 for lookup and c3 for sorting: no Using filesort, because c2 is already fixed to the constant 'a2', so ORDER BY c3, c2 effectively sorts by c3 only. Compare with query 10! */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c2` = 'a2' AND `c5` = 'a5' ORDER BY c3, c2;
/* GROUP BY cases */
/* 14. Uses c1 for lookup: GROUP BY c2, c3 follows the index order, so no filesort; c4 is invalidated */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c4` = 'a4' GROUP BY `c2`,`c3`;
/* 15. GROUP BY c3, c2 reverses the index order: Using temporary; Using filesort */
EXPLAIN SELECT * FROM `test03` WHERE `c1` = 'a1' AND `c4` = 'a4' GROUP BY `c3`,`c2`;
GROUP BY generally needs to sort first; its index optimization rules are almost the same as ORDER BY's, but GROUP BY may additionally generate a temporary table.
15. Summary
General recommendations for index optimization:
- For single-column indexes, try to choose the index with the best filtering power for the current query.
- When choosing a composite index, place the field with the best filtering power for the current query as early as possible in the index field order.
- When choosing a composite index, try to make the index include as many of the fields in the current query's WHERE clause as possible.
- Whenever possible, analyze statistics and adjust the query to choose the appropriate index.
Mnemonics:
- The leader must not die.
- The middle brother cannot be broken.
- Not computed on index columns.
- It’s all gone after the range.
- Use covering indexes as much as possible.
- Inequality sometimes fails.
- With LIKE, put the percent sign on the right.
- Put single quotes around strings.
- In general, use OR sparingly.