Environment setup
Table creation statements
CREATE TABLE `dept` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`deptName` VARCHAR(30) DEFAULT NULL,
`address` VARCHAR(40) DEFAULT NULL,
`ceo` INT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `emp` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`empno` INT NOT NULL,
`name` VARCHAR(20) DEFAULT NULL,
`age` INT(3) DEFAULT NULL,
`deptId` INT(11) DEFAULT NULL,
PRIMARY KEY (`id`)
#CONSTRAINT `fk_dept_id` FOREIGN KEY (`deptId`) REFERENCES `dept` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Create a function
Creating the function may fail with the error "This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled". Since we have bin-log enabled, MySQL requires functions to be declared deterministic unless we set the following parameter:
- show variables like 'log_bin_trust_function_creators';
- set global log_bin_trust_function_creators=1;
Randomly generate a string
DELIMITER $$
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE return_str VARCHAR(255) DEFAULT '';
DECLARE i INT DEFAULT 0;
WHILE i < n DO
SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1 + RAND() * 52), 1));
SET i = i + 1;
END WHILE;
RETURN return_str;
END$$
DELIMITER ;
# To delete: drop function rand_string;
Randomly generate a number in a range (e.g. department numbers)
# Randomly generate an integer between from_num and to_num
DELIMITER $$
CREATE FUNCTION rand_num(from_num INT, to_num INT) RETURNS INT(11)
BEGIN
DECLARE i INT DEFAULT 0;
SET i = FLOOR(from_num + RAND() * (to_num - from_num + 1));
RETURN i;
END$$
DELIMITER ;
# To delete: drop function rand_num;
SELECT rand_num(50, 100);
Create and call the stored procedures
Insert data into the emp table
DELIMITER $$
CREATE PROCEDURE insert_emp(START INT, max_num INT)
BEGIN
DECLARE i INT DEFAULT 0;
# Set autocommit to 0 so we commit once at the end instead of after every insert
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO emp (empno, NAME, age, deptid) VALUES ((START + i), rand_string(6), rand_num(30, 50), rand_num(1, 10000));
UNTIL i = max_num
END REPEAT;
COMMIT;
END$$
DELIMITER ;
# To delete: drop PROCEDURE insert_emp;
# Add 500,000 rows to the emp table
CALL insert_emp(100000, 500000);
Insert data into the dept table
# Execute the stored procedure to add random data to the dept table
DELIMITER $$
CREATE PROCEDURE `insert_dept`(max_num INT)
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO dept (deptname, address, ceo) VALUES (rand_string(8), rand_string(10), rand_num(1, 500000));
UNTIL i = max_num
END REPEAT;
COMMIT;
END$$
DELIMITER ;
# To delete: drop PROCEDURE insert_dept;
# Execute the stored procedure to add 10,000 rows to the dept table
CALL insert_dept(10000);
Remove the index
Deletes all indexes of a table in the specified database except the primary key index.
DELIMITER $$
CREATE PROCEDURE `proc_drop_index`(dbname VARCHAR(200),tablename VARCHAR(200))
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE ct INT DEFAULT 0;
DECLARE _index VARCHAR(200) DEFAULT '';
DECLARE _cur CURSOR FOR SELECT index_name FROM information_schema.STATISTICS WHERE table_schema=dbname AND table_name=tablename AND seq_in_index=1 AND index_name<>'PRIMARY';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=2;
OPEN _cur;
FETCH _cur INTO _index;
WHILE _index<>'' DO
SET @str = CONCAT("drop index ",_index," on ",tablename);
PREPARE sql_str FROM @str;
EXECUTE sql_str;
DEALLOCATE PREPARE sql_str;
SET _index='';
FETCH _cur INTO _index;
END WHILE;
CLOSE _cur;
END$$
DELIMITER ;
# CALL proc_drop_index("dbname", "tablename");
Single-table index use and common causes of index failure
Cases
Full value match
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.age = 30;
CREATE INDEX idx_age ON emp(age);
The index has an optimization effect. Delete the related indexes with CALL proc_drop_index("mydb","emp"); then:
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.age = 30 AND deptid = 4;
CREATE INDEX idx_age_deptid ON emp(age,deptid);
The query time drops significantly, to about 0.001s. Delete the related indexes again with CALL proc_drop_index("mydb","emp"); then:
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.age = 30 AND deptid = 4 AND emp.`name` = 'abcd';
CREATE INDEX idx_age_deptid_name ON emp(age,deptid,name);
Optimal left prefix rule
Transposing the order of the three conditions still uses the composite index, because the MySQL optimizer automatically reorders the conditions in our SQL.
Now query with the age condition first followed by the name condition (dropping deptid): key_len = 5, so only age is used. Change again to deptid and name only: the composite index fails entirely!
Why is that? Because of the optimal left prefix rule.
In the first case, the optimizer reorders all three conditions, so they are still age, deptid, name.
In the second case, the conditions are age and name; deptid is missing, so name cannot be reached and only age takes effect.
In the third case, the conditions are deptid and name; age is missing, so nothing can be reached at all and the index fails completely.
With a multi-column index, follow the leftmost prefix rule: the query must start from the leftmost column of the index and must not skip columns in the index.
## Optimal left prefix rule
CREATE INDEX idx_age_deptid_name ON emp(age,deptid,name);
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE deptid = 4 AND emp.age = 30 AND emp.`name` = 'abcd';
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.age = 30 AND emp.`name` = 'abcd';
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE deptid = 4 AND emp.`name` = 'abcd';
Using a function on an index column invalidates the index
The two SQL statements return the same result. We add the index: CREATE INDEX idx_name ON emp(name);
For the first SQL statement, the index optimization takes effect.
The second SQL statement is still not optimized. Why?
Because it applies a function to the index column, which causes the index to fail!
Doing anything to an index column (a calculation, a function, an automatic or manual type conversion) causes the index to fail and the query falls back to a full table scan.
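The two statements being compared are not reproduced in the text. A plausible sketch, assuming the second statement wrapped the column in a function such as LEFT() (the exact function is an assumption; any function on the column behaves the same way):

```sql
CREATE INDEX idx_name ON emp(name);
## Index works: the column itself is untouched
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.`name` LIKE 'abc%';
## Index fails, full table scan: a function is applied to the indexed column
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE LEFT(emp.`name`, 3) = 'abc';
```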
The storage engine cannot use columns to the right of a range condition in the index
CREATE INDEX idx_age_deptId_name ON emp(age,deptId,name);
Although the query time is greatly reduced, it is still slower than the earlier full use of the composite index. Why is this?
key_len = 10, which covers age and deptId; name is not used!
This is because the storage engine cannot use the columns to the right of the range condition in the index (age,deptId,name): the range condition emp.deptId > 20 causes name to fail.
So we place the column used in the range condition at the end when creating the index: CREATE INDEX idx_age_name_deptId ON emp(age,name,deptId);
At this point, all three columns are used.
## The storage engine cannot use columns to the right of a range condition in the index
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.age = 30 AND emp.deptId > 20 AND emp.name = 'abc';
CREATE INDEX idx_age_deptId_name ON emp(age,deptId,name);
CREATE INDEX idx_age_name_deptId ON emp(age,name,deptId);
Using not-equal (!= or <>) in MySQL causes a full table scan
Add an index: CREATE INDEX idx_name ON emp(name);
You can see that type is still ALL, so there is no optimization effect.
Using not-equal (!= or <>) in MySQL causes a full table scan.
## Not-equal (!= or <>) results in a full table scan
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.`name` <> 'abc';
CREATE INDEX idx_name ON emp(name);
is null can use the index, is not null cannot
Add the index: CREATE INDEX idx_age ON emp(age);
`is null` can use the index, while `is not null` cannot.
## `is null` can use the index; `is not null` cannot
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE age IS NULL;
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE age IS NOT NULL;
LIKE starting with a wildcard ('%abc...') invalidates the index and becomes a full table scan
Add an index: CREATE INDEX idx_name ON emp(name);
You can see there is still no optimization effect! Because the leading characters of the pattern are unknown, every row must be examined, so the index fails.
## LIKE starting with a wildcard ('%abc...') invalidates the index and becomes a full table scan
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.`name` LIKE '%abc%';
CREATE INDEX idx_name ON emp(name);
A string without single quotes invalidates the index
Add an index: CREATE INDEX idx_name ON emp(name);
It still has no effect, because in emp.`name` = 123 the name column is VARCHAR, and the implicit type conversion invalidates the index. So keep application-side types (e.g. JavaBean fields) consistent with the database column types to prevent automatic type conversion.
## A string without single quotes invalidates the index
EXPLAIN SELECT SQL_NO_CACHE * FROM emp WHERE emp.`name` = 123;
CREATE INDEX idx_name ON emp(name);
Conclusion
- Doing anything to an index column (a calculation, a function, an automatic or manual type conversion) causes the index to fail and the query falls back to a full table scan.
- The storage engine cannot use columns to the right of a range condition in the index.
- Using not-equal (!= or <>) in MySQL causes a full table scan.
- is not null cannot use an index, but is null can.
- LIKE starting with a wildcard ('%abc...') invalidates the index and becomes a full table scan.
- A string without single quotes invalidates the index (automatic type conversion occurs).
Assume a composite index on (a, b, c):

| Where clause | Is the index used? |
| --- | --- |
| where a = 3 | Y, uses a |
| where a = 3 and b = 5 | Y, uses a and b |
| where a = 3 and b = 5 and c = 4 | Y, uses a, b, and c |
| where b = 3 / where b = 3 and c = 4 / where c = 4 | N |
| where a = 3 and c = 5 | uses a, but not c; b is interrupted |
| where a = 3 and b > 4 and c = 5 | uses a and b; c cannot be used after the range on b breaks the chain |
| where a is null and b is not null | is null can use the index but is not null cannot, so a is used and b is not |
| where a <> 3 | cannot use the index |
| where abs(a) = 3 | cannot use the index |
| where a = 3 and b like 'kk%' and c = 4 | Y, uses a, b, and c |
| where a = 3 and b like '%kk' and c = 4 | Y, but only uses a |
| where a = 3 and b like '%kk%' and c = 4 | Y, but only uses a |
| where a = 3 and b like 'k%kk%' and c = 4 | Y, uses a, b, and c |
Suggestions (the best filter is a highly selective field, like an ID number, that quickly excludes most rows):
- For a single-column index, choose the index with the best filtering power for the current query.
- For a composite index, the higher the query's best-filtering field sits in the index column order, the better.
- For a composite index, try to choose an index that contains as many of the fields in the current query's WHERE clause as possible.
- For a composite index, if a field is likely to appear in a range condition, try to place it last in the index column order.
- Avoid the index-failure patterns above when writing SQL statements.
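To gauge a field's filtering power before indexing it, one common check (a sketch, not shown in the original text) is the ratio of distinct values to total rows; the closer to 1, the better the selectivity:

```sql
# Selectivity of age vs. name on the emp table (values near 1 filter best)
SELECT COUNT(DISTINCT age) / COUNT(*)  AS age_selectivity,
       COUNT(DISTINCT name) / COUNT(*) AS name_selectivity
FROM emp;
```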
Join queries
CREATE TABLE IF NOT EXISTS `class` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `book` (
`bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`bookid`)
);
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
ALTER TABLE book ADD INDEX Y (card);
ALTER TABLE class ADD INDEX Y (card);
Although the index on class was created successfully, it still has no optimization effect.
That is because in this LEFT JOIN, class is the driving table and book is the driven table; an index on the driving table cannot avoid its full table scan. Only an index on the driven table helps, so we should build the index for the driven table.
Instead, join the two tables with an INNER JOIN.
The EXPLAIN ids are the same and rows execute top to bottom: now book becomes the driving table and class becomes the driven table, so the index on class takes effect.
- Ensure the join field of the driven table is indexed.
- For LEFT JOIN, choose the small table as the driving table and the large table as the driven table.
- For INNER JOIN, MySQL itself chooses the table with the smaller result set as the driving table.
- A derived (virtual) table cannot have an index built on it.
- Try not to put a subquery in the driven table position; it may not use an index.
- Prefer joining multiple tables directly rather than through subqueries.
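The EXPLAIN statements under discussion are not reproduced above; a minimal sketch of the two joins on the demo tables might look like this:

```sql
## LEFT JOIN: class is forced to be the driving table and is fully scanned,
## so the index on class.card does not help
EXPLAIN SELECT * FROM class LEFT JOIN book ON class.card = book.card;
## INNER JOIN: MySQL may choose book as the driving table,
## letting the index on class.card (the driven side) take effect
EXPLAIN SELECT * FROM class INNER JOIN book ON class.card = book.card;
```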
Subquery optimization
SELECT * FROM t_emp a WHERE a.id NOT IN
(SELECT b.ceo FROM t_dept b WHERE b.ceo IS NOT NULL);
SELECT * FROM t_emp a LEFT JOIN t_dept b ON a.id = b.ceo
WHERE b.id IS NULL;
Do not use NOT IN or NOT EXISTS; replace them with LEFT OUTER JOIN ... ON xxx WHERE xxx IS NULL.
Sorting and grouping optimization
order by
Without filtering, the index is not used.
CREATE INDEX idx_age_deptId_name on emp(age,deptId,name);
Still no optimization. Now add a LIMIT filter after the ORDER BY:
the optimization effect appears.
So when we use an ORDER BY statement, add filtering.
# No filtering, no index use
EXPLAIN SELECT SQL_NO_CACHE * FROM emp ORDER BY age,deptId;
EXPLAIN SELECT SQL_NO_CACHE * FROM emp ORDER BY age,deptId LIMIT 10;
CREATE INDEX idx_age_deptId_name on emp(age,deptId,`name`);
CREATE INDEX idx_age_deptId_name on emp(age,deptId,`name`);
EXPLAIN SELECT * FROM emp WHERE age = 45 ORDER BY deptId; ## uses the index
EXPLAIN SELECT * FROM emp WHERE age = 45 ORDER BY deptId,`name`; ## uses the index
EXPLAIN SELECT * FROM emp WHERE age = 45 ORDER BY deptId,empno; ## empno is not in the index: Using filesort
EXPLAIN SELECT * FROM emp WHERE age = 45 ORDER BY `name`,deptId; ## wrong index order: Using filesort
EXPLAIN SELECT * FROM emp WHERE deptId = 45 ORDER BY age; ## wrong index order: Using filesort
For WHERE filtering, the index column order must be preserved: it cannot be wrong, and it cannot skip columns.
The sort direction must be consistent: not one column ascending and another descending.
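As a sketch of the "consistent direction" rule (this query is not in the original text; it assumes idx_age_deptId_name exists):

```sql
## Mixed ASC/DESC on the indexed columns: Using filesort
EXPLAIN SELECT * FROM emp WHERE age = 45 ORDER BY deptId ASC, `name` DESC;
```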
Index selection
CREATE INDEX idx_age_empno_name on emp(age,empno,name); With this index, name is invalidated because empno uses a range query, so Using filesort remains.
CREATE INDEX idx_age_name on emp(age,name);
This second index removes Using filesort. Now check which index MySQL chooses when both exist.
MySQL selects idx_age_empno_name: it still shows Using filesort, but rows is much smaller, so the query is more efficient overall!
Because the condition empno < 101000 can use the index, the number of scanned rows drops sharply, so MySQL picks this index!
Conclusion: when choosing between a range condition and a GROUP BY or ORDER BY field, first look at how selective the range condition is. If it filters out enough rows, put the range field in the index and let the sort go unindexed; otherwise, do the reverse. And trust MySQL's ability to choose the index!
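The query being tuned above is not shown in the text. Assuming it follows the shape implied by the discussion (the constant 101000 comes from the text; age = 30 is an assumed value), it would look something like:

```sql
## Range filter on empno plus a sort on name
EXPLAIN SELECT SQL_NO_CACHE * FROM emp
WHERE age = 30 AND empno < 101000
ORDER BY `name`;
## The two candidate indexes under comparison
CREATE INDEX idx_age_empno_name ON emp(age,empno,name);
CREATE INDEX idx_age_name ON emp(age,name);
```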
Using filesort
When Using filesort cannot be avoided, filesort has two algorithms: double-pass sort and single-pass sort.
Double-pass sort: before MySQL 4.1, double-pass sort was used. Literally, it scans the disk twice to get the data: it reads the row pointers and the ORDER BY columns and sorts them, then scans the sorted list and re-reads the corresponding rows to produce the output. In short, it fetches the sort fields from disk, sorts them in the buffer, then fetches the remaining fields from disk.
Fetching one batch of data thus requires two disk scans, and I/O is notoriously time-consuming, so after MySQL 4.1 a second, improved algorithm was introduced: single-pass sort.
Single-pass sort: it reads all the columns needed by the query from disk at once, sorts them in the buffer by the ORDER BY columns, then scans the sorted list for output. This is faster because it avoids the second read, and it turns random I/O into sequential I/O, but it uses more memory since it keeps each whole row in the buffer.
Since single-pass came later, it is generally better than double-pass. But single-pass has a problem:
in sort_buffer, single-pass takes much more space than double-pass, because it fetches all fields. The retrieved data may exceed the capacity of sort_buffer, so only a sort_buffer's worth can be sorted at a time (creating tmp files and merge-sorting them afterward), which means multiple I/Os.
An attempt to save one I/O operation then results in a large number of I/O operations, which is more than worth the loss.
Optimization strategy
- Increase the sort_buffer_size parameter.
- Increase the max_length_for_sort_data parameter.
- Reduce the fields after SELECT.
To increase ORDER BY speed:
- Note that SELECT * is a taboo when sorting; query only the fields the query needs. This matters because:
- When the total size of the queried fields is smaller than max_length_for_sort_data and the sort field is not TEXT or BLOB, the improved single-pass algorithm is used; otherwise the old multi-pass algorithm is used.
- Both algorithms may exceed the sort_buffer capacity, after which tmp files are created for merge sorting, causing multiple I/Os; the risk is higher with single-pass, so increase sort_buffer_size.
- Try increasing sort_buffer_size. Whichever algorithm is used, raising this parameter improves efficiency, within the capabilities of the system, since it is a per-process setting adjusted between 1M and 8M.
- Try increasing max_length_for_sort_data. Raising it increases the probability of using the improved algorithm; but if it is set too high, the probability of the total data exceeding sort_buffer_size rises, with the obvious symptoms of high disk I/O activity and low processor utilization. Adjust it between 1024 and 8192.
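To inspect and adjust the two parameters above (a sketch; the values set here are illustrative, not recommendations from the text):

```sql
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';
## Illustrative session-level adjustments
SET SESSION sort_buffer_size = 4 * 1024 * 1024;   ## 4M
SET SESSION max_length_for_sort_data = 4096;
```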
group by
GROUP BY uses indexes in almost exactly the same way as ORDER BY, except that GROUP BY can use an index even without a filtering condition.
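A sketch of that difference (the query is not in the original text; it assumes the composite index from earlier sections):

```sql
CREATE INDEX idx_age_deptId_name ON emp(age,deptId,`name`);
## Unlike ORDER BY, GROUP BY can use the index even with no WHERE or LIMIT filter
EXPLAIN SELECT age, COUNT(*) FROM emp GROUP BY age;
```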
Covering index
What is a covering index? Simply put, all the columns queried between SELECT and FROM are contained in the index, so the rows themselves never need to be read.
explain select * from emp where name like '%abc';
After switching to a covering index, the query can use the index even though the LIKE pattern starts with a wildcard.
In short: don't use select *.
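The covering form of the query is not shown above; a plausible sketch (assuming idx_name exists, as in the earlier sections):

```sql
CREATE INDEX idx_name ON emp(name);
## Only indexed columns are selected, so even a leading-wildcard LIKE
## can scan the index instead of the table (EXPLAIN shows Using index)
EXPLAIN SELECT `name` FROM emp WHERE `name` LIKE '%abc';
```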