This is the 29th day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021.”
❤️ About the author: Hello everyone, I am Xiao Xu Zhu. Java field quality creator 🏆, CSDN blog expert certification 🏆, Huawei Cloud enjoy expert certification 🏆
❤️ technology live, the appreciation
❤️ like 👍 collect ⭐ look again, form a habit
Project background
There are three mega data tables
Ex_subject_point 9,316 data points
Ex_question_junior 2,159,519 pieces of data have 45 fields
Ex_question_r_knowledge 3,156,155 pieces of data
Test database: mysql (5.7)
1. To optimize the query and avoid full table scan, first consider creating indexes on where and ORDER by columns.
Case Study:
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE ex_question_junior.GRADE_ID=1
Copy the code
Execution time: 17.609s (multiple executions, hovering around 17s)
Optimized: After the index is added to the GRADE_ID field
Execution time: 11.377s (multiple executions, hovering around 11s)
Note: What fields do we usually build indexes on?
This is a very complex topic that requires a thorough analysis of business and data. Both primary and foreign keys must have indexes. Other fields that need to be indexed must meet the following conditions:
A, the field appears in the query condition, and the query condition can use the index;
B. The execution frequency is high, more than thousands of times a day;
C. The set of records that can be filtered by field conditions is very small, so what is the appropriate proportion of data filtering?
This has no fixed value and needs to be evaluated according to the amount of table data. The following is an empirical formula for quick evaluation:
Small table (table with less than 10000 rows) : filter ratio <10%;
Large table :(number of records returned by filtering)<(total number of records in the table * length of single record)/10000/16
Single record Length ≈ Sum of average field length + Number of fields *2
The following is an empirical classification of whether a b-tree index is needed:
2. Avoid null values for fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan
select id from t where num is null
It is best NOT to leave NULL to the database and populate the database with NOT NULL whenever possible.
Remarks, descriptions, comments, etc., can be set to NULL. Otherwise, it is best not to use NULL.
If a vARCHar is a variable length field, NULL takes up no space. If a vARCHar is a variable length field, NULL takes up no space. If a vARCHar is a variable length field, NULL takes up no space.
Select * from num where num is null; select * from num where num is null;
select id from t where num = 0
Case Study:
In mysql database, a full table scan will not be performed without an index.
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE IS_USE is NULL
Copy the code
The execution time is 11.729 seconds
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE IS_USE =0
Copy the code
The execution time is 12.253 seconds
The time is almost the same.
Avoid using it in where clauses! = or <> otherwise the engine will abandon the index for a full table scan.
Case Study:
Mysql > select ‘where’ from ‘where’; = or <> operators, the engine does not abandon the use of indexes.
EXPLAIN
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE ex_question_junior.GRADE_ID ! =15
Copy the code
The execution time is 17.579 seconds
The execution time is 16.966 seconds
4. Avoid using OR in the WHERE clause to join conditions. If a field has an index and a field has no index, the engine will abandon the index and perform a full table scan instead
Case Study:
GRADE_ID is indexed, QUESTION_TYPE is not
The execution time is 11.661s
Optimization scheme:
Use union all to separate indexed and non-indexed fields. The index field takes effect
The execution time is 11.811 seconds
However, the non-index field is still query speed will be very slow, so the query conditions, can add index index as far as possible
5. Use in and not in with caution. Otherwise, a full table scan will occur
Case analysis
Note: Using the IN and not IN operators for index fields in the WHERE clause of the mysql database, the engine does not abandon the use of indexes.
Note: Using the IN and not IN operators for non-index fields in the WHERE clause of the mysql database will result in a full table scan.
Case Study 2:
Use the difference between between and in
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE ex_question_junior.QUESTION_TYPE IN(1.2.3.4)
Copy the code
The execution time is 1.082 seconds
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE ex_question_junior.QUESTION_TYPE between 1 and 4
Copy the code
The execution time is 0.924s
The Times are about the same
Case Study 3:
Use exists and in to distinguish: conclusion
Use exists and in to distinguish: conclusion
1. In () applies to the case where table B is larger than table A (A<B)
select * from A
where id in(select id from B)
2. Exists () applies to the case where table B has A smaller data than table A (A>B)
select * from A
where exists(
select 1 from B where B.id = A.id
)
3. If the data in table A is the same size as that in table B, the efficiency of in and EXISTS is similar. You can use either of them. grammar
select * from A
where id in(select id from B)
The ex_QUESTION_R_KNOWLEDGE table has a large amount of data, while the ex_SUBJect_POINT table has a small amount of data
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Ex_question_r_knowledge (A) has A large amount of data, while ex_Subject_Point (B) has A small amount of data (A>B)
Use the exists for
SELECT *
FROM ex_question_r_knowledge
WHERE ex_question_r_knowledge.SUBJECT_POINT_ID IN
(
SELECT ex_subject_point.SUBJECT_POINT_ID
FROM ex_subject_point
WHERE ex_subject_point.SUBJECT_ID=7
)
Copy the code
SELECT *
FROM ex_question_r_knowledge
WHERE exists
(
SELECT 1
FROM ex_subject_point
WHERE ex_subject_point.SUBJECT_ID=7
AND ex_subject_point.SUBJECT_POINT_ID = ex_question_r_knowledge.SUBJECT_POINT_ID
)
Copy the code
The execution time is 13.537 seconds
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Ex_subject_point table has A small amount of data (A), ex_QUESTION_R_knowledge (B) has A large amount of data (A<B)
Use the in
SELECT *
FROM ex_subject_point
WHERE
ex_subject_point.SUBJECT_POINT_ID IN( SELECT
ex_question_r_knowledge.SUBJECT_POINT_ID FROM
ex_question_r_knowledge WHERE
ex_question_r_knowledge.GRADE_TYPE=2 )
Copy the code
SELECT * FROM ex_subject_point WHERE
ex_subject_point.SUBJECT_POINT_ID IN( SELECT
ex_question_r_knowledge.SUBJECT_POINT_ID FROM
ex_question_r_knowledge WHERE
ex_question_r_knowledge.GRADE_TYPE=2 )
Copy the code
The execution time is 1.554 seconds
SELECT *
FROM ex_subject_point
WHERE exists(
SELECT ex_question_r_knowledge.SUBJECT_POINT_ID
FROM ex_question_r_knowledge
WHERE ex_question_r_knowledge.GRADE_TYPE=2
AND ex_question_r_knowledge.SUBJECT_POINT_ID= ex_subject_point.SUBJECT_POINT_ID
)
Copy the code
The execution time is 11.978 seconds
6, like fuzzy full match will also result in full table scan
Case analysis
EXPLAIN
SELECT *
FROM ex_subject_point
WHERE ex_subject_point.path like"% % / 11 /"Copy the code
For efficiency, consider full-text retrieval. Lucene. Or another NOSQL database that can provide full-text indexing, such as TT Server or MongoDB
It’s going to be updated in a few more sections.
Last night, I had a whim, like fuzzy full match, will lead to a full table scan, then after fuzzy match and before fuzzy match will also be full table scan?
I turned on the computer today and took a test. The results are as follows:
Like fuzzy matching does not result in full table scan
Before matching like ambiguity, full table scan is performed
The principle of MY SQL is such, LIKE fuzzy full matching will lead to index failure, full table scan; Matching before LIKE fuzzy will also cause index failure and perform full table scan. But LIKE blur matches, index will have effect.