SQL Optimization: A Summary of the Most Practical Tips (MySQL)

Preface

Interviews at BATJTMD and other large companies are getting harder and harder, but whether you interview at a big company or a small one, one constant is the focus on SQL optimization experience. As soon as databases come up, so does the question: "What do you know about SQL optimization?"

SQL optimization has become a hard benchmark for measuring whether a programmer is any good, and it is even spelled out explicitly in job postings at major companies. When this question comes up, will you crush it, or will it crush you?

Contents

SELECT statements – syntax order:

SQL Optimization Strategy

  • I. Avoid scenarios that prevent index use

  • II. Other SELECT statement optimizations

  • III. DML statement (insert, update, delete) optimization

  • IV. Query condition optimization

  • V. Table design optimization

Some readers may ask: is SQL optimization really that important? As the comparison below shows, SQL optimization is the lowest-cost yet most effective way to improve system performance. If your team excels at SQL optimization, it is a big step forward for the availability of the entire system, and it genuinely saves your boss real money.

  • Optimization cost: hardware > system configuration > database table structure > SQL and indexes.
  • Optimization effect: hardware < system configuration < database table structure < SQL and indexes.
// The interviewer's mental model (tongue-in-cheek pseudocode)
String result = "Hmm, sounds about right.";

if (has("SQL optimization experience")) {
    if (has("familiar with transaction locks")) {
        if (has("handles concurrency scenarios like a pro")) {
            if (has("can carry the team in Honor of Kings")) {
                result += " Can you start tomorrow?";
            }
        }
    }
} else {
    result += " Go home and wait for our call.";
}

Logger.info("Interviewer: " + result);

(And yes, "go home and wait for our call" is just a polite send-off.)

Let's get down to business. For optimization at the MySQL layer, I generally follow five principles:

  1. Reduce data access: choose appropriate field types, enable compression, and access data through indexes to reduce disk I/O.
  2. Return less data: return only the required fields, and page the results to reduce disk IO and network IO.
  3. Reduce the number of interactions: use batch DML, stored procedures/functions, and the like to cut down round trips to the database.
  4. Reduce server CPU overhead: minimize database sorting operations and full table scans to lower CPU and memory usage.
  5. Use more resources: use table partitioning to increase parallelism and make the most of CPU resources.

Boiled down, SQL optimization comes to three points:

  • Maximize the use of indexes;
  • Avoid full table scans wherever possible;
  • Reduce queries for unneeded data.

To understand the principles of SQL optimization, first understand the order of SQL execution:

SELECT statements – syntax order:

1.  SELECT 
2.  DISTINCT <select_list>
3.  FROM <left_table>
4.  <join_type> JOIN <right_table>
5.  ON <join_condition>
6.  WHERE <where_condition>
7.  GROUP BY <group_by_list>
8.  HAVING <having_condition>
9.  ORDER BY <order_by_condition>
10. LIMIT <limit_number>

SELECT statements – execution order:

FROM <table name>

  • Choose the table; when multiple tables are involved, they are combined into a single virtual table via the Cartesian product.

ON <join condition>

  • Filter the Cartesian-product virtual table.

JOIN <result set>

  • Depending on the join type, add rows back to the virtual table produced by ON. For example, a LEFT JOIN adds back the unmatched rows from the left table.

WHERE <filter>

  • Filter the virtual table built so far.

GROUP BY <grouping columns>

  • Group the rows.
  • The aggregate functions computed over these groups are what the HAVING clause later filters on.

HAVING <group filter>

  • Filter the grouped, aggregated results.

SELECT <returned columns>

  • Every returned column that is not an aggregate function must appear in the GROUP BY clause.

ORDER BY <sort columns>

  • Sort the result.

LIMIT <row limit>

  • Limit the number of rows returned.
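To make this execution order concrete, here is a minimal annotated example (the students and classes tables and their columns are hypothetical), with the logical evaluation step numbered on each clause:

SELECT s.class_id, COUNT(*) AS cnt        -- (6) SELECT: pick the returned columns
FROM students s                           -- (1) FROM: choose the driving table
  JOIN classes c ON c.id = s.class_id     -- (2) ON + JOIN: build the joined virtual table
WHERE s.score > 60                        -- (3) WHERE: filter rows before grouping
GROUP BY s.class_id                       -- (4) GROUP BY: group the filtered rows
HAVING COUNT(*) > 10                      -- (5) HAVING: filter the groups
ORDER BY cnt DESC                         -- (7) ORDER BY: sort the result
LIMIT 10;                                 -- (8) LIMIT: cap the number of returned rows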

SQL Optimization Strategy

Disclaimer: the following SQL optimization strategies apply to scenarios with large data volumes. If the data volume is small, most of them are unnecessary.

I. Avoid scenarios that prevent index use

1. Avoid fuzzy matches with a leading wildcard, which can cause the database engine to abandon the index and perform a full table scan. As follows:

SELECT * FROM t WHERE username LIKE '%chen%'

Optimization: place the wildcard after the search prefix wherever possible. As follows:

SELECT * FROM t WHERE username LIKE 'chen%'

If a leading-wildcard fuzzy match is a hard requirement:

  • Use the MySQL built-in function INSTR(str, substr), similar to Java's indexOf(), to find the position of a substring within a string (a sketch of the first two options follows this list)
  • Use a FULLTEXT index with MATCH ... AGAINST
  • For very large data volumes, consider a dedicated search engine such as ElasticSearch or Solr, which can search hundreds of millions of rows in seconds
  • When the table is small (a few thousand rows), just use LIKE '%xx%' and don't get fancy.
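As referenced above, a minimal sketch of the first two alternatives (the table t and its username column carry over from the earlier examples; the ft_username index name is made up):

-- INSTR-based match; note this still scans the table, it just avoids LIKE pattern parsing
SELECT * FROM t WHERE INSTR(username, 'chen') > 0;

-- Full-text search; assumes a FULLTEXT index was created first, e.g.:
-- ALTER TABLE t ADD FULLTEXT INDEX ft_username (username);
SELECT * FROM t WHERE MATCH(username) AGAINST('chen');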

2. Avoid using IN and NOT IN, which may cause the engine to perform a full table scan. As follows:

SELECT * FROM t WHERE id IN (2,3)

Optimization method: if the values are contiguous, use BETWEEN instead. As follows:

SELECT * FROM t WHERE id BETWEEN 2 AND 3

If it is a subquery, use exists instead. As follows:

-- Does not use the index
select * from A where A.id in (select id from B);
-- Uses the index
select * from A where exists (select * from B where B.id = A.id);

3. Avoid using OR as much as possible; it can cause the database engine to abandon the index and perform a full table scan. As follows:

SELECT * FROM t WHERE id = 1 OR id = 3

Optimization method: Union can be used instead of OR. As follows:

SELECT * FROM t WHERE id = 1
   UNION
SELECT * FROM t WHERE id = 3

4. Avoid IS NULL tests where possible, which may cause the database engine to abandon the index and perform a full table scan. As follows:

SELECT * FROM t WHERE score IS NULL

Optimization method: give the field a default value of 0, and test against 0 instead. As follows:

SELECT * FROM t WHERE score = 0

5. Avoid expressions and function operations on the left side of comparisons in the WHERE clause; they cause the database engine to abandon the index and perform a full table scan. Move expressions and function operations to the right side of the equals sign. As follows:

-- Full table scan
SELECT * FROM T WHERE score/10 = 9
-- Uses the index
SELECT * FROM T WHERE score = 10*9

6. When the data volume is large, avoid the where 1=1 condition. It is commonly used as a placeholder to make assembling query conditions easier, but the database engine may discard the index and perform a full table scan. As follows:

SELECT username, age, sex FROM T WHERE 1=1

Optimization: have the code that assembles the SQL do the judging: omit WHERE entirely when there are no conditions, and prepend AND to each condition when there are.

7. Avoid <> and != in query conditions

When filtering on an indexed column, avoid inequality operators such as <> and !=. If an inequality really is needed, reconsider the indexing: avoid building the index on that field, and filter on a different indexed field in the query condition instead.

8. The WHERE condition contains only non-leading columns of a composite index

As follows: the composite (joint) index contains the three columns key_part1, key_part2, and key_part3, but the SQL statement omits the leading column key_part1. By MySQL's leftmost-prefix matching rule for composite indexes, the composite index will not be used.

select col1 from table where key_part2=1 and key_part3=2
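For contrast, a small sketch under the same assumption of a composite index on (key_part1, key_part2, key_part3):

-- Uses the composite index: the leading column key_part1 is present
select col1 from table where key_part1 = 1 and key_part2 = 2;

-- Does not use the composite index: the leading column key_part1 is missing
select col1 from table where key_part2 = 1 and key_part3 = 2;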

9. Implicit type conversion prevents index use

In the following SQL statement, the indexed column is of type VARCHAR, but the supplied value is numeric, which triggers an implicit type conversion; as a result, the index cannot be used correctly.

select col1 from table where col_varchar=123;
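The fix is simply to make the types match, either by quoting the literal or by binding it with the correct type in application code:

-- Uses the index: the string literal matches the VARCHAR column type
select col1 from table where col_varchar = '123';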

10. The ORDER BY columns should match the WHERE condition; otherwise ORDER BY cannot sort via the index

-- Does not use the age index
SELECT * FROM t order by age;

-- Uses the age index
SELECT * FROM t where age > 0 order by age;

For the above statement, the order in which the database processes it is:

  • Step 1: Generate an execution plan based on where conditions and statistics to get the data.
  • Step 2: Sort the retrieved data. When the sorting step (ORDER BY) executes, the database first checks the execution plan from step 1 to see whether the ORDER BY fields use an index there. If so, the sorted data can be retrieved directly in index order; if not, an extra sort operation is performed.
  • Step 3: Return the sorted data.

When the fields in ORDER BY appear in the WHERE condition, the index is used and the second sort is avoided. More precisely: when the ORDER BY fields use an index in the execution plan, no sort operation is needed.

This conclusion holds not only for ORDER BY but also for other operations that need sorting, such as GROUP BY, UNION, and DISTINCT.

11. Use hints to optimize statements correctly

Hints can be used in MySQL to tell the optimizer to prefer or ignore specific indexes during execution. In general, it is better to avoid hints, because a release that changes the table structure or indexes can make them stale; collecting fresh statistics with ANALYZE TABLE is preferable. In certain cases, however, a hint can exclude other indexes and pin down a better execution plan.

  • 1) USE INDEX: add USE INDEX after the table name in the query to give MySQL a list of indexes to consider, so that it stops considering other available indexes. SELECT col1 FROM table USE INDEX (mod_time, name)…
  • 2) IGNORE INDEX: if you simply want MySQL to ignore one or more indexes, use IGNORE INDEX as the hint. SELECT col1 FROM table IGNORE INDEX (priority)…
  • 3) FORCE INDEX: to force MySQL to use a specific index, use FORCE INDEX as the hint in the query. SELECT col1 FROM table FORCE INDEX (mod_time)…

When executing a query, the database system automatically analyzes the statement and selects the most appropriate index. But the query optimizer does not always choose the optimal one. If we know better which index to use, FORCE INDEX can force the query onto the specified index.

For example:

SELECT * FROM students FORCE INDEX (idx_class_id) WHERE class_id = 1 ORDER BY id DESC;

II. Other SELECT statement optimizations

1. Avoid select *

First, SELECT * is not good SQL practice in any type of database.

Fetching all columns using SELECT * prevents the optimizer from performing optimizations such as index coverage scans, affects the optimizer’s choice of execution plan, increases network bandwidth consumption, and incurs additional I/O, memory, and CPU consumption.

Instead, specify only the column names the business actually needs rather than writing SELECT *.
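For illustration, a hedged before/after sketch (the table t and its index are assumptions; with a composite index such as (username, age), the second query can be answered from the index alone):

-- Reads every column; cannot be satisfied by an index coverage scan
SELECT * FROM t WHERE username = 'chen';

-- Returns only what is needed and can use a covering index scan
SELECT username, age FROM t WHERE username = 'chen';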

2. Avoid functions with nondeterministic results

This matters especially for master-slave replication scenarios. Because the slave essentially replays the statements executed on the master, functions with nondeterministic results, such as now(), rand(), sysdate(), and current_user(), can easily cause the master and slave to diverge. In addition, SQL statements containing nondeterministic functions cannot use the Query Cache.
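A small sketch of the idea (the t_log table is hypothetical; how risky a given function is depends on the binlog format, e.g. sysdate() is known to be unsafe for statement-based replication):

-- Risky under statement-based replication: sysdate() is re-evaluated on the slave
update t_log set updated_at = sysdate() where id = 1;

-- Safer: compute the timestamp in the application and bind it as a literal
update t_log set updated_at = '2020-06-01 12:00:00' where id = 1;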

3. In multi-table joins, put the small table first and the large table last

In MySQL, the tables after FROM are joined from left to right (the opposite of Oracle). The first (driving) table undergoes a full table scan, so put the small table first: scan the small table, then probe the large table, where possibly only a fraction of the rows will match and be returned.

For example, suppose table 1 has 50 rows and table 2 has 3 billion. Run a full table scan on table 2 and you may as well go have a meal while you wait.

4. Use table aliases

When joining multiple tables in one SQL statement, give each table an alias and prefix every column name with it. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

5. Replace HAVING with WHERE

Avoid HAVING where possible: HAVING filters the result set only after all rows have been retrieved and grouped, whereas WHERE prunes rows before aggregation, so limiting rows with the WHERE clause reduces that overhead. Conditions in HAVING should generally be restricted to filtering on aggregate functions; all other conditions belong in the WHERE clause. A minimal contrast sketch follows.

The difference between WHERE and HAVING: aggregate (group) functions cannot be used in WHERE.
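As referenced above, a minimal contrast sketch (the students table and its columns are hypothetical):

-- Inefficient: the non-aggregate condition is evaluated only after grouping
SELECT class_id, AVG(score) FROM students
   GROUP BY class_id HAVING class_id > 100 AND AVG(score) > 60;

-- Better: filter rows first with WHERE, keep only the aggregate test in HAVING
SELECT class_id, AVG(score) FROM students WHERE class_id > 100
   GROUP BY class_id HAVING AVG(score) > 60;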

6. Adjust the condition order in the WHERE clause

MySQL evaluates WHERE clauses from left to right, top to bottom. By this principle, put the most selective filter condition first to shrink the result set as quickly as possible.
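A hedged illustration (the orders table and its columns are made up; note that the modern MySQL optimizer usually reorders conditions itself, so treat this as a defensive habit rather than a guarantee):

-- Less ideal: the weak filter (status) is written first
SELECT * FROM orders WHERE status = 1 AND order_no = 'A1001';

-- Better: the highly selective condition (order_no) leads
SELECT * FROM orders WHERE order_no = 'A1001' AND status = 1;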

III. DML statement (insert, update, delete) optimization

1. Insert data in batches

If a large number of inserts are performed at the same time, it is recommended to use a single INSERT statement with multiple VALUES lists (method 2). This is faster than issuing separate INSERT statements (method 1); bulk inserts are generally several times more efficient.

Method 1:

insert into T values(1, 2); 

insert into T values(1, 3); 

insert into T values(1, 4);

Method 2:

insert into T values(1, 2), (1, 3), (1, 4);

There are three reasons for choosing the latter approach.

  • It reduces SQL statement parsing. MySQL has no shared pool like Oracle's, so with method 2 the statement is parsed once to insert multiple rows;
  • It can reduce the number of DB connections in certain scenarios;
  • The overall SQL is shorter, reducing network IO.

2. Use commit appropriately

Properly using a COMMIT can release resources occupied by transactions and reduce consumption. Resources that can be released after a commit are as follows:

  • Undo data blocks occupied by the transaction;
  • Data blocks the transaction recorded in the redo log;
  • Locks held by the transaction, whose contention hurts performance. In particular, when large amounts of data must be removed with DELETE, break the deletion into batches and commit periodically, as in the sketch below.
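A minimal sketch of such batched deletion (the t_history table and its column are hypothetical; under autocommit, each statement is its own short transaction):

-- Delete in bounded chunks instead of one huge transaction
DELETE FROM t_history WHERE created_at < '2020-01-01' LIMIT 10000;
-- Repeat the statement (e.g. in an application loop) until it affects 0 rows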

3. Avoid re-querying data you have just updated

MySQL does not support PostgreSQL's UPDATE ... RETURNING syntax, but it can be emulated with a user variable.

For example, you update a row's timestamp and also want to know what the new timestamp is.

update t1 set time = now() where col1 = 1; 

select time from t1 where col1 = 1;

Using a variable, you can rewrite it as follows:

update t1 set time = now() where col1 = 1 and @now := now(); 

select @now;

Both versions require two network round trips, but using the variable avoids accessing the table a second time; especially when t1 is large, the latter is much faster than the former.

4. Prioritize queries or prioritize updates (INSERT, UPDATE, DELETE)

MySQL also allows changing the scheduling priority of statements, which can make queries from multiple clients cooperate better, so that a single client is not stuck waiting because of locking. Changing priorities can also ensure particular types of queries are processed faster. First determine the type of application: is it query-heavy or update-heavy? Is query efficiency or update efficiency more important? That decides whether queries or updates get priority.

For the InnoDB storage engine, statement execution order is determined by the order in which row locks are acquired. For table-locking engines, MySQL's default scheduling policy can be summarized as follows:

1) Write operations take precedence over read operations.

2) Writes to a given table can happen only one at a time; write requests are processed in the order they arrive.

3) Multiple reads of a table can proceed simultaneously.

MySQL provides several statement modifiers that allow you to adjust its scheduling policy:

  • The LOW_PRIORITY keyword applies to DELETE, INSERT, LOAD DATA, REPLACE, and UPDATE;
  • The HIGH_PRIORITY keyword applies to SELECT and INSERT statements;
  • The DELAYED keyword is applied to INSERT and REPLACE statements.

If a write operation arrives as a LOW_PRIORITY request, it is not treated as having higher priority than reads. In that case, if a second reader arrives while the writer is waiting, the second reader is allowed to jump ahead of the writer, and the writer starts only once there are no readers left. This scheduling change can leave a LOW_PRIORITY write blocked forever.

The HIGH_PRIORITY keyword on a SELECT works similarly: it allows the SELECT to jump ahead of a waiting write operation, even though the write would normally have higher priority, and such high-priority SELECT statements also execute before normal SELECT statements that are blocked behind writes. If you want all statements that support the LOW_PRIORITY option to be treated as low priority by default, start the server with the --low-priority-updates option. A single INSERT statement can opt back out via INSERT HIGH_PRIORITY, which raises it to the normal write priority.
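For illustration, the modifiers look like this in use (the t_log table is made up; these modifiers mainly matter for table-locking engines such as MyISAM, and INSERT DELAYED is deprecated and ignored in recent MySQL versions):

INSERT LOW_PRIORITY INTO t_log (msg) VALUES ('background write');

SELECT HIGH_PRIORITY * FROM t_log WHERE id = 1;

INSERT DELAYED INTO t_log (msg) VALUES ('fire and forget');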

IV. Query condition optimization

1. For complex queries, you can use intermediate temporary tables to temporarily store data

2. Optimize the group by statement

By default, MySQL sorts all "GROUP BY col1, col2, ..." queries as if you had also specified "ORDER BY col1, col2, ...". If you explicitly include an ORDER BY clause with the same columns, MySQL optimizes it away with no speed penalty, although the sort still happens.

Therefore, if the query groups with GROUP BY but you do not care about the order of the groups, specify ORDER BY NULL to suppress the sort. For example:

SELECT col1, col2, COUNT(*) FROM table GROUP BY col1, col2 ORDER BY NULL ;

3. Optimize the JOIN statement

MySQL can use a subquery in a SELECT statement to produce a single-column result and then use it as a filter condition in another query. Subqueries let you accomplish in one step what would logically require several SQL operations, avoid transactions or table locks, and are easy to write. In some cases, however, a subquery can be replaced by a more efficient JOIN.

Example: Suppose you want to fetch all users with no order records, you can do this with the following query:

SELECT col1 FROM customerinfo WHERE CustomerID NOT in (SELECT CustomerID FROM salesinfo )

If you rewrite it with a JOIN, it will be faster, especially if the salesinfo table has an index on CustomerID:

SELECT col1 FROM customerinfo 
   LEFT JOIN salesinfo ON customerinfo.CustomerID=salesinfo.CustomerID 
      WHERE salesinfo.CustomerID IS NULL

The JOIN version is more efficient because MySQL does not need to create a temporary table in memory to complete this logically two-step query.

4. Optimize union queries

MySQL executes UNION queries by creating and populating temporary tables. Unless you really need to eliminate duplicate rows, use UNION ALL. Without the ALL keyword, MySQL adds a DISTINCT option to the temporary table, which triggers a uniqueness check across the entire temporary table; this is quite expensive.

Efficient:

SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10 

UNION ALL 

SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3= 'TEST';

Inefficient:

SELECT COL1, COL2, COL3 FROM TABLE WHERE COL1 = 10 

UNION 

SELECT COL1, COL2, COL3 FROM TABLE WHERE COL3= 'TEST';

5. Split complex SQL into multiple small SQL to avoid large transactions

  • Simple SQL statements can readily use MySQL's QUERY CACHE;
  • Table lock time is reduced, especially for tables using the MyISAM storage engine;
  • Multiple CPU cores can be utilized.

6. Use TRUNCATE instead of DELETE

When deleting all records from a table, a DELETE statement writes the operation into undo blocks and logs every deleted row in the binlog. Emptying a whole table this way generates a large volume of binlog and occupies many undo blocks, so it is slow and resource hungry.

With TRUNCATE, no recovery information is recorded and the data cannot be restored, so the operation uses very few resources and completes very quickly. TRUNCATE also resets the table's high-water mark and returns the auto-increment counter to zero.
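A one-line contrast (t is any table you are certain you want to empty):

-- DDL: cannot be rolled back, resets the AUTO_INCREMENT counter, very fast
TRUNCATE TABLE t;

-- DML: row by row, heavy on undo and binlog, AUTO_INCREMENT keeps its value
-- DELETE FROM t;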

7. Use proper paging methods to increase paging efficiency

For presentation-layer and similar paging requirements, an appropriate paging method improves paging efficiency.

Case 1:

select * from t where thread_id = 10000 and deleted = 0 
   order by gmt_create asc limit 0, 15;

The example above fetches all fields at once according to the filter conditions, then sorts and returns them. Data access cost = index IO + table IO for every record the index matches. As a result, the further back you page, the worse the execution efficiency and the longer it takes, especially when the table is very large.

Application scenario: when the intermediate result set is small (under 10,000 rows) or the query conditions are complex (many query fields or multi-table joins are involved).

Case 2:

select t.* from (select id from t where thread_id = 10000 and deleted = 0
   order by gmt_create asc limit 0, 15) a, t 
      where a.id = t.id;

In this example, the primary key of t is the id column and there is a covering secondary index (thread_id, deleted, gmt_create). First, the primary key ids are extracted and sorted through the covering index according to the filter conditions; then the remaining fields are fetched via the join. Data access cost = index IO + table IO for only the page of rows (15 in the example). As a result, every page costs roughly the same resources and time as the first page.

Application scenario: when the query and sort fields (that is, the fields involved in the WHERE clause and the ORDER BY clause) have a corresponding covering index, and the intermediate result set is large.

V. Table design optimization

1. Create indexes on the table, prioritizing the fields used by WHERE and ORDER BY.

2. Use numeric fields where possible (for example, gender: male = 1, female = 2). Designing fields that hold only numeric information as character types reduces query and join performance and increases storage overhead.

This is because the engine compares each character in the string one by one while processing queries and joins, whereas for numeric types it only needs to compare once.

3. Querying a table with a very large amount of data is slow mainly because too many rows are scanned. The application can instead query in segments (pages), loop through them, and merge the results for display. To query rows 100,000 to 100,050:

-- Requires MySQL 8.0+ for the ROW_NUMBER() window function
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY id ASC) AS rowid, i.*
   FROM infoTab i) t WHERE t.rowid > 100000 AND t.rowid <= 100050;

4. Use varchar/nvarchar instead of char/nchar

Prefer varchar/nvarchar to char/nchar wherever possible: first, variable-length fields use less storage space; second, searching within a smaller field is clearly more efficient for queries.

Also, because varchar is a variable-length type, a NULL value takes up no storage space.

Thank you

Reprinted from "SQL Optimization 2020: A Summary of the Most Practical Tips (MySQL)".