The Secret Life of SQL: How to Optimize Database Performance by Bryana Knight
Query performance suffers when too much data is accessed
- Multiple table joins return all columns
select * from sakila.actor
inner join sakila.film_actor using(actor_id)
inner join sakila.film using(film_id)
where sakila.film.title = 'AronMan'
The better approach is to return only the actor table's columns rather than every column from all three tables:
select sakila.actor.* from sakila.actor
inner join sakila.film_actor using(actor_id)
inner join sakila.film using(film_id)
where sakila.film.title = 'AronMan'
- Join decomposition
select * from tag
join tag_post on tag_post.tag_id=tag.id
join post on tag_post.post_id=post.id
where tag.tag='mysql'
After decomposing the join:
select * from tag where tag='mysql'
select * from tag_post where tag_id=1234
select * from post where post.id in(123,456,789)
Decomposing joins may seem wasteful, but it has significant advantages:
- Cache efficiency is higher
- With MyISAM, tables are locked for a shorter time
- Joining in the application makes it easier to scale the database and place tables on different servers
- The queries themselves are often more efficient
- Fewer redundant rows are accessed
When should a join be decomposed?
- Many of the resulting queries can be cached and reused
- Multiple MyISAM tables are involved
- The data is distributed across different servers
- An IN() list can replace the join for a large table
- A join references the same table more than once
To optimize joins:
- Make sure the columns in ON or USING clauses have indexes
- Make sure any GROUP BY or ORDER BY expression refers to columns from only one table, so that an index can be used (a short sketch follows this list)
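A minimal sketch of these two rules, using hypothetical orders and customers tables (the table, column, and index names here are placeholders, not from the talk):

-- index the column used in the join condition
alter table orders add index idx_orders_customer_id (customer_id);

-- the ORDER BY refers only to columns of one table (orders),
-- so MySQL can consider an index on orders to avoid a filesort
select customers.name, orders.total
from customers
inner join orders on orders.customer_id = customers.id
where customers.id = 42
order by orders.created_at;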
Pessimistic locking
begin;
-- lock the unbooked seats so another transaction cannot book them concurrently
select chairid from seat where booked is null for update;
update seat set booked = 'x' where chairid = 1;
commit;
Indexing and query optimization
Excerpted from "MySQL Performance Optimization: Slow Query Analysis, Index Optimization, and Configuration"
Types of indexes
- Normal index: the most basic index type, with no restrictions such as uniqueness.
- Unique index: basically the same as a normal index, except that every value of the indexed column(s) must be unique.
- Primary key: a unique index that must be declared with "PRIMARY KEY"; a table can have only one.
- Full-text index: MySQL has supported full-text indexing and full-text search since 3.23.23. Full-text indexes have the index type FULLTEXT and can be created on VARCHAR or TEXT columns.
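To make these index types concrete, here is a hedged DDL sketch on a hypothetical articles table (all names are placeholders; the FULLTEXT index assumes MyISAM, or InnoDB on MySQL 5.6 and later):

create table articles (
  id int unsigned not null auto_increment,
  slug varchar(100) not null,
  author varchar(50) not null,
  body text,
  primary key (id),                     -- primary key: a unique index, at most one per table
  unique key uk_articles_slug (slug),   -- unique index
  key idx_articles_author (author),     -- normal index
  fulltext key ft_articles_body (body)  -- full-text index on a TEXT column
);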
Note the left-most prefix rule when using multi-column (composite) indexes.
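A small sketch of the left-most prefix rule, with hypothetical table and index names:

create index idx_last_first_city on users (last_name, first_name, city);

-- these can use the composite index: they start from the left-most column
select * from users where last_name = 'Smith';
select * from users where last_name = 'Smith' and first_name = 'Ann';

-- this cannot use the index: the left-most column (last_name) is missing
select * from users where first_name = 'Ann' and city = 'Boston';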
Sometimes MySQL does not use an index, even when one is available. One scenario is when the optimizer estimates that using the index would require MySQL to access most rows in the table; in that case, a table scan may be faster. However, if such a query uses LIMIT to fetch only a portion of the rows, MySQL will use the index, because it can find those few rows quickly and return them in the result.
Reasonable suggestions for indexing:
(1) Smaller data types are usually better: they need less space on disk, in memory, and in the CPU cache, and are faster to process.
(2) Simple data types are better: integers are cheaper to process than character data, because string comparisons are more complex. In MySQL, use the built-in date and time types rather than strings to store times, and store IP addresses as integers.
(3) Avoid NULL whenever possible: declare columns NOT NULL unless you actually need to store NULL. Nullable columns are harder for MySQL to optimize because they complicate indexes, index statistics, and comparisons; replace NULL with 0, a special value, or an empty string.
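A short DDL sketch of these three suggestions, on a hypothetical access_log table (all names are placeholders):

create table access_log (
  id int unsigned not null auto_increment,  -- small, simple integer key
  visited_at datetime not null,             -- built-in date/time type, not a string
  ip int unsigned not null,                 -- IPv4 address stored as an integer
  country char(2) not null default '',      -- NOT NULL with a default instead of allowing NULL
  primary key (id)
);

-- MySQL's INET_ATON()/INET_NTOA() convert between dotted-quad and integer form
insert into access_log (visited_at, ip, country)
values (now(), inet_aton('192.168.0.1'), 'US');

select inet_ntoa(ip) from access_log;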
The rest of this section is a quick list of tips and points to keep in mind when creating indexes and writing SQL statements.
- Use LIMIT 1 when the result set has only one row of data.
- Avoid SELECT * and always specify the columns you need. The more data read from the table, the slower the query becomes: it increases disk time, and even if the database server is separate from the web server, you pay long network delays simply because data is transferred unnecessarily between servers.
- Use joins instead of sub-queries. A join can be more efficient because MySQL does not need to create an in-memory temporary table to perform the query as two logical steps (see the sketch after this list).
- Use ENUM or CHAR instead of VARCHAR where it makes sense, and use reasonable field lengths.
- Use NOT NULL whenever possible.
- Fixed-length tables are faster.
- Split large DELETE or INSERT statements into smaller batches.
- The fewer and smaller the columns queried, the faster the query.
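To illustrate the "joins instead of sub-queries" tip above, a hedged sketch with hypothetical customers and orders tables:

-- sub-query version: MySQL may have to materialize the inner result first
select * from customers
where id in (select customer_id from orders where total > 100);

-- equivalent join version (DISTINCT keeps one row per customer)
select distinct customers.*
from customers
inner join orders on orders.customer_id = customers.id
where orders.total > 100;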
WHERE conditions
WHERE conditions are another important factor in query performance. Keep them as few and as selective as possible; when there are multiple conditions, put the ones that filter out the most rows first, so that the remaining conditions operate on as little data as possible.
Some WHERE conditions prevent an index from being used:
- Applying a function to the indexed column, e.g. select * from tb where left(name, 4) = 'XXX'
- A LIKE pattern that starts with a wildcard, e.g. select * from tbl1 where name like '%XXX%'
A summary of tips
1. Avoid using != or <> in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan.
2. To optimize a query, try to avoid full table scans; first consider building indexes on the columns used in WHERE and ORDER BY.
3. Avoid checking a column for NULL in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan, e.g.:
select id from t where num is null
It is better to give num a default value of 0, make sure the column never contains NULL, and then query like this:
select id from t where num=0
4. Avoid using OR to combine conditions in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan, e.g.:
select id from t where num=10 or num=20
It can be rewritten like this:
select id from t where num=10
union all
select id from t where num=20
5. The following query will also cause a full table scan:
select id from t where name like '%abc'
To improve efficiency, consider full-text search instead (sketched below).
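A minimal sketch of that alternative, assuming a FULLTEXT index can be added to the column (the index name is a placeholder; note that full-text search matches whole words rather than arbitrary substrings):

alter table t add fulltext index ft_t_name (name);

select id from t where match(name) against('abc');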
6. IN and NOT IN should also be used with caution, as they can cause a full table scan, e.g.:
select id from t where num in(1, 2, 3)
For a continuous range of values, use BETWEEN instead of IN:
select id from t where num between 1 and 3
7. Using local variables as parameters in the WHERE clause can also cause a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of an access plan until run time; it must choose at compile time. But at compile time the value of the variable is unknown, so it cannot be used as an input for index selection. The following statement will perform a full table scan:
select id from t where num=@num
You can force the query to use an index instead:
select id from t with(index(index_name)) where num=@num
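The WITH(INDEX(...)) hint above is SQL Server syntax. In MySQL, a roughly equivalent hint (the index name idx_num is a placeholder) would look like this:

select id from t force index (idx_num) where num = @num;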
8. Avoid expression operations on fields in the WHERE clause; this will cause the engine to abandon the index and perform a full table scan. For example:
select id from t where num/2=100
Should be changed to:
select id from t where num=100*2
9. Avoid applying functions to fields in the WHERE clause; this will cause the engine to abandon the index and perform a full table scan. For example:
select id from t where substring(name, 1, 3) = 'abc'
select id from t where datediff(day, createdate, '2005-11-30') = 0
Should be changed to:
select id from t where name like 'abc%'
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-1'
10. Do not perform functions, arithmetic operations, or other expression operations to the left of the “=” in the WHERE clause, or the system may not use the index properly.
11. When using an indexed field as a condition, if the index is a composite index, the condition must include the first column of the index for the index to be used, and the order of the conditions should match the order of the index columns as closely as possible.
12. Do not write meaningless queries, such as one that only generates an empty table structure:
select col1,col2 into #t from t where 1=0
This kind of code does not return any result set but still consumes system resources. It should be written as:
create table #t(...)
13. It is often a good choice to use exists instead of in:
select num from a where num in(select num from b)
Replace with the following statement:
select num from a where exists(select 1 from b where num=a.num)
14. Not every index helps every query. The optimizer chooses a plan based on the data in the table, and when an indexed column contains a large number of duplicate values, the query may not use the index at all. For example, if a sex column in a table is roughly half 'male' and half 'female', an index on sex does essentially nothing for query efficiency (a quick selectivity check is sketched below).
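One way to check whether such an index is worthwhile is to measure the column's selectivity, i.e. the ratio of distinct values to total rows (values close to 1 are good index candidates); a quick sketch:

select count(distinct sex) / count(*) as selectivity from t;
-- a result near 0 (e.g. 2 distinct values over millions of rows)
-- means an index on sex will rarely help a query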
15. More indexes are not always better. Indexes can improve the efficiency of SELECT statements, but they reduce the efficiency of INSERT and UPDATE statements, because the indexes must be maintained on every insert or update. A table should generally not have more than 6 indexes; if it has more, consider whether indexes on rarely used columns are really necessary.
16. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's rows; changing them forces the rows to be physically reordered, which consumes considerable resources. If the application needs to update clustered index columns frequently, reconsider whether that index should be clustered at all.
17. Use numeric fields whenever possible. Fields that contain only numeric information should not be designed as character types; that reduces query and join performance and increases storage overhead. The engine compares a string character by character when processing queries and joins, whereas a numeric type needs only a single comparison.
18. Use varchar/nvarchar instead of char/nchar whenever possible. A variable-length field takes less storage space, and searching within a smaller field is clearly more efficient.
19. Do not use select * from t anywhere; replace "*" with a specific list of fields, and do not return any fields that are not needed.
20. Use table variables instead of temporary tables whenever possible. If a table variable contains a lot of data, be aware that its indexes are very limited (only the primary key index); see the sketch below.
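For context, a sketch of a table variable in SQL Server syntax (the column definitions are placeholders):

declare @t table (
    id int primary key,   -- constraints declared inline are the only indexes available
    name nvarchar(50)
);

insert into @t (id, name)
select id, name from t where num = 10;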
21. Avoid frequently creating and dropping temporary tables, to reduce the consumption of system table resources.
22. Temporary tables are not off-limits; using them appropriately can make certain routines more efficient, for example when a large data set from a big or frequently used table must be referenced repeatedly. For one-off operations, however, a derived table is usually the better choice.
23. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid generating a large amount of log. If the amount of data is small, create the table first and then INSERT, to reduce the load on system tables. Both alternatives are sketched below.
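A sketch of both alternatives in SQL Server syntax (table and column names are placeholders):

-- large data set: create and populate in one step, generating less log
select id, name into #big from t where num > 0;

-- small data set: create the table first, then insert
create table #small (id int, name nvarchar(50));
insert into #small (id, name)
select id, name from t where num = 10;

-- explicit cleanup, as tip 24 below recommends
truncate table #small;
drop table #small;
drop table #big;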
24. If temporary tables are used, explicitly delete them at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids holding locks on system tables for a long time.
25. Try to avoid using cursors, because cursors perform poorly; if a cursor operates on more than 10,000 rows, consider rewriting it.
26. Before reaching for a cursor-based or temporary-table approach, look for a set-based solution to the problem; set-based solutions are usually more efficient.
27. Like temporary tables, cursors are not off-limits. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row processing methods, especially when several tables must be referenced to get the required data. Routines that include "totals" in the result set are usually faster than those that use a cursor to compute them. If development time permits, try both the cursor-based and the set-based approach and see which works better.
28. Execute SET NOCOUNT ON at the start of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after every statement of a stored procedure or trigger (a skeleton is sketched below).
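A skeleton of how this is usually applied, in SQL Server syntax (the procedure, table, and column names are placeholders):

create procedure usp_example
as
begin
    set nocount on;    -- suppress the per-statement "rows affected" messages

    select id from t where num = 10;

    set nocount off;
end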
29. Try to avoid returning large amounts of data to the client; if the volume is excessive, consider whether the requirement itself is reasonable.
30. Try to avoid large transactions, to improve system concurrency.