Backend Programmer Must-have: 30 Tips for Writing high-quality SQL

preface

This article will combine with the example demo, expound 30 suggestions about optimizing SQL, most of which are summarized in the actual development, hope to be helpful to everyone.

Github.com/whx123/Java…

Select * from ‘select ’; select from ‘select’;

An example:

select * from employee;
Copy the code

Examples are:

Select id, name from employee;Copy the code

Reason:

Select only the required fields to save resources and network overhead.
When a select * query is performed, it is likely that the overwrite index will not be used, resulting in a backtable query.

2. If you know that the query result has only one record or only a maximum/minimum record, you are advised to use limit 1

Suppose you have the Employee table, and you want to find a person named Jay.

CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `sex` int(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Copy the code

Example:

Select id, name from employeewhere name='jay'
Copy the code

Is case

Select id, name from employeewhere name='jay' limit 1;
Copy the code

Reason:

Add limit 1, as long as the corresponding record is found, no further scanning, efficiency will be greatly improved.
Of course, if name is a unique index, there is no need to add limit 1, because the limit exists mainly to prevent a full table scan and thus improve performance. If a statement itself can predict that a full table scan is not required, there is no difference in performance.

3. Try to avoid using OR to join conditions in where clauses

Select * from user where userId = userId; select * from user where userId = userId;

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) NOT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_userId` (`userId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Copy the code

Suppose you now need to query a user whose userID is 1 or age is 18, you can easily have the following SQL

Example:

select * from user where userid=1 or age =18
Copy the code

Is:

Union all select * from userwhere userid=1 
union all 
select * from user whereSelect * from user where age = 18where userid=1
select * from user where age = 18
Copy the code

Reason:

Using or may invalidate the index, leading to a full table scan.

In the case of or+ age with no index, suppose it goes through the index of userId, but when it goes to the age query condition, it also has to do a full table scan, which is a three-step process: full table scan + index scan + merge. If it started with the full table scan, it will just do one scan. Mysql has an optimizer, and it seems reasonable that the index might fail in the case of an OR condition for efficiency and cost.

4. Optimize limit paging

When we do daily paging requirements, we usually use limit, but when the offset is very large, query efficiency becomes low.

Example:

Select id, name, age from employeelimit10000,Copy the code

Is:

Select id, name from employee; // Select id, name from employeewhere id>10000 limitSelect id, name from employee order by IDlimit// Solution 3: Limit the number of pages as permitted by the service:Copy the code

Reason:

When the offset is maximum, the query is less efficient, because Mysql does not skip the offset directly. Instead, Mysql returns the offset + the number of items to be fetched, and then discard the previous offset.
If you use optimization solution 1 and return the last maximum query record (offset), you can skip the offset and improve efficiency.
The second scheme uses the order by+ index, which can also improve the query efficiency.
Plan three, it is recommended to discuss with the business, there is no need to look up so after the page. Because most users don’t turn too many pages.

5. Optimize your like statements

In everyday development, it’s easy to think of like when you’re using fuzzy keyword queries, but like is likely to invalidate your index.

Example:

Select userId, name from userwhere userId like '% 123';
Copy the code

Is:

Select userId, name from userwhere userId like '123%';
Copy the code

Reason:

Put % in front, without indexing, as follows:
If you put % after the keyword, it will still be indexed. As follows:

6. Use the WHERE condition to limit the data to be queried to avoid returning redundant rows

Suppose the business scenario is to query whether a user is a member. Have seen the old implementation code like this…

Example:

List<Long> userIds = sqlMap.queryList("select userId from user where isVip=1");
boolean isVip = userIds.contains(userId);
Copy the code

Is:

Long userId = sqlMap.queryObject("select userId from user where userId='userId' and isVip='1' ") Boolean isVip = userId! =null;Copy the code

Reason:

Need what data, to look up what data, avoid returning unnecessary data, save overhead.

7. Avoid using mysql’s built-in functions on index columns

Business requirements: Query users who logged in in the last seven days (assuming loginTime is indexed)

Example:

select userId,loginTime from loginuser where Date_ADD(loginTime,Interval 7 DAY) >=now();
Copy the code

Is:

explain  select userId,loginTime from loginuser where  loginTime >= Date_ADD(NOW(),INTERVAL - 7 DAY);
Copy the code

Reason:

Mysql built-in function is used on index column, index is invalid

If the index column does not have built-in functions, the index will still go.

8. Try to avoid expression operations on fields in the WHERE clause. This will cause the system to abandon the use of indexes and perform a full table scan

Example:

select * from user whereThe age - 1 = 10;Copy the code

Is:

select * from user whereAge = 11;Copy the code

Reason:

Although age is indexed, the index is lost because of the operation on it…

9, Inner Join, left join, right join, Inner join preferentially, if left join, the left table result as small as possible

Inner join: When a join query is performed on two tables, only the perfectly matched result sets in the two tables are retained

Left Join returns all rows in the left table, even if there are no matching records in the right table.

Right join returns all rows in the right table, even if there are no matching records in the left table.

On the premise that both meet THE REQUIREMENTS of SQL, Inner Join is recommended to be used first. If left join is used, the data results of the left table should be as small as possible, and if there is a condition, it should be placed on the left.

Example:

select * from tab1 t1 left join tab2 t2  on t1.size = t2.size where t1.id>2;
Copy the code

Is:

select * from (select * from tab1 where id >2) t1 left join tab2 t2 on t1.size = t2.size;
Copy the code

Reason:

If the inner Join is equivalent, perhaps fewer rows will be returned, so the performance will be relatively better.
Similarly, the left join is used, and the results of the left table data are as small as possible, and the conditions are processed on the left, which means that the number of rows returned is likely to be small.

10, try to avoid using in the WHERE clause! = or <> operators, otherwise the engine will abandon the index and proceed to a full table scan.

Example:

select age,name  from user where age <>18;
Copy the code

Is:

Select age,name from userwhere age <18;
select age,name  from user where age >18;
Copy the code

Reason:

Use! = and <> are likely to invalidate indexes

11. When using a federated index, pay attention to the order of index columns. Generally, the left-most matching principle is followed.

Table structure :(idx_userid_age, userId first, age second)

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_userid_age` (`userId`,`age`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Copy the code

Example:

select * from user where age = 10;
Copy the code

Is:

Select * from user select * from userwhereUserid =10 and age =10; Select * from user select * from userwhere userid =10;
Copy the code

When we create a joint index, such as (k1,k2,k3), we create indexes (k1), (k1,k2), and (k1,k2,k3). This is the leftmost matching principle.
A federated index does not satisfy the leftmost rule and will generally fail, but this is a matter of concern to the Mysql optimizer.

12. To optimize the query, consider creating indexes on the columns involved in where and Order BY, and try to avoid full table scans.

Example:

select * from user where address ='shenzhen' order by age ;
Copy the code

Alter table user add index idx_address_age (address,age)Copy the code

13. If there is too much data inserted, consider batch inserts.

Example:

for(User u :list){
 INSERT into user(name,age) values(#name#,#age#)   
}
Copy the code

Is:

Insert into user(name,age) values <foreach collection= insert into user(name,age) values <foreach collection="list" item="item" index="index" separator=",">
    (#{item.name},#{item.age})
</foreach>
Copy the code

Reason:

Batch insert performance is good, more time saving

For example, if you need to move 10,000 bricks to the top of the building, and you have an elevator, the elevator can put the right amount of bricks at a time (up to 500 bricks), you can choose to transport one brick at a time, or 500 bricks at a time, which time do you think is the most expensive?

14. Use overwrite indexes when appropriate.

Overwriting indexes allows your SQL statements to get all the data they need without going back to the table.

Example:

Select * from user where id = 1 select * from user where id = 1where userid like '% 123%'
Copy the code

// Id is the primary key, so it is a normal index, that is, overwrite index. select id,name from userwhere userid like '% 123%';
Copy the code

Use the distinct keyword with caution

The DISTINCT keyword is used to filter duplicate records to return the records that are not duplicate. When used to query a single field or few fields, it optimizes the query. However, when used in many fields, it will greatly reduce the query efficiency.

Example:

SELECT DISTINCT * from  user;
Copy the code

Is:

select DISTINCT name from user;
Copy the code

Reason:

Statements with DISTINCT have higher CPU time and usage than statements without DISTINCT. If distinct is used when querying many fields, the database engine will compare the data and filter out duplicate data. However, this comparison and filtering process will consume system resources and CPU time.

16. Delete redundant and duplicate indexes

Example:

  KEY `idx_userId` (`userId`)  
  KEY `idx_userId_age` (`userId`,`age`)
Copy the code

Is:

// Delete index userId, because index (A, B) is equivalent to index (A) and (A, B) KEY idx_userId_age (' userId ', 'age')Copy the code

Reason:

Duplicate indexes need to be maintained and the optimizer needs to consider them individually when optimizing queries, which can affect performance.

17. Optimize your modify/delete statements for large amounts of data.

Do not modify or delete too much data at the same time, because the CPU usage is too high and other people’s access to the database is affected.

Example:

// Delete 100,000 or 1 million + at a time? delete from userwhereid <100000; // Or use a single cycle operation, low efficiency, long timefor(User User: list) {delete from User; }Copy the code

Is:

// Delete in batches, for example, 500 delete users each timewhere id<500
delete product whereId > = 500 and id < 1000;Copy the code

Reason:

Delete too much data at once, you may have an error with Lock wait timeout exceed, so batch operations are recommended.

18. Consider using the default value in the WHERE clause instead of NULL.

Example:

select * from user where age is not null;
Copy the code

Select * from user; // Select * from userwhere age>0;
Copy the code

Is not null or is not NULL will not be used, it depends on the mysql version and query cost.

If the mysql optimizer finds that it costs more to index than not to index, it will definitely abandon the index. These conditions! =, >is NULL, is not NULL is not NULL is not null is not null

If you replace the null value with the default value, it will be possible to walk off the index in many cases, and the meaning will be relatively clear.

19, do not have more than 5 table joins

The more tables you have, the more time and cost it takes to compile.
It is more readable to break the join table into smaller executions.
If you have to join many tables to get the data, it’s bad design.

20. Rational utilization of exist & in

If table A represents the employee table of an enterprise, table B represents the department table, query all employees of all departments, it is easy to have SQL:

select * from A where deptId in (select deptId from B);
Copy the code

Writing this is equivalent to:

Query department table B first

select deptId from B

Then query the employees of A by deptId

select * from A where A.deptId = B.deptId

This can be abstracted as a loop like this:

   List<> resultSet ;
    for(int i=0; i<B.length; i++) {for(int j=0; j<A.length; j++) {if(A[i].id==B[j].id) {
             resultSet.add(A[i]);
             break; }}}Copy the code

Obviously, in addition to using in, we can also implement the same query function with exists, as follows:

select * from A where exists (select 1 from B where A.deptId = B.deptId); 
Copy the code

The data obtained from the primary query is checked in subqueries, and the results (true or false) are used to determine whether the results of the primary query should be retained.

So, writing this is equivalent to:

Select * from (select * from (select * from (select * from))

Select * from B where a.deptid = B.deptid; select * from B where A.deptid = B.deptid;

Similarly, it can be abstracted as a loop like this:

   List<> resultSet ;
    for(int i=0; i<A.length; i++) {for(int j=0; j<B.length; j++) {if(A[i].deptId==B[j].deptId) {
             resultSet.add(A[i]);
             break; }}}Copy the code

The database is the most laborious with the program link release. Suppose you link twice, you do a million queries of the data set each time, and then you go, and you only do it twice; Instead, it creates millions of links, requests for link release and repeats, and the system becomes overwhelmed. The mysql optimization principle is that small tables drive large tables and small data sets drive large data sets for better performance.

Therefore, we should choose the one with small outermost loop, that is, if the amount of data of B is less than that of A, it is suitable to use in; if the amount of data of B is greater than that of A, it is suitable to choose Exist.

21, Try to replace union with union all

If there are no duplicate records in the search results, union all is recommended to replace union.

Example:

select * from user where userid=1 
union  
select * from user where age = 10
Copy the code

Is:

select * from user where userid=1 
union all  
select * from user where age = 10
Copy the code

Reason:

If you use a union, you try to merge the retrieved results regardless of whether they are duplicated, and then sort them before printing the final results. If it is known that the retrieval results are not duplicated, use union all instead of union to improve efficiency.

Index should not be too much, generally within 5.

More indexes is not always better, and while indexes improve the efficiency of queries, they also reduce the efficiency of inserts and updates.
An index may be rebuilt during insert or Update, so be careful when creating an index, depending on the situation.
The number of indexes in a table should not exceed 5. If there are too many indexes, it is necessary to consider whether some indexes do not exist.

23. Use numeric fields as far as possible. If only numeric fields are used, do not design characters as far as possible

Example:

King_id 'vARCHar (20) NOT NULL COMMENT'Guardian Id'
Copy the code

Is:

`king_id` int(11) NOT NULL COMMENT 'Guardian Id'`
Copy the code

Reason:

Compared to numeric fields, character types reduce query and join performance and increase storage overhead.

24. Indexes are not suitable for fields with large amounts of duplicate data, such as database fields such as gender.

Because the SQL optimizer optimizes queries based on the amount of data in the table, if there is a lot of duplicate data in the index column, the Mysql query optimizer will probably abandon the index because it will be cheaper not to index.

25. Avoid returning too much data to the client.

Suppose the business requirement is that a user requests to see the live streaming data they have watched in the last year.

Example:

Select * from LivingInfo where id = 1where watchId =useId and watchTime >= Date_sub(now(),Interval 1 Y)
Copy the code

Is:

Select * from LivingInfowhere watchId =useId and watchTime>= Date_sub(now(),Interval 1 Y) limitSelect * from LivingInfo where (select * from LivingInfo where (select * from LivingInfo where (select * from LivingInfo))where watchId =useId and watchTime>= Date_sub(now(),Interval 1 Y) limit200;Copy the code

26. When joining multiple tables in an SQL statement, use the table alias and prefix each column with the alias to make the semantics clearer.

Example:

select  * from A inner
join B on A.deptId = B.deptId;
Copy the code

Is:

select  memeber.name,deptment.deptName from A member inner
join B deptment on member.deptId = deptment.deptId;
Copy the code

27. Use varchar/ nvARCHar instead of char/ nCHAR whenever possible.

Example:

  `deptName` char(100) DEFAULT NULL COMMENT 'Department Name'
Copy the code

Is:

  `deptName` varchar(100) DEFAULT NULL COMMENT 'Department Name'
Copy the code

Reason:

Because first of all the variable length field storage space is small, can save storage space.
Secondly, for queries, it is more efficient to search in a relatively small field.

28. To improve the efficiency of the group by statement, you can filter out unnecessary records before executing the statement.

Example:

Select job, avG (salary) from employee group by job having job ='president' 
or job = 'managent'
Copy the code

Is:

Select job, avg (salary) from employeewhere job ='president' 
or job = 'managent'Group by the job;Copy the code

29. When the field type is a string, enclose it in quotation marks; otherwise, the index is invalid

Example:

select * from user where userid =123;
Copy the code

select * from user where userid ='123';
Copy the code

Why is the first statement not indexed without single quotes? This is because without single quotes, it’s a comparison between a string and a number, they don’t match, and MySQL will implicitly convert them to a floating point and then compare them.

Use Explain to analyze your PLAN for SQL

Try to make a habit of writing SQL on a daily basis. Use explain to analyze the SQL you have written, especially in the area of indexing.

explain select * from user where userid =10086 or age =18;
Copy the code

Reference and Thanks

Mysql Optimization Principles _ Small table drives large table IN and EXISTS
SQL statement optimization analysis

Personal public Account

If you think it’s good, please give me a thumbs up and follow me. Thank you
If there are any inaccuracies, please kindly point them out.
At the same time, I am looking forward to friends can pay attention to my public number, and slowly introduce better dry goods ~ xi xi
Github: github.com/whx123/Java…

Backend Programmer Must-have: 30 Tips for Writing high-quality SQL

preface

Select * from ‘select *’; select * from ‘select’;

2. If you know that the query result has only one record or only a maximum/minimum record, you are advised to use limit 1

3. Try to avoid using OR to join conditions in where clauses

4. Optimize limit paging

5. Optimize your like statements

6. Use the WHERE condition to limit the data to be queried to avoid returning redundant rows

7. Avoid using mysql’s built-in functions on index columns

8. Try to avoid expression operations on fields in the WHERE clause. This will cause the system to abandon the use of indexes and perform a full table scan

9, Inner Join, left join, right join, Inner join preferentially, if left join, the left table result as small as possible

10, try to avoid using in the WHERE clause! = or <> operators, otherwise the engine will abandon the index and proceed to a full table scan.

11. When using a federated index, pay attention to the order of index columns. Generally, the left-most matching principle is followed.

12. To optimize the query, consider creating indexes on the columns involved in where and Order BY, and try to avoid full table scans.

13. If there is too much data inserted, consider batch inserts.

14. Use overwrite indexes when appropriate.

Use the distinct keyword with caution

16. Delete redundant and duplicate indexes

17. Optimize your modify/delete statements for large amounts of data.

18. Consider using the default value in the WHERE clause instead of NULL.

19, do not have more than 5 table joins

20. Rational utilization of exist & in

21, Try to replace union with union all

Index should not be too much, generally within 5.

23. Use numeric fields as far as possible. If only numeric fields are used, do not design characters as far as possible

24. Indexes are not suitable for fields with large amounts of duplicate data, such as database fields such as gender.

25. Avoid returning too much data to the client.

26. When joining multiple tables in an SQL statement, use the table alias and prefix each column with the alias to make the semantics clearer.

27. Use varchar/ nvARCHar instead of char/ nCHAR whenever possible.

28. To improve the efficiency of the group by statement, you can filter out unnecessary records before executing the statement.

29. When the field type is a string, enclose it in quotation marks; otherwise, the index is invalid

Use Explain to analyze your PLAN for SQL

Reference and Thanks

Personal public Account

Related Posts

Java Abstract classes and interfaces

Wechat public account integrated third party development nanny tutorial

ThinkPHP6 initial build API service

Select * from ‘select ’; select from ‘select’;