preface

Welcome to our GitHub repository Star: github.com/bin39232820… The best time to plant a tree was ten years ago, followed by now. I know many people don’t play QQ anymore, but for a moment of nostalgia, welcome to join the six-vein Shenjian Java rookie learning group, group chat number: 549684836 encourage everyone to write blog on the road of technology

omg

Let’s continue exploring mysql. Mysql > mysql > mysql > mysql > mysql > mysql > mysql > mysql > mysql

Schema data type optimization and indexing foundation
Mysql > Select and Update
(3) InnoDB storage structure
Mysql > select * from B+ tree
Mysql > select * from table_name

The above chapter is the basis, about the single table query, if you have not seen, please move the above chapter.

Slow query basics: Optimize data access

The main reason for low query performance is that we need to query too much data, you have a report query is slow, for inefficient query, we may refer to the following two steps.

Determine if the application is retrieving more data than it needs.
Determine if the mysql server layer is parsing a large amount of data than is needed.

Whether the database requests data that is not needed.

There is a lot of SQL that requests more data than it really needs, which places an extra burden on the server and increases network overhead. The following example is a bad example

For example, you should not use * when you only need to return 2 columns.
The second is that you need to query 10 items of data, but you do not limit the number of queries.

Whether to scan additional records

After determining and confirming the number of rows we need to return, the next thing we need to optimize is to see if too much data is being queried. For mysql, the simplest three metrics to measure the query cost are as follows:

The response time
Scan lines
The number of rows returned

Number of rows scanned and number of rows returned

If we find that we scan too many rows, but we only return a small number of rows, there are several aspects of this optimization.

An index override scan is used to place all required columns in the index so that the storage engine can return the result without going back to the table to fetch the corresponding row
Change the library table structure to use a separate summary table
Rewrite the complex SQL and break it up with the logic of the code.

Refactoring the query mode

In the optimization problem of query, our goal is to find a way, as long as the result is the same as the original result, then less time cost method, has always emphasized the need to the database level to accomplish as much as possible, such logic in used to think of the reason is they think network overhead will be very high, but these do not apply for mysql, Mysql connections and disconnections are lightweight, so small queries are king

One complex query or multiple simple queries

I recently worked on a project where a former colleague’s SQL was full of logic. Result in system performance special slow, and then we took over, after doing a wave of optimization, put him all into several simple queries, though the connection number is much, but our query efficiency, and the reusability of the code, can expand sex don’t know how many times stronger, so it depends, if can be optimized into several simple query, will try to optimize.

Shard query (divide and conquer)

Take batch updates, for example, I changed a topic chapters, I need to change the subject before a user section, so I will need a batch update, if one-time update so much data, will lock a lot of data, causing big transaction, if I put him into multiple updates, each update 1 k will be much better.

Decompose associated query

Many high-performance applications decompose associative queries. Simply, you can perform a single table query for each table and associate the results with the application layer, as shown in the following example

Why are we doing this? What are the benefits of this

After splitting the query, performing a single query reduces lock contention
Decomposition in the application layer, it is easier to split the database, it is easier to achieve high-performance expansion
The performance of the query itself also increases
Splitting queries reduces the number of redundant queries

Associated subquery optimization

In fact, many associated subqueries are very bad, the worst is where followed by in subquery, we generally recommend using left Join for such subqueries

Optimization of maxima, minima,

In many cases, a field to ask him the maximum, minimum, we will use the function to do it, but it is not the best, such as some of the scenes, is that we can know the query field if index is, the index itself is a sort of, so we just need to sort good him, take limit 1 using function would be much better than you.

Optimization of count

Many bloggers make a mistake when they say they don’t want to count() at the end of the query. The best way to query results is to count(), because they are optimized at the bottom.

In order of efficiency, count(field)

Principle of connection

An unavoidable concept in database is Join. I believe that many friends are confused when they first learn to connect. After understanding the semantics of the connection, they may not understand how the records in each table are connected, so that they often fall into the following two misunderstandings when using:

Myth # 1: Business matters, no matter how complex a query is, it can be done in a join statement.
Myth 2: Stay away, the last time the DBA reported that the slow query was caused by the use of connections, never use again.

Concepts of inner join and outer join

For the two inner join tables, the records in the driver table cannot be found in the driven table matching the record, the record will not be added to the final result set, we mentioned above are the so-called inner join.
For two externally joined tables, records in the driver table need to be added to the result set even if there are no matching records in the driven table.
- In MySQL, external connections can still be divided into two types according to the selected driver table: – left connection – right connection

There is still a problem though, even for external joins, sometimes we do not want to add all the records of the driver table to the final result set. This is difficult, sometimes the match failed to join the result set, and sometimes do not join the result set, how to do this, a little sad ah… This problem is solved by dividing the filter criteria into two types, so the filter criteria have different semantics when placed in different places:

Filter criteria in the WHERE clause
- The filter conditions in the WHERE clause are the same as those we usually see. Records that do not meet the filter conditions in the WHERE clause are not added to the final result set, whether it is an inner join or an outer join.
Filter criteria in the ON clause
- If no record matching the filter condition in the ON clause is found in the driven table, the record is still added to the result set, and the fields of the corresponding driven table record are filled with NULL values.

Principle of connection

Nested-loop Join

As mentioned earlier, for a two-table join, the driven table is accessed only once, but the driven table is accessed many times, depending on the number of entries in the result set after a single table query is performed on the driven table. For the inner join, it doesn’t matter which table is the driver table, and the outer join driver table is fixed, that is to say, the left (outer) connected driver table is the left table, the right (outer) connected driver table is the right table. SQL > alter table T1; alter table T2; alter table T2;

Step 1: Select the driver table, use the filter conditions related to the driver table, and select the single table access method with the lowest cost to perform the single table query on the driver table.
Step 2: For each record in the result set obtained from querying the driver table in the previous step, search for the matching record in the driven table.

If you have three tables to connect, then the result set in step 2 is like a new driver table, then the third table has become a driver table, repeat the above process, also is the result of step 2 of each record in the table need to t3 find any matching records, expressed in pseudo code once this process is like this:

for each row in t1 {   Each entry in the t1 single table query result set is traversed
    
    for each row in t2 {   For a record of a t1 table, traverse every record that satisfies the query result set of a t2 single table
    
        for each row in t3 {   A single table query is performed on table T3 for a combination of t1 and T2 records
            if row satisfies join conditions, send to client
        }
    }
}
Copy the code

Use indexes to speed up connections

For example, index the driven table and make their query type cast or ref to speed up the query

Block nested-loop Join

A join buffer is allocated in advance to store the records in the result set of the driven table, and then the driven table is scanned. The records of each driven table match the records of multiple driven tables in the memory at one time, which can significantly reduce the I/O operations of the driven table.

At the end

We’ll continue the fight in the next chapter.

Daily for praise

Ok, everybody, that’s all for this article, you can see people here, they are real fans.

Creation is not easy, your support and recognition, is the biggest motivation for my creation, we will see in the next article

Six pulse excalibur | article “original” if there are any errors in this blog, please give criticisms, be obliged!

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Mysql from entry to entry (6) query performance optimization

preface

omg

Slow query basics: Optimize data access