preface
Welcome to our GitHub repository Star: github.com/bin39232820… The best time to plant a tree was ten years ago, followed by now. I know many people don’t play QQ anymore, but for a moment of nostalgia, welcome to join the six-vein Shenjian Java rookie learning group, group chat number: 549684836 encourage everyone to write blog on the road of technology
omg
Let’s continue exploring mysql. Mysql > mysql > mysql > mysql > mysql > mysql > mysql > mysql > mysql
- Schema data type optimization and indexing foundation
- Mysql > Select and Update
- (3) InnoDB storage structure
- Mysql > select * from B+ tree
- Mysql > select * from table_name
The above chapter is the basis, about the single table query, if you have not seen, please move the above chapter.
Slow query basics: Optimize data access
The main reason for low query performance is that we need to query too much data, you have a report query is slow, for inefficient query, we may refer to the following two steps.
- Determine if the application is retrieving more data than it needs.
- Determine if the mysql server layer is parsing a large amount of data than is needed.
Whether the database requests data that is not needed.
There is a lot of SQL that requests more data than it really needs, which places an extra burden on the server and increases network overhead. The following example is a bad example
- For example, you should not use * when you only need to return 2 columns.
- The second is that you need to query 10 items of data, but you do not limit the number of queries.
Whether to scan additional records
After determining and confirming the number of rows we need to return, the next thing we need to optimize is to see if too much data is being queried. For mysql, the simplest three metrics to measure the query cost are as follows:
- The response time
- Scan lines
- The number of rows returned
Number of rows scanned and number of rows returned
If we find that we scan too many rows, but we only return a small number of rows, there are several aspects of this optimization.
- An index override scan is used to place all required columns in the index so that the storage engine can return the result without going back to the table to fetch the corresponding row
- Change the library table structure to use a separate summary table
- Rewrite the complex SQL and break it up with the logic of the code.
Refactoring the query mode
In the optimization problem of query, our goal is to find a way, as long as the result is the same as the original result, then less time cost method, has always emphasized the need to the database level to accomplish as much as possible, such logic in used to think of the reason is they think network overhead will be very high, but these do not apply for mysql, Mysql connections and disconnections are lightweight, so small queries are king
One complex query or multiple simple queries
I recently worked on a project where a former colleague’s SQL was full of logic. Result in system performance special slow, and then we took over, after doing a wave of optimization, put him all into several simple queries, though the connection number is much, but our query efficiency, and the reusability of the code, can expand sex don’t know how many times stronger, so it depends, if can be optimized into several simple query, will try to optimize.
Shard query (divide and conquer)
Take batch updates, for example, I changed a topic chapters, I need to change the subject before a user section, so I will need a batch update, if one-time update so much data, will lock a lot of data, causing big transaction, if I put him into multiple updates, each update 1 k will be much better.
Decompose associated query
Many high-performance applications decompose associative queries. Simply, you can perform a single table query for each table and associate the results with the application layer, as shown in the following example
Why are we doing this? What are the benefits of this
- After splitting the query, performing a single query reduces lock contention
- Decomposition in the application layer, it is easier to split the database, it is easier to achieve high-performance expansion
- The performance of the query itself also increases
- Splitting queries reduces the number of redundant queries
Associated subquery optimization
In fact, many associated subqueries are very bad, the worst is where followed by in subquery, we generally recommend using left Join for such subqueries
Optimization of maxima, minima,
In many cases, a field to ask him the maximum, minimum, we will use the function to do it, but it is not the best, such as some of the scenes, is that we can know the query field if index is, the index itself is a sort of, so we just need to sort good him, take limit 1 using function would be much better than you.
Optimization of count
Many bloggers make a mistake when they say they don’t want to count() at the end of the query. The best way to query results is to count(), because they are optimized at the bottom.
In order of efficiency, count(field)
Principle of connection
An unavoidable concept in database is Join. I believe that many friends are confused when they first learn to connect. After understanding the semantics of the connection, they may not understand how the records in each table are connected, so that they often fall into the following two misunderstandings when using:
- Myth # 1: Business matters, no matter how complex a query is, it can be done in a join statement.
- Myth 2: Stay away, the last time the DBA reported that the slow query was caused by the use of connections, never use again.
Concepts of inner join and outer join
- For the two inner join tables, the records in the driver table cannot be found in the driven table matching the record, the record will not be added to the final result set, we mentioned above are the so-called inner join.
- For two externally joined tables, records in the driver table need to be added to the result set even if there are no matching records in the driven table.
- In MySQL, external connections can still be divided into two types according to the selected driver table: – left connection – right connection
There is still a problem though, even for external joins, sometimes we do not want to add all the records of the driver table to the final result set. This is difficult, sometimes the match failed to join the result set, and sometimes do not join the result set, how to do this, a little sad ah… This problem is solved by dividing the filter criteria into two types, so the filter criteria have different semantics when placed in different places:
-
Filter criteria in the WHERE clause
- The filter conditions in the WHERE clause are the same as those we usually see. Records that do not meet the filter conditions in the WHERE clause are not added to the final result set, whether it is an inner join or an outer join.
-
Filter criteria in the ON clause
- If no record matching the filter condition in the ON clause is found in the driven table, the record is still added to the result set, and the fields of the corresponding driven table record are filled with NULL values.
Principle of connection
Nested-loop Join
As mentioned earlier, for a two-table join, the driven table is accessed only once, but the driven table is accessed many times, depending on the number of entries in the result set after a single table query is performed on the driven table. For the inner join, it doesn’t matter which table is the driver table, and the outer join driver table is fixed, that is to say, the left (outer) connected driver table is the left table, the right (outer) connected driver table is the right table. SQL > alter table T1; alter table T2; alter table T2;
- Step 1: Select the driver table, use the filter conditions related to the driver table, and select the single table access method with the lowest cost to perform the single table query on the driver table.
- Step 2: For each record in the result set obtained from querying the driver table in the previous step, search for the matching record in the driven table.
If you have three tables to connect, then the result set in step 2 is like a new driver table, then the third table has become a driver table, repeat the above process, also is the result of step 2 of each record in the table need to t3 find any matching records, expressed in pseudo code once this process is like this:
for each row in t1 { Each entry in the t1 single table query result set is traversed
for each row in t2 { For a record of a t1 table, traverse every record that satisfies the query result set of a t2 single table
for each row in t3 { A single table query is performed on table T3 for a combination of t1 and T2 records
if row satisfies join conditions, send to client
}
}
}
Copy the code
Use indexes to speed up connections
For example, index the driven table and make their query type cast or ref to speed up the query
Block nested-loop Join
A join buffer is allocated in advance to store the records in the result set of the driven table, and then the driven table is scanned. The records of each driven table match the records of multiple driven tables in the memory at one time, which can significantly reduce the I/O operations of the driven table.
At the end
We’ll continue the fight in the next chapter.
Daily for praise
Ok, everybody, that’s all for this article, you can see people here, they are real fans.
Creation is not easy, your support and recognition, is the biggest motivation for my creation, we will see in the next article
Six pulse excalibur | article “original” if there are any errors in this blog, please give criticisms, be obliged!