This is the fifth day of my participation in the August Wen Challenge.More challenges in August
preface
If you can only use JOIN and don’t understand join, you may be muddling along without much understanding of SQL. Today we will understand thoroughly how a JOIN can associate multiple tables.
select * from table1 t1
left join table2 t2 on t1.id = t2.tid;
Copy the code
How would you design an algorithm to get a legitimate result set?
You might take the first value in T1 and match it in T2. We’re going to loop through this until we run out of values in T1. If table T1 has 10,000 records and table T2 has 10,000 records, then the final count is 100 million. Obviously not.
for (data1 in t1) {
for (data2 in t2) {
if(Data1 matches data2 by ON) {result}}}Copy the code
Obviously you’re on the right track, but how do you optimize this simple algorithm? Let’s take a look at how mysql designed the join algorithm. You can get a sense of the importance of indexes and why they are not used.
Index nested loop join
When the TID of t2 table is indexed, this algorithm is matched. Select * from t1; select * from t2; select * from t2; select * from t2; This reduces the number of comparisons to t2 tables.
Cache block nested loop connection
When the index cannot be used, the algorithm is matched. Select multiple values from t1 and match them with t2. Reduce the number of table sweeps for T1.
Table sweep: The process of reading data from memory, consuming performance.
show variables like 'optimizer_switc%'; # check whether Block nested-loop Join is enabled, which is enabled by default
show variables like 'join_buffer_size%'; # check join buffer size
Copy the code
conclusion
Index nested-loop Join: optimized for Loop matching word count. Cache Block Nested Loop Join: optimized for I/O counts. Optimization idea:
- Small table driving large table: reduce outer loop (outer in this case refers to T1 table)
- Add indexes for matched conditions: less inner loop (inner loop here refers to T2 table)
- Increase join_BUFFer_SIZE: Cache more data and reduce the number of inner table sweeps.
- Reduce unnecessary field queries: The fewer fields there are, the more data join_buffer stores.
At this point, I think we don’t just understand cartesian products, we really understand how joins work.