This is the fifth day of my participation in the August Wen Challenge.More challenges in August

preface

If you can only use JOIN and don’t understand join, you may be muddling along without much understanding of SQL. Today we will understand thoroughly how a JOIN can associate multiple tables.

select * from table1 t1
left join table2 t2 on t1.id = t2.tid;
Copy the code

How would you design an algorithm to get a legitimate result set?

You might take the first value in T1 and match it in T2. We’re going to loop through this until we run out of values in T1. If table T1 has 10,000 records and table T2 has 10,000 records, then the final count is 100 million. Obviously not.

for (data1 in t1) {
  for (data2 in t2) {
    if(Data1 matches data2 by ON) {result}}}Copy the code

Obviously you’re on the right track, but how do you optimize this simple algorithm? Let’s take a look at how mysql designed the join algorithm. You can get a sense of the importance of indexes and why they are not used.

Index nested loop join

When the TID of t2 table is indexed, this algorithm is matched. Select * from t1; select * from t2; select * from t2; select * from t2; This reduces the number of comparisons to t2 tables.

Cache block nested loop connection

When the index cannot be used, the algorithm is matched. Select multiple values from t1 and match them with t2. Reduce the number of table sweeps for T1.

Table sweep: The process of reading data from memory, consuming performance.

show variables like 'optimizer_switc%'; # check whether Block nested-loop Join is enabled, which is enabled by default
show variables like 'join_buffer_size%'; # check join buffer size
Copy the code

conclusion

Index nested-loop Join: optimized for Loop matching word count. Cache Block Nested Loop Join: optimized for I/O counts. Optimization idea:

  • Small table driving large table: reduce outer loop (outer in this case refers to T1 table)
  • Add indexes for matched conditions: less inner loop (inner loop here refers to T2 table)
  • Increase join_BUFFer_SIZE: Cache more data and reduce the number of inner table sweeps.
  • Reduce unnecessary field queries: The fewer fields there are, the more data join_buffer stores.

At this point, I think we don’t just understand cartesian products, we really understand how joins work.