Record scattered knowledge points, notes to strengthen understanding, have written wrong please big guy give directions

A Join uses the nested-loop Join algorithm, and nested-loop Join has three types

select * from t1 join t2 on t1.a = t2.a;
-- A 100 data, B 1000 data
Copy the code

Simple Nested-Loop Join

The whole table t1 will be traversed. T1 acts as the driver table, and every data in T1 will be queried in T2 for the whole table. This process will compare 100*1000 times.

When a full table scan is performed in t2, it is not guaranteed to be in memory, and the Buffer Pool will be eliminated, possibly on disk.

Block nested-loop Join (MYSQL driver link does not use index)

In join_buffer, t1 data is loaded into the join_buffer, and t2 data is traversed to make each t2 data match the T1 data in join_buffer.

T1 Full table scan = 100 times

T2 full table scan = 1000

Query times = 1100

In join_buffer, the number of comparisons is 100 x 1000

The number of comparisons is the same as that of Simple nested-loop Join, but the comparison process is much faster and has better performance than Simple nested-loop Join.

Join_buffer has a size. If the size of data detected by T1 is larger than that of Join_buffer, part of the data in T1 will be loaded first. After comparing t2, the data in JOIN_buffer will be emptied, and then the rest data in T1 will be loaded.

The number of scans in T1 is the same as that in Join_buffer, but the number of scans in T2 is multiplied according to the number of segments.

Assume that the number of data rows in the driven table is N, which needs to be divided into K segments to complete the algorithm process, and the number of data rows in the driven table is M.

K = λ * N

Number of times of scanning the driven table = M * λ * N

λ is related to the size of the join_buffer. If the join_buffer is large enough, the time of large table drivers is the same as that of small table drivers.

When segmentation is required, the fewer times the segmentation is, the fewer times the driven table will be scanned, so a small table driver should be used.

Index nested-loop Join (MYSQL driver link uses Index)

For example, if the a field is indexed.

Table T1 scans all tables. Each data in table T1 is queried in table T2 for its index, and then queried back to the table after the ID is found. (If the join field is the primary key of table T2, the back-table operation is omitted.)

T1 Scans all tables = 100 times

T2 index query = log1000 times

T2 query = log1000 times

Suppose that the number of rows in the driven table is N and the number of rows in the driven table is M.

Total query times = N + N x 2logM

As can be seen from the above, the larger the data in the driver table, the more times the query will be performed. Therefore, the smaller table should be used as the driver table.

MySQL > MySQL > MySQL > MySQL > MySQL