MySql series: the principle and algorithm of the JOIN little summary | August more challenges

This is the seventh day of my participation in the August More text Challenge. For details, see:August is more challenging

📖 preface

It's important to be able to make complex things simple

In the process of learning, we have seen a lot of materials, videos, documents, etc. Because there are so many materials and videos now, a knowledge point often has a variety of videos to explain it. In addition to the promotion and marketing, there are indeed many excellent video explanations. For example, teacher Li Yongle’s short video lessons can explain such complex knowledge so easily and thoroughly on a blackboard. And those of us who are learning to code, we need to learn not only how to speak clearly, but how to write clearly.

All of the following is related to previous posts and you can check out the blog post.Chen Yongjia’s blog

💕 What is JOIN?

The meaning of JOIN is just like the English word “JOIN”, which joins two tables, roughly divided into inner JOIN, outer JOIN, right JOIN, left JOIN and natural JOIN. This describes throwing out an overused graph and inserting test data.

🚀 Cartesian product:`CROSS JOIN`

To understand all kinds of joins you have to understand cartesian products. The Cartesian product is forcing together every record in table A with every record in table B. So, if table A has n records and table B has M records, the cartesian product will produce n by m records.

👏 create table (create table first)

CREATE DATABASE db0206; USE db0206; CREATE TABLE `db0206`.`tbl_dept`( `id` INT(11) NOT NULL AUTO_INCREMENT, `deptName` VARCHAR(30), `locAdd` VARCHAR(40), PRIMARY KEY (`id`) ) ENGINE=INNODB CHARSET=utf8mb4; CREATE TABLE `db0206`.`tbl_emp`( `id` INT(11) NOT NULL AUTO_INCREMENT, `name` VARCHAR(20), `deptId` INT(11), PRIMARY KEY (`id`), FOREIGN KEY (`deptId`) REFERENCES `db0206`.`tb_dept`(`id`) ) ENGINE=INNODB CHARSET=utf8mb4; INSERT INTO tbl_dept(deptName,locAdd) VALUES('CC',21); INSERT INTO tbl_dept(deptName,locAdd) VALUES('MM',21); INSERT INTO tbl_dept(deptName,locAdd) VALUES('SC',22); INSERT INTO tbl_dept(deptName,locAdd) VALUES('SC',23); INSERT INTO tbl_dept(deptName,locAdd) VALUES('EH',65); INSERT INTO tbl_emp(NAME,deptId) VALUES('c1',1); INSERT INTO tbl_emp(NAME,deptId) VALUES('c2',1); INSERT INTO tbl_emp(NAME,deptId) VALUES('c3',1); INSERT INTO tbl_emp(NAME,deptId) VALUES('m5',2); INSERT INTO tbl_emp(NAME,deptId) VALUES('m6',2); INSERT INTO tbl_emp(NAME,deptId) VALUES('s7',3); INSERT INTO tbl_emp(NAME,deptId) VALUES('s8',4);Copy the code

The following queries are different from the data in this table, please ignore them

The connection:

The SQL statement executed and the result of the query executed: the common part of two joined tables

Left outer connection:

The SQL statement executed and the result of the query executed: the unique part of the left table and the empty part of the right table are left empty

Right outer connection:

SQL statement executed and query results executed:

Left connection:

SQL statement executed and query results executed:

Right connection:

SQL statement executed and query results executed:

All connections:

The SQL statement executed and the result of the query executed

Data sets not shown in either table:

The SQL statement executed and the result of the query executed

Table join algorithm

Ested Loop Join (NLJ) algorithm:

Firstly, a basic algorithm :NLJ, nested loop algorithm is introduced. The outer layer of the loop is the driving table, and the inner layer of the loop is the driven table. The driver table drives the driven table for join operations. First, the driver table finds the first record, and then scans the driven table from scratch, one by one to find the records matching the first record of the driver table and then join them to form a record in the result table. After the driver table is searched, the second record is retrieved from the driver table, and then the driven table is scanned from the beginning, and the records matching the second record of the driver table are searched one by one, and a record in the result table is connected. Repeat until all the records in the driver table have been processed. This is the basic idea of the nested loop join algorithm. The pseudocode is as follows.

Foreach row1 from T1 foreach Row2 from T2 if row2 match row1 Join ROW1 and Row2 into result // Join Row1 and Row2 into the result setCopy the code

First load T1, then fetch the first record from T1, then load T2 table, match the records in T2 table one by one, and join the matched records.

Block Nested Loop Join(BNLJ) algorithm:

Another advanced algorithm: BNLJ, block nested loop algorithm, can be regarded as the optimization of NLJ. The general idea is to create a cache, take multiple records from the driver table at a time, and then scan the driven table. Each record of the driven table tries to match multiple records in the buffer, and if it matches, join and add it to the result set. The larger the buffer, the more records the driver table can pull out at one time. The optimization idea of this algorithm is to reduce the number of inner loops to improve the efficiency of table join. (Bloggers rarely use it!)

Factors that affect performance

1. Number of inner loops: Now consider a scenario where T1 has 100 records and T2 has 10,000 records. So, t1 drives T2 or T2 drives T1, which is more efficient? If it’s just the number of times the analysis instructions are executed, they are all 100*10000, but consider the number of times the table is loaded. Firstly, t1 driver T2 is analyzed. Table T1 is loaded once and table T2 needs to be loaded 100 times. Then analyze T2 to drive T1, t2 table is loaded 1 time first, but T1 table is loaded 10000 times. So t1 drives T2 more efficiently than T2 drives T1. Small tables driving large tables can reduce the number of inner loops and improve join efficiency.

In addition, if the Block Nested Loop Join algorithm is used, the number of inner loops can be reduced by increasing the size of the cache. Therefore, setting a reasonable buffer size can improve the connection efficiency

2. Quick matching: Scanning the driven table for appropriate records can be considered as a query operation, how to improve the efficiency of the query? Build an index! It can also be concluded that the indexing of the driven table can improve the join efficiency

3. Sort: Suppose that table T1 drives table T2 to join, and the join condition is t1.id= T2.id, and the query results are required to sort by ID. […ORDER BY T1.id], […ORDER BY T2.id]. If we use Method 1, we can sort t1 first and then perform the table join algorithm. If we use Method 2, we can only sort the result set after performing the table join algorithm, which is inefficient. Finally, it can be concluded that prioritizing the attributes of the driver table can improve the join efficiency.

✨ reference

“Introduction to database system” : my.oschina.net/xinxingegey…

😎 How to Use

In version 5.6 and later, the optimizer manages parametersoptimizer_switchIn theblock_nested_loopThe parameter controls whether BNL is used by the optimizer. By default, the join mode is enabled. If this parameter is set to OFF, the optimizer selects the NLJ algorithm when selecting the join mode.

🎉 summary:

The optimization of MySql needs to be used in the actual development process to summarize.
I hope you can stand on the shoulders of predecessors and walk higher and farther!
If you like me, remember to follow and like me!

MySql series: the principle and algorithm of the JOIN little summary | August more challenges

📖 preface

💕 What is JOIN?

🚀 Cartesian product:CROSS JOIN

👏 create table (create table first)

The connection:

Left outer connection:

Right outer connection:

Left connection:

Right connection:

All connections:

Data sets not shown in either table:

Table join algorithm

Ested Loop Join (NLJ) algorithm:

Block Nested Loop Join(BNLJ) algorithm:

Factors that affect performance

✨ reference

😎 How to Use

🎉 summary:

Related Posts

Python crawler (3) – Practice simple crawling blogs

RabbitMQ(1)

Experience of distributed task scheduling XXL-Job

🚀 Cartesian product:`CROSS JOIN`