preface

In daily development, most of the usages of LEFT JOIN and JOIN should be the same. If there are two tables A and B, if the data of both tables is wanted, JOIN is used; if only all the data of one table is wanted, and the data of the other table is optional, LEFT JOIN is used. (Of course this is not an accurate description, but it fits my daily business development).

As we have seen in this article, LEFT JOIN selects the driver table by itself, while JOIN selects the driver table by the MYSQL optimizer.

So, when we write a LEFT JOIN statement, will MYSQL optimize this statement to be a JOIN statement?

If so, when will it be optimized?

In fact, this is one of my online problems. Let’s take a look.

Problem description

There is such a slow SQL (processed) on our line that it takes more than 0.5 seconds to execute.

select 
    count(distinct order.order_id) 
from order force index(shop_id) 
left join `order_extend`
on `order`.`order_id` = `order_extend`.`order_id` 
where `order`.`create_time`> ="The 2020-08-01 00:00:00" 
and `order`.`create_time`< ="The 2020-08-01 23:59:59" 
and `order`.`shop_id` = 328449726569069326 
and `order`.`status` = 1 
and `order_extend`.`shop_id` = 328449726569069326 
and `order_extend`.`status` = 1
Copy the code

Explain results are as follows:

+----+-------------+--------------+------------+--------+------------------+----------+---------+------------------------ +------+-------------+
| id | select_type | table        | partitions | type   | possible_keys    | key      | key_len | ref                    | rows | Extra       |
+----+-------------+--------------+------------+--------+------------------+----------+---------+------------------------ +------+-------------+
|  1 | SIMPLE      | order_extend | NULL       | ref    | order_id,shop_id | shop_id  | 8       | const                  | 3892 | Using where |
|  1 | SIMPLE      | order        | NULL       | eq_ref | shop_id          | shop_id  | 16      | example.order.order_id |    1 | Using where |
+----+-------------+--------------+------------+--------+------------------+----------+---------+------------------------ +------+-------------+
2 rows in set.1 warning (0.00 sec)
Copy the code

Problem analysis

Explain, in conjunction with the MYSQL join query algorithm we discussed earlier, drive table order_EXTEND, loop 3892 times, say no more, say no less, say no less, drive table data query type eq_REF, so it should not be too slow. Then the problem is 3,892 times. Find a way to bring that number down.

Wait a minute! Why is the driver table ORDER_EXTEND? The driver table is order_extend and the driver table is order_extend and the driver table is order_extend. Is MYSQL internally optimized?

Since the driver table has changed, this SQL becomes a JOIN statement.

Let’s analyze this statement in the same way we analyzed the JOIN statement. (ps: You need to understand the internal execution process of MYSQL JOIN. If you are not familiar with it, please read this article first → MYSQL JOIN Query Algorithm)

MYSQL selects order_extend as the driver table, which indicates that order_extend queries less data in the WHERE condition. MYSQL selects a smaller table as the driver table.

Let’s run the SELECT count(*) statement separately with the ABOVE WHERE condition to see roughly how many SQL records are involved in each table.

In order not to affect our analysis, we use the explain statement, so that the whole process is estimated, and simulate the process of MYSQL analysis.

mysql> explain select 
    count(distinct order.order_id) 
from order force index(shop_id) 
where `order`.`create_time`> ="The 2020-08-01 00:00:00" 
and `order`.`create_time`< ="The 2020-08-01 23:59:59" 
and `order`.`shop_id` = 328449726569069326 
and `order`.`status` = 1;


+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+--- ----------+
| id | select_type | table | partitions | type | possible_keys                  | key     | key_len | ref   | rows   | Extra       |
+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+--- ----------+
|  1 | SIMPLE      | order | NULL       | ref  | PRIMARY,shop_id,create_time... | shop_id | 8       | const | 320372 | Using where |
+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+--- ----------+
1 row in set.1 warning (0.00 sec)
Copy the code
select 
    count(distinct order_extend.order_id) 
and `order_extend`.`shop_id` = 328449726569069326 
and `order_extend`.`status` = 1

+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+- ------------+
| id | select_type | table        | partitions | type | possible_keys    | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+- ------------+
|  1 | SIMPLE      | order_extend | NULL       | ref  | order_id,shop_id | shop_id | 8       | const | 3892 |    10.00 | Using where |
+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+- ------------+
1 row in set.1 warning (0.00 sec)
Copy the code

As you can see, in the above WHERE condition, order_EXTEND only queries 3892 pieces of data, while order queries 320372 pieces, so order_EXTEND has no problem driving the table.

So why does the order table scan so much data? There may not be so much data on 2020-08-01. Force index(shop_id) = shop_id (shop_id) = shop_id (shop_id) = shop_id (shop_id) = shop_id (shop_id

mysql> explain select 
    count(distinct order.order_id) 
where `order`.`create_time`> ="The 2020-08-01 00:00:00" 
and `order`.`create_time`< ="The 2020-08-01 23:59:59" 
and `order`.`shop_id` = 328449726569069326 
and `order`.`status` = 1;


+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+------ --------------------+
| id | select_type | table | partitions | type | possible_keys | key         | key_len | ref   | rows  | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+------ --------------------+| | 1 SIMPLE | order | NULL | ref | create_time | create_time 8 | | const | < 3892 | | 10.00 Using the where; Using index | +----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+------ --------------------+
1 row in set.1 warning (0.00 sec)
Copy the code

If the shop_id index is not forced, the create_time index will scan fewer rows. If the shop_id index is not forced, the create_time index will scan 100 rows. If the shop_ID index is not forced, the create_time index will scan 100 x 3892 rows. Scan 3892 x 300000 rows of data.

Problem conclusion

The reason for the slow SQL is that we forced the shop_id index, which caused MYSQL to scan more rows. We just need to remove the forced index. MYSQL selects the correct index most of the time, so you must be careful when using the forced index.

Problem extension

Now that we’ve solved the problem of slow SQL, let’s go back to the question at the beginning of this article: Will the LEFT JOIN be optimized as a JOIN?

The answer is yes. So when does this happen?

Let’s review the MYSQL LEFT JOIN detail article again.

We have pasted some of the content to make it easier to read.

mysql> select * from goods left join goods_category on goods.category_id = goods_category.category_id;
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+| | 1 men's shoes | 1 | 1 | | | 2 | 2 | | 1 | 1 men's shoes shoes | | 3 | 3 | 3 | 3 | men's shoes down jacket | | | | 4 T-shirt 1 2 | 2 | | T-shirt 5 | 2 | 2 | 2 | | T-shirt T-shirt | +----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)

mysql> select * from goods left join goods_category on goods.category_id = goods_category.category_id;
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+| | 1 men's shoes | 1 | 1 | | | 2 | 2 | | 1 | 1 men's shoes shoes | | 3 | 3 | | 4 men's shoes NULL | NULL | | | | 4 T-shirt 1 2 | 2 | | T-shirt | 2 | 2 | 2 | T-shirt | | T-shirt +----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)

mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and g.goods_name = 'T-shirt 1');
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+| | 1 men's shoes | 1 | NULL | NULL | | 2 | 2 | 1 men's shoes | NULL | NULL | | 3 | 3 | | 4 men's shoes NULL | NULL | | | | 4 T-shirt 1 2 | 2 | | T-shirt | | 5 T-shirt 2 | 2 | NULL | NULL | +----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)

mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and c.category_name = 'T shirt');
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+| | 1 men's shoes | 1 | NULL | NULL | | 2 | 2 | 1 men's shoes | NULL | NULL | | 3 | 3 | | 4 men's shoes NULL | NULL | | | | 4 T-shirt 1 2 | 2 | | T-shirt | | 5 The T-shirt 2 | 2 | 2 | | T-shirt +----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)

mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id) where c.category_name = 'shoes';
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+| | 1 men's shoes | 1 | 1 | | | 2 | 2 | 1 | 1 men's shoes | | + shoes----------+------------+-------------+-------------+---------------+
2 rows in set (0.00 sec)

mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id) where g.goods_name = 'T-shirt 1';
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+4 | | T-shirt | 2 | 2 | | T-shirt + 1----------+------------+-------------+-------------+---------------+
1 row in set (0.00 sec)

mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and g.goods_name = 'the T-shirt 2') where g.goods_name = 'T-shirt 1';
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+1 | 2 | | | 4 T-shirt NULL | NULL | +----------+------------+-------------+-------------+---------------+
1 row in set (0.00 sec)
Copy the code

We can see that when there is a condition of the driven table in the WHERE condition, the query result is consistent with the JOIN result, and there is no NULL value.

Therefore, we can think that the condition for optimizing LEFT JOIN to JOIN is: When there is a non-empty condition of the driven table in the WHERE condition, LEFT JOIN is equivalent to JOIN.

LEFT JOIN will return all data from the driver table. If there is a WHERE condition of the driven table, NULL value will be filtered out. In this case, MYSQL will optimize the LEFT JOIN to JOIN. This allows you to choose your own driver table.

The instance test

Let’s write another test case to verify our conclusion.

CREATE TABLE `A` (
  `id` int(11) auto_increment,
  `a` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`))ENGINE=InnoDB;

delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=100)do
    insert into A (`a`) values(i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

CREATE TABLE `B` (
  `id` int(11) auto_increment,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `b` (`b`))ENGINE=InnoDB;

delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=100)do
    insert into B (`b`) values(i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();
Copy the code

We create two identical tables, each with 100 pieces of data, and then execute the LEFT JOIN statement.

mysql> explain select * from A left join B on A.id = B.id where A.a <= 100;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref           | rows | filtered | Extra                    |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+| 1 | | SIMPLE | A NULL | index | A | A | | NULL | 100 | | 100.00 Using the where; Using the index | | | 1 SIMPLE | B | NULL | eq_ref | PRIMARY | PRIMARY 4 | | example2. Anderson d | | | NULL | 100.00 + 1----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+
2 rows in set.1 warning (0.00 sec)
Copy the code
mysql> explain select * from A left join B on A.id = B.id where A.a <= 100 and B.b <= 50;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref           | rows | filtered | Extra                    |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+| | 1 SIMPLE | | NULL | B range | PRIMARY, B | | | NULL 5 B | | | 100.00 50 Using the where; Using the index | | 1 | | SIMPLE | A NULL | eq_ref | PRIMARY, A | PRIMARY 4 | | example2. B.i d | 1 | | 100.00 Using the where | +----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+
2 rows in set.1 warning (0.00 sec)
Copy the code
mysql> explain select * from A left join B on A.id = B.id where A.a <= 100 and B.b <= 100;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref           | rows | filtered | Extra                    |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+| | 1 SIMPLE | A | NULL | index | PRIMARY, A 5 | | A | NULL | 100 | | 100.00 Using the where; Using the index | | | 1 SIMPLE | B | NULL | eq_ref | PRIMARY, B | PRIMARY 4 | | example2. Anderson d | 1 | | 100.00 Using the where | +----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+- -------------------------+
2 rows in set.1 warning (0.00 sec)
Copy the code

From the above point of view, after adding the WHERE condition to table B, if the number of rows scanned by table B is less, it is possible to drive the table, which also shows that the LEFT JOIN statement is optimized to JOIN statement.

conclusion

Above we analyzed a slow SQL problem, the process of analysis involves a lot of knowledge, I hope you can seriously study.

At the same time, we come to a conclusion: when there is a non-empty WHERE condition of the driven table, MYSQL will optimize the LEFT JOIN statement to JOIN statement.