Cabbage Java self study room covers core knowledge

1. Let’s start with SQL

Company development colleagues write query order paging statement:

SELECT o1.* FROM Orders o1 INNER JOIN (SELECT id FROM orders WHERE sn='XD12345678' LIMIT 1000010,10) O2 ON o1.id = O2. id;Copy the code

The new Java developer wondered why it was so unnecessary. The old man smiled, don’t jump to conclusions, let me slowly explain to you.

2. Compare SQL again

Let’s start with an example of a table called undo_log:

CREATE TABLE `undo_log`  (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `branch_id` bigint(20) NOT NULL,
  `xid` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `context` varchar(128) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `rollback_info` longblob NOT NULL,
  `log_status` int(11) NOT NULL,
  `log_created` datetime(0) NOT NULL,
  `log_modified` datetime(0) NOT NULL,
  `ext` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE INDEX `ux_undo_log`(`xid`, `branch_id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
Copy the code

An almost identical SQL query with one more attribute will result in a completely different retrieval process:

SELECT id, xid, branch_id FROM undo_log WHERE xid='' AND branch_id='';
Copy the code
SELECT id, xid, branch_id, context FROM undo_log WHERE xid='' AND branch_id='';
Copy the code

Why does this result? Is this really superfluous? Let me explain it to you slowly.

3. MySQL back table concept

For a primer on MySQL, read the author’s article MySQL: The Path to Java Engineers

3.1. What is a table-back query?

Let’s start with InnoDB’s index implementation. InnoDB has two main categories of indexes:

  • Clustered Index

  • Secondary index

3.2. What are the differences between InnoDB clustered indexes and normal indexes?

The leaf node of InnoDB’s normal index stores primary key values.

Note: Only InnoDB normal indexes store primary key values. MyISAM secondary indexes point directly to data blocks.

The leaf node of InnoDB’s clustered index stores row records, so InnoDB must have only one clustered index:

  1. If the table defines a primary key, the primary key is the clustered index;

  2. If the table does not define a primary key, the first unique column that is not null is the clustered index;

  3. Otherwise, InnoDB creates a hidden row-ID as the clustered index;

Note: So primary key queries are very fast, locating row records directly.

For example, simply set up a table with several entries:

CREATE TABLE `user`  (
  `id` int(11) NOT NULL,
  `name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `sex` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  `flag` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `idx_name`(`name`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = Compact;

INSERT INTO `user` VALUES (1, 'shenjian', 'm', 'A');
INSERT INTO `user` VALUES (3, 'zhangsan', 'm', 'A');
INSERT INTO `user` VALUES (5, 'lisi', 'm', 'A');
INSERT INTO `user` VALUES (9, 'wangwu', 'f', 'B');
Copy the code

The two B+ tree indexes are shown as follows:

  • Id primary key, clustered index, leaf node stores row records;
  • The leaf node stores the primary key value, that is, ID.

Since row records cannot be located directly from a normal index, what is the query process for a normal index?

Typically, you need to scan the index tree twice. Such as:

SELECT * FROM t WHERE name='lisi';
Copy the code

How is it implemented?

If the path is pink, it needs to scan the index tree twice:

  • The primary key id=5 is located through the common index.
  • Locate the row record by clustering the index;

This is known as a back-table query, which locates the primary key first and then the row record, and has lower performance than a sweep of the index tree.

4. MySQL overwrite index

Mysql > insert into Mysql;

If an index contains (or overwrites) the values of all the fields to be queried, it is called a “overwriting index.”

We know that InnoDB storage engine, if not primary key index, leaf node stores primary key + column value. Eventually, you have to “back to the table,” that is, look it up again by the primary key, which is slower. Overwrite index is to query the column and index is the corresponding, do not do back table operation!

4.2. How to achieve index coverage?

A common method is to create a joint index for the fields being queried (or the fields being queried are already indexed).

Using the above example user table, let’s analyze the results using EXPLAIN keywords.

First SQL statement:

EXPLAIN SELECT id, name FROM user WHERE name='shenjian';
Copy the code

The index leaf node stores the primary key ID. The id and name can be obtained through the index tree of name without returning to the table, which is in line with index coverage and high efficiency.

Second SQL statement:

EXPLAIN SELECT id, name, sex FROM user WHERE name='shenjian';
Copy the code

The name index can be matched. The index leaf node stores the primary key ID, but the sex field can only be obtained by querying back to the table, which does not meet the index coverage. Therefore, the efficiency of obtaining the sex field through the ID value scanning code aggregation index will be reduced.

If (name) single-column index is upgraded to union index (name, sex) :

ALTER TABLE `user` 
DROP INDEX `idx_name`,
ADD INDEX `idx_name`(`name`, `sex`) USING BTREE;
Copy the code

Execute again, second SQL statement:

EXPLAIN SELECT id, name, sex FROM user WHERE name='shenjian';
Copy the code

Can hit the joint index, index leaf node stores the primary key ID, through the index tree of the joint index can obtain the name and sex, no need to return to the table, in line with index coverage, high efficiency.

5. MySQL back table optimization

Now we can explain why the developer wrote this in Chapter 1.

If the following method is followed (which is generally accepted), it can become very difficult when the number of pages reaches a higher order.

SELECT o1.* FROM Orders WHERE sn='XD12345678' LIMIT 1000010Copy the code

Because the data table is InnoDB, according to the structure of InnoDB index, the query process is:

  1. Find primary key by secondary index (find all ids with sn=’XD12345678′).
  2. Then find the corresponding data block by primary key index (find the corresponding data block content by ID).
  3. Query the data of the primary key index for 10010 times based on the value of offset. Discard the previous 10000 entries and retrieve the last 10 entries.

Because we need to query o1.*, the 10000 pieces of data discarded in the previous part are returned to a large number of tables, resulting in a large amount of I/O consumption and a waste of performance, resulting in a long query time.

SELECT o1.* FROM Orders o1 INNER JOIN (SELECT id FROM orders WHERE sn='XD12345678' LIMIT 1000010,10) O2 ON o1.id = O2. id;Copy the code

This method does not need to return to the table when O2 paging query only query ID, and finally make an internal join according to the primary key to fetch data, although it increases the complexity of SQL statements, but the performance is very good.