Record a SUMMARY of SQL optimization

One requirement is the number of monthly work reports for an organization. There are two tables structured as follows:

The diary log table

CREATE TABLE `diary` ( `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'ID', 'content' text CHARACTER SET UTf8MB4 COLLATE UTf8MB4_bin NOT NULL COMMENT ' 'space_id' bigint(10) NOT NULL COMMENT 'space_id' bigint(20) NOT NULL DEFAULT '0' COMMENT 'creator ID, ', 'member_id' bigint(20) DEFAULT '0', 'user_name' varchar(50) NOT NULL DEFAULT 'COMMENT' user name ', 'visibLE_range' int(1) NOT NULL COMMENT ' ', 'create_time' TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'create time ', 'record_status' int(1) NOT NULL DEFAULT '0' COMMENT' 0 normal ', 'data_status' int(1) NOT NULL DEFAULT '1' COMMENT' 0 draft ', 'source' int(1) NOT NULL COMMENT ' 'data_type' int(1) unsigned NOT NULL DEFAULT '1' COMMENT '1: daily, 2: weekly, 3: monthly, 4: Other ', 'version' bigint(20) DEFAULT NULL COMMENT 'version ',' Gather 'int(1) NOT NULL DEFAULT '0' COMMENT' Whether the summary log 0: no 1: If ', 'type_id' int(20) DEFAULT NULL COMMENT '表ID',' type_name 'varchar(50) DEFAULT NULL COMMENT' 表ID', 'old_diary_id' bigint(20) DEFAULT '0' COMMENT 'old_diary_id' bigint(20) DEFAULT '0', 'read_count' bigint(20) DEFAULT '0', `update_time` timestamp NULL DEFAULT NULL, `diary_time` timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), KEY `space_member_index` (`space_id`,`member_id`), KEY `idx_space_id_create_time` (`space_id`,`create_time`), KEY `idx_space_id_diary_time` (`space_id`,`diary_time`), KEY `idx_template_id_space_id` (`template_id`,`space_id`), KEY `idx_space_id_visible_range_create_time` (`space_id`,`visible_range`,`create_time`) ) ENGINE=InnoDB AUTO_INCREMENT=414221 DEFAULT CHARSET=utf8 COMMENT=' increment 'Copy the code

Diary_visiable_range department table

CREATE TABLE `diary_visible_range` ( `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'ID', 'diary_id' bigint(20) NOT NULL COMMENT 'log ID',' user_id 'bigint(20) DEFAULT '0' COMMENT 'memberId = userId', 'user_name' varchar(50) DEFAULT 'COMMENT' 评 论 ', 'member_id' bigint(10) DEFAULT '0' COMMENT 'open platform memberId for native users ', 'team_id' bigint(10) DEFAULT '0' COMMENT 'space ID,' team_id 'bigint(10) DEFAULT '0' COMMENT' department/team/group ID', 'team_name' varchar(200) DEFAULT NULL COMMENT 'id ', 'create_time' TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'create time ',' data_type 'int(1) NOT NULL COMMENT '1: department, 2: Group, 3 team, 4: individual ', 'group_id' bigint(20) DEFAULT '0', 'visible_type' int(1) DEFAULT NULL COMMENT The last selection did not include ', PRIMARY KEY (' id '), KEY 'team_index' (' team_id '), KEY `diary_user_data_type_index` (`diary_id`,`data_type`) USING BTREE, KEY `idx_team_id_create_time` (`team_id`,`create_time`), KEY `index_space_id_member_id` (`space_id`,`member_id`) ) ENGINE=InnoDB AUTO_INCREMENT=59897 DEFAULT CHARSET=utf8 COMMENT=' viewtable 'Copy the code

The relationship between the two tables is as follows: diary records the work log, diary_visiable_range records the visible scope of the department and the work log. That is, which department or team can see a work log.

Let’s look at the SQL query for the original requirement:

SELECT team_id as teamId, DATE_FORMAT(d.diary_time, '%c') AS MONTH, count(*) AS num
        FROM
        diary d
        JOIN
        diary_visible_range v
        ON d.id = v.diary_id AND v.data_type = 1     
            AND v.team_id in (143289)       WHERE
        d.space_id = 104242
        and d.diary_time BETWEEN '2020-01-01' and  '2021-01-01' AND d.data_status = 1
        AND d.record_status = 0
        GROUP BY
        DATE_FORMAT(d.diary_time, '%Y-%m'),v.team_id
Copy the code

Diary_visiable_range = team_id = team_id = team_id = team_id = team_id = team_id = team_id = team_id = team_id;

Then let’s see what the execution time of this SQL is.

It takes 47 seconds, long enough for the HTTP request to be disconnected. Then we execute explain to see the SQL execution status:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	d	ref	PRIMARY,idx_space_id_create_time,space_member_index,idx_space_id_diary_time,idx_space_id_visible_range_create_time	idx_space_id_create_time	8	const	339992	Using where; Using temporary; Using filesort
1	SIMPLE	v	ref	team_index,diary_user_data_type_index,diary_user_index,idx_team_id_create_time	diary_user_data_type_index	12	db_logger.d.id,const	1	Using where

select_type

table

type

possible_keys

key

key_len

ref

rows

Extra

SIMPLE

ref

PRIMARY,idx_space_id_create_time,space_member_index,idx_space_id_diary_time,idx_space_id_visible_range_create_time

idx_space_id_create_time

const

339992

Using where;

Using temporary;

Using filesort

SIMPLE

ref

team_index,diary_user_data_type_index,diary_user_index,idx_team_id_create_time

diary_user_data_type_index

db_logger.d.id,const

Using where

Let’s review the meanings of each field again:

1, ID: serial number in SQL query.

The larger the id column number, the better the execution. If the id column number is the same, the execution is performed from the top down.

2, select_type: query type, can be any of the following types:

select_type	Type specification
SIMPLE	Simple SELECT(no UNION or subquery)
PRIMARY	Outermost SELECT
UNION	The second or subsequent SELECT statement in UNION
DEPENDENT UNION	The second or subsequent SELECT statement in the UNION depends on the external query
UNION RESULT	The result of the UNION
SUBQUERY	The first SELECT in the subquery
DEPENDENT SUBQUERY	The first SELECT in the subquery depends on the external query
DERIVED	Derived table (subquery in FROM clause)
MATERIALIZED	Materialized subqueries
UNCACHEABLE SUBQUERY	Subqueries for which the result set cannot be cached must be reevaluated for each row of the external query
UNCACHEABLE UNION	The second or subsequent SELECT in the UNION is a subquery that cannot be cached

3. Table: indicates the name of the queried table. This may not be an actual table name. Can be the following values:

<unionM,N>: result of referencing ids M and N UNION.

: references the table derived from the result with id N. A derived table can be a result set, such as the result derived FROM a FROM neutron query.

: references the table materialized from the subquery result with id N. That is, a temporary table is generated to hold the results of the subquery.

4, Type: This is one of the most important fields to show what type the query is using. The connection types, from best to worst, are:

System, const, eq_ref, ref, fulltext, ref_or_NULL, index_merge, unique_subquery, index_subquery, range, index, ALL

All types except all can use indexes. All types except index_merge can use only one index.

The system:

A table with only one row of data or an empty table is a special case of const type. And only for MyISAM and Memory tables. In an Innodb engine table, the type column is usually all or index in this case.

Const:

At most one row of records matches. The join type is const when all fields of the union primary key or unique index are compared to constant values. Other databases are also called unique index scans

Eq_ref:

When a multi-table JOIN occurs, only one row can be found in the current table for each row from the previous table. This is probably the best type besides system and const. This type is used when all fields of a primary key or unique non-null index are used as join joins. Eq_ref can be used for indexed columns that use the ‘=’ operator for comparison. The value can be a constant or an expression that uses a column of a table that was read before this table.

Ref:

For each row from the previous table, multiple rows can be matched in the index of this table. Use the ref type when the join applies only to the leftmost prefix of the index or when the index is not a primary key or unique index (that is, the join can match multiple row records).

Ref can be used for indexed columns that are compared using the ‘=’ or ‘<=>’ operators

Fulltext:

This type is used when using full-text indexes. Note that full-text indexes are of high priority. If both full-text indexes and normal indexes exist, mysql will use full-text indexes regardless of the cost

Ref_or_null:

Similar to the ref type, but with the addition of null comparisons. Not much actually.

Index_merge:

Ref_or_null indicates that the query uses more than two indexes, and finally selects the intersection or union. The common and, or conditions use different indexes. The official sort is after ref_OR_NULL, but in practice, the performance may be worse than range most of the time because multiple indexes are read.

Unique_subquery:

Used for in subquery in where, subquery returns a unique value that does not repeat the value, can completely replace the subquery, more efficient. This type replaces the IN subquery ref: ‘value IN (SELECT primary_key FROM single_table WHERE some_expr)’

Index_subquery:

This join type is similar to unique_subquery. Applies to non-unique indexes and can return duplicate values.

The range:

Index range queries, common use =, < >, >, > =, <, < =, IS NULL, the < = >, BETWEEN, IN () or the like operator IN the query.

SELECT * FROM tbl_name
  WHERE key_column BETWEEN 10 and 20;

SELECT * FROM tbl_name
  WHERE key_column IN (10,20,30);
Copy the code

Index:

Index full table scan, sweep the index from top to bottom. There are two cases: one is if the query uses an overwrite index, then it only needs to scan the index to get the data, which is faster than a full table scan because the index is usually smaller than the data table and avoids a second query. Using index is displayed in extra. Otherwise, if a full table scan is performed on an index, there is no Using index prompt.

Mysql > alter table name; Mysql > explain select name from testa; mysql> explain select name from testa; +----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+ | id | select_type  | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+ | 1 | SIMPLE | testa | index | NULL | idx_name | 33 | NULL | 2 | Using index | +----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+ 1 row in set # Mysql > explain select cusno from testa; mysql> explain select cusno from testa; mysql> explain select cusno from testa; mysql> explain select cusno from testa; +----+-------------+-------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------+ | 1 | SIMPLE | testa | ALL | NULL | NULL | NULL | NULL | 2 | NULL | + - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + -- -- -- -- -- -- -- + 1 row in set # contains no index column mysql> explain select * from testa; +----+-------------+-------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------+ | 1 | SIMPLE | testa | ALL | NULL | NULL | NULL | NULL | 2 | NULL | +----+-------------+-------+------+---------------+------+---------+------+------+-------+ 1 row in setCopy the code

All: indicates full table scan, with the worst performance.

5, possible_keys * * * *

The indexes that might be used by the query are listed here

6, the key

Query the index that is actually used. If select_Type is index_merge, more than two indexes may appear, and only one of other select_types may appear.

7, key_len

The length of the index used by the query (in bytes). If it is a single-column index, then the entire length of the index is counted. If it is a multi-column index, then the query may not use all columns. Take a look at the value of this column and calculate the total length of your multi-column index to see if you are using all columns.

Key_len only calculates the length of the index used by the WHERE condition, and sorting and grouping do not count to key_len if indexes are used.

8, ref * * * *

If it is a join query, the execution plan of the driven table will be displayed. If the condition uses an expression or function, or if the condition column has been implicitly converted, it may be displayed as func

Rows 9, * * * *

Rows is also an important field. This is mysql’s estimate of the number of rows that need to be scanned (not an exact value). This value is a very intuitive indication of how efficient SQL is, and in principle the fewer rows the better.

Extra 10, * * * *

A lot of additional information in EXplain will be displayed in the Extra field. Common ones are the following:

Distinct: The distinc keyword is used in the SELECT section

**Using filesort** : MySQL needs to sort Extra files Using filesort in Extra. Using filesort is recommended to remove this query because it consumes large CPU resources.

Using index: “overwrite index scan “, indicating that the query can find the required data in the index tree without scanning the table data files, usually indicating good performance.

Using temporary: A query uses temporary tables, which are used for sorting, grouping, or join of multiple tables. Optimization is recommended because the query efficiency is not high.

From the above explanation, we can see that this SQL associative query is performed in two simple SELECT. The first SQL query is on the D diary table and hits the union index idx_space_ID_create_time (‘ space_id ‘, ‘create_time’), but the number of rows scanned through the index is about 34W. **Using filesort, Using temporary **Using temporary **Using filesort, Using temporary This could be an optimization point, and we need to figure out how to optimize these two out. **

The second simple select query, on v diary_visibLE_range, hits the combined index diary_user_data_type_index(‘ diary_id ‘, ‘data_type’). The number of rows scanned by the index is only 1. So there’s basically no room for optimization.

To sum up, it can be seen that the points we need to optimize are mainly in diary table. Specifically, there are too many rows scanned by index, temporary memory is needed when two tables are joined due to too many rows, and additional files need to be applied for sorting due to insufficient space.

Through this analysis, we try to reduce the number of lines scanned. Here is a group by query that scans for a year. Let’s try it for a month. What’s the effect?

The time is only 200ms, and the efficiency is greatly improved. Let’s explain the execution plan of this query:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	d	range	PRIMARY,idx_space_id_create_time,space_member_index,idx_space_id_diary_time,idx_space_id_visible_range_create_time	idx_space_id_diary_time	13	NULL	59374	Using where; Using temporary; Using filesort
1	SIMPLE	v	ref	team_index,diary_user_data_type_index,diary_user_index,idx_team_id_create_time	diary_user_data_type_index	12	db_logger.d.id,const	1	Using where

You can see that the number of rows scanned has dropped to 6W, but temporary tables and temporary file memory are still being used. GROUP BY DATE_FORMAT(d.diary_time, ‘%Y-%m’) GROUP BY DATE_FORMAT(d.diary_time, ‘%Y-%m’); Let’s remove this condition and look at the execution plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	d	range	PRIMARY,idx_space_id_create_time,space_member_index,idx_space_id_diary_time,idx_space_id_visible_range_create_time	idx_space_id_diary_time	13	NULL	59374	Using where
1	SIMPLE	v	ref	team_index,diary_user_data_type_index,diary_user_index,idx_team_id_create_time	diary_user_data_type_index	12	db_logger.d.id,const	1	Using where

You can see that the extra field is only used by using WHERE, indicating that temporary memory and temporary tables are no longer used for join and sort.

Group by DATE_FORMAT(d.diary_time, ‘%Y-%m’);

The time is less than 200ms, and the efficiency is improved a little. In this way, we can greatly improve the query efficiency by checking the time range of one year, and then change the query aggregated by month to directly query by month. Because it is a SELECT query, we can divide 12 threads into concurrent query, so that the original 47s query we can basically control within 0.4s.

To sum up: because there are too many rows scanned, we sort and aggregate so many rows. If mysql memory is insufficient, we need to apply for additional files to sort, which will take a long time. Therefore, when making queries, try to reduce the number of mysql scan rows and minimize sorting in mysql memory.

The above.

Record a SUMMARY of SQL optimization

Related Posts

Go learn – (RabbitMQ)

Mysql based

RPC Microservices Architecture: AN Overview of RPC Individuals (Absolute Dry Goods)