When it comes to MySQL optimization, many people immediately think of indexes. Yes, using indexes correctly can achieve the purpose of optimizing SQL.

What are the details of how to optimize by index? Let’s talk about it.

If you are interested, you can also read the first two:

【 Tuning directions 】 Get started with MySQL tuning performance monitoring

MySQL Tuning prerequisites – Execution plan Explain and index data structure deduction

If you can wave, give me a “like”, I will be more motivated, thanks ~

The hash index

In MySQL, only Memory’s storage engine explicitly supports hash indexes.

Hash indexes are hash table-based implementations, and only queries that exactly match all columns of the index are valid.

Hash indexes themselves only need to store the corresponding hash value, so the index structure is very compact, which makes hash index lookup very fast.

Limits on hash indexes

  • Hash indexes only containHash valueandRow pointerInstead of storingThe field values, so you cannot use values in the index to avoid reading rows
  • Hash index data is not stored in order of index value, so it cannot be sorted
  • A hash index does not support partial column matching lookups. A hash index uses the entire contents of an indexed column to calculate a hash value
  • They are intended for use only=or< = >Operator equality comparison (but very fast).

They are not used to compare operators (such as <) to find ranges of values.

  • Accessing the data of a hash index is very fast unless there are a lot of hash collisions. When there are hash collisions, the storage engine must traverse all the row Pointers in the linked list, comparing them row by row until it finds all the rows that match the criteria

A Hash, commonly known as a Hash, means to take a bunch of arbitrarily long strings, numbers, or binary inputs and run them through an algorithm (a lot of hashing) to produce a fixed length number (string). For algorithmic reasons, different inputs get different hashes.

A Hash Table, commonly called a Hash Table, maps key values to a certain location in the Table by calculating the Hash value. Then the same key value, the next access or modification will be the same mapping position, different key value because of the calculation of the Hash value will be different mapping position.

Since the Hash value is generated by a certain algorithm, there is a certain possibility that different inputs will get the same Hash value. Even if we can adjust the algorithm to minimize this situation, it cannot be completely avoided. When this happens, we have two different keys mapped to the same location, which is a hash collision.

  • High hashing conflicts can be costly to maintain

Hash index use cases

When you need to store a large number of urls and search by URL, if you use B+ trees, the stored content will be very large (for example, if the URL is too long) :

select id from url where url=' '
Copy the code

You can also hash urls using CRC32. You can use the following query:

select id fom url where url=' ' and url_crc=CRC32(' ')
Copy the code

The high performance of this query is due to the use of small indexes to perform the lookup

Composite index

When multiple columns are used as an index, it is important to note that the correct order depends on the query for the index and how best to meet the sorting and grouping requirements.

SQL > create index (a, B, c); SQL > create index (a, B, c);

Query conditions Whether the composite index is functional
where a=3 Yes, only a is used
where a=3 and b=5 Yes, we use a, B
where a=3 and b=5 and c=6 Yes, we use a, B, and C
where a=3 and b=5 Yes, we use a, B
==where b=3 or c=5== = = = =
where a=3 and c=6 Yes, only a is used
where a=3 and b<10 and c=7 Yes, we use a, B
where a=3 and b like ‘%xxoo%’ and c=7 Yeah, I just used the A

Composite index use cases

Build tables:

create table staffs(
    id int primary key auto_increment,
    name varchar(24) not null default ' ' comment 'name',
    age int not null default 0 comment 'age',
    pos varchar(20) not null default ' ' comment 'job',
    add_time timestamp not null default current_timestamp comment 'Entry Time'
) charset utf8 comment 'Staff Record';
Copy the code

Create a composite index:

alter table staffs add index idx_nap(name, age, pos);
Copy the code
  • Use name,age and pos fields to make query conditions, and the order is in combination index order:
mysql> explain select * from staffs where name='zhangsan' and age=18 and pos='programmer'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: staffs
   partitions: NULL
         type: ref
possible_keys: idx_nap
          key: idx_nap
      key_len: 140
          ref: const,const,const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set.1 warning (0.01 sec)
Copy the code

This query uses the index idx_NAP whose type is ref and whose ref is const,const,const. This is the ideal condition.

  • Name,age and pos fields are used for query conditions, and the order is in combination index order, but age uses range query:
mysql> explain select * from staffs where name='zhangsan' and age>18 and pos='programmer'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: staffs
   partitions: NULL
         type: range
possible_keys: idx_nap
          key: idx_nap
      key_len: 78
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index condition
1 row in set.1 warning (0.01 sec)
Copy the code

Index IDx_NAP is also used in this query, but the type is range. We know that the query efficiency of range is lower than that of REF. The reason is that the query only uses the name and age indexes. The POS field of the combined indexes is not used.

  • The query criteria are applied only to the last two fields of the composite index
mysql> explain select * from staffs where age=18 and pos='programmer'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: staffs
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set.1 warning (0.00 sec)
Copy the code

This type is ALL, in which case the index is invalidated, no index is used, and the efficiency is minimal.

Clustered index and non-clustered index

Clustering index

Not a single index type, but a type of data storage in which rows of data are stored close together with adjacent key values.

InnoDB is stored in clustered indexes, which are stored in the same file, an.ibd file.

Advantages of clustered indexes

  • You can keep related data together
  • Data access is faster because indexes and data are kept in the same tree
  • Queries that use an overridden index scan can directly use primary key values in leaf nodes

Disadvantages of clustered indexes

  • Clustered data maximizes performance in IO intensive applications. If the data is all in memory, what is the advantage of clustered indexes
  • The insertion speed depends heavily on the insertion order, with the primary key order being the fastest
  • Updating clustered index columns is expensive because it forces each updated row to be moved to a new location
  • Tables based on clustered indexes can suffer from page splitting when new rows are inserted or when the primary key is updated so that rows need to be moved
  • Clustered indexes can cause slow full table scans, especially if rows are sparse or the data store is discontinuous due to page splitting

Non-clustered index

Data files are stored separately from index files, as is the case with MyIsam storage engine.

Cover index

If an index contains the values of all the fields to be queried, it is called an overwrite index, such as this:

select name,age,pos from staffs where name='zhangsan' and age=18 and pos='programmer';
Copy the code

Not all types of indexes can be called overwritten indexes. Overwritten indexes must store the values of indexed columns.

Different stores implement overwriting indexes in different ways. Not all engines support overwriting indexes. Memory does not support overwriting indexes.

What are the benefits of overwriting indexes

  • Index entries are usually much smaller than the row size, and if you only need to read the index, MySQL will greatly reduce the number of visits to the data
  • Because indexes are stored in order of column values, IO intensive range queries are much less costly than randomly reading each row of data from disk
  • Some storage engines like MYISAM only cache indexes in memory, and the data is dependent on the operating system for caching, so accessing the data requires a system call, which can cause serious performance problems
  • Overwrite indexes are especially useful for INNODB tables because of INNODB’s clustered indexes

case

MySQL official website is very thoughtful to give some examples, and very thoughtful to build some tables for us to use, we can download it directly use!!

For example, MySQL provides sakila database:

We can get it from here:

After downloading it, we can import it locally for testing:

mysql> source /root/soft/sakila-schema.sql
mysql> source /root/soft/sakila-data.sql
Copy the code

All right, you can have fun with these tables.

  1. When an indexed overridden query is initiated, theexplaintheextraYou can see the columnsusing indexAt this point, the overwrite index is used
mysql> explain select store_id,film_id from inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
   partitions: NULL
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 4581
     filtered: 100.00
        Extra: Using index
1 row in set.1 warning (0.00 sec)
Copy the code
  1. In most storage engines, an overwrite index can only cover queries that access only the middle column of the index. However, it can be further optimized to use InnoDB secondary indexes to overwrite queries

For example, actors:

mysql> desc actor;
+-------------+----------------------+------+-----+-------------------+-----------------------------+
| Field       | Type                 | Null | Key | Default           | Extra                       |
+-------------+----------------------+------+-----+-------------------+-----------------------------+
| actor_id    | smallint(5) unsigned | NO   | PRI | NULL              | auto_increment              |
| first_name  | varchar(45)          | NO   |     | NULL              |                             |
| last_name   | varchar(45)          | NO   | MUL | NULL              |                             |
| last_update | timestamp            | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------+----------------------+------+-----+-------------------+-----------------------------+
4 rows in set (0.00 sec)
Copy the code

Actor uses the innoDB storage engine and has a secondary index on the last_name field. Although the index column does not contain the primary key actor_ID, it can also be used to override the actor_ID

mysql> explain select actor_id,last_name from actor where last_name='HOPPER'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
   partitions: NULL
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 182
          ref: const
         rows: 2
     filtered: 100.00
        Extra: Using index
1 row in set.1 warning (0.00 sec)
Copy the code

Some of the details are optimized by indexing

1. Avoid expressions when using indexed columns for queries, and put calculations in the business layer rather than the database layer

Look at an example:

mysql> select actor_id from actor where actor_id=4;
+----------+
| actor_id |
+----------+
|        4 |
+----------+
1 row in set (0.00 sec)

mysql> select actor_id from actor where actor_id+1=5;
+----------+
| actor_id |
+----------+
|        4 |
+----------+
1 row in set (0.02 sec)
Copy the code

The query result of the first statement is the same as that of the second statement, but the execution efficiency is different:

mysql> explain select actor_id from actor where actor_id=4; +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+--------- ----+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+--------- -- -- + | | 1 SIMPLE | actor | NULL | const | PRIMARY | PRIMARY | 2 | const | 1 | | 100.00 Using index | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+--------- ----+ 1 row in set, 1 warning (0.00 SEC) mysql> explain select actor_id from actor where actor_id+1=4; +----+-------------+-------+------------+-------+---------------+---------------------+---------+------+------+--------- -+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------------------+---------+------+------+--------- -+--------------------------+ | 1 | SIMPLE | actor | NULL | index | NULL | idx_actor_last_name | 182 | NULL | 200 | 100.00 | Using the where; Using index | +----+-------------+-------+------------+-------+---------------+---------------------+---------+------+------+--------- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.02 SEC)Copy the code

The first type is const more efficient than the second index statement.

2. Use primary key queries instead of other indexes because primary key queries do not trigger back queries

3. Use prefix indexes

Sometimes you need to index very long strings, which can make the index larger and slower. In general, you can use a portion of the string at the beginning of a column to greatly save index space and improve index efficiency.

However, this reduces index selectivity, which is the ratio of non-duplicate index values to the total number of table records, ranging from 1/#T to 1.

The more selective an index is, the more efficient the query is, because a more selective index allows MySQL to filter out more rows in a lookup.

Columns of BLOB,TEXT, and VARCHAR types must be indexed with prefix indexes. MySQL does not allow indexes to be full length. The trick is to select prefixes that are long enough to be highly selective. It shouldn’t take too long to pass.

Let’s construct some tables and data:

mysql> create table citydemo(city varchar(50) not null);
Query OK, 0 rows affected (0.04 sec)

mysql> insert into citydemo(city) select city from city;
Query OK, 600 rows affected (0.03 sec)
Records: 600  Duplicates: 0  Warnings: 0

mysql> insert into citydemo(city) select city from citydemo;
Query OK, 600 rows affected (0.03 sec)
Records: 600  Duplicates: 0  Warnings: 0

mysql> insert into citydemo(city) select city from citydemo;
Query OK, 1200 rows affected (0.01 sec)
Records: 1200  Duplicates: 0  Warnings: 0

mysql> insert into citydemo(city) select city from citydemo;
Query OK, 2400 rows affected (0.05 sec)
Records: 2400  Duplicates: 0  Warnings: 0

mysql> insert into citydemo(city) select city from citydemo;
\Query OK, 4800 rows affected (0.09 sec)
Records: 4800  Duplicates: 0  Warnings: 0

mysql> insert into citydemo(city) select city from citydemo;
Query OK, 9600 rows affected (0.16 sec)
Records: 9600  Duplicates: 0  Warnings: 0
Copy the code

Thus, we construct some duplicate data.

Find a list of the most common cities:

mysql> select count(*) as cnt,city from citydemo group by city order by cnt desc limit 10;
+-----+------------------+
| cnt | city             |
+-----+------------------+
|  64 | London           |
|  32 | Omiya            |
|  32 | Pontianak        |
|  32 | Antofagasta      |
|  32 | Salala           |
|  32 | Batna            |
|  32 | Shubra al-Khayma |
|  32 | Brescia          |
|  32 | Sunnyvale        |
|  32 | Clarksville      |
+-----+------------------+
10 rows in set (0.04 sec)
Copy the code

Finding that each value occurs 32+ times, let’s find the most frequent city prefixes, starting with 3 prefix letters:

mysql> select count(*) as cnt,left(city,3) as pref from citydemo group by pref order by cnt desc limit 10;
+-----+------+
| cnt | pref |
+-----+------+
| 448 | San  |
| 192 | Cha  |
| 160 | Sal  |
| 160 | Tan  |
| 160 | Sou  |
| 160 | al-  |
| 128 | Hal  |
| 128 | Bat  |
| 128 | Man  |
| 128 | Sha  |
+-----+------+
Copy the code

Left (city, 5) left(city, 5) left(city, 5) left(city, 5)

mysql> select count(*) as cnt,left(city,5) as pref from citydemo group by pref order by cnt desc limit 10;
+-----+-------+
| cnt | pref  |
+-----+-------+
| 128 | South |
|  96 | Santa |
|  64 | Chang |
|  64 | Toulo |
|  64 | Santi |
|  64 | Xiang |
|  64 | Valle |
|  64 | Londo |
|  64 | Saint |
|  64 | San F |
+-----+-------+
Copy the code

As you can see, the difference becomes smaller, continue with this method, and finally determine that left(City, 7) is closest to the final result:

mysql> select count(*) as cnt,left(city,7) as pref from citydemo group by pref order by cnt desc limit 10;
+-----+---------+
| cnt | pref    |
+-----+---------+
|  64 | Valle d |
|  64 | Santiag |
|  64 | London  |
|  64 | San Fel |
|  32 | Antofag |
|  32 | Batna   |
|  32 | Brescia |
|  32 | Clarksv |
|  32 | El Mont |
|  32 | Greensb |
+-----+---------+
Copy the code

That is, the selectivity of the prefix is close to that of the full column.

Index selectivity is the ratio of non-duplicate index values to the total number of records in the table.

select count(distinct left(city,3))/count(*) as sel3,
count(distinct left(city,4))/count(*) as sel4,
count(distinct left(city,5))/count(*) as sel5,
count(distinct left(city,6))/count(*) as sel6,
count(distinct left(city,7))/count(*) as sel7,
count(distinct left(city,8))/count(*) as sel8 
from citydemo;
Copy the code

Results:

At this point, we can happily build prefix indexes:

alter table citydemo add key(city(7));
Copy the code

Take a look:

Type is ref level, efficiency is still very good!

Note: Prefix indexes are an effective way to make indexes smaller and faster, but there are drawbacks: MySQL cannot use prefix indexes for order BY and group BY.

4. Use index scans to sort

MySQL has two ways to generate ordered results: by sorting or by index. If the type column in explain is index, MySQL uses index scanning to sort results.

Scanning the index itself is fast, because you only need to move from one index record to the next. However, if the index does not cover all the columns required by the query, then you have to go back to the table to query the corresponding row for each index record scanned. This is basically random I/O, so reading data in index order is usually slower than sequential full table scanning.

MySQL can use the same index for both sorting and row lookup, and indexes should be designed to do both if possible.

MySQL can use indexes to sort results only when the order of the columns in the index is exactly the same as that in the ORDER by clause, and all columns are sorted in the same way. MySQL can use indexes to sort results only when the order by clause references all the columns in the first table. The order by clause has the same restriction as a lookup query. It must satisfy the left-most prefix of the index. Otherwise, MySQL will need to perform sequential operations and cannot use index sort.

Ex. :

The rental table in sakila database is a composite index named rental_date on rental_date,inventory_id,customer_id:

eg.1

mysql> explain select rental_id,staff_id from rental where rental_date='2005-05-25' order by inventory_id,customer_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rental
   partitions: NULL
         type: ref
possible_keys: rental_date
          key: rental_date
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index condition
1 row in set.1 warning (0.03 sec)
Copy the code

The order by clause here does not satisfy the left-most prefix requirement for an index and can also be used for query sorting because the first column of the query rental_date=’2005-05-25′ is specified as a constant.

eg. 2

mysql> explain select rental_id,staff_id from rental where rental_date='2005-05-25' order by inventory_id desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rental
   partitions: NULL
         type: ref
possible_keys: rental_date
          key: rental_date
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set.1 warning (0.03 sec)

Copy the code

The query provides a constant condition for the first column of the index, and the second column is used for sorting, combining the two columns together to form the left-most prefix of the index (order rental_date,inventory_id)

eg.3

mysql> explain select rental_id,staff_id from rental where rental_date>'2005-05-25' order by rental_date,inventory_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rental
   partitions: NULL
         type: ALL
possible_keys: rental_date
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16005
     filtered: 50.00
        Extra: Using where; Using filesort
1 row in set.1 warning (0.00 sec)
Copy the code

The first column of this query uses the range query, and the following column will not be sorted by index, regardless of the order. You can also see from the results that the query does not use indexes (Using filesort).

eg. 4

mysql> explain select rental_id,staff_id from rental where rental_date='2005-05-25' order by inventory_id desc,customer_id asc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rental
   partitions: NULL
         type: ref
possible_keys: rental_date
          key: rental_date
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using filesort
1 row in set.1 warning (0.00 sec)
Copy the code

This query uses two different sorting directions (desc, ASC), but the index columns are sorted in positive order, Using filesort.

eg. 5

mysql> explain select rental_id,staff_id from rental where rental_date='2005-05-25' order by inventory_id desc,customer_id desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rental
   partitions: NULL
         type: ref
possible_keys: rental_date
          key: rental_date
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set.1 warning (0.00 sec)
Copy the code

This one uses index sort.

Combined with eg. 4, why not use indexes when sorting in two directions (desc first, asC later), but in the same direction?

Select * from desc; select * from desc; select * from desc; select * from desc; But you’re looking in both directions at the same time, B+ tree, how do you sort a node, how do you read the index directly?

5. Indexes for union all,in,or can be used, but in is recommended

I’m going to use the sakila database again

mysql> explain select * from actor where actor_id = 1 union all select * from actor where actor_id = 2; +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ | | 1 PRIMARY | actor | NULL | const | PRIMARY | PRIMARY | 2 | const | 1 | | NULL 100.00 | | 2 | UNION | actor | NULL | Const | PRIMARY | PRIMARY | 2 | const | 1 | | NULL | 100.00 +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ 2 rows in set, 1 warning (0.00 SEC) mysql> explain select * from actor where actor_id in (1,2); +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+---------- ---+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+---------- -- -- -- + | | 1 SIMPLE | actor | NULL | range | PRIMARY | PRIMARY | | NULL | 2 | | 100.00 Using the where | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+---------- --+ 1 row in set, 1 warning (0.00 SEC) mysql> explain select * from actor where actor_id = 1 or actor_id =2; +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+---------- ---+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+---------- -- -- -- + | | 1 SIMPLE | actor | NULL | range | PRIMARY | PRIMARY | | NULL | 2 | | 100.00 Using the where | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+---------- --+ 1 row in set, 1 warning (0.00 SEC)Copy the code

The execution plan was used to test union ALL, IN and OR respectively, and it was found that union All was executed in two steps, while IN and OR only used one step, with higher efficiency.

Set profiling=1; / / show profiles if profiling=1; / / show profiles if profiling=1 :

mysql> show profiles;
+----------+------------+-------------------------------------------------------+
| Query_ID | Duration   | Query                                                 |
+----------+------------+-------------------------------------------------------+
|        1 | 0.00081575 | select * from actor where actor_id in (1.2)           |
|        2 | 0.02360075 | select * from actor where actor_id = 1 or actor_id =2 |
+----------+------------+-------------------------------------------------------+
2 rows in set.1 warning (0.00 sec)
Copy the code

As you can see, execution with OR takes longer than in.

Because the USE of OR condition query, will first judge a condition for screening, and then judge the other conditions in OR and then filter, while in query directly in the collection of in once screening.

So, union all,in,or can all use indexes, but in is recommended

6. Range columns can use indexes

  • The scope conditions are:<,< =,>,> =,between
  • A range column can use the index, but the column following the range column cannot use the index, which can be used for at most one range column

The index optimization details (I) about the use of indexes for range columns and the rules for indexing are described.

7. The cast will scan all tables

For example, here is a table where the phone column is indexed by vARCHAR and the phone number is stored:

create table user(id int,name varchar(10),phone varchar(11));

alter table user add index idx_1(phone);
Copy the code

Explain where phone=13800001234 and phone=’13800001234′ :

As you can see, the former triggers a full table scan (type ALL), while the latter uses the index for the query.

So, this detail reminds us that MySQL will help us to do some casts when querying, but if there is an index, the index will not take effect, so just use the defined data type to query.

4. Updates are frequent, and indexes should not be built on fields with low data differentiation

Data update operations change the B+ tree, so indexing frequently updated fields can significantly degrade database performance.

For example, it is meaningless to build indexes for attributes such as gender, which are not very differentiated, and cannot effectively filter data.

Generally, indexes can be created when the distinctness is above 80%. The distinctness can be calculated using count(distinct(column name))/count(*).

5. Do not create a column that is null in the index. Otherwise, the expected result may be obtained

If an index is created on a column, it is best not to null it. However, it depends on the case, since many fields are allowed to be null in actual business scenarios.

6. When you need to join a table, it is recommended that there be no more than three tables, because the data types of the columns to be joined must be the same

There’s a clause in ali’s statute that says:

[Mandatory] Disallow join when more than three tables are used. The data types of the fields to be joined must be the same. In the case of multi-table associated query,

Ensure that the associated field has an index.

Note: Pay attention to table index, SQL performance even if double table JOIN.

Associated fields without indexes can significantly degrade MySQL performance.

MySQL’s join uses a nested loop algorithm

  • Nested-Loop Join Algorithm

A simple nested loop join (NLJ) algorithm that reads rows at a time from the first table in a loop and passes each row to a nested loop that processes the next table in the join. Repeat this process the same number of times as the table to join.

Suppose you want to perform a join between three tables T1, T2, and T3 using the following join type:

Table   Join Type
t1      range
t2      ref
t3      ALL
Copy the code

Then, using the NLJ algorithm, the join executes like this:

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}
Copy the code

Because the NLJ algorithm passes rows from the outer loop to the inner loop once, it typically reads the tables processed in the inner loop multiple times.

  • Block Nested-Loop Join Algorithm

Block nested loop (BNL) The nesting algorithm uses buffering of rows read in the outer loop to reduce the number of times tables in the inner loop must be read.

For example, if 10 lines are read into the buffer and the buffer is passed to the next inner loop, each line read in the inner loop can be compared to all 10 lines in the buffer.

This reduces the number of times the inner table must be read by an order of magnitude.

for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions, send to client
        }
      }
      empty join buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions, send to client
    }
  }
}
Copy the code

If S is the size of combinations T1 and T2 for each store in the connection buffer, and C is the number of combinations in the buffer, then the number of times table t3 is scanned:

(S * C)/join_buffer_size + 1

Join_buffer_size you can see how big it is:

mysql> show variables like '%join_buffer%';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| join_buffer_size | 262144 |
+------------------+--------+
Copy the code

By default, the size of join_BUFFer_SIZE is 256K

7. Use limit when you can

Don’t think of limit as pagination. Limit means to limit output, and pagination is just a basic use of it.

If limit is not used, MySQL will select the first x rows of the query and return the result. If limit is found, MySQL will return the query result.

8. You are advised to limit the number of indexes in a single table to five

It’s not that more indexes are better, indexes take up space too!

9. The number of fields in the combined index cannot exceed five

10. The following misconceptions should be avoided when creating indexes

  • The more indexes, the better
  • Optimize too early, optimize without knowing the system

The index monitoring

Index usage status:

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 6     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 1117  |
+-----------------------+-------+
7 rows in set (0.05 sec)
Copy the code

Definitions of various variables:

Handler_read_first Number of times the first entry in the index was read

Handler_read_key Number of times to get data by index

Handler_read_last Number of times the last entry in the index was read

Handler_read_next Number of times the next piece of data is read by index

Handler_read_prev Number of times that the last piece of data was read by the index

Handler_read_rnd Number of times data is read from a fixed location

Handler_read_rnd_next Number of times the next piece of data is read from the data node

Usually we just focus on Handler_read_key and Handler_read_rnd_next. If their values are large, it indicates that the index is used frequently and the index utilization is high. On the other hand, if both values are 0 or very small, this is the time to panic, indicating that the index is not working, time to check the SQL statement!

Look at two examples of index optimization

Ready to watch

SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `itdragon_order_list`;
CREATE TABLE `itdragon_order_list` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID, default growth',
  `transaction_id` varchar(150) DEFAULT NULL COMMENT 'Transaction Number',
  `gross` double DEFAULT NULL COMMENT 'Gross Income (RMB)',
  `net` double DEFAULT NULL COMMENT 'Net Income (RMB)',
  `stock_id` int(11) DEFAULT NULL COMMENT 'Delivery warehouse',
  `order_status` int(11) DEFAULT NULL COMMENT 'Order Status',
  `descript` varchar(255) DEFAULT NULL COMMENT 'Customer Service Notes',
  `finance_descript` varchar(255) DEFAULT NULL COMMENT 'Financial Notes',
  `create_type` varchar(100) DEFAULT NULL COMMENT 'Create type',
  `order_level` int(11) DEFAULT NULL COMMENT 'Order Level',
  `input_user` varchar(20) DEFAULT NULL COMMENT 'Entry person',
  `input_date` varchar(20) DEFAULT NULL COMMENT 'Entry time'.PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10003 DEFAULT CHARSET=utf8;

INSERT INTO itdragon_order_list VALUES ('10000'.'81X97310V32236260E'.'6.6'.'6.13'.'1'.'10'.'ok'.'ok'.'auto'.'1'.'itdragon'.'the 2017-08-28 17:01:49');
INSERT INTO itdragon_order_list VALUES ('10001'.'61525478BB371361Q'.'18.88'.'18.79'.'1'.'10'.'ok'.'ok'.'auto'.'1'.'itdragon'.'the 2017-08-18 17:01:50');
INSERT INTO itdragon_order_list VALUES ('10002'.'5RT64180WE555861V'.'20.18'.'20.17'.'1'.'10'.'ok'.'ok'.'auto'.'1'.'itdragon'.'the 2017-09-08 17:01:49');
Copy the code

Case 1:

select * from itdragon_order_list where transaction_id = “81X97310V32236260E”;

View by execution plan

mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+- ------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+- ------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+- ------------+
Copy the code

If type is set to ALL, scan ALL tables.

Optimization 1: Create a unique index for transaction_id:

create unique index idx_order_transaID on itdragon_order_list (transaction_id);
Copy the code

Let’s look at the execution plan:

mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+------ -+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+------ -+------+----------+-------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const |    1 |   100.00 | NULL  |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+------ -+------+----------+-------+
Copy the code

When an index is created, the type of the unique index is const, and the result can be found through the index once. The type of a normal index is ref, which means that a non-unique index scan is performed until the index file is scanned.

Optimization 2: Select transaction_id instead of select * from extra where using index is used

mysql> explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: itdragon_order_list
   partitions: NULL
         type: const
possible_keys: idx_order_transaID
          key: idx_order_transaID
      key_len: 453
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
Copy the code

Case 2:

Create composite indexes

create index idx_order_levelDate on itdragon_order_list (order_level,input_date);
Copy the code

perform

mysql> explain select * from itdragon_order_list order by order_level,input_date\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: itdragon_order_list
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using filesort
Copy the code

Type: ALL, Extra: Using filesort;

  • You can use force index to force an index
mysql> explain select * from itdragon_order_list force index(idx_order_levelDate) order by order_level,input_date\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: itdragon_order_list
   partitions: NULL
         type: index
possible_keys: NULL
          key: idx_order_levelDate
      key_len: 68
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: NULL
Copy the code

At this point, type is at index level, slightly improving efficiency.

  • It doesn’t make much sense to sort the order, and it doesn’t make much sense to index the order level, so you can first determine the value of order_level and then sort the input_date
mysql> explain select * from itdragon_order_list where order_level=3 order by input_date\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: itdragon_order_list
   partitions: NULL
         type: ref
possible_keys: idx_order_levelDate
          key: idx_order_levelDate
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index condition
Copy the code

In this way, the type can reach the ref level, and the effect is better!