How do I create a high-performance index

EXPLAIN type analysis

The EXPLAIN statement shows how the query optimizer plans to execute a statement, exposing the details of the execution plan.

Syntax: EXPLAIN followed by the SQL statement to analyze, e.g. EXPLAIN SELECT ...

Suppose we have two tables, a category table and an item table; we can prepend EXPLAIN to queries against them to analyze the plan:

```sql
CREATE TABLE category (
    id int PRIMARY KEY AUTO_INCREMENT,
    name varchar(50) NOT NULL,
    c_desc varchar(200),
    create_time datetime NOT NULL DEFAULT now(),
    KEY name_index(`name`)
) ENGINE = INNODB;

CREATE TABLE item (
    id int PRIMARY KEY AUTO_INCREMENT,
    category_id int NOT NULL,
    name varchar(50) NOT NULL
) ENGINE = INNODB;
```

Field Description:

Id: the SELECT identifier

From this field, we can know the order in which the SQL statements are executed.

If the ids are the same, execution proceeds from top to bottom. If the ids differ, a larger id has higher priority and is executed first.

Select_type: indicates the query type

SIMPLE: indicates SIMPLE query

PRIMARY: the statement outside a subquery is marked PRIMARY, i.e. the outermost query

UNION: the second or later SELECT in a UNION

DEPENDENT UNION: The second or subsequent select statement in the UNION, depending on the external query

UNION RESULT: the result of the UNION, i.e. the merged result set of the second and all subsequent SELECT statements in the UNION

SUBQUERY: The first SELECT in a SUBQUERY whose results are independent of external queries

DEPENDENT SUBQUERY: The first SELECT in a SUBQUERY whose results depend on an external query

DERIVED: a subquery in the FROM clause; its result is materialized into a derived table

UNCACHEABLE SUBQUERY: a subquery whose result cannot be cached and must be re-evaluated for each row of the outer query

Table: the table the plan row refers to

This column shows which table the row of the plan corresponds to, displaying the alias if one is used.

Partitions: matching partitions (not available in 5.5 and earlier; there you can use EXPLAIN PARTITIONS SELECT ... to display the partitions column)

Indicates which partitions are used; it is empty if the table is not partitioned.

Type: indicates the access type used

The order from best to worst is: system > const > eq_ref > ref > range > index > ALL

Index optimization should generally aim for at least the range level.
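As a rough illustration against the category table defined above (the exact access type always depends on the data and statistics, so these outcomes are hedged expectations, not guarantees):

```sql
EXPLAIN SELECT * FROM category WHERE id = 1;        -- const: primary key equality
EXPLAIN SELECT * FROM category WHERE name = 'a';    -- ref: secondary index equality
EXPLAIN SELECT * FROM category WHERE id < 10;       -- range: index range scan
EXPLAIN SELECT name FROM category;                  -- index: full index scan
EXPLAIN SELECT * FROM category WHERE c_desc = 'a';  -- ALL: full table scan (no index on c_desc)
```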


Possible_keys: indexes that might be used

Represents the indexes that could be used in this query, or NULL if there is no relevant index.

Key: the index actually used

If no index is used in this query, this column is NULL

Key_len: the length of the index used

The number of bytes used in the index. This column shows the maximum possible length of the index fields, not the actual length used; it is calculated from the index definition, not measured at run time.

Length calculation formula:

varchar(10), NULL allowed: 10 × (bytes per character: utf8mb4 = 4, utf8 = 3, gbk = 2, latin1 = 1) + 1 (NULL flag) + 2 (variable-length field)

varchar(10), NOT NULL: 10 × bytes per character + 2 (variable-length field)

char(10), NULL allowed: 10 × bytes per character + 1 (NULL flag)

char(10), NOT NULL: 10 × bytes per character
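The arithmetic above can be sketched in a few lines (the formulas are the ones stated in this section; the function name is just for illustration):

```python
# key_len = chars * bytes_per_char + 1 (if nullable) + 2 (if variable-length)
CHARSET_BYTES = {"utf8mb4": 4, "utf8": 3, "gbk": 2, "latin1": 1}

def key_len(chars, charset, nullable, variable_length):
    length = chars * CHARSET_BYTES[charset]
    if nullable:
        length += 1          # one extra byte for the NULL flag
    if variable_length:
        length += 2          # two extra bytes storing the actual length
    return length

# varchar(10) NULL in utf8: 10*3 + 1 + 2 = 33
print(key_len(10, "utf8", nullable=True, variable_length=True))      # 33
# char(10) NOT NULL in latin1: 10*1 = 10
print(key_len(10, "latin1", nullable=False, variable_length=False))  # 10
```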

Ref: the columns or constants compared against the index to select rows from the table

Rows: number of rows scanned (estimated number of rows)

Filtered: percentage of rows filtered by table criteria

Extra: Additional information description

Here are some common ones to illustrate:

Using index: a covering index is used

Using where, seen in two common cases:

1. The storage engine returns rows and the server layer then filters them with the WHERE clause.

2. The queried columns are not fully covered by the index, and the WHERE clause filters on non-index columns.

Using index condition: index condition pushdown; the condition (e.g. a range) is filtered within the index

Using filesort: MySQL sorts the results itself (possibly on disk) instead of reading them in index order; this should be avoided

Using temporary: a temporary table is used; this should be avoided

What is an index?

An index is a data structure that helps queries find records quickly and efficiently.

Indexing is the most effective tool for query optimization. Indexes can improve query performance by several orders of magnitude.

How does the index work?

In MySQL, the storage engine first finds the matching entry in the index, and then uses the matched index record to locate the corresponding data row.

This process is similar to trying to find a specific topic in a book. We first turn to the “table of contents” of the book, and then find the corresponding page number according to the “table of contents”.

```sql
CREATE TABLE hero (
    id int PRIMARY KEY AUTO_INCREMENT,
    name varchar(50) NOT NULL,
    hero_desc varchar(200),
    KEY name_index(`name`)
) ENGINE = INNODB;

-- The sample rows were garbled in the original; only the row referenced below is shown
INSERT INTO hero(name, hero_desc) VALUES ('Zhang San', '...');
```

Suppose we want to find the row whose name is "Zhang San". Without an index on name we would have to scan the whole table; with name_index, the storage engine can locate the matching record directly:

```sql
SELECT * FROM hero WHERE name = 'Zhang San';
```

Types of indexes

There are many types of indexes, and we can choose different ones for different scenarios. In MySQL, indexes are implemented in the storage engine layer, not the server layer: different storage engines implement indexing differently, each supports a different set of index types, and not every storage engine supports every index type.

InnoDB is the most common storage engine; unless otherwise specified, the discussion below assumes InnoDB and its default B-tree index.

B-tree indexes

What MySQL calls a B-tree index actually uses a B+ tree data structure to store data. Most MySQL storage engines support this kind of index, though they use it in different ways and with different performance. For example, MyISAM uses prefix compression to make indexes smaller, while InnoDB stores values in their raw format; MyISAM indexes reference rows by the physical location of the stored data, while InnoDB references them by primary key.

A B-tree generally means that all values are stored in order and that every leaf page is at the same depth (the same distance from the root).

Index on B+ Tree data structure (InnoDB engine):

A B-tree speeds up data access because the storage engine no longer needs to perform a full table scan to find the requested data. The search starts at the root node of the tree; each node page holds pointers to its child pages, and the storage engine follows these pointers downward by comparing the values in the node page with the value being looked up to choose the right pointer to the next level. These pointers effectively define the lower and upper bounds of the values in each child page. The storage engine either eventually finds the value it is looking for, or determines that it does not exist.

Leaf nodes are special: they have no children, and their pointers refer to the indexed data instead, which may be the full data row or a primary key value used to look it up in the clustered index.

The type of query that can be used with the B-tree index:

B-tree is suitable for full key value, key value range, or key prefix lookup. The key prefix search only applies to the search for the leftmost prefix of the key.

Let's create a product table with a primary key index, a single-column index, and a composite index.

```sql
CREATE TABLE product (
    id int PRIMARY KEY AUTO_INCREMENT,
    name varchar(50) NOT NULL COMMENT 'product name',
    price decimal NOT NULL COMMENT 'price',
    category_id int NOT NULL COMMENT 'category id',
    KEY product_name(`name`),
    KEY category_and_name(`category_id`, `name`)
) ENGINE = INNODB;
```

Full value matching:

Full value matching refers to matching all columns in the index. Such as:

```sql
-- Full match on the single-column index product_name
SELECT * FROM product WHERE name = 'some product';
```

```sql
-- Full match on both columns of the composite index category_and_name
SELECT * FROM product WHERE category_id = 1 AND name = 'c002';
```

Matches the leftmost prefix:

The leftmost prefix applies to composite (multi-column) indexes, such as the category_and_name index above.

The leftmost prefix, as the name implies, matches the index from left to right. Suppose we create a composite index on columns (a, b, c).

So when does the index take effect? For lookups on:

a

a, b

a, b, c

```sql
SELECT * FROM product WHERE category_id = 1;
```
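A hedged sketch of the leftmost-prefix rule against the product table above (the composite index is category_and_name(category_id, name); actual plans depend on the data):

```sql
-- Uses the composite index: equality on the leftmost column
EXPLAIN SELECT * FROM product WHERE category_id = 1;

-- Uses both columns of the composite index
EXPLAIN SELECT * FROM product WHERE category_id = 1 AND name = 'foo';

-- Cannot use category_and_name (leftmost column missing);
-- here it can still fall back to the single-column product_name index
EXPLAIN SELECT * FROM product WHERE name = 'foo';
```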

Match column prefixes:

Matches the beginning of an index column's value, e.g. LIKE 'a%' or LIKE 'b%' (a starts-with match).

```sql
SELECT * FROM product WHERE name LIKE 'a%';
```

Matching range value:

Range lookups on an index column, such as BETWEEN, >, <, >=, <=.

```sql
SELECT * FROM product WHERE id < 5;
```

```sql
SELECT * FROM product WHERE id BETWEEN 1 AND 5;
```

Matches one column exactly and a range on another column:

```sql
-- Exact match on the first index column, range on the second
SELECT * FROM product WHERE category_id = 1 AND name > 'c002';
```

Index-only queries:

That is, a covering index: the query is satisfied without reading the data rows (no back-to-table lookup). With EXPLAIN, the Extra column shows Using index.

```sql
SELECT name FROM product WHERE name = 'some product';
```

Because a B-tree index uses the B+ tree storage structure, the nodes in the index tree are ordered. Besides lookups by value, an index can therefore also serve ordered retrieval. In general, if a B-tree can find a value a certain way, it can also sort that way.

So if the Order by clause satisfies the above query types, then the index can also satisfy the corresponding sorting requirements.

These are the query types that can use an index. By following the rules above, we can use indexes effectively and improve query efficiency.

Hash indexes

Hash indexes are implemented with hash tables, and only queries that exactly match all columns of the index can use them.

The storage engine computes a hash code for each row's index columns. The hash code is a small value, and it differs for rows with different key values. A hash index stores all the hash codes, each with a pointer to the corresponding data row. If a hash collision occurs, the colliding entries are stored as a linked list in the same hash slot.

In MySQL, only the Memory engine explicitly supports hash indexes. We will not discuss them further here.

Spatial data index

The MyISAM engine supports spatial indexing and can be used as a geographic data store.

Unlike a B-tree index, a spatial index does not require a leftmost prefix; it indexes the data across all dimensions at once, so a query can effectively combine any of the dimensions. You must use MySQL's GIS functions, such as MBRCONTAINS(), to maintain and query the data. MySQL's GIS support is not mature, so this index type is rarely used.

Among open-source relational databases, one of the best GIS solutions is PostgreSQL's PostGIS.

Full-text indexes

A full-text index is a special type of index that looks for keywords in text rather than directly comparing values in the index. Full-text indexing is more akin to what search engines do than simple WHERE matching, and it won’t be explained too much here.

What are the benefits of using indexes?

Most directly, indexes help us quickly find the rows we need among a large number of rows. Take the most common B-tree index: thanks to its storage structure, it can also help with ORDER BY and GROUP BY operations. The advantages can be summarized as:

1. Greatly reduce the number of rows the server must scan, improving query efficiency

2. Avoid generating temporary tables and filesorts

3. Turn random I/O into sequential I/O

However, more indexes are not always better: the more there are, the higher the cost of creating, using, and maintaining them. When a table is small, a direct full table scan can even be faster than going through an index. An index is most effective when the benefit of helping the storage engine find records quickly outweighs the extra work it introduces.

How to create a high-performance index?

Follow index features to avoid index failures.

Index failure:

OR is used (unless all OR-ed fields are indexed).

The leftmost-prefix principle is not satisfied.

A LIKE query starts with %.

A string literal is not quoted, forcing an implicit type conversion.

After a range condition, the following conditions cannot use index matching.

A function is applied to an index column in the query condition.

The optimizer chooses a full table scan when it estimates that would be faster.
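Hedged illustrations of these failure cases, using the product table from earlier (whether the index is actually skipped always depends on data volume and statistics):

```sql
-- Leading wildcard: cannot use the index on name
SELECT * FROM product WHERE name LIKE '%abc';

-- Implicit type conversion: name is varchar, comparing to a number defeats the index
SELECT * FROM product WHERE name = 123;

-- Function applied to the index column
SELECT * FROM product WHERE LOWER(name) = 'abc';

-- After the range on category_id, the name column of the composite
-- index can no longer be used for matching
SELECT * FROM product WHERE category_id > 1 AND name = 'abc';
```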

Prefix index and index selectivity

Sometimes we need to index long string columns, which makes the index large and slow.

Usually, indexing just the first part of the characters saves index space and improves efficiency, but it also lowers the index's selectivity.

Index selectivity is the ratio of distinct index values (the index cardinality) to the total number of records in the table (#T). It ranges from 1/#T to 1.

The more selective an index is, the more efficient the query is, because a more selective index allows MySQL to filter out more rows in a lookup (reducing the number of index hits).

For example, a primary key or unique index has selectivity 1, which is the best possible selectivity and gives the best performance.

The index prefix length limit in InnoDB is 767 bytes, or 3072 bytes if the innodb_large_prefix option is enabled.

MyISAM's limit is 1000 bytes. TEXT, BLOB, or very long VARCHAR columns must use prefix indexes.

A prefix index should be selective enough but not too long. The cardinality of the prefix should approach the cardinality of the complete column:

1. Prefix cardinality ≈ complete column cardinality (prefix cardinality / complete column cardinality ≈ 1)

2. Prefix cardinality / total rows ≈ complete column cardinality / total rows
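A sketch of how to pick a prefix length for the product table's name column (the candidate lengths 3/5/7 are illustrative): compare the selectivity of each candidate prefix with that of the full column, then index the shortest prefix that comes close.

```sql
SELECT COUNT(DISTINCT LEFT(name, 3)) / COUNT(*) AS sel3,
       COUNT(DISTINCT LEFT(name, 5)) / COUNT(*) AS sel5,
       COUNT(DISTINCT LEFT(name, 7)) / COUNT(*) AS sel7,
       COUNT(DISTINCT name) / COUNT(*)          AS sel_full
FROM product;

-- Once a prefix length whose selectivity approaches sel_full is found:
ALTER TABLE product ADD INDEX name_prefix_index (name(5));
```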

The effect of prefix indexes on overwrite indexes

A prefix index cannot take advantage of the covering-index optimization, because MySQL cannot know whether the prefix has truncated the complete value; it must always read the row.

Composite index (multi-column index)

Indexes can cover multiple data columns at the same time. For composite indexes:

MySQL uses the fields in an index from left to right. A query can use part of the index, but only starting from the leftmost column.

Such as:

Suppose we define a composite index (c1, c2, c3). It can serve lookups on (c1, c2, c3), (c1, c2), or (c1) alone; only combinations starting from the left work. A lookup on (c2, c3) cannot use the index. The index is usable when the leftmost field is compared to a constant.

We can create a composite index when certain fields are used together frequently and we can determine the order in which they are used.

However, if we are not sure which fields will be used together, we can only index those fields separately. Adding useless composite indexes wastes space and adds overhead to updates and deletes.

Compound index optimization for sorting:

Remember that a composite index only optimizes ORDER BY clauses whose column order is exactly the same as (or exactly the reverse of) the index definition.

Order of index columns

The order of index columns is critical; the correct order depends on the queries that will use the index and on how best to satisfy their sorting and grouping needs.

The optimal left-prefix rule for composite indexes means that the indexes are sorted first by the leftmost column and then by the next column.

When sorting and grouping are not a concern, the index columns with high selectivity should be placed first.

Suppose we had a table of chapters like this:

```sql
CREATE TABLE `chapter` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(50) NOT NULL COMMENT 'chapter name',
    `category_id` int(11) NOT NULL COMMENT 'category id',
    `project_id` int(11) NOT NULL COMMENT 'project id',
    `subject_id` int(11) NOT NULL COMMENT 'subject id',
    PRIMARY KEY (`id`)
) ENGINE = InnoDB;
```

According to the business requirements, we want to query the eligible chapters according to the classification, project, subject and other query conditions. How should we design a composite index?

Let’s first try to calculate the selectivity of these columns.
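Applying the selectivity definition from earlier, the calculation is a single aggregate query (a sketch; the numbers of course depend on the data):

```sql
SELECT COUNT(DISTINCT subject_id)  / COUNT(*) AS subject_selectivity,
       COUNT(DISTINCT project_id)  / COUNT(*) AS project_selectivity,
       COUNT(DISTINCT category_id) / COUNT(*) AS category_selectivity
FROM chapter;
```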

Suppose the result shows that the most selective columns, in order, are subject_id, project_id, and category_id. Ignoring grouping and sorting, the index should then be created as:

```sql
ALTER TABLE `chapter`
ADD INDEX `chapter_category`(`subject_id`, `project_id`, `category_id`);
```

Clustered indexes

Clustering means that rows of data are stored close together with adjacent key values.

A clustered index is defined by the primary key, and a table can have only one clustered index (covering indexes can simulate the effect of multiple clustered indexes).

A clustered index is not a separate index type but a data storage structure (a B-tree whose leaves hold the rows).

Unlike other B-tree indexes in MySQL, the clustered index stores the data rows themselves in its leaf pages. The leaf pages of other (secondary) B-tree indexes store the primary key id, i.e. the key of the clustered index; if a secondary index alone cannot satisfy a query, a back-to-table lookup is performed against the clustered index.

Since not all storage engines support clustered indexing, we will focus on InnoDB here, but the principle applies to any storage engine that supports clustered indexing.

The following figure shows how records in a clustered index are stored. Notice that the leaf pages contain the complete rows, while the internal node pages contain only the index columns.

InnoDB uses the primary key as the clustered index. If a table does not declare a primary key, InnoDB picks a unique non-nullable index instead; if no such index exists either, InnoDB implicitly defines a hidden row id as the clustered index.

InnoDB only clusters records within a page; pages with adjacent key values may still be far apart on disk.

A clustered primary key can help performance, but it can also cause serious performance problems, so it needs careful consideration, especially when switching a table's storage engine from InnoDB to another engine (or vice versa).

Advantages of clustered indexes:

You can keep related data together

Faster data access (clustered indexes keep indexes and data in the same B-tree)

Covering index scans can use the primary key values stored directly in the secondary index's leaf pages

Disadvantages of clustered indexes:

Clustering mainly improves I/O-bound workloads; if all the data fits in memory, access order matters much less and clustering brings little benefit.

Insert speed depends heavily on insert order; inserting in primary key order is fastest. If data was not loaded in primary key order, it is best to reorganize the table with OPTIMIZE TABLE after loading.

Updating clustered index columns is expensive. Because InnoDB is forced to move every updated row to a new location.

Tables based on clustered indexes can suffer from page splitting when new rows are inserted or when the primary key is updated so that rows need to be moved. Page splitting causes tables to take up more disk space.

Clustered indexes can cause slow full table scans, especially if rows are sparse or the data store is discontinuous due to page splitting.

Secondary (non-clustered) indexes are larger than you might expect, because their leaf nodes contain the primary key columns of the referenced rows.

Secondary index access requires two index lookups, because a secondary index's leaf node stores not a "row pointer" but the row's primary key value; InnoDB's adaptive hash index can reduce this cost.

Data distribution comparison between InnoDB and MyISAM

The data distribution of the clustered index is different from that of the non-clustered index. The leaf node of the non-clustered index stores the key of the clustered index, that is, the primary key ID.

When non-clustered index queries are used, both indexes need to be queried. First, the primary key ID is queried based on the index column, and then the specified data row is retrieved from the clustered index based on the primary key ID.

Suppose we had a table like this:


```sql
CREATE TABLE layout_test (
    col1 int NOT NULL,
    col2 int NOT NULL,
    PRIMARY KEY(col1),
    KEY(col2)
);

-- Sample data (the remaining rows were garbled in the original)
INSERT INTO layout_test VALUES (99,8),(12,56),(3000,62), ... ,(18,8),(4700,13);
```

MyISAM:

MyISAM does not support clustered indexes, so data distribution is relatively simple.

The leaf node in each index of MyISAM stores the address value pointing to the data row. We use two diagrams to illustrate the storage mode roughly.

Table layout_test data distribution:

MyISAM is stored on disk in the order in which data is inserted.

Col1 primary key index distribution:

Col2 Normal index distribution:


The two index structures are identical; they differ only in that each orders its keys by its own index column.

InnoDB:

InnoDB distinguishes clustered indexes from non-clustered indexes.

Col1 primary key index distribution:

Each leaf node of the clustered index contains the primary key value, the transaction id, the rollback pointer (for transactions and MVCC), and all the remaining columns.

Col2 Normal index distribution:

The leaf nodes store the clustered index key, i.e. the primary key value.

Difference:

Importance of inserting rows in primary key order in An InnoDB table

Data page:

Each page defaults to 16KB (since MySQL 5.6, other page sizes such as 8KB can be selected via innodb_page_size).

Each page can store at most 16K / 2 − 200 = 7992 records (each record is counted as at least 2 bytes in this calculation, and each page reserves some space).

An I/O read is required from disk to memory each time a data page is loaded.

Each page stores at least 2 records: the virtual infimum (minimum) and supremum (maximum) records that define the record boundaries.

InnoDB’s data page consists of the following 7 sections:

  • File Header, fixed 38 bytes (page number, previous/next page pointers, LSN, etc.)
  • Page Header, fixed 56 bytes, including the slot count, the start address of reusable space, the address of the first record, the record count, the maximum transaction id, etc.
  • Infimum + Supremum Records (the virtual minimum and maximum records)
  • User Records; deleted records are linked together to form reusable space
  • Free Space: unallocated space
  • Page Directory: slot information
  • File Trailer, fixed 8 bytes, used to verify page integrity

Non-leaf nodes store keys and page pointers. Assuming the key is a BIGINT (8 bytes) and the pointer takes 6 bytes, one node can hold about 16K / (8 + 6) ≈ 1170 entries.
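A back-of-the-envelope sketch of B+ tree capacity under the assumptions above (16K pages, BIGINT key + 6-byte page pointer; the ~1KB-per-row figure is an assumption for illustration):

```python
PAGE = 16 * 1024  # 16KB page

fanout = PAGE // (8 + 6)      # entries per internal node: 1170
rows_per_leaf = PAGE // 1024  # assuming ~1KB per row: 16 rows per leaf page

# A tree of height 3 (two internal levels above the leaves) can then address:
capacity = fanout * fanout * rows_per_leaf
print(fanout, capacity)  # 1170 21902400 -> roughly 21.9 million rows
```

This is why a three-level B+ tree comfortably covers tables with tens of millions of rows, and why each lookup costs only a handful of page reads.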

If the table is small or grows steadily, it is best to define a primary key whose insertion order you control. The simplest way is to declare an AUTO_INCREMENT column: it keeps rows in insert order and improves the performance of primary-key-based join lookups. (Foreign key constraints are generally discouraged; handle the association in the business layer instead.)

It is best to avoid random (discontinuous and very wide distribution of values) clustered indexes, especially for I/O intensive applications. For example, using UUID as a cluster index is bad from a performance perspective. It makes the insertion of clustered indexes completely random, which is the worst case, leaving the data without any clustering characteristics.

Because storage is sequential, the last record on a page is adjacent to the next record to be inserted. When a page fills to its maximum fill threshold, a new page is created and subsequent records are stored there in order.

This always ensures that every page is filled.

However, if the storage is not sequential, there may be frequent page splitting, fragmentation of data, more storage space, and random I/O reads, which often require mysql to do more processing.

Because a newly inserted key is not necessarily larger than all previously inserted keys, InnoDB cannot simply append new rows at the end of the index. It must find the proper position, usually somewhere in the middle of the existing data, and allocate space there. This adds extra work and leads to a less optimal data layout.

Disadvantages:

  • The target page may already have been flushed to disk and evicted from the cache, or may never have been loaded; InnoDB must find it and read it from disk into memory before inserting. This causes a lot of random I/O.
  • Because writes arrive out of order, InnoDB must split pages frequently to make room for new rows. Page splits move large amounts of data, and each insert may modify at least three pages instead of one.
  • Frequent page splits leave storage sparse and irregular, producing data fragmentation.

By comparison, it is intuitive to see the benefits of ensuring the insertion order of index keys.

Covering indexes

A covering index is a query optimization: it avoids the back-to-table lookup from a secondary index by satisfying the whole query with the information in the secondary index tree alone.

When creating an index we tend to look only at the WHERE condition, but that uses only one aspect of the index. We should consider the entire query: not just the WHERE clause but also the SELECT column list.

If the index meets our desired result set, there is no need to read rows. When we just need to query the title of a given table of contents in a book, the table of contents page can satisfy our needs, and we do not need to turn to the specified page to see the table of contents.

If an index contains the values of all fields the query needs (the SELECT list), we call it a "covering index." We should select only the fields we need, so that MySQL can apply this optimization.

Suppose there was a table of classifications:

```sql
CREATE TABLE category (
    id int PRIMARY KEY AUTO_INCREMENT,
    name varchar(50) NOT NULL,
    c_desc varchar(200),
    create_time datetime NOT NULL DEFAULT now(),
    KEY name_index(`name`)
) ENGINE = INNODB;

INSERT INTO category(name, c_desc) VALUES
    ('category 1', 'default'),
    ('category 2', 'default'),
    ('category 3', 'default'),
    ('category 4', 'default'),
    ('category 5', 'default'),
    ('category 6', 'default'),
    ('category 7', 'default');
```

Requirement: look up the id for a given category name.

1. Query all fields directly with SELECT *

```sql
EXPLAIN SELECT * FROM category WHERE name = 'category 1';
```

Analyzing this with EXPLAIN shows that MySQL uses name_index for the lookup, but Extra contains no additional information, meaning the query goes back to the table for the remaining columns.

(Since MySQL 5.6, optimizations such as index condition pushdown reduce back-to-table lookups and round trips between the MySQL server layer and the storage engine layer.)

2. Query only the needed field with SELECT id

```sql
EXPLAIN SELECT id FROM category WHERE name = 'category 1';
```

This time EXPLAIN shows Extra = Using index: the query is satisfied by name_index alone, with no back-to-table lookup.

What are the benefits of a covering index? It avoids the back-to-table lookup, which greatly improves query efficiency.

  • Unlike the primary key index, the leaf nodes of a secondary index store only the index columns plus the primary key id. If the secondary index alone satisfies the query, no back-to-table lookup is needed, which greatly reduces data access.

    Keep in mind that MySQL pages data from disk into the cache, which is expensive. Indexes are comparatively small and more likely to fit entirely in memory, reducing I/O; this helps a lot in I/O-intensive applications.

  • Because index entries are stored in index-column order, an I/O-intensive range query does far less I/O than reading each row from disk at random.

When a query is overridden with index optimizations, we can see Using Index in the Extra column with the EXPLAIN directive.

A few things to be aware of

1. Use index scan to sort

MySQL has two ways to produce ordered results: by sorting (filesort) or by reading in index order. If the type column in EXPLAIN is index, MySQL is using an index scan to produce sorted results.

Scanning the index itself is fast, since it only requires moving from one index record to the next. But if the index does not cover all the columns the query needs, every scanned row triggers a back-to-table lookup, which is mostly random I/O; reading data in index order is then usually slower than a sequential full table scan.

Mysql can use the same index for both sorting and row lookup, and indexes should be designed to do both if possible.

MySQL can use an index to sort results only when the index's column order exactly matches the ORDER BY clause and all columns are sorted in the same direction (all ascending or all descending). If the query joins multiple tables, ORDER BY can use an index only when every field it references belongs to the first table. The ORDER BY clause is subject to the same constraint as lookups: it must form a leftmost prefix of the index. Otherwise MySQL has to perform a sort operation and cannot take advantage of the index ordering.
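Hedged examples against the product table and its category_and_name(category_id, name) index (whether the optimizer actually avoids the filesort depends on the data):

```sql
-- Can sort via the index: the leftmost column is a constant,
-- and the next index column supplies the order
SELECT * FROM product WHERE category_id = 1 ORDER BY name;

-- Typically a filesort: ORDER BY skips the leftmost column of the composite index
SELECT * FROM product ORDER BY name;

-- Filesort: mixed sort directions (before MySQL 8.0 descending indexes)
SELECT * FROM product ORDER BY category_id ASC, name DESC;
```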

2. Avoid redundant and duplicate indexes

MySQL allows multiple indexes to be created on the same column, and duplicate indexes need to be maintained separately. If these indexes can be utilized in the query, the query optimizer will need to consider each index individually when evaluating the cost, increasing the performance cost.

It is possible to create several indexes on the same column with the same column order but different names. We should avoid this, and remove duplicates as soon as they are discovered. For example:

```sql
CREATE INDEX index_name1 ON t(name);
CREATE INDEX index_name2 ON t(name);        -- duplicate of index_name1
CREATE UNIQUE INDEX index_name3 ON t(name); -- also indexes the same column
```
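Exact duplicates like these can be spotted from the data dictionary. A hedged sketch (the schema name 'your_db' is a placeholder): group each index into its column list, then report column lists shared by more than one index.

```sql
SELECT s.TABLE_NAME, s.cols, GROUP_CONCAT(s.INDEX_NAME) AS duplicate_indexes
FROM (
    SELECT TABLE_NAME, INDEX_NAME,
           GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS cols
    FROM information_schema.STATISTICS
    WHERE TABLE_SCHEMA = 'your_db'
    GROUP BY TABLE_NAME, INDEX_NAME
) s
GROUP BY s.TABLE_NAME, s.cols
HAVING COUNT(*) > 1;
```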

Redundant indexes differ from duplicate indexes. For example, if an index on (A, B) exists and an index on (A) is then created, the latter is redundant, because A is already the leftmost prefix of (A, B).

For InnoDB, the primary key columns are already included in every secondary index, so that form of redundancy is unavoidable.

Redundant indexes are not needed in most cases, and when an index does not satisfy our query, our first thought should be how to extend it. Try not to create new indexes, the existence of unreasonable indexes will only increase our space consumption and aggravate performance loss.

However, redundant indexes are sometimes required for performance reasons because extending an existing index can cause it to become too large, affecting query performance when used by other queries.

3. Indexes and locks

Indexes let queries lock fewer rows. If a query never accesses rows it does not need, it locks fewer of them, which benefits performance in two ways: first, although InnoDB row locks are efficient and use little memory, locking rows still carries overhead; second, locking more rows than necessary increases lock contention and reduces concurrency.

4. Reduce fragmentation of data and indexes

First, how does fragmentation occur? Mass deletions, modifications of index column values, and random out-of-order inserts all fragment data and indexes.

B-tree indexes may be fragmented, which reduces query efficiency. A fragmented index may be stored on disk in a poor or disordered manner.

By design,B-Tree requires random disk access to locate leaf pages, so random access is unavoidable.

Queries perform better if the leaf pages are physically distributed sequentially and tightly.

Otherwise, the speed may be many times slower for range queries, index coverage scans, etc., and even more obvious for index coverage scans.

The data store of the table can also be fragmented. However, fragmentation of data stores is more complex than indexing. There are three types of data fragmentation:

Row fragmentation

This means a single row is stored in multiple pieces in different places. Row fragmentation degrades performance even when a query reads only one record via the index.

Inter-row fragmentation

Inter-row fragmentation means that logically sequential pages are not stored sequentially on disk. It significantly affects operations like full table scans and clustered index scans, which otherwise benefit from sequential data layout on disk.

Free space fragmentation

Free space fragmentation means there is a large amount of unused space inside data pages, which causes the server to read a lot of data it does not need.

All three types of fragmentation can occur in MyISAM tables. InnoDB, however, never fragments short rows: it moves them and rewrites them in a single piece.

Data can be reorganized by running OPTIMIZE TABLE or by exporting and re-importing the table. This works for most storage engines.

  • For InnoDB tables, MySQL actually reconstructs tables and indexes online and recollects statistics.

Notes on defragmenting tables with OPTIMIZE TABLE:

1. Do not defragment every hour or every day; depending on the actual workload, once a week or once a month is enough.

2. OPTIMIZE TABLE applies only to MyISAM, BDB, and InnoDB tables, and is most useful for MyISAM. Not every table needs defragmenting; generally only tables containing the variable-length column types described above need it.

3. While OPTIMIZE TABLE runs: MyISAM keeps the table locked. For InnoDB before MySQL 5.6.17, OPTIMIZE TABLE does not use online DDL, so concurrent DML (INSERT, UPDATE, DELETE) on the table is not allowed while it runs.

For InnoDB tables, OPTIMIZE TABLE reports "Table does not support optimize, doing recreate + analyze instead" and rebuilds the table. We can achieve the same effect explicitly with ALTER TABLE ... ENGINE=InnoDB.
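For example, rebuilding the chapter table from earlier (the table name is just an illustration):

```sql
OPTIMIZE TABLE chapter;
-- InnoDB responds: "Table does not support optimize, doing recreate + analyze instead"

-- Equivalent explicit rebuild:
ALTER TABLE chapter ENGINE = InnoDB;
ANALYZE TABLE chapter;
```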

conclusion

In MySQL, B-tree indexes are used in most cases. Applied in the right scenarios, they can greatly improve query response time.

There are three principles to keep in mind when selecting indexes and using them for queries:

  1. Single-row access is slow, especially on mechanical (spinning) disks.

  2. Sequential access to range data is fast.

    Sequential I/O avoids repeated disk seeks and is faster than random I/O.

    If the server can read the data in order, no extra sort is needed, and GROUP BY does not have to sort before aggregating by group.

  3. Index coverage is fast. If the index contains all the columns that need to be queried, the storage engine does not need to go back to the table to look up, avoiding a large number of single-row accesses.