After table creation and index creation with appropriate data types, the work is not done – you need to maintain tables and indexes to ensure that they work well. The main goals of data table maintenance are to find and repair conflicts, maintain accurate index statistics, and reduce fragmentation.

Find and fix table conflicts

The worst thing about data tables is that they collide. When using the MyISAM storage engine, collisions are usually caused by crashes. However, all storage engines can suffer from index collisions when there are hardware failures, MySQL bugs, or operating system bugs.

Conflicting indexes can result in incorrect query results, an increase in duplicate index errors when there are no duplicate values, and even a full table scan or crash. If you encounter an unexpected event, such as an error that you do not think will occur, run the CHECK TABLE command to CHECK for TABLE conflicts (note that some database engines do not support this command, and others support multiple options to specify how to CHECK tables). Typically, the CHECK TABLE command catches most of the TABLE and index errors.

You can use the REPAIR TABLE command to REPAIR TABLE errors, but not all storage engines support this command. In this case, you need to execute a “no operation” ALTER statement, such as changing the engine of a table to the same as the current engine. For example, you can execute the following statement on InnoDB tables:

ALTER TABLE innodb_tb1 ENGINE=INNODB;
Copy the code

Instead, you can use a storage engine-specific offline repair tool, such as Myisamchk, or export data and re-import it. However, you may not be able to use these methods if the conflict occurs in the system area, or in the data row area of the table, rather than the index. In this case, you may need to restore data from your backup or from conflicting files.

If you run into conflicts in InnoDB, this is a serious mistake and you need to analyze the problem in the right way. InnoDB usually does not have conflicts. Its design is robust for conflict handling. Conflicts can be the result of hardware failures (such as memory errors or disk errors), DBA errors (such as manipulating database files outside the MySQL environment), or (very rarely) bugs in InnoDB itself. A common reason is similar to a view error using the rsync tool to create a backup. There are no queries to execute at this point – because that would cause InnoDB data collisions, which you think would be avoided. If you cause InnoDB data conflicts with a questionable query, it’s not your fault, it’s an InnoDB Bug.

If you do encounter a data conflict, the most important thing is to understand the cause of the conflict. Don’t simply fix the data before then, and the conflict may disappear automatically. You can use the innodb_force_recovery parameter to restore InnoDB to a forced recovery mode (see MySQL’s manual). You can also use the open source Percona InnoDB data recovery tools (www.percona.com/software/my)… Extract data from a corrupted data file.

Update index statistics

The MySQL query optimizer calls two apis to get the distribution of index values before deciding how to use the index. The first is the records_in_range method, which takes a range parameter and returns the number of results for that range. The result is accurate for MyISAM, but is an estimate for InnoDB.

The second API is the INFO method, which returns multiple types of data, including index candidates (that is, estimates of the number of records for each index).

When the storage engine gives the query optimizer inaccurate information about the number of rows of data, or when the query plan is too complex to estimate the exact number of rows, the optimizer uses index statistics to estimate the number of rows of data. The MySQL optimizer makes decisions based on the cost of the query, the primary cost criterion being the amount of data to be found in the query. If the index statistics are never generated or are out of date, the optimizer may make the wrong decision. The solution is to run the ANALYZE TABLE command, which recreates the index statistics.

Each storage engine implements index statistics in a different way, so you can run the ANALUZE TABLE command differently and at different costs. A typical storage engine handles index statistics as follows:

  • The Memory engine does not store index statistics.
  • MyISAM stores index statistics on disk, and the ANALYZE TABLE uses full index scans when calculating candidate rows. The entire table is locked during this process.
  • InnoDB does not store index statistics on disk in MySQL 5.5. Instead, InnoDB performs random index sampling and stores the results in memory.

You can check the candidates for an INDEX by running the SHOW INDEX FROM command. Such as:

This command gives you a lot of indexing-related information. Check the MySQL manual for details. The Cardinality column is of particular concern here. This column shows how many different values correspond to the index estimated by the storage engine. In MySQL 5.0 and later, this information is also conveniently available from the information_schema.statistics table. For example, you can follow the INFORMATION_SCHEMA query to find indexes with low filtering. Note, however, that for servers with large data volumes, these intermediate tables can cause a significant increase in server load.

InnoDB’s statistics are worth further study. The statistical results are calculated by random sampling of index data pages, assuming that the remaining unsampled data is similarly distributed. In old InnoDB versions, the number of pages sampled was 8, but the latest version can be adjusted using the innodb_stats_sample_pages variable. Setting this value to a value greater than 8 helps to generate more representative index statistics, especially for large tables, but at a different cost.

InnoDB calculates index statistics when the TABLE is first opened, ANALUZE TABLE is run, and the TABLE storage size changes significantly (1/16 of a change or 2 billion rows inserted).

InnoDB also calculates INDEX statistics for certain queries on INFORMATION_SCHEMA tables, such as running SHOW TABLE STATUS, executing SHOW INDEX queries, or MySQL command line client with autocomplete enabled. This can actually cause serious problems for servers with large data volumes, or slow I/O speeds. Resampling by a client program or monitoring tool can result in a lot of locking and a heavy load on the server, as well as an impact on end user startup time. Because the SHOW INDEX command updates the INDEX statistics, you can’t see the INDEX statistics if you don’t change them. You can avoid these problems by disabling the Innodb_stats_on_metadata option (off by default). The following command can find InnoDB index statistics related system variables.

SHOW GLOBAL VARIABLES WHERE Variable_name like 'innodb_stats%'
Copy the code

If you are using a Percona server that contains the Percona XtraDB storage engine that replaces InnoDB, you can do further configuration. The Innodb_statS_auto_UPDATE option allows you to disable automatic sampling and effectively freeze automatic statistical calculations unless you run ANALYZE TABLE manually. This lets you get rid of volatile queries. This feature was added at the request of customers of large deployment systems.

For greater query plan stability and faster system startup, you can use system-level tables to store index statistics. This method does not require recalculation of index statistics when the system restarts or InnoDB starts the table for the first time. This feature is already supported in Percona 5.1 and is already supported in standard MySQL 5.6. Percona server this feature is enabled with the innodb_use_sys_stats_table option. After MySQL 5.6, this is controlled by the Innodb_stats_persistent option, which defaults to ON. At the same time, there is also a variable control single table, innodb_stats_auto_recalc variable is ON by default, the variation in the data table of more than 10% when a recount the table index statistics (manual can refer to: dev.mysql.com/doc/refman/…). .

If you are not configured to automatically update the index statistics, you need to update the index statistics periodically using the ANALYZE TABLE command, unless you know that not updating the index statistics will not result in a poor query plan.