This article summarizes some common MySQL usage tips for teams without dbAs. InnoDB is used as storage engine unless otherwise specified.

The characteristics of the MySQL

Understanding the features of MySQL helps you use MySQL better. The biggest difference between MySQL and other common databases is the concept of storage engine. Storage engine is responsible for storing and reading data. Different storage engines have different features. You can select a suitable storage engine or even develop a new one. The logical architecture of MySQL is as follows:

The default storage engine of MySQL is InnoDB. The main features of the storage engine are:

  • Support transaction processing
  • Row-level locking is supported
  • Data is stored in a table space, which consists of column data files
  • Using MVVC(Multi-version Concurrency control) mechanism to achieve high concurrency
  • Tables are built based on clustered indexes for primary keys
  • Support hot backup

Other common storage engine features:

  • MyISAM: the default MySQL engine does not support transaction and row-level locking. Developers can manually control table locking. Support full-text indexing; Unable to safely recover from a crash; Compressed tables are supported. Data in compressed tables cannot be modified but occupy less space, improving query performance
  • Archive: Supports only Insert and Select. Batch inserts are fast and data is queried through full table scan
  • SCV: An SCV file is treated as a table
  • Memory: Data is stored in Memory

There are many more, I will not list them all.

Data type optimization

Principles for selecting data types:

  • Select data types that take up less space
  • Choose simple types
  • Avoid unnecessary empty columns

Types with small footprint save more hardware resources, such as disks, memory, and CPU. Try to use simple types, such as int, rather than char, because the sorting of the latter involves character set selection and is more complex than using int. Nullable columns use more storage space, and MySQL requires extra bytes to record if indexes are created on nullable columns. When creating a table, the default value is nullable, which is easily ignored by the developer. It is best to manually change the value to nullable if the data to be stored is not nullable.

The integer types

Integer types include:

  • tinyint
  • smallint
  • mediumint
  • int
  • bigint

They use 8, 16, 24, 32, and 64 bits respectively to store numbers that can be represented

The number of ranges can be preceded by an unsigned modifier that doubles the representable range of a positive number but does not represent a negative number. In addition, there is little use in specifying the length of an integer; once the data type is fixed, the length is fixed accordingly.

The decimal type

  • float
  • double
  • decimal

Float and double are just float and double in the usual sense. The former stores data in 32 bits, and the latter stores data in 64 bits.

The type of Decimal is complex, supports accurate calculation, and occupies a large space. Decimal uses every 4 bytes to represent 9 numbers. For example, Decimal (18,9) indicates that the number length is 18, including 9 small digits,9 integer digits, and the decimal point itself, occupying a total of 9 bytes. Consider using Bigint instead of Decimal for large data volumes due to its large footprint and the complexity of precision computations, and you can do some scaling of real data before persisting and reading.

String type

  • varchar
  • char
  • varbinary
  • binary
  • blob
  • text
  • The enumeration

Varchar data takes up space equal to the length of the string plus 1 or 2 bytes used to record the length of the string (when row-format is not set to fixed). Varchar is space-saving. Varchar is used when the length of a column in a table varies greatly.

The actual footprint of char is fixed, and chart is suitable when the string data in the table is of similar length or very short.

Varchar and char are matched by varbinary and binary, the latter of which stores binary strings. The latter is case-sensitive and performs comparisons faster regardless of encoding.

Note that: Although varchar(5) and varchar(200) use the same storage space to store the string “hello”, it does not mean that setting the length of vARCHar too large will not affect performance. In fact, some internal calculations of MySQL, For example, when creating temporary tables in memory (some queries cause MySQL to create temporary tables automatically), a fixed amount of space is allocated to store data.

Blob uses binary strings to hold large text, text uses characters to hold large text, and InnoDB uses special external storage for such data. Only Pointers to them are stored in rows. Such data should not be indexed (they should only be created against string prefixes), but nobody does.

If a column contains a large number of repeated strings and has limited contents, enumerations can be used instead. MySQL maintains a “numeral-string” table when handling enumerations. Using enumerations can save a lot of storage space.

Time to type

  • year
  • date
  • time
  • datetime
  • timestamp

The datetime store ranges from 1001 to 9999, accurate to the second. Timestamp stores the number of seconds since midnight on January 1, 1970, and can represent the year 2038. It takes up 4 bytes, half the space of datetime. The timestamp column has a timestamp value that depends on the time zone. MySQL automatically updates the first timestamp column to the current time when an INSERT or update statement is executed. A lot of tables have a column called UpdateTime, and it’s appropriate that this column uses TIMESTAMP, and it’s automatically updated, provided the system doesn’t use it until 2038.

Primary key type selection

Use integers whenever possible; they take up less space and can be set to auto-grow. In particular, do not use GUID, MD5 and other hash strings as primary keys, which are very random. Since InnoDB primary keys are clustered index columns by default, data storage is too scattered. In addition, InnoDB’s secondary index columns contain primary key columns by default. If the primary key is too long, the secondary index will take up too much space.

Special type of data

It is best to use 32-bit unsigned integers to store IP addresses. MySQL provides functions inet_aton() and inet_ntoa() to convert numeric and string representations of IP addresses.

The index optimization

InnoDB uses a B+ tree to implement indexes. For example, suppose there is a People table

CREATE TABLE `people` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `Name` varchar(5) NOT NULL,
  `Age` tinyint(4) NOT NULL,
  `Number` char(5) NOT NULL COMMENT 'number',
  PRIMARY KEY (`Id`),
  KEY `i_name_age_number` (`Name`,`Age`,`Number`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8;
Copy the code

Insert data:

Its index structure looks something like this:

That is, the order of the index columns is important; if the Name column of two rows is the same, the Age column is used to compare the size, and if the Age column is the same, the Number column is used to compare the size. Sort by column 1, then column 2, and finally column 3.

The use of queries should be matched as far as possible from left to right, in addition, if the left column range lookup, the right column cannot use the index; Also, do not separate the query column, otherwise the following index will not be used. The following SQL is a positive example:

  • SELECT * from people where Name= ‘Abel’ and Age = 2 and Number = 12312
  • SELECT * from people where NameAbel = ‘ ‘
  • SELECT * from people where NameLike the ‘Abel %’
  • SELECT * from people where Name = ‘Andy’ and Age BETWEEN 11 and 20
  • SELECT * from people ORDER BY NAME
  • SELECT * from people ORDER BY NAME, Age
  • SELECT * from people GROUP BY Name

The following SQL is a bad example:

  • SELECT * from people where Age = 2
  • SELECT * from people where NAME like ‘%B’
  • SELECT * from people where age = 2
  • SELECT * from people where NAME = ‘ABC’ AND number = 3
  • SELECT * from people where NAME like ‘B%’ and age = 22

A technique for creating indexes using Hash values

If the table has a column that stores a long string, let’s say the name is URL, and the index created on that column is large, there is a way to alleviate this: create an index of the numeric hash value of the URL string. Create a new field, such as URL_CRC, to place the URL hash value, and create an index for this field.

select * from t where URL_CRC = 387695885 and URL = 'www.baidu.com'

If the amount of data is large, to prevent hash conflicts, you can customize the hash function, or use part of the return value of the MD5 function as the hash value:

SELECT CONV(RIGHT(MD5('www.baidu.com'),16), 16, 10)

The prefix index

If a string column stores long data and creates a large index, you can shorten the index size by using a prefix index, that is, indexing only the first few characters of the string. Obviously, this type of index does not work with order BY and group BY.

It is important to choose the prefix length when creating a prefix index. Choose the shorter prefix as possible without destroying the original data distribution. For example, if most strings start with “ABC”, the index value will contain too many repetitions of “abcX” if the limited prefix index length is 4.

The column index more

The indexes created on “People” mentioned above are multi-column indexes, which are often better than multiple single-column indexes.

  • When performing and queries against multiple indexes, you should create multi-column indexes, not single-column indexes
  • Try something like this:

select * from t where f1 = 'v1' and f2 <> 'v2' union all select * from t where f2 = 'v2' and f1 <> 'v1'

The order of multi-column indexes is important. In general, when sorting and grouping queries are not considered, the selectivity (selectivity refers to the number of different columns in a table index/total number of rows) should be considered. High selectivity means less duplicate data.) The large columns come first. For example, if the Name of the People table is more selective than the Age table, the query statement should read:

select * from people where name = 'xxx' and age = xx

The Name column is appropriate to place on the left side of the index, but if an SQL execution has the highest rating, for example

select * from people where name = 'xxx' and age = 20

When the number of records with age=20 is very small in the database, it is more efficient to place age at the left end of the index column. Age =20; age=20; age=20; age=20;

Clustering index

A clustered index is a data storage structure. InnoDB stores rows directly in the leaf node of an index with a primary key, rather than just the values of the index column and the primary key of the row to which it points, as secondary indexes do. Because of this feature, a table can have only one clustered index. If a table does not define a primary key and does not define a column with a unique index, InnoDB generates a hidden column and sets it as a clustered index column.

Cover index

Simply put, some queries only need to query the index column, eliminating the need for a second query based on the primary key ID of the index B-tree node record.

Duplicate and redundant indexes

Creating an index repeatedly on a column does no good, only harm, and should be avoided. For example, creating unique indexes and normal indexes for the primary key is overused because InnoDB’s primary key is clustered by default.

A redundant index is different from A duplicate index. For example, an index (A,B) and another index (A) are called redundant indexes. The former can replace the latter, but the latter cannot replace the former. But (A,B) and (B), as well as (A,B) and (B,A), are not redundant indexes and cannot be substituted for each other.

If A table already has an index (A) and you want to create an index (A,B), you can just extend the index, there is no need to create A new index. Note that if index (A) already exists, there is no need to create index (A,ID), where ID is the primary key, because index A already contains the primary key, which is also redundant.

However, in some cases, redundant indexes are also desirable. If you already have an index (A) and extend it to (A,B), you can consider creating A new index (A,B) because column B is A long type and the query with A is not as fast as before.

An index that is not used

Unused indexes increase the efficiency of insert, UPDATE, and DELETE and should be removed promptly

Index Usage summary

The three-star principle of indexing:

  • An index that puts the records related to a query together in order gets one star
  • A star is awarded if the order of the data in the index matches the order of the query results
  • An index that contains all columns needed for a query is awarded a star

The first principle is that the order of queries in the WHERE condition is the same as that of the index, which is used from left to right.

Index is not a panacea, when the amount of data is huge, the maintenance of the index itself is also costly performance, should consider partition table storage.

Query optimization

The cause of slow query

Whether extra rows are requested from the database

For example, the application only needs 10 pieces of data, but requests all of it from the database, discarding most of it before displaying it on the UI.

Whether extra columns are requested from the database

For example, the application only needs to display 5 columns, but finds all columns by selecting * from

Whether the same query has been executed multiple times

Whether the application can consider a query and then cache, so that later use can use the records from the first query.

Whether MySQL is scanning for additional records

You can get an idea of the number of records that need to be scanned by looking at the execution plan. If this number exceeds your expectations, try adding indexes, optimizing the SQL (which is the focus of this section), or changing the table structure (such as creating a separate summary table for a particular statement query).

The way queries are refactored

  • Decompose a complex query into multiple simple queries
  • A large query is split into smaller queries that perform the same function each time, but only a small part of the query is completed
  • Decompose associated query. You can change a large relational query to query several tables separately and process them in your application code

The squalid

To optimize the count ()

Count counts a specified column or expression and the number of rows. If the argument is passed a column name or an expression, count counts all rows that are not NULL, and if the argument is *, count counts all rows. Here is an example of passing an expression:

SELECT count(name like 'B%') from people

  • Approximation optimizations can be used instead of count(), such as the number of rows in an execution plan.
  • Index overlay scan
  • Add summary tables
  • Increase the number of data items recorded by the memory cache system


Optimization of associated query

  • For example, if two tables A and B are associated by column C, MySQL will traverse table A and then look up the data in table B based on the values of column C traversed. To sum up, in general, if there is no index, only the c column of table B can be indexed
  • Ensure that the columns involved in order BY and Group BY belong to only one table so that they can function as indexes


Optimized subquery

For MySQL5.5 and below, use joins instead of subqueries whenever possible.

Optimize group by and DISTINCT

If possible, try to apply both operations to primary keys.

Optimize limit, such as SQL

SELECT * from sa_stockinfo ORDER BY StockAcc LIMIT 400, 5
Copy the code

The MySQL optimizer will look for all column data on row 405 and discard 400. If you can use overwrite index query, there is no need to query so many columns.

SELECT * FROM sa_stockinfo I JOIN (SELECT StockInfoID FROM sa_stockinfo ORDER BY StockAcc LIMIT 400,5)t ON i.stockinfoid  = t.StockInfoIDCopy the code

StockAcc has an index, and this query will use index coverage to quickly find the primary key that meets the conditions, and then do the joint query, which is effective when the data volume is large.

Optimization of the union

If not necessary, be sure to use the keyword union all so that MySQL does not have to perform uniqueness verification when placing data into temporary tables

A common way to determine whether a record exists is to

select count(*) from t where condition
Copy the code

It’s better to write:

SELECT IFNULL((SELECT 1 from tableName where condition LIMIT 1),0)Copy the code

I hope this will help you,Many PHPer will always encounter some problems and bottlenecks when they are advanced. They have no sense of direction because they write too much business code, so they don’t know where to start to improve. I have sorted out some information about this, including but not limited to: Distributed architecture, high scalability, high performance, high concurrency, server performance tuning, TP6, Laravel, YII2, Redis, Swoole, Swoft, Kafka, Mysql optimization, shell scripting, Docker, microservices, Nginx, etc. Many knowledge points can be shared for free
I need to poke here
PHP Advanced Architect >>> Video, interview documents for free