MySQL performance

Maximum data volume

Talking about performance without considering data volume and concurrency is meaningless. MySQL does not limit the maximum number of records in a table; the limit comes from the operating system's file size limit.

| File system | Single file size limit |
| --- | --- |
| FAT32 | Max 4 GB |
| NTFS | Max 64 GB |
| NTFS5.0 | Max 2 TB |
| EXT2 | 16 GB with a 1024-byte block size; 2 TB with a 4096-byte block size |
| EXT3 | 4 TB with a 4 KB block size |
| EXT4 | Theoretically larger than 16 TB |

According to the Alibaba Java Development Manual, splitting databases and tables is recommended only when a single table exceeds 5 million rows or 2 GB of capacity. Performance is determined by a combination of factors; leaving business complexity aside, the order of influence is hardware configuration, MySQL configuration, data table design, and index optimization. The figure of 5 million is a reference value, not an iron rule.

The author has operated a single table with more than 400 million rows of data, where paging out the latest 20 records took 0.6 seconds:

select field_1, field_2 from table where id < #{prePageMinId} order by id desc limit 20

Here prePageMinId is the minimum id of the previous page of records. While the query speed was fine at the time, it will surely become overwhelmed as the data continues to grow. Splitting databases and tables is a big job with a long cycle and high risk; optimize within the current structure as much as possible first, for example by upgrading hardware and migrating historical data, and split only when there is no other way out. Those interested can read about the basic ideas behind table splitting.

Maximum concurrency

Maximum concurrency is the number of requests the database can process at the same time, determined by max_connections and max_user_connections. max_connections is the maximum number of connections to a MySQL instance, with an upper limit of 16384; max_user_connections is the maximum number of connections allowed per database user. MySQL allocates buffers for each connection, so more connections mean more memory consumption. If the connection limit is set too high, the hardware cannot cope; if too low, the hardware is not fully used. Generally the ratio of used connections to the maximum should exceed 10%, calculated as follows:

max_used_connections / max_connections * 100% = 3/100 * 100% ≈ 3%

To view the maximum number of connections and the per-user connection limit:

show variables like '%max_connections%';
show variables like '%max_user_connections%';
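Note that max_used_connections is a status counter (the peak number of connections in use since startup), not a setting; assuming the server has been running long enough to be representative, the ratio can be read like this:

show global status like 'Max_used_connections';
-- ratio (%) = Max_used_connections / max_connections * 100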

Modify the maximum number of connections in the configuration file my.cnf

[mysqld]
max_connections = 100
max_user_connections = 20

Keep query time under 0.5 seconds

You are advised to keep query time under 0.5 seconds, an empirical value derived from the three-second rule of user experience: if an operation gets no response within 3 seconds, users become bored or even leave. Response time = client UI rendering time + network request time + application processing time + database query time, so 0.5 seconds leaves the database about 1/6 of the total processing time.

Implementation principles

Compared with NoSQL databases, MySQL is a delicate creature. It is like the frail girl in PE class: the slightest dispute upsets her classmates (hard to scale), two steps of running leave her out of breath (low capacity and concurrency), and she is often off sick (too many SQL constraints). Everything is a bit distributed these days, and applications scale much more easily than databases, so the principle is: less work for the database, more work for the application.

  • Make the most of indexes, but do not abuse them; indexes also consume disk and CPU.
  • Do not use database functions to format data; leave that to the application.
  • Foreign key constraints are not recommended; guarantee data accuracy in the application.
  • In write-heavy, read-light scenarios, unique indexes are not recommended; guarantee uniqueness in the application.
  • Add redundant fields where appropriate and try creating intermediate tables, computing intermediate results in the application: trade space for time.
  • Do not execute extremely time-consuming transactions; split them into smaller ones in the application.
  • Anticipate the load and data growth trends of important tables (such as order tables) and optimize them in advance.

Data table design

Data types

When choosing a data type, prefer the simpler type and the one that takes less space:

  • If the value range is sufficient, use tinyint, smallint, or mediumint instead of int.
  • If the string length is fixed, use the char type.
  • If varchar is sufficient, do not use the text type.
  • For high precision, use decimal, or a bigint trick such as multiplying a two-decimal amount by 100 before storing it.
  • Use timestamp instead of datetime whenever possible.
| Type | Bytes | Range |
| --- | --- | --- |
| datetime | 8 | '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999' |
| timestamp | 4 | '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.999999' |

timestamp takes up less space than datetime and is stored in UTC with automatic time-zone conversion.
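A minimal sketch pulling these choices together; the table and column names are illustrative, not from the original text:

create table `order_sample` (
    `id` bigint unsigned not null auto_increment,
    `status` tinyint not null default 0 comment 'small value range: tinyint instead of int',
    `country_code` char(2) not null comment 'fixed length: char',
    `remark` varchar(255) not null default '' comment 'varchar is enough, avoid text',
    `amount_cents` bigint not null comment 'two decimal places stored as integer cents',
    `created_at` timestamp not null default current_timestamp comment 'timestamp instead of datetime',
    primary key (`id`)
);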

Avoid NULL values

A NULL field still occupies space in MySQL and makes indexes and index statistics more complicated. An update from NULL to a non-NULL value cannot be done in place, which is prone to index splits and hurts performance. Where possible, replace NULL with a meaningful default value so that SQL statements do not need "is not null".
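A hypothetical example: give the column a meaningful default instead of allowing NULL, so queries no longer need "is not null":

-- assumed member table; '' stands in for "no nickname"
alter table `member` modify `nick_name` varchar(64) not null default '';

select id from `member` where nick_name <> '';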

Text type optimization

Because a text field stores large amounts of data, the table grows very quickly and degrades query performance on the other fields. It is recommended to extract the text column into a child table associated with the parent by the business primary key, as sketched below.
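A sketch of the split, assuming a hypothetical article table whose large content field is moved into a child table keyed by the business primary key:

create table `article_content` (
    `article_id` bigint unsigned not null comment 'business primary key of the parent article table',
    `content` text not null,
    primary key (`article_id`)
);

-- fetch the large field only when it is actually needed
select content from `article_content` where article_id = 100;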

Index optimization

Index classification

  1. Plain index: the most basic index.
  2. Composite index: an index on multiple fields that speeds up compound query conditions.
  3. Unique index: like a plain index, but the indexed column's values must be unique; NULL values are allowed.
  4. Composite unique index: the combination of column values must be unique.
  5. Primary key index: a special unique index that uniquely identifies a record; NULL is not allowed, and it is usually created by the primary key constraint.
  6. Full-text index: for searching large bodies of text. Since MySQL 5.6, both InnoDB and MyISAM support full-text indexes. Because of limited accuracy and scalability, more enterprises choose Elasticsearch. (Creation statements for each type are sketched after this list.)
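The following statements sketch how each type might be created, under assumed table and column names (member, city, age, and so on):

create index idx_city on member (city);                           -- plain index
create index idx_city_age on member (city, age);                  -- composite index
create unique index uk_login_name on member (login_name);         -- unique index
create unique index uk_mobile_type on member (mobile_no, type);   -- composite unique index
create fulltext index ft_bio on member (bio);                     -- full-text index (MySQL 5.6+)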

Index optimization guidelines

  1. Paging queries matter: if a query returns more than 30% of the table, MySQL will not use the index.

  2. Keep a single table to no more than five indexes, and each index to no more than five fields.

  3. Strings can use prefix indexes; limit the prefix length to 5-8 characters.

  4. It is pointless to index fields with very low cardinality, such as a deleted flag or gender.

  5. Make good use of covering indexes, for example:

    select login_name, nick_name from member where login_name = ?

A composite index on login_name and nick_name is faster than an index on login_name alone, because the query above can be answered entirely from the index.
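A sketch of that covering index, plus a prefix index as suggested in point 3; the index names and the address column are assumptions:

-- covering index: the query above is answered entirely from the index
create index idx_login_nick on member (login_name, nick_name);

-- prefix index: index only the first 8 characters of a long string column
create index idx_address_prefix on member (address(8));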

SQL optimization

Batch processing

When the author was young, he saw a fish pond with a small outlet for releasing water, and all kinds of objects floating on the surface. Duckweed and leaves always made it through the outlet, while branches blocked other objects and sometimes got stuck, needing to be cleared by hand. MySQL is the fish pond, the maximum concurrency and network bandwidth are the outlet, and user SQL statements are the floating objects. Queries without paging parameters, or update and delete operations affecting large amounts of data, are the branches: break them up and process them in batches. For example, to set expired coupons to the unavailable state:

update coupon set status = 0 where expire_date <= #{currentDate} and status = 1;

If a large number of coupons need updating, executing this single SQL statement may block other SQL. Batch-processing pseudocode follows:

int BATCH_SIZE = 100;
while (true) {
    // Each batch flips status to 0, so the remaining matching rows shift forward;
    // always read the first page instead of advancing an offset, or rows get skipped.
    List<Integer> batchIdList = queryList('select id from `coupon` where expire_date <= #{currentDate} and status = 1 limit #{BATCH_SIZE}');
    if (CollectionUtils.isEmpty(batchIdList)) {
        return;
    }
    update('update `coupon` set status = 0 where status = 1 and id in (#{batchIdList})');
}

Optimizing the <> operator

In general, the <> operator cannot use an index. The following example queries orders whose amount is not 100:

select id from orders where amount != 100;

If very few orders have an amount of 100 and the data distribution is highly uneven, the index may still be used. Given this uncertainty, use a union to combine the results, rewritten as follows:

(select id from orders where amount > 100)
 union all
(select id from orders where amount < 100 and amount > 0)

OR optimization

OR cannot use a composite index under the InnoDB engine, for example:

select id, product_name from orders where mobile_no = '13421800407' or user_id = 100;

This OR cannot hit the composite index on (mobile_no, user_id); use a union instead, as follows:

(select id, product_name from orders where mobile_no = '13421800407')
union
(select id, product_name from orders where user_id = 100);

The query is most efficient when mobile_no and user_id each have their own index.

IN optimization

  1. IN suits a large outer table with a small inner table; EXISTS suits a small outer table with a large inner table. As the query optimizer keeps improving, the performance is nearly the same in many scenarios.

  2. Try rewriting IN as a join, as shown below:

    select id from orders where user_id in (select id from user where level = 'VIP');

Rewritten with a join:

select o.id from orders o left join user u on o.user_id = u.id where u.level = 'VIP';

Avoid operations on columns

Performing operations on the column in a query condition usually invalidates the index. The following example queries the orders for a given day:

select id from orders where date_format(create_time, '%Y-%m-%d') = '2019-07-01';

The date_format function prevents this query from using the index. Rewritten as a range query, the index can be used:

select id from orders where create_time between '2019-07-01 00:00:00' and '2019-07-01 23:59:59';

Avoid select *

Unless you truly need every column in the table, avoid select *: it returns columns the query does not need and prevents covering indexes from being used.
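For example, reusing the assumed member table and idx_login_nick covering index from the index section:

-- served from the covering index without touching the table rows
select login_name, nick_name from member where login_name = 'foo';

-- select * must read every column, so the covering index cannot be used
select * from member where login_name = 'foo';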

Like optimization

Like is used for fuzzy queries, for example (field is indexed):

SELECT column FROM table WHERE field like '%keyword%';

Because of the leading %, this query cannot use the index.

SELECT column FROM table WHERE field like 'keyword%';

Removing the leading % lets the query hit the index. But what if the product manager insists on matching both before and after the keyword? A full-text index is worth trying, but Elasticsearch is the ultimate weapon.
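A minimal full-text sketch (MySQL 5.6+); the article table and its columns are assumptions:

create fulltext index ft_title_body on article (title, body);

select id from article where match(title, body) against('keyword' in natural language mode);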

Join optimization

join uses the Nested Loop Join algorithm: the result set of the driving table serves as the base data, rows from it are used as filter conditions to fetch matching rows from the next table, and the results are merged. If there are multiple joins, the previous result set becomes the loop data for the next table.

  1. Add query conditions to both the driving and driven tables whenever possible; prefer satisfying conditions in ON over WHERE, and use the small result set to drive the large one.
  2. Add an index on the join field of the driven table; when an index cannot be created, set a sufficient join_buffer_size (a sketch follows this list).
  3. Never join more than three tables; prefer adding redundant fields instead.
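A sketch under these guidelines, reusing the orders and user tables from the IN example; the index name and buffer value are illustrative:

-- index the join field of the driven table
create index idx_user_id on orders (user_id);

-- when no index is possible, enlarge the session join buffer
set session join_buffer_size = 4 * 1024 * 1024;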

Limit optimization

Limit is used for paging queries, and performance deteriorates the further back you page. The solutions are as follows:

select * from orders order by id desc limit 1000000, 10;  -- takes 5.2 seconds

First select only the id to narrow the query scope, rewritten as follows:

select * from orders where id <= (select id from orders order by id desc limit 1000000, 1) order by id desc limit 0, 10;

If only primary key ids are displayed and the ids are continuous, the following can be used:

select id from orders where id between 1000000 and 1000010 order by id desc;

What if the above solutions are still slow? You will have to use a cursor. Readers interested in JDBC can look into implementing paging queries with cursors.

Other databases

As a back-end developer, be proficient in MySQL or SQL Server as the storage core, and also take an active interest in NoSQL databases: they are mature enough, widely adopted, and solve performance bottlenecks in specific scenarios.

| Category | Database | Features |
| --- | --- | --- |
| Key-value | [Memcache](https://memcached.org/) | Content caching; handles high access load on large amounts of data |
| Key-value | [Redis](https://redis.io/) | Content caching; supports more data types than Memcache and can persist data |
| Column store | [HBase](https://hbase.apache.org/) | Core database of the Hadoop ecosystem; massive structured data storage; a big-data essential |
| Document | [MongoDB](https://www.mongodb.com/) | Well-known document database; can also be used for caching |
| Document | [CouchDB](http://couchdb.apache.org/) | Apache open-source project focused on ease of use, with a REST API |
| Document | [SequoiaDB](http://www.sequoiadb.com/cn/) | Well-known document database in China |
| Graph | [Neo4J](https://neo4j.com/) | Builds relationship graphs for social networks, recommendation systems, etc. |