1. Purpose of database optimization

Avoid page access errors

  • Page 5xx error due to database connection timeout
  • The page cannot be loaded due to slow query
  • Data cannot be committed due to blocking

Increase database stability

  • Many database problems are caused by inefficient queries

Optimize the user experience

  • Smooth page access speed
  • Good website functional experience

1.1 Database Optimization Direction

  • Hardware (highest cost, lowest effect)
  • The system configuration
  • Database table structure
  • SQL and indexes (lowest cost, highest performance)

2. SQL optimization

2.1 How do I find faulty SQL?

Use MySQL slow query logs to monitor SQL for efficient issues

2.1.1 Setting Global Variables Enable slow query logs

  1. Query whether the slow query of MySQL logs is enabled
show VARIABLES LIKE 'slow_query_log';
Copy the code
  1. Example Set the file location of slow query logs
# F:/tools/develop-tools/mysql5.7/mysql_log/mysql-slow
set global slow_query_log_file = 'F: / tools/develop the tools/mysql5.7 mysql_log/mysql - missile. The log';
Copy the code
  1. Sets whether to record SQL that does not use an index
set global log_queries_not_using_indexes = on;
Copy the code
  1. Set logging whenever SQL execution exceeds n seconds
0.001 seconds is set here to facilitate testing, generally larger than this
set global long_query_time = 0.001 ;
Copy the code
  1. Enable the MySQL slow query log function
set global slow_query_log = on;
Copy the code

2.1.2 Slow Query of Configuration File Mode Settings

  • Modify the configuration file my.cnf and add the following content under [mysqld]
[mysqld]
slow_query_log = ON
log_queries_not_using_indexes = ON;
slow_query_log_file = F:\tools\develop-tools\mysql5.7\mysql_log\mysql-slow.log
long_query_time = 1
Copy the code
  • View the configured parameters
show variables like 'slow_query%';
show variables like 'long_query__time';
Copy the code

2.1.3 Slowly Querying log Contents

# The time to execute this statement
# Time: 2020-06-01 T01: according to 368780 z
The host where the SQL is executed
# User@Host: root[root] @ localhost [::1] Id: 3
# SQL execution information
# Query_time: 0.006281 Lock_time: 3055 Rows_sent: 2 Rows_examined: 1034
use test;
SQL execution time
SET timestamp=1590976758;
SQL executes content
SHOW VARIABLES LIKE 'slow_query%';
Copy the code

Comparison of five types of slow query logs

Tools/Functions General statistics Advanced statistics The script advantage
mysqldumpslow support Does not support perl Mysql mysql
mysqlsla support support perl Powerful, complete data reports, strong customization ability
mysql-explain-slow-log support Does not support perl There is no
mysql-log-filter support Part of the support perl Keep output concise without losing functionality
Myprofl support Does not support php Very lean
### 2.1.4 How to find problematic SQL through slow query log
  1. SQL that has multiple queries and takes a long time for each query

The first few queries that are usually not parsed by pt-query-digest

  1. IO large SQL

Notice the Rows examine item in the Pt-Query-digest analysis

  1. SQL that did not match the index

Notice the contrast between Rows examine and Rows Send in the Pt-Query-Digest analysis

2.2 How do I Analyze SQL Queries

2.2.1 usingexplainQuery the plan for SQL execution

SQL execute statement
EXPLAIN SELECT customer_id,first_name,last_name FROM customer; 
Copy the code

2.2.2 explain: Returns the meaning of the columns

  1. Table: table name
  2. Type: Displays which type of connection is used. For example: system(system table, small amount of data, often without disk IO), const(constant join), Eq_ref (primary key index or non-empty unique index), REF (non-primary key, non-unique index), range(range scan), index(index tree scan), ALL(full table scan)

System > const > eq_ref > ref > range > index > ALL

  1. Possible_keys: Displays possible indexes that can be applied to this table. If empty, there is no possible index.
  2. Key: indicates the actual index. If NULL, no index is used.
  3. Key_len: the length of the index used. With no loss of accuracy, the shorter the length, the better.
  4. Ref: Shows which column of the index is used, a constant if possible.
  5. Rows: The number of rows that MySQL deems necessary to check to return the request data.

2.2.3 Return value of extra column to note

Using filesort, Using temporary: When you see this, the query needs to be optimized.

2.2.4 Creating an index for a table field

The payment query is a full table query

# query statement
explain select max(payment_date) from payment;
Copy the code

The results are shown below: Add an index to the payment_date field

Select idx_paydate from payment_date; select idx_paydate from payment_date
CREATE index idx_paydate on payment(payment_date;)
Copy the code

The effect is shown below:

2.3 Optimization methods of Count() and Max()

Need: Find the number of 2006 and 2007 movies in one SQL – optimize the count() function

SELECT count(release_year = '2006' or NULL) as 'Number of Films in 2006'.COUNT(release_year='2007' or NULL) as 'Number of Films in 2007' from film;
Copy the code

2.3.1 Difference between count(*) and count(field)

Count (field) does not contain cases where the field is NULL, while count(*) does.

Select * from staff table
SELECT count(*),count(picture) from staff;
Copy the code

The result is:

2.4 Optimization of sub-query

In general, bragging bar subqueries are optimized to join queries, but it is important to pay attention to whether there is a one-to-many relationship between the associated keys (duplicate data may occur in join, and the distinct function is used to remove the duplicate data).

** Query all movies that Sandra appears in

explain select title,release_year,LENGTH FROM film where film_id IN(select film_id from film_actor where actor_id IN(select actor_id from actor where first_name = 'sandra'));
Copy the code

2.5 Group by Query Optimization

Before optimize SQL

Using (id) is equivalent to on a.id = b.id
explain select actor.first_name,actor.last_name,count(*) FROM film_actor INNER JOIN actor USING(actor_id) GROUP BY film_actor.actor_id;
Copy the code

Using filesort, Using temporary

The optimized SQL

explain select actor.first_name,actor.last_name,c.cnt FROM actor INNER JOIN (SELECT actor_id,count(*) as cnt from film_actor GROUP BY actor_id) as c USING(actor_id);
Copy the code

Using filesort or temporary for extra

2.6 Limit Query Optimization

Limit is usually used for pagination, often with an order by clause, so Filesorts is often used which can cause a lot of IO problems

Limit 50, 5 = 5 limit 50, 5 = 5 Order by indicates ascending order (default) and order by XXX desc indicates descending order
SELECT film_id,description FROM film ORDER BY title limit 50.5;
Copy the code

SQL > select * from table where type=ALL; SQL > select * from table where type=ALL1000Therefore, when the amount of data is large, it will cause great I/O problems, and SQL needs to be optimized.

Optimization strategy:

  • Use indexed columns or primary keys for Order BY operations
    SELECT film_id,description FROM film ORDER BY film_id limit 50.5;
    Copy the code

    A primary key index query is used55Bar data and does not use file sorting, so it is greatly optimized.But when we change the constraint toLimit of 500, 5The scanned table is505Line. So it needs to be optimized

  • Records the primary key returned last time. Primary key filtering is used in the next query
    SELECT film_id,description FROM film where film_id>55 and film_id<=50 ORDER BY film_id limit 1.5;
    Copy the code

    I’m only scanning at this point5Strip data and file sorting is not used, so the optimization is greater, but there isThe premise conditionIf (1,2,5) the primary key is in order, and the order is not in order (e.g., 1,2,5), the primary key is in order (e.g., 1,2,5). If (1,2,5), the primary key is in order (e.g., 1,2,5).

3. Index optimization

3.1 How to select an appropriate column to build an index

  • The columns that appear in where,group by,order by, on clauses
  • The smaller the index field, the better
  • Place columns with high dispersion in front of the joint index (dispersion: there are many different values for this field in the database)

For example, select * from payment where staff_id = 2 and customer_id = 584; Is the index (customer_id, staff_id)? Or index(customer_id,staff_id)?

Answer: Since customer_id is more discrete, index(customer_id,staff_id) should be used

3.2 Index optimization method of SQL

  • Repeat index

A duplicate index is an index of the same type that is created on the same column in the same order.

# id add primary key and unique(id) to duplicate index
create table test(
  id int not null primary key.name varchar(10) not null,
  title varchar(50) not null.unique(id))engine=innodb;
Copy the code
  • Redundant indexes

Redundant indexes refer to indexes that have the same prefix column or contain a primary key in the joint index. Key (name, ID) in the following column is a redundant index

# id add primary key and key(name,id) as redundant indexes
create table test(
  id int not null primary key.name varchar(10) not null,
  title varchar(50) not null.key(name.id))engine=innodb;
Copy the code
  • Lookup from duplicate and redundant indexes

Method 1: Use MySQL’s Information_SCHEMA database to search for duplicate and redundant indexes

USE information_schema;
SELECT a.table_schema AS 'Database', a.table_name AS 'the name of the table', a.index_name AS Index of '1', b.index_name AS 'index 2', a.column_name AS 'Duplicate column name'
FROM information_schema.statistics a
    JOIN statistics b ON a.table_schema = b.table_schema
        AND a.table_name = b.table_name
        AND a.seq_in_index = b.seq_in_index
        AND a.column_name = b.column_name
WHERE a.seq_in_index = 1
    ANDa.index_name ! = b.index_nameCopy the code

The results are as follows: Attendace-system s_admin; attendace-system s_adminRepeat indexId.

Check duplicate and redundant indexes using pt-duplicate-key-checker.

Usage Method pt-duplicate-key-checker – HXXX (host) -uxxx(username) -pxxx(password)

3.3 Methods of index maintenance

  • Delete unused index

MySQL currently does not record the index usage, but in PerconMySQL and MariaDB you can use the INDEX_STATISTICS table to check which indexes are not used. However, in MySQL, the index usage can only be analyzed through slow query logs and pt-index-Usage tool.

4. Database structure optimization

4.1 Select an appropriate data type

Data type selection, the key is appropriate two words, how to determine whether the selected data type is appropriate?

  • Use the smallest data type that can hold your data
  • Use simple data types. Int is easier to handle with MySQL than vARCHar
  • Use not NULL to define fields whenever possible
  • Use text as little as possible, and consider tables when you must.

4.1.1 Storage Date

Use FROM_UNIXTIME() to convert an int timestamp to a date-time format,UNIX_TIMESTAMP() to convert a date-time format to an int timestamp. Two functions to do the conversion.

# to create table
create table test(id int AUTO_INCREMENT NOT NULL,timestr INT,primary key(id));
# insert data
insert into test(timestr) values(unix_timestamp('the 2014-06-01 13:12:00'));
select FROM_UNIXTIME(timestr) from test;
Copy the code

4.1.2 Storage IP Address

Bigint is used to store IP addresses, and INET_ATON() is used to convert IP into an unsigned integer (4-8 bits), and INER_NTOA() is used to convert IP into a string address

create table sessions(id int AUTO_INCREMENT NOT NULL,ipaddress BIGINT,primary key(id));
# insert data
insert into sessions(ipaddress) values(INET_ATON('192.168.0.1'));
select INET_NTOA(ipaddress) from sessions;
Copy the code

4.2 Formalized optimization of database tables

Normalization refers to the specification of database design. At present, normalization generally refers to the third design paradigm, that is, the transfer function dependence of any candidate key field in the data table is required to conform to the third paradigm if there is no non-key field.

Tables that do not meet the requirements of the third normal Form have the following problems:

  • Data redundancy
  • Data insertion is abnormal. Procedure
  • Data update is abnormal
  • Data deletion is abnormal. Procedure

Anti-formalization refers to the consideration of the future query efficiency, adding the redundancy of the table that originally conforms to the third normal form appropriately, so as to optimize the query efficiency. Anti-formalization is a kind of operation that swaps space for time.

Vertical splitting is the practice of splitting a table with many columns into multiple tables, which solves the problem of table width. In general, vertical splitting can be carried out according to the following principles.

  • Store uncommon fields in a separate table
  • Store large fields in a separate table
  • Put together fields that are often used together

Horizontal split is to solve the problem of large amount of data in a single table. The structure of each table in horizontal split is identical.

Challenges of horizontal splitting: 1. Query data across partitioned tables. 2. Operation of statistics and background reports.

5. Optimize the system configuration

5.1 Optimization of database System Configuration

5.1.1 Optimized the OPERATING system Configuration

The database is based on the operating system. At present, most MySQL is installed on Linux. Therefore, some configuration parameters of the operating system also affect the performance of MySQL

  • For network configuration, modify the /etc/sysctl.conf file
# increase the number of queues supported by TCP net.ipv4.tcp_max_syn_backlog = 65535 Net.ipv4. tcp_max_tw_buckets = 8000 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_fin_timeout =  10Copy the code
  • Limit on the number of open files.

Can use ulimit – a directory of limitation, can modify the/etc/security/limits file, add the following content to modify opens the file number of restrictions

# soft nofile 65535
# hard nofile 65535
Copy the code

In addition, it is best to turn off iptables, Selinux and other firewall software on MySQL server

5.2 MySQL Configuration File Optimization

In most cases, the configuration file is located in /etc/my.cnf or /etc/mysql.my.cnf. In Windows, the configuration file can be located in C:/ Windows /my.ini. The order in which MySQL looks for configuration files can be obtained by using the following method

$ /usr/sbin/mysld --verbose --help | grep -A 1 'Default options'
Copy the code

Note: If a configuration file exists in more than one place, the last one overwrites the previous one.

5.2.1 Description of Common Parameters

  • innodb_buffer_pool_size

A very important parameter to configure the Innodb buffer pool. If the database has only Innodb tables, it is recommended to configure 75% of the total memory

  • innodb_buffer_pool_instances

The number of buffer pools can be controlled; by default there is only one buffer pool

  • innodb_log_buffer_size

The size of the innoDB log buffer is generally not too large because the log is refreshed at most every second

  • innodb_flush_log_at_trx_commit

Key parameters, which greatly affect innoDB IO efficiency. The default value is 1. The value can be 0, 1, or 2. You are advised to set the value to 2.

  • Innodb_read_io_threads, innodb_write_io_threads

Determines the number of I/O processes Innodb reads and writes. Default is 4

  • innodb_file_per_table

The key parameter controls Innodb to use a separate table space for each table. The default value is OFF, which means that all tables are created in a shared table space

  • innodb_stats_on_metadata

Determines under what circumstances MySQL will refresh Innodb table statistics

5.3 Using third-party Configuration Tools

  • Percon Configuration Wizard

6. Server hardware optimization

6.1 How do I Select a CPU

** Consider: ** Should YOU choose a faster CPU with a single core or a CPU with more cores?

  • MySQL can only use a single-core CPU for some of its work. For example, Replicate, SQL
  • MySQL does not support more CPU cores faster. MySQL5.5 use no more than 32 cores on a 1 server

6.2 Disk I/O Optimization

This section describes common RAID levels

  • RAID0: Also called stripe, it is used to link multiple disks into one hard disk. This level of I/O is the best.
  • “RAID1” : Also called mirroring, the raid group must have at least two disks that store the same data.
  • RAID5: Combines multiple (at least three) hard disks into one logical disk. Data is read and written to create parity effect information, and the parity effect information and corresponding data are stored on different disks. When data on a RAID5 disk is corrupted, the remaining data and the corresponding parity effect information are used to recover the corrupted data
  • RAID1+0: The combination of RAID1 and RAID0 has the advantages and disadvantages of both levels. It is generally recommended that databases use this level.

Consider: Are SNA and NAT appropriate for a database?

  1. Often used in high-availability solutions
  2. Sequential reads and writes are efficient, but random reads and writes are not
  3. The database has a high random read/write ratio

Document video: MySQL Optimization for Performance Optimization