preface

About the previous knowledge points I will not repeat, have not seen the friends can enter my home page for access (foreword part with air tickets). This article will be the summary of the whole topic, and will be the most frequently asked question by interviewers – “What are your thoughts on MySQL performance tuning?”

Many friends out interview should be basically will be asked this question, but may be able to answer less perfect, read my project and able to take things into his belly friend should to be easing the bit in the interviewer ha ha ha ha (for senior), I hope this article today after we can integrate the scattered knowledge in their mind, I look forward to your coming back to tell me the good news (of course, you can complain) and to making progress together with you.

Back to the topic, about the MySQL performance optimization knowledge point, I will be divided into two length of the article to output, about the SQL statement performance optimization I will write a separate length, statement optimization in practice is the highest level of performance optimization point, I hope you can also digest.

The MySQL project is I has been prepared from years ago, just for the holiday at home is okay I have been thinking about how to go to this part of the article, let everybody can see clearer thinking, more profound memory, in the end I was released by the brain figure, and then according to the map in the direction of project knowledge, After all, this way I can write the article with a clear mind, so that everyone can read the article with a clear mind, and review the knowledge points according to the brain map.

I started to write the blog post last year. Before that, I took more notes on cloud notes. I think taking notes and summarizing is of great help to self-improvement, and sharing it is also hoped that everyone can learn new knowledge from it, and help me to improve together and give me some suggestions. Let me share the summary with you at the same time I can also check up on the missing.

I haven’t decided what to write about my next topic yet. If you have any ideas, please leave a message to me on the official account.

As usual, first on the plane ticket:

  1. How is a query executed
  2. How is an update statement executed
  3. Innodb logical storage structure;
  4. MySQL > index data model (B+Tree)
  5. MySQL related (4) – Performance optimization key index
  6. MySQL – Transaction features and isolation levels
  7. MySQL > Implement transaction Isolation level (MVCC)
  8. MySQL (7) – Innodb lock
  9. Innodb row-level locking
  10. MySQL related (9) – Deadlock occurrence and avoidance

The brain map mentioned above is as follows. If you want a complete hd picture, you can go to my wechat official account [6 Xixuanxin] and reply to MySQL to obtain the brain map:

The body of the

Optimization idea

Let me just go on and on

As an architect or developer, what is your thinking when it comes to database performance tuning?

Or more specifically, what would you say if you were asked in an interview: What dimensions would you use to optimize your database?

Through the previous chapter, I believe that you have slowly established the database knowledge system, and the correct tuning ideas.

When we talk about performance tuning, most of the time the goal is to make our queries faster. The action of a query is composed of many steps, each of which consumes time, as we analyzed in chapter 1 when we talked about the execution flow of SQL statements.

To reduce the time spent on queries, we need to start with each link (starting with a familiar graph).

Connect – Configuration optimization

The first step is to connect the client to the server. What performance problems may occur in this connection?

The application may fail to obtain connections because the number of server connections is insufficient. Mysql: error 1040: Too many connections

We can solve the problem of insufficient connections in two ways:

  1. On the server side, we can increase the number of connections available on the server side.

Max_connections: show variables like ‘max_connections’; Change the maximum number of connections when multiple applications are connected (2) or, or release inactive connections in a timely manner. The default timeout for both interactive and non-interactive clients is 28800 seconds, 8 hours. We can lower this value. show global variables like ‘wait_timeout’; Release inactive connections in a timely manner. Be careful not to release connections that are still in use in the connection pool

  1. From the client side, we can reduce the number of connections fetched from the server side. What if we didn’t want to create a new connection every time we executed SQL?

At this point we can introduce connection pooling to achieve connection reuse. At what levels can we use connection pooling? ORM layer (MyBatis comes with a connection pool); Or use dedicated connection pooling tools (Druid, Hikari by default in Spring Boot 2.x, DBCP and C3P0). When the client changes to fetch connections from the connection pool, how should the pool size be set? There may be a misconception that the larger the maximum number of connections in the pool, the better, so that clients can obtain more connections in high concurrency situations without queuing. This is not the case. Larger connection pools are not always better, as long as a certain number of connection pools are maintained and other clients queue up to get connections. Sometimes the larger the connection pool, the less efficient it is.

  • Druid’s default maximum connection pool size is 8. The default maximum connection pool size for Hikari is 10. Why are the defaults so small?

The github documentation for Hikari provides a formula for setting the connection pool size for a PostgreSQL database: github.com/brettwooldr… Its suggestion is to multiply the number of machine cores by two plus one. In other words, for a 4-core machine, the connection pool can maintain only 9 connections. This formula is also applicable to other databases to some extent. There is also an example of reducing the connection pool size to improve concurrency and throughput.

  • Why does reducing the number of connections increase throughput in some cases? Why is the recommended connection pool size dependent on the number of CPU cores?

For each connection, the server needs to create a thread to handle it. The more connections there are, the more threads the server will create. How can a CPU simultaneously perform tasks far larger than its core size? Time slice. Context switch. The number of CPU cores is limited, and frequent context switching will cause a large performance cost.

We are talking about tuning the database from the database configuration level. No matter the configuration of the database itself, or the configuration of the operating system on which the database service is installed, the ultimate goal of the configuration optimization is to better play the performance of the hardware itself, including CPU, memory, disk, network.

In different hardware environments, the parameters of the operating system and MySQL are configured differently. There is no standard configuration. In previous chapters we have also looked at a number of MySQL and InnoDB configuration parameters, including various switches and values. Most of these parameters provide a default value, such as default buffer_pool_size, default page size, number of concurrent InnoDB threads, etc.

These defaults will suffice in most cases, unless there is a special need to modify the parameters if you know what they mean. Modifying configurations is usually done by professional DBAs.

Hardware choices, such as using solid-state drives, building disk arrays, and choosing specific CPU models, are not the focus of our developers, and we won’t go into that too much.

If you want to know the meaning of some specific parameters, the official website has a list of system parameters for reference:

Dev.mysql.com/doc/refman/…

  • In addition to properly setting the number of server connections and the size of the client connection pool, what other solutions can we use to reduce the number of client connections to the database server?

We can introduce caching.

Caching — Architecture optimization

The cache

In the case of a very large number of concurrent applications, if there is no cache, it will cause two problems: on the one hand, it will put a lot of pressure on the database. On the other hand, at the application level, the speed at which data can be manipulated is also affected.

We can solve this problem with a third-party caching service, such as Redis.

What other optimizations can be made at the architectural level to reduce the read-write burden on a single database server?

A master-slave replication

If a single database service can not meet the access requirements, then we can do the database cluster scheme.

Clustering is bound to face the problem of data consistency between different nodes. If I read and write to multiple database nodes at the same time, how can I keep the data of all the nodes consistent?

In this case, we need to use replication technology. The node to be replicated is called master and the node to be replicated is called slave. The slave itself can also be used as a source of data for other nodes. This is called cascading replication.

How is master-slave replication implemented? Update statements log a binlog, which is a logical log.

With this binlog, the secondary server retrieves the master’s binlog file, parses the SQL statement, and executes it on the secondary server to keep the master and slave data consistent.

There are three threads that connect to the master to fetch the binlog and parse the binlog to write the relay log. This thread is called the I/O thread.

The Master node has a log dump thread, which is used to send binlogs to the slaves.

The SQL thread from the library is used to read relay logs and write data to the database.

With master-slave replication, data is only written to the master node and read requests are shared with the slave node. We call this scheme read-write separation.

Read/write separation can reduce the access pressure of the database server to a certain extent, but special attention should be paid to the problem of data consistency between master and slave. What if we write data to the master and immediately query the data from the slave, but the data from the slave has not been synchronized?

So, based on the principle of master-slave replication, we need to ask, how slow is master-slave replication?

Single thread

In the early days of MySQL, the SLAVE SQL thread was a single thread. The master supports the parallel execution of SQL statements. The maximum number of connections is the maximum number of concurrent SQL statements.

On the other hand, the SLAVE SQL can only be executed in a single-thread queue. In the case of a large amount of concurrent data in the master database, there will definitely be delays in synchronizing data.

Why can’t SQL threads from the library execute in parallel? For example, the master library executes multiple SQL statements where the user first posts a comment, then modifies the content, and then deletes the comment. These three statements must not be executed in reverse order on the slave library.

insert into user_comments (10000009,'nice');
update user_comments set content ='very good' where id =10000009; 
delete from user_comments where id =10000009;
Copy the code

So how to solve this problem? How can I reduce the latency of master/slave replication?

Asynchronous and full synchronization

First of all, we need to know that in the master-slave replication process, MySQL is asynchronous replication by default. In other words, for the master node, the binlog is written, the transaction is finished, and it is returned to the client. For the slave, if it receives the binlog, it’s done, and the master doesn’t care if the slave’s data has been written.

This ensures that data is synchronized before reading, but has the side effect of, as you might expect, taking longer to execute transactions, which can degrade the performance of the master node.

Is there a better way? Can the slave write delay be reduced without significantly increasing the time the master returns to the client?

Semi-synchronous replication

Between asynchronous replication and fully synchronous replication, there is also a semi-synchronous replication.

What does semi-synchronous replication look like?

The master library does not return to the client immediately after executing the transaction committed by the client, but waits for at least one library to receive the binlog and write it to the relay log before returning to the client. The master does not wait long, but when it returns to the client, the data will be written successfully because it only has one last step left: read the relay log and write to the slave.

cd /usr/lib64/mysql/plugin/

The master and slave libraries are different plug-ins that need to be enabled after installation:

— Primary library execution

INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
set global rpl_semi_sync_master_enabled=1;
show variables like '%semi_sync%';
Copy the code

— execute from the library

INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; 
set global rpl_semi_sync_slave_enabled=1; 
show global variables like '%semi%';
Copy the code

Compared with asynchronous replication, semi-synchronous replication improves data security, but it also causes a certain degree of delay. It has to wait for a slave to write the relay log, which is an additional process of network interaction. Therefore, semi-synchronous replication is best used on low-latency networks.

This ensures that slave data is written from the point of view of connecting the master and slave libraries.

On the other hand, if you want to reduce the latency of master-slave synchronization and reduce the waiting time caused by SQL execution, is there a way to have multiple SQL statements executed in parallel on the slave library rather than in a queue?

Parallel replication of multiple libraries

How do you implement parallel replication? Imagine if three statements were executed on three databases, operating on each database. Is there no concurrency problem? The order of execution is also not required. Of course it is, so if you are operating on three databases, the SQL threads from all three databases can execute concurrently. This is the multi-library parallel replication supported in MySQL 5.6.

But in most cases, we have a single database with multiple tables. How can we achieve parallel replication in a database? Or, as we know, the database itself supports multiple transactions at the same time; Why can these transactions be executed in parallel on the main library without problems?

Because they do not interfere with each other, for example, these transactions operate on different tables, or operate on different rows, there is no competition for resources and data interference. If a transaction is executed in parallel on the master library, it must be executed in parallel on the slave library, right? For example, if there are three transactions on the master operating on three tables at the same time, can these three transactions also be executed in parallel on the slave?

GTID replication for asynchronous replication

Dev.mysql.com/doc/refman/…

Therefore, we can divide and number the transactions that are executed in parallel on the master library into a group that can be executed in parallel on the slave library. This number is called a GTID (Global Transaction Identifiers), and this master-slave replication is called GTID-based replication.

If we want to use GTID replication, we can turn it on by modifying the configuration parameters, which is off by default:

show global variables like 'gtid_mode';
Copy the code

Whether it is to optimize the connection mode of master and slave, or to allow the slave library to execute SQL in parallel, the problem of master-slave replication delay is solved from the database level.

In addition to the database level itself, at the application level, we also have some ways to reduce master-slave synchronization latency.

After the master-slave replication, if the data stored by a single master node or a single table is too large, for example, a table has hundreds of millions of data, the query performance of a single table will still decline. We need to further classify and split the data of a single database node, which is the sub-database sub-table.

Depots table

Vertical storage, reduce concurrency pressure. Horizontal table, solve the storage bottleneck.

The practice of vertical repository splitting a database into different databases according to business:

The practice of horizontal database and table is to distribute the data of a single table to multiple databases according to certain rules.

The access pressure and storage pressure of a single database node can be reduced through the master and slave or database and table, so as to improve the performance of the database. However, what if the master node fails?

Therefore, High availability is also the foundation of High performance.

High availability solution

Dev.mysql.com/doc/mysql-h…

A master-slave replication

The traditional HAProxy + Keepalived solution is based on master-slave replication.

NDB Cluster

Dev.mysql.com/doc/mysql-c…

MySQL Cluster based on NDB Cluster storage engine.

Galera

galeracluster.com/

A multi-master synchronous replication cluster solution.

MHA/MMM

Tech.meituan.com/2017/06/29/…

MMM (Master-Master Replication Manager for MySQL), a multi-master high availability architecture, was developed by a Japanese and used extensively in the early days by companies like Meituan.

MHA (MySQL Master High Available). BM and MHA both provide a virtual IP address to the external, and monitor the master node and the slave node. When the master node fails, a slave node needs to be promoted to the master node, and the data missing from the slave node is made up, and the VIP points to the new master node.

MGR

Dev.mysql.com/doc/refman/… Dev.mysql.com/doc/refman/…

MySQL 5.7.17 InnoDB Cluster, also known as MySQL Group Replicatioin (MGR), includes the MySQL shell and mysql-route.

In the high availability (HA) solution, when a master node breaks down, how to promote the slave with the latest data to become the master? If you have multiple masters running at the same time, you have to deal with data replication between masters and routing connections for clients.

Different solutions have different implementation difficulties and operation and maintenance management costs.

The above is the optimization of architecture level, can use cache, master and slave, sub-library sub-table.

The third part: parser, lexical and grammar analysis, mainly to ensure the correctness of the statement, the statement is not wrong. Sever handles it himself. Skip it.

Step 4: Optimizer This section will be covered in a new chapter, focusing on SQL statement performance optimization.

Summary: Optimize the system

In addition to code, SQL statements, table definitions, schemas, and configuration optimizations, business-level optimizations cannot be ignored. A few examples:

1) On the Singles’ Day of a certain year, why do you make an activity to top up the balance in Yu ‘ebao and give a bonus to the balance (charge 300 and get 50 free)?

Because the payment with Yu ‘ebao or Yu ‘ebao is to record the local or internal database, while the payment with bank card requires the interface to be invoked, so the operation of the internal database is definitely faster.

2) On last year’s Singles’ Day, why is it forbidden to check bills other than today’s at dawn?

This is a downgrading measure to protect the core business right now.

3) In recent years, why is the price of Singles’ Day already available more than a week in advance? Pre-sale diversion.

At the application level there are also many other optimizations to minimize the strain on the database, such as limiting traffic, or introducing MQ peak clipping, and so on.

With MySQL, some companies can handle millions and millions of concurrent tasks, while others can’t handle hundreds of concurrent tasks. The key is how to use MySQL. Therefore, the slow use of the database does not mean that the database itself is slow, and sometimes to the upper level to optimize.

The latter

MySQL project to the end of this chapter is formal, is basically according to the direction of the mind maps, so everyone can be on the mind maps and review my blog for MySQL, basically has involved in the interview questions, write a project is indeed a cerebral energy expenses time again, I still need your attention and thumb up to support the ha ha ha ~

If you feel write still have some things to help pay attention to my public number, and in the background to leave a message to me hope I write which topic of things (now learn now sell if there is anything wrong place also please help correct, very grateful), if more people immediately arranged on ~

As always, after some time, I will first push the brain map on the official account, and then draw up the outline of the topic according to the brain map. In this way, I will not be confused in my writing, and everyone will read my blog with more direction. Thank you again for your support

By the way

There is a problem? Can you leave me a message or chat privately? Just give it a thumbs up

Of course, you can also go to my official account “6 Xi Xuan”,

Reply to “Learn” and receive a copy of the Video tutorial for Advanced Architects for Java Engineers

Answer “interview”, can obtain:

MySQL brain Map MySQL brain map

Because, I trained programmers, PHP, Android and hardware are done, but in the end or choose to focus on Java, so have what questions to ask the public for discussion (emotional pouring technology can ha ha ha), see words will reply as soon as possible, hope can with everyone common learning progress, on the server architecture, Java core knowledge analysis, career, interview summary and other articles will be pushed irregularly output, welcome to pay attention to ~~~