Mysql replication mode (semi-synchronous replication, parallel replication, multi-source replication)

MySQL primary/secondary replication (asynchronous, default)

Mysql master-slave replication principle

Mysql replication works as follows:

1. The primary library records binlog logs

The master database records the data update event in the binary log, binlog, before each transaction is updated. The sync_binlog parameter on the primary library controls the flushing of binlog logs to disk.

2. The SECONDARY library I/O thread copies the binlog of the primary library to its local relay log

The slave library will start an IO thread, and the IO thread will connect to the master library, and then the master library will start a special binlog dump thread. The binlog dump thread will read the events in the binlog on the master library, it won’t poll for events all the time, and when it catches up with the master library it will go to sleep. The main library is not woken up until it sends a semaphore notifying it of a new event.

3. Replay from the SQL thread of the library

The SQL thread from the slave library reads events from the slave relay log and executes them in the slave library to update slave data.

Note that:

Concurrent changes on the master library can only be serialized on the slave library because there is only one SQL thread to replay the relay log, which is a performance bottleneck for many workloads.

While there are some solutions to your problem, most users are still subject to single threads.

In older versions of mysql, IO threads were single-threaded, but in newer versions IO threads can also be multi-threaded, but in any case, SQL threads are single-threaded.

Semisynchronous Replication

1. Data is lost due to primary/secondary delay

Before Mysql5.5, replication was asynchronous, and there was a lag between master and slave data, so there was a problem:

When a transaction is successfully written to the master database and submitted, but the slave database has not received the binlog pushed by the master database, if the master database breaks down at this time and the transaction binlog on the master database is lost, the slave database may lose this transaction, resulting in the inconsistency between the master and slave database.

To solve this problem, Mysql5.5 introduced a semi-synchronous replication mechanism.

2. Semi-synchronous replication

Official documents:

17.3.8 Semisynchronous Replicati

Semi-synchronous replication is supported as a plug-in starting with Mysql5.5. It is turned off by default and must be turned on in configuration when used.

Mysql’s default replication is asynchronous, with the master library successfully returning to the client after writing the event to the binlog without knowing if and when the slave library will fetch and process the log.

With semi-synchronous replication, the thread that performs the transaction commit after the master commits will wait until at least one semi-synchronous slave confirms that it has received all events, and the slave acknowledges the event only after writing the event to its relay log and flushing it to disk.

At this point, the master library does not respond to the client until it receives the acknowledgement.

Semi-synchronous replication further guarantees data integrity by ensuring that after a successful transaction commits, there are at least two log records, one in the master binlog and the other in the relay log of at least one slave.

If the master database breaks down in steps ①, ②, and ③ in the figure, the transaction is not submitted successfully and the slave database does not receive the binlog of the transaction. Therefore, the data of the master and slave are consistent.

Note that:

(1) Semi-synchronous replication can be implemented only when semi-synchronous replication is enabled for both the primary and secondary libraries. Otherwise, the primary library is restored to the default asynchronous replication.

(2) When the master database waits out, it will also revert to the default asynchronous replication. When at least one slave catches up, the master reverts to semi-synchronous replication.

Threaded slave (threaded slave)

1. Master/slave delay

Since multiple clients can write concurrently on the master library, when the TPS of the master library is high, the processing speed of the slave library may not keep pace with that of the master library because the SQL thread of the slave library is single-threaded, resulting in delay.

When the master reaches 1000/s concurrency, the latency of the slave is negligible, which is a few milliseconds. When the concurrency of the master reaches 2000/s, the delay of the slave will be tens of milliseconds. When the concurrency of the master library reaches 4000/s,6000/s,8000/s, the pressure of the master library at this time is very high, and it is almost hanging up. The delay from the library can reach several seconds. These are just empirical values, but within reasonable limits. (From Huperia chinensis)Copy the code

You can check the master/slave delay by monitoring the value of Seconds_Behind_Master output from the show slave status command.

NULL: Indicates that io_thread or SQL_thread is faulty.
0: The value is zero, indicating that the primary and secondary replication is successful.
Positive value: indicates that the master/slave delay has occurred. A larger value indicates that the slave library delay is more serious.

2. Parallel replication

Parallel replication means that multiple SQL threads are started from the library, the logs of different libraries are read in parallel in the relay log, and the logs of different libraries are replayed in parallel. This is library-level parallelism.

① Parallel replication in Mysql5.6

MySQL 5.5 does not support parallel replication. MySQL 5.6 is starting to support parallel replication, but it’s only schema-based, that is, library-based. When there are multiple libraries, multiple libraries can be replicated in parallel without interfering with each other.

But most of the time, you might have a single schema, that is, a single library, and then schema-based replication is useless.

The core idea is that the data submitted concurrently by tables in different schemas will not affect each other. In other words, the slave node can assign a thread with similar SQL function to each schema in the relay log to replay the transactions submitted by the master database in the relay log to keep the data consistent with the master database. You only need to set slave_parallel_workers to enable parallel replication. For example, slave_parallel_workers= 4 indicates that four SQL threads (coordinator threads) are available for parallel replication.

② Parallel replication in Mysql5.7

MySQL 5.7 provides “true” parallel replication, called enhanced multi-threaded slave (MTS). The replication latency has been significantly improved.

Binary log group commit (binary log group commit) The last_committed transactions (sequence_num is different) in the slave relay log can be executed concurrently. Slave_parallel_workers >0 and global.slave_parallel_type = ‘LOGICAL_CLOCK’ Slave_parallel_workers worker threads concurrently execute transactions committed by the master library in the relay log.

3. ★ Step pit case (master-slave delay solution)

In fact, we often encounter this, for example, after using the master/slave architecture of mysql, we may find that the data is not found, the result is finished. So you actually want to 【 what should consider good in scene down to the mysql master-slave synchronization, suggestion is generally reading than writing, and reading in general requirements for data timeliness is not so high, only use mysql master-slave synchronization 】 so this time, we can think about a thing, "you can use the mysql parallel copying, The problem is that it's library-level parallelism, so sometimes it doesn't work very well. So in general, for scenarios where you have to make sure it's available immediately after you write it, you can be sure that you can read the data. There is no problem with some database middleware. Generally speaking, if the master-slave delay is serious: 1. Architecture level: divide a master library into multiple ones. For example, if one library is split into four master libraries, the write concurrency of each master library is 500/s, and the master-slave delay can be ignored. 2. Enable parallel replication for multiple libraries supported by mysql. But if the write concurrency of a library is extremely high, and the single-library write reaches 2000/s, parallel replication still doesn't make sense. Rule 28, a lot of times, for example, is that a few order tables write 2000/s, and dozens of other tables 10/s. 3. Code level: Rewrite code. Those of you who write code, be careful. At that time, we actually asked the student to rewrite the code for a short period of time. After inserting the data, we directly updated it without querying it. 4. Direct connection to the main library. If there is indeed a query that is requested immediately after insertion and then performs some operations in reverse based on the result (such as a state), then you can set the direct connection to the main library for that query. However, this method is not recommended, because the meaning of separating reading and writing is lost.Copy the code

4. Multi-source Replication

MySQL supports multi-source replication from version 5.7.

Prior to MySQL 5.7, only one master, one master, multiple slaves, or multiple master, multiple slave replication could be implemented.

If you want to implement multiple master, one slave replication, you can only use MariaDB, but MariaDB is not compatible with the official MySQL version.

MySQL 5.7 now supports multi-master, single-slave replication, also known as multi-source replication.

Therefore, multi-source replication requires at least two masters and one Slave.

1. Multi-source replication scenarios

When the data analysis department needs some data from each business department for data analysis, multi-source replication can be used to replicate the data from each primary database to a unified database.
In the secondary server for data summary, if our master server has carried out the operation of sub-database and sub-table, in order to achieve some later data statistics function, it is often necessary to summarize the data together and then statistics.
Of all the primary data from the server for backup, in MySQL 5.7 before every master server requires a from the server, so it is easy to cause waste of resources, but also increases the maintenance costs, of the DBA but MySQL 5.7 introduced multi-source replication, can take multiple primary server data synchronization to a backup from the server.

2. Configure multi-source replication

When configuring multi-source replication, the primary library can be configured to use either gTID-based replication or binlog-based replication.

Official documents:

Section 16.1.3.4, “Setting Up Replication Using GTIDs”

Section 16.1.2.1, “Setting the Replication Master Configuration”

Conclusion:

Mysql default replication is asynchronous, the master library writes a binlog and returns it to the client, regardless of whether the slave library successfully copied.

Due to the master/slave delay, when the master library hangs up, data will be lost. In order to solve the problem of data loss, a semi-synchronous replication mechanism is introduced.

In order to solve the performance problem of replication, we can enable multi-thread parallel replication.

(Mysql5.5 does not support parallel replication. Mysql5.6 has started to support parallel replication based on schema. Mysql5.7 has improved and enhanced 5.6 parallel replication.)

Mysql5.7 Multi-source replication allows multiple MSSQLS to replicate simultaneously to a single slave library.