Preface: In the interview, I am often asked what is the principle of master-slave synchronization in MySQL? What is master-slave delay? How to solve the master/slave synchronization delay problem?

Advantages of master/slave synchronization

With the growth of service volume, high concurrency, database server downtime and other problems frequently occur, a single MySQL server will become a system bottleneck.

To solve this problem, data is synchronized in master-slave mode and mysql-proxy is used to increase the concurrent load of the database.

Advantages of master-slave synchronous mode:

  • Read and write separation, relieve the database pressure (the master database is used to do data write, the slave database is used to do data read);
  • One master and many followers, system scalability and high availability;
  • Data backup and disaster recovery, remote active-active, to ensure that the master library switch at any time if abnormal, improve the system fault tolerance;

binlog

Data synchronization between the primary and secondary MySQL servers is implemented using binlog logs.

Binlog Meaning and function

It is mainly used to record the writing operations (excluding queries) performed by the database. The information is stored in the disk in binary form, which can be simply understood as the SQL statement.

In practice, binlog can be used in two main scenarios:

  • The binlog is used for master-slave replication. In the master-slave structure, the binlog is sent from the master to the slave as operation records. The slave server saves the logs received from the master to the relay log.
  • Binlog is used for data backup. After a database backup file is generated, binlog saves the detailed information about the database backup so that the next backup can start from the backup point.

Log format

A binlog can be in three formats: Statement, Row, and Mixed.

Before MySQL 5.7.7, the default format was Statement. After MySQL 5.7.7, the default format was Row.

Statement format – Based on replication of SQL statements, each SQL Statement that modifies data is recorded in a binlog

Advantages: No need to record the changes of each row, reducing the amount of binlog logs, saving I/O, and improving performance.

Disadvantages: Because the record is only the execution statement, but because the SQL execution is contextualized, so the relevant information needs to be saved when saving, at the same time, there are some functions (such as uuid() function) and other statements cannot be recorded copy;

Row format — Based on copying rows, each Row change is recorded as a unit, and almost all changes can be recorded

Advantages: It records the source data of each operation and the modified target data, and restores the data precisely, thus ensuring the security and reliability of the data. In addition, the replication and data recovery process can be carried out concurrently.

Disadvantages: May cause a large number of row changes (such as ALTER table). This mode stores too much information and logs.

Mixed format * *- a compromise where Statement is used for normal operations and Row is used when Statement is not available

Master slave synchronization principle

The master (binlog dump thread) and slave (I/O thread and SQL thread are required for MySQL replication.

  • Binlog dump thread: When data is updated in the primary database, it writes the updated event type to the binlog file in the primary database based on the specified binlog format. When the I/O thread requests the log content, the master creates the dump thread and passes both the binlog name and the current update location to the SLAVE I/O thread.
  • I/O thread: this thread connects to the master, requests a copy of the specified binlog file location from the dump thread, and stores the requested binlog in a local relay log.
  • SQL thread: after detecting a relay log update, this thread reads and performs redo operations on the local database. Events that occurred in the primary database are reexecuted on the local database to ensure data synchronization between the primary and secondary databases.

Summary of basic process

  1. When data changes on the master library, write the changes to the binlog.

  2. The slave library will perform binlog detection on the master within a certain period of time to check whether the master has changed. If the change occurs, the I/O thread will connect to the master event and request to start reading the binlog to the slave library from the specified position in the binlog file.

  3. After the master library receives the request from the Slave I/O thread, the dump threads copied on the master library will read the binlog file in batches according to the request information of the Slave library and then return it to the Slave I/O thread.

  4. After the I/O thread of the slave library obtains the Log content sent by the I/O thread of the master library, it successively writes the binlog content to the end of the slave relay Log file and records the name and location of the new binlog file to the master-info file. So that the next reading of the master library binlog can tell the master server to start reading the new binlog from the specified file and location.

  5. The SQL thread of the slave library server detects the newly added log content in the local relay log in real time, translates the logs in the relay log into SQL, and executes SQL in sequence to update the data in the slave library.

Master-slave delay

Master/slave delay calculation time

According to the principle of master-slave replication, there is a certain period of data inconsistency between the two, which is the so-called master-slave delay.

Let’s look at the point in time that causes the master-slave delay:

  • Primary library A completes A transaction and writes to the binlog at A time marked T1.
  • The moment it receives the binlog from library B is called T2.
  • The moment the transaction completes from library B is denoted as T3.

So the so-called master-slave delay is the difference between the time of the completion of the slave library and the time of the completion of the master library for the same transaction, namely T3-T1.

Seconds_behind_master is returned by executing show slave status on the slave library, indicating how many seconds the slave library is currently delayed.

How is seconds_behind_master calculated?

  • Each transaction’s binlog has a time field that records the time of the write on the primary library
  • Take the time field of the currently executing transaction from the library, subtract it from the current system time, and getseconds_behind_masterThat’s t3-T1, as I described earlier.

Primary/secondary delay cause

Why the master-slave delay?

Normally, if there is no network latency, the time for logging from the master to the slave is quite short, so T2-T1 can be largely ignored.

The most direct impact is the time period of relay log consumption from the library, and the reasons are generally as follows:

1. The machine performance of the slave library is worse than that of the master library

For example, put 20 master libraries on four machines and one slave library on one machine. When the update is performed at this time, a large number of read operations are triggered. As a result, multiple slave libraries on the slave library machine compete for resources, resulting in master/slave delay.

2, from the storage pressure is large

According to the normal policy, read and write are separated, with the master library providing write capability and the slave library providing read capability. Placing a large number of queries on the slave library consumes a large amount of CPU resources on the slave library, which in turn affects synchronization speed and leads to master-slave latency.

3. Execution of big transactions

Once a large transaction is executed, the main library must wait until the transaction is complete before writing to the binlog. For example, the primary library performs an insert… Select a very large insert operation that caused hundreds of gigabytes of binlog files to be transferred to the read-only node, thus causing binlog latency to be applied to the read-only node.

As a result, DBAs often warn developers not to try delete statements to delete a large amount of data at once, but to do so in batches if possible.

DDL(alter, drop, create)

DDL synchronization between the read-only node and the master is sequential. If the DDL operation takes a long time to execute in the master, it will take the same time to execute in the slave. For example, if it takes 10 minutes to add a field to a 500W table in the master, it will take 10 minutes to add a field to the slave node.

5. Lock conflicts

Lock conflict issues can also cause slow execution of SQL threads from the slave node, such as some select…. on the slave machine SQL for update, etc.

How to reduce master-slave latency

The master-slave synchronization problem is always a trade-off between consistency and performance, depending on the actual application scenario. To reduce master-slave latency, you can do the following:

  1. Optimize SQL to avoid slow SQL and reduce batch operations. It is recommended to write scripts in the form of update-sleep.
  2. Reduce the concurrent probability of multi-threaded large transactions and optimize business logic;
  3. Increase the slave server, the purpose is to spread the read pressure, thus reducing the server load;
  4. Improve the configuration of the slave library machine, reduce the efficiency difference between the master library writing binlog and the slave library reading binlog;
  5. Try to use a short link, that is, the distance between the master library and the slave library server should be as short as possible, improve the port bandwidth, reduce the network delay of binlog transmission;
  6. In real time, the service read is forcibly removed from the primary database, and the secondary database only performs DISASTER recovery and backup.

Master-slave delay solution

In high concurrency scenarios or poor network scenarios, if there is a large delay in master/slave synchronization data, read requests to read slave libraries will read old data. The simplest and most violent way to do this is to force the main library to read. You can actually use cache notation.

  • A initiates A write request to update the primary database data and sets A mark in the cache to indicate that the data has been updated. The mark format is userId+ service Id.
  • Set this flag to set the expiration time (estimated synchronization delay between master and slave libraries)
  • B initiates a read request and checks whether the request is updated in the cache.
  • If there are flags, go to the main library; If not, request to walk from the library.

This solution solves the data inconsistency problem, but each request has to deal with the cache first, which will affect the system throughput.

Shoulders of giants

Interviewer: What does MySQL Binlog do? Master slave delay understanding?) Mp.weixin.qq.com/s?__biz=Mzg…

Pick up the little boy: (ant gold side: ten classic interview questions analysis) mp.weixin.qq.com/s?__biz=Mzg…