background

Some time ago, I encountered an online problem. After a long time of investigation, I found that it was caused by the delay of master and slave synchronization. Therefore, I will write an article today to summarize this problem, hoping it will be useful for you. If you like it, add a thumbs-up

Mind mapping

Mind mapping

Common master-slave architectures

With the increasing number of visitors, the capacity of a single database has been stretched. Hence the master-slave architecture of separating read and write data from the master library and write data from the master library was derived.

A master from

From more than a master

One master, one slave and one master, many slave are the most common master-slave architectures, which are simple and effective to implement. They can not only achieve high availability, but also achieve read and write separation, thus improving the concurrency of the cluster.

More from Lord

Double master replicates

Cascade copy

Master slave synchronization principle

To understand how master-slave synchronization works, there are two important log files to keep in mind

  • Binary log file

  • Relay log file

Master/slave synchronization process

  • Generate two threads from the library, one I/O thread and one SQL thread;

  • The I/O thread requests the master binlog and writes the resulting binlog to the relay log file.

  • The primary library generates a log dump thread that passes binlogs to the secondary LIBRARY I/O thread

  • The SQL thread will read the logs in the relay log file and parse them into specific operations to achieve the consistency of the master and slave operations and the consistency of the final data.

How can I determine whether the master/slave is delayed

You can view the value of Seconds_Behind_Master displayed in the show slave status command.

  • NULL: indicates that io_thread or SQL_thread is faulty.

  • 0: the value is zero, indicating that the master/slave replication is good.

  • A positive value indicates that the master/slave delay has occurred. A larger value indicates that the slave library delay is more serious.

Primary/secondary delay cause

Random playback

MySQL master-slave replication is a single-thread operation. The master database writes all DDL and DML logs to the binlog. Since the binlog is sequential, it is very efficient. The Slave SQL Thread replays the DDL and DML operation events of the master library in the Slave. DML and DDL IO operations are random, not sequential, and cost a lot more. So SQL Thread threads can’t keep up with the master library’s binlog learning speed, resulting in master-slave latency

Lock wait

On the other hand, since SQL threads are also single-threaded, when the concurrency of the master library is high, the number of DML generated exceeds the speed that the SLAVE SQL threads can process, or when a large query statement in the slave generates a lock wait, the delay occurs.

Master-slave delay solutions

The parallel copying

Since the speed of a single thread of SQL replay is limited, can we use a multi-threaded way to replay? MySQL 5.6 provides a parallel replication approach by converting SQL threads into multiple work threads for replay, which solves the master-slave delay problem

Reduce the concurrent

If you understand how random replay can cause master-slave delays, then it is easier to understand. Controlling the write speed of the master library makes master-slave delays less likely to occur.

Read the main library

If you are in a very real-time business like payment, the most direct way is to read the main library directly.

A few words about

Hi, I’m Xiao Fan, a back-end engineer. If you feel that the article is a little help to you, welcome to share with your friends, but also to small fan point a like, interested in my public account – programmer small fan. Thank you and see you next time!