MySQL replication process

The official document flow chart is as follows:



1, absolute delay, relative synchronization

2. Pure write operation. Under the online standard configuration, the pressure of the secondary library is greater than that of the primary library, at least the secondary library has relaylog writing.

MySQL delay problem analysis

1. DML requests of the master library are frequent

Cause: The master library writes data concurrently, while the slave library applies logs in a single thread, which easily causes relaylog accumulation and delays.

Do sharding, break up write request. Consider upgrading to MySQL 5.7+ to enable logical clock-based parallel replication.

2. Master library performs large transactions

Cause: It takes a long time for the master library to update a large table. Under the circumstance that the master and slave libraries have similar configurations, the slave library also needs to spend almost the same time to update this large table. At this time, the slave library starts to accumulate with delay, and subsequent events cannot be updated.

Solution: Break up big transactions and submit them in time.

3, master database to execute DDL statement on large table

Cause: DDL execution has not started and is blocked. The check point in place remains unchanged. The DDL is executing, single threaded applications cause increased latency, and loci do not change.

Find a query that blocks DDL or write operations, kill the query, and let the DDL execute normally on the slave library. Use a higher version of MySQL that supports Online DDL.

4. The configurations of the primary and secondary instances are inconsistent

Cause: On hardware, the primary library instance server uses SSD, while the secondary library instance server uses SAS disks, and the CPU frequency is inconsistent. Configuration: For example, RAID card write policies are inconsistent, OS kernel parameters are inconsistent, and MySQL disk drop policies (such as Innodb_flush_log_at_trx_COMMIT and sync_binlog) are inconsistent

Solution: try to unify the configuration of DB machine (including hardware and option parameters); Even for some OLAP services, the slave library instance hardware configuration is higher than the master library, etc.

5. Excessive pressure from the library itself

Cause: A large number of SELECT requests are executed from the database, or most of the select requests are routed to the secondary database instance, or a large number of OLAP services are running, or the secondary database is backing up. As a result, the CPU load and I/O usage are too high, and SQL Threads are slowly applied.

Create more slave libraries to break up read requests and reduce the pressure of existing slave instances.

You can also adjust the innodb_flush_log_at_trx_COMMIT =0 and sync_binlog=0 flush parameters to reduce I/O pressure and master/slave latency.

3. CPU excessively high during the promotion period

Phenomenon:

High concurrency results in a high CPU load, which lengthens the time for processing requests and causes the service to become unavailable. A lot of slow SQL leads to high CPU load.

Solution:

Basically prohibit or carefully consider the database master/slave switch, this can not solve the fundamental problem, the need for RESEARCH and development with a radical solution to SQL problems, can also service degradation, container can dynamically expand THE CPU; Negotiate with services. Start pt-kill Query read-only slow SQL. See if you can solve the slow SQL problem by adding general or federated indexes, but consider the impact of DDL on the database.

Iv. InnoDB flush strategy

The innodb_flush_method parameter controls the opening and flushing modes of innoDB data files and redo logs. Fdatasync (default), O_DSYNC, O_DIRECT: fdatasync(default), O_DSYNC (default), O_DIRECT: fdatasync. Innodb opens and writes redo logs using O_SYNC. Fsync () opens data files using O_DIRECT. Fsync () writes data files and redo logs using three steps: The fsync(int fd) function is used to flush the buffer associated with the file referenced by the FD file descriptor to disk and to flush metadata (such as modification date, creation date, etc.). If you open the redo file in O_DSYNC mode, data is written to disk and metadata is updated. O_DIRECT means that we write directly from MySQL’s InnoDB buffer to disk.

The data writing modes are as follows:

In fdatasync mode, when data is written, the write step does not need to be written to disk (it may be written to the OPERATING system buffer), but is flushed. The buffer is flushed to the operating system, and the metadata of the file is updated to disk. O_DSYNC mode: Log writing is performed in write mode and data file writing is performed in flush mode through fsync. O_DIRECT mode: Data files are written directly from mysql innoDB buffer to disk, not through OS buffering, and are actually flushed by the OS.





1. On unix-like operating systems, opening a file as O_DIRECT minimizes the impact of buffering on IO. The IO of the file is operated directly on the buffer in user space, and the IO operations are synchronous, so whether the read() or write() system call, Data is guaranteed to be read from disk; So the IO side of the minimum pressure on the CPU processing pressure is also minimal, the physical memory footprint is also minimal; However, since there is no operating system buffer, the speed of writing data to disk is significantly reduced (in the form of longer write response times), but it does not significantly reduce the overall volume of SQL requests (depending on the innodb_buffer_POOL_size being large enough).

2. In O_DSYNC mode, files are opened in I/O synchronization mode. Any write operation will be blocked until data is written to the physical disk. This results in longer CPU waits, reduced SQL request throughput, and longer INSERT times.

3. The fsync(int Filedes) function only works on the single file specified by the file descriptor Filedes and waits for the disk write operation to finish and then returns. The fdatasync(int Filedes) function is similar to fsync, but it only affects the data part of the file. In addition to data, fsync also synchronizes and updates the meta information of files to disk.

O_DSYNC had the highest pressure on CPU, datasynC followed, and O_DIRECT had the lowest. O_DSYNC is poor in terms of overall SQL statement processing performance and response time. O_DIRECT performs better in SQL throughput (second only to DATASynC), but has the longest response time.

The default datasync mode takes full advantage of the operating system buffer and Innodb_buffer_pool processing performance. However, the negative effect is that the free memory decreases too fast, resulting in frequent page swapping and disk I/O pressure, which seriously affects the stability of large concurrent data writes.