In a master/slave relationship, each standby database receives the master database’s binlog and executes.

Under normal circumstances, as long as all binlogs generated by the master database can be passed to the standby database and executed correctly, the standby database will reach the same state as the master database, which is the final consistency.

But for MySQL to provide high availability, final consistency is not enough.

1 Active/standby delay

The active/standby switchover may be an active o&M operation, such as software upgrade, or a passive operation, such as power outage in the equipment room.

Synchronization delay: The time points related to data synchronization are as follows:

Primary library A completes A transaction and writes to the binlog, which is denoted as T1;
The time at which the binlog is received by standby library B is denoted as T2;
Standby B executes and completes the transaction, which is denoted as T3.

The so-called primary/secondary delay is the difference between the completion time of the secondary database and the completion time of the primary database for the same transaction (T3-T1).

You can run the show slave status command on the secondary database. The result shows seconds_behind_master, which indicates the current secondary database delay.

Seconds_behind_master

Each transaction has a time field in the binlog, which records the time of the write on the primary database.
The standby library takes the value of the time field of the current transaction being executed, calculates the difference between it and the current system time, and getsseconds_behind_master.

If the system time Settings of the active and standby databases are inconsistent, the delay values of the active and standby databases are not incorrect. When the standby library is connected to the primary library, the system time of the current primary library is obtained by executing SELECT UNIX_TIMESTAMP(). If the system time of the primary database is inconsistent with its own, the secondary database will automatically correct the difference during the seconds_behind_master calculation.

Under normal network conditions, the time required for logging from the primary to the secondary is very short (t2-T1 is very small). The main source of the primary/secondary delay is the time difference between the secondary database receiving the binlog and executing the transaction. Therefore, the most direct manifestation of the master/slave delay is that the slave consumes the relay logs slower than the master produces the binlogs.

2 Source of the active/standby delay

2.1 Poor performance of the machine where the standby database resides

The IOPS burden of update requests is the same on the primary and secondary databases. In this deployment, the secondary database is usually set to the non-double-1 mode.

The update process also triggers a large number of read operations. When multiple standby databases compete for resources on the standby database master, the active and standby databases may be delayed.

2.2 High pressure of standby warehouse

Since the primary library provides write capability, the secondary library can provide some read capability. Or some analysis statements required by the operation background cannot affect normal services, so they can only be run on the standby database.

When the backup pressure is large, it can be handled as follows:

One master follows many. In addition to the standby library, you can connect several slave libraries, let these slave libraries to share the pressure of reading.
Binlog output to an external system, such as Hadoop, to provide statistical query capabilities.

2.3 large transaction

The primary database must wait for the transaction to complete before writing the binlog to the standby database. So, if a statement on a master library executes for 10 minutes, the transaction is likely to result in a 10-minute delay from the master library.

Deleting too much data at once with a DELETE statement is a typical large transaction scenario.

Another typical large transaction scenario is large table DDL. The solution is the planned DDL and gh-OST solution is recommended. Mysql > alter table file size = 1;

3 Reliability first policy

In the double M structure of Figure 1, the detailed process of switching from state 1 to state 2 is as follows:

Determine the current value of standby Bseconds_behind_masterIf it is less than a certain value (such as 5 seconds), continue to the next step, otherwise continue to retry this step;
Change primary library A to read-only;
Determine the value of standby database Bseconds_behind_masterUntil the value becomes 0;
Change standby database B to read-write state;
The business request is cut to standby database B.

There are unavailable times in this switching process. After Step 2, both primary libraries A and secondary libraries B are in readonly state and cannot be restored until Step 5 is complete.

In this unavailable state, the more time-consuming step 3 May take several seconds. Therefore, check whether the seconds_behind_master value is small enough in Step 1.

The amount of time a system is unavailable is determined by a data-reliability-first policy. You can also choose an availability first strategy to reduce this unavailability time to almost zero.

4 Availability First policy

If steps 4 and 5 are performed at the beginning, that is, the connection is directly cut to standby database B and standby database B can read and write data without waiting for data synchronization between the master and standby database, then the system will be almost unavailable.

The cost of this switching process is the possibility of data inconsistencies.

CREATE TABLE `t` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `c` int(11) unsigned DEFAULT NULL.PRIMARY KEY (`id`)
) ENGINE=InnoDB;

insert into t(c) values(1), (2), (3);
Copy the code

This table defines an auto-increment primary key ID. After initializing data, there are 3 rows on both the primary and secondary databases. Next, execute two insert statements on table T, in order:

insert into t(c) values(4);
insert into t(c) values(5);
Copy the code

Suppose that there are now a large number of updates to other tables on the master database, resulting in a master/standby delay of 5 seconds. After inserting a statement with c=4, an active/standby switchover is initiated.

Figure 3 isAvailability first policyAnd,binlog_format=mixedWhen the switching process and data results.

In step2, master library A completes the insert statement, inserts A row of data (4,4), and then starts the master/standby switchover.
In step3, since there is a 5-second delay between the master and the slave, the standby library B starts to receive the “insert c=5” command from the client before applying the “insert c=4” transfer log.
In step4, standby B inserts A row of data (4,5) and sends this binlog to primary A.
In step5, standby database B performs the relay log of “insert c=4” and inserts a line of data (5,4). The statement “insert c=5” executed directly in standby B, passed to primary A, inserts A new row (5,5).

Availability first policy, but set binlog_format=row

Because the row format records all the field values of the newly inserted row when recording the binlog, only one row will be inconsistent. In addition, the application threads on both sides of the primary and secondary synchronization report a duplicate key error and stop.

Data inconsistencies are easier to spot when using a row format binlog.

The availability priority policy for the active/standby switchover may cause data inconsistency. Therefore, in most cases, it is recommended that you use a reliability-first policy.

5 Failover in the case of reliability priority

Assume that the primary/standby delay between primary library A and secondary library B is 30 minutes. At this time, primary library A loses power and the HA system switches to B as the primary library.

If the reliability priority policy is adopted, the switchover can be performed only after seconds_behind_master=0 of secondary library B. It’s not that the system is read-only and unwritable, it’s that the system is completely unusable.

Can I switch directly to standby library B, but keep it read-only? I can’t. During this period, the transfer log has not been applied. If the active/standby switchover is directly initiated, the client cannot see the previously completed transaction and considers that data is lost.

Therefore, the availability of the MySQL HIGH availability system is dependent on the master/slave delay. The smaller the delay, the shorter the service recovery time and the higher the availability in the event of a primary library failure.

6 thoughts on standby storage delay monitoring

In general, the current database operation and maintenance system has the standby database delay monitoring, which is implemented on the standby databaseshow slave statusTo gatherseconds_behind_masterThe value of the. If there is a standby library, its latency monitoring image looks like the following:

Synchronization of the standby is completely blocked during this time. There are two typical scenarios for this phenomenon:

Large transactions (including large table DDLS and many rows in a transaction);
The standby database has a long transaction, such as

begin; 
select * from t limit 1;
Copy the code

And then it stopped. At this point, the master database does an add-field operation on table T, which will be blocked by the standby database.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Analysis of MySQL high availability mechanism

1 Active/standby delay

2 Source of the active/standby delay

2.1 Poor performance of the machine where the standby database resides

2.2 High pressure of standby warehouse

2.3 large transaction

3 Reliability first policy

4 Availability First policy

5 Failover in the case of reliability priority

6 thoughts on standby storage delay monitoring

Analysis of MySQL high availability mechanism

1 Active/standby delay

2 Source of the active/standby delay

2.1 Poor performance of the machine where the standby database resides

2.2 High pressure of standby warehouse

2.3 large transaction

3 Reliability first policy

4 Availability First policy

5 Failover in the case of reliability priority

6 thoughts on standby storage delay monitoring

Related Posts

AWStats analysis system monitors Web sites, YYDS!!

Six Principles of Design Pattern – Open and Close Principle (1/6)

Quarkus subscribe (4) 🔥