Problem summary:

A week ago, one of our MySQL servers went down with a hardware failure. We filed a repair request with the colleagues responsible for hardware, and they arranged the repair. Once the server was fixed today, they powered it on. The four MySQL instances on the server started automatically at boot and began pulling binlogs from the master. Because the server had been down for so long, the replication backlog was large, and the instances pulled binlogs as fast as they could, which caused network problems on the master.


Symptoms:


At first, we did not realize that the problem was caused by the repaired server pulling binlogs from the master after it booted. All we knew about that server was that we had reported it for repair a week earlier; we did not know whether it had been fixed or powered on.

In that situation, a colleague on the network team told me that one MySQL machine was generating so much network traffic that the service felt very slow. The slowdown lasted 17 minutes in total. At that point we had very few clues.


Investigation:


A look at the processlist, the general query log, and the slow log showed no problems.


Monitoring showed that read I/O on the server rose sharply during that period.

The processlist history showed that the replication user's thread had been waiting on the network for some time. The slave's IP address turned out to belong to the server that had broken down a week earlier.


Conclusion:

There are four instances on this server. After the server booted, the MySQL instances started automatically and began pulling binlogs from the master. Each master database generates about 6 GB of binlogs per day, so after a week of downtime the four instances together had to catch up on about 160 GB of binlogs (6 GB x 4 instances x 7 days = 168 GB).
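The backlog estimate can be reproduced with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the 20 MB/s transfer rate is a hypothetical figure, not something we measured.

```python
def binlog_backlog_gb(gb_per_day, instances, days_down):
    """Total binlog volume (GB) the instances must fetch after the outage."""
    return gb_per_day * instances * days_down


def hours_to_catch_up(backlog_gb, rate_mb_per_s):
    """Hours needed to transfer the backlog at a steady rate (1 GB = 1000 MB)."""
    return backlog_gb * 1000 / rate_mb_per_s / 3600


backlog = binlog_backlog_gb(gb_per_day=6, instances=4, days_down=7)
print(backlog)  # 168

# Hypothetical transfer rate of 20 MB/s, chosen only for illustration.
print(round(hours_to_catch_up(backlog, rate_mb_per_s=20), 1))
```

Running the same calculation before powering on a long-dead slave gives a quick sense of how much traffic the master is about to serve.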


Problems:

1. We have no control over, and no visibility into, when a broken server gets repaired and powered on, and we were not tracking it.

2. This is a simple, typical scenario that can cause service impact or an outage, but we had not anticipated it or put any safeguard in place. Hence the incident.

3. We lack effective monitoring of network traffic.
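For the third point, a very small traffic check could look like the sketch below. It assumes a Linux host, where per-interface byte counters are exposed in /proc/net/dev. The function names, the interface name, and the 80% threshold are all assumptions made for illustration; a real setup would more likely rely on an existing monitoring system.

```python
import time


def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def tx_mbit_per_s(prev_tx_bytes, curr_tx_bytes, interval_s):
    """Average transmit rate in Mbit/s between two counter samples."""
    return (curr_tx_bytes - prev_tx_bytes) * 8 / 1_000_000 / interval_s


def is_saturating(rate_mbit, link_mbit, threshold=0.8):
    """True when the rate reaches the given fraction of the link capacity."""
    return rate_mbit >= link_mbit * threshold


def sample_tx_rate(iface, interval_s=5.0):
    """Sample /proc/net/dev twice and return the transmit rate in Mbit/s."""
    with open("/proc/net/dev") as f:
        before = parse_proc_net_dev(f.read())[iface][1]
    time.sleep(interval_s)
    with open("/proc/net/dev") as f:
        after = parse_proc_net_dev(f.read())[iface][1]
    return tx_mbit_per_s(before, after, interval_s)
```

On the master, something like `is_saturating(sample_tx_rate("eth0"), link_mbit=1000)` could feed an alert, where `eth0` and the 1000 Mbit/s link speed are placeholders for the real values.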


Solutions:

1. On all servers, disable the automatic startup of MySQL at boot. After a server is powered on, start the instances manually with the slave stopped, and start replication only when it is safe to do so. (With many servers this may be tedious, but it is better to make it a documented procedure than to cause another incident.)

2. Be aware of this problem and record it in the team knowledge base or runbook so that it does not happen again.
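One way to back up the first solution is to check that every instance is configured not to start replication on its own. MySQL provides the `skip-slave-start` option for this (named `skip-replica-start` in newer releases). The sketch below scans the text of a my.cnf for `[mysqld]` or numbered `[mysqldN]` sections that lack the option. It assumes the instances are defined in such sections, does only a simplified parse of the file, and checks only that the option is present, not what value it has. The function name is made up for illustration.

```python
import configparser
import re


def sections_missing_skip_slave_start(cnf_text):
    """Return the [mysqld] / [mysqldN] sections that do not set skip-slave-start."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # options such as skip-slave-start have no value
        strict=False,         # tolerate duplicate options
        interpolation=None,   # do not treat '%' in values specially
    )
    parser.read_string(cnf_text)
    missing = []
    for section in parser.sections():
        if not re.fullmatch(r"mysqld\d*", section):
            continue
        # MySQL treats '-' and '_' in option names as equivalent.
        keys = {key.replace("_", "-") for key in parser[section]}
        if "skip-slave-start" not in keys and "skip-replica-start" not in keys:
            missing.append(section)
    return missing


example = """
[mysqld1]
port = 3306
skip-slave-start

[mysqld2]
port = 3307
"""
print(sections_missing_skip_slave_start(example))  # ['mysqld2']
```

A check like this can run as part of the power-on procedure, so that an instance that would start replicating immediately is noticed before it is started.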