directory
- Binlog write mechanism
- Redo log writing mechanism
- The group commit mechanism implements a lot of TPS
- Understanding WAL
- How can I improve I/O performance bottlenecks
WAL ensures that data can be recovered after an abnormal MySQL restart as long as redo logs and binlogs are persisted to disk.
MySQL writes binlog and redo log files.
Binlog write mechanism
1. During transaction execution, the log is written to the binlog cache first. When the transaction is committed, the binlog cache is written to the binlog file.
The binlog cache is allocated to each thread. The binlog_cache_size parameter controls the size of the binlog cache for each thread. If this size is exceeded, the disk will be temporarily saved
When a transaction commits, the executor writes the entire transaction from the binlog cache to the binlog. And clear the binlog cache
4. Each thread has its own binlog cache and shares a binlog file
5. Write: logs are written to the page cache of the file system and are not persisted to disks, so the speed is relatively fast. Fsync in the figure is to persist data to disks, consuming IOPS of disks
When to write and fsync is controlled by the sync_binlog parameter:
1. When sync_binlog = 0, each committed transaction is only written, not fsync; 2. When sync_binlog = 1, fsync will be performed for each committed transaction; 3. When sync_binlog = N (N>1), it indicates that every transaction is committed and fsync is performed after N transactions are accumulated.Copy the code
Sync_binlog controls how often binlog flusher disks. For very large I/OS, this number can improve performance, but it must be set to 1 if the fault tolerance rate is very low. (Sync_binlog is set to N. If the host restarts abnormally, the binlog of the last N transactions will be lost.
Redo log writing mechanism
During a transaction, redo logs are written to the redo log buffer.
The contents of the redo log buffer do not need to be persisted to disk every time they are generated.
If MySQL restarts unexpectedly during a transaction, this portion of the log is lost. Since the transaction is not committed, there is no loss of logs at this point.
It is also possible for redo log buffer logs to be persisted to disk when the transaction is not committed.
The three colors represent three possible redo log states:
1. There is a redo log buffer, which is physically stored in MySQL memory (red).
Write to disk, but do not persist (fsync), physical file system in the page cache, i.e. Yellow;
Persisting to disk (hard Disk in green)
The first two steps are write memory, and the last step is disk I/O, so persist the redo log to disk until the Page cache is large enough to not affect writing to the Page cache.
To control the redo log write policy, InnoDB provides the innodb_flush_log_at_trx_commit parameter, which has three possible values:
1, is set to0The redo log is left in the redo log buffer every time a transaction commits.2, is set to1The redo log is persisted directly to disk every time a transaction commits.3, is set to2The redo log is only written to the page cache each time a transaction commits.Copy the code
Unlike binlog, where each thread has a binlog cache, redo log is a redo log buffer shared by multiple threads.
InnoDB has a background thread that writes logs from the redo log buffer to the page cache of the file system every 1s, and then fsync persists to disk. Redo logs are also written directly to the redolog buffer, so redolog of uncommitted transactions may also be persisted to disk.
There are also two scenarios that cause redo logs of uncommitted transactions to be written to disk:
Case 1:
When the size of the redo log buffer reaches half of innodb_log_buffer_size, the background thread writes disks.
(This is just write, no fsync)
Case 2:
When a parallel transaction commits, persist the redo log buffer of that transaction to disk.
Half of transaction A is executed and part of the redo log is stored in the buffer. Transaction B commits and innodb_flush_log_at_trx_commit persists all logs from the redo log buffer to disk.
added
Redo log: prepare, write binlog, commit;
Set innodb_flush_log_at_trx_commit to 1, and the redo log falls in the prepare phase. Fsync is not required when redo logs are committed. Instead, they are written to the file system’s page cache.
The value of sync_binlog and innodb_flush_log_at_trx_COMMIT is set to 1. That is, before a transaction is committed, two flusher processes are required. One is the redo log(prepare stage) and the other is the binlog.
The group commit mechanism implements a lot of TPS
First, the concept of log sequence number (LSN) is introduced. LSN is monotonically incremented, and length is incremented each time a length redo log is written.
In the prepare phase, all three concurrent transactions (TRx1, TRx2, and TRx3) finished writing redo buffers and persisted them to disk.
The corresponding LSN is 50, 120, and 160.
Corresponding process:
1. Trx1 arrives first and is selected as the leader of the group;
2. By the time TRx1 starts to write disks, there are already three transactions in this group, and the LSN has become 160;
3, trx1 to write disk, LSN=160; When trx1 returns, all redo logs with LSN<= 160 are persisted to disk;
4. Trx2 and trx3 return directly.
Conclusion:
The more members you have in a group submission, the better the disk IOPS saving effect is. In the case of single threads, there can only be one persistent operation for one transaction
Two-phase commit | Two-stage commit refinement |
---|
This ensures that the binlog can also be submitted as a group. Because step3 is fast, fewer binlogs are collected. Therefore, binlog group submission is not as effective as redo log group submission.
Improved binlog effect:
--1. Binlog_group_commit_sync_delay :b indicates the number of microseconds before fsync is called;
--2. Binlog_group_commit_sync_no_delay_count: indicates the number of times before fsync is called;
Copy the code
Understanding WAL
WAL reduces disk writes, but redo log and binlog are written every time a transaction is committed.
So WAL benefits from two things:
Redo logs and binlogs are sequential. Sequential data writes are faster than random data writes
2. Group submission mechanism reduces disk IOPS consumption
Copy the code
How can I improve I/O performance bottlenecks
1. Set binlog_group_COMMIT_SYNc_delay and binlog_group_COMMIT_SYNc_NO_delay_count to reduce the number of disk writes in binlog. This approach is implemented based on an “extra deliberate wait,” which may increase the response time of the statement without losing data
2. Set sync_binlog to a value greater than 1 (100-1000). However, the binlog may be lost when the host is powered off
3. Set innodb_flush_log_at_trx_commit to 2. The host may be powered off and data may be lost