Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.
I have talked about binlog and redo log, but not binlog buffer and redo log buffer, mainly because buffer is prone to interference in the understanding of the core context. But buffers are important, so let’s look at the relationship between logs and buffers.
Binlog profile
Binlog is used for replication. In master-slave replication, the secondary library uses the binlog on the primary library to replay data for synchronization.
Three formats
Binlog can be set in three formats:
STATEMENT
The STATEMENT format records logical SQL statements of logs.
-
Advantages: Small amount of data
-
Disadvantages: In order for the statement to run correctly on the slave, it is necessary to record some information about the execution of each statement to ensure that all statements get the same results on the slave as they do on the master. Some statements and functions, such as UUID, LOAD DATA INFILE, may cause DATA inconsistency or even errors during replication
ROW
In ROW format, binary logging no longer records simple SQL statements, but records table ROW changes.
-
Advantages: Log content clearly records the details of data modification in each row
-
Disadvantages: Large amounts of logs can be generated. For example, if you update a full table, every change needs to be logged
MIXED
In the MIXED format, MySQL uses the STATEMENT format by default to record binary log files. However, in some cases, MySQL uses the ROW format.
1) The storage engine of the table is NDB. In this case, DML operations on the table will be recorded in ROW format.
2) Use UUID (), USER (), cur-rent_user (), FOUND_ROWS (), ROW_COUNT () and other indeterminate functions.
3) INSERT DELAY statement was used.
4) User-defined functions (UDFs) are used.
5) Temporary tables are used.
To view
Log format
We can use the following command to set and view the format of binlog. This can be set by session level or global level.
mysql> set @@session.binlog_format='STATEMENT';
Query OK, 0 rows affected (0.00 sec)
mysql> select @@session.binlog_format;
+-------------------------+
| @@session.binlog_format |
+-------------------------+
| STATEMENT |
+-------------------------+
1 row in set (0.00 sec)
mysql> set global binlog_format='ROW';
Copy the code
File location
mysql> show variables like 'datadir';
+---------------+-----------------------+
| Variable_name | Value |
+---------------+-----------------------+
| datadir | /usr/local/var/mysql/ |
+---------------+-----------------------+
1 row in set (0.01 sec)
mysql> system ls -lh /usr/local/var/mysql/
total 447360
-rw-r----- 1 bytedance admin 5.9k 9 20 12:55 binlog.000001
-rw-r----- 1 bytedance admin 16B 8 13 21:12 binlog.index
Copy the code
Binlog. index indicates a binary index file, and binlog.000001 indicates a binary log file
The file content
The content of the binlog file is binary and cannot be viewed directly. You can use the mysqlbinlog tool provided by MySQL.
To view the STATEMENT command:
mysqlbinlog –start-position=5910 /usr/local/var/mysql/binlog.000001
View ROW format command:
mysqlbinlog -vv –start-position=3353 –stop-position=4478 /usr/local/var/mysql/binlog.000001
In the following example, you can see the specific statement.
➜ ~ mysqlbinlog -vv --start-position=3353 --stop-position=4478 /usr/local/var/mysql/binlog.000001
/ *! 50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/ *! 50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER / *! * /;
# at 156
#210813 21:12:05 server id 1 end_log_pos 125 CRC32 0xc29576e3 Start: binlog v 4, server v 8.023. created 210813 21:12:05 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/ *! * /;
BINLOG '
pW8WYQ8BAAAAeQAAAH0AAAABAAQAOC4wLjIzAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAClbxZhEwANAAgAAAAABAAEAAAAYQAEGggAAAAICAgCAAAACgoKKioAEjQA
CigB43aVwg==
'/ *! * /;
# at 3353
#210912 12:00:11 server id 1 end_log_pos 4478 CRC32 0x110c4a8c Query thread_id=21 exec_time=0 error_code=0 Xid = 134
use `testdb`/ *! * /;
SET TIMESTAMP=1631419211/ *! * /;
SET @@session.pseudo_thread_id=21/ *! * /;
SET @@session.foreign_key_checks=1The @@session.sql_auto_is_null=0The @@session.unique_checks=1The @@session.autocommit=1/ *! * /;
SET @@session.sql_mode=1168113696/ *! * /;
SET @@session.auto_increment_increment=1The @@session.auto_increment_offset=1/ *! * /;
/ *! \C utf8mb4 *// *! * /;
SET @@session.character_set_client=255The @@session.collation_connection=255The @@session.collation_server=255/ *! * /;
SET @@session.lc_time_names=0/ *! * /;
SET @@session.collation_database=DEFAULT/ *! * /;
SET @@session.explicit_defaults_for_timestamp=1/ *! * /;
/ *! 80011 SET @@session.default_collation_for_utf8mb4=255*// *! * /;
/ *! 80013 SET @@session.sql_require_primary_key=0*// *! * /;
CREATE TABLE `trace_sp_info2` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT 'on the ID',
`sp_id` bigint unsigned NOT NULL DEFAULT '0' COMMENT 'service id',
`sp_name` varchar(50) NOT NULL DEFAULT ' ' COMMENT 'Service Name',
`type` tinyint DEFAULT '0' COMMENT 'service',
`type_name` varchar(50) NOT NULL DEFAULT ' ' COMMENT 'Service Type Name',
`status` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'State 0 inactive 1 active 2 inactive',
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
`create_by` varchar(50) NOT NULL DEFAULT ' ' COMMENT 'Founder',
`update_by` varchar(50) NOT NULL DEFAULT ' ' COMMENT 'Update person',
`update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update Time'.PRIMARY KEY (`id`),
UNIQUE KEY `uniq_spid` (`sp_id`),
KEY `idx_update_time` (`update_time`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COMMENT='service'
/ *! * /;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ / *! * /;
DELIMITER ;
# End of log file
/ *! 50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/ *! 50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
Copy the code
Redo log profile
Redo log can be used to make crash recovery, which is one of the important functions of database to ensure data security. It records physical information about changes made to each Page, usually including 2 log files by default, and can also be configured by command.
format
Log file. The size of each page is 512 bytes. The first four pages in the file are used to manage log content and overall database state. After 2KB (4 x 512 bytes) of content, it is normal to store the log content.
The format is shown in the figure below, where the value refers to the offset position of each item on the page.
Four header pages are displayed in blue, and common pages are displayed in yellow. In a normal page, there are 12 bytes of page header information, which is used to manage the data storage mode of the page itself.
LOG_BLOCK_HDR_NO: 4 bytes, a block number associated with the LSN.
LOG_BLOCK_HDR_DATA_LEN: This value is usually equal to 512-12 (the size of the 12-bit page header). Since logs are stored in contiguous blocks, there is no free space in between. Therefore, if the length is not 500, it indicates that the log scan has been completed (Crash Recovery work).
LOG_BLOCK_FIRST_REC_GROUP: 2 bytes indicating whether there is an MTR starting position in the current block. Since the amount of logs generated by an MTR may exceed the size of one block, if an MTR spans multiple blocks, this value indicates which block the MTR starts from. If 0, it means that the logs of the current block all belong to the same MTR; If the value is greater than 0 and less than the value represented by LOG_BLOCK_HDR_DATA_LEN above, it indicates that the log in the current block belongs to the two MTR, and the starting position of the MTR is the position represented by LOG_BLOCK_FIRST_REC_GROUP.
LOG_BLOCK_CHECKPOINT_NO: 4 bytes, which stores the sequence number of the checkpoint.
The following describes the LSN. The full name of the LSN is Log Sequence Number, which records the exact location of logs and increases continuously.
The organization structure of the log file is described above. Only the first 2KB is the log header, and all the rest are successive ones used to store the log pages generated by MTR.
MTR
MTR, also known as mini-Transaction, is called a “physical transaction” and is used to ensure the integrity of physical operations. For example, if a record is inserted into an underlying page, it is incomplete to the page if the header information is changed but the footer information is not changed, which is guaranteed by a transaction.
MTR has start and submit phases to ensure physical transaction consistency. Its role is shown in the figure below:
When a page is modified, an MTR generates a redo log record. When a MTR submits, the redo log record is copied to the redo log buffer. In addition, during MTR submission, an LSN will be generated for each log record, which determines its position in the log file. Therefore, data in a normal redo log page has an LSN.
The cache
Now let’s look at the relationship between binlog and redo log cache.
Write and fsync
How many steps does it take to write the data from buffer to disk? There are two main steps, write and fsync.
Write: Writes the logs in the buffer to the page cache of the file system. This operation does not persist the data to disks. Therefore, the operation is fast, but the logs are lost when the power failure occurs.
Fsync: Truly persisting data to disk without loss. In general, we consider fsync to account for disk IOPS.
Of course, the data in buffer will be lost if mysql restarts abnormally.
So the timing of write and fsync controls the persistence of binlog and redo log.
Binlog brush set
The writing logic of binlog is simple: During transaction execution, the log is written to the binlog cache first. When the transaction is committed, the whole transaction in the binlog cache is written to the binlog file.
The binlog cannot be broken apart, so be sure to write once, no matter how big the transaction is.
The sync_binlog parameter controls the write and fsync timing of binlog:
-
When sync_binlog=0, it indicates that only write, not fsync, is committed each time. If the host restarts abnormally, the unfsync binlog will be lost.
-
If sync_binlog=1, fsync will be performed every time a transaction is committed.
-
When sync_binlog=N(N>1), it indicates that every transaction is committed and fsync is performed after N transactions are accumulated. If the host restarts abnormally, the binlog of the latest N transactions will be lost.
Redo log brush plate
During a transaction, redo logs are written to the redo log buffer.
The innodb_flush_log_at_trx_commit parameter controls redo log write and fsync:
-
If the value is set to 0, the redo log is left in the redo log buffer every time a transaction is committed.
-
If the value is set to 1, the redo log is persisted to disk during each transaction.
-
If the redo log is set to 2, only the redo log is written to the Page cache each time a transaction is committed. The host restarts abnormally, resulting in data loss.
Of course, the binlog and redo log flush times are controlled by other configurations besides the parameters. For example, InnoDB has a background thread that writes logs from the redo log buffer to the page cache of the file system every second, and then calls fsync to persist the logs to disk.
However, these contents will affect our understanding, so this time mainly understand the brushing time in the submission process.
process
Update a row of data as shown in the following figure. The binlog cache and redo log cache are not included in this figure. Prepare, write binlog, and commit the redo log.
In the case that both SYNC_binlog and Innodb_flush_LOG_at_trx_COMMIT are set to 1, the update process is as follows:
Prepare and commit for the redo log phases:
-
Write and fsync operations were performed in the prepare phase, and the redo log was persisted
-
During the commit phase, only write is performed because the redo log was persisted during the prepare phase, but did not change its status to COMMIT. If an exception occurs, the data can be recovered according to the write status of the binlog. If you forget this part, you can revisit InnoDB redo, undo, binlog, how do they work together
As you can see, sync_binlog and Innodb_flush_log_at_trx_COMMIT control the write and fsync operations that are performed during the commit process.
In the case of double 1, that is, sync_binlog and innodb_flush_log_at_trx_COMMIT are both 1, MySQL persistence is the most timely. This means that a transaction must wait twice for a redo log and twice for a binlog before it is fully committed. The QPS of a transaction flush increases accordingly.
How to choose these two configurations depends on your specific business situation.
conclusion
I have a lot of things to do recently. It took me almost a week to write this article, and I was quite happy after it was finished. A few fuzzy knowledge points also comb clear to oneself before. I hope this article will help you.
data
-
InnoDB memory structure and features
-
MySQL shared table space concept
-
Linux IO synchronization: sync, fsync, and fdatasync
-
Mysql Binlog three formats in detail
-
MySQL binlog format parsing
-
MySQL redo log format parsing
-
Redo log Log file
-
Mysqlbinlog restores (redo) data based on an offset,–start-position,–stop-position
-
Mysqlbinlog skills
-
MySQL redo log and recover
The last
If you like my article, you can follow my public account (Programmer Malatang)
My personal blog is shidawuhen.github. IO /
Review of previous articles:
-
Design patterns
-
recruitment
-
thinking
-
storage
-
The algorithm series
-
Reading notes
-
Small tools
-
architecture
-
network
-
The Go