What are Binlog and Redolog?
Binlog
Binlog is the logical log of the MySQL Server layer.
First, a word on the Server layer and the engine layer:
- The Server layer handles client connections and provides the environment for parsing and executing SQL.
- The engine layer actually interfaces with the operating system's file system and persists the data there.
Redolog
Redolog is the InnoDB engine's own log. It is usually called a "physical" log because it records changes to data pages, but what it records within a page is actually logical.
- Why have both Binlog and Redolog? A: As mentioned above, Binlog is the log of the MySQL Server layer, while Redolog is the log of the InnoDB engine.
- Why is Redolog both physical and logical? A: As I understand it, "physical" is relative to the Server layer and refers to the page-based logging, while "logical" means that Redolog records relative changes to a row rather than a snapshot of the data at a point in time.
- InnoDB caches data in memory for performance, yet it writes Redolog to disk for consistency. What is the point? A: As I understand it, log writes are sequential rather than random, so writing the log (especially on a mechanical hard drive) is much cheaper than writing the modified data pages to disk on every change.
Redolog file storage
The minimum read/write unit of a disk is a Block
File system/physical disk blocks:
Each Redolog block is 512 bytes (the block size of an operating system's file system is configured separately). The first 12 bytes store block header information, and the last 4 bytes store a checksum.
- For example, the first bit of the first byte of the header is the flush flag, which marks the first block written by a flush operation.
- For example, the header records the data length. Redolog is written to disk continuously, so every block except the last one should be full.
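The block layout above can be summarized as a few constants. This is only a sketch: the constants are redefined locally from the sizes given in the text, and `blocks_needed` and `last_block_payload` are hypothetical helpers, not InnoDB functions.

```cpp
#include <cassert>
#include <cstdint>

// Sizes taken from the text, redefined here so the sketch is self-contained.
constexpr std::uint64_t OS_FILE_LOG_BLOCK_SIZE = 512;  // one Redolog block
constexpr std::uint64_t LOG_BLOCK_HDR_SIZE = 12;       // block header
constexpr std::uint64_t LOG_BLOCK_TRL_SIZE = 4;        // trailer (checksum)
constexpr std::uint64_t LOG_BLOCK_DATA_SIZE =
    OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_HDR_SIZE - LOG_BLOCK_TRL_SIZE;

static_assert(LOG_BLOCK_DATA_SIZE == 496, "each block carries 496 data bytes");

// Hypothetical helper: number of blocks needed for `payload` bytes of log data.
constexpr std::uint64_t blocks_needed(std::uint64_t payload) {
  return (payload + LOG_BLOCK_DATA_SIZE - 1) / LOG_BLOCK_DATA_SIZE;
}

// Hypothetical helper: log data bytes stored in the last block. Every earlier
// block is full, which is why only the last one can be partially filled.
constexpr std::uint64_t last_block_payload(std::uint64_t payload) {
  return payload == 0 ? 0
         : payload % LOG_BLOCK_DATA_SIZE == 0 ? LOG_BLOCK_DATA_SIZE
                                              : payload % LOG_BLOCK_DATA_SIZE;
}
```

For instance, 497 bytes of log data need two blocks, and the second block holds a single data byte.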
Redolog file layer organization
Redo log files are named ib_logfile0, ib_logfile1, … To avoid allocating storage space while the database is running, the log files are created at their full size in advance. In practice they are written circularly: when the end of the last file is reached, writing wraps around to the beginning of the first.
Four blocks are reserved at the beginning of each log file for additional information. The first block is called the header block. The next three blocks store checkpoint information in file 0 and are left empty in the other files.
Calculating the current position (offset) of the log
SN is the globally unique logical offset used when writing to Redolog. It counts only the log data itself.
You can think of it as your sequence number in a class lineup.
LSN is the offset in the stream of Redolog blocks, so it also counts the header and trailer of every preceding block plus the header of the current block. These lengths are constant, so converting SN to LSN only requires counting the number of blocks:
    // Full blocks before sn, plus the offset inside the current block,
    // plus the header of the current block.
    constexpr inline lsn_t log_translate_sn_to_lsn(lsn_t sn) {
      return (sn / LOG_BLOCK_DATA_SIZE * OS_FILE_LOG_BLOCK_SIZE +
              sn % LOG_BLOCK_DATA_SIZE + LOG_BLOCK_HDR_SIZE);
    }
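To make the conversion concrete, here is a self-contained version with a few worked values. The constants are redefined locally from the block sizes described earlier. `sn_to_lsn` copies the formula above, while `lsn_to_sn` is a hypothetical inverse written for illustration; it is only meaningful for an LSN that points into the data area of a block.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

constexpr lsn_t OS_FILE_LOG_BLOCK_SIZE = 512;  // whole block
constexpr lsn_t LOG_BLOCK_HDR_SIZE = 12;       // block header
constexpr lsn_t LOG_BLOCK_DATA_SIZE = 496;     // 512 - 12 (header) - 4 (trailer)

// Same formula as in the text: full blocks + offset in block + header.
constexpr lsn_t sn_to_lsn(lsn_t sn) {
  return sn / LOG_BLOCK_DATA_SIZE * OS_FILE_LOG_BLOCK_SIZE +
         sn % LOG_BLOCK_DATA_SIZE + LOG_BLOCK_HDR_SIZE;
}

// Hypothetical inverse: strip the headers and trailers back out.
constexpr lsn_t lsn_to_sn(lsn_t lsn) {
  return lsn / OS_FILE_LOG_BLOCK_SIZE * LOG_BLOCK_DATA_SIZE +
         lsn % OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_HDR_SIZE;
}

static_assert(sn_to_lsn(0) == 12, "first data byte follows the first header");
static_assert(sn_to_lsn(495) == 507, "last data byte of block 0");
static_assert(sn_to_lsn(496) == 524, "skips trailer of block 0, header of block 1");
static_assert(lsn_to_sn(sn_to_lsn(1000)) == 1000, "round trip");
```

Note the jump between sn 495 and sn 496: the LSN moves from 507 to 524 because the 4-byte trailer of block 0 and the 12-byte header of block 1 lie in between.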
LSN is the sequential offset across all log blocks. To obtain the physical offset of the current log position across the Redolog files, the header blocks of each file must be added as well.
The first block in each log file stores some information, including the LSN of the first log block in that file, which is called current_file_lsn. Subtracting current_file_lsn from the current LSN gives the relative offset (cursor position) of the current log within the file, and adding current_file_offset to that gives real_offset. If all the redo log files are viewed as one large file, concatenated end to end, real_offset is the offset within that large file.
I understand what each of these offsets means, but I do not know why three different offsets are needed, because I have not read the entire source code or studied the whole InnoDB architecture.
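The offset arithmetic described above can be sketched as follows. The struct and function names are hypothetical and only mirror the names used in the text. The sketch assumes all files have the same size and ignores wrap-around at the end of the last file.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical snapshot of the two values described in the text.
struct LogFilePosition {
  lsn_t current_file_lsn;             // LSN of the first log block in the file
  std::uint64_t current_file_offset;  // offset of that block in the "large file"
};

// real_offset = (lsn - current_file_lsn) + current_file_offset
inline std::uint64_t real_offset_for_lsn(const LogFilePosition &pos, lsn_t lsn) {
  assert(lsn >= pos.current_file_lsn);
  return pos.current_file_offset + (lsn - pos.current_file_lsn);
}

struct FileLocation {
  std::uint64_t file_index;      // which ib_logfileN
  std::uint64_t offset_in_file;  // byte offset inside that file
};

// Assumption: equally sized files, so the large-file offset splits by division.
inline FileLocation locate(std::uint64_t real_offset, std::uint64_t file_size) {
  assert(file_size > 0);
  return FileLocation{real_offset / file_size, real_offset % file_size};
}
```

With 1 MiB files, a current file whose first log block sits right after the four reserved blocks of file 1 (large-file offset 1048576 + 2048) and has LSN 8192, the LSN 9192 maps to file 1 at byte offset 3048.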
Redolog’s write process
- Generate Redolog records based on the user's SQL
- Write them to the InnoDB log buffer
- Write the log buffer to the operating system's page cache
- Flush the page cache to disk
Generating Redolog
An atomic operation may involve changes to multiple rows, so it may generate multiple Redolog records. To ensure that all records of one atomic operation are contiguous in the log, they are first collected in the mini-transaction (MTR) cache and written to the log buffer only after all of them have been generated.
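The idea of collecting records privately and publishing them in one contiguous run can be illustrated with a toy model. `MiniTransaction` here is a hypothetical class written for this article, not InnoDB's mtr_t, and the shared log buffer is just a string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a mini-transaction: records are buffered privately and
// appended to the shared log buffer only at commit time.
class MiniTransaction {
 public:
  // Buffer one redo record locally; nothing is visible in the log yet.
  void log(const std::string &record) { records_.push_back(record); }

  // Total length of all buffered records.
  std::size_t length() const {
    std::size_t total = 0;
    for (const std::string &r : records_) total += r.size();
    return total;
  }

  // Append all records in one contiguous run, then reset.
  void commit(std::string &log_buffer) {
    for (const std::string &r : records_) log_buffer += r;
    records_.clear();
  }

 private:
  std::vector<std::string> records_;
};
```

Even if two mini-transactions generate their records in interleaved order (A1, B1, A2, B2), committing them one after the other yields A1A2B1B2 in the log buffer, so the records of each one stay together.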
Writing to the Log Buffer
The log buffer supports concurrent writes, because in high-concurrency scenarios a lock that serializes writes to the buffer would become a performance bottleneck.
To support concurrency while keeping the records of each atomic operation contiguous, the total length of each mini-transaction (MTR) is calculated first and used to reserve an exclusive range of buffer space.
Although space is allocated to MTRs contiguously, at any moment some MTRs will have been given buffer space but have written none or only part of their data, as shown in the figure below:
In fact, the upper part of this figure is not the log buffer array itself but an index of the buffer. Each element stores the LSN and length of an MTR in the buffer. The reason for this index is explained below.
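A minimal sketch of the reservation step, assuming an atomic counter is used to hand out space. The `LogBuffer` class is hypothetical and leaves out wrap-around and waiting for free space. Its only purpose is to show that each writer gets a disjoint range and can fill it in any order.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class LogBuffer {
 public:
  explicit LogBuffer(std::size_t capacity) : buf_(capacity), next_(0) {}

  // Reserve `len` bytes and return the start of the reserved range.
  // fetch_add is atomic, so concurrent callers never get overlapping ranges.
  std::uint64_t reserve(std::uint64_t len) {
    return next_.fetch_add(len, std::memory_order_relaxed);
  }

  // Copy data into a previously reserved range; no lock is needed because
  // the range belongs to exactly one writer.
  void write(std::uint64_t start, const char *data, std::size_t len) {
    assert(start + len <= buf_.size());
    std::memcpy(buf_.data() + start, data, len);
  }

  const char *data() const { return buf_.data(); }

 private:
  std::vector<char> buf_;
  std::atomic<std::uint64_t> next_;  // next free position
};
```

If one writer reserves 5 bytes and another reserves 3, the second may finish copying first. Until the first one finishes, the buffer has a hole at its front, which is exactly the situation the index has to track.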
Writing to the Page Cache
When writing to the operating system's file system, only the leading part of the log buffer that has been written without gaps can be written out. MTRs that have been allocated space but have not finished writing must be skipped for now.
To find the position up to which the buffer has been completely and contiguously written (buf_ready_for_write_lsn above), the index array described above is used.
The index array (buf_link) is actually a circular array: the slot for a position is found by taking its sn modulo the length of the array.
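The following single-threaded model shows how such a circular index can find the contiguous prefix. `LinkBuf` is a simplified, hypothetical class written for this article. The real structure is concurrent, and its details may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class LinkBuf {
 public:
  explicit LinkBuf(std::size_t capacity) : links_(capacity, 0), tail_(0) {}

  // Record that the range [from, to) has been completely written.
  // The slot is chosen by position modulo the array length.
  void add_link(std::uint64_t from, std::uint64_t to) {
    assert(to > from);
    assert(from >= tail_ && from - tail_ < links_.size());
    links_[from % links_.size()] = to - from;
  }

  // Follow the links from the tail while each next range is complete.
  // Returns the position up to which everything is written without gaps.
  std::uint64_t advance_tail() {
    for (;;) {
      std::uint64_t &slot = links_[tail_ % links_.size()];
      if (slot == 0) break;  // gap: this range is not finished yet
      const std::uint64_t len = slot;
      slot = 0;  // free the slot for reuse
      tail_ += len;
    }
    return tail_;
  }

 private:
  std::vector<std::uint64_t> links_;  // length of the range starting at a slot
  std::uint64_t tail_;                // end of the contiguous prefix
};
```

If the ranges [0, 3) and [5, 7) are finished but [3, 5) is not, the tail stops at 3. As soon as [3, 5) is added, the tail jumps to 7.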
Flushing to disk
After the log has been written to the page cache, a dedicated thread, log_flusher, calls fsync to ask the operating system to write it to disk.
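The difference between the last two steps can be shown with plain POSIX calls. This is only an illustration of write versus fsync, not InnoDB code. The function names are hypothetical, and error handling is reduced to a boolean result.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Step 1: write() hands the bytes to the operating system. On return they
// are typically in the page cache, not necessarily on disk.
bool write_to_page_cache(int fd, const char *data, std::size_t len) {
  std::size_t done = 0;
  while (done < len) {
    const ssize_t n = ::write(fd, data + done, len - done);
    if (n < 0) return false;  // simplified: no retry on interruption
    done += static_cast<std::size_t>(n);
  }
  return true;
}

// Step 2: fsync() asks the operating system to push the cached data of this
// file to the storage device and returns once that has been done.
bool flush_to_disk(int fd) { return ::fsync(fd) == 0; }
```

Splitting the two steps lets one thread keep writing into the page cache while another decides when to pay the cost of fsync.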