Storage engine is the role of memory and disk, responsible for the data transmission between his two, in order to improve the transmission efficiency, puts forward the concept of “page”, which is in the general case, in order to reduce the number of IO, at least from a disk read 16 KB into memory, the contents of a minimum of 16 KB of memory content flushed to disk.
Pages: Data is divided into pages, using pages as the basic unit of interaction between disk and memory. InnoDB pages are typically 16 KB in size.
InnoDB row format
Row format: The way records are stored on disk is called row format.
Four row formats: Compact, Redundant, Dynamic, and Compressed.
# specify the line format command
REATE TABLE TABLE name (column information) ROW_FORMAT= Row format name ALTER TABLE name ROW_FORMAT= row format name mysql> CREATE TABLE record_format_demo (-> C1 VARCHAR(10), -> c2 VARCHAR(10) NOT NULL, -> c3 CHAR(10), -> c4 VARCHAR(10) -> ) CHARSET=ascii ROW_FORMAT=COMPACT; Query OK, 0 rows affected (0.03 SEC)Copy the code
InnoDB Compact row format
Records can be broken down into extra information and real data
# Record additional information
-
Side length field length list: Records the number of bytes occupied by the real data of a variable length field
Note: The variable-length field length list stores only the length of columns with non-NULL values, not the length of columns with NULL values
-
Null value list: Columns with null values are centrally managed and stored in the NULL value list
-
Record header: a fixed five-byte field containing the description of a record
# Real data
In addition to storing real data, hidden columns are added as follows:
The column name | Whether must | Take up the space | describe |
---|---|---|---|
row_id | no | 6 bytes | A row ID that uniquely identifies a record |
transaction_id | is | 6 bytes | Transaction ID |
roll_pointer | is | 7 bytes | Rollback Pointers |
Note that the actual columns are DB_ROW_ID, DB_TRX_ID, and DB_ROLL_PTR. We wrote row_id, transaction_id, and roll_pointer for aesthetic purposes.
# char(M) storage format
For CHAR(M) columns, the number of bytes used by the column in a fixed-length character set is not added to the variant-length field length list, as is the number of bytes used by the column in a variant-length character set
The row format of the table adopts the COMPACT experiment
create table record_test_1 ( id bigint, score double, name char(4), content varchar(8), extra varchar(16) )row_format=compact; INSERT INTO 'record_test_1' (' id ', 'score ',' name ', 'content ',' extra ') VALUES (1, 78.5, 'hash', 'wodetian', 'nidetiantadetian'); INSERT INTO 'record_test_1' (' id ', 'score ',' name ', 'content') VALUES (65536, 17983.9812, 'ZHX ', 'shin', 'nosuke'); INSERT INTO 'record_test_1' (' id ', 'score ',' name ', 'content ',' extra ') VALUES (NULL, -669.996, 'aa', NULL, NULL); INSERT INTO `record_test_1`(`id`, `score`, `name`, `content`, `extra`) VALUES (2048, NULL, NULL, 'c', 'jun');Copy the code
Current table structure
+ | | id score | name | content | extra 78.5 | | | | + 1 hash | wodetian | nidetiantadetian | | 65536 | | 17983.9812 ZHX | shin | nosuke | | NULL | | 669.996 aa | NULL | NULL | | 2048 | | NULL | NULL | c jun | +Copy the code
MySQL Server 5.7. Data\test: record test_1.ibd, Open it in hexadecimal with notepad++ (go to notepad++ and find the plugin -> hex-editor -> view in HEX, or install one if you don’t have one)
Let’s first look at the content and extra fields of the first record. The length of the fields is 8 and 16 respectively, and the character type is variable, so these two fields will be stored in the list of variable length fields.
And since it’s stored in reverse order, it should be 10, 08
# Redundant line format
I don’t understand. I’ll leave it off the record
## Row overflows data
# VARCHAR(M) Maximum amount of data that can be stored
To store a VARCHAR(M) column, you need three parts of storage space:
- Real data
- The length of bytes occupied by real data
- NULL indicates that the column may NOT have the space if it has the NOT NULL attribute
The first premise is that VARCHAR(M) columns can take up to 65535 bytes!
Now assume that the character set for the column is the ASCII character set (a character represents a byte) and that the table has only one column
- If the column is
not null
It is not necessary to spend a byte space to store null values, and the length of real data may occupy 2 bytes, that is, the maximum 65533 bytes can be used to store real data, and 65533 characters can be stored - If the column is not
not null
It takes a byte space to store null values, and the length of real data may occupy 2 bytes, that is, 65532 bytes can be used to store real data at most, and 65532 characters can be stored
But what if the character set is not an ASCII character set? As utf8? GBK?
The maximum number of bytes used by a character is not changed because the character set changes.
For example, if the character set is UTF8, a maximum of 3 bytes are used for a character, so in the above case, the maximum number of stored characters 65532/3=21844
Select * from GBK where M = 32766 and utF8 where M = 21844; Be sure to remember that all columns in a row (excluding hidden columns and record headers) cannot add up to more than 65535 bytes!
# Overflow caused by too much data in the record
In Compact and Reduntant row formats, for columns that occupy very large storage space, only part of the data for that column is stored at the recorded real data, and the rest is distributed in several other pages. Then the actual recorded data store the addresses pointing to these pages with 20 bytes (of course, these 20 bytes also include the number of bytes occupied by the data scattered in other pages), so as to find the page where the remaining data resides, as shown in the figure:
Finally, note that not only VARCHAR(M) columns, but other TEXT and BLOB columns can occur when very large amounts of data are storedLine overflow
# Critical point of row overflow
If the number of bytes stored in a column is n, then a row overflow should satisfy this formula:
136 + 2 x (27 + N) > 16384 N > 8098 Description: 136 bytes of additional information on a page + a row of records (27 bytes of additional information on a page + Nbytes of real data) x 2 > 16,384 bytes Maximum number of bytes stored on a pageCopy the code
Note:
136: In addition to storing our records, each page also needs to store some extra information. The extra information adds up to 136 bytes of space (just know that number for now), and the rest of the space can be used to store records.
2: store at least two rows of records on a page.
27: The additional information required for each record is 27 bytes.
16384:16384 bytes per page
But this is only assuming that a table has only one column. If there are more than one column, the calculation formula and the result must be changed
Conclusion: You don’t have to worry about what this critical point is, just know that if we want to store a large amount of data in a row, row overflow can occur.
Dynamic and Compressed row format
# Dynamic format
MySQL version 5.7 has a default row format called Dynamic. These two rows are similar to Compact except that they handle row overflow data differently. They do not store the first 768 bytes of the field’s real data in the recorded real data. Instead, store all the bytes in other pages, storing only the addresses of other pages where the real data is recorded, like this:
# Compressed
The difference between Dynamic and Compressed rows is that Compressed pages are Compressed using a compression algorithm to save space.