Summary:

The MYSQL line format changes along with the file format, but Antelope is an early file format that includes COMPACT and REDUNDANT line formats; The other latest file format is Barracuda, which contains COMPRESSED and DYNAMIC row formats. Functions associated with COMPRESSED tables, efficient storage of off-page columns, and index key prefixes of up to 3072 bytes. I did not find the specific date when these two file formats were enabled and set as the default configuration on the official website. If any friend knows when the file format was used, you can comment and send it. I really did not find it, nor did I see it in the book. Innodb_file_format =Antelope, and mysql5.7 on another computer is Barracuda. As a general rule, 5.6 defaults to Antelope format and 5.7 to Barracuda. Innodb_file_format is a good format to use.

Here is a comparison of the actual file formats of 5.6 and 5.7:

Format:

The mind map for the file format looks like this:

Translation form from the official website comparison

Row format Compact storage features Enhanced variable-length column storage Large index key prefix support Compression support Supported table space types Required file format
REDUNDANT no no no no system, file-per-table, general Antelope or Barracuda
COMPACT is no no no system, file-per-table, general Antelope or Barracuda
DYNAMIC is is is no system, file-per-table, general Barracuda
COMPRESSED is is is is file-per-table, general Barracuda

REDUNDANT line format:

  • Redundant provides compatibility with older versions of mysql and is supported by both Antelope and Barracuda. A table using the REDUNDANT row format stores the first 768 bytes of variable-length column values (VARCHAR, VARBINARY, and BLOB and TEXT types) in index records inside b-tree nodes, with the rest stored on overflow pages. Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length columns that can be stored off-page.
    • For example, if the maximum byte length of the character set is greater than 3, the CHAR(255) column can exceed 768 bytes, just as with UTF8MB4. CHAR(255) this 255 is the number of characters, uft8 format is three bytes is a character to represent Chinese, let’s say VARCHAR(10), so 10 times 3 is 30. If you use UTF8MB4 encoding, which is 4 bytes, it’s 10 times 4, and the number of bytes becomes 40. Remember that the vARCHar parentheses store the number of characters, not bytes. Therefore, if the value of the stored column is less than or equal to 768 bytes, mysql does not use overflow pages, and the data is stored directly in the MYSQL btree node, saving IO. Here is a quick mention of mysql row overflow, as shown in the figure below:

Blob, text, and VARCHAR are all stored in this way. If the data does not exceed 768 bytes, it is stored directly in the btree node. If the data exceeds 768 bytes, it is stored in the overflow page.

  • Storage structure:

  • Header:

  • Storage characteristics:

  1. Each index record contains a 6-byte header. Headers are used to link consecutive records together and for row-level locking.
  2. The record in the clustered index contains fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte scroll pointer field.
  3. If no primary key is defined for the table, each clustered index record also contains a 6-byte row ID field.
  4. Each secondary index record contains all primary key columns defined for clustered index keys that are not in the secondary index.
  5. A record contains Pointers to each field of the record. If the total field length of a record is less than 128 bytes, the pointer is 1 byte; Otherwise, two bytes. Pointer arrays are called record directories. The area to which the pointer points is the data part of the record.
  6. Internally, fixed-length character columns such as CHAR(10) are stored in a fixed-length format. Trailing whitespace is not truncated from VARCHAR columns.
  7. Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length columns that can be stored off-page. For example, if the maximum byte length of the character set is greater than 3, the CHAR(255) column can exceed 768 bytes, just as with UTF8MB4.
  8. SQL NULL values retain one or two bytes in the record directory. If SQL NULL values are stored in variable-length columns, zero bytes are retained in the data portion of the record. For a fixed-length column, the fixed-length of that column is retained in the data portion of the record. Reserving fixed space for NULL values allows columns to be updated in place from NULL to non-NULL values without causing index page fragmentation.

COMPACT row

  • Introduction: Compared to the REDUNDANT row format, the COMPACT row format reduces row storage space by approximately 20% at the cost of increased CPU usage for some operations. If the workload is typically limited by cache hit ratio and disk speed, the COMPACT format may be faster. If your workload is CPU-speed constrained, the compact format may be slower. In short, Compact is better for IO intensive tasks and worse for computationally intensive tasks.
  • Storage structure:
  • Header:
  • Storage characteristics:
  1. Each index record contains a 5-byte header that may be preceded by a variable length header. Headers are used to link consecutive records together and for row-level locking.
  2. The variable-length portion of the record header contains a bit vector to indicate the NULL column. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are 9 to 16 columns that can be NULL, the bitvector uses two bytes.) NULL columns do not occupy space beyond the bits in this vector. The variable-length section of the header also contains the length of the variable-length column. Each length requires one or two bytes, depending on the maximum length of the column. If all columns in the index are not NULL and have fixed length, the record header has no variable length part.
  3. For each non-NULL variable-length field, the record header contains a column length of one or two bytes. Two bytes are required only if part of the column is stored externally in an overflow page or if the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For externally stored columns, the 2-byte length represents the length of the internal storage portion plus a 20-byte pointer to the external storage portion. The inner part is 768 bytes, so the length is 768+20. A 20-byte pointer stores the true length of the column.
  4. Records the data content of the non-null column following the header.
  5. The record in the clustered index contains fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte scroll pointer field.
  6. If no primary key is defined for the table, each clustered index record also contains a 6-byte row ID field.
  7. Each secondary index record contains all primary key columns defined for clustered index keys that are not in the secondary index. If any primary key columns are variable-length, the record header of each secondary index has a variable-length section to record their length, even if the secondary index is defined on a fixed-length column.
  8. Internally, fixed-length character columns such as CHAR(10) are stored in fixed-length format for non-variable-length character sets. Trailing whitespace is not truncated from VARCHAR columns. Internally, for variable-length character sets such as UTF8MB3 and UTF8MB4, InnoDB tries to store CHAR(N) in N bytes by trimming trailing Spaces. If the CHAR(N) column value is longer than N bytes, the trailing whitespace is trimmed to the minimum length of the column value in bytes. The maximum length of a CHAR(N) column is the maximum character length x N. At least N bytes are reserved for CHAR(N). In many cases, reserving minimum space N allows column updates to be done in place without causing index page fragmentation. In contrast, the CHAR(N) column takes up the maximum character byte length x N when using the REDUNDANT row format. Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length fields that can be stored off-page. For example, if the maximum byte length of the character set is greater than 3, the CHAR(255) column can exceed 768 bytes, just as with UTF8MB4.

DYNAMIC line format:

  • Description: DYNAMIC row format provides the same storage features as COMPACT, but with enhanced storage for long variable-length columns and support for large index key prefixes, and can be considered an upgraded version of COMPACT. Barracuda file format supports DYNAMIC line format (Antelope does not). When creating tables using ROW_FORMAT=DYNAMIC, InnoDB can store long variable length column values completely off-page (for VARCHAR, VARBINARY and BLOB and TEXT types), and the clustered index record contains only a 20-byte point to the overflow page. Fixed-length fields greater than or equal to 768 bytes are encoded as variable-length fields. (You can see that redundant uses 20 bytes to point to an overflow page for variable-length field Dynamic, redundant stores rows for 768, and Redundant stores rows for fixed-length field Dynamic for 768.) Whether a column is stored out of a page depends on the page size and total row size. When a row is too long, the longest column is selected for off-page storage until the clustered index record fits the B-tree page. TEXT and BLOB columns less than or equal to 40 bytes are stored in rows. The DYNAMIC row format maintains the efficiency of storing entire rows in index nodes when appropriate (as with COMPACT and REDUNDANT formats), but the DYNAMIC row format avoids the problematic long columns of filling B-tree nodes with large numbers of data bytes. The DYNAMIC line format is based on the idea that if part of a long data value is stored off-page, it is usually most efficient to store the entire value off-page. For DYNAMIC formats, shorter columns may be retained in b-tree nodes to minimize the number of pages needed to overflow for a given row. The DYNAMIC line format supports up to 3072 bytes of index key prefixes. This function is controlled by the innodb_large_prefix variable and is enabled by default. For more information, see innodb_large_prefix variable description. Tables using the DYNAMIC row format can be stored in system table Spaces, per-table file table Spaces, and generic table Spaces. To store DYNAMIC tables in system tablespaces, disable innodb_file_per_TABLE and use regular CREATE TABLE or ALTER TABLE statements. Or use the TABLESPACE [=] Innodb_SYSTEM TABLE option with CREATE TABLE or ALTER TABLE. The innodb_FILe_per_TABLE and innodb_FILe_format variables are not available for normal tablespaces, nor for storing DYNAMIC tables in system tablespaces using the INNOdb_SYSTEM table option [=].

  • Storage features: Like Compact, a variant of Compact

Line COMPRESSED format

  • Description: COMPRESSED row provides the same storage features and functions as DYNAMIC row, but adds support for table and index data compression. The Barracuda file format supports COMPRESSED lines. The COMPRESSED row format uses similar out-of-page storage for internal details as the DYNAMIC row format, but the additional storage and performance considerations of table and index data are being COMPRESSED and used for smaller page sizes. For the COMPRESSED row format, the KEY_BLOCK_SIZE option controls how many columns are stored in the clustered index and how many columns are placed on the overflow page. The COMPRESSED format supports a maximum of 3072 bytes of index key prefixes. This function is controlled by the innodb_large_prefix variable and is enabled by default. For more information, see innodb_large_prefix variable description. Tables that use a COMPRESSED row format can be created in either a file-per-table tablespace or a generic tablespace. The system tablespace does not support COMPRESSED rows. To store a COMPRESSED table in a file-per-table tablespace, innodb_file_per_table must be enabled and innodb_file_format must be set to Barracuda. Innodb_file_per_table and innodb_file_format variables are not available for general table Spaces. Generic table Spaces support all row formats, but it is important to note that compressed and uncompressed tables cannot coexist in the same generic table space due to different physical page sizes.

  • Ps: COMPRESSED is different from compact in that the data stored in it is COMPRESSED in zlib format to save space.

References:

  1. MySQL InnoDB row format (ROW_FORMAT)
  2. Mysql5.7 official website file format
  3. Mysql5.7 line format
  4. MySQL Technology Insider