MySQL supports a limited number of string data types, but each has many variations. Since MySQL 4.1, each string column can have its own character set and collation, which complicates table design. This article explains how to choose between the CHAR and VARCHAR types for character columns.

VARCHAR and CHAR

VARCHAR and CHAR are the two main string types. Because the implementation depends on the storage engine, it is hard to describe exactly how these values are stored on disk and in memory. The description below assumes InnoDB or MyISAM, the most commonly used engines. If you use a different storage engine, read its documentation carefully.

VARCHAR stores variable-length strings and is the most commonly used string type. It can require less storage space than a fixed-length type because it uses only as much space as each value needs, so shorter strings take less space. The exception is a MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed amount of space for every row and can therefore waste space. VARCHAR uses 1 or 2 extra bytes to record the length of the value: 1 byte if the column's maximum length is 255 bytes or less, and 2 bytes if it is more. Thus, with the latin1 character set, a VARCHAR(10) uses up to 11 bytes of storage, while a VARCHAR(1000) uses up to 1002 bytes.
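
The length-prefix arithmetic above can be sketched in SQL. The table name t_varchar_len_demo and its columns are hypothetical, and the query only models the data bytes plus the length prefix. It ignores any per-row overhead the storage engine adds.

```sql
-- Hypothetical demo table; latin1 stores one byte per character.
CREATE TABLE t_varchar_len_demo (
  short_col VARCHAR(10)   CHARACTER SET latin1,  -- max 10 bytes: 1-byte length prefix
  long_col  VARCHAR(1000) CHARACTER SET latin1   -- max 1000 bytes: 2-byte length prefix
);

INSERT INTO t_varchar_len_demo VALUES ('abc', 'abc');

-- LENGTH() returns the data bytes only, so the prefix is added by hand.
SELECT LENGTH(short_col) + 1 AS short_col_bytes,  -- 3 + 1 = 4
       LENGTH(long_col)  + 2 AS long_col_bytes    -- 3 + 2 = 5
FROM t_varchar_len_demo;
```

The same three-character value costs one byte more in the wider column, purely because of the larger length prefix.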

VARCHAR helps performance because it saves space. However, because rows are variable-length, they can grow when they are updated, which causes extra work. If a row grows and no longer fits in its original location, the behavior depends on the storage engine. For example, MyISAM may fragment the row, while InnoDB may need to split the page to fit the row into it.

In general, VARCHAR is worth using when the maximum column length is much larger than the average length (for example, an optional remarks field) and when updates are infrequent, so fragmentation is not a problem. Note that with a multibyte character set such as UTF-8, the number of bytes actually stored depends on the characters, because different characters take different numbers of bytes. For Chinese text, the recommended character set is utf8mb4.
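
The difference between character count and byte count can be checked with CHAR_LENGTH() and LENGTH(). This is a small sketch that assumes the client sends UTF-8 text and that the connection character set is utf8mb4, which the SET NAMES statement requests.

```sql
-- Assumption: the client encodes the statement text as UTF-8.
SET NAMES utf8mb4;

SELECT CHAR_LENGTH('abc')  AS ascii_chars,  -- 3 characters
       LENGTH('abc')       AS ascii_bytes,  -- 3 bytes (1 byte per character)
       CHAR_LENGTH('中文') AS cjk_chars,    -- 2 characters
       LENGTH('中文')      AS cjk_bytes;    -- 6 bytes (3 bytes per character)
```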

CHAR is fixed-length: MySQL always allocates enough space for the declared number of characters. When storing a CHAR value, MySQL removes any trailing spaces, and values are padded with spaces as needed for comparisons. CHAR is useful for very short strings or when all values are nearly the same length. For example, CHAR is a good choice for storing the MD5 value of a user password, because an MD5 value always has the same length. CHAR is also better than VARCHAR for data that changes frequently, because a fixed-length row is not prone to fragmentation. For very short columns, CHAR is also more efficient than VARCHAR: a CHAR(1) that holds only Y and N values needs 1 byte in a single-byte character set, whereas a VARCHAR(1) needs 2 bytes because of the length byte. The removal of trailing spaces can be surprising, as the following example shows:

CREATE TABLE t_char_varchar_test (
  id INT PRIMARY KEY,
  char_col CHAR(10),
  varchar_col VARCHAR(10)
);

-- Row 2 has a leading space, row 3 has a trailing space.
INSERT INTO t_char_varchar_test
VALUES
(1, 'string1', 'string1'),
(2, ' string2', ' string2'),
(3, 'string3 ', 'string3 ');

After these rows are inserted, the leading space in ' string2' is preserved in both columns, but the trailing space in 'string3 ' is removed from the CHAR column. The following query wraps each value in quotes so that the spaces are visible:

SELECT CONCAT("'", char_col, "'"), CONCAT("'", varchar_col, "'")
FROM t_char_varchar_test;

In the result, the trailing space after string3 is removed in the CHAR column but kept in the VARCHAR column. Most of the time this is harmless, and in practice applications often call a trim function to strip whitespace from both ends anyway, but if you do need to preserve trailing spaces, do not use the CHAR type.

How data is stored is up to the storage engine, and not all engines handle fixed-length and variable-length data the same way. The Memory engine uses fixed-size rows, so it has to allocate the maximum possible space for each value, even for a variable-length column. However, the padding and trimming of trailing spaces is done by the MySQL server itself, so it works the same way across all storage engines.
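
Returning to the t_char_varchar_test example, the trailing-space stripping can also be observed by comparing character counts. This is a sketch that assumes the three example rows were inserted and that the server runs with its default SQL mode (in particular, PAD_CHAR_TO_FULL_LENGTH is not enabled).

```sql
SELECT id,
       CHAR_LENGTH(char_col)    AS char_len,
       CHAR_LENGTH(varchar_col) AS varchar_len
FROM t_char_varchar_test
ORDER BY id;
-- id 1: 7 and 7  ('string1' in both columns)
-- id 2: 8 and 8  (leading space kept in both columns)
-- id 3: 7 and 8  (trailing space stripped from CHAR only)
```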

BINARY and VARBINARY are the counterparts of CHAR and VARCHAR for binary strings, which store bytes instead of characters. Padding also differs: MySQL pads BINARY values with \0 (the zero byte) instead of spaces, and does not strip the padding when the value is retrieved. These types are useful when you need to store binary data and want MySQL to compare values as bytes rather than as characters. Byte-wise comparison is simpler and can be faster than character comparison: no case folding or collation rules are applied, and MySQL just compares one byte at a time by numeric value.
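
The zero-byte padding and byte-wise comparison can be seen in a short sketch. The table name t_binary_demo is hypothetical, the 'a\0\0' literal assumes backslash escapes are enabled (the default), and the char_cmp result assumes the connection uses a case-insensitive collation, as the common defaults do.

```sql
-- Hypothetical demo table with a fixed-length binary column.
CREATE TABLE t_binary_demo (c BINARY(3));
INSERT INTO t_binary_demo SET c = 'a';

SELECT HEX(c),       -- 610000: 'a' followed by two 0x00 pad bytes
       c = 'a',      -- 0: the padding is part of the stored value
       c = 'a\0\0'   -- 1: matches only when the pad bytes are included
FROM t_binary_demo;

-- Character comparison ignores case here; byte comparison does not.
SELECT 'a' = 'A' AS char_cmp,                                -- 1 with a case-insensitive collation
       CAST('a' AS BINARY) = CAST('A' AS BINARY) AS byte_cmp; -- 0: 0x61 differs from 0x41
```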

Conclusion: In real table design, VARCHAR is the right choice in most cases, at the cost of 1 or 2 extra bytes per value to store the string length. It is best to cap the maximum field length in the application so that the table can declare the shortest VARCHAR that fits, which improves efficiency. For values that are fixed-length, very short, or vary little in length, the CHAR type is recommended for better storage efficiency.
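
These guidelines might translate into a table definition like the one below. It is only an illustrative sketch: the table t_user_demo, its column names, and the 200-character cap on remark are all assumptions, not part of the original example.

```sql
-- Hypothetical table applying the CHAR / VARCHAR guidelines.
CREATE TABLE t_user_demo (
  id           INT PRIMARY KEY,
  password_md5 CHAR(32) CHARACTER SET ascii,  -- fixed length: MD5() yields 32 hex characters
  is_active    CHAR(1)  CHARACTER SET ascii,  -- very short: 'Y' or 'N'
  remark       VARCHAR(200)                   -- optional text, usually far shorter than the cap
) DEFAULT CHARSET = utf8mb4;

INSERT INTO t_user_demo VALUES (1, MD5('example'), 'Y', 'short note');

SELECT CHAR_LENGTH(password_md5) FROM t_user_demo;  -- 32
```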