MySQL supports a large number of data types, and choosing the right type is critical to performance. This article introduces the data type selection principles of MySQL. You can use these basic principles to determine the specific data types of data table fields.
Small and beautiful
In general, use data types with a small storage footprint whenever possible to store data. This type of data is also usually faster, takes up less disk space, memory, and even caching, and consumes fewer CPU cycles.
However, it is important to accurately estimate the range of data values to be stored. Because scaling data across multiple parts of the data table structure can be a painful and time-consuming process. If in doubt about which data type is appropriate, choose the smallest type that you think will be within range (this can be adjusted early in the system or when the data table is low).
Simple is the highest
Simpler data types mean fewer CPU cycles for processing data. For example, integers are easier to process than characters because character sets and alignments make character comparisons more complicated. Two examples: you should use MySQL’s built-in types to store times and dates, not strings. IP addresses should also be stored as integers.
Avoid empty value
Many tables have nullable columns, although there is no need to store the default value NULL in the application. In general, it is better to specify a NOT NULL column than to store NULL.
MySQL is more difficult to optimize for columns that involve nullable columns because nullable columns complicate indexes, index statistics, and value comparisons. Furthermore, nullable columns take up more storage space and require special processing. Specifying an index on a nullable column can require an extra byte per index entry and even cause the MyISAM engine to convert fixed-size indexes to variable-size indexes (such as single-column indexes on integer fields). However, the performance improvement from converting NULL columns to NOT NULL columns is usually modest. Therefore, unless NULL columns are found to have a significant impact on performance, do not make changes to existing table structures a priority. However, if you need to index a column, you should avoid the possibility that the column value can be NULL, and it is generally a good practice to set the column NOT NULL directly.
Of course, there are exceptions, such as the fact that InnoDB uses only one bit to store NULL values, thus saving space for large data stores, but this is not the case with MyISAM.
Step for selecting a data type
The first step in selecting a data type is to decide which common data type to use to represent a data column, whether numeric, string, or time. It’s usually good to choose directly, but in some cases there are exceptions (amount, timestamp).
The second step is to choose a specific type. MySQL can store the same data type in multiple ways, based on the value range, precision, and physical storage space, and some data types have special attributes.
For example, DATETIME and TIMESTAMP can both store times and dates, both accurate to the second. However, the TIMESTAMP type requires half the storage space, includes time zone information, and supports automatic updates. But on the other hand, its storage time range is smaller, and these special features can become a hindrance.
Let’s look at the basic data types. MySQL supports aliases for data types such as INTEGER, BOOL, and NUMERIC. These are just aliases, and while they may seem confusing, they actually have no effect on performance. If you CREATE a TABLE using an alias data type, recall that using SHOW CREATE TABLE, you can see that MySQL actually converts to an underlying data type, not an alias.
Conclusion: There are many ways to represent data in MySQL. It is recommended to know the storage range of common data types, the number of bytes they occupy, and choose the appropriate data type based on the range or length of data values estimated by the product as much as possible. Therefore, performance is important from the beginning of creating tables. The cost of late adjustment often exceeds the time cost of careful thinking at the beginning of the design.