Numeric types are one of the most common types used in table structure design, but it is not as easy as you might think to use them well. For example:
-
How to design an Internet massive concurrent business autoincrement primary key? INT is enough, okay?
-
How to design the balance of the account? Is it safe to use DECIMAL?
All wrong!
The number type seems simple, but it is easy to have the problem of “incomplete design thinking” in the design of table structure architecture (especially in the massive concurrent Internet scenario).
Numeric types
Integer types
The MySQL database supports the types of integers supported by the SQL standard: INT and SMALLINT. In addition, MySQL also supports int types such as TINYINT, MEDIUMINT, and BIGINT (Table 1 shows the storage space occupied by each type and the range of values) :
MySQL data type | Meaning (signed) |
---|---|
tinyint(m) | 1 byte range (-128 to 127) |
smallint(m) | 2 bytes range (-32768 to 32767) |
mediumint(m) | 3-byte range (-8388608 to 8388607) |
int(m) | 4-byte range (-2147483648 to 2147483647) |
bigint(m) | 8-byte range (+-9.22*10 ^ 18) |
In an integer type, there are signed and unsigned properties, which represent the range of values of the integer. The default is signed. For design purposes, I don’t recommend using unsigned properties because SQL may return results that are not what you want when you do some data analysis.
Consider an example of a “sale table” with the following table structure and data. Note here that column sale_count uses an unsigned attribute (that is, the column is designed to store values greater than or equal to 0) :
mysql> SHOW CREATE TABLE sale\G
*************************** 1. row* * * * * * * * * * * * * * * * * * * * * * * * * * *Table: sale
Create Table: CREATE TABLE `sale` (
`sale_date` date NOT NULL.`sale_count` int unsigned DEFAULT NULL,
PRIMARY KEY (`sale_date`))ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec)
mysql> SELECT * FROM sale;
+------------+------------+
| sale_date | sale_count |
+------------+------------+8000 | | 10000 | 2020-01-01 | | 2020-02-01 | | 12000 | 2020-03-01 | | 9000 | 2020-04-01 | | 10000 | 2020-05-01 | | The 2020-06-01 | 18000 | +------------+------------+
6 rows in set (0.00 sec)
Copy the code
Sale_date indicates the date of sales, and sale_count indicates the number of sales per month. Now there is a requirement that the boss wants to count the change in the number of sales from month to month to make a business decision. This SQL statement needs to be applied to non-equivalent joins, but it’s not too hard to write:
SELECT
s1.sale_date, s2.sale_count - s1.sale_count AS diff
FROM
sale s1
LEFT JOIN
sale s2 ON DATE_ADD(s2.sale_date, INTERVAL 1 MONTH) = s1.sale_date
ORDER BY sale_date;
Copy the code
However, during execution, sale_count uses an unsigned attribute and throws something like this:
ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`test`.`s2`.`sale_count` - `test`.`s1`.`sale_count`)'
Copy the code
As you can see, MySQL is telling the user that the result of the calculation is out of range. MySQL requires unsigned values to remain unsigned if they are subtracted, otherwise an error will be reported.
Traction error: NO_UNSIGNED_SUBTRACTION (int, int, int, int, int, int, int, int, int, int, int, int)
mysql> SET sql_mode='NO_UNSIGNED_SUBTRACTION';
Query OK, 0 rows affected (0.00 sec)
SELECT
s1.sale_date,
IFNULL(s2.sale_count - s1.sale_count,' ') AS diff
FROM
sale s1
LEFT JOIN sale s2
ON DATE_ADD(s2.sale_date, INTERVAL 1 MONTH) = s1.sale_date
ORDER BY sale_date;
+------------+-------+
| sale_date | diff |
+------------+-------+| | 2020-01-01 | | 2020-02-01 | | 2000 | | 2020-03-01-4000 | | 2020-04-01 | | 3000 | | 2020-05-01-1000 | | 2020-06-01 | | - 8000 +------------+-------+
6 rows in set (0.00 sec)
Copy the code
Floating point type and high precision type
In addition to integer types, numeric types are commonly used as floating-point and high-precision types.
Floating point types Float and Double exist in previous versions of MySQL. However, these types are not high precision or SQL standard types, so they are not recommended in real production environments. Otherwise, errors may occur in calculation results due to accuracy types.
More importantly, starting with MySQL 8.0.17, when creating a table using the types Float or Double, the following warning is thrown: MySQL warns the user that the above floating point types should not be used, and even that floating point types will be deprecated in later versions
Specifying number of digits for floating point data types is deprecated and will be removed in a future release
Copy the code
The high-precision DECIMAL type in numeric types can be used, and precision and scale can (and usually must) be specified when declaring columns of this type, for example:
Salary is a DECIMAL (8, 2)Copy the code
Where, 8 is precision (precision indicates the main number of digits saved) and 2 is scale (scale indicates the number of digits saved after the decimal point). Typically in table structure design, type DECIMAL can be used to represent a user’s salary, account balance, and so on to two DECIMAL places.
However, for use in massively concurrent Internet business, the amount field design is not recommended to use DECIMAL, but rather to use INT (see why below).
Business table structure design practice
Integer type and increment design
In real business scenarios, the integer type is most commonly used in business to represent the number of items. For example, the sales quantity in the above table, or the inventory quantity and purchase times in the e-commerce, etc. Another common and important use of integer types in business is as the primary key of a table, which uniquely identifies a row of data.
The auto_increment function is implemented by combining the integer type with the attribute auto_increment. However, when designing the table structure, use the increment as the primary key.
-
Use BIGINT as the primary key instead of INT;
-
Self-increment is not persistent and backtracking is possible (prior to MySQL 8.0).
It can be seen from Table 1 that the maximum range of INT is at the level of 4.2 billion, which is easily reached in the application of real Internet business scenarios. For example, some flow tables, log tables, daily 1000W data volume, 420 days later, INT type upper limit can be reached.
Therefore, use BIGINT instead of INT for auto-incrementing primary keys. Do not use ints to save 4 bytes. When the limit is reached, changing the table structure will be a huge burden and pain.
This raises an interesting question: how will the database perform if the upper limit of ints is reached? Does it go back to 1? We can verify this with the following SQL statement:
mysql> CREATE TABLE t (
-> a INT AUTO_INCREMENT PRIMARY KEY
-> );
mysql> INSERT INTO t VALUES (2147483647); Query OK, 1 row affected (0.01sec) mysql>INSERT INTO t VALUES (NULL);
ERROR 1062 (23000): Duplicate entry '2147483647' for key 't.PRIMARY'
Copy the code
As you can see, a repeat error will be reported when the INT limit is reached and the MySQL database does not automatically reset it to 1.
The second issue to note is that before MySQL 8.0, increments are not persistent, and there may be backtracking problems with increments!
mysql> SELECT * FROM t;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
+---+
3 rows in set (0.01 sec)
mysql> DELETE FROM t WHERE a = 3;
Query OK, 1 row affected (0.02 sec)
mysql> SHOW CREATE TABLE t\G
*************************** 1. row* * * * * * * * * * * * * * * * * * * * * * * * * * *Table: t
Create Table: CREATE TABLE `t` (
`a` int NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`a`))ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec
Copy the code
AUTO_INCREMENT=4; AUTO_INCREMENT=4; AUTO_INCREMENT=4 However, if the database is restarted at this time, the starting value of table T will change to 3 again after the database is started, that is, the self-increment backtracking occurs. The details are as follows:
mysql> SHOW CREATE TABLE t\G
*************************** 1. row* * * * * * * * * * * * * * * * * * * * * * * * * * *Table: t
Create Table: CREATE TABLE `t` (
`a` int NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`a`))ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 s
Copy the code
To solve this problem completely, there are two ways:
-
MySQL > update MySQL to 8.0;
-
If you cannot upgrade the database version, it is strongly not recommended to use the primary key of the increment data type in the core business table.
In fact, in the design process of massive Internet architecture, in order to improve the scalability of distributed architecture, it is not recommended to use the integer type as the main key, but rather the string type.
Capital field design
In the business design of user balance, fund account balance, digital wallet, change, and so on, since the fields are all fund fields, it is common practice for programmers to use DECIMAL type as the type of field selection because it is accurate to the point, e.g. : DECIMAL(8,2).
CREATE TABLE User (
userId BIGINT AUTO_INCREMENT,
money DECIMAL(8.2) NOT NULL,...)Copy the code
In the design standards for mass Internet services, the use of DECIMAL type is not recommended, but rather the conversion of DECIMAL to an integer type. In other words, it is more recommended to store funds in sub-units rather than in meta-units. For example, 1 dollar is stored in the database as an integer of type 100.
How do you define length if the value range of the amount field is represented by DECIMAL? Because the type DECIMAL is a variable-length field, defining DECIMAL(8,2) is not sufficient to define the amount field. This can only represent a maximum of 999,999,999.99, millions of funds stored.
The amount of the user should be stored in the field of at least ten billion, while the GDP amount field of the statistics bureau may reach hundreds of billions. Defined by type DECIMAL, it is not uniform.
It is also important to note that the type DECIMAL is a way of encoding through binary and is far less computationally efficient than integer types. Therefore, it is recommended to use BIG INT to store fields related to the amount.
BIG INT can store gigabit amounts even if the fields are stored separately. Here, 1 trillion is 1 trillion.
The advantage of this is that all the amount related fields are fixed-length fields, occupying 8 bytes, and storage efficiency. On the other hand, it’s more efficient to compute directly with integers.
Note that in our database design, we put a lot of emphasis on fixed-length storage because fixed-length storage provides better performance.
Let’s look at how records are stored in the database, roughly as follows:
If an update occurs, the original space of record 1 cannot accommodate the storage space of record 1 after the update. Therefore, the database will mark record 1 as deleted and search for new space for record 1 to use, for example:
In the figure above, * record 1 represents the space occupied by the original record 1, which will later become a fragmentation space and cannot be used unless the table space is artificially defragmented.
So how do we represent the data in the decimal point when we use BIG INT to store the amount field? In fact, this part can be handled and displayed by the front end. As the database itself, as long as the storage points can be.