Create two MySQL tables
CREATE TABLE `t1_null` (
`uuid` varchar(100) NOT NULL DEFAULT '',
`Column1` varchar(100) DEFAULT NULL,
`Column2` varchar(100) DEFAULT NULL,
PRIMARY KEY (`uuid`),
KEY `t1_null_Column1_IDX` (`Column1`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `t2_not_null` (
`uuid` varchar(100) NOT NULL DEFAULT '',
`Column1` varchar(100) NOT NULL DEFAULT '',
`Column2` varchar(100) NOT NULL DEFAULT '',
PRIMARY KEY (`uuid`),
KEY `t2_not_null_Column1_IDX` (`Column1`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
The only difference between t1_null and t2_not_null is whether Column1 and Column2 allow NULL. When I first joined Xiaomi, the DBAs required us to use NOT NULL DEFAULT '' and prohibited DEFAULT NULL. So is NOT NULL DEFAULT '' really better in terms of disk space and performance?
What’s better for disk space and performance?
From the MySQL Reference Manual, 8.4.1 Optimizing Data Size:
Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.
From High Performance MySQL, 3rd Edition
Avoid NULL if possible. A lot of tables include nullable columns even when the application does not need to store NULL (the absence of a value), merely because it's the default. It's usually best to specify columns as NOT NULL unless you intend to store NULL in them. It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM. The performance improvement from changing NULL columns to NOT NULL is usually small, so don't make it a priority to find and change them on an existing schema unless you know they are causing problems. However, if you're planning to index columns, avoid making them nullable if possible. There are exceptions, of course. For example, it's worth mentioning that InnoDB stores NULL with a single bit, so it can be pretty space-efficient for sparsely populated data. This doesn't apply to MyISAM, though.
In short: in InnoDB, a DEFAULT NULL column uses one extra bit of space per row, and queries must test whether each value is NULL.
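One way to check the disk-space side empirically (a sketch; the schema name mytable matches the test tables above, and the sizes shown depend entirely on your data and MySQL version):

```sql
-- Refresh the statistics first, so information_schema numbers are reasonable.
ANALYZE TABLE mytable.t1_null, mytable.t2_not_null;

-- Compare on-disk data and index sizes of the two test tables.
SELECT TABLE_NAME,
       DATA_LENGTH  / 1024 / 1024 AS data_mb,   -- clustered index (the rows)
       INDEX_LENGTH / 1024 / 1024 AS index_mb   -- secondary indexes
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mytable'
  AND TABLE_NAME IN ('t1_null', 't2_not_null');
```

Note that these are estimates based on sampled statistics, not exact byte counts.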
As with most "what's better for disk space and performance" questions: why not just insert a million rows with NULL, test some queries, and check the disk space? Repeat with empty strings, and once more with a relatively even mix. That answer is far more reliable than what some random guy on Stack Overflow says.
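Generating the test data is the tedious part. A minimal sketch for MySQL 8.0 using a recursive CTE (the row count and the generated values are illustrative; cte_max_recursion_depth must be raised from its default of 1000):

```sql
SET SESSION cte_max_recursion_depth = 1000000;

INSERT INTO mytable.t1_null (uuid, Column1, Column2)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT UUID(),
       -- leave every other row NULL to exercise the nullable path
       IF(n % 2 = 0, NULL, CONCAT('val-', n)),
       IF(n % 2 = 0, NULL, CONCAT('val-', n))
FROM seq;
```

For t2_not_null, replace the NULLs with '' since its columns are NOT NULL.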
How much does’ DEFAULT NULL ‘and’ NOT NULL DEFAULT ‘affect queries and storage?
1. Create t1_null and t2_not_null.
2. Insert about 16 million rows into each table.
3. SELECT VERSION() returns 8.0.23, so there is no query cache (it was removed in 8.0); restart the MySQL server between runs.
4. Run show variables like "%pro%"; if profiling is OFF, run set profiling = 1; to enable it.
5. Note that count(column) does not count rows where the column is NULL.
6. Run show profiles; to compare the query times.
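Step 5 above is worth a tiny demonstration, since it affects how the two counts compare (a hypothetical throwaway table, not part of the benchmark):

```sql
CREATE TABLE count_demo (c varchar(10) DEFAULT NULL);
INSERT INTO count_demo VALUES ('a'), (NULL), ('b');

SELECT COUNT(*) FROM count_demo;  -- 3: counts every row
SELECT COUNT(c) FROM count_demo;  -- 2: rows where c IS NULL are skipped

DROP TABLE count_demo;
```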
The conclusions are:
- A full traversal of either table takes about the same time; the DEFAULT NULL table is even slightly faster.
- On select count(Column1), NOT NULL DEFAULT '' has a slight advantage.
- So, use DEFAULT NULL if you want to.
- If the column is much smaller than the primary key, each read brings more index entries into the buffer pool, reducing the number of disk I/Os and speeding up count().
EXPLAIN the query plans:
explain select count(Column1) from mytable.t1_null;
-- id: 1, select_type: SIMPLE, table: t1_null, type: index, key: t1_null_Column1_IDX, key_len: 403, rows: 15560054, filtered: 100.00, Extra: Using index
explain select count(Column1) from mytable.t2_not_null;
-- id: 1, select_type: SIMPLE, table: t2_not_null, type: index, key: t2_not_null_Column1_IDX, key_len: 402, rows: 15929685, filtered: 100.00, Extra: Using index
explain select count(*) from mytable.t1_null;
-- id: 1, select_type: SIMPLE, table: t1_null, type: index, key: t1_null_Column1_IDX, key_len: 403, rows: 15560054, filtered: 100.00, Extra: Using index
explain select count(*) from mytable.t2_not_null;
-- id: 1, select_type: SIMPLE, table: t2_not_null, type: index, key: t2_not_null_Column1_IDX, key_len: 402, rows: 15929685, filtered: 100.00, Extra: Using index
The COUNT() function
From the MySQL 8.0 Reference Manual (dev.mysql.com/doc/refman/…):

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count. InnoDB does not keep an internal count of rows in a table because concurrent transactions might "see" different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

As of MySQL 8.0.13, SELECT COUNT(*) FROM tbl_name query performance for InnoDB tables is optimized for single-threaded workloads if there are no extra clauses such as WHERE or GROUP BY.

InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool.

For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
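The counter-table approach the manual mentions can be sketched like this (table and column names are illustrative; the key point is that the counter update happens in the same transaction as the insert or delete):

```sql
CREATE TABLE row_counter (
  table_name varchar(64) NOT NULL,
  cnt bigint NOT NULL DEFAULT 0,
  PRIMARY KEY (table_name)
) ENGINE=InnoDB;

-- Keep the counter in step with the data inside one transaction.
START TRANSACTION;
INSERT INTO t1_null (uuid, Column1, Column2) VALUES (UUID(), 'a', 'b');
INSERT INTO row_counter (table_name, cnt) VALUES ('t1_null', 1)
  ON DUPLICATE KEY UPDATE cnt = cnt + 1;
COMMIT;

-- Now the count is a single-row primary-key lookup:
SELECT cnt FROM row_counter WHERE table_name = 't1_null';
```

As the manual warns, every write now contends on the same counter row, so this trades read speed for write concurrency.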
Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.
In short: by default InnoDB traverses the smallest available secondary index to answer count(*); if a table has no secondary index, it scans the primary key (clustered) index. If a fuzzy count is acceptable, SHOW TABLE STATUS returns an approximate row count, which is a good way to optimize counts on very large tables.
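Two ways to get that approximate count (keep in mind TABLE_ROWS is only an estimate for InnoDB and can be noticeably off until statistics are refreshed):

```sql
-- Per-table status, including an estimated row count in the Rows column:
SHOW TABLE STATUS FROM mytable LIKE 't1_null';

-- The same estimate via information_schema:
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mytable'
  AND TABLE_NAME = 't1_null';
```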
This explains why all of the EXPLAIN results above use a secondary index. In theory, traversing t2_not_null_Column1_IDX should be more efficient than traversing t1_null_Column1_IDX: the index takes less disk space (key_len 402 vs 403, the extra byte being the NULL flag), and no NULL test is needed during traversal. Yet some of the profiling results showed the opposite, which is a bit confusing.
select count(Column1) from mytable.t1_null force index(t1_null_Column1_IDX);
-- 16804155 rows, 18.2 s
select count(Column1) from mytable.t2_not_null force index(t2_not_null_Column1_IDX);
-- 16820977 rows, 9.63 s
The full SQL used in the test:
show variables like "%pro%";
set profiling = 1;
explain select count(*) from mytable.t1_null;
explain select count(*) from mytable.t2_not_null;
show profiles;
show profile for query 6;
show profile for query 7;
select count(*) from mytable.t1_null;
select count(*) from mytable.t2_not_null;
select count(Column1) from mytable.t1_null force index(PRIMARY);
select count(Column1) from mytable.t2_not_null force index(PRIMARY);
SHOW TABLE STATUS;
select count(Column1) from mytable.t1_null force index(t1_null_Column1_IDX);
select count(Column1) from mytable.t2_not_null force index(t2_not_null_Column1_IDX);