Recently, I just joined a new company and found some problems in the database design. Many database fields do NOT have NOT NULL, which is unbearable for patients with advanced obsessive-compulsive disorder, so I wrote this article.

Based on the current state of development, we will set all fields to NOT NULL and give default values.

In general, the default values are set like this:

  1. Integer, we usually use 0 as the default value.

  2. String, default empty string

  3. The default time format can be 1970-01-01 08:00:01 or 0000-00-00 00:00:00, but the connection parameter must be zeroDateTimeBehavior=convertToNull. It is recommended not to use the default time format

But consider why NOT NULL?

Mysql > select * from high performance Mysql

Try to avoid NULL

Many tables contain nullable (NULL) columns, even if the application does not need to save NULL, because nullable is the default attribute for columns. It is usually best to specify NOT NULL columns unless you really need to store NULL values.

If a query contains nullable columns, it is more difficult for MySql to optimize because nullable columns complicate indexes, index statistics, and value comparisons. Nullable columns use more storage space and require special handling in MySql. When nullable columns are indexed, an extra byte is required for each index record, and even fixed-size indexes (such as those with a single integer column) can become variable-size indexes in MyISAM.

The performance gain from changing nullable columns to NOT NULL is usually small, so there is no need to first look for and fix the condition in the existing schema unless you are sure it will cause problems. However, if you plan to build indexes on columns, you should try to avoid columns designed to be nullable.

There are exceptions, of course. For example, InnoDB uses a single bit to store NULL values, so it is very space-efficient for sparse data. But this does not apply to MyISAM.

The descriptions in the book address a number of major issues, and I’m going to leave MyISAM out of the equation here, but I’m going to use InnoDB as a consideration.

  1. NULL is the default value for a column unless NOT NULL is set, and should NOT be used unless it is required by itself
  2. Using NULL brings more problems, such as indexes, index statistics, more complex value calculations, and avoiding NULL columns if you use indexes
  3. In the case of indexed columns, there are storage space issues that require additional special handling and can lead to more storage footprint
  4. For better space efficiency, sparse data refers to the case where many values are NULL and only a few rows and columns have non-NULL values

The default value

For MySql, if NOT actively set to NOT NULL, the default value is NULL when inserting data.

NULL and NOT NULL have different meanings. NULL can assume that the value of the column is unknown, while NULL can assume that we know the value but it is empty.

For example, if a name field in a table is NULL, we can assume that we do not know what the name is, whereas if it is an empty string, we can assume that we know there is no name and it is a NULL value.

In the case of most programs, there is no special need for the field to be NULL. NULL values can cause problems for programs such as NULL Pointers.

For the current situation where most MyBatis are used, I suggest using the insertSelective method generated by default or pure manual write insertion method, which can avoid the default value invalid or insertion error caused by the new NOT NULL field.

Value calculation

The aggregation function is inaccurate

For null-valued columns, NULL values are ignored when using aggregate functions.

Now we have a table where the name field is NULL by default, and the count for name is 1, which is incorrect.

Count (*) counts the number of rows in the table, and count(name) counts non-null columns in the table.

= failure

For columns with a NULL value, the = expression cannot be used to determine the value. The following query for name is not valid, and must use is NULL.

Operates with other values

NULL operates on any other value, including the value of an expression.

Select * from user where age is NULL, name is NULL, and concat is NULL.

In the following example, any operation with NULL will result in NULL. Imagine designing a field that is NULL and performing all the operations on it.

Distinct, Group by, order by

For distinct and Group BY, all NULL values are treated as equal, and for Order BY, the ascending NULL is ranked first

Other problems

Select * from table where there is only one record with name. =a The expected result should be that the remaining two records do not match the expected result.

The index problem

To verify the effect of NULL fields on indexes, add indexes to name and age, respectively.

[3] [is NULL] [is NULL] [is NULL] [is NULL] [is NULL] [is NULL] [is NULL]

Then we insert some more data into the database to test, and when the NULL column value increases, we find that the index fails.

As we know, a query SQL execution might look like this:

The connector first connects to the specified database, then looks to see if the statement exists in the query cache and returns the result if it does.

If the cache is not hit, an analyzer is required to parse the SQL statement to determine whether it is valid.

Now when you go to the optimizer, you can choose what index to use and decide how to execute the SQL statement.

Finally, the executor is responsible for executing statements, querying with or without permission, and returning the execution result.

As you can see from the simple test results above, the presence of NULL index columns can cause the optimizer to make index selection more complex and difficult to optimize.

The storage space

A row in a database is also stored as a row in the final disk file. For InnoDB, there are four row storage formats: REDUNDANT, COMPACT, DYNAMIC, and COMPRESSED.

InnoDB’s default row storage format is COMPACT, as shown below. Dashed lines may not exist.

Variable-length field length list: If there are multiple fields, we store them in reverse order. We only have one field, so we don’t care about that much. The storage format is hexadecimal, if there are no variable-length fields, this part is not needed.

NULL value list: used to store the case that the value of our record is NULL. If there are more than one NULL value, it is also stored in reverse order and must be an integer multiple of 8bit. If there are less than 8bit, the high value of 0 will be added. 1 means NULL, 0 means not NULL. If both are NOT NULL then this exists.

ROW_ID: the unique identifier of a row. If no primary key is specified, the ROW_ID is automatically generated as the primary key.

TRX_ID: indicates the transaction ID.

ROLL_PRT: rollback pointer.

And then the value of each column.

To illustrate the problem with this storage format, I tested a table in which only c1 is NOT NULL and everything else is NULL.

List of variable field lengths: C1 and C3 fields are 1 and 2 respectively, so the length is 0x01 0x02 in hexadecimal and 0x02 0x01 in reverse order.

NULL list: there are NULL columns, so c2, C3, and c4 are 010, and the result is 00000010.

We’re not going to worry about the other fields, but the result of the last one is, of course, we’re not going to worry about the encoded result.

Instead, if we set all fields to NOT NULL and insert a, BB, CCC, DDDD, the format would look like this:

Although we found that NULL itself does not take up storage space, the presence of NULL takes up one more byte of flag bit space.

Article Reference Document:

  1. Dev.mysql.com/doc/refman/…
  2. Dev.mysql.com/doc/refman/…
  3. Dev.mysql.com/doc/refman/…
  4. Dev.mysql.com/doc/refman/…
  5. www.cnblogs.com/zhoujinyi/a…