Why do I need a prefix index

The problem

Have you ever run into the following error when creating an index on one or more columns of a table?

Specified key ‘uniq_code’ was too long; max key length is 767 bytes.

The table structure is as follows:

CREATE TABLE `t_account` (
  `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `date` VARCHAR(50) NOT NULL DEFAULT '' COMMENT 'date',
  `nick_name` VARCHAR(50) NOT NULL DEFAULT '' COMMENT 'nickname',
  `account` VARCHAR(50) NOT NULL DEFAULT '' COMMENT 'account',
  `city` VARCHAR(100) NOT NULL DEFAULT '' COMMENT 'city',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_code` (`nick_name`, `account`, `city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Test';

Why

In MySQL 5.6, innodb_large_prefix defaults to 0, which limits the length of a single-column index to 767 bytes.

In MySQL 5.7 (5.7.7 and later), innodb_large_prefix defaults to 1, which lifts the 767-byte limit, although a single-column index still cannot exceed 3072 bytes.

As for why the limit is 767 bytes: it comes from the specific storage engine implementation. I checked the official documentation, and it does not explain the reason either. https://dev.mysql.com/doc/refman/8.0/en/create-index.html

The number of bytes used by varchar(n) depends on the character set:

  • With GBK, each character occupies up to two bytes.
  • With utf8, each character occupies up to three bytes.
  • With utf8mb4, each character occupies up to four bytes.

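Here the table uses utf8mb4, where one character can take up to 4 bytes, and the index covers 50 + 50 + 100 = 200 characters, which is up to 800 bytes in total, so the limit is exceeded. Strictly speaking, the MySQL manual describes 767 bytes as a limit on each column's key prefix, so whether a given definition triggers the error also depends on the server version and the table's row format.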

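With the utf8 character set, a VARCHAR(255) column needs at most 255 × 3 = 765 bytes, which just fits under the 767-byte limit, but that is no reason to declare every column as VARCHAR(255). The length of each column should follow business needs; columns that are too long make joint indexes harder to build.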
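The byte arithmetic above can be checked with a short Python sketch. The column lengths are taken from the t_account table; the `MAX_BYTES_PER_CHAR` mapping and the `key_bytes` helper are illustrative names invented for this example, not anything MySQL provides.

```python
# Maximum bytes per character for the character sets discussed above.
MAX_BYTES_PER_CHAR = {"gbk": 2, "utf8": 3, "utf8mb4": 4}


def key_bytes(columns, charset):
    """Return the maximum index bytes per column, given VARCHAR lengths in characters."""
    width = MAX_BYTES_PER_CHAR[charset]
    return {name: chars * width for name, chars in columns.items()}


# Columns of the uniq_code index in t_account, lengths in characters.
uniq_code = {"nick_name": 50, "account": 50, "city": 100}

full = key_bytes(uniq_code, "utf8mb4")
total_full = sum(full.values())
print(full)        # {'nick_name': 200, 'account': 200, 'city': 400}
print(total_full)  # 800

# A single VARCHAR(255) column under utf8 and under utf8mb4.
print(key_bytes({"col": 255}, "utf8"))     # {'col': 765}
print(key_bytes({"col": 255}, "utf8mb4"))  # {'col': 1020}
```

Printing both the per-column figures and the total makes it easy to compare each of them against whichever limit applies to your server version and row format.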

The solution

  1. Shorten the columns directly, or remove some columns from the index. Either change is intrusive for the table itself.

  2. Index only the first n characters of each column. A suitable value of n can be found by measuring the data lengths that occur in real business data.

    UNIQUE KEY `uniq_code` (`nick_name`(20),`account`(20),`city`(20))

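    The unique index is built on the first 20 characters of each of the three columns, so the length limit is no longer exceeded. This is what we call a prefix index. Note that uniqueness is then enforced on those prefixes rather than on the full values.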

  3. Change the maximum length of a single-column index

    To change the index length limit, add the following to the my.ini configuration file and restart:
    # raise the single-column index length limit from 767 bytes to 3072 bytes
    innodb_large_prefix=1
    However, once this parameter is enabled, tables must use dynamic or compressed row storage:
      • The system variable innodb_file_format is Barracuda
      • ROW_FORMAT is DYNAMIC or COMPRESSED

How do I determine the length of the prefix index

We mentioned above that we can solve the problem of index length exceeding the limit by prefix index, but how can we determine the appropriate length of index field prefix?

We can decide by calculating the selectivity of candidate prefixes and comparing it with the selectivity of the full column, as follows:

Full column selectivity:

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;

Selectivity of a prefix of a given length:

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

The closer the prefix selectivity is to full column selectivity, the better the index performance is.
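To make the comparison concrete, here is a minimal Python sketch that mirrors the two queries on an in-memory list. The `cities` sample is made up for illustration, and `selectivity` and `prefix_selectivity` are hypothetical helper names; against a real table you would run the SQL above instead.

```python
def selectivity(values):
    """Mirror of COUNT(DISTINCT column_name) / COUNT(*)."""
    return len(set(values)) / len(values)


def prefix_selectivity(values, prefix_length):
    """Mirror of COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*)."""
    return selectivity([v[:prefix_length] for v in values])


# Hypothetical sample data standing in for a column of a real table.
cities = [
    "Shanghai", "Shantou", "Shangrao", "Shenzhen",
    "Shenyang", "Shaoxing", "Shanghai", "Shenzhen",
]

full = selectivity(cities)
print("full column:", full)  # full column: 0.75

# Try increasing prefix lengths and watch the value approach the full-column figure.
for n in range(1, 7):
    print(n, prefix_selectivity(cities, n))
```

On this sample the prefix selectivity climbs from 0.125 at length 1 to 0.75 at length 6, where it matches the full column, so a longer prefix would add index size without distinguishing any more rows.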

Advantages and disadvantages of prefix indexes

  • It takes up less space and is fast
  • A prefix index cannot be used for ORDER BY or GROUP BY
  • A prefix index cannot be used for a covering index scan
  • It may increase the number of rows scanned

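For example, when indexing an ID card number, whose leading characters are shared by many rows, you can index a hash of the value instead, or store the value in reverse order and build the prefix index on the reversed value.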
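The effect of reversing can be sketched in Python. The ID-like strings below are fabricated sample values that share their first six digits, and CRC32 is only one possible hash, chosen here as an assumption because the text does not name one.

```python
import zlib


def prefix_selectivity(values, prefix_length):
    """Share of distinct prefixes among all values."""
    prefixes = {v[:prefix_length] for v in values}
    return len(prefixes) / len(values)


# Hypothetical ID-like numbers: identical leading digits, different tails.
ids = [
    "110101199001011234",
    "110101199001025678",
    "110101198512319012",
    "110101197707073456",
]

# Option 1: store the value reversed, then index a prefix of the reversed string.
reversed_ids = [v[::-1] for v in ids]

# Option 2: store an integer hash of the value and index that column instead.
crc_ids = [zlib.crc32(v.encode("ascii")) for v in ids]

print(prefix_selectivity(ids, 6))           # 0.25
print(prefix_selectivity(reversed_ids, 6))  # 1.0
print(crc_ids)
```

Both options give up range scans on the original value, and a hash can collide, so a lookup by hash would still need to compare the full value as well.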

Creating a joint index

The same rules apply when we are not sure which field should be the first column in a joint index on a table.

The following example creates a joint index on customer_id and staff_id, and ends up choosing the column order (customer_id, staff_id).

# staff_id_selectivity: 0.0001
# customer_id_selectivity: 0.0373
# COUNT(*): 16049
# The results show that customer_id is more selective, so customer_id should be the first column in the joint index
SELECT
  COUNT(DISTINCT staff_id) / COUNT(*) AS staff_id_selectivity,
  COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_selectivity,
  COUNT(*)
FROM payment;
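The same comparison can be sketched in Python as a way to rank candidate columns. The `payments` rows are invented sample data, and `column_selectivity` and `order_by_selectivity` are hypothetical helpers; selectivity is treated here as a rule of thumb, since the queries you actually run also influence the best column order.

```python
def column_selectivity(rows, column):
    """Share of distinct values in one column across all rows."""
    values = [row[column] for row in rows]
    return len(set(values)) / len(values)


def order_by_selectivity(rows, columns):
    """Return the columns sorted from most to least selective."""
    return sorted(columns, key=lambda c: column_selectivity(rows, c), reverse=True)


# Hypothetical rows standing in for the payment table.
payments = [
    {"staff_id": 1, "customer_id": 10},
    {"staff_id": 2, "customer_id": 11},
    {"staff_id": 1, "customer_id": 12},
    {"staff_id": 2, "customer_id": 10},
    {"staff_id": 1, "customer_id": 13},
    {"staff_id": 2, "customer_id": 14},
]

print(order_by_selectivity(payments, ["staff_id", "customer_id"]))
# ['customer_id', 'staff_id']
```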

In summary

The closer the index selectivity is to full column selectivity, the better the index performance.

That is, the more distinct a column's values are across the table's rows, the better that column works as an index.


