define

Collate is used to determine the sequence of text fields such as text and vARCHar

It can be on the definition of fields (name, Nick), tables (Employee), as shown below

CREATE TABLE `employee` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(256) COLLATE utf8mb4_bin COMMENT 'name',
    `nick` varchar(256) COLLATE utf8mb4_bin COMMENT 'nickname'.PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Copy the code

It can also be used on libraries

create database test default character set utf8mb4 collate utf8mb4_general_ci;

Or use it directly in a query statement

select id from employee order by name collate utf8mb4_general_ci

Collate takes effect in the following order:

  1. The SQL statement
  2. Column level
  3. Table level
  4. Repository level
  5. Instance level

Relationship between UTF8 and UTfMB4

  • Utf8 is 3 bytes long and UTfmb4 is 4 bytes long
  • Utfmb4 is fully compatible with UTF8, so use UTFMB4 whenever possible
  • Some expressions are expressed in 4 bytes and can only be expressed in UTFMB4

The difference between

The key point is whether the _CI suffix, case sensitive, is case-sensitive

  • Utf8mb4_bin: case sensitive, binary data storage
  • Utf8mb4_general_ci: case insensitive, may be inconsistent when sorting special characters, fast
  • Utf8mb4_unicode_ci: case insensitive, precise comparison, and slow speed

pit

Since the database default collation is UTF8MB4_general_CI (case insensitive), if not specified, inconsistent keys (size inconsistent) will conflict with strings as primary keys, as shown below

MySQL [test]> CREATE TABLE `business` (
	`business_id` varchar(32) COMMENT 'Business Unique Identifier'.PRIMARY KEY (`business_id`) 
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.02 sec)

MySQL [test]> insert into business values("key");
Query OK, 1 row affected (0.00 sec)

MySQL [test]> insert into business values("KEY");
ERROR 1062 (23000): Duplicate entry 'KEY' for key 'PRIMARY'
Copy the code

The solution

You can avoid this problem by changing the way the fields are sorted

MySQL [test]> alter table business modify column business_id varchar(32) collateutf8mb4_bin; GINE=InnoDB;
Query OK, 0 rows affected (0.02 sec)

MySQL [test]> insert into business values("KEY");
Query OK, 1 row affected (0.00 sec)
Copy the code

As you can see, after changing the sorting method of the field, you can insert it

After this problem occurs, you can only change the sorting mode from the field, instance, library, table modification can not affect the field sorting; When creating this field, it implicitly inherits the sorting mode of the upper level. When the upper level (instance, library, table) changes the sorting mode of the field, it does not affect the sorting mode of the field. Only the specified sorting mode of the field can be displayed

MySQL [test]> ALTER TABLE `business` CHARACTER SET utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.02 sec)

MySQL [test]> insert into business values("KEY");
ERROR 1062 (23000): Duplicate entry 'KEY' for key 'PRIMARY'
Copy the code

As shown above, even if we change the sorting mode of the table, the inconsistent keys will still collide, i.e., the sorting mode will not change

Select * from table_name where table_name is not sorted; select * from table_name where table_name is not sorted;

MySQL [tms_test]> show create table business;
+----------+---------------+
| Table    | Create Table                
+----------+---------------+
| business | CREATE TABLE `business` (
  `business_id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'Business Unique Identifier'.PRIMARY KEY (`business_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin 
+----------+---------------+
Copy the code

reference

Mysql utF8 is different from UTF8MB4, utf8MB4_bin, UTF8MB4_general_ci, utF8MB4_unicode_ci

What is COLLATE in MYSQL?