This is the 12th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge

Two different character sets cannot have the same collation. Each character set has a default collation.

Most of this is from the official website documentation…

View the currently supported character set

INFORMATION_SCHEMA.CHARACTER_SETS

You can view the available character sets in the information_schema.Character_sets table.

The MAXLEN column indicates The maximum number of bytes required to store one character.

MariaDB [(none)]> select * from INFORMATION_SCHEMA.CHARACTER_SETS;
+--------------------+----------------------+-----------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                 | MAXLEN |
+--------------------+----------------------+-----------------------------+--------+
| big5               | big5_chinese_ci      | Big5 Traditional Chinese    |      2 |
| dec8               | dec8_swedish_ci      | DEC West European           |      1 |
| cp850              | cp850_general_ci     | DOS West European           |      1 |
| hp8                | hp8_english_ci       | HP West European            |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian       |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European        |      1 |
| latin2             | latin2_general_ci    | ISO 88592 - Central European |      1 |. .Copy the code

Or to view the character set that contains UTF8, you can use the LIKE clause or the WHERE clause.

MariaDB [(none)]> SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS
    ->   WHERE CHARACTER_SET_NAME LIKE '%utf8%';
+--------------------+----------------------+---------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION   | MAXLEN |
+--------------------+----------------------+---------------+--------+
| utf8               | utf8_general_ci      | UTF- 8 - Unicode |      3 |
| utf8mb4            | utf8mb4_general_ci   | UTF- 8 - Unicode |      4 |
+--------------------+----------------------+---------------+--------+
2 rows in set (0.000 sec)
Copy the code

SHOW CHARACTER SET

The SHOW CHARACTER SET statement is more commonly used to view supported CHARACTER sets than the information_schema.Character_sets table.

MariaDB [(none)]> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 88592 - Central European | latin2_general_ci   |      1 |. .Copy the code

Or to view the character set that contains UTF8, you can use the LIKE clause or the WHERE clause.

MariaDB [(none)]> SHOW CHARACTER SET LIKE '%utf8%';
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8    | UTF- 8 - Unicode | utf8_general_ci    |      3 |
| utf8mb4 | UTF- 8 - Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.005 sec)
Copy the code

View the Collation rule currently supported

A given character set has at least one collation, and most character sets have more than one.

Use the information_schema. COLLATIONS table or the SHOW COLLATION statement to display COLLATIONS for a character set.

SHOW COLLATION Displays all COLLATION rules by default. You can use the WHERE and LIKE clauses to filter and display COLLATION rules of a specified character set.

View the collation of the default character set UTF8MB4.

MariaDB [(none)]> SHOW COLLATION WHERE Charset = 'utf8mb4';
+------------------------------+---------+------+---------+----------+---------+
| Collation                    | Charset | Id   | Default | Compiled | Sortlen |
+------------------------------+---------+------+---------+----------+---------+
| utf8mb4_general_ci           | utf8mb4 |   45 | Yes     | Yes      |       1 |
| utf8mb4_bin                  | utf8mb4 |   46 |         | Yes      |       1 |
| utf8mb4_unicode_ci           | utf8mb4 |  224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci         | utf8mb4 |  225 |         | Yes      |       8 |. . .| utf8mb4_unicode_nopad_ci     | utf8mb4 | 1248 |         | Yes      |       8 |
| utf8mb4_unicode_520_nopad_ci | utf8mb4 | 1270 |         | Yes      |       8 |
+------------------------------+---------+------+---------+----------+---------+
33 rows in set (0.008 sec)
Copy the code

Why is UTF8MB4 recommended? Utf8 and UTF8MB4 character sets in MySQL

True support for storing Unicode characters

The UTF8 encoding of MySQL/MariaDB is not really UTF-8 encoding. In MySQL/MariaDB, utF8 only supports a maximum of 3 bytes, as seen in the query above.

In utF8 encoding, simplified Chinese usually takes up 3 bytes, while Chinese using extended area takes up 4 bytes. Utf-8 encoding is a variable length encoding. English characters take up one byte, and other complex characters take up two to four bytes.

However, currently commonly used emojis, traditional characters, special characters, and unusual Chinese characters take up four bytes. And that leads to,

If you use UTF8 encoding in MySQL/MariaDB, you will not be able to store 4 bytes of Chinese characters, emojis and other emojis. MySQL/MariaDB utF8 encoding takes up to 3 bytes for historical reasons.

Utf8mb4 is the true UTF-8 encoding and can support up to 4 bytes.

In practice, it is recommended to use the UTF8MB4 encoding character set, which is the true Unicode encoding. This is especially true in contexts where characters such as Chinese and emoji are used. If you don’t, you’ll get Incorrect string values with Incorrect bytes.

MySQL 8.0, the default character encoding has been changed to UTf8mb4, MariaDB 10.0 is still latin1.

Utf-8 encoding is U+2528D. Characters belonging to the CJK Unified Ideographs Extension B character set are in the second auxiliary plane (SIP, ideographic supplementary plane) and support a maximum of 4 bytes.

Mysql’s UTF8 encoding is a common basic multilingual plane (BMP, that is, Unicode in the 0000-FFFF range) of characters, and supports up to three bytes.

Specifies the character set and collation rules

Specified when creating the database

The syntax for specifying the character set and collation when creating a database is as follows:

CREATE DATABASE IF NOT EXISTS database_name 
    DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;
Copy the code

Create and view collations and character sets for a database:

CREATE DATABASE IF NOT EXISTS CHARSETTest 
    DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- View database
show create database CHARSETTest;

- return
-- CREATE DATABASE `CHARSETTest` /*! 40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */
Copy the code

Or:

CREATE DATABASE IF NOT EXISTS CHARSETTest1 
    DEFAULT CHARSET latin1 COLLATE latin1_swedish_ci;

-- View database
show create database CHARSETTest1;
- return
-- CREATE DATABASE `CHARSETTest1` /*! 40100 DEFAULT CHARACTER SET latin1 */
Copy the code

Specifies the character set and collation rules for tables and columns

You can create a table by specifying the character set and collation: DEFAULT CHARSET= UTF8MB4 COLLATE= UTF8MB4_unicode_CI. If no character set is specified, the character set and collation rules of the database are used by default.

You can also specify the collation rule for the current column after the column name, for example, ID char(4) COLLATE UTF8MB4_unicode_CI.

CREATE TABLE `Product` (
  `id` char(4) COLLATE utf8mb4_unicode_ci NOT NULL,
  `name` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `type` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `sale_price` int(11) DEFAULT NULL,
  `purchase_price` int(11) DEFAULT NULL,
  `regist_date` date DEFAULT NULL.PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Copy the code

Run SHOW CREATE TABLE tablename; Statement, you can see the full statement that created the table and column, and the character set and collation rules used in it.

Accordingly, the ALTER TABLE statement can be used to change the character encoding used for columns.

Change the database character set and collation rules

ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Copy the code

Example:

MariaDB [test]> ALTER DATABASE CHARSETTest1 CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Query OK, 1 row affected (0.008 sec)

MariaDB [test]> show create database CHARSETTest1;
+--------------+-----------------------------------------------------------------------------------------------------+
| Database     | Create Database                                                                                     |
+--------------+-----------------------------------------------------------------------------------------------------+
| CHARSETTest1 | CREATE DATABASE `CHARSETTest1` / *! 40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */ |
+--------------+-----------------------------------------------------------------------------------------------------+
1 row in set (0.006 sec)
Copy the code

Change the character set of the table

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Copy the code

Example:

MariaDB [test]> ALTER TABLE OrderTest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Query OK, 4 rows affected (0.035 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [test]> show create table OrderTest;
+-----------+------------------------------------------------------------------------------------------------------------ ------------------------------------------+
| Table     | Create Table                                                                                                                                         |
+-----------+------------------------------------------------------------------------------------------------------------ ------------------------------------------+
| OrderTest | CREATE TABLE `OrderTest` (...) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
+-----------+------------------------------------------------------------------------------------------------------------ ------------------------------------------+
1 row in set (0.002 sec)
Copy the code

Changes the collation of a column

ALTER TABLE t_name MODIFY c_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
Copy the code

View character encodings or collations for different objects

SHOW VARIABLES Displays the collation rules set by the MySQL system

The SHOW VARIABLES system variable can return the default Settings for each level associated with the collation.

Display the Settings for all variables related to the character set and collation as follows:

MariaDB [(none)]> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_general_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.001 sec)
Copy the code

The last three rows are the default collation for server, database, and connection, respectively.

SELECT @@collation_server;

View the character set and collation of the current database

The global variable

use <db_name>;
SELECT @@character_set_database, @@collation_database;
Copy the code

Check as follows:

MariaDB [test]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| latin1                   | latin1_swedish_ci    |
+--------------------------+----------------------+
1 row in set (0.004 sec)
Copy the code

If no database is specified, the MySQL system-level character set and collation rules are returned.

Using information_schema. Schemata

You can query the information_schema.schemata table or obtain the collation rules of the database. You can directly view the collate of the specified database without switching databases.

As follows:

MariaDB [(none)]> SELECT
    ->    default_character_set_name,
    ->    default_collation_name
    -> FROM information_schema.schemata
    -> WHERE schema_name = 'test';
+----------------------------+------------------------+
| default_character_set_name | default_collation_name |
+----------------------------+------------------------+
| latin1                     | latin1_swedish_ci      |
+----------------------------+------------------------+
1 row in set (0.001 sec)
Copy the code

Use environment variables

USE db_name;
show variables like "character_set_database";
-- or:
-- show variables like "collation_database";
Copy the code

View the collation of the specified table

Create a table to store ABab. Perform the following operations based on the preceding information.

CREATE TABLE OrderTest(
  letter char(1) NOT NULL
);
INSERT INTO OrderTest values('B'), ('b'), ('A'), ('a');
Copy the code

By querying the information_schema. TABLES table, you can view the sorting rule of a table.

Such as:

SELECT TABLE_SCHEMA  Dbname = dbname = dbname = dbname
    , TABLE_NAME
    , TABLE_COLLATION 
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 't_name';
Copy the code
MariaDB [test]> SELECT TABLE_SCHEMA
    ->     , TABLE_NAME
    ->     , TABLE_COLLATION
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_NAME = 'OrderTest';
+--------------+------------+-------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION   |
+--------------+------------+-------------------+
| test         | ordertest  | latin1_swedish_ci |
+--------------+------------+-------------------+
1 row in set.2 warnings (0.003 sec)
Copy the code

SHOW TABLE STATUS LIKE ‘t_name’; SHOW TABLE STATUS where name like ‘t_name’; Statement to view the Collation of the table. (The disadvantage is that the output column cannot be selected…)

MariaDB [test]> SHOW TABLE STATUS LIKE 'OrderTest'\G
*************************** 1. row ***************************
            Name: OrderTest
          Engine: InnoDB
         Version: 10
      Row_format: Dynamic
            Rows: 4
  Avg_row_length: 4096
     Data_length: 16384
 Max_data_length: 0
    Index_length: 0
       Data_free: 0
  Auto_increment: NULL
     Create_time: 2021- 08- 10 10:38:06
     Update_time: 2021- 08- 10 10:38:06
      Check_time: NULL
       Collation: latin1_swedish_ci
        Checksum: NULL
  Create_options:
         Comment:
Max_index_length: 0
       Temporary: N
1 row in set (0.001 sec)
Copy the code

View the character set and collation of the column

INFORMATION_SCHEMA. Table COLUMNS

In the information_Schema. COLUMNS table, you can view the column collation rules.

SELECT TABLE_NAME 
    , COLUMN_NAME 
    , character_set_name
    , COLLATION_NAME 
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 't_name';
Copy the code

If you want to see your database, attach the TABLE_SCHEMA column.

As follows:

MariaDB [(none)]> SELECT TABLE_NAME
    ->     , COLUMN_NAME
    ->     , character_set_name
    ->     , COLLATION_NAME
    -> FROM INFORMATION_SCHEMA.COLUMNS
    -> WHERE TABLE_NAME = 'OrderTest';
+------------+-------------+--------------------+-------------------+
| TABLE_NAME | COLUMN_NAME | character_set_name | COLLATION_NAME    |
+------------+-------------+--------------------+-------------------+
| ordertest  | letter      | latin1             | latin1_swedish_ci |
+------------+-------------+--------------------+-------------------+
1 row in set (0.011 sec)
Copy the code

SHOW FULL COLUMNS FROM <table>

SHOW FULL COLUMNS FROM

returns information about all COLUMNS of a table, including collation rules.
MariaDB [test]> SHOW FULL COLUMNS FROM OrderTest;
+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field  | Type    | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |
+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| letter | char(1) | latin1_swedish_ci | NO   |     | NULL    |       | select.insert,update,references |         |
+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.005 sec)
Copy the code

Collation rules are used explicitly in SQL statements

Use the COLLATE clause to specify collation rules

The default query uses the default collation and is case insensitive:

MariaDB [test]> select * from OrderTest where letter='a';
+--------+
| letter |
+--------+
| A      |
| a      |
+--------+
2 rows in set (0.000 sec)
Copy the code

Specifies a character set, and the lookup is strictly case-sensitive:

MariaDB [test]> select * from OrderTest where letter collate latin1_general_cs ='a';
+--------+
| letter |
+--------+
| a      |
+--------+
1 row in set (0.007 sec)
Copy the code

MySQL/MariaDB sort in strict accordance with the character of the code point sort?

From the following query, you can see that in the MySQL/MariaDB Order By clause sorting query, specifying the Collation of the sorting sequence as case-sensitive latin1_General_cs does not make the results strictly sorted By character code point.

MariaDB [test]> select * from OrderTest order by letter;
+--------+
| letter |
+--------+
| A      |
| a      |
| B      |
| b      |
+--------+
4 rows in set (0.000 sec)

MariaDB [test]> select * from OrderTest order by letter collate latin1_general_cs;
+--------+
| letter |
+--------+
| A      |
| a      |
| B      |
| b      |
+--------+
4 rows in set (0.002 sec)
Copy the code

To sort by the Unicode code value of the character, use the binary latin1_bin collation. As follows:

MariaDB [test]> select * from OrderTest order by letter collate latin1_bin;
+--------+
| letter |
+--------+
| A      |
| B      |
| a      |
| b      |
+--------+
4 rows in set (0.002 sec)
Copy the code