MySQL utf8 and utf8mb4
- MySQL's utf8 character set (an alias for utf8mb3) stores at most 3 bytes per character
- Characters that need 4 bytes in UTF-8, such as some rarely used Chinese characters and emoji, cannot be stored in a utf8 column; inserting them raises an error such as "Incorrect string value"
- utf8mb4 is supported from MySQL 5.5.3 onward
- utf8mb4 is a superset of utf8, so upgrading data from an older utf8 setup to utf8mb4 requires no character conversion and loses no data
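To see which characters run into the 3-byte limit, it helps to look at their UTF-8 byte widths. The Python sketch below is only an illustration: it assumes MySQL's utf8 (utf8mb3) accepts exactly the characters whose UTF-8 encoding is at most 3 bytes (code points up to U+FFFF), and the helper names are hypothetical.

```python
def utf8_widths(text: str) -> list[tuple[str, int]]:
    """Return each character together with its UTF-8 byte length."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (assumed storable in MySQL utf8)."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


def four_byte_chars(text: str) -> list[str]:
    """Characters above U+FFFF, which take 4 bytes and therefore need utf8mb4."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# ASCII letter, accented Latin letter, common CJK character,
# rare CJK character (U+20000, CJK Extension B), emoji (U+1F600)
sample = "a\u00e9\u4e2d\U00020000\U0001F600"
for ch, width in utf8_widths(sample):
    print(f"U+{ord(ch):05X} -> {width} byte(s)")
print(fits_utf8mb3("中文 text"))  # True
print(fits_utf8mb3(sample))      # False
```

The 1, 2, 3, 4, 4 byte pattern for the sample shows why common Chinese text fits in utf8 while rare characters and emoji do not.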
utf8mb4_unicode_ci and utf8mb4_general_ci
- Before MySQL 8.0, utf8mb4_general_ci was the default collation for utf8mb4
- For Chinese and English text there is no practical difference between the two; utf8mb4_general_ci is sufficient
- utf8mb4_unicode_ci implements the Unicode Collation Algorithm, a slightly more complex sorting algorithm that handles special characters correctly
- utf8mb4_general_ci is faster but less accurate
- Use utf8mb4_unicode_ci if the application handles German, French, or Russian text
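The accuracy difference shows up with characters that do not map one-to-one. MySQL's documentation notes that utf8mb4_general_ci compares character by character, while utf8mb4_unicode_ci supports expansions, so the German ß is equal to "ss" under unicode_ci but not under general_ci. Python's str.lower() versus str.casefold() shows a similar no-expansion versus expansion contrast; treat this as an analogy, not MySQL's actual collation code. The function names are hypothetical.

```python
def lower_only(s: str) -> str:
    """Plain lowercasing: "ß".lower() stays "ß", so "Straße" never matches "STRASSE"."""
    return s.lower()


def full_casefold(s: str) -> str:
    """Full Unicode case folding, which may expand one character into several ("ß" -> "ss")."""
    return s.casefold()


# Without expansion the German spellings differ (like utf8mb4_general_ci) ...
print(lower_only("Straße") == lower_only("STRASSE"))        # False
# ... with expansion they compare equal (like utf8mb4_unicode_ci's ß = ss)
print(full_casefold("Straße") == full_casefold("STRASSE"))  # True
```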
utf8mb4_0900_ai_ci
- It is the default collation in MySQL 8.0.1 and later and, like utf8mb4_unicode_ci, it is a Unicode (UCA-based) collation
- 0900 refers to the version of the Unicode Collation Algorithm it implements (UCA 9.0.0)
- ai means accent-insensitive: for example, e, è, é, ê and ë are treated as equal when sorting
- ci means case-insensitive: for example, p and P are treated as equal when sorting
- For accent-sensitive and case-sensitive comparison, use utf8mb4_0900_as_cs
- Because its character set is utf8mb4, it can store emoji, which take four bytes in UTF-8
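The ai/ci versus as/cs distinction can be approximated with Python's unicodedata module. This is a rough sketch, not UCA 9.0.0: it only strips accents that Unicode can decompose (so, for example, ø is left alone), and ai_ci_key and as_cs_key are hypothetical names.

```python
import unicodedata


def ai_ci_key(s: str) -> str:
    """Rough accent- and case-insensitive key, in the spirit of *_ai_ci."""
    # NFD splits "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT
    decomposed = unicodedata.normalize("NFD", s)
    # Drop combining marks (non-zero canonical combining class), i.e. the accents
    base = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    # casefold() removes case differences
    return base.casefold()


def as_cs_key(s: str) -> str:
    """Accent- and case-sensitive key, in the spirit of *_as_cs:
    only canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", s)


words = ["e", "è", "é", "ê", "ë", "E", "p", "P"]
print(sorted({ai_ci_key(w) for w in words}))  # ['e', 'p']
print(len({as_cs_key(w) for w in words}))     # 8
```

Under the ai_ci-style key the eight inputs collapse into two groups; under the as_cs-style key all eight stay distinct.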
Conclusion
- New projects should use the utf8mb4 character set, which covers every language, with a collation such as utf8mb4_general_ci
- For MySQL versions before 8.0, use utf8mb4_unicode_ci if the project may contain text in other languages
- For MySQL 8.0 or later, use the default utf8mb4_0900_ai_ci
- With utf8mb4, prefer VARCHAR over CHAR to save space: CHAR is fixed-length and each character may take up to 4 bytes
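The rules of thumb above can be sketched as a small helper. Assumptions: the version cut-offs come from the notes above; ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ... is MySQL's statement for converting a table and its text columns (back up before running it); the table name "articles" is hypothetical. The storage figures are a simplified model: VARCHAR uses a 1-byte length prefix when the column can hold at most 255 bytes and 2 bytes otherwise, while the CHAR figure is a worst case of 4 bytes per character (what InnoDB's older REDUNDANT row format reserves; COMPACT and DYNAMIC store utf8mb4 CHAR more compactly).

```python
def recommended_collation(version: tuple[int, int, int], multilingual: bool = True) -> str:
    """Pick a utf8mb4 collation following the conclusions above."""
    if version < (5, 5, 3):
        raise ValueError("utf8mb4 requires MySQL 5.5.3 or later")
    if version >= (8, 0, 1):
        return "utf8mb4_0900_ai_ci"  # default utf8mb4 collation from 8.0.1 on
    return "utf8mb4_unicode_ci" if multilingual else "utf8mb4_general_ci"


def convert_table_sql(table: str, collation: str) -> str:
    """Build a table conversion statement; backticks in the name are doubled."""
    quoted = "`" + table.replace("`", "``") + "`"
    return f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"


def varchar_bytes(value: str, max_chars: int) -> int:
    """Simplified size of a utf8mb4 VARCHAR(max_chars) value: length prefix + data."""
    if len(value) > max_chars:
        raise ValueError("value is longer than the declared column length")
    # 1-byte prefix if the column can hold at most 255 bytes (4 bytes per char), else 2
    prefix = 1 if max_chars * 4 <= 255 else 2
    return prefix + len(value.encode("utf-8"))


def char_bytes_upper_bound(max_chars: int) -> int:
    """Worst case for utf8mb4 CHAR(max_chars) under the simplified model: 4 bytes per char."""
    return max_chars * 4


print(recommended_collation((5, 7, 44)))  # utf8mb4_unicode_ci
print(recommended_collation((8, 0, 36)))  # utf8mb4_0900_ai_ci
print(convert_table_sql("articles", recommended_collation((8, 0, 36))))
print(varchar_bytes("hello", 50), char_bytes_upper_bound(50))  # 6 200
```

For a short value in a CHAR(50) versus VARCHAR(50) column, the model gives 200 bytes versus 6 bytes, which is the gap the VARCHAR advice is about.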