Small knowledge, big challenge! This article is part of the “Essential Tips for Programmers” writing event.
I've finally started on this last short tips article.
For a long time I couldn't think of a suitable topic, and now I'm about to start grinding out more articles for November. The competition is fierce 👨‍💻
🏆 Preface
This is a small tip. When writing articles I like to use the Windows emoji panel (Win + .), which pops up emoji such as 👨‍💻 🏂 🛌 🛀 🤽 ⛹️ 🤸, also 🏎 🚠 🛫 💺 🚀 🛰 ⛵, and 🍟 🍔 🍿 🌭 🥞 🥙 🍰 🥤 🍸.
In previous projects we never had to store emoji like these and simply used MySQL's utf8 character set as usual. When I tested it today, storing emoji failed, hence this short article. I hope it helps you.
📚 1. Why MySQL's utf8 does not support emoji
In my test, a table using MySQL's utf8 character set could not store data containing emoji.
The reason: MySQL's “utf8” is not real UTF-8. It stores at most 3 bytes per character, which covers the Basic Multilingual Plane (U+0000 to U+FFFF), including common Chinese characters. Most emoji, however, many of which entered Unicode with version 6.0, lie outside that plane and need 4 bytes in UTF-8. So a database that uses the “utf8” character set cannot store them.
📑 2. The difference between utf8 and utf8mb4
2.1 UTF-8 (Unicode)
Let's start with UTF-8. Originally, only 128 characters were encoded for computers: upper- and lower-case English letters, digits, and some symbols. This table is called ASCII. China developed GB2312 to encode Chinese. As you can imagine, with hundreds of languages around the world, Japan encoding Japanese as Shift_JIS and Korea encoding Korean as EUC-KR, every country had its own standard; these standards inevitably conflicted, and mixing languages produced garbled text.
Hence Unicode, which unifies the characters of all languages into one character set, so the garbled-text problem goes away. Modern operating systems and most programming languages support Unicode directly.
UTF-8 is a variable-length encoding of Unicode that uses 1 to 4 bytes per character. In UTF-8, an English letter takes one byte, and a common Chinese character (simplified or traditional) takes three bytes.
Most characters in everyday use fit in three bytes, but most emoji lie above U+FFFF and need four. So as long as your data contains no emoji or other rare characters beyond U+FFFF, a 3-byte encoding is enough.
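These byte counts follow from the code point ranges defined for UTF-8 (RFC 3629). The Python sketch below computes the length from the code point and cross-checks it against Python's built-in encoder; the function name `utf8_length` is illustrative, not from any library. It shows why an emoji such as U+1F600 lands in the 4-byte range while Chinese characters stay at 3 bytes.

```python
def utf8_length(ch: str) -> int:
    """Bytes needed to encode one code point in UTF-8 (RFC 3629 ranges)."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1  # ASCII: English letters, digits, basic symbols
    if cp <= 0x7FF:
        return 2  # e.g. accented Latin letters, Greek, Cyrillic
    if cp <= 0xFFFF:
        return 3  # rest of the Basic Multilingual Plane, incl. common CJK
    return 4      # supplementary planes, incl. most emoji


for ch in ["A", "中", "\U0001F600"]:
    # Cross-check the range table against Python's own UTF-8 encoder.
    assert utf8_length(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
# U+0041 -> 1 byte(s)
# U+4E2D -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```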
One more thing to add here:
MySQL's “utf8” is not true UTF-8. It supports at most 3 bytes per character (MySQL also calls it utf8mb3), whereas real UTF-8 needs up to 4 bytes per character.
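To see which characters a 3-byte “utf8” column cannot hold, a helper only needs to look for code points above U+FFFF, since exactly those take 4 bytes in UTF-8. This is a minimal Python sketch, assuming the data is already a Python `str`; `four_byte_chars` and `fits_mysql_utf8` are hypothetical names for illustration.

```python
def four_byte_chars(text: str) -> list[str]:
    """Return the characters of `text` that need 4 bytes in UTF-8.

    These are exactly the code points above U+FFFF, which MySQL's
    3-byte "utf8" (utf8mb3) cannot store.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character of `text` fits in MySQL's 3-byte "utf8"."""
    return not four_byte_chars(text)


# "👨‍💻" is man (U+1F468) + zero-width joiner (U+200D) + laptop (U+1F4BB).
coder = "\U0001F468\u200D\U0001F4BB"
print(fits_mysql_utf8("Hello, 中文"))     # True
print(fits_mysql_utf8("Hello " + coder))  # False
print([f"U+{ord(c):X}" for c in four_byte_chars(coder)])  # ['U+1F468', 'U+1F4BB']
```

Note that a few emoji, such as ⛵ (U+26F5), sit inside the Basic Multilingual Plane and do fit in 3 bytes; the check is about code points, not about whether a character looks like an emoji.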
Rather than change “utf8”, MySQL's developers introduced a new character set, utf8mb4. 👇
2.2 utf8mb4
utf8mb4 was added in MySQL 5.5.3. “mb4” stands for “most bytes 4”: it stores up to four bytes per character, so it covers all of Unicode and can store emoji.
Starting with MySQL 8.0, utf8mb4 is the default character set.
utf8mb4 is MySQL's true UTF-8 encoding.
So how do we get MySQL to store emoji?
📰 3. How to make MySQL store Emoji
When creating the database, choose the utf8mb4 character set, not utf8.
When setting a field's character set, also choose utf8mb4.
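As a sketch of what these two settings look like in SQL, the Python helper below only builds the statement strings; it does not connect to a server. The database, table and column names, the `VARCHAR(255)` type, and the `utf8mb4_unicode_ci` collation are assumptions for illustration (MySQL 8.0's default utf8mb4 collation is `utf8mb4_0900_ai_ci`).

```python
# Sketch only: builds SQL text, does not talk to a MySQL server.
# The collation below is an assumption; pick the one your project uses.
CHARSET_CLAUSE = "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"


def utf8mb4_statements(database: str, table: str, column: str,
                       column_type: str = "VARCHAR(255)") -> list[str]:
    """Return DDL that applies utf8mb4 at database level and to one column."""
    return [
        # Database default: tables created later inherit utf8mb4.
        f"CREATE DATABASE `{database}` {CHARSET_CLAUSE};",
        # Column level: MODIFY redefines the whole column, so repeat any
        # NOT NULL / DEFAULT attributes the real column already has.
        f"ALTER TABLE `{database}`.`{table}` MODIFY `{column}` "
        f"{column_type} {CHARSET_CLAUSE};",
    ]


# Hypothetical names, for illustration only.
for statement in utf8mb4_statements("blog", "article", "content"):
    print(statement)
```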
With both set to utf8mb4, storing emoji worked when I tested it in Navicat.
However, while searching for related material online, I found sources saying you also need to modify the my.ini configuration file:
add character_set_server = utf8mb4 under the [mysqld] section.
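Before restarting MySQL, one way to double-check the edited file is to parse it. The Python sketch below uses the standard-library `configparser`; `server_charset_is_utf8mb4` is a hypothetical helper, not an official MySQL tool. It only handles simple option files (it does not follow `!include` directives) and treats `character_set_server` and `character-set-server` as the same option, as MySQL does.

```python
import configparser


def server_charset_is_utf8mb4(ini_text: str) -> bool:
    """True if the [mysqld] section sets the server character set to utf8mb4."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # option files may contain bare flags like skip-name-resolve
        interpolation=None,   # do not treat '%' in values specially
        strict=False,         # tolerate repeated options
    )
    parser.read_string(ini_text)
    if not parser.has_section("mysqld"):
        return False
    section = parser["mysqld"]
    # MySQL accepts '_' or '-' in option names; configparser lower-cases keys.
    value = section.get("character_set_server") or section.get("character-set-server")
    # Character set names are case-insensitive, so "utf8MB4" also counts.
    return value is not None and value.strip().lower() == "utf8mb4"


sample = """
[mysqld]
character_set_server= utf8MB4
"""
print(server_charset_is_utf8mb4(sample))  # True
```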
⌛ 4. Closing remarks
Note: next time someone asks you which encoding to use, recommend utf8mb4. It is MySQL's real UTF-8 encoding.
Now I'm starting to think about what to write in November. Juejin, not even you could save me if I started learning front-end now.
Feel free to tell me what you'd like to read, and I'll write about it, or learn it first and then post about it. 👨‍💻
Right now the back-end field is brutally competitive, with everyone grinding, so it's really tough.
Hello everyone, I'm Ning Zaichun: homepage
A young man who loves literature and art but took the path of programming.
My hope: by the time we meet again someday, we will each have achieved something.
Reference:
The difference between “UTF-8” and “UTF8MB4” in MySQL and their application scenarios