Some thoughts on ASCII, Unicode and UTF-8 encoding problems

One, foreword

These are just a few brief thoughts on character encoding problems, not an in-depth treatment.

Two, about encoding

Three, validation

Everything so far may have been theory, so let's test it with Python 3. We will look at how the characters 'A' and '中' are represented in different encodings.

The ASCII, UTF-8 and GB2312 encodings of 'A'

>>> 'A'.encode('ascii')
b'A'

>>> 'A'.encode('utf-8')
b'A'

>>> 'A'.encode('gb2312')
b'A'

The ASCII, UTF-8 and GB2312 encodings of '中'

>>> '中'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u4e2d' in position 0: ordinal not in range(128)

>>> '中'.encode('utf-8')
b'\xe4\xb8\xad'

>>> '中'.encode('gb2312')
b'\xd6\xd0'

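As you can see, 'A' encodes to the same single byte in all three encodings, while the Chinese character '中' cannot be encoded as ASCII at all: it takes three bytes in UTF-8 (e4 b8 ad) and two bytes in GB2312 (d6 d0).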
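
The interactive checks above can also be collected into one small script. This is only a minimal sketch: the helper name `encode_report` is made up for illustration and is not part of Python or of the original post. It tries each encoding in turn and records `None` where the codec raises `UnicodeEncodeError`.

```python
def encode_report(ch, encodings=("ascii", "utf-8", "gb2312")):
    """Return {encoding name: encoded bytes, or None if the codec rejects ch}."""
    report = {}
    for name in encodings:
        try:
            report[name] = ch.encode(name)
        except UnicodeEncodeError:
            # This codec has no representation for the character.
            report[name] = None
    return report


for ch in ("A", "中"):
    # Print the Unicode code point first, then the bytes per encoding.
    print(f"U+{ord(ch):04X}")
    for name, data in encode_report(ch).items():
        if data is None:
            print(f"  {name}: cannot encode")
        else:
            print(f"  {name}: {data.hex()} ({len(data)} byte(s))")
```

Returning `None` instead of letting the exception propagate is just one possible design choice; it keeps the comparison across encodings in a single dictionary.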
