Python 2's codecs have been confusing me for the last two days, so I decided to spend some time over the weekend writing up the problem.

First, let’s try something basic on the console.

Did you feel overwhelmed the first time you saw this? The trouble with these types is that Python 2 and Python 3 treat them completely differently, and much of what is written on the web is muddled. I later discovered that Python 2 has no separate bytes type (since 2.6, the name bytes is just an alias for str). In Python 2, str is not really a string but an array of bytes, and the real string type is unicode. In Python 3, str is the real string, made of Unicode code points, and bytes is the array of bytes. Isn't that confusing? Put side by side: Python 2's str corresponds to Python 3's bytes, and Python 2's unicode corresponds to Python 3's str.
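To make the mapping concrete, here is a small sketch for a Python 3 interpreter. The sample text and variable names are my own, and the Python 2 lines in the comments are shown for comparison only; they are not executed here.

```python
# Python 3: str holds text (code points), bytes holds raw bytes.
text = "中文"                   # str: two code points
data = text.encode("utf-8")    # bytes: six bytes in UTF-8

print(type(text).__name__, len(text))   # str 2
print(type(data).__name__, len(data))   # bytes 6

# Rough Python 2 counterparts (for comparison, not run here):
#   u"中文"  is a unicode object (text), like Python 3's str
#   "中文"   is a str object (bytes), like Python 3's bytes
```

The difference in length is the giveaway: the text has two characters, but its UTF-8 form takes six bytes.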

Well, once you understand that, a lot of things make sense. As for why Python 2 is so odd, all I can say is that it is a leftover from history.

With that understood, encoding and decoding are much easier to follow. Converting code points (Unicode characters) into concrete bytes is called encoding, and converting bytes back into code points (Unicode characters) is called decoding.
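A minimal Python 3 sketch of the two directions follows. The character and the two codecs are just examples I picked; any codec that can represent the character would work the same way.

```python
text = "é"                              # one code point, U+00E9

utf8_bytes = text.encode("utf-8")       # encode: text -> bytes
latin1_bytes = text.encode("latin-1")   # same character, different bytes

print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'

# decode: bytes -> text, using the same codec that produced the bytes
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```

The same code point becomes different bytes under different codecs, which is why the codec has to be named in both directions.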

So first, let's look at Python 3.
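Here is a rough sketch of what Python 3 allows and rejects. The sample text and the names encode_error and decode_error are my own choices for illustration.

```python
text = "中文"
data = text.encode("utf-8")

# Python 3 keeps the directions separate:
# str has only encode(), bytes has only decode().
print(hasattr(text, "decode"))  # False
print(hasattr(data, "encode"))  # False

encode_error = None
decode_error = None

# Encoding with a codec that cannot represent the characters fails.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc
    print("encode failed:", exc.reason)

# Decoding bytes with a codec that does not match them fails too.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("decode failed:", exc.reason)
```

Note that a wrong codec does not always raise: some mismatches decode without error and quietly produce garbled text instead.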

Well, clearly, converting between bytes and Unicode strings raises an error when the wrong codec is used. Now let's look at Python 2. Python 2's str is really bytes, so in principle a str should only be decodable, not encodable.

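Huh? Something seems wrong. A str is bytes, so there is nothing left to encode; a unicode is characters, so it should only be encodable, not decodable. Yet in Python 2 both operations succeed and simply return the same value, at least for ASCII-only data. That is because Python 2 first converts implicitly with the default codec (normally ASCII): str.encode decodes the bytes before encoding them, and unicode.decode encodes the text before decoding it. Come on, Python, you are supposed to raise an error here!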
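The hidden conversion can be imitated in Python 3. The two helpers below, py2_str_encode and py2_unicode_decode, are hypothetical names of my own; they only approximate what Python 2 does, assuming its default encoding is left at ASCII.

```python
def py2_str_encode(data: bytes, encoding: str) -> bytes:
    """Approximate Python 2's str.encode: implicit ASCII decode, then encode."""
    return data.decode("ascii").encode(encoding)


def py2_unicode_decode(text: str, encoding: str) -> str:
    """Approximate Python 2's unicode.decode: implicit ASCII encode, then decode."""
    return text.encode("ascii").decode(encoding)


# ASCII-only data survives the hidden step, so the value comes back unchanged.
print(py2_str_encode(b"abc", "utf-8"))     # b'abc'
print(py2_unicode_decode("abc", "utf-8"))  # abc

# Non-ASCII data fails in the hidden step, so "encode" raises a *decode* error.
try:
    py2_str_encode("中文".encode("utf-8"), "utf-8")
except UnicodeDecodeError as exc:
    print("encode raised UnicodeDecodeError:", exc.reason)
```

This is also why Python 2 code that calls encode on a str can fail with a UnicodeDecodeError: the error comes from the implicit first step, not from the codec you asked for.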

Okay, so I guess all of this behavior in Python 2 is a relic of history, and Python 3 is a lot cleaner. So that's it.