Important: Python crawlers work with strings constantly, so a solid grasp of strings matters, especially when extracting data
str and bytes:
str: text in Unicode form, which people can read directly
bytes: data in binary form. All data travels over the network as bytes, which are hard to read directly, so they must be decoded into str before the text can be processed
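A minimal sketch of the str/bytes round trip. The HTML snippet and the variable names are made up for illustration, and the bytes are assumed to be UTF-8; a real response may declare a different encoding.

```python
# Hypothetical page content; a real crawler would receive bytes like these from the network.
text = "<title>你好</title>"

raw = text.encode("utf-8")      # str -> bytes (what is actually transmitted)
restored = raw.decode("utf-8")  # bytes -> str (what we can read and parse)

print(type(text).__name__, len(text))  # str 17
print(type(raw).__name__, len(raw))    # bytes 21
print(raw)
print(restored == text)                # True
```

The length differs because len() counts characters for str and bytes for bytes: each of the two Chinese characters takes three bytes in UTF-8.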
ASCII:
One byte represents one character, so it uses little memory, but it covers only English letters, digits, and basic symbols and cannot represent the characters of every language. Different countries therefore created their own incompatible encodings, which made text hard to exchange
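The limits described above can be sketched as follows. The helper name can_encode_ascii is hypothetical, and GBK and Latin-1 are picked only as two examples of regional encodings that disagree with each other.

```python
def can_encode_ascii(text: str) -> bool:
    """Return True if every character in text fits in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


ascii_bytes = "A".encode("ascii")
print(len(ascii_bytes), ascii_bytes[0])  # 1 65

print(can_encode_ascii("hello"))  # True
print(can_encode_ascii("中文"))    # False

# The same bytes mean different things under different regional encodings.
gbk_bytes = "中".encode("gbk")
garbled = gbk_bytes.decode("latin-1")
print(gbk_bytes, garbled)
```

Decoding with the wrong codec does not always raise an error; it can silently produce garbled text, which is a common source of mojibake in scraped pages.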
Unicode:
A single character set that assigns a number (code point) to almost every character. Early fixed-width encodings of it used two bytes per character, which wastes memory on mostly English text, so it was hard to promote in that form
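A small sketch of code points and the two-byte form, using UTF-16 (big-endian, no byte order mark) as a stand-in for the two-bytes-per-character idea. The sample characters are arbitrary.

```python
# Every character has a Unicode code point, independent of how it is stored.
print(ord("A"), hex(ord("中")))  # 65 0x4e2d
print(chr(0x4E2D))               # 中

# In UTF-16, common characters take two bytes each, even plain English letters.
a_utf16 = "A".encode("utf-16-be")
zh_utf16 = "中".encode("utf-16-be")
print(len(a_utf16), len(zh_utf16))  # 2 2

# Characters outside the first 65,536 code points need four bytes (a surrogate pair).
emoji_utf16 = "😀".encode("utf-16-be")
print(len(emoji_utf16))  # 4
```

The letter "A" costs two bytes here versus one in ASCII, which is the memory overhead the note refers to.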
UTF-8:
1. UTF-8 is an implementation (encoding) of Unicode, which can be loosely understood as an upgraded way of storing Unicode.
2. It is a variable-length encoding: each character takes 1 to 4 bytes, and the shortest form that fits is used
3. It keeps memory use low while still being able to represent every Unicode character
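The variable-length behavior in the list above can be sketched like this. The helper name utf8_length and the sample characters are chosen only for illustration.

```python
def utf8_length(char: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(char.encode("utf-8"))


samples = ["A", "é", "中", "😀"]
lengths = {ch: utf8_length(ch) for ch in samples}

for ch, n in lengths.items():
    print(ch, n)
# A 1
# é 2
# 中 3
# 😀 4

# ASCII text is byte-for-byte identical in UTF-8, so English content costs nothing extra.
print("hello".encode("utf-8") == "hello".encode("ascii"))  # True
```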
Praise UTF-8!!!!!