Why can variable names be written in Chinese in Python? Because of UTF-8 character encoding

This is the 16th day of my participation in Gwen Challenge

Chinese as a variable name

>>> 标题 = 'Play some programming with Chinese in Python'
>>> 名单 = ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Qian Qi']
>>> for 名字 in 名单:
...     print(名字, end=' >>> ')
...
Zhang San >>> Li Si >>> Wang Wu >>> Zhao Liu >>> Qian Qi >>>

Every Python book says that variable names in Python follow these rules:

You can use numbers, letters, and underscores
A name cannot start with a number

So why can Chinese be used as a variable name without an error? The answer is character encoding: the Python3 interpreter’s default source encoding changed from Python2’s ASCII to UTF-8, and Python3 treats Unicode letters, including Chinese characters, as letters that are allowed in names.
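The naming rules above can be checked directly with the built-in str.isidentifier() method. This is only a small sketch; the candidate names are made up for illustration.

```python
# Hypothetical candidate names, checked against Python 3's identifier rules.
candidates = ["name_list", "_count", "2fast", "名单", "名字2", "2名字"]

# str.isidentifier() returns True when the string is a valid identifier.
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")
```

The Chinese names pass as long as they do not start with a digit, which is the same rule that applies to English names.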

Not sure what that means? Here is a tour of the character encoding tables in common use.

Character encoding tables

First, look at two concepts: the character encoding and the character encoding table.

Character encoding

We know that all data is ultimately stored in the computer as binary, such as 010101, and the characters we see are no exception. To show characters on screen, the computer pioneers used a table, the character encoding table. Each number in the table (the character code) corresponds to one specific character, so when the computer receives a number it displays the matching character. For example, decimal 65 in the table below stands for the character A. Given a code, the computer looks up the character in the table and displays it.

Character code (decimal)	Character
65	A
66	B
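The lookup in this table can be tried in Python with the built-in ord() and chr() functions. The variable names below are only illustrative.

```python
# ord() maps a character to its code; chr() maps a code back to a character.
code_a = ord("A")               # the code stored for "A"
char_66 = chr(66)               # the character stored under code 66
bits_a = format(code_a, "08b")  # the same code written as 8 binary digits

print(code_a, char_66, bits_a)
```

This prints 65, B and 01000001, matching the two rows of the table and showing the binary form the computer actually stores.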

Character encoding table

A character encoding table lists character codes next to the characters they stand for. In other words, each number in the table corresponds to one character.

Common character encoding tables

ASCII

ASCII is the granddaddy of all later character encoding tables. Since computers were born in the United States, this table was designed for American English. It contains uppercase and lowercase letters, digits, and some special characters, 128 codes in total, so every character fits in a single byte. The default source encoding of Python2 is ASCII, so Python2 does not accept Chinese in source code unless an encoding is declared, and never in variable names. More fundamentally, ASCII simply has no codes for Chinese characters.
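A quick way to see that ASCII has no room for Chinese is to try encoding some text with it. The sample strings are arbitrary.

```python
# English letters and digits fit in ASCII, one byte each.
ascii_bytes = "Python3".encode("ascii")
print(len(ascii_bytes), list(ascii_bytes))

# Chinese characters have no ASCII code, so encoding fails.
try:
    "中文".encode("ascii")
    chinese_fits_ascii = True
except UnicodeEncodeError as err:
    chinese_fits_ascii = False
    print("cannot encode:", err.reason)
```

Every byte produced for the English text is below 128, while the Chinese text raises UnicodeEncodeError.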

GB2312 GBK

When computers crossed the ocean to China, they could not handle Chinese. That would not do! So ASCII was extended with Chinese characters, first as GB2312 and eventually as GBK, which stores one Chinese character in two bytes.
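As a small illustration, Python’s built-in "gbk" codec shows both properties: ASCII characters keep their single byte, and a Chinese character takes two. The sample characters are arbitrary.

```python
# GBK keeps ASCII characters unchanged, one byte each.
letter = "A".encode("gbk")

# A Chinese character is stored in two bytes.
hanzi = "中".encode("gbk")

print(letter, len(letter))
print(hanzi, len(hanzi))

# Decoding with the same table gives the character back.
round_trip = hanzi.decode("gbk")
```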

Unicode

However, every country ended up with its own character table. Would everyone need to keep all of those tables on hand just to decode each other’s text? To solve this, a single standard was proposed to unify the world: Unicode, maintained by the Unicode Consortium and kept in sync with the ISO/IEC 10646 standard. Early Unicode used two bytes per character, which gives 65,536 possible codes. The code space was later extended to U+10FFFF, more than a million codes, which is enough for the scripts in use around the world as well as many historic ones.
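The Unicode numbers themselves (code points) can be inspected with ord(). The sketch below uses one Chinese character and one emoji as arbitrary examples; the emoji is there only to show a code point that no longer fits in two bytes.

```python
import sys

cp_hanzi = ord("中")           # a code point inside the original two-byte range
cp_emoji = ord("\U0001F600")   # a code point beyond 0xFFFF

print(f"U+{cp_hanzi:04X}", f"U+{cp_emoji:04X}")

# The largest code point a Python 3 string can hold.
print(hex(sys.maxunicode))
```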

UTF-8

However, under the two-byte Unicode scheme, letters and digits that used to fit in one byte now need two bytes of storage. That is like paying 2 yuan for something that costs 1 yuan. So UTF-8 was created. It is a variable-length encoding of Unicode: ASCII letters and digits still take one byte, and other characters take two to four bytes. A common Chinese character takes 3 bytes in UTF-8, so for network transmission of text that is entirely Chinese, GBK saves more space.
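The variable length of UTF-8, and the size difference compared with GBK, can be measured by encoding a few characters. The sample characters and the sample text are arbitrary choices for illustration.

```python
# Bytes needed per character in UTF-8.
utf8_sizes = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "中", "\U0001F600"]}
for ch, size in utf8_sizes.items():
    print(f"U+{ord(ch):04X} -> {size} byte(s)")

# The same all-Chinese text in both encodings.
text = "中文编程"
gbk_size = len(text.encode("gbk"))
utf8_size = len(text.encode("utf-8"))
print(gbk_size, utf8_size)
```

For the four-character Chinese text, GBK needs 8 bytes and UTF-8 needs 12.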

Python3 supports Chinese

The Python3 interpreter reads source code as UTF-8 by default, and UTF-8 can represent every Unicode character. Python3 also accepts Unicode letters in names, so variables can be defined in Chinese in Python3.
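One way to see this in action is to hand the interpreter source code as raw UTF-8 bytes, roughly the way it reads a .py file, and let it run. This is a minimal sketch; the variable names 名单 and 人数 and the "<demo>" file name are made up for the example.

```python
import sys

# Python 3 reports UTF-8 as its default string encoding.
default_encoding = sys.getdefaultencoding()

# Source code as UTF-8 bytes, with Chinese variable names.
source_bytes = "名单 = ['Zhang San', 'Li Si']\n人数 = len(名单)\n".encode("utf-8")

# compile() decodes bytes as UTF-8 unless the source declares another encoding.
namespace = {}
exec(compile(source_bytes, "<demo>", "exec"), namespace)

print(default_encoding, namespace["人数"])
```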

In day-to-day programming, though, we should still follow the naming rules mentioned at the beginning of this article. First, switching between Chinese and English input is tiring enough, and second, variable definitions in Chinese still look awkward. That said, for programmers who are not native English speakers, naming variables in their own language can help sort out the logic.
