For more content, follow the WeChat official account: Back-end Technology Cabin

Like me, you have probably run into all sorts of garbled-text problems in day-to-day Python development. This post summarizes some common problems and their solutions, based on my own experience. The examples use Python 2.

0 How to check the encoding of a py source file

Open the py file in vim and enter:

:set fileencoding

Output:

fileencoding=utf-8

1 How the encoding of string literals in a py file is determined

The encoding of a string literal depends on the coding declaration in the file header (provided the file is actually saved in that encoding). In the code below, the value of the variable name is encoded in UTF-8.

#!/usr/bin/env python
# coding: utf-8
name = "后端技术小屋"  # "Back-end Technology Cabin"

If the coding declaration is missing, running the py file fails with a syntax error, because the Python 2 interpreter assumes ASCII source and does not recognize the Chinese characters in the file:

SyntaxError: Non-ASCII character '\xe5' in file 1.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
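
To make the header rule concrete, here is a minimal sketch of how a coding declaration can be located in source bytes. It is based on the pattern given in PEP 263; the function name declared_encoding is hypothetical, and the real interpreter applies a few extra rules (for example around byte order marks) that this sketch leaves out.

```python
import re

# Pattern based on the one given in PEP 263 for the declaration comment.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_bytes):
    """Return the encoding named in the first two lines, or None.

    Simplified: only the first two lines are inspected, as PEP 263
    requires, but no other interpreter rules are applied.
    """
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


with_header = b"#!/usr/bin/env python\n# coding: utf-8\nname = 'x'\n"
without_header = b"name = 'x'\n"
print(declared_encoding(with_header))     # utf-8
print(declared_encoding(without_header))  # None
```

When the function returns None for a file that contains non-ASCII bytes, Python 2 raises the SyntaxError shown above.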

2 How to check the standard input and standard output encodings

The stdin/stdout encoding can be checked in the following three ways.

2.1 Using the Python sys module

$ python
Python 2.7.6 (default, Nov 13 2018, 12:45:42)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.stdin.encoding
UTF-8
>>> print sys.stdout.encoding
UTF-8
>>>

2.2 Viewing the Environment Variable LANG

$ echo $LANG
en_US.UTF-8

2.3 Running the locale command

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
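
The checks in this section can also be gathered in one small script. This is only a sketch: the function name io_encodings is hypothetical, and it is written with print() calls so that it also runs on Python 3, where the reported values can differ from Python 2.

```python
import locale
import os
import sys


def io_encodings():
    """Collect the encoding-related settings discussed in this section."""
    return {
        # getattr guards against stdin/stdout being replaced or missing.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "LANG": os.environ.get("LANG"),
        "locale_preferred": locale.getpreferredencoding(),
    }


for key, value in sorted(io_encodings().items()):
    print("%s: %s" % (key, value))
```

The values depend entirely on the environment the script runs in, so no fixed output is shown here.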

3 How to change the standard input and standard output encodings

First, note that the encoding of stdin and stdout is determined by the locale, which is normally set through the environment variable LANG (LC_ALL and LC_CTYPE, if set, take precedence over it).

For a temporary change, run export LANG='XXX' on the command line. The setting is lost when the terminal session exits.

To change LANG permanently, edit its value with vim /etc/sysconfig/i18n and then run source /etc/sysconfig/i18n for it to take effect. (This path applies to Red Hat style systems; other distributions keep the setting elsewhere.)
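
The effect of an environment change can be observed from Python by starting a child interpreter with a modified environment. The sketch below makes several assumptions: the helper name child_stdout_encoding is hypothetical, the child's stdout is a pipe rather than a terminal (Python 2 reports None in that case unless an encoding is forced), and PYTHONIOENCODING is a Python-specific variable that this article does not otherwise cover.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a child interpreter with extra environment variables and
    return the value it reports for sys.stdout.encoding."""
    env = dict(os.environ)
    env.update(extra_env)
    code = "import sys; sys.stdout.write(str(sys.stdout.encoding))"
    output = subprocess.check_output([sys.executable, "-c", code], env=env)
    return output.decode("ascii").strip()


# The result depends on the installed locales and on LC_ALL / LC_CTYPE.
print(child_stdout_encoding({"LANG": "en_US.UTF-8"}))
# PYTHONIOENCODING is read by Python itself, independently of the locale.
print(child_stdout_encoding({"PYTHONIOENCODING": "latin-1"}))
```

Because only the child's environment is changed, the parent process and the shell keep their settings, which mirrors the temporary nature of export LANG.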

4 The difference between the u and b prefixes on string literals

Assume the py file header declares # coding: utf-8.

When a = u'后端技术小屋' is executed, the UTF-8 bytes of the literal are decoded into a unicode object, which is assigned to a.

When b = b'后端技术小屋' is executed, the UTF-8 bytes of the literal are assigned to b unchanged, as a byte string.
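
The difference between the two objects shows up in their lengths. The following is a small sketch with an illustrative sample string; because Python 3 only accepts ASCII inside b'' literals, the byte string is produced with encode() here instead of a b prefix, which gives the same bytes a UTF-8 source file would hold.

```python
# coding: utf-8
text = u"后端技术小屋"        # Unicode text, like the u'' literal
raw = text.encode("utf-8")    # the UTF-8 bytes a b'' literal would hold

print(len(text))                     # 6 characters
print(len(raw))                      # 18 bytes, 3 per character in UTF-8
print(raw.decode("utf-8") == text)   # True
```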

5 With a = u'后端技术小屋' and b = b'后端技术小屋', why do print a and print b give the same result

Assume that the py file header declares the coding as UTF-8 and that the standard input/output encoding is also UTF-8.

When Python executes print a, the unicode object is automatically encoded to UTF-8, the standard output encoding, and then written out.

When Python executes print b, the bytes of b are written out as they are. They are UTF-8, which happens to match the standard output encoding. So although a and b are stored differently, the bytes that reach standard output are identical, and printing a and b shows the same result.

There is some coincidence here: the byte string b happens to use the same encoding as standard output. If the encoding of b differs from the standard output encoding (for example, b holds GBK bytes), we see gibberish, because byte strings are not converted automatically: the GBK bytes are written as they are and the terminal interprets them as UTF-8. This is where most of the Chinese garbled-text problems in Python development come from.

In practice, it is strongly recommended to convert every string that may contain Chinese characters to unicode before printing it, as follows:

For a source file saved as GBK:

# coding: gbk
a = "Back end Development Cabin."     # byte string, GBK-encoded
aa = a.decode("gbk")                  # unicode
print aa

For a source file saved as UTF-8:

# coding: utf-8
a = "Back end Development Cabin."     # byte string, UTF-8-encoded
aa = a.decode("utf-8")                # unicode
print aa
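
The mismatch described in this section can be reproduced without a terminal. In this sketch (sample string chosen for illustration), decoding GBK bytes as UTF-8 stands in for a UTF-8 terminal receiving GBK output; the "replace" error handler substitutes U+FFFD for undecodable bytes instead of raising an exception.

```python
# coding: utf-8
text = u"后端技术小屋"
gbk_bytes = text.encode("gbk")

# Wrong: treat GBK bytes as UTF-8, as a UTF-8 terminal would.
garbled = gbk_bytes.decode("utf-8", "replace")
# Right: decode with the encoding the bytes were actually written in.
restored = gbk_bytes.decode("gbk")

print(garbled == text)    # False
print(restored == text)   # True
```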

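6 What encoding is used when the standard output of a py script is redirected to a file

Byte strings are written to the file unchanged, so the file ends up with the same bytes that would have gone to the terminal. For unicode objects, note that Python 2 sets sys.stdout.encoding to None when output is redirected, so printing non-ASCII text falls back to ASCII and raises UnicodeEncodeError unless an encoding is supplied, for example through the PYTHONIOENCODING environment variable.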
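
To illustrate how the encoding of a text stream decides which bytes land in a file, the sketch below wraps an in-memory buffer in io.TextIOWrapper. The buffer stands in for the redirected file and the wrapper for standard output; the helper name bytes_written is hypothetical.

```python
# coding: utf-8
import io


def bytes_written(text, encoding):
    """Write text through a stream with the given encoding and return
    the bytes that reach the underlying file object."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # keep the wrapper from closing the buffer on cleanup
    return data


sample = u"后端"
print(len(bytes_written(sample, "utf-8")))  # 6
print(len(bytes_written(sample, "gbk")))    # 4
```

The same two characters produce different file contents depending only on the stream encoding, which is why a redirected file can look garbled when it is later read with a different encoding.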

7 Why a file looks fine in vim but shows garbled characters with cat

The file encoding differs from the standard output (terminal) encoding. Vim detects the file encoding and converts it for display, so the file opens without garbled characters; cat writes the raw bytes to the terminal, which decodes them with its own encoding, so garbled characters appear. The fix is to make the file encoding and the standard output encoding match.

Solution 1: Run iconv -f XXX -t YYY file to convert the file from its encoding XXX to the standard output encoding YYY.

Solution 2: Change the standard output encoding (see question 3) so that it matches the file encoding.
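
When iconv is not available, the conversion in Solution 1 can be approximated in Python. This is a minimal sketch, not a replacement for iconv: the function name convert_file and the file names in the demo are hypothetical, and the whole file is read into memory.

```python
import io
import os
import tempfile


def convert_file(src_path, dst_path, from_encoding, to_encoding):
    """Re-encode a text file, similar in spirit to iconv -f ... -t ..."""
    # newline="" keeps the original line endings untouched.
    with io.open(src_path, "r", encoding=from_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding=to_encoding, newline="") as dst:
        dst.write(text)


# Demo with hypothetical file names in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "gbk.txt")
dst = os.path.join(workdir, "utf8.txt")
with io.open(src, "w", encoding="gbk", newline="") as f:
    f.write(u"\u540e\u7aef\n")   # two Chinese characters, stored as GBK
convert_file(src, dst, "gbk", "utf-8")
with io.open(dst, "rb") as f:
    print(f.read() == u"\u540e\u7aef\n".encode("utf-8"))  # True
```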

8 Why a file shows garbled characters in vim but looks fine with cat

This means vim did not detect the file's encoding correctly.

Solutions:

  • In vim, enter :set fileencoding to see which encoding vim assumed for the file
  • Edit ~/.vimrc and add the file's actual encoding to the fileencodings list, for example set fileencodings=ucs-bom,utf-8,cp936,gb18030,big5,euc-jp,euc-kr,latin1
  • Reopen the file in vim
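
The idea behind an ordered list such as fileencodings can be sketched in Python: try each candidate in turn and keep the first one that decodes the bytes without error. This is only an approximation of what vim does; the function name first_matching_encoding is hypothetical.

```python
# coding: utf-8
def first_matching_encoding(data, candidates):
    """Return the first candidate encoding that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# latin-1 accepts any byte sequence, so it only makes sense as the last entry.
candidates = ["utf-8", "gbk", "latin-1"]
print(first_matching_encoding(u"后端".encode("gbk"), candidates))    # gbk
print(first_matching_encoding(u"后端".encode("utf-8"), candidates))  # utf-8
```

The order matters: a permissive encoding placed early in the list would match every file and hide the correct one.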

9 Summary

Having said all that, to sum up:

  • In Python development, any string that may contain Chinese characters should be converted to unicode before it is printed. Python then encodes the unicode text automatically to the encoding of the output medium.
  • Garbled text involving standard output, standard input, files, and vim (or other editors) can be diagnosed and fixed with the matching question above.
