The CSV content used in this article is shown in the following screenshot:

1. Import files

1.1 Importing with different delimiters

1.1.1 Import with comma delimiter

Import CSV files with the read_csv() method, which by default splits the data in the file on commas.

import pandas as pd
df = pd.read_csv(r'c:\Users\admin\Desktop\data.csv')
print(df)

result:

          Region   province      city
0      Northeast   Liaoning    Dalian
1      Northwest    Shaanxi     Xi'an
2    South China  Guangdong  Shenzhen
3    North China    Beijing   Beijing
4  Central China      Hubei     Wuhan
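The default comma behaviour can be tried without touching the disk. The sketch below is only an illustration: it assumes an in-memory string (the hypothetical csv_text) as a stand-in for data.csv, with a few of the rows shown above.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for data.csv (an assumption, not the author's file).
csv_text = (
    "Region,province,city\n"
    "Northeast,Liaoning,Dalian\n"
    "Northwest,Shaanxi,Xi'an\n"
    "South China,Guangdong,Shenzhen\n"
)

# No sep argument is passed: read_csv() splits each line on commas by default.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names taken from the first line
```

The first line of the file is used as the header, so the three remaining lines become the data rows.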

1.1.2 Import other specified delimiters

If the CSV file is separated by some other character instead of commas, use the sep parameter to specify the delimiter. Otherwise the file will not be parsed correctly: the rows are not split into columns, or an error is reported.

For example:

import pandas as pd
df = pd.read_csv(r'c:\Users\admin\Desktop\data.csv', sep=" ")
print(df)

result:

          Region   province      city
0      Northeast   Liaoning    Dalian
1      Northwest    Shaanxi     Xi'an
2    South China  Guangdong  Shenzhen
3    North China    Beijing   Beijing
4  Central China      Hubei     Wuhan
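To see what sep changes, here is a minimal sketch that reads the same text twice. It assumes a semicolon-delimited, in-memory sample (the hypothetical csv_text) rather than the author's file, and the names wrong and right are illustrative only.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited sample (an assumption for illustration).
csv_text = (
    "Region;province;city\n"
    "Northeast;Liaoning;Dalian\n"
    "South China;Guangdong;Shenzhen\n"
)

# Without sep, pandas looks for commas, finds none, and keeps each line as one field.
wrong = pd.read_csv(io.StringIO(csv_text))

# With sep=";", each line is split into three columns.
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.shape)
print(right.shape)
```

In this sample the wrong delimiter does not raise an error; it silently produces a single column, which is why checking the shape after import is worthwhile.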

1.2 Importing part of the data

When the file is large, you can import only the first few rows of data.

import pandas as pd
df = pd.read_csv(r'c:\Users\admin\Desktop\中文\表.csv', nrows=1)
print(df)

The result:

      Region  province    city
0  Northeast  Liaoning  Dalian
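A short sketch of nrows on an in-memory sample follows; the data and the name df_head are assumptions made for illustration. It shows that nrows counts data rows and does not include the header line.

```python
import io
import pandas as pd

# Hypothetical sample with one header line and four data rows.
csv_text = (
    "Region,province,city\n"
    "Northeast,Liaoning,Dalian\n"
    "Northwest,Shaanxi,Xi'an\n"
    "South China,Guangdong,Shenzhen\n"
    "North China,Beijing,Beijing\n"
)

# nrows=2 reads the header plus the first two data rows only.
df_head = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(len(df_head))
print(df_head)
```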

2. File encoding

2.1 UTF-8 encoding

If the CSV file is saved as CSV UTF-8 (comma-delimited), you can specify the encoding parameter when importing.

df = pd.read_csv(r'c:\Users\admin\Desktop\data.csv', encoding='utf-8')
print(df)

The result:

          Region   province      city
0      Northeast   Liaoning    Dalian
1      Northwest    Shaanxi     Xi'an
2    South China  Guangdong  Shenzhen
3    North China    Beijing   Beijing
4  Central China      Hubei     Wuhan

You can also omit the encoding argument, since read_csv() uses UTF-8 as its default encoding. In this case, the result is the same as that of the CSV import in section 1.

df = pd.read_csv(r'c:\Users\admin\Desktop\data.csv')
print(df)

result:

          Region   province      city
0      Northeast   Liaoning    Dalian
1      Northwest    Shaanxi     Xi'an
2    South China  Guangdong  Shenzhen
3    North China    Beijing   Beijing
4  Central China      Hubei     Wuhan
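As a quick check that passing encoding='utf-8' and omitting it give the same result, the sketch below reads the same UTF-8 bytes both ways. The sample bytes and the names explicit and implicit are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical UTF-8 encoded CSV content held in memory.
raw = (
    "Region,province,city\n"
    "Northeast,Liaoning,Dalian\n"
    "Northwest,Shaanxi,Xi'an\n"
).encode("utf-8")

# Same bytes, read once with an explicit encoding and once with the default.
explicit = pd.read_csv(io.BytesIO(raw), encoding="utf-8")
implicit = pd.read_csv(io.BytesIO(raw))

# DataFrame.equals compares both the values and the column layout.
print(explicit.equals(implicit))
```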

2.2 GBK encoding

If the CSV file is saved in GBK encoding, the encoding parameter must be set to gbk; otherwise, an error will be reported (typically a UnicodeDecodeError).

df = pd.read_csv(r'c:\Users\admin\Desktop\data.csv', encoding='gbk')
print(df)

result:

          Region   province      city
0      Northeast   Liaoning    Dalian
1      Northwest    Shaanxi     Xi'an
2    South China  Guangdong  Shenzhen
3    North China    Beijing   Beijing
4  Central China      Hubei     Wuhan
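The GBK case can be reproduced with a temporary file. This is a sketch under stated assumptions: the file name sample_gbk.csv, the Chinese sample text, and the flag default_failed are all made up for the demonstration. Chinese characters are used on purpose, because plain ASCII text is encoded identically in GBK and UTF-8 and would not trigger the error.

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample: header and one row, written with Chinese characters.
text = "区域,省份,城市\n东北,辽宁,大连\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_gbk.csv")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(text.encode("gbk"))  # save the bytes in GBK encoding

    # Reading with the matching encoding works.
    df_gbk = pd.read_csv(path, encoding="gbk")

    # Reading with the default UTF-8 encoding fails on these bytes.
    try:
        pd.read_csv(path)
        default_failed = False
    except UnicodeDecodeError:
        default_failed = True

print(df_gbk)
print(default_failed)
```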

3. Chinese path problem

If the CSV file path contains Chinese characters, older versions of pandas report an error when reading the file.

There are four solutions:

(1) Change the Chinese path to an English path

(2) Upgrade pandas. Version 1.3.0, which is installed on my computer, supports Chinese paths

(3) Wrap the file path in open()

df = pd.read_csv(open(r'c:\Users\admin\Desktop\中文\data.csv'))
print(df)

(4) Add the engine parameter

df = pd.read_csv(r'…csv', engine='python', encoding='utf-8-sig')
print(df)

By default, read_csv() uses the C parser engine; here you only need to change the engine from C to Python. If the CSV file is encoded in UTF-8, replace utf-8 with utf-8-sig.

Note: I have not tried methods (3) and (4) so far, so I cannot confirm that they work.
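Methods (3) and (4) can be tried on a throwaway file. The sketch below assumes a temporary folder named 中文 and a small UTF-8-with-BOM file, and it passes an explicit encoding to open(), which the original snippet does not do. It only shows that both calls run and agree on a current pandas installation; it does not reproduce the failure on older versions.

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample content.
text = "Region,province,city\nNortheast,Liaoning,Dalian\n"

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "中文")  # folder name containing Chinese characters
    os.makedirs(folder)
    path = os.path.join(folder, "data.csv")

    # utf-8-sig writes a byte order mark (BOM) at the start of the file.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

    # Method (3): open the file yourself and hand the file object to read_csv().
    with open(path, encoding="utf-8-sig", newline="") as f:
        df_open = pd.read_csv(f)

    # Method (4): use the Python parser engine with the utf-8-sig encoding.
    df_python = pd.read_csv(path, engine="python", encoding="utf-8-sig")

print(df_open.equals(df_python))
print(list(df_open.columns))
```

Decoding with utf-8-sig removes the BOM, so the first column name comes out as plain Region in both frames.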

4. Row index and data reading problems

These operations work the same way as for XLSX files. For details, see juejin.cn/post/698366…