External data import

Importing Excel files

2. Read_excel () :

import pandas as pd
excel_file1 = pd.read_excel('data/test. XLSX',encoding='utf-8')Copy the code
Name age job salary0Zhang SAN25students200
1Li si24workers3000
2Wang wei28    NaN      5000
3Wang Ermao22freelance6000Copy the code

Encoding = ‘UTF-8’ and sheet_name specifies the name of the Sheet and the order in which it is passed, starting at 0.

excel_file1 = pd.read_excel('data/test.xlsx',sheet_name = 0)Copy the code

Specify the index

Column indexes start at 0 by default, set with index_col, and set with header for row indexes.

excel_file1 = pd.read_excel('data/test. XLSX',encoding='utf-8', index_col = 0)Copy the code
Age Working seniority name Zhang SAN25students200Li si24workers3000Wang wei28   NaN       5000Wang Ermao22freelance6000Copy the code
excel_file1 = pd.read_excel('data/test. XLSX',encoding='utf-8', the header =1)Copy the code
Zhang SAN25students200
0Li si24workers3000
1Wang wei28   NaN      5000
2Wang Ermao22freelance6000Copy the code

Specify index column

Sometimes there are too many columns in a local file, so you can set userCOLs to specify which columns to import, or you can pass in multiple values in list form to indicate which columns are passed in.

excel_file1 = pd.read_excel('data/test. XLSX',encoding='utf-8',usecols=[0.2])Copy the code
Name of work0Zhang SAN student1Li si workers2Wang wei NaN3Wang Ermao is a freelance workerCopy the code

Commonly used method

  • Shape () retrieves rows and columns of an Excel file and returns them as primitives;

  • Info () gets the data type;

  • Astype () converts the data type inside the column, with the target type to be converted in parentheses; Such as df column [2] astype (‘ float64 ‘); Df [‘ column ‘]. Dtype You can view the column type

  • Isnull () determines which value is missing;

  • Dropna () dropna() dropna() dropna() dropna() dropna() dropna() dropna()

  • Fillna () parentheses can be filled directly with the value to be filled, or column fill can be specified, passing the argument in dictionary form;

  • Drop_duplicates () Default For duplicates. Keep =first specifies the first row value. You can set the value of keep to last and keep the last, or set keep to False and keep none. You can also specify a subset of the column name for deduplication, for example, subset=[‘ Column name 1, column name 2’].

  • The argument passed in head() gets the first few lines;

  • Describe the distribution of numerical values such as mean, maximum, variance, quantile

  • Column and INDEX You can set the column index and row index and send parameters in the form of a list.

  • Set_index () resets the index column, passing in the name to be used as the row index;

  • Reset_index (level = None,drop=False,inplace =False), level specifies the level of the hierarchical index to be converted to columns. The first index is 0 and the second is 1. By default, all columns are converted to columns. Drop Whether to drop the original index, inplace whether to modify the original table; This method is commonly used in data grouping and pivottables.

  • Rename () renames an index. You can rename columns and index to send columns and indexes as dictionaries. The key is the original value and the value is the replaced value.

Importing a CSV file

Pandas import CSV files using read_csv().

import pandas as pd
csv_file1 = pd.read_csv('.\\data\\train-pivot.csv',index_col=0,header=0,nrows = 2) Copy the code

Sep sets the separator, and encoding specifies the encoding format. Import CSV file must be specified as GBK, otherwise error will be reported, if a large file you just look at the first few lines, through nRows Settings.

import pandas as pd
csv_file1 = pd.read_csv('data/train-pivot.csv',encoding='gbk',nrows=2)
print(csv_file1)Copy the code

User ID Indicates whether the customer category region is the provincial capital7In sales8In sales0  59224Class A first-tier cities are6    20     0
1  55295Class B third-tier cities no37    27    35Copy the code

Usercols can be set to specify imported columns.

User ID area0  59224First-tier cities1  55295Third line cities2  46035Second-tier cities3   2459First-tier cities4  22179Third line citiesCopy the code

Import SQL

Pandas has the read_sql() method:

import pandas as pd
import pymysql
Create a connection
conn = pymysql.connect(host = 'localhost',user = 'python',
                       password = 'passwd',db = 'test',
                       charset = 'utf-8'
                       )
User: user name password: password host: database address/localhost db: database name charset: encoding, usually UTF-8"
sql = "SELECT * FROM user" Write the SQL statement to execute
pd.read_sql(sql,conn)Copy the code

Added: Data export

Df.to_excel () exports excel files;

Df.to_csv () exports the CSV file.