The table used in this article is test.xlsx; it holds each person's name, age, and score.
Let's look at the original data first:
    import pandas as pd

    # Load the sample workbook into a DataFrame
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    print(df)

Result:

            Name    Age  Score
    0  Xiao Ming   23.0     78
    1  Xiao Gang    NaN     89
    2  Xiao Hong  876.0     65
    3     Li Hua   65.0     89
    4   Xiao Mei    NaN     43
    5  Zhang San   34.0     90
    6      Li Si    NaN     34
    7    Wang Wu   98.5     87
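For readers who do not have the workbook at hand, the same table can be rebuilt in memory. This is only a sketch: the values are copied from the printed output above, and the column names Name, Age, and Score are assumed to match the headers in the file.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table printed above;
# the values come from the printed output, not from the original file.
df = pd.DataFrame({
    "Name": ["Xiao Ming", "Xiao Gang", "Xiao Hong", "Li Hua",
             "Xiao Mei", "Zhang San", "Li Si", "Wang Wu"],
    "Age": [23.0, np.nan, 876.0, 65.0, np.nan, 34.0, np.nan, 98.5],
    "Score": [78, 89, 65, 89, 43, 90, 34, 87],
})
print(df)
```

Any of the drop() calls in the rest of the article can then be tried on this df instead of the one returned by read_excel.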
1. Delete columns
Columns are deleted mainly with the drop() method: pass the names or positions of the columns to delete inside the parentheses of drop().
1.1 Pass column names directly
Add the axis parameter with a value of 1 to indicate that columns are being deleted.
1.1.1 Deleting a Single Column
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # Drop the Score column by name; axis=1 means "columns"
    print(df.drop("Score", axis=1))

Result:

            Name    Age
    0  Xiao Ming   23.0
    1  Xiao Gang    NaN
    2  Xiao Hong  876.0
    3     Li Hua   65.0
    4   Xiao Mei    NaN
    5  Zhang San   34.0
    6      Li Si    NaN
    7    Wang Wu   98.5

As the result shows, the Score column has been removed.
1.1.2 Deleting Multiple Columns
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # Drop several columns at once by passing a list of names
    print(df.drop(["Score", "Age"], axis=1))

Result:

            Name
    0  Xiao Ming
    1  Xiao Gang
    2  Xiao Hong
    3     Li Hua
    4   Xiao Mei
    5  Zhang San
    6      Li Si
    7    Wang Wu

As the result shows, the Score and Age columns have been removed.
1.2 Pass the column position
Here too, add axis=1 to indicate that columns are being deleted. The position is used to look up the column name in df.columns, and that name is what drop() receives.
1.2.1 Deleting a Single Column
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # df.columns[2] is the name of the column at position 2
    print(df.drop(df.columns[2], axis=1))

Result:

            Name    Age
    0  Xiao Ming   23.0
    1  Xiao Gang    NaN
    2  Xiao Hong  876.0
    3     Li Hua   65.0
    4   Xiao Mei    NaN
    5  Zhang San   34.0
    6      Li Si    NaN
    7    Wang Wu   98.5

The third column (position 2, counting from 0) is dropped here.
1.2.2 Deleting Multiple Columns

    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # df.columns[[1, 2]] holds the names of the columns at positions 1 and 2
    print(df.drop(df.columns[[1, 2]], axis=1))

Result:

            Name
    0  Xiao Ming
    1  Xiao Gang
    2  Xiao Hong
    3     Li Hua
    4   Xiao Mei
    5  Zhang San
    6      Li Si
    7    Wang Wu

The second and third columns are dropped. Since the table has only three columns, just one column remains.
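It may help to see what indexing df.columns by position actually returns. The sketch below uses a small hand-built DataFrame (a hypothetical stand-in with the same three column names, not the original file) and shows that a position is simply translated into a column label before drop() is called.

```python
import pandas as pd

# Hypothetical stand-in for the article's table; only the column layout matters here.
df = pd.DataFrame({
    "Name": ["Xiao Ming", "Xiao Gang"],
    "Age": [23.0, None],
    "Score": [78, 89],
})

# A single position yields a single column label.
single = df.columns[2]

# A list of positions yields an Index holding the matching labels.
several = list(df.columns[[1, 2]])

# Dropping "by position" therefore passes these labels on to drop().
remaining = list(df.drop(df.columns[[1, 2]], axis=1).columns)

print(single)
print(several)
print(remaining)
```

One consequence of this design is that positions are resolved before drop() runs, so drop() itself never sees a number, only the label found at that position.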
1.3 Use the columns parameter
With this approach, pass the column names to the columns parameter (a single name, or a list of names); the axis parameter is not needed.
1.3.1 Deleting a Single Column
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # The columns keyword replaces the name-plus-axis=1 form
    print(df.drop(columns='Score'))

Result:

            Name    Age
    0  Xiao Ming   23.0
    1  Xiao Gang    NaN
    2  Xiao Hong  876.0
    3     Li Hua   65.0
    4   Xiao Mei    NaN
    5  Zhang San   34.0
    6      Li Si    NaN
    7    Wang Wu   98.5

The Score column is deleted here as well.
1.3.2 Deleting Multiple Columns
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # Pass a list of names to drop several columns
    print(df.drop(columns=['Score', 'Age']))

Result:

            Name
    0  Xiao Ming
    1  Xiao Gang
    2  Xiao Hong
    3     Li Hua
    4   Xiao Mei
    5  Zhang San
    6      Li Si
    7    Wang Wu

Here, too, the Score and Age columns are deleted.
Comparing the three approaches, the common point is that multiple columns are always specified as a list: a list of column names, or a list of positions placed inside the indexing brackets of df.columns.
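The three approaches can be checked against each other. The following is a minimal sketch that assumes a hand-built copy of the table (shortened to three rows, with the column names Name, Age, and Score) instead of the original workbook. It also shows that drop() returns a new DataFrame by default, which is why every example in this article prints the return value instead of df itself.

```python
import pandas as pd

# Hypothetical, shortened copy of the article's table.
df = pd.DataFrame({
    "Name": ["Xiao Ming", "Xiao Gang", "Xiao Hong"],
    "Age": [23.0, None, 876.0],
    "Score": [78, 89, 65],
})

by_name = df.drop(["Age", "Score"], axis=1)          # 1.1: names plus axis=1
by_position = df.drop(df.columns[[1, 2]], axis=1)    # 1.2: positions looked up in df.columns
by_keyword = df.drop(columns=["Age", "Score"])       # 1.3: columns= keyword, no axis

# All three calls produce the same DataFrame.
all_equal = by_name.equals(by_position) and by_position.equals(by_keyword)

# drop() did not modify df; it still has all three columns.
original_columns = list(df.columns)

print(all_equal)
print(original_columns)
```

To keep the result, assign it to a variable, for example df = df.drop(columns=["Age", "Score"]).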
2. Delete rows
To make the comparison easier, first replace the default row index of the original data with custom labels:

    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # Replace the default 0-7 index with the labels A-H
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    print(df)

Result:

            Name    Age  Score
    A  Xiao Ming   23.0     78
    B  Xiao Gang    NaN     89
    C  Xiao Hong  876.0     65
    D     Li Hua   65.0     89
    E   Xiao Mei    NaN     43
    F  Zhang San   34.0     90
    G      Li Si    NaN     34
    H    Wang Wu   98.5     87
Rows are also deleted with the drop() method: pass the labels or positions of the rows to delete inside the parentheses of drop().
2.1 Pass row labels directly
Add the axis parameter with a value of 0 to indicate that rows are being deleted (0 is also the default value of axis).
2.1.1 Deleting a Single Row
To delete a single row, simply pass its row label.
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # Drop the row labelled A; axis=0 means "rows"
    print(df.drop('A', axis=0))

Result:

            Name    Age  Score
    B  Xiao Gang    NaN     89
    C  Xiao Hong  876.0     65
    D     Li Hua   65.0     89
    E   Xiao Mei    NaN     43
    F  Zhang San   34.0     90
    G      Li Si    NaN     34
    H    Wang Wu   98.5     87

As you can see, Xiao Ming's row (label A) has been deleted. The remaining rows move up, so Xiao Gang's row is now the first row.
2.1.2 Deleting Multiple Rows
To delete multiple rows, pass the row labels as a list.

    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # Drop the rows labelled B and D
    print(df.drop(['B', 'D'], axis=0))

Result:

            Name    Age  Score
    A  Xiao Ming   23.0     78
    C  Xiao Hong  876.0     65
    E   Xiao Mei    NaN     43
    F  Zhang San   34.0     90
    G      Li Si    NaN     34
    H    Wang Wu   98.5     87

Xiao Gang's row (label B) and Li Hua's row (label D) have been deleted.
2.2 Pass row positions
Here too, add axis=0 to indicate that rows are being deleted.
2.2.1 Deleting a Single Row
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # df.index[2] is the label of the row at position 2
    print(df.drop(df.index[2], axis=0))

Result:

            Name   Age  Score
    A  Xiao Ming  23.0     78
    B  Xiao Gang   NaN     89
    D     Li Hua  65.0     89
    E   Xiao Mei   NaN     43
    F  Zhang San  34.0     90
    G      Li Si   NaN     34
    H    Wang Wu  98.5     87

The third row (position 2, counting from 0) is dropped here.
2.2.2 Deleting Multiple Rows

    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # df.index[[1, 2]] holds the labels of the rows at positions 1 and 2
    print(df.drop(df.index[[1, 2]], axis=0))

Result:

            Name   Age  Score
    A  Xiao Ming  23.0     78
    D     Li Hua  65.0     89
    E   Xiao Mei   NaN     43
    F  Zhang San  34.0     90
    G      Li Si   NaN     34
    H    Wang Wu  98.5     87

The second and third rows are dropped here.
2.3 Use the index parameter
With this approach, pass the row labels to the index parameter (a single label, or a list of labels); the axis parameter is not needed.
2.3.1 Deleting a Single Row
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # The index keyword replaces the label-plus-axis=0 form
    print(df.drop(index='A'))

Result:

            Name    Age  Score
    B  Xiao Gang    NaN     89
    C  Xiao Hong  876.0     65
    D     Li Hua   65.0     89
    E   Xiao Mei    NaN     43
    F  Zhang San   34.0     90
    G      Li Si    NaN     34
    H    Wang Wu   98.5     87

Xiao Ming's row is deleted here as well.
2.3.2 Deleting Multiple Rows

    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    df.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # Pass a list of labels to drop several rows
    print(df.drop(index=['B', 'D']))

Result:

            Name    Age  Score
    A  Xiao Ming   23.0     78
    C  Xiao Hong  876.0     65
    E   Xiao Mei    NaN     43
    F  Zhang San   34.0     90
    G      Li Si    NaN     34
    H    Wang Wu   98.5     87

Here, too, the rows for Xiao Gang and Li Hua are deleted.
Comparing the three approaches, the common point is that multiple rows are always specified as a list: a list of row labels, or a list of positions placed inside the indexing brackets of df.index.
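As with columns, the three row-deletion forms can be compared directly. This is a sketch under the assumption of a small hand-built table with labels A to D (a shortened, hypothetical version of the article's data), not the original workbook.

```python
import pandas as pd

# Hypothetical, shortened copy of the article's table with custom row labels.
df = pd.DataFrame(
    {
        "Name": ["Xiao Ming", "Xiao Gang", "Xiao Hong", "Li Hua"],
        "Score": [78, 89, 65, 89],
    },
    index=["A", "B", "C", "D"],
)

by_label = df.drop(["B", "C"], axis=0)            # 2.1: labels plus axis=0
by_position = df.drop(df.index[[1, 2]], axis=0)   # 2.2: positions 1 and 2 looked up in df.index
by_keyword = df.drop(index=["B", "C"])            # 2.3: index= keyword, no axis

# All three calls keep the same rows.
all_equal = by_label.equals(by_position) and by_position.equals(by_keyword)
remaining_labels = list(by_label.index)

print(all_equal)
print(remaining_labels)
```

The remaining rows keep their original labels; drop() does not renumber or relabel the index.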
3. Delete specific rows
Deleting specific rows usually means deleting the rows that meet some condition. In the earlier article in this pandas series on type conversion and outlier handling, deleting an outlier was one case of deleting specific rows.
In practice, the rows that meet the deletion condition are usually not dropped directly. Instead, the rows that do not meet it are selected and used as the new data, which filters out the rows to be deleted.
For example, to delete the records with failing scores (below 60), keep only the rows whose score is at least 60:
    df = pd.read_excel(r'C:\Users\admin\Desktop\test.xlsx')
    # Keep only the rows whose Score is 60 or higher
    print(df[df['Score'] >= 60])

Result:

            Name    Age  Score
    0  Xiao Ming   23.0     78
    1  Xiao Gang    NaN     89
    2  Xiao Hong  876.0     65
    3     Li Hua   65.0     89
    5  Zhang San   34.0     90
    7    Wang Wu   98.5     87
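The filtering approach can also be written with drop(), by first collecting the labels of the failing rows. The sketch below assumes a hand-built copy of the Name and Score columns (taken from the printed output, not read from the file) and compares the two routes.

```python
import pandas as pd

# Hypothetical in-memory copy of the Name and Score columns.
df = pd.DataFrame({
    "Name": ["Xiao Ming", "Xiao Gang", "Xiao Hong", "Li Hua",
             "Xiao Mei", "Zhang San", "Li Si", "Wang Wu"],
    "Score": [78, 89, 65, 89, 43, 90, 34, 87],
})

# Route used in the article: keep the rows that pass.
kept = df[df["Score"] >= 60]

# Alternative route: find the labels of the failing rows, then drop them.
failing_labels = df[df["Score"] < 60].index
dropped = df.drop(index=failing_labels)

same_result = kept.equals(dropped)
kept_labels = list(kept.index)

print(same_result)
print(kept_labels)
```

The two routes agree here because every Score is present. If Score contained missing values they would differ: NaN fails both comparisons, so the filter would discard those rows while the drop() route would keep them.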