Removing one or more columns from a pandas DataFrame is a fairly common task, but it turns out there are many ways to do it. I find that this StackOverflow question, and the solutions and discussion within it, raise a lot of interesting topics. It’s worth digging into the details.
First, what is the “correct” way to delete a column from a DataFrame? The standard approach is to think in SQL and use drop.
import pandas as pd
import numpy as np

# 5x5 frame holding the integers 0..24, with columns a through e
df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))
display(df)

try:
    df.drop('b')
except KeyError as ke:
    print(ke)
a b c d e
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
4 20 21 22 23 24
"['b'] not found in axis"
Wait, what? Why did that fail? Because the default axis that drop works on is rows, and there is no row labeled 'b'. As with many pandas methods, there is more than one way to call this one (which some people find frustrating).
You can remove rows using axis=0 or axis='rows', or use the labels argument.
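To see why the lookup fails, it can help to check where the label actually lives. The following is a minimal sketch on the same 5x5 frame. The `errors='ignore'` argument of `drop` suppresses the KeyError, but nothing is removed here, because `'b'` is still looked up among the row labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# 'b' is a column label, not a row label, so the default axis=0 lookup fails
in_rows = 'b' in df.index
in_columns = 'b' in df.columns
print(in_rows, in_columns)

# errors='ignore' silences the KeyError, but the frame comes back unchanged
unchanged = df.drop('b', errors='ignore')
print(unchanged.equals(df))
```

Silencing the error this way is rarely what you want; it is shown only to make the default axis visible.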
df.drop(0) # drop a row, on axis 0 or 'rows'
df.drop(0, axis=0) # same
df.drop(0, axis='rows') # same
df.drop(labels=0) # same
df.drop(labels=[0]) # same
a b c d e
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
4 20 21 22 23 24
Again, how do we delete a column?
To delete a column instead of a row, you can specify axis, or use the columns argument.
df.drop('b', axis=1) # drop a column
df.drop('b', axis='columns') # same
df.drop(columns='b') # same
df.drop(columns=['b']) # same
a c d e
0 0 2 3 4
1 5 7 8 9
2 10 12 13 14
3 15 17 18 19
4 20 22 23 24
OK, so that’s how you delete a column. Note that drop returns a new DataFrame and leaves the original untouched. To keep the change, you must assign the result to a new variable, assign it back to your old variable, or pass inplace=True to modify the original.
df2 = df.drop('b', axis=1)
print(df2.columns)
print(df.columns)
Index(['a', 'c', 'd', 'e'], dtype='object')
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
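The other two options, assigning back to the same variable and passing inplace=True, can be sketched the same way. This is only an illustration on a fresh copy of the frame. Note that with inplace=True, drop returns None rather than a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# Option 1: assign the result back to the same name
df = df.drop(columns='b')

# Option 2: modify the existing object; drop returns None in this case
result = df.drop(columns='c', inplace=True)

print(result)
print(list(df.columns))
```

Because the in-place call returns None, writing `df = df.drop(..., inplace=True)` would replace the frame with None.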
It is worth noting that by using both the index and columns arguments, you can use drop to remove both rows and columns at once, and you can pass in multiple values.
df.drop(index=[0, 2], columns=['b', 'c'])
a d e
1 5 8 9
3 15 18 19
4 20 23 24
If you didn’t have the drop method, you could get basically the same result by indexing. There are many ways to accomplish this, but one equivalent solution is to use the .loc indexer with isin, inverting the selection.
df.loc[~df.index.isin([0, 2]), ~df.columns.isin(['b', 'c'])]
a d e
1 5 8 9
3 15 18 19
4 20 23 24
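As a quick check that the two approaches agree, here is a small sketch that builds both results and compares them. The variable names (`keep_rows`, `keep_cols`, and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

dropped = df.drop(index=[0, 2], columns=['b', 'c'])

# Boolean masks: True for the labels we want to keep
keep_rows = ~df.index.isin([0, 2])
keep_cols = ~df.columns.isin(['b', 'c'])
selected = df.loc[keep_rows, keep_cols]

# Compare the labels and values of the two results
print(dropped.equals(selected))
```

The masks make the inversion explicit: isin marks the labels to remove, and ~ flips that into the labels to keep.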
If none of this makes sense to you, I recommend you read my series on selection and indexing in Pandas, starting here.
Back to the question
Returning to the original question, we see that there is another technique available to delete a column.
del df['a']
df
b c d e
0 1 2 3 4
1 6 7 8 9
2 11 12 13 14
3 16 17 18 19
4 21 22 23 24
Poof! It just disappears. This is like using inplace=True to do the delete.
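One way to see that the deletion happens in place is to hold a second reference to the same DataFrame, as in the sketch below. It also shows `pop`, a related DataFrame method that is not discussed in this post; it removes a column in place and returns it as a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))
alias = df  # a second name for the same object, not a copy

del df['a']
print(list(alias.columns))  # the column is gone under both names

# pop also removes in place, but hands the removed column back as a Series
c = df.pop('c')
print(c.name, list(df.columns))
```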
What about attribute access?
We also know that we can use attribute access to select DataFrame columns.
df.b
0 1
1 6
2 11
3 16
4 21
Name: b, dtype: int64
Can we delete columns in this way?
del df.b
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-0dca358a6ef9> in <module>
----> 1 del df.b
AttributeError: b
We can’t. In the current pandas design, attribute access is not an option for deleting columns. Is it technically impossible? Why does del df['b'] work, but not del df.b? Let’s dig into the details and see if it’s possible to make the second version work as well.
The first version works because in pandas, the DataFrame implements the __delitem__ method, which is called when you execute del df['b']. But what about del df.b? Is there a way to handle that?
First, let’s make a simple class to show how this works under the hood. Instead of making a real DataFrame, we’ll just use a dict as a container for our columns (which can really contain anything; we don’t do any indexing here).
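To make the connection concrete, the sketch below calls the hook directly. It is only an illustration of what the del statement does for item deletion, not something you would write in practice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# `del df['b']` is translated by Python into this call on the object's type
type(df).__delitem__(df, 'b')

print(list(df.columns))
```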
class StupidFrame:
    def __init__(self, columns):
        self.columns = columns

    def __delitem__(self, item):
        del self.columns[item]

    def __getitem__(self, item):
        return self.columns[item]

    def __setitem__(self, item, val):
        self.columns[item] = val

f = StupidFrame({'a': 1, 'b': 2, 'c': 3})
print("StupidFrame value for a:", f['a'])
print("StupidFrame columns: ", f.columns)
del f['b']
f.d = 4
print("StupidFrame columns: ", f.columns)
StupidFrame value for a: 1
StupidFrame columns: {'a': 1, 'b': 2, 'c': 3}
StupidFrame columns: {'a': 1, 'c': 3}
There are a few things to note here. First, we can use the index operator ([]) to access the data in our StupidFrame and use it to set, get, and delete items. Second, when we assign d to our frame, it is not added to our columns, because it is just a normal instance attribute. If we want to be able to treat columns as attributes, we have to do more.
So, following the pandas example (which supports attribute access to columns), we add the __getattr__ method, but we’ll also handle assignment with the __setattr__ method and pretend that any attribute assignment is a “column.” We must update our instance dictionary (__dict__) directly to avoid infinite recursion.
class StupidFrameAttr:
    def __init__(self, columns):
        self.__dict__['columns'] = columns

    def __delitem__(self, item):
        del self.__dict__['columns'][item]

    def __getitem__(self, item):
        return self.__dict__['columns'][item]

    def __setitem__(self, item, val):
        self.__dict__['columns'][item] = val

    def __getattr__(self, item):
        if item in self.__dict__['columns']:
            return self.__dict__['columns'][item]
        elif item == 'columns':
            return self.__dict__[item]
        else:
            raise AttributeError

    def __setattr__(self, item, val):
        if item != 'columns':
            self.__dict__['columns'][item] = val
        else:
            raise ValueError("Overwriting columns prohibited")

f = StupidFrameAttr({'a': 1, 'b': 2, 'c': 3})
print("StupidFrameAttr value for a", f['a'])
print("StupidFrameAttr columns: ", f.columns)
del f['b']
print("StupidFrameAttr columns: ", f.columns)
print("StupidFrameAttr value for a", f.a)
f.d = 4
print("StupidFrameAttr columns: ", f.columns)
del f['d']
print("StupidFrameAttr columns: ", f.columns)
f.d = 5
print("StupidFrameAttr columns: ", f.columns)
del f.d
StupidFrameAttr value for a 1
StupidFrameAttr columns: {'a': 1, 'b': 2, 'c': 3}
StupidFrameAttr columns: {'a': 1, 'c': 3}
StupidFrameAttr value for a 1
StupidFrameAttr columns: {'a': 1, 'c': 3, 'd': 4}
StupidFrameAttr columns: {'a': 1, 'c': 3}
StupidFrameAttr columns: {'a': 1, 'c': 3, 'd': 5}
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-fd29f59ea01e> in <module>
39 f.d = 5
40 print("StupidFrameAttr columns: ", f.columns)
---> 41 del f.d
AttributeError: d
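The infinite recursion mentioned earlier is easy to trigger. The sketch below uses a hypothetical class, RecursiveFrame, that is not part of the example above. Its __getattr__ reads self.columns through normal attribute access, so when columns is missing from the instance dictionary, the lookup calls __getattr__ again and never terminates on its own.

```python
class RecursiveFrame:
    """Hypothetical counter-example: a __getattr__ that calls itself."""

    def __init__(self):
        # Deliberately leave 'columns' out of the instance dictionary
        pass

    def __getattr__(self, item):
        # self.columns is not found by normal lookup, so Python calls
        # __getattr__('columns'), which evaluates self.columns again, ...
        if item in self.columns:
            return self.columns[item]
        raise AttributeError(item)


try:
    RecursiveFrame().a
    hit_recursion = False
except RecursionError:
    hit_recursion = True

print(hit_recursion)
```

Reading and writing through self.__dict__ directly, as StupidFrameAttr does, sidesteps the problem because it never re-enters the attribute hooks.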
How can we handle deletions?
Everything works except deletion using attribute access. We handle setting and retrieving columns with both the index operator ([]) and attribute access. But how do we detect deletions? Is that possible?
One way is to use the __delattr__ method, which is described in the data model documentation. If you define this method in your class, it will be called instead of the default behavior of removing the name from the instance’s attribute dictionary. This gives us an opportunity to redirect the deletion to our columns container.
class StupidFrameDelAttr(StupidFrameAttr):
    def __delattr__(self, item):
        # trivial implementation using the data model methods
        del self.__dict__['columns'][item]

f = StupidFrameDelAttr({'a': 1, 'b': 2, 'c': 3})
print("StupidFrameDelAttr value for a", f['a'])
print("StupidFrameDelAttr columns: ", f.columns)
del f['b']
print("StupidFrameDelAttr columns: ", f.columns)
print("StupidFrameDelAttr value for a", f.a)
f.d = 4
print("StupidFrameDelAttr columns: ", f.columns)
del f.d
print("StupidFrameDelAttr columns: ", f.columns)
StupidFrameDelAttr value for a 1
StupidFrameDelAttr columns: {'a': 1, 'b': 2, 'c': 3}
StupidFrameDelAttr columns: {'a': 1, 'c': 3}
StupidFrameDelAttr value for a 1
StupidFrameDelAttr columns: {'a': 1, 'c': 3, 'd': 4}
StupidFrameDelAttr columns: {'a': 1, 'c': 3}
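The trivial __delattr__ above raises a KeyError when the name is not a column. A slightly more careful version might fall back to the default behavior for unknown names, so that a failed del raises AttributeError as callers would normally expect. The class below, SaferFrame, is a hypothetical standalone sketch of that idea and not how pandas works.

```python
class SaferFrame:
    """Hypothetical variant of the toy frame with a stricter __delattr__."""

    def __init__(self, columns):
        # Write to __dict__ directly so __setattr__ is not triggered
        self.__dict__['columns'] = columns

    def __getattr__(self, item):
        # Only called when normal lookup fails, so 'columns' never lands here
        try:
            return self.__dict__['columns'][item]
        except KeyError:
            raise AttributeError(item) from None

    def __setattr__(self, item, val):
        if item == 'columns':
            raise ValueError("Overwriting columns prohibited")
        self.__dict__['columns'][item] = val

    def __delattr__(self, item):
        columns = self.__dict__['columns']
        if item in columns:
            del columns[item]
        elif item == 'columns':
            # Protect the container itself from being deleted
            raise AttributeError("Deleting columns prohibited")
        else:
            # Default behavior: raises AttributeError for unknown names
            object.__delattr__(self, item)


f = SaferFrame({'a': 1, 'b': 2})
del f.b
print(f.columns)

try:
    del f.missing
except AttributeError as err:
    print("AttributeError:", err)
```

Checking the column dict first means that a column named like a regular attribute would still win, which is one of the ambiguities that makes attribute-style column access awkward in general.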
Now, I’m not saying that column deletion by attribute would be easy to add to pandas, but at least this shows that it can be done. In current pandas, it is best to delete columns using drop.
It is also worth noting that when you create a new column in pandas, you do not assign it as an attribute. To better understand how to create a column properly, you can check out this article.
Even if you already knew how to delete a column in pandas, I hope you now understand a little more about how it works.
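The sketch below illustrates the point. Assigning a list to a new attribute name sets a plain Python attribute on the object and leaves the columns untouched. pandas may emit a warning in this situation, which is silenced here to keep the output clean. The name new_col is just an example.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

with warnings.catch_warnings():
    # pandas may warn that columns cannot be created via attribute assignment
    warnings.simplefilter("ignore")
    df.new_col = [0, 1, 2, 3, 4]  # a plain attribute, not a column

after_attribute = 'new_col' in df.columns
print(after_attribute)

# Item assignment is what actually adds a column
df['new_col'] = [0, 1, 2, 3, 4]
after_item = 'new_col' in df.columns
print(after_item)
```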
The post How to remove a column from a DataFrame, with some extra detail appeared first on wrighters.io.