“This is my fourth day of participating in the First Challenge 2022. For more details: First Challenge 2022.”
Hello, I’m Ding Xiaojie. Pandas retrieves the specified data conditionally for Pandas, but if you haven’t used Query yet, you’ll be impressed by its simplicity.
Regular use
Create a DataFrame.
import pandas as pd
df = pd.DataFrame(
{'A': ['e'.'d'.'c'.'b'.'a'].'B': ['f'.'b'.'c'.'d'.'e'].'C': range(0.10.2),
'D': range(10.0, -2),
'E.E': range(10.5, -1)})
Copy the code
We now select all the rows where the letters in column A appear in column B. Let’s start with two common ways of writing it.
>>> df[df['A'].isin(df['B'])]
A B C D E.E
0 e f 0 10 10
1 d b 2 8 9
2 c c 4 6 8
3 b d 6 4 7
>>> df.loc[df['A'].isin(df['B'])]
A B C D E.E
0 e f 0 10 10
1 d b 2 8 9
2 c c 4 6 8
3 b d 6 4 7
Copy the code
Let’s use query() to do this.
>>> df.query("A in B")
A B C D E.E
0 e f 0 10 10
1 d b 2 8 9
2 c c 4 6 8
3 b d 6 4 7
Copy the code
As you can see, the code using Query is clean and easy to read, and it consumes less memory.
Multi-condition query
Select all rows in which column A letters appear in column B and column C is less than column D.
>>> df.query('A in B and C < D')
A B C D E.E
0 e f 0 10 10
1 d b 2 8 9
2 c c 4 6 8
Copy the code
And can also be represented by ampersand.
Reference variables
Externally defined variables can also be used in expressions, preceded by an @ sign.
>>> number = 5
>>> df.query('A in B & C > @number')
A B C D E.E
3 b d 6 4 7
Copy the code
Index selection
Select all rows with A column letter appearing in column B and index greater than 2.
>>> df.query('A in B and index > 2')
A B C D E.E
3 b d 6 4 7
Copy the code
Multi-index Select a DataFrame that creates a two-level index.
>>> import numpy as np
>>> colors = ['yellow'] *3 + ['red'] *2
>>> rank = [str(i) for i in range(5)]
>>> index = pd.MultiIndex.from_arrays([colors, rank], names=['color'.'rank'])
>>> df = pd.DataFrame(np.arange(10).reshape(5.2),columns=['A'.'B'] , index=index)
>>> df = pd.DataFrame(np.arange(10).reshape(5.2),columns=['A'.'B'] , index=index)
>>> df
A B
color rank
yellow 0 0 1
1 2 3
2 4 5
red 3 6 7
4 8 9
Copy the code
1. If a multi-level index has a name, select the index name.
>>> df.query("color == 'red'")
A B
color rank
red 3 6 7
4 8 9
Copy the code
2. If multiple indexes have nameless names, select them by index level.
>>> df.index.names = [None.None]
>>> df.query("ilevel_0 == 'red'")
A B
red 3 6 7
4 8 9
>>> df.query("ilevel_1 == '4'")
A B
red 4 8 9
Copy the code
Special characters
For column names with Spaces or other special characters such as operators, use backquotes.
>>> df.query('A == B | (C + 2 > `E.E`)')
A B C D E.E
2 c c 4 6 8
3 b d 6 4 7
4 a e 8 2 6
Copy the code
In general, query() is relatively easy to use, quick to learn, and much more readable.
For those who are new to Python or want to get started with Python, you can follow the public account “Python New Horizons” to communicate and learn Python together. They are all beginners. Sometimes a simple question is stuck for a long time, but others may suddenly realize it with a little help. There are also nearly 1,000 resume templates and hundreds of e-books waiting for you to collect!