“This is my fourth day of participating in the First Challenge 2022. For more details: First Challenge 2022.”

Hello, I’m Ding Xiaojie. Pandas retrieves the specified data conditionally for Pandas, but if you haven’t used Query yet, you’ll be impressed by its simplicity.

Regular use

Create a DataFrame.

import pandas as pd

df = pd.DataFrame(
    {'A': ['e'.'d'.'c'.'b'.'a'].'B': ['f'.'b'.'c'.'d'.'e'].'C': range(0.10.2),
     'D': range(10.0, -2),
     'E.E': range(10.5, -1)})
Copy the code

We now select all the rows where the letters in column A appear in column B. Let’s start with two common ways of writing it.

>>> df[df['A'].isin(df['B'])]
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
3  b  d  6   4    7
>>> df.loc[df['A'].isin(df['B'])]
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
3  b  d  6   4    7
Copy the code

Let’s use query() to do this.

>>> df.query("A in B")
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
3  b  d  6   4    7
Copy the code

As you can see, the code using Query is clean and easy to read, and it consumes less memory.

Multi-condition query

Select all rows in which column A letters appear in column B and column C is less than column D.

>>> df.query('A in B and C < D')
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
Copy the code

And can also be represented by ampersand.

Reference variables

Externally defined variables can also be used in expressions, preceded by an @ sign.

>>> number = 5
>>> df.query('A in B & C > @number')
   A  B  C  D  E.E
3  b  d  6  4    7
Copy the code

Index selection

Select all rows with A column letter appearing in column B and index greater than 2.

>>> df.query('A in B and index > 2')
   A  B  C  D  E.E
3  b  d  6  4    7
Copy the code

Multi-index Select a DataFrame that creates a two-level index.

>>> import numpy as np
>>> colors = ['yellow'] *3 + ['red'] *2
>>> rank = [str(i) for i in range(5)]
>>> index = pd.MultiIndex.from_arrays([colors, rank], names=['color'.'rank'])
>>> df = pd.DataFrame(np.arange(10).reshape(5.2),columns=['A'.'B'] , index=index)
>>> df = pd.DataFrame(np.arange(10).reshape(5.2),columns=['A'.'B'] , index=index)
>>> df
             A  B
color  rank      
yellow 0     0  1
       1     2  3
       2     4  5
red    3     6  7
       4     8  9
Copy the code

1. If a multi-level index has a name, select the index name.

>>> df.query("color == 'red'")
            A  B
color rank      
red   3     6  7
      4     8  9
Copy the code

2. If multiple indexes have nameless names, select them by index level.

>>> df.index.names = [None.None]
>>> df.query("ilevel_0 == 'red'")
       A  B
red 3  6  7
    4  8  9
>>> df.query("ilevel_1 == '4'")
       A  B
red 4  8  9
Copy the code

Special characters

For column names with Spaces or other special characters such as operators, use backquotes.

>>> df.query('A == B | (C + 2 > `E.E`)')
   A  B  C  D  E.E
2  c  c  4  6    8
3  b  d  6  4    7
4  a  e  8  2    6
Copy the code

In general, query() is relatively easy to use, quick to learn, and much more readable.


For those who are new to Python or want to get started with Python, you can follow the public account “Python New Horizons” to communicate and learn Python together. They are all beginners. Sometimes a simple question is stuck for a long time, but others may suddenly realize it with a little help. There are also nearly 1,000 resume templates and hundreds of e-books waiting for you to collect!