Public account: You and the cabin by: Peter Editor: Peter
Pandas series: DataFrame data filtering
Pandas has a wide variety of methods for filtering data. In this article, we will focus on the following methods:
- Expression-based selection
- query, eval
- filter
- where, mask
Further reading
For the other articles in the pandas series, read:
1. DataFrame data filtering
2. 10 ways to create DataFrame data
3. Create Series type data
4. It all starts with the explode function
Simulated data
The simulated data below has 7 fields: name, sex, age, math, Chinese, total score, and address.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ['Ming', 'wang', 'zhang fei', 'GuanYu', 'Sun Xiaoxiao', 'Wang Jianguo', 'pei liu'],
    "sex": ['male', 'female', 'female', 'male', 'female', 'male', 'female'],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "math": [120, 130, 118, 120, 102, 140, 134],
    "chinese": [100, 130, 140, 120, 149, 111, 118],
    "score": [590, 600, 550, 620, 610, 580, 634],
    "address": [
        "Nanshan District, Shenzhen City, Guangdong Province",
        "Haidian District, Beijing",
        "Yuhua District, Changsha City, Hunan Province",
        "Dongcheng District, Beijing",
        "Baiyun District, Guangzhou City, Guangdong Province",
        "Jiangxia District, Wuhan City, Hubei Province",
        "Longhua District, Shenzhen City, Guangdong Province",
    ],
})
df
Here are the five ways of selecting data:
- Expression-based selection
- query()
- eval()
- filter()
- where() / mask()
Expression-based selection
Expression-based selection means filtering rows with one or more boolean expressions placed inside df[...].
1. Specify a mathematical expression
# 1. Mathematical expressions
df[df['math'] > 125]
2. Negation
Negation is done with the ~ operator.
# 2. Negation
df[~(df['sex'] == 'male')]  # rows where sex is not male
3. Match a column against a specific value
# 3. Match a specific value
df[df.sex == 'male']  # same as df[df['sex'] == 'male']
4. Compare two columns
# 4. Compare two columns
df[df['math'] > df['chinese']]
5. Logical operators
# 5. Logical operators
df[(df['math'] > 120) & (df['chinese'] < 140)]
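All five patterns above build a boolean Series and pass it to df[...]. The following is a minimal sketch on a small hypothetical frame (the names demo and is_male are illustrative and not part of the article's data). It shows the mask itself and why each comparison needs its own parentheses when combined with &, | or ~:

```python
import pandas as pd

# Small hypothetical frame, not the article's full data set
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "sex": ["male", "female", "female", "male"],
    "math": [120, 130, 118, 140],
    "chinese": [100, 130, 140, 111],
})

# A comparison yields a boolean Series aligned with the rows
is_male = demo["sex"] == "male"

# & is "and", | is "or", ~ is "not". Wrap each comparison in parentheses,
# because these operators bind more tightly than > and ==.
high_math_or_female = demo[(demo["math"] > 125) | ~is_male]
male_and_high_math = demo[is_male & (demo["math"] > 125)]

print(is_male.tolist())
print(high_math_or_female["name"].tolist())
print(male_and_high_math["name"].tolist())
```

Storing a mask in a variable, as with is_male here, makes it easy to reuse the same condition in several filters.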
The query() function
Directions for use
⚠️ Note: if a column name contains a space, enclose it in backticks inside the query string.
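As a rough illustration of the backtick rule, the sketch below uses a hypothetical frame whose "total score" column contains a space (the article's own data has no such column, so the frame and its names are assumptions):

```python
import pandas as pd

# Hypothetical frame with a space in one column name
scores = pd.DataFrame({
    "name": ["A", "B", "C"],
    "total score": [590, 600, 550],
    "math": [120, 130, 118],
})

# Backticks let query() refer to a column whose name is not a valid identifier
top = scores.query("`total score` >= 590 and math > 119")
print(top["name"].tolist())
```

Column names without spaces, such as math, can be written with or without backticks.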
Use case
1. Use numeric expressions
df.query('math > chinese > 110')  # chained comparison: math > chinese and chinese > 110

df.query('math + chinese > 255')  # arithmetic on two columns

df.query('math == chinese')  # rows where the two columns are equal

df.query('math == chinese > 120')  # equal, and chinese greater than 120

df.query('(math > 110) and (chinese < 135)')  # two conditions joined with "and"
2. Use string expressions
df.query('sex != "female"')  # not female, so all male

df.query('sex not in ["female"]')  # not female, so male

df.query('sex in ["male", "female"]')  # sex is male or female, so every row
3. Pass in variables; prefix a variable name with @ to use it inside the expression
# set the variable
a = df.math.mean()
a

df.query('math > @a + 10')

df.query('math < (`chinese` + @a) / 2')
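The @ prefix works for any local Python variable, not only numbers. The sketch below, on a small hypothetical frame (demo, threshold and wanted are illustrative names), passes both a scalar and a list, and checks that the query form selects the same rows as plain boolean indexing:

```python
import pandas as pd

# Small hypothetical frame, not the article's full data set
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "math": [120, 130, 118, 140],
})

threshold = demo["math"].mean()  # scalar computed in Python
wanted = ["A", "D"]              # hypothetical list of names

above = demo.query("math > @threshold")  # @ pulls in the local scalar
picked = demo.query("name in @wanted")   # @ also works with a list

# The same filter written as boolean indexing
same = above.equals(demo[demo["math"] > threshold])

print(above["name"].tolist())
print(picked["name"].tolist())
print(same)
```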
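The eval() function
eval accepts the same expression syntax as query. The difference is that it returns the result of the expression (for a comparison, a boolean Series) instead of the filtered rows, so we wrap it in df[...] to filter.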
1. Use numeric expressions
# 1. Numeric expressions
df.eval('math > 125')  # returns a boolean Series

df[df.eval('math > 125')]

df[df.eval('math > 125 and chinese < 130')]
2. String expressions
# 2. String expressions
df[df.eval('sex in ["male"]')]
3. Use variables
# 3. Use variables
b = df.chinese.mean() # calculating mean
df[df.eval('math < @b+5')]
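To make the relationship between eval and query concrete, here is a minimal sketch on a hypothetical frame (the names demo, expr and flags are illustrative). It evaluates one expression both ways and compares the results:

```python
import pandas as pd

# Small hypothetical frame, not the article's full data set
demo = pd.DataFrame({
    "math": [120, 130, 118, 140],
    "chinese": [100, 130, 140, 111],
})

expr = "math > 125 and chinese < 130"

flags = demo.eval(expr)       # boolean Series, one value per row
via_eval = demo[flags]        # filter with the mask
via_query = demo.query(expr)  # filter directly

print(flags.tolist())
print(via_eval.equals(via_query))
```

In this sketch the two routes select the same rows; eval is the handier one when the mask itself is needed, for example to count matches with flags.sum().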
The filter() function
filter selects columns or rows by their labels (column names or index labels), not by the values stored in them. Labels can be specified in three ways:
- Exactly (items)
- By regular expression (regex)
- By substring (like)
axis=1 applies the filter to column names; axis=0 applies it to the row index.
Directions for use
Use case
1. Specify column names directly
df.filter(items=["chinese", "score"])  # select by column name

Specify row indexes directly
df.filter(items=[2, 4], axis=0)  # select rows by index label
2. Specify by regular expression
df.filter(regex='a', axis=1)  # column names containing a

df.filter(regex='^s', axis=1)  # column names starting with s

df.filter(regex='e$', axis=1)  # column names ending with e

df.filter(regex='3$', axis=0)  # row index ending with 3
3. Substring match
df.filter(like='s', axis=1)  # column names containing s

df.filter(like='2', axis=0)  # row index containing 2

# filter on both the column names and the row index
df.filter(regex='^a', axis=1).filter(like='2', axis=0)
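The three ways of specifying labels can be compared side by side. The sketch below uses a hypothetical frame with a non-default integer index (demo and its index values are assumptions chosen so that the results differ):

```python
import pandas as pd

# Hypothetical frame with a custom integer index
demo = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "sex": ["male", "female", "male"],
        "chinese": [100, 130, 140],
    },
    index=[2, 12, 30],
)

by_items = demo.filter(items=["name", "chinese"])  # exact column names
by_like = demo.filter(like="s", axis=1)            # column names containing "s"
by_regex = demo.filter(regex="^s", axis=1)         # column names starting with "s"
rows_like = demo.filter(like="2", axis=0)          # index labels containing "2"

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
print(rows_like.index.tolist())
```

None of these calls look at the cell values: the row labelled 12 is kept by like="2" because of its label, not because of anything stored in the row.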
The where and mask functions
The where and mask functions are opposites and give complementary results:
- where: keeps the values that meet the condition and shows the rest as NaN
- mask: shows the values that meet the condition as NaN and keeps the rest
Both methods accept a second argument that is used in place of NaN.
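The relationship between the two functions can be sketched on a small hypothetical Series (the values and the names kept, hidden and filled are illustrative):

```python
import pandas as pd

# Small hypothetical Series of scores
s = pd.Series([590, 600, 550, 620], name="score")
cond = s >= 600

kept = s.where(cond)       # values failing cond become NaN
hidden = s.mask(cond)      # values passing cond become NaN
filled = s.where(cond, 0)  # use 0 instead of NaN

# where(cond) and mask(~cond) describe the same operation
print(kept.equals(s.mask(~cond)))
print(hidden.isna().tolist())
print(filled.tolist())
```

Both functions return an object of the same shape as the input, which is the main difference from boolean indexing, where non-matching rows are dropped.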
Using where
s = df["score"]
s

# where: values that do not meet the condition are shown as NaN
s.where(s >= 600)
We can assign a value to the data that does not meet the condition:
s.where(s >= 610, 600)  # assign 600 where the condition is not met
Take a look at the results of the two groups:
The where function can also take multiple conditions:
# rows that meet the condition keep their values; all other rows become NaN
df.where((df.sex == 'male') & (df.math > 125))
Select the data we want:
df[(df.where((df.sex == 'male') & (df.math > 125)) == df).name]
# df[(df.where((df.sex == 'male') & (df.math > 125)) == df).sex]
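The comparison against df works because a row replaced by NaN never compares equal to its original values. As a sketch on hypothetical data (demo and cond are illustrative names, and the frame is assumed to contain no missing values), plain boolean indexing with the same condition appears to select the same rows and may be easier to read:

```python
import pandas as pd

# Small hypothetical frame without missing values
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "sex": ["male", "female", "female", "male"],
    "math": [120, 130, 118, 140],
})

cond = (demo["sex"] == "male") & (demo["math"] > 125)

# Route used in the article: blank out non-matching rows, then compare
via_where = demo[(demo.where(cond) == demo)["name"]]

# Direct route: index with the condition itself
via_bool = demo[cond]

print(via_where["name"].tolist())
print(via_where.equals(via_bool))
```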
The mask function
The mask function gives the opposite result of where.
s.mask(s >= 600)  # the opposite of s.where(s >= 600)

s.mask(s >= 610, 600)  # assign 600 where the condition is met
The mask function accepts multiple conditions:
# the result is the opposite of where
df[(df.mask((df.sex == 'male') & (df.math > 125)) == df).sex]
Conclusion
Pandas offers many ways to select data, and the same result can often be reached in several different ways. This article covered expression-based selection and 5 functions (query, eval, filter, where, mask); the next article will cover 3 pairs of functions for filtering data.