Common pandas functions for data processing
Simulated data set:
import numpy as np
import pandas as pd

boolean = [True, False]
gender = ["male", "female"]
color = ["white", "black", "yellow"]

# Ten random rows; the categorical columns are drawn by random index
data = pd.DataFrame({
    "height": np.random.randint(150, 190, 10),
    "weight": np.random.randint(40, 90, 10),
    "smoker": [boolean[x] for x in np.random.randint(0, 2, 10)],
    "gender": [gender[x] for x in np.random.randint(0, 2, 10)],
    "age": np.random.randint(15, 90, 10),
    "color": [color[x] for x in np.random.randint(0, len(color), 10)],
})
1. map
To replace male with 1 and female with 0 in the gender column, use Series.map().
# (1) Using a dictionary
data["gender"] = data["gender"].map({"male": 1, "female": 0})

# (2) Using a function (an alternative to (1); run one or the other)
def gender_map(x):
    return 1 if x == "male" else 0

data["gender"] = data["gender"].map(gender_map)
Whether a dictionary or a function is used, map passes the values of the Series one at a time to the dictionary or function and collects the mapped results.
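The two mapping styles can be compared on a small hand-made Series. This is a minimal sketch: the Series `s` and its "other" value are made up for illustration and are not part of the simulated data set. It also shows that a value missing from the dictionary becomes NaN, whereas the function decides for itself what to return.

```python
import pandas as pd

# Hypothetical stand-in for data["gender"], with one unmapped value
s = pd.Series(["male", "female", "male", "other"])

# Dictionary mapping: values without a matching key become NaN
by_dict = s.map({"male": 1, "female": 0})

# Function mapping: the function is called once per value
by_func = s.map(lambda x: 1 if x == "male" else 0)

print(by_dict.tolist())  # [1.0, 0.0, 1.0, nan]
print(by_func.tolist())  # [1, 0, 1, 0]
```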
2. apply
(1) Series objects also have an apply method. It works much like map, but apply can take more complex functions, for example functions that need extra arguments.
# apply_age is a user-defined function; -3 is passed to it as an extra positional argument
data["age"] = data["age"].apply(apply_age, args=(-3,))
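The article does not show the definition of apply_age, so the sketch below assumes a hypothetical version that simply adds a bias to each age. It illustrates how Series.apply forwards extra arguments, either positionally through `args` or as keyword arguments.

```python
import pandas as pd

ages = pd.Series([20, 35, 50])

def apply_age(x, bias):
    # Hypothetical helper (an assumption): shift a single age value by `bias`
    return x + bias

# Extra positional arguments are passed as a tuple via `args`
shifted = ages.apply(apply_age, args=(-3,))

# Extra keyword arguments are forwarded to the function as well
shifted_kw = ages.apply(apply_age, bias=-3)

print(shifted.tolist())  # [17, 32, 47]
```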
(2) apply is also a very important data-processing method for DataFrame. With axis=0 the function is applied to each column; with axis=1 it is applied to each row.
# Compute the body mass index for one row
def BMI(series):
    weight = series["weight"]
    height = series["height"] / 100  # centimeters to meters
    bmi = weight / height ** 2
    return bmi

data["BMI"] = data.apply(BMI, axis=1)
- Whether axis=0 or axis=1, the object passed to the specified function is a Series by default.
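A quick way to check this is to record the type of the object the function receives along each axis. The two-row frame below is an illustrative assumption, not the simulated data set.

```python
import pandas as pd

df = pd.DataFrame({"height": [170, 180], "weight": [60, 81]})

def kind(s):
    # Report the type of whatever apply hands to the function
    return type(s).__name__

kinds_cols = df.apply(kind, axis=0)  # called once per column
kinds_rows = df.apply(kind, axis=1)  # called once per row

# axis=0: each column Series is indexed by the row labels
col_max = df.apply(lambda col: col.max(), axis=0)

# axis=1: each row Series is indexed by the column names
bmi = df.apply(lambda row: row["weight"] / (row["height"] / 100) ** 2, axis=1)

print(kinds_cols.tolist(), kinds_rows.tolist())
```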
3. applymap
applymap is simple to use: it applies the specified function to every cell in the DataFrame.
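A minimal sketch of element-wise application, assuming a small numeric frame invented for illustration. Recent pandas releases (2.1 onward, as far as the release notes indicate) provide the same behavior under the name DataFrame.map and deprecate applymap, so the sketch picks whichever method the installed version offers.

```python
import pandas as pd

df = pd.DataFrame({"height": [170.456, 181.2], "weight": [60.123, 81.987]})

# Use DataFrame.map when it exists, otherwise fall back to applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# The function is called once for every cell
rounded = elementwise(lambda x: round(x, 1))

print(rounded)
```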
4. groupby
group = data.groupby("gender")
groupby splits the original DataFrame into several sub-DataFrames according to the grouping field. The operations that follow groupby (agg, transform, apply, etc.) therefore all work on these sub-DataFrames.
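Iterating over the grouped object makes the split visible. The four-row frame here is an assumption made up for the sketch; each iteration yields the group key together with the matching sub-DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
})

group = df.groupby("gender")

# Each item is a (key, sub-DataFrame) pair, one per distinct gender
pieces = {key: sub for key, sub in group}

for key, sub in pieces.items():
    print(key, type(sub).__name__, sub["age"].tolist())
```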
(1). agg
Aggregation is a common operation after groupby. Frequently used aggregation functions are:
- min
- max
- sum
- mean
- median
- std
- var
- count
For example, to calculate the average age and median weight of male and female employees, a dictionary can be used to specify aggregate operations:
data.groupby('gender').agg({'weight':'median','age':'mean'})
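On a small deterministic frame (an assumption for illustration) the result of the dictionary form is easy to verify by hand. The sketch also shows the list form, which applies several of the functions listed above to a single column.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
    "weight": [70, 50, 80, 60],
})

# Dictionary form: one aggregation per column
summary = df.groupby("gender").agg({"weight": "median", "age": "mean"})

# List form: several aggregations on one column
age_stats = df.groupby("gender")["age"].agg(["min", "max", "count"])

print(summary)
print(age_stats)
```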
(2). transform
Suppose you now need to add a column avg_age to the original data set that holds the average age of each employee's gender group (employees of the same gender share the same value). transform does this directly:
data['avg_age'] = data.groupby('gender')['age'].transform('mean')
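The difference from an aggregation is the shape of the result: transform returns one value per original row, aligned with the original index, so it can be assigned straight back as a column. A minimal sketch on an assumed four-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
})

# Aggregation: one row per group
per_group = df.groupby("gender")["age"].mean()

# transform: one value per original row, broadcast within each group
df["avg_age"] = df.groupby("gender")["age"].transform("mean")

print(len(per_group), len(df["avg_age"]))  # 2 4
print(df["avg_age"].tolist())              # [35.0, 30.0, 35.0, 30.0]
```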
(3). apply
With apply after groupby, each sub-DataFrame is passed as the argument to the specified function. The basic unit of operation here is a DataFrame, whereas the basic unit of the DataFrame.apply described earlier is a Series.
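The following sketch checks what the function receives. The frame is an assumption for illustration, and the columns are selected explicitly before apply so that the behavior does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "male"],
    "age": [30, 25, 40, 35, 20],
    "weight": [70, 50, 80, 60, 65],
})

grouped = df.groupby("gender")[["age", "weight"]]

# The function is called once per group and receives a sub-DataFrame
kinds = grouped.apply(lambda sub: type(sub).__name__)

# Because it sees the whole sub-DataFrame, it can combine rows and columns freely
age_spread = grouped.apply(lambda sub: sub["age"].max() - sub["age"].min())

print(kinds.tolist())
print(age_spread.to_dict())
```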
Get data on the oldest male and female employees:
data.groupby('gender',as_index=False).apply(lambda x : x.sort_values(by='age', ascending=False).iloc[0,:])
Get the top two ages of male and female employees:
data['age'].groupby(data['gender']).apply(lambda x : x.sort_values(ascending=False)[:2]).reset_index()
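For comparison, both results can also be obtained without apply. This is an alternative technique, not the approach used above: idxmax picks the row label of the largest age in each group, and sorting followed by head(2) keeps the two oldest rows per group. The five-row frame is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "male"],
    "age": [30, 25, 40, 35, 20],
})

# Oldest employee per gender: look up the row label of the maximum age
oldest = df.loc[df.groupby("gender")["age"].idxmax()]

# Two oldest employees per gender: sort once, then keep the first two rows of each group
top_two = df.sort_values("age", ascending=False).groupby("gender").head(2)

print(oldest)
print(top_two)
```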