Common pandas functions for data processing

Simulated data set:

import numpy as np
import pandas as pd

boolean = [True, False]
gender = ["male", "female"]
color = ["white", "black", "yellow"]

# Ten random rows; the list comprehensions turn random indices into labels
data = pd.DataFrame({
    "height": np.random.randint(150, 190, 10),
    "weight": np.random.randint(40, 90, 10),
    "smoker": [boolean[x] for x in np.random.randint(0, 2, 10)],
    "gender": [gender[x] for x in np.random.randint(0, 2, 10)],
    "age": np.random.randint(15, 90, 10),
    "color": [color[x] for x in np.random.randint(0, len(color), 10)],
})

1. map

To replace male with 1 and female with 0 in the gender column, use Series.map().

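# (1) Use a dictionary for the mapping
data["gender"] = data["gender"].map({"male": 1, "female": 0})

# (2) Use a function instead (run either (1) or (2), not both)
def gender_map(x):
    # Body reconstructed from the description above: male -> 1, female -> 0
    return 1 if x == "male" else 0

# Pass the function itself; map calls it once per element
data["gender"] = data["gender"].map(gender_map)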

Whether a dictionary or a function is used, map passes each element of the Series, one at a time, to the dictionary or function and collects the mapped values.
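The element-by-element behaviour can be seen on a small hand-written Series. This is a minimal sketch: the values, including the extra "unknown" label, are illustrative assumptions and not part of the simulated dataset. It shows that a key missing from the dictionary produces NaN, while a function decides for itself what to return.

```python
import pandas as pd

# Illustrative values only; "unknown" is added to show the missing-key case
gender = pd.Series(["male", "female", "female", "unknown"])

# Dictionary mapping: each element is looked up as a key
by_dict = gender.map({"male": 1, "female": 0})

# Function mapping: each element is passed to the function as x
def gender_map(x):
    return 1 if x == "male" else 0

by_func = gender.map(gender_map)

print(by_dict.tolist())  # "unknown" has no key, so its result is NaN
print(by_func.tolist())  # the function maps everything that is not "male" to 0
```

Because NaN forces a float dtype, the dictionary result here is a float column, which is worth checking before using the mapped values as integers.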

2. apply

(1) A Series also has an apply method. It works much like map, but apply can take more complex functions, including ones that accept extra arguments.

# apply_age is a user-defined function; extra arguments are passed as a tuple via args
data["age"] = data["age"].apply(apply_age, args=(-3,))
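The definition of apply_age is not shown in the text. As a hedged sketch, assume it is a helper that adds an offset to each age; the name of the second parameter, bias, and the sample ages are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical helper (assumed, not taken from the article): shift an age by an offset
def apply_age(x, bias):
    return x + bias

ages = pd.Series([20, 35, 50])  # illustrative values

# args must be a tuple; (-3,) passes bias=-3 on every call
adjusted = ages.apply(apply_age, args=(-3,))
print(adjusted.tolist())
```

The trailing comma in (-3,) matters: without it, Python treats the parentheses as plain grouping and args is no longer a tuple.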

(2) apply is also an important data-processing method on DataFrame. With axis=0 the function is called once per column, and with axis=1 it is called once per row.

def BMI(series):
    # series is one row of the DataFrame; height is in cm, so convert to metres
    weight = series["weight"]
    height = series["height"] / 100
    bmi = weight / height**2
    return bmi

data["BMI"] = data.apply(BMI, axis=1)
  • Whether axis=0 or axis=1, the object passed to the function is a Series by default.
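A small sketch makes the two axis settings concrete. The two-row frame below uses made-up values, and the lambdas are only for illustration; the point is what each call receives.

```python
import pandas as pd

df = pd.DataFrame({"height": [170, 160], "weight": [65, 50]})  # illustrative values

# axis=0: the function receives one column at a time
col_max = df.apply(lambda s: s.max(), axis=0)

# axis=1: the function receives one row at a time, indexed by column name
bmi = df.apply(lambda s: s["weight"] / (s["height"] / 100) ** 2, axis=1)

# Record the type of the object handed to the function in each case
types_axis0 = df.apply(lambda s: type(s).__name__, axis=0)
types_axis1 = df.apply(lambda s: type(s).__name__, axis=1)

print(col_max)
print(bmi)
print(types_axis0.tolist(), types_axis1.tolist())
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row, which is why the BMI result can be assigned back as a new column.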

3. applymap

applymap is simple to use: it applies the given function to every cell in the DataFrame.
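As a minimal sketch, the example below rounds every cell of a small numeric frame (the values are made up). In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, so the sketch picks whichever method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"height": [170.456, 160.123], "weight": [65.52, 50.24]})  # illustrative

# DataFrame.map exists in pandas 2.1+; older versions only have applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# The function is called once per cell and the result keeps the original shape
rounded = elementwise(lambda x: round(x, 1))
print(rounded)
```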

4. groupby

group = data.groupby("gender")

groupby splits the original DataFrame into several sub-DataFrames, one per distinct value of the grouping field. The operations that follow groupby (agg, transform, apply, and so on) therefore all operate on these sub-DataFrames.
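One way to see the split is to iterate over the grouped object, which yields one (key, sub-DataFrame) pair per group. The four-row frame below is an illustrative stand-in for the simulated dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
})  # illustrative values

group = df.groupby("gender")

# Iterating yields (key, sub-DataFrame) pairs, one per distinct gender
sizes = {}
for key, sub_df in group:
    sizes[key] = len(sub_df)
    print(key, type(sub_df).__name__, sub_df["age"].tolist())

# get_group pulls out a single sub-DataFrame by its key
males = group.get_group("male")
```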

(1). agg

Aggregation is the most common operation after groupby. Common aggregation functions are:

  • min
  • max
  • sum
  • mean
  • median
  • std
  • var
  • count

For example, to calculate the average age and median weight of male and female employees, a dictionary can be used to specify aggregate operations:

data.groupby('gender').agg({'weight':'median','age':'mean'})
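The dictionary form can be tried on a small frame with known values, which makes the result easy to check by hand. The sketch also shows named aggregation, an alternative calling style in pandas that sets the output column names explicitly; the names avg_age and max_weight are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
    "weight": [70, 50, 80, 60],
})  # illustrative values

# Dictionary form: one aggregation per column
summary = df.groupby("gender").agg({"weight": "median", "age": "mean"})

# Named aggregation: output_name=(column, function)
named = df.groupby("gender").agg(
    avg_age=("age", "mean"),
    max_weight=("weight", "max"),
)

print(summary)
print(named)
```

Either way, the result has one row per group, indexed by the grouping key.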

(2). transform

Suppose you need to add a column avg_age to the original dataset holding the average age for each employee's gender (employees of the same gender share the same value). transform returns a result aligned with the original rows, so it can be assigned directly:

data['avg_age'] = data.groupby('gender')['age'].transform('mean')
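The difference between aggregating and transforming shows up in the shape of the result. A minimal sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
})  # illustrative values

# Aggregation: one value per group
per_group = df.groupby("gender")["age"].mean()

# transform: one value per original row, broadcast within each group
per_row = df.groupby("gender")["age"].transform("mean")

df["avg_age"] = per_row
print(per_group)
print(df)
```

Because per_row keeps the original index, the assignment needs no merge or join.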

(3). apply

With apply after groupby, each sub-DataFrame is passed as the argument to the given function. The basic unit of operation is therefore a DataFrame, whereas for the DataFrame.apply described earlier it is a Series.

Get data on the oldest male and female employees:

data.groupby('gender',as_index=False).apply(lambda x : x.sort_values(by='age', ascending=False).iloc[0,:])
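To confirm what the function receives, the sketch below records the type of each argument while picking the oldest row per group. The data and the helper name oldest are illustrative assumptions. The value columns are selected explicitly before apply, a choice made here so that the function sees the same columns regardless of how the installed pandas version treats the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age": [30, 25, 40, 35],
    "weight": [70, 50, 80, 60],
})  # illustrative values

seen_types = set()

# Hypothetical helper: x is the sub-DataFrame for one gender
def oldest(x):
    seen_types.add(type(x).__name__)
    return x.sort_values(by="age", ascending=False).iloc[0]

oldest_rows = df.groupby("gender")[["age", "weight"]].apply(oldest)
print(seen_types)
print(oldest_rows)
```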

Get the top two ages of male and female employees:

data['age'].groupby(data['gender']).apply(lambda x : x.sort_values(ascending=False)[:2]).reset_index()
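For this particular task, a built-in alternative to sorting inside apply is the nlargest method of a grouped Series. This is a different technique from the one above, sketched here on illustrative data for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "male"],
    "age": [30, 25, 40, 35, 20],
})  # illustrative values

# Two largest ages within each gender; the result is indexed by (gender, original row label)
top_two = df.groupby("gender")["age"].nlargest(2)

# Flatten the index into ordinary columns
flat = top_two.reset_index()
print(flat)
```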