Photo from online article/Yi Zhu
Pandas provides map, apply, and applymap to cover most row-wise, column-wise, and element-wise operations on a DataFrame. This article introduces the three methods in detail through worked examples and illustrations, so that beginners come away with a clearer understanding of each.
The data set demonstrated in this article is simulated and can be generated by following the code below if you want to practice.
import numpy as np
import pandas as pd

boolean = [True, False]
gender = ["Male", "Female"]
color = ["white", "black", "yellow"]
data = pd.DataFrame({
    "height": np.random.randint(150, 190, 100),
    "weight": np.random.randint(40, 90, 100),
    "smoker": [boolean[x] for x in np.random.randint(0, 2, 100)],
    "gender": [gender[x] for x in np.random.randint(0, 2, 100)],
    "age": np.random.randint(15, 90, 100),
    "color": [color[x] for x in np.random.randint(0, len(color), 100)],
})
The data set is shown below, with columns representing height, weight, smoking, sex, age, and skin color.
Series data processing
map
What if I wanted to replace Male with 1 and Female with 0 in the gender column of the data set? Definitely not with a for loop! This is easy to do with Series.map(), and it takes as little as one line of code.
# Option 1: map with a dictionary
data["gender"] = data["gender"].map({"Male": 1, "Female": 0})

# Option 2: map with a function (an alternative to option 1, not an extra step)
def gender_map(x):
    gender = 1 if x == "Male" else 0
    return gender

# Note that the function name is passed in without parentheses
data["gender"] = data["gender"].map(gender_map)
So how does map work in practice? Take a look at the diagram below (only the first 10 data points have been captured for ease of presentation)
Regardless of whether a dictionary or a function is used for mapping, the map method takes the values one at a time, looks each one up in the dictionary (as a key) or passes it to the function (as the argument), and collects the mapped values.
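To make the one-by-one behaviour concrete, here is a minimal sketch on a toy Series. The Series and its "Unknown" entry are made up for illustration and are not part of the article's data set. It shows one practical difference between the two styles: a value missing from the dictionary becomes NaN, while a function decides for itself what to return.

```python
import pandas as pd

# Hypothetical toy Series; "Unknown" is deliberately absent from the mapping
s = pd.Series(["Male", "Female", "Unknown", "Male"])

# Dictionary mapping: each value is looked up as a key;
# values without a matching key become NaN
by_dict = s.map({"Male": 1, "Female": 0})

# Function mapping: each value is passed in as the single argument
by_func = s.map(lambda x: 1 if x == "Male" else 0)

print(by_dict)
print(by_func)
```

Because of the NaN, the dictionary result is stored as floats, so it is worth checking for unmapped values before treating the column as integers.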
apply
The Series object also has an apply method. It works in much the same way as map; the difference is that apply can accept more complex functions. What does that mean? Take a look at the following example.
Suppose that during data collection the recorded ages turn out to contain a large error and need to be adjusted by adding or subtracting some value. Because the size of the adjustment is not known in advance, the function needs one more parameter when it is defined. The map method cannot handle this, because the function passed to map can only receive one argument. The apply method solves this problem.
def apply_age(x, bias):
    return x + bias

# Pass the extra argument as a tuple
data["age"] = data["age"].apply(apply_age, args=(-3,))
You can see that 3 has been subtracted from every value in the age column. Of course, this is just a simple example; apply is most useful when you need to do more complex processing.
In summary, for a Series, map covers most data processing needs, but if you need to use more complex functions, you need the apply method.
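As a small sketch of the extra-argument case, the snippet below reuses the apply_age function on a made-up Series of ages (the values are hypothetical). Series.apply also forwards keyword arguments to the function, which can read more clearly than a one-element tuple.

```python
import pandas as pd

def apply_age(x, bias):
    # Shift a single age value by the given bias
    return x + bias

ages = pd.Series([20, 35, 50])  # hypothetical sample ages

# Extra positional arguments go in a tuple (note the trailing comma)
shifted_args = ages.apply(apply_age, args=(-3,))

# Extra keyword arguments are forwarded to the function as-is
shifted_kwargs = ages.apply(apply_age, bias=-3)

print(shifted_args.tolist())
print(shifted_kwargs.tolist())
```

Writing args=(-3) without the trailing comma passes a plain integer rather than a tuple, which is an easy mistake to make.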
DataFrame data processing
apply
apply is a very important data processing method for DataFrame. It accepts a wide variety of functions (both Python built-ins and custom ones) and handles them flexibly. Here are a few examples to show how apply works.
Before getting into the details, we need to introduce the concept of axis in a DataFrame. In most DataFrame methods, axis is a parameter that controls whether an operation runs along axis 0 or axis 1. axis=0 means the operation is performed on each column, and axis=1 means it is performed on each row, as shown in the figure below.
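The axis convention can also be checked on a tiny example. This is only an illustrative sketch with a made-up two-column DataFrame: summing with axis=0 gives one result per column, and summing with axis=1 gives one result per row.

```python
import pandas as pd

# Hypothetical toy DataFrame with three rows and two columns
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: one result per column (a -> 6, b -> 60)
col_totals = toy.sum(axis=0)

# axis=1: one result per row (11, 22, 33)
row_totals = toy.sum(axis=1)

print(col_totals)
print(row_totals)
```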
If you’re not familiar with this, I’ll explain how apply operates along axis 0 and 1, respectively.
Let’s say we need to sum the numeric columns of the data and, separately, take their logarithms. We can use apply for both operations. Since we are operating on columns, we need to specify axis=0.
# Sum along axis 0
data[["height", "weight", "age"]].apply(np.sum, axis=0)

# Take logarithms along axis 0
data[["height", "weight", "age"]].apply(np.log, axis=0)
The implementation is simple, but what exactly happens when apply is called? How does the process work? Let’s look at it graphically, taking the first five rows as an example.
When operating along axis 0 (axis=0), each column is passed by default as a Series to the function you specify. After the function has run on every column, the results are combined and returned.
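One way to see what the function actually receives is to have it report on its argument. The sketch below uses a made-up DataFrame and a hypothetical helper, describe_column, that returns the name, type, and length of whatever it is given.

```python
import pandas as pd

# Hypothetical toy DataFrame
toy = pd.DataFrame({"height": [150, 160, 170], "weight": [50, 60, 70]})

def describe_column(col):
    # With axis=0, col is a whole column as a Series; col.name is the column label
    return f"{col.name}: {type(col).__name__} of length {len(col)}"

# One call per column; the results are combined into a Series
# indexed by the column names
summary = toy.apply(describe_column, axis=0)

print(summary)
```

The returned Series has one entry per column, which matches the "combined and returned" step described above.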
If you need to operate by row in real use (axis=1), how does the whole process work?
The data set contains height and weight, so we can calculate each person's BMI (body mass index, a figure often used in physical examinations and an important measure of how overweight and how healthy a person is). The formula is: BMI = weight / height² (in kg/m², with weight in kilograms and height in meters). Since the calculation has to be performed for each sample, we use apply with axis=1. The code is as follows:
def BMI(series):
    weight = series["weight"]
    height = series["height"] / 100  # convert centimeters to meters
    bmi = weight / height ** 2
    return bmi

data["BMI"] = data.apply(BMI, axis=1)
Again, let’s take a graphical look at how this process works (using the first 5 rows as an example).
When apply operates on rows with axis=1, each row is passed by default to the specified function as a Series (whose index is the column names), and the corresponding result is returned.
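The row-wise case can be checked by hand on a tiny example. The sketch below uses two made-up people (150 cm / 60 kg and 200 cm / 80 kg) and a hypothetical lowercase helper, bmi, that follows the same formula. It also records the index of each row to confirm that it is the column names.

```python
import pandas as pd

# Hypothetical toy data: height in centimeters, weight in kilograms
toy = pd.DataFrame({"height": [150, 200], "weight": [60, 80]})

def bmi(row):
    # row is one DataFrame row as a Series, indexed by column name
    height_m = row["height"] / 100
    return row["weight"] / height_m ** 2

# The index of each row Series is the column names
row_labels = toy.apply(lambda row: ", ".join(row.index), axis=1)

# 60 / 1.5**2 is about 26.67, and 80 / 2.0**2 is 20.0
toy["BMI"] = toy.apply(bmi, axis=1)

print(row_labels.tolist())
print(toy)
```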
To summarize the apply operation on a DataFrame:
- When axis=0, the specified function is executed on each column; when axis=1, it is executed on each row.
- Whether axis=0 or axis=1, the data is passed to the specified function as a Series by default; you can set raw=True to pass a NumPy array instead.
- After the function has run on each Series, the results are combined and returned (so the function must use return to give back the corresponding value).
- Of course, apply on a DataFrame, like apply on a Series, can also accept more complex functions, such as ones that take extra parameters. The principle is the same; see the official documentation for details.
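The raw=True option in the summary above can be illustrated with a short sketch. The DataFrame and the helper span are made up for this example; the helper records the type of its argument so that the difference between the default and raw=True is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

seen_default = []
seen_raw = []

def span(values, log):
    # Record what kind of object was passed in, then return max - min
    log.append(type(values).__name__)
    return values.max() - values.min()

# Default: each column arrives as a pandas Series
default_result = toy.apply(span, axis=0, args=(seen_default,))

# raw=True: each column arrives as a NumPy ndarray
raw_result = toy.apply(span, axis=0, raw=True, args=(seen_raw,))

print(set(seen_default), set(seen_raw))
print(default_result.tolist(), raw_result.tolist())
```

Both calls give the same values here. Passing raw arrays mainly makes sense when the function only needs NumPy operations and does not rely on the Series index or name.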
applymap
applymap is simple to use: it executes the specified function on every cell in the DataFrame. Although applymap is not as widely used as apply, it can be handy in some situations, such as the following example.
For demonstration purposes, we generate a new DataFrame:
df = pd.DataFrame(
    {
        "A": np.random.randn(5),
        "B": np.random.randn(5),
        "C": np.random.randn(5),
        "D": np.random.randn(5),
        "E": np.random.randn(5),
    }
)
df
Now suppose you want to display all the values in the DataFrame with two decimal places. applymap makes this quick. The code and illustration are as follows:
df.applymap(lambda x: "%.2f" % x)
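One caveat for readers on recent pandas: since pandas 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map, which performs the same element-wise operation. The sketch below is one possible way to write the formatting step so that it runs on both older and newer versions. The three-column DataFrame and the elementwise name are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical demo DataFrame of random floats
df = pd.DataFrame(np.random.randn(5, 3), columns=["A", "B", "C"])

# Use DataFrame.map where it exists (pandas 2.1+),
# otherwise fall back to applymap on older versions
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# The function is called once per cell; here it formats each float as text
formatted = elementwise(lambda x: "%.2f" % x)

print(formatted)
```

Note that the result holds strings, not numbers, so this is suitable for display rather than further calculation.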
That wraps up the three workhorses of data processing. Questions are welcome in the comments below!