Pandas data processing's three essential moves: how many do you know?

Photo from online article/Yi Zhu

Pandas provides map, apply, and applymap, which cover most of the row-wise, column-wise, and element-wise operations you might perform on a DataFrame. In this article, we introduce the three methods in detail through case studies and illustrations. Beginners should come away with a clearer understanding of all three.

The data set used in this article is simulated. If you want to practice, you can generate it with the code below.

import numpy as np
import pandas as pd

boolean = [True, False]
gender = ["Male", "Female"]
color = ["white", "black", "yellow"]
# 100 simulated people; np.random.randint(low, high, size) excludes high
data = pd.DataFrame({
    "height": np.random.randint(150, 190, 100),  # cm
    "weight": np.random.randint(40, 90, 100),    # kg
    "smoker": [boolean[x] for x in np.random.randint(0, 2, 100)],
    "gender": [gender[x] for x in np.random.randint(0, 2, 100)],
    "age": np.random.randint(15, 90, 100),
    "color": [color[x] for x in np.random.randint(0, len(color), 100)],
})

The data set is shown below. Its columns are height, weight, smoker, gender, age, and skin color.

Series data processing

map

Suppose we want to replace Male with 1 and Female with 0 in the gender column of the data set. Don't reach for a for loop! Series.map() does this in a single line of code.

# Option 1: map with a dictionary
data["gender"] = data["gender"].map({"Male": 1, "Female": 0})
# Option 2 (use instead of option 1, not after it): map with a function
def gender_map(x):
    return 1 if x == "Male" else 0
data["gender"] = data["gender"].map(gender_map)  # pass the function itself, no parentheses

So how does map actually work? Take a look at the diagram below (only the first 10 rows are shown for ease of presentation).

Whether you map with a dictionary or a function, map takes each element of the Series in turn, looks it up in the dictionary (or passes it to the function), and uses the result as the mapped value.
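The following sketch makes this element-by-element behavior concrete on a small hypothetical Series. It also shows one difference worth knowing. With a dictionary, any value that is not a key (here "Unknown") is mapped to NaN. A function, by contrast, decides the output for every element.

```python
import pandas as pd

# A small hypothetical Series; "Unknown" is deliberately not in the dictionary
s = pd.Series(["Male", "Female", "Male", "Unknown"])

# Dictionary: each element is looked up as a key
by_dict = s.map({"Male": 1, "Female": 0})

# Function: each element is passed in as the single argument
by_func = s.map(lambda x: 1 if x == "Male" else 0)

print(by_dict.tolist())  # [1.0, 0.0, 1.0, nan]: the missing key becomes NaN
print(by_func.tolist())  # [1, 0, 1, 0]: the function handles every element
```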

apply

Series objects also have an apply method. It works much like map, except that apply can take more complex functions. What does that mean? Take a look at the following example.

Suppose that the ages were recorded with a large systematic error during data collection. They need to be adjusted by adding or subtracting some value. Because that value is not known in advance, the function needs an extra parameter for it. map cannot do this directly, because the function passed to map receives only one argument (the element itself). The apply method solves this problem.

def apply_age(x, bias):
    return x + bias
# Extra arguments go in a tuple; the trailing comma in (-3,) makes it a tuple
data["age"] = data["age"].apply(apply_age, args=(-3,))

Every value in the age column has now been reduced by 3. This is only a simple example. apply really pays off when the processing is more complex.

In summary, map covers most data processing needs for a Series. When the function needs to be more complex, for example when it takes extra arguments, use apply instead.
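Besides args, Series.apply also forwards extra keyword arguments to the function. The sketch below shows both forms on a hypothetical Series of ages. For comparison, it also gets the same result with map by wrapping the call in a lambda. This is a common workaround that fixes the extra argument in advance, not something the example above relies on.

```python
import pandas as pd

ages = pd.Series([20, 35, 50])  # hypothetical ages

def apply_age(x, bias):
    return x + bias

# Extra positional arguments go in args (a tuple)...
shifted_args = ages.apply(apply_age, args=(-3,))
# ...and extra keyword arguments are forwarded to the function too
shifted_kwargs = ages.apply(apply_age, bias=-3)
# With map, the same effect needs a wrapper such as a lambda
shifted_map = ages.map(lambda x: apply_age(x, -3))

print(shifted_args.tolist())  # [17, 32, 47]
```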

DataFrame data processing

apply

apply is one of the most important data processing methods on a DataFrame. It accepts a wide variety of functions, including built-in Python functions, NumPy functions, and your own, and handles them flexibly. The following examples show how apply works.

Before getting into the details, we need to introduce the concept of an axis in a DataFrame. Most DataFrame methods take an axis parameter that specifies whether an operation runs along axis 0 or axis 1. With axis=0, the operation is applied to each column, working down the rows. With axis=1, it is applied to each row, working across the columns. The figure below illustrates this.
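A reduction such as DataFrame.sum on a tiny hypothetical frame is a quick way to build intuition for the axis parameter. axis=0 collapses the rows and yields one value per column. axis=1 collapses the columns and yields one value per row.

```python
import pandas as pd

# Hypothetical 2 x 3 frame
df = pd.DataFrame({"a": [1, 2], "b": [10, 20], "c": [100, 200]})

col_totals = df.sum(axis=0)  # down the rows: one result per column
row_totals = df.sum(axis=1)  # across the columns: one result per row

print(col_totals.tolist())  # [3, 30, 300]
print(row_totals.tolist())  # [111, 222]
```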

If this is still unclear, don't worry. Below, I explain how apply operates along axis 0 and axis 1 in turn.

Suppose we want to sum each numeric column of data and, separately, take the logarithm of its values. apply handles both. Since we are operating on columns, we specify axis=0.

# Sum each column along axis 0
data[["height", "weight", "age"]].apply(np.sum, axis=0)
# Take the logarithm of each column along axis 0
data[["height", "weight", "age"]].apply(np.log, axis=0)

The code is simple, but what actually happens when apply is called? Let's look at it graphically, using the first five rows as an example.

When operating along axis 0 (axis=0), each column is passed as a Series to the function you specify by default. The per-column results are then combined and returned.
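To see this directly, the sketch below applies a small helper to a hypothetical three-column frame. The helper records what it receives and returns each column's range (max minus min). Every call receives a Series named after its column, and apply assembles the single values into a Series indexed by the column names. The helper column_range is illustrative only.

```python
import pandas as pd

# Hypothetical three-column frame
df = pd.DataFrame({"height": [170, 180], "weight": [60, 80], "age": [30, 40]})

received = []  # (column name, type name) of each object apply passes in

def column_range(col):
    # Record what arrived, then return a single value for this column
    received.append((col.name, type(col).__name__))
    return col.max() - col.min()

result = df.apply(column_range, axis=0)
print(received)         # each entry pairs a column name with 'Series'
print(result.tolist())  # [10, 20, 10]
```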

What if you need to operate row by row (axis=1)? How does the process work then?

The data set contains each person's height and weight, so we can compute each person's body mass index (BMI). BMI is commonly used in physical examinations and is an important standard for assessing obesity and health. The formula is BMI = weight / height², with weight in kilograms and height in meters (unit: kg/m²). The height column is in centimeters, so the code divides it by 100. The calculation is done for each sample (row), so we use apply with axis=1:

def BMI(series):
    weight = series["weight"]           # kg
    height = series["height"] / 100     # convert cm to m
    bmi = weight / height**2
    return bmi

# Each row is passed to BMI as a Series indexed by the column names
data["BMI"] = data.apply(BMI, axis=1)

Again, let's look at the process graphically, using the first five rows as an example.

When apply operates on rows (axis=1), each row is passed to the function as a Series whose index is the column names by default. The per-row results are then combined and returned.
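Row-wise apply is convenient, but it calls the Python function once per row. For plain arithmetic like BMI, whole-column arithmetic can usually compute the same values, and it is typically much faster on large frames. The sketch below uses a small hypothetical two-row frame (height in cm, weight in kg) and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [175, 160], "weight": [70, 64]})  # cm, kg

def BMI(series):
    # Each row arrives as a Series indexed by the column names
    return series["weight"] / (series["height"] / 100) ** 2

row_wise = df.apply(BMI, axis=1)

# Vectorized alternative: whole-column arithmetic, no per-row Python call
vectorized = df["weight"] / (df["height"] / 100) ** 2

print(row_wise.round(2).tolist())         # [22.86, 25.0]
print(np.allclose(row_wise, vectorized))  # True
```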

To summarize the apply operation on a DataFrame:

When axis=0, the specified function is executed on each column. When axis=1, it is executed on each row.
With either axis=0 or axis=1, the function receives a Series by default. Set raw=True to pass it a NumPy array instead.
After the function has run on each Series, the results are combined and returned. The function must therefore return the corresponding value with return.
Like Series.apply, DataFrame.apply can also accept more complex functions, for example ones that take extra arguments. The principle is the same. See the official documentation for details.
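The following sketch shows the raw option in action. It applies a helper (hypothetical, for illustration) that simply reports the type of its input to a two-column frame. By default, the helper receives a pandas Series. With raw=True, it receives a NumPy ndarray. This avoids building a Series per column and can be faster for NumPy-only functions, but the column labels are not available inside the function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [170, 180], "weight": [60, 80]})

def describe_input(values):
    # Return the type name of whatever apply passed in
    return type(values).__name__

default_types = df.apply(describe_input, axis=0).tolist()
raw_types = df.apply(describe_input, axis=0, raw=True).tolist()

print(default_types)  # ['Series', 'Series']
print(raw_types)      # ['ndarray', 'ndarray']
```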

applymap

applymap is simple: it applies the specified function to every individual cell (element) of a DataFrame. It is not used as widely as apply, but it comes in handy in some situations, such as the following example.

For demonstration purposes, we generate a new DataFrame:

df = pd.DataFrame(
    {
        "A":np.random.randn(5),
        "B":np.random.randn(5),
        "C":np.random.randn(5),
        "D":np.random.randn(5),
        "E":np.random.randn(5),
    }
)
df

Now suppose we want to display every value in the DataFrame with two decimal places. applymap does this quickly. The code and illustration are as follows:

# Format every cell as a string with two decimal places
df.applymap(lambda x: "%.2f" % x)
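There is one caveat about this example. The lambda returns formatted strings, so every cell becomes text rather than a number. Also, pandas 2.1 deprecated DataFrame.applymap and renamed it DataFrame.map, with the same element-wise behavior. On recent versions, applymap therefore emits a FutureWarning or may be missing altogether. The sketch below uses a small hypothetical frame. It picks whichever element-wise method the installed pandas provides, then contrasts string formatting with DataFrame.round, which keeps the values numeric.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.23456, -0.5], "B": [2.0, 3.14159]})

# pandas >= 2.1 names the element-wise method DataFrame.map; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

as_text = elementwise(lambda x: "%.2f" % x)  # every cell becomes a str
as_numbers = df.round(2)                     # values stay floats, rounded to 2 places

print(as_text["A"].tolist())     # ['1.23', '-0.50']
print(as_numbers["B"].tolist())  # [2.0, 3.14]
```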

That covers the three essential moves of Pandas data processing: map, apply, and applymap. Feel free to leave your questions in the comments below!
