Hello, I’m Peter
Wishing everyone a happy Qixi Festival ❤️
The three treasure functions in Pandas
When manipulating data in Pandas, you often need to perform the same operation on a row, on a column, or even on every element.
The map, apply, and applymap methods in Pandas cover most of these needs without hand-written loops. This article uses concrete examples to show how to use these three "treasure" functions.
Pandas article series
Seventeen articles have been published in this series so far. Articles 1-16 make up the first edition of Pandas Data Analysis in Depth; Article 17 starts the next part with pivot tables and crosstabs.
Simulated data
Below is the main simulated data set used in this article: personal information for several students (the data is for learning purposes only).
import numpy as np
import pandas as pd
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang", "Chou", "Note", "Wang"],
    "sex": ["Male", "Female", "Female", "Male", "Male", "Male"],
    "birthday": ["2003-07-07", "1993-08-09", "1999-03-05", "1995-08-19", "2002-11-18", "1996-07-01"],
    "address": ["Nanshan District, Shenzhen", "Guangzhou Yuexiu District", "Hangzhou, Zhejiang", "Shanghai", "Beijing Haidian", "Wuchang, Wuhan city, Hubei Province"],
    "age": [18, 28, 22, 26, 19, 25],
    "height": [189, 178, 167, 172, 182, 185],
    "weight": [89, 72, 62, 68, 79, 81]
})
data
Let's look at the data types: the first four columns hold strings, followed by three numeric columns.
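A quick way to check this is the dtypes attribute. The sketch below uses a two-row stand-in for the student table (an assumption made to keep the example short); string columns are typically reported as object, and the exact label can differ between pandas versions.

```python
import pandas as pd

# Two-row stand-in for the article's student table (assumption: shortened for brevity).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red"],
    "sex": ["Male", "Female"],
    "birthday": ["2003-07-07", "1993-08-09"],
    "address": ["Nanshan District, Shenzhen", "Guangzhou Yuexiu District"],
    "age": [18, 28],
    "height": [189, 178],
    "weight": [89, 72],
})

# One dtype per column: the four text columns first, then the three integer columns.
print(data.dtypes)
```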
map
Let's say we now need to replace Male with 1 and Female with 0 in the sex column, because sometimes we have to work with numerical data.
How do you do that?
Method 1: loop
The simplest approach: loop over the values, check whether each one is Male or Female, and assign the number directly.
Before each operation we make a copy of the simulated data, so the original data stays intact.
Write a loop to do the assignment:
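The loop might look roughly like this. It is a sketch on a three-row stand-in for the student data; the name data1 for the copy is an assumption, and the new values are collected in a list before being written back.

```python
import pandas as pd

# Three-row stand-in for the student data (assumption: only the columns needed here).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

data1 = data.copy()  # work on a copy so the original stays untouched

# Walk through the column one value at a time and build the numeric codes.
codes = []
for value in data1["sex"]:
    codes.append(1 if value == "Male" else 0)
data1["sex"] = codes

print(data1)
```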
Method 2: map
The loop is easy to understand and easy to write, but it becomes too slow when there is a lot of data. How do we do the same thing with map?
Again, first make a copy:
1. Mapping through the dictionary
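A minimal sketch of the dictionary version, again on a three-row stand-in; data2 is an assumed name for the copy.

```python
import pandas as pd

# Three-row stand-in for the student data (assumption).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

data2 = data.copy()

# Each value in the column is looked up as a key in the dictionary.
data2["sex"] = data2["sex"].map({"Male": 1, "Female": 0})

print(data2)
```

Values that are missing from the dictionary would come back as NaN, so the dictionary should cover every category in the column.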
2. Write a function to map
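The function version could be sketched as follows; map_sex is a hypothetical helper name and the data is a three-row stand-in.

```python
import pandas as pd

# Three-row stand-in for the student data (assumption).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

def map_sex(value):
    """Return 1 for "Male" and 0 for anything else (hypothetical helper)."""
    return 1 if value == "Male" else 0

data2 = data.copy()

# Pass the function itself, without parentheses; map calls it once per value.
data2["sex"] = data2["sex"].map(map_sex)

print(data2)
```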
Summary of map: whether you pass a dictionary or a function, map feeds the values of the Series to it one by one and collects the mapped results.
apply
The apply method is used much like map, but it is more comprehensive and powerful, and it accepts more complex functions.
Parameters
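DataFrame.apply(
    func,              # function to apply
    axis=0,            # 0 or "index" (default): apply func to each column; 1 or "columns": apply it to each row
    raw=False,         # if True, pass each row or column to func as a NumPy ndarray instead of a Series; default False
    result_type=None,  # "expand", "reduce", "broadcast", or None; only takes effect when axis=1
    args=(),           # extra positional arguments passed to func (optional)
    **kwargs)          # extra keyword arguments passed to func (optional)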
Before operating on the data, make a copy named data3:
Pass in different functions
In the apply method we can pass in various functions:
- Custom function
- Anonymous Functions in Python
- Python built-in Functions
- Functions in Pandas
1. Custom functions
We pass in a custom function: the same function used above to change how gender is represented.
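As a sketch, assuming a helper named apply_sex (hypothetical) and a three-row stand-in for the data:

```python
import pandas as pd

# Three-row stand-in for the student data (assumption).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

def apply_sex(value):
    """Return 1 for "Male" and 0 for anything else (hypothetical helper)."""
    return 1 if value == "Male" else 0

data3 = data.copy()

# On a Series, apply calls the function once for each value, just like map.
data3["sex"] = data3["sex"].apply(apply_sex)

print(data3)
```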
Now suppose the age column holds actual ages and we want the nominal ages (East Asian age reckoning), which means adding 1 to every age. How do we do that?
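One way to sketch this is a custom function with an extra argument, which apply forwards through args (or through keyword arguments). The helper name add_years and its bias parameter are assumptions made for illustration.

```python
import pandas as pd

# Three-row stand-in for the student data (assumption).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "age": [18, 28, 22],
})

def add_years(age, bias):
    """Return the age shifted by `bias` years (hypothetical helper)."""
    return age + bias

data3 = data.copy()

# Extra positional arguments for the function go into `args` as a tuple.
data3["age"] = data3["age"].apply(add_years, args=(1,))

print(data3)
```

The same call can also be written with a keyword argument: data["age"].apply(add_years, bias=1).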
2. Anonymous Python functions: lambda
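For a one-off operation such as adding 1 to every age, a lambda avoids defining a named function. A minimal sketch on a three-row stand-in:

```python
import pandas as pd

# Three-row stand-in for the student data (assumption).
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "age": [18, 28, 22],
})

data3 = data.copy()

# The lambda receives one age at a time and returns the age plus 1.
data3["age"] = data3["age"].apply(lambda x: x + 1)

print(data3)
```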
3. Python built-in functions
We pass in Python's len function, which computes the length of each string:
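A sketch using the address column; the new column name address_len is an assumption.

```python
import pandas as pd

# Three addresses taken from the simulated data.
data = pd.DataFrame({
    "address": ["Nanshan District, Shenzhen", "Hangzhou, Zhejiang", "Shanghai"],
})

data3 = data.copy()

# len is called on each string, so the result is the character count per address.
data3["address_len"] = data3["address"].apply(len)

print(data3)
```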
4. Pandas built-in functions
When we simulated the data, the birthday field was stored as strings. Now we use a Pandas built-in function to convert it to a datetime type.
Before the conversion the column holds plain strings; after the conversion it has a datetime dtype.
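The conversion can be sketched with pd.to_datetime passed to apply (an assumption about which built-in is meant; calling pd.to_datetime on the whole column directly is also possible).

```python
import pandas as pd

# Three birthdays taken from the simulated data.
data = pd.DataFrame({
    "birthday": ["2003-07-07", "1993-08-09", "1999-03-05"],
})

data3 = data.copy()
print(data3["birthday"].dtype)  # before: a text dtype

# pd.to_datetime parses each string into a timestamp.
data3["birthday"] = data3["birthday"].apply(pd.to_datetime)
print(data3["birthday"].dtype)  # after: a datetime64 dtype

# Datetime columns expose the .dt accessor, for example to pull out the year.
print(data3["birthday"].dt.year)
```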
Specifying the axis
The axis argument indicates which axis to operate along. The default is axis=0, which applies the function to each column.
To illustrate this parameter, we simulate a simple data set:
Compare the default, axis=0, with axis=1:
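A small sketch of the difference, using a made-up two-column frame (the values are assumptions chosen so the sums are easy to check):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0 (default): the function receives each column, giving one result per column.
col_sums = df.apply(lambda col: col.sum())

# axis=1: the function receives each row, giving one result per row.
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(col_sums)
print(row_sums)
```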
Look at this in Excel:
result_type
The result_type parameter controls the shape and column names of the result. It only takes effect when axis=1, and it accepts three values:
- expand
- broadcast
- reduce
result_type="expand"
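A sketch of expand on made-up data: when the function returns a list for each row, the default result is a Series of lists, while expand spreads each list across new columns labelled 0, 1, and so on.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Default: each row's list is kept as one object, so the result is a Series of lists.
plain = df.apply(lambda row: [row["A"] * 2, row["B"] * 2], axis=1)

# expand: each list is turned into columns, so the result is a DataFrame.
expanded = df.apply(lambda row: [row["A"] * 2, row["B"] * 2], axis=1, result_type="expand")

print(plain)
print(expanded)
```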
result_type="broadcast"
The column names of the original DataFrame remain unchanged.
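A sketch of broadcast on made-up data: the list returned for each row is written back into a frame with the original index and column names, so it needs one value per original column.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# The returned list has one value per original column; the labels A and B are kept.
broadcasted = df.apply(lambda row: [1, 2], axis=1, result_type="broadcast")

print(broadcasted)
```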
result_type="reduce"
The result is a Series.
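A sketch of reduce on made-up data: the list returned for each row is kept as a single element, so the result is a Series rather than an expanded DataFrame.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# reduce: list-like results are not expanded into columns.
reduced = df.apply(lambda row: [1, 2], axis=1, result_type="reduce")

print(reduced)
```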
applymap
applymap has a narrower use: it performs the same operation on every element of a DataFrame.
DataFrame.applymap(func, na_action=None, **kwargs)
To illustrate, we simulate a simple data set:
As you can see, every column in the data above has dtype float64.
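Such a data set could be generated like this; the shape, the column names, and the seed are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Random floats in [0, 10); shape, column names, and seed are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 3)) * 10, columns=["A", "B", "C"])

print(df)
print(df.dtypes)  # every column is float64
```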
Add 1 to every number:
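A minimal sketch on made-up values. Since pandas 2.1 applymap is deprecated in favour of DataFrame.map, so the example picks whichever method the installed version provides.

```python
import pandas as pd

# Made-up float data for illustration.
df = pd.DataFrame({"A": [1.5, 2.25], "B": [3.0, 4.125]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# The lambda is called once for every cell in the frame.
plus_one = elementwise(lambda x: x + 1)

print(plus_one)
```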
Keep three decimal places for each value
Formatted output: keep three decimal places for each value
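Two possible readings are sketched below on made-up values: rounding keeps the values numeric, while string formatting always shows exactly three decimals but returns text. Which of the two the original screenshots showed is an assumption.

```python
import pandas as pd

# Made-up float data for illustration.
df = pd.DataFrame({"A": [1.23456, 2.5], "B": [3.14159, 4.0]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Rounding: values stay floats, trailing zeros are not shown.
rounded = elementwise(lambda x: round(x, 3))

# Formatting: values become strings with exactly three decimals.
formatted = elementwise(lambda x: f"{x:.3f}")

print(rounded)
print(formatted)
```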
Changing the data type
The values in df are float64; now convert them to str:
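A sketch on made-up values, passing the built-in str as the element-wise function:

```python
import pandas as pd

# Made-up float data for illustration.
df = pd.DataFrame({"A": [1.5, 2.25], "B": [3.0, 4.125]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# str is called on every cell, so each value becomes its text representation.
as_text = elementwise(str)

print(as_text)
print(type(as_text.iloc[0, 0]))
```

For a plain type conversion like this, df.astype(str) is the more direct route; the element-wise version is shown to stay with the function under discussion.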
Missing value processing
If the data contains missing values, use the na_action parameter to control how they are handled:
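A sketch on made-up values: with na_action="ignore" the function is not called on missing cells, so they stay NaN instead of being turned into the text "nan".

```python
import numpy as np
import pandas as pd

# Made-up data with two missing cells.
df = pd.DataFrame({"A": [1.5, np.nan], "B": [np.nan, 4.0]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Missing cells are skipped and passed through unchanged.
labelled = elementwise(lambda x: f"{x:.1f}", na_action="ignore")

print(labelled)
```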
Conclusion
Performing the same operation on the rows, columns, or elements of a data set is very common. This article used a range of examples to explain:
- map: applies the same operation to every value of a Series, which covers most Series needs
- apply: can do everything map does and is more flexible; it accepts custom, anonymous, and built-in functions for data processing
- applymap: applies the same operation to every element of a DataFrame