Using the transform() method in pandas
The official explanation
DataFrame.transform(self, func, axis=0, *args, **kwargs) -> 'DataFrame'
Call func on self producing a DataFrame with transformed values.
Produced DataFrame will have same axis length as self.
- func
function, str, list or dict. Function to use for transforming the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
  - function
  - string function name
  - list of functions and/or function names, e.g. [np.exp, 'sqrt']
  - dict of axis labels -> functions, function names or list of such.
- axis
{0 or 'index', 1 or 'columns'}, default 0. If 0 or 'index': apply function to each column. If 1 or 'columns': apply function to each row.
- *args
Positional arguments to pass to func.
- **kwargs
Keyword arguments to pass to func.
- Returns: DataFrame
A DataFrame that must have the same length as self.
- Raises: ValueError
If the returned DataFrame has a different length than self.
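As a rough illustration of the parameter description above, the sketch below passes each accepted form of func to a small frame and then triggers the ValueError with an aggregating function. The frame `demo` and its values are made up for illustration and are not part of the official documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame: 3 rows, 2 columns (values chosen for illustration).
demo = pd.DataFrame({"A": [1.0, 4.0, 9.0], "B": [0.0, 1.0, 2.0]})

# 1. A function: the result keeps the original index.
by_function = demo.transform(np.sqrt)

# 2. A string function name.
by_name = demo.transform("sqrt")

# 3. A list of functions and/or names: one output column per (column, function).
by_list = demo.transform([np.sqrt, "exp"])

# 4. A dict of column label -> function or function name.
by_dict = demo.transform({"A": np.sqrt, "B": "exp"})

# An aggregating function collapses the rows, so transform rejects it.
try:
    demo.transform("sum")
    raised = False
except ValueError:
    raised = True

print(by_list)
print("ValueError raised for 'sum':", raised)
```

Every successful call returns an object with the same index as `demo`, which is the property the Returns and Raises entries describe.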
import numpy as np
import pandas as pd
The transform method
Characteristics
The transform method is usually used together with the groupby method:
- It can produce a scalar value and broadcast it to the size of each group
- It produces an object of the same size as its input
- It must not mutate its input
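The first two points can be seen by comparing an aggregation with a transform on the same groups. This is a minimal sketch on made-up data (the frame `toy` and the column name `group_mean` are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy data: two groups of unequal size.
toy = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "values": [1.0, 2.0, 3.0, 4.0, 8.0],
})
grouped = toy.groupby("key")["values"]

# Aggregation: one scalar per group, indexed by the group key.
per_group = grouped.mean()

# transform: the same scalars broadcast back to every row of each group,
# indexed like the input, so the result can be stored as a new column.
broadcast = grouped.transform("mean")
toy["group_mean"] = broadcast

print(len(per_group), len(broadcast))
print(toy)
```

Because the transformed result shares the input's index, it can be assigned straight back to the frame without a merge.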
df = pd.DataFrame({
    "key": ["a", "b", "c"] * 4,
    "values": np.arange(12.0)
})
df
| | key | values |
|---|---|---|
0 | a | 0.0 |
1 | b | 1.0 |
2 | c | 2.0 |
3 | a | 3.0 |
4 | b | 4.0 |
5 | c | 5.0 |
6 | a | 6.0 |
7 | b | 7.0 |
8 | c | 8.0 |
9 | a | 9.0 |
10 | b | 10.0 |
11 | c | 11.0 |
Grouping
g = df.groupby("key")["values"]  # group the "values" column by "key"
g.mean()  # mean of each group
key
a 4.5
b 5.5
c 6.5
Name: values, dtype: float64
Using transform
Each position is replaced by the mean of its group
Passing an anonymous function
g.transform(lambda x: x.mean())
0 4.5
1 5.5
2 6.5
3 4.5
4 5.5
5 6.5
6 4.5
7 5.5
8 6.5
9 4.5
10 5.5
11 6.5
Name: values, dtype: float64
Passing a function's string alias, as with the agg method
For built-in aggregation functions such as max, min, sum and mean, pass the alias directly
g.transform("mean")
0 4.5
1 5.5
2 6.5
3 4.5
4 5.5
5 6.5
6 4.5
7 5.5
8 6.5
9 4.5
10 5.5
11 6.5
Name: values, dtype: float64
transform also works with functions that return a Series
g.transform(lambda x: x * 2)
0 0.0
1 2.0
2 4.0
3 6.0
4 8.0
5 10.0
6 12.0
7 14.0
8 16.0
9 18.0
10 20.0
11 22.0
Name: values, dtype: float64
Descending rank
g.transform(lambda x: x.rank(ascending=False))
0 4.0
1 4.0
2 4.0
3 3.0
4 3.0
5 3.0
6 2.0
7 2.0
8 2.0
9 1.0
10 1.0
11 1.0
Name: values, dtype: float64
Similarity to apply
Pass a function directly to transform
def normalize(x):
    return (x - x.mean()) / x.std()
g.transform(normalize)
0 -1.161895
1 -1.161895
2 -1.161895
3 -0.387298
4 -0.387298
5 -0.387298
6 0.387298
7 0.387298
8 0.387298
9 1.161895
10 1.161895
11 1.161895
Name: values, dtype: float64
# Same values as above. On newer pandas (2.0 and later) the result's index
# also carries the group key unless groupby is called with group_keys=False.
g.apply(normalize)
0 -1.161895
1 -1.161895
2 -1.161895
3 -0.387298
4 -0.387298
5 -0.387298
6 0.387298
7 0.387298
8 0.387298
9 1.161895
10 1.161895
11 1.161895
Name: values, dtype: float64
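The two methods only agree when the function returns something shaped like its input. With an aggregating function they differ, which the following sketch tries to show using the same data as above (the names `applied` and `transformed` are illustrative).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c"] * 4, "values": np.arange(12.0)})
g = df.groupby("key")["values"]

# With an aggregating function, apply returns one value per group ...
applied = g.apply(lambda x: x.mean())

# ... while transform broadcasts that value back to every row of the group.
transformed = g.transform(lambda x: x.mean())

print(applied.shape, transformed.shape)
```

So apply is the more general tool, while transform guarantees a result aligned with the original rows.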
Normalization example
normalized = (df["values"] - g.transform("mean")) / g.transform("std")  # pass the built-in aggregation functions by alias
normalized
0 -1.161895
1 -1.161895
2 -1.161895
3 -0.387298
4 -0.387298
5 -0.387298
6 0.387298
7 0.387298
8 0.387298
9 1.161895
10 1.161895
11 1.161895
Name: values, dtype: float64
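A quick way to confirm that the alias-based expression matches the custom normalize function is to compare the two results numerically. This is a minimal sketch that rebuilds the same data; the names `via_function` and `via_builtins` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c"] * 4, "values": np.arange(12.0)})
g = df.groupby("key")["values"]

def normalize(x):
    # Standardize one group: subtract its mean, divide by its sample std.
    return (x - x.mean()) / x.std()

via_function = g.transform(normalize)
via_builtins = (df["values"] - g.transform("mean")) / g.transform("std")

print(np.allclose(via_function, via_builtins))
```

The alias form avoids calling a Python function once per group, which is usually the faster option on large data.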
The official example
df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
df
| | A | B |
|---|---|---|
0 | 0 | 1 |
1 | 1 | 2 |
2 | 2 | 3 |
df.transform(lambda x: x + 1)  # add 1 to each element
| | A | B |
|---|---|---|
0 | 1 | 2 |
1 | 2 | 3 |
2 | 3 | 4 |
s = pd.Series(range(3))
s
0 0
1 1
2 2
dtype: int64
s.transform([np.sqrt, np.exp])  # pass in a list of functions
| | sqrt | exp |
|---|---|---|
0 | 0.000000 | 1.000000 |
1 | 1.000000 | 2.718282 |
2 | 1.414214 | 7.389056 |
Understanding the Transform Function in Pandas
The article of this title gives a complete example of how to use the transform method
The original data
The problem to solve
You can see in the data that the file contains 3 different orders (10001, 10005 and 10006) and that each order consists of multiple products (aka SKUs).
The question we would like to answer is: “What percentage of the order total does each SKU represent?”
For example, if we look at order 10001 with a total of $576.12, the breakdown would be:
- B1-20000 = $235.83 or 40.9%
- S1-27722 = $232.32 or 40.3%
- B1-86481 = $107.97 or 18.7%
In other words, work out what percentage of its order total each item's price represents
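The quoted breakdown can be reproduced with plain arithmetic. This is a minimal sketch using only the three prices listed above; the variable names are illustrative.

```python
# Line items of order 10001 as quoted above (SKU -> extended price in dollars).
items = {"B1-20000": 235.83, "S1-27722": 232.32, "B1-86481": 107.97}

order_total = sum(items.values())

# Each item's share of the order total, as a percentage.
shares = {sku: price / order_total * 100 for sku, price in items.items()}

print(f"order total: {order_total:.2f}")
for sku, pct in shares.items():
    print(f"{sku}: {pct:.1f}%")
```

The rest of the section does the same calculation for every order at once.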
The traditional method
Compute the total for each order, merge it back into the original data, and then compute each row's share
import pandas as pd
df = pd.read_excel("sales_transactions.xlsx")
df.groupby('order')["ext price"].sum()
order
10001 576.12
10005 8185.49
10006 3724.49
Name: ext price, dtype: float64
order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()  # per-order totals as a DataFrame with an Order_Total column
df_1 = df.merge(order_total)  # merge the original df with the order_total data
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]  # add the Percent_of_Order column
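Since sales_transactions.xlsx is not reproduced here, the three steps can be tried on a small stand-in frame. In this sketch, order 10001 uses the prices quoted earlier, while order 99999 and its SKU names are made up purely for illustration.

```python
import pandas as pd

# Stand-in for sales_transactions.xlsx. Order 99999 is hypothetical.
df = pd.DataFrame({
    "order": [10001, 10001, 10001, 99999, 99999],
    "sku": ["B1-20000", "S1-27722", "B1-86481", "X-1", "X-2"],
    "ext price": [235.83, 232.32, 107.97, 30.0, 70.0],
})

# Step 1: one total per order, as a two-column DataFrame.
order_total = (
    df.groupby("order")["ext price"].sum().rename("Order_Total").reset_index()
)

# Step 2: merge on the shared "order" column so every row carries its total.
df_1 = df.merge(order_total, on="order")

# Step 3: each row's share of its order total.
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]

print(df_1)
```

Passing `on="order"` makes the join key explicit instead of relying on merge to infer the common column.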
Using transform
transform + groupby: group by order, then sum within each group, with the result aligned to the original rows
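One way this might look, as a sketch under the same assumptions as the merge-based version above: the column names `order` and `ext price` come from the source, while the stand-in data (in particular order 99999) is made up because the Excel file is not available here.

```python
import pandas as pd

# Stand-in for sales_transactions.xlsx. Order 99999 is hypothetical.
df = pd.DataFrame({
    "order": [10001, 10001, 10001, 99999, 99999],
    "sku": ["B1-20000", "S1-27722", "B1-86481", "X-1", "X-2"],
    "ext price": [235.83, 232.32, 107.97, 30.0, 70.0],
})

# transform("sum") returns one total per row, already aligned with df's index,
# so no rename, reset_index or merge step is needed.
df["Order_Total"] = df.groupby("order")["ext price"].transform("sum")
df["Percent_of_Order"] = df["ext price"] / df["Order_Total"]

print(df)
```

The grouping and broadcasting happen in a single call, which replaces the separate aggregate and merge steps.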