Advanced pandas usage: the transform method

Using the transform() method in pandas

The official documentation

DataFrame.transform(self, func, axis=0, *args, **kwargs) -> 'DataFrame'
Call func on self producing a DataFrame with transformed values.

Produced DataFrame will have same axis length as self.

func

function, str, list or dict. Function to use for transforming the data. If a function, it must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

- function
- string function name
- list of functions and/or function names, e.g. [np.exp, 'sqrt']
- dict of axis labels -> functions, function names or list of such.
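
As a minimal sketch of the four accepted forms, assuming a small made-up frame (the name `demo` and its values are illustrative and not part of the official documentation):

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, chosen so that sqrt and exp are well defined.
demo = pd.DataFrame({"A": [1.0, 4.0, 9.0], "B": [0.0, 1.0, 2.0]})

# 1. A function: applied to each column.
by_function = demo.transform(np.sqrt)

# 2. A string function name.
by_name = demo.transform("sqrt")

# 3. A list of functions and/or names: one output column
#    for every (input column, function) pair.
by_list = demo.transform([np.exp, "sqrt"])

# 4. A dict of axis labels -> functions: a different function per column.
by_dict = demo.transform({"A": np.sqrt, "B": np.exp})
```

In every case the result has the same number of rows as `demo`; only the list form changes the number of columns.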

axis

{0 or 'index', 1 or 'columns'}, default 0. If 0 or 'index': apply function to each column. If 1 or 'columns': apply function to each row.
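
The effect of axis can be sketched as follows, assuming a made-up frame (`demo` and the lambda names are illustrative only):

```python
import pandas as pd

# Hypothetical demo frame for illustration.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [3, 2, 1]})

# axis=0 (default): the function receives each column in turn,
# so each value becomes its share of the column total.
share_of_column = demo.transform(lambda col: col / col.sum())

# axis=1: the function receives each row in turn,
# so each value becomes its share of the row total.
share_of_row = demo.transform(lambda row: row / row.sum(), axis=1)
```

Both results keep the shape of `demo`; only the direction in which the function sees the data differs.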

*args

Positional arguments to pass to func.

**kwargs

Keyword arguments to pass to func.

Returns: DataFrame

A DataFrame that must have the same length as self.

Raises: ValueError

If the returned DataFrame has a different length than self.
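
A minimal sketch of the length rule, assuming a small made-up frame (`demo` is illustrative): an element-wise function is accepted, while an aggregation such as "sum" is rejected because it collapses each column to a single value.

```python
import pandas as pd

# Hypothetical demo frame: three rows, two columns.
demo = pd.DataFrame({"A": [0, 1, 2], "B": [1, 2, 3]})

# An element-wise function keeps the shape, so transform accepts it.
shifted = demo.transform(lambda x: x + 1)

# "sum" reduces each column to one value, so transform refuses the result.
try:
    demo.transform("sum")
    raised = False
except ValueError:
    raised = True

print(raised)
```

The exact error message differs between pandas versions, so the sketch only checks the exception type.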

import numpy as np 
import pandas as pd

The transform method

Characteristics

The transform method is usually used in conjunction with the groupby method

The function can return a scalar value, which is broadcast to the size of each group
Alternatively, it can return an object of the same size as the input group
The function must not modify its input
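
These points can be sketched by comparing agg with transform on a small toy frame (the names `demo`, `grouped`, `per_group` and `per_row` are illustrative):

```python
import numpy as np
import pandas as pd

# Toy data: three keys repeated four times.
demo = pd.DataFrame({"key": ["a", "b", "c"] * 4, "values": np.arange(12.0)})
grouped = demo.groupby("key")["values"]

# agg returns one row per group.
per_group = grouped.agg("mean")

# transform broadcasts each group's scalar back to every row of that group,
# so the result is aligned with the original index.
per_row = grouped.transform("mean")

# The input frame is left unchanged; the result is stored in a new object.
demo_with_mean = demo.assign(group_mean=per_row)

print(len(per_group), len(per_row))
```

Because the transformed result shares the index of the input, it can be assigned as a new column without any merge.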

df = pd.DataFrame({
    "key": ["a", "b", "c"] * 4,
    "values": np.arange(12.0)
})
df

	key	values
0	a	0.0
1	b	1.0
2	c	2.0
3	a	3.0
4	b	4.0
5	c	5.0
6	a	6.0
7	b	7.0
8	c	8.0
9	a	9.0
10	b	10.0
11	c	11.0

Grouping

g = df.groupby("key")["values"]  # Group by key and select the "values" column
g.mean()  # Mean of each group

key
a    4.5
b    5.5
c    6.5
Name: values, dtype: float64

Using transform

Replace each value with the mean of its group

Passing an anonymous function

g.transform(lambda x: x.mean())

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: values, dtype: float64

Passing a string alias, as with the agg method

Built-in aggregation functions such as max, min, sum and mean can be passed directly by their string alias.

g.transform("mean")

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: values, dtype: float64
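
The other aliases work the same way. A short sketch, rebuilding the same toy data so it runs on its own (the `summary` frame and its column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c"] * 4, "values": np.arange(12.0)})
g = df.groupby("key")["values"]

# Each alias yields one value per group, broadcast to every row of that group.
summary = pd.DataFrame({
    "key": df["key"],
    "values": df["values"],
    "group_max": g.transform("max"),
    "group_min": g.transform("min"),
    "group_sum": g.transform("sum"),
})

print(summary.head(3))
```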

Using transform with a function that returns a Series

g.transform(lambda x: x * 2)

0      0.0
1      2.0
2      4.0
3      6.0
4      8.0
5     10.0
6     12.0
7     14.0
8     16.0
9     18.0
10    20.0
11    22.0
Name: values, dtype: float64

Ranking in descending order within each group

g.transform(lambda x: x.rank(ascending=False))

0     4.0
1     4.0
2     4.0
3     3.0
4     3.0
5     3.0
6     2.0
7     2.0
8     2.0
9     1.0
10    1.0
11    1.0
Name: values, dtype: float64

Similarity to apply

Passing a named function directly to transform

def normalize(x):
    return (x - x.mean()) / x.std()

g.transform(normalize)

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: values, dtype: float64

g.apply(normalize)  # Same values as above (newer pandas versions may also add the group key to the index)

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: values, dtype: float64

Normalization example

# Built-in aggregation functions are passed directly by their string alias
normalized = (df["values"] - g.transform("mean")) / g.transform("std")
normalized

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: values, dtype: float64
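
As a quick check that the two normalization routes agree, the sketch below rebuilds the toy data and compares them numerically (`via_function`, `via_aliases` and `same` are illustrative names). String aliases usually dispatch to optimized group-wise code and so tend to be faster on large data, though that is a general expectation rather than something measured here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "c"] * 4, "values": np.arange(12.0)})
g = df.groupby("key")["values"]

def normalize(x):
    # Standardize one group: subtract its mean, divide by its sample std.
    return (x - x.mean()) / x.std()

# Route 1: a Python function called once per group.
via_function = g.transform(normalize)

# Route 2: built-in aliases, combined with ordinary arithmetic.
via_aliases = (df["values"] - g.transform("mean")) / g.transform("std")

same = np.allclose(via_function, via_aliases)
print(same)
```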

The official example

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
df

	A	B
0	0	1
1	1	2
2	2	3

df.transform(lambda x: x + 1)  # Add 1 to each element

	A	B
0	1	2
1	2	3
2	3	4

s = pd.Series(range(3))
s

0    0
1    1
2    2
dtype: int64

s.transform([np.sqrt, np.exp])  # Pass in a list of functions

	sqrt	exp
0	0.000000	1.000000
1	1.000000	2.718282
2	1.414214	7.389056

Understanding the Transform Function in Pandas

There is a complete example on this site that explains the use of the Transform method

The original data

The problem to solve

You can see in the data that the file contains 3 different orders (10001, 10005 and 10006) and that each order consists of multiple products (aka SKUs).

The question we would like to answer is: “What percentage of the order total does each SKU represent?”

For example, if we look at order 10001 with a total of $576.12, the break down would be:

B1-20000 = $235.83 or 40.9%
S1-27722 = $232.32 or 40.3%
B1-86481 = $107.97 or 18.7%
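
The arithmetic behind this breakdown can be reproduced in a few lines of plain Python. This is only a sketch; the variable names are illustrative and the three prices are the ones listed above.

```python
# Extended prices for order 10001, taken from the breakdown above.
ext_prices = {"B1-20000": 235.83, "S1-27722": 232.32, "B1-86481": 107.97}

# The order total is the sum of its line items.
order_total = sum(ext_prices.values())

# Share of the order total for each SKU, as a percentage
# rounded to one decimal place.
percent_of_order = {
    sku: round(100 * price / order_total, 1)
    for sku, price in ext_prices.items()
}

print(f"{order_total:.2f}", percent_of_order)
```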

Figure out the percentage of each item’s price in the order

The traditional method

Compute the total of each order, then merge it back into the original data

import pandas as pd
df = pd.read_excel("sales_transactions.xlsx")

df.groupby('order')["ext price"].sum()

order
10001     576.12
10005    8185.49
10006    3724.49
Name: ext price, dtype: float64

# Per-order totals as a DataFrame with "order" and "Order_Total" columns
order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()
# Merge the totals back into the original df (joins on the shared "order" column)
df_1 = df.merge(order_total)
# Add the Percent_of_Order column
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]

Using transform

transform + groupby: group by order, then broadcast each order's sum back to its rows
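
A minimal sketch of this approach, using an in-memory stand-in for sales_transactions.xlsx. Only the three rows of order 10001 come from the text above; order 99999, its SKUs, and the frame name `sales` are made up for illustration.

```python
import pandas as pd

# Stand-in data: order 10001 is from the article, order 99999 is hypothetical.
sales = pd.DataFrame({
    "order": [10001, 10001, 10001, 99999, 99999],
    "sku": ["B1-20000", "S1-27722", "B1-86481", "HYPO-1", "HYPO-2"],
    "ext price": [235.83, 232.32, 107.97, 60.0, 40.0],
})

# transform("sum") returns the order total aligned to every original row,
# so no rename, reset_index or merge step is needed.
sales["Order_Total"] = sales.groupby("order")["ext price"].transform("sum")
sales["Percent_of_Order"] = sales["ext price"] / sales["Order_Total"]

print(sales)
```

The two assignments replace the groupby, rename, reset_index and merge steps of the traditional method while producing the same two columns.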

transform illustrated