The groupby() method in pandas (a Python library) is used to group data and perform calculations within each group.

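When a groupby call is evaluated in IPython, the output is only the repr of the resulting DataFrameGroupBy object, including its memory address, not the grouped data. A DataFrameGroupBy object can be converted to another type, such as a list, to inspect its contents.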

groupby splits the original DataFrame into several sub-DataFrames, one for each distinct value of the groupby field.
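The splitting step can be sketched as follows. This is a minimal illustration, assuming a hypothetical DataFrame whose column names (company, salary, age) and values are made up for the example and do not come from the original data.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
    "age": [25, 32, 41, 29, 37, 45],
})

grouped = df.groupby("company")

# Evaluating `grouped` on its own shows only the object's repr, not the data.
print(type(grouped).__name__)  # DataFrameGroupBy

# Converting to a list yields (group key, sub-DataFrame) pairs.
groups = list(grouped)
for key, sub_df in groups:
    print(key)
    print(sub_df)
```

Each sub-DataFrame keeps the row labels it had in the original DataFrame.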

1. Functions after groupby

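1.1 describe()

describe() reports basic statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the data within each group.

unstack() rearranges the index of the result, for example by pivoting a level of a hierarchical row index into columns.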
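A short sketch of both calls, assuming the same kind of hypothetical company/salary/age data (illustrative names and values). In recent pandas versions describe() on a grouped column returns a DataFrame with a single-level row index, and calling unstack() on such a DataFrame returns a Series indexed by (statistic, group key).

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
    "age": [25, 32, 41, 29, 37, 45],
})

# One row per company, one column per statistic.
stats = df.groupby("company")["salary"].describe()
print(stats)

# Rearranged index: a Series keyed by (statistic, company).
reshaped = stats.unstack()
print(reshaped["mean"])  # mean salary of each company
```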

1.2 agg aggregation operations

Aggregation operations compute one summary value per group, such as the sum, mean, maximum, or minimum. The table below lists the common aggregation functions in pandas.

Function    Description
min         Minimum
max         Maximum
sum         Sum
mean        Mean
median      Median
std         Standard deviation
var         Variance
count       Count
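The functions in the table can be passed to agg() by name. The sketch below assumes hypothetical company/salary/age data; the output column names min_salary, max_salary and median_age are likewise made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
    "age": [25, 32, 41, 29, 37, 45],
})

# A single aggregation applied to the selected columns.
means = df.groupby("company")[["salary", "age"]].agg("mean")
print(means)

# Different aggregations per column, using named aggregation:
# output_name=(input_column, function_name).
summary = df.groupby("company").agg(
    min_salary=("salary", "min"),
    max_salary=("salary", "max"),
    median_age=("age", "median"),
)
print(summary)
```

Either way, the result has one row per group.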

as_index=False keeps the group keys as ordinary columns of the result instead of turning them into its index.

first() returns the first entry of each column within each group.

tail(n=1) keeps the last n rows of each group.
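These three options can be compared side by side. This is a minimal sketch on hypothetical company/salary/age data (illustrative names and values).

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
    "age": [25, 32, 41, 29, 37, 45],
})

# Default: the group key becomes the index of the result.
keyed = df.groupby("company").mean()

# as_index=False: the group key stays a regular column.
flat = df.groupby("company", as_index=False).mean()

# First entry of each column per group, indexed by the group key.
first_rows = df.groupby("company").first()

# Last row of each group, keeping the original row labels.
last_rows = df.groupby("company").tail(n=1)

print(keyed, flat, first_rows, last_rows, sep="\n\n")
```

Note that tail() does not aggregate: it filters the original rows and keeps their order and index.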

1.3 transform

The difference between transform and agg is that agg calculates the mean value for each of companies A, B and C and returns one value per group. transform instead produces a result for every row of the original data: rows in the same group receive the same value, and after the group means are calculated, the results are returned in the order of the original index. If this is unclear, compare the transform output with the agg output.
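The contrast can be sketched on a small example, assuming hypothetical company/salary data for companies A, B and C (the values and the avg_salary column name are illustrative).

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
})

# agg: one mean per company (3 values, indexed by company).
agg_mean = df.groupby("company")["salary"].agg("mean")
print(agg_mean)

# transform: one mean per original row (6 values, original index order).
df["avg_salary"] = df.groupby("company")["salary"].transform("mean")
print(df)
```

Because the transform result is aligned with the original index, it can be stored directly as a new column.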

1.4 apply

  1. apply and transform

Let’s start with the similarities and differences between apply() and transform().

Similarities:

Both compute features from a DataFrame and are often used together with the groupby() method.

Differences:

apply() can be used with custom functions, from simple summation functions to complex functions that compute differences between features. (Note: apply() expects a callable. The string shortcuts accepted by agg()/transform(), such as 'sum', 'max', 'min', 'count', may not be accepted by apply(), depending on the pandas version.)
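A minimal sketch of apply() with custom functions, assuming hypothetical company/salary/age data. The helper salary_of_oldest is a made-up name for illustration; it reads two columns of each group's sub-DataFrame at once.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
    "age": [25, 32, 41, 29, 37, 45],
})

def salary_of_oldest(group):
    """Hypothetical helper: salary of the oldest person in the group."""
    return group.loc[group["age"].idxmax(), "salary"]

# The function receives each group's sub-DataFrame with both columns.
oldest_salary = df.groupby("company")[["salary", "age"]].apply(salary_of_oldest)
print(oldest_salary)

# A simple summation passed as a callable rather than a string.
totals = df.groupby("company")["salary"].apply(lambda s: s.sum())
print(totals)
```

Selecting the columns before calling apply() keeps the grouping column out of the sub-DataFrames passed to the function.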

A custom function passed to transform() cannot combine several feature columns, because transform() evaluates each column on its own. There are three things to remember when using the transform() method:

1. It evaluates one column at a time, so the column to operate on is usually selected after groupby() and before .transform(), which is very different from apply().

2. Because it works on one column at a time, it is much less general than apply(); typical uses are the per-column maximum, minimum, mean, variance, or binning.

3. What else is transform() for? The simplest case is assigning the result of a function back to the original DataFrame, because the result has one value per original row, that is, shape (len(df), 1) for a single column. Note: when used together with groupby(), every row in a group carries the same value, so the result must be de-duplicated if one value per group is wanted.
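The three points can be sketched together, assuming hypothetical company/salary data. The column names max_salary and salary_centered are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
df = pd.DataFrame({
    "company": ["A", "B", "A", "C", "B", "C"],
    "salary": [10, 20, 30, 40, 50, 60],
})

# Points 1 and 3: select the column, then assign the result back to df.
df["max_salary"] = df.groupby("company")["salary"].transform("max")

# Point 2: a custom function sees one column of one group at a time.
df["salary_centered"] = (
    df.groupby("company")["salary"].transform(lambda s: s - s.mean())
)
print(df)

# The group maximum is repeated on every row of a group, so drop
# duplicates to get one value per company.
per_company = df[["company", "max_salary"]].drop_duplicates()
print(per_company)
```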