It’s Wednesday, the hardest day of the week

Midweek, and it's still hot today

36 degrees, in May

A few words floated across the sky

Best to stay indoors and learn pandas

Grouping a DataFrame with Index Levels and Columns

Index levels and columns can be mixed together as group keys

Let's start with an example (without one, it's hard to explain anything)

import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

# Two-level index: level 0 is named "first", level 1 is named "second"
index = pd.MultiIndex.from_arrays(arrays=arrays, names=['first', 'second'])

df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

print(df)

With an example, there is output to show

              A  B
first second      
bar   one     3  1
      two     1  1
baz   one     4  1
      two     5  1
foo   one     9  2
      two     2  2
qux   one     6  3
      two     1  3

Next comes the main trick

I'm going to group by the index level second and by column B

Code first, results later

grouped = df.groupby([pd.Grouper(level=1),'B']).sum()
print(grouped)

pd.Grouper(level=1) refers to the second index level (the one named second), and 'B' refers to column B

The main point is to let you see clearly how the groups are computed
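To make the calculation easier to follow, here is a small sketch that rebuilds the same frame and checks one group by hand. Nothing new is assumed beyond the data above.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

# Group by index level 1 ("second") together with column B, then sum column A.
grouped = df.groupby([pd.Grouper(level=1), 'B']).sum()
print(grouped)

# The rows with second == 'one' and B == 1 are (bar, one) and (baz, one),
# so their A values 3 and 4 are added together.
print(grouped.loc[('one', 1), 'A'])
```

Each pair (second, B) that occurs in the data becomes one row of the result, and A is summed within that pair.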

Of course, you can also refer to the index level by name

df.groupby([pd.Grouper(level='second'), 'A']).sum()

This picks the same index level as level=1 above; note that the column key here is A rather than B

We can even abbreviate it and pass the level name directly

df.groupby(['second', 'A']).sum()
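A quick way to convince yourself that the Grouper form and the plain-string form agree is to compare the two results. This is only a sketch; the names by_grouper and by_name are made up for illustration.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

# Explicit form: name the index level through pd.Grouper.
by_grouper = df.groupby([pd.Grouper(level='second'), 'A']).sum()

# Short form: a string that matches an index level name is resolved to that level.
by_name = df.groupby(['second', 'A']).sum()

# Compare the two results element by element.
print(by_grouper.equals(by_name))
```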

After grouping, you can iterate over the groups or select a single one

We've covered this part before

Let's bring it out again and go over it

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

print(df)

grouped = df.groupby('A')

# Each iteration yields the group name and the sub-DataFrame for that group
for name, group in grouped:
    print(name)
    print(group)

You can see that the group names are bar and foo

To iterate, just use a for ... in loop

bar
     A    B  C  D
0  bar  one  3  1
1  bar  two  1  1
foo
     A      B  C  D
2  foo    one  4  1
3  foo    two  5  1
4  foo    one  9  2
5  foo    two  2  2
6  foo  three  6  3

If you group by several keys, for example groupby(['A', 'B']),

each group name naturally becomes a tuple
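Here is a small sketch of what those tuple names look like when iterating, using the same data as above. The list name is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# With two keys, every group name is a tuple of (value of A, value of B).
names = [name for name, group in df.groupby(['A', 'B'])]
print(names)
```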

bars = grouped.get_group('bar') # by group name
print(bars)

And with two keys?

df.groupby(['A', 'B']).get_group(('bar', 'one'))

Yes, this way the selection is more precise

Now it gets a little harder: aggregate functions

First take a look at the built-in aggregate functions

sum(), mean(), max(), min(), count(), size(), describe()

Only a few are listed, because I didn't write them all out

We've used these many times already
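As a quick refresher, here is a minimal sketch of a few of these built-ins on column C, grouped by A. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby('A')['C']

total = grouped.sum()     # sum of C within each group
average = grouped.mean()  # mean of C within each group
rows = grouped.size()     # number of rows in each group

print(total)
print(average)
print(rows)
```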

Then we can look at something more advanced

Custom functions passed into the agg method

Let’s go back to the data

     A      B  C  D
0  bar    one  3  1
1  bar    two  1  1
2  foo    one  4  1
3  foo    two  5  1
4  foo    one  9  2
5  foo    two  2  2
6  foo  three  6  3

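Group by A and B. A has 2 values and B has 3 values, but only the combinations that actually occur in the data become groups: bar appears with one and two, foo appears with one, two and three, so 5 groups are formed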
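If you want to check the group count yourself, a sketch like the following works on the same data; ngroups and size() are standard groupby members.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby(['A', 'B'])

# Number of groups that were actually formed.
print(grouped.ngroups)

# One row per (A, B) combination present in the data, with its row count.
sizes = grouped.size()
print(sizes)
```

Note that a combination such as (bar, three) never shows up, because no row has it.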

Keep your eyes open and don't blink. Here comes the operation

grouped = df.groupby(['A', 'B'])
print(grouped.agg('mean'))        # mean of every remaining column (C and D)

grouped = df.groupby(['A', 'B'])
print(grouped['C'].agg('mean'))   # mean of column C only

Taking it one step further, apply several aggregate functions to a single column

print(grouped['C'].agg(['mean', 'sum']))

Very impressive. There you go

Going further, you can also rename the result columns while applying several aggregations

# Each (name, function) pair produces one result column with that name
print(grouped['C'].agg([('A', 'mean'), ('B', 'max')]))
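Another way to get renamed columns is pandas' named aggregation with keyword arguments, sketched below. The column names mean_c and max_c are made up for this example; they are easier to read than reusing A and B, which are already the group keys.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby(['A', 'B'])

# Keyword name = result column, value = aggregation applied to column C.
# mean_c and max_c are hypothetical names chosen for illustration.
result = grouped['C'].agg(mean_c='mean', max_c='max')
print(result)
```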

Apply different aggregate functions to different columns

print(grouped.agg({'C': ['sum', 'mean'], 'D': ['min', 'max']}))
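When a dict of lists is passed, the result has two-level column labels (source column, function name). A short sketch of what that looks like and how to pick one of them:

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

result = df.groupby(['A', 'B']).agg({'C': ['sum', 'mean'], 'D': ['min', 'max']})

# Column labels are (source column, function) pairs.
print(result.columns.tolist())

# Select a single aggregated column with its two-part label.
print(result[('C', 'sum')])
```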

All of this is agg at work, and I could go on

In groupby, you can also keep the group keys as ordinary columns instead of turning them into the index. Note the key parameter as_index=False


grouped = df.groupby(['A', 'B'], as_index=False)

print(grouped.agg({'C': ['sum', 'mean'], 'D': ['min', 'max']}))
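The difference is easiest to see with a single aggregation, side by side. This is only a sketch; indexed and flat are names made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# Default: the group keys A and B become the index of the result.
indexed = df.groupby(['A', 'B'])['C'].sum()
print(indexed)

# as_index=False: A and B stay as ordinary columns next to the aggregated C.
flat = df.groupby(['A', 'B'], as_index=False)['C'].sum()
print(flat)
```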

The last operation: inside agg, you can pass an aggregate function that you define yourself

Tutorials usually use the following example, and of course I'm no exception

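# Select the numeric columns only; B holds strings, which cannot be subtracted
grouped = df.groupby('A')[['C', 'D']]

def max_min(group):
    # Range of a column within one group: largest value minus smallest
    return group.max() - group.min()

print(grouped.agg(max_min))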

That is agg(custom function)

The custom function here can also be a lambda
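A sketch of the lambda form, computing the same max minus min range. The numeric columns C and D are selected explicitly here, on the assumption that the string column B should be left out of the subtraction.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# Numeric columns only: subtracting strings in column B would raise an error.
grouped = df.groupby('A')[['C', 'D']]

# The lambda receives one column of one group at a time (a Series).
by_lambda = grouped.agg(lambda s: s.max() - s.min())
print(by_lambda)
```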

Confused, confused is ok, take the phone

Get it here. Get it here