Use Pandas to implement group_concat for SQL

Public account: You and the cabin by: Peter Editor: Peter

Hello, I’m Peter

The group_concAT operation can be implemented in Pandas. The group_concat operation can be implemented in Pandas.

group_concat

What does group_concat do in SQL or MySQL? Look at an example.

Here is a simple copy of the data stored in the table Information, with two fields ID and name:

+----+-----+
| id | name |
+------+---+
|1  | 10   |
|1  | 20   |
|1  | 20   |
|2  | 20   |
|3  | 200  |
|3  | 500  |
Copy the code

Method 1: Default

We group by ID, place names on the same line and separate them with commas:

select 
    id
    ,group_concat(name)  as name
from information 
group by id;
Copy the code

The result is:

|id|  name|
|1 |10.20.20|
|2 |20      |
|3 |200.500|
Copy the code

Method 2: Specify symbols

The above results are separated by commas by default, and we can specify our own symbols:

select 
    id
    ,group_concat(name separator '; ')  as name
from information 
group by id;
Copy the code

The result is:

|id|  name|
|1 |10;20;20|   -- Separated by semicolons
|2 |20      |
|3 |200;500|
Copy the code

Method 3: Delete the display

We can also group by ID, remove redundant (duplicate) data, and put the rest together; For example, if id=1 repeats 20, we want to display only one 20:

The keyword distinct was added:

select 
    id
    ,group_concat(distinct name)  as name
from information 
group by id;
Copy the code

The corresponding results are shown as follows:

|id|  name|
|1 |10.20|   -- Only a 20 is displayed
|2 |20    |
|3 |200.500|
Copy the code

Mode 4: Arrange them in descending order

In all of the above cases, the data is sorted in ascending order, and we can also order in descending order:

select 
    id
    ,group_concat(name order by name desc)  as name  
from information 
group by id;
Copy the code

Then the result displayed is:

The results are already in descending order

|id|  name|
|1 |20.20.10|
|2 |20    |
|3 |500.200|
Copy the code

Pandas is used to implement group_concat.

Simulated data

import pandas as pd
import numpy as np
Copy the code

df = pd.DataFrame({
  "name": ["Xiao Ming"."Xiao Ming"."Xiao Ming"."Little red"."Zhang"."Zhang"]."score": [10.20.20.20.200.500]
})

df
Copy the code

It is clear that we need to group The score of Xiao Ming, Xiao Hong and Xiao Zhang together.

Mode 1: Default group

Achieve default grouping, ascending order and no weight. There are three main steps:

1. Groupby groupby

2. After grouping, put all scores in a list through list

3. The third step is just index rearrangement

Method 2: Specify symbols

To specify a specific symbol, we use the join function. Since this function can only operate on strings, we need to convert numeric data in df to strings:

df.astype(str)
Copy the code

Method 3: Delete the display

Group by name field, and then use unique function for score. I’m just going to rearrange it

Mode 4: Arrange them in descending order

1. Let’s start with the default ascending order

2. Use apply function again for score field, and use sorted function for list to achieve descending order

Dear friend, have you learned?

Use Pandas to implement group_concat for SQL

group_concat

Method 1: Default

Method 2: Specify symbols

Method 3: Delete the display

Mode 4: Arrange them in descending order

Simulated data

Mode 1: Default group

Method 2: Specify symbols

Method 3: Delete the display

Mode 4: Arrange them in descending order

Related Posts

Implementation of K-means clustering in Python Machine Learning Essays

Using PyTorch to construct neural network model for handwriting recognition

Text Analysis based on TF-IDF — SHORT message Recognition and classification