Public account: You and the cabin by: Peter Editor: Peter

Hello, I’m Peter

Concat + Join +append

In my previous article, I covered the use of merge, one of the most commonly used functions in pandas. In this article, I covered three other functions related to merges:

  • concat
  • join
  • append

Pandas serial

This is the 15th article in the Pandas database.

Simulated data

The first is to simulate several different pieces of data:

import pandas as pd
import numpy as np
Copy the code

concat

Concat is also a common merge function. Here are some examples to illustrate its use.

parameter

pandas.concat(objs,  # merge object
              axis=0.# merge direction, default is 0 vertical direction
              join='outer'.Select outer from inner or union outer
              ignore_index=False.Whether the index is re-indexed after the merge
              keys=None.Add the name of the original data in the direction of the row index. Mainly used for hierarchical indexing, can be any list or array, tuple data or list array
              levels=None.Specifies the index to be used as the level of the hierarchical index, if keys is set
              names=None.The name of the row index, in the form of a list
              verify_integrity=False.Check whether the row index is duplicated. There is an error
              sort=False.# Sort the disconnected axes
              copy=True   Whether to make a deep copy
             )
Copy the code

The default

The default is to merge directly vertically

axis

Specify the direction of the merge

If a data box does not exist, NaN is displayed:

  • Adjust the merge direction based on the actual data. The default is Axis =0
  • Data that does not exist in a database is replaced by NaN

Parameter ignore_index

The default value is reserved. If the value is True, the system automatically adds natural indexes.

Parameters of the join

Specifies whether to get the intersection of inner or union outer, which is the default

Df3 and df4 have the same field only as the address field, so keep it, discard others:

Parameters of the keys

The keys parameter is invalid when we set index reorder (ignore_index=True)

Parameter name

Specify the name of each level index

We can check the index of DF6 and find that it is a hierarchical index:

Merge multiple Dataframes

Merge DF1, DF2, and DF5 at the same time

pd.concat([pd.concat([df1,df2],axis=0,ignore_index=True),df5],axis=1)
Copy the code

It can be achieved in two steps: merge DF1 and DF2 first, and merge the results with DF5

join

parameter

Parameter description from the official website:

dataframe.join(other,  # Another data box to merge
        on=None.# connect key
        how='left'.'left', 'right', 'outer', 'inner' default is left
        lsuffix=' '.# suffixes for the same key in the left (first) data box
        rsuffix=' '.The suffix for the key of the second data box
        sort=False)  Whether to sort by join key; The default False
Copy the code

Simulated data

To explain the operation of join, simulate the data again:

Parameters lsuffix and rsuffix

The function is to add the specified suffix

If not specified, an error is reported:

How parameters

The how parameter defaults to left, leaving all fields on the left. NaN is used for data that does not exist on the right

After changing to right, keep all data on the right side:

In the default parameter result, the name field is divided into name_left and name_right.

  1. Think of the key as a row index
  2. Merge via JOIN
  3. Resets the index with reset_index()

Merge two data:

The last function to reset the index:

There’s an easier way:

Merge multiple Dataframes

Join dataframes to join multiple dataframes:

What if we want to use the merge function?

Use how=”outer” to preserve data for all fields

append

The literal meaning is “add to”. Adds a new row to a Dataframe object. If the column name is not already in the dataframe object, it will be added as a new column

parameter

DataFrame.append(other, 
                 ignore_index=False, 
                 verify_integrity=False, 
                 sort=False)
Copy the code

Parameter Description:

  • Other: data to be merged. This can be a DataFrame or series in pandas, or a dictionary or list in Python
  • Ignore_index: Whether to ignore the original index and generate a new natural number index
  • Verify_integrity: The default is False. If True, creating the same index will throw an exception
  • Sort: Boolean, default is None. If the columns of self and other are not aligned, the columns are sorted and the attributes appear only in version 0.23.0.

Simulated data

Add different types of data

1. Python dictionary

2. Series type

The most commonly used DataFrame

The default merge

Df12 and DF13 merge by default:

Parameter ignore_index

Change the generated index value

Parameter verify_integrity

The default value is False. If True, creating the same index will throw an exception

Case of actual combat

Suppose there are three sheets in an Excel table: Order table, Order Item table, and Item information table:

1. Order form

2. List of goods ordered

3. Commodity information sheet

Now we associate the contents of the three sheets with a merge function:

import pandas as pd
import numpy as np

Read the contents of the order table
df1 = pd.read_excel("Fruit order commodity information 3 tables. XLSX",sheet_name=0)  # the contents of the first sheet, indexed from 0
df1
Copy the code

Read the order list
df2 = pd.read_excel("Fruit order commodity information 3 tables. XLSX",sheet_name=1)

# Commodity information sheet
df3 = pd.read_excel("Fruit order commodity information 3 tables. XLSX",sheet_name="Commodity Information")  You can specify the name of the sheet, not the index
df3
Copy the code

Step 1: Merge order table and order item list

Step 2: Merge the above results with the product information table

When we get the above results, we can complete many requirements. Here are two examples:

1. Sales volume and order number of different fruits: The quantity and order number are counted according to the groups of fruits

2. Fruit sales and number of customers in different regions

conclusion

Two articles on pandas’ data merging detail the use of four functions: Merge, concat, Join, and Append. Merge and concat are the most commonly used. They can be functions that implement joins in SQL. Whether it is intersection, union, left and right join, or even full join can be realized directly.

The actual case data above is stored in an Excel table. In PANDAS, we can merge data from a variety of sources: Excel, databases, local folders, etc.