Introduction to the

Pandas provides many powerful functions for combining Series and Dataframe for easy data analysis. This article will explain how to use Pandas to merge Series and Dataframe in detail.

Using the concat

Concat is the most commonly used method to merge DFS.

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,
          levels=None, names=None, verify_integrity=False, copy=True)
Copy the code

Let’s take a look at some of the parameters that we often use:

Objs is a Series or a sequence or mapping of Series.

Axis Specifies the axis of the connection.

Join: {‘ inner ‘, ‘outer’} join: {‘ inner ‘, ‘outer’} join: {‘ inner ‘, ‘outer’}

Ignore_index: ignores the original index value, using 0,1… N minus 1.

Copy: Indicates whether to copy.

Keys: Specifies the index of the outermost multi-level structure.

Let’s define a few DFS and see how we can concatenate them:

In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], ... : 'B': ['B0', 'B1', 'B2', 'B3'], ... : 'C': ['C0', 'C1', 'C2', 'C3'], ... : 'D': ['D0', 'D1', 'D2', 'D3']}, ... : index=[0, 1, 2, 3]) ... : In [2]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], ... : 'B': ['B4', 'B5', 'B6', 'B7'], ... : 'C': ['C4', 'C5', 'C6', 'C7'], ... : 'D': ['D4', 'D5', 'D6', 'D7']}, ... : index=[4, 5, 6, 7]) ... : In [3]: df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'], ... : 'B': ['B8', 'B9', 'B10', 'B11'], ... : 'C': ['C8', 'C9', 'C10', 'C11'], ... : 'D': ['D8', 'D9', 'D10', 'D11']}, ... : index=[8, 9, 10, 11]) ... : In [4]: frames = [df1, df2, df3] In [5]: result = pd.concat(frames)Copy the code

Df1, DF2 and DF3 define the same column names and different indexes, and then put them in frames to form a DF list, which can be passed into concat as a parameter to merge DF.

Take a multi-tiered example:

In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])
Copy the code

Use keys to specify keys for different frames in frames.

When used, we can return a specific frame by selecting an external key:

In [7]: result.loc['y']
Out[7]: 
    A   B   C   D
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7
Copy the code

In the above example, the axis of the join is 0 by default, that is, join by row. Let’s look at an example of join by column. If you want to join by column, you can specify axis=1:

In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'], ... : 'D': ['D2', 'D3', 'D6', 'D7'], ... : 'F': ['F2', 'F3', 'F6', 'F7']}, ... : index=[2, 3, 6, 7]) ... : In [9]: result = pd.concat([df1, df4], axis=1, sort=False)Copy the code

The default join=’outer’ and NaN will be added where index does not exist after the merge.

Let’s look at a join=’inner’ :

In [10]: result = pd.concat([df1, df4], axis=1, join='inner')
Copy the code

Join =’inner’ only selects the same index for display.

If we want to save the index of the original frame after the merge, we can use reindex:

In [11]: result = pd.concat([df1, df4], axis=1).reindex(df1.index)
Copy the code

Or this:

In [12]: pd.concat([df1, df4.reindex(df1.index)], axis=1)
Out[12]: 
    A   B   C   D    B    D    F
0  A0  B0  C0  D0  NaN  NaN  NaN
1  A1  B1  C1  D1  NaN  NaN  NaN
2  A2  B2  C2  D2   B2   D2   F2
3  A3  B3  C3  D3   B3   D3   F3
Copy the code

Take a look at the results:

You can combine DF and Series:

In [18]: s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')

In [19]: result = pd.concat([df1, s1], axis=1)
Copy the code

If there are multiple Series, use concat to specify column names:

In [23]: s3 = pd.Series([0, 1, 2, 3], name='foo')

In [24]: s4 = pd.Series([0, 1, 2, 3])

In [25]: s5 = pd.Series([0, 1, 4, 5])
Copy the code

In [27]: pd.concat([s3, s4, s5], axis=1, keys=['red', 'blue', 'yellow'])
Out[27]: 
   red  blue  yellow
0    0     0       0
1    1     1       1
2    2     2       4
3    3     3       5
Copy the code

Use append

Append can be seen as a simplified version of concat, concat along Axis =0:

In [13]: result = df1.append(df2)
Copy the code

NaN is automatically completed if the columns of two DFS in append are different:

In [14]: result = df1.append(df4, sort=False)
Copy the code

If ignore_index=True, we can ignore the original index and rewrite the assigned index:

In [17]: result = df1.append(df4, ignore_index=True, sort=False)
Copy the code

Add a Series to DF append:

In [35]: s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])

In [36]: result = df1.append(s2, ignore_index=True)
Copy the code

Use the merge

The closest analogy to DF is database tables. You can use merge to merge DF operations similar to database operations.

Let’s look at the definition of merge:

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=True,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)
Copy the code

Left and right are the two DFS or Series to merge.

On represents the column or index name of the join.

Left_on: left connection

Right_on: connect right

Left_index: After joining, choose to use the left index or column.

Right_index: After joining, choose to use the index or column on the right.

How: Connection mode, ‘left’, ‘right’, ‘outer’, ‘inner’. Inner by default.

Sort: indicates whether to sort.

Suffixes: Handle duplicate columns.

Copy: Indicates whether to copy data

Let’s start with a simple merge example:

In [39]: left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], .... : 'A': ['A0', 'A1', 'A2', 'A3'], .... : 'B': ['B0', 'B1', 'B2', 'B3']}) .... : In [40]: right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], .... : 'C': ['C0', 'C1', 'C2', 'C3'], .... : 'D': ['D0', 'D1', 'D2', 'D3']}) .... : In [41]: result = pd.merge(left, right, on='key')Copy the code

The above two DFS are connected by key.

Let’s look at a more than one key join example:

In [42]: left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], .... : 'key2': ['K0', 'K1', 'K0', 'K1'], .... : 'A': ['A0', 'A1', 'A2', 'A3'], .... : 'B': ['B0', 'B1', 'B2', 'B3']}) .... : In [43]: right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], .... : 'key2': ['K0', 'K0', 'K0', 'K0'], .... : 'C': ['C0', 'C1', 'C2', 'C3'], .... : 'D': ['D0', 'D1', 'D2', 'D3']}) .... : In [44]: result = pd.merge(left, right, on=['key1', 'key2'])Copy the code

How can be used to specify merge mode, like a database, can be used to specify inner join, outer join, etc:

Combined methods	SQL methods
`left`	`LEFT OUTER JOIN`
`right`	`RIGHT OUTER JOIN`
`outer`	`FULL OUTER JOIN`
`inner`	`INNER JOIN`

In [45]: result = pd.merge(left, right, how='left', on=['key1', 'key2'])
Copy the code

Specify indicator=True to indicate specific line connection mode:

In [60]: df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})

In [61]: df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})

In [62]: pd.merge(df1, df2, on='col1', how='outer', indicator=True)
Out[62]: 
   col1 col_left  col_right      _merge
0     0        a        NaN   left_only
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only
Copy the code

If you pass an indicator a string, rename the indicator column:

In [63]: pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column') Out[63]: Col1 col_left COL_right indicator_column 0 0 A NaN left_only 1 1 B 2.0 Both 2 NaN 2.0 right_only 3 2 NaN 2.0 right_onlyCopy the code

Merge multiple indexes:

In [112]: leftindex = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), .....: ('K1', 'X2')], ..... : names=['key', 'X']) ..... : In [113]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], ..... : 'B': ['B0', 'B1', 'B2']}, ..... : index=leftindex) ..... : In [114]: rightindex = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), .....: ('K2', 'Y2'), ('K2', 'Y3')], ..... : names=['key', 'Y']) ..... : In [115]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], ..... : 'D': ['D0', 'D1', 'D2', 'D3']}, ..... : index=rightindex) ..... : In [116]: result = pd.merge(left.reset_index(), right.reset_index(), ..... : on=['key'], how='inner').set_index(['key', 'X', 'Y'])Copy the code

Support merging of multiple columns:

In [117]: left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1') In [118]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], ..... : 'B': ['B0', 'B1', 'B2', 'B3'], ..... : 'key2': ['K0', 'K1', 'K0', 'K1']}, ..... : index=left_index) ..... : In [119]: right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1') In [120]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], ..... : 'D': ['D0', 'D1', 'D2', 'D3'], ..... : 'key2': ['K0', 'K0', 'K0', 'K1']}, ..... : index=right_index) ..... : In [121]: result = left.merge(right, on=['key1', 'key2'])Copy the code

Using the join

Join merges two DFS with different indexes into one. It’s short for merge.

In [84]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], .... : 'B': ['B0', 'B1', 'B2']}, .... : index=['K0', 'K1', 'K2']) .... : In [85]: right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], .... : 'D': ['D0', 'D2', 'D3']}, .... : index=['K0', 'K2', 'K3']) .... : In [86]: result = left.join(right)Copy the code

You can specify how to specify the connection mode:

In [87]: result = left.join(right, how='outer')
Copy the code

The default join is index.

You can also join by column:

In [91]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], .... : 'B': ['B0', 'B1', 'B2', 'B3'], .... : 'key': ['K0', 'K1', 'K0', 'K1']}) .... : In [92]: right = pd.DataFrame({'C': ['C0', 'C1'], .... : 'D': ['D0', 'D1']}, .... : index=['K0', 'K1']) .... : In [93]: result = left.join(right, on='key')Copy the code

Join single index and multiple indexes:

In [100]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], ..... : 'B': ['B0', 'B1', 'B2']}, ..... : index=pd.Index(['K0', 'K1', 'K2'], name='key')) ..... : In [101]: index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), .....: ('K2', 'Y2'), ('K2', 'Y3')], ..... : names=['key', 'Y']) ..... : In [102]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], ..... : 'D': ['D0', 'D1', 'D2', 'D3']}, ..... : index=index) ..... : In [103]: result = left.join(right, how='inner')Copy the code

Duplicate column names:

In [122]: left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})

In [123]: right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})

In [124]: result = pd.merge(left, right, on='k')
Copy the code

You can customize the naming rules for repeating column names:

In [125]: result = pd.merge(left, right, on='k', suffixes=('_l', '_r'))
Copy the code

Overwrite the data

Sometimes we need to use DF2 data to fill DF1 data, we can use combine_first:

In [131] : df1 = pd DataFrame ([[np. Nan, (3), 5.], [4.6, np. Nan, np. Nan],... : [np. Nan, 7., np, nan]])... : In [132] : df2 = pd. DataFrame ([[42.6, np. Nan, 8.2], [5., 1.6, 4]],... : index=[1, 2]) ..... :Copy the code

In [133]: result = df1.combine_first(df2)
Copy the code

Or use update:

In [134]: df1.update(df2)
Copy the code

This article is available at www.flydean.com/04-python-p…

The most popular interpretation, the most profound dry goods, the most concise tutorial, many tips you didn’t know waiting for you to discover!

Welcome to pay attention to my public number: “procedures those things”, understand technology, more understand you!

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Pandas Advanced tutorial: Merging dataframes

Introduction to the

Using the concat

Use append

Use the merge

Using the join

Overwrite the data

Pandas Advanced tutorial: Merging dataframes

Introduction to the

Using the concat

Use append

Use the merge

Using the join

Overwrite the data

Related Posts

LeetCode removes duplicates from sorted arrays

Semaphore

What exactly do machine learning practitioners do?