Introduction to the
Pandas provides many powerful functions for combining Series and Dataframe for easy data analysis. This article will explain how to use Pandas to merge Series and Dataframe in detail.
Using the concat
Concat is the most commonly used method to merge DFS.
pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,
levels=None, names=None, verify_integrity=False, copy=True)
Copy the code
Let’s take a look at some of the parameters that we often use:
Objs is a Series or a sequence or mapping of Series.
Axis Specifies the axis of the connection.
Join: {‘ inner ‘, ‘outer’} join: {‘ inner ‘, ‘outer’} join: {‘ inner ‘, ‘outer’}
Ignore_index: ignores the original index value, using 0,1… N minus 1.
Copy: Indicates whether to copy.
Keys: Specifies the index of the outermost multi-level structure.
Let’s define a few DFS and see how we can concatenate them:
In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], ... : 'B': ['B0', 'B1', 'B2', 'B3'], ... : 'C': ['C0', 'C1', 'C2', 'C3'], ... : 'D': ['D0', 'D1', 'D2', 'D3']}, ... : index=[0, 1, 2, 3]) ... : In [2]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], ... : 'B': ['B4', 'B5', 'B6', 'B7'], ... : 'C': ['C4', 'C5', 'C6', 'C7'], ... : 'D': ['D4', 'D5', 'D6', 'D7']}, ... : index=[4, 5, 6, 7]) ... : In [3]: df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'], ... : 'B': ['B8', 'B9', 'B10', 'B11'], ... : 'C': ['C8', 'C9', 'C10', 'C11'], ... : 'D': ['D8', 'D9', 'D10', 'D11']}, ... : index=[8, 9, 10, 11]) ... : In [4]: frames = [df1, df2, df3] In [5]: result = pd.concat(frames)Copy the code
Df1, DF2 and DF3 define the same column names and different indexes, and then put them in frames to form a DF list, which can be passed into concat as a parameter to merge DF.
Take a multi-tiered example:
In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])
Copy the code
Use keys to specify keys for different frames in frames.
When used, we can return a specific frame by selecting an external key:
In [7]: result.loc['y']
Out[7]:
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
Copy the code
In the above example, the axis of the join is 0 by default, that is, join by row. Let’s look at an example of join by column. If you want to join by column, you can specify axis=1:
In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'], ... : 'D': ['D2', 'D3', 'D6', 'D7'], ... : 'F': ['F2', 'F3', 'F6', 'F7']}, ... : index=[2, 3, 6, 7]) ... : In [9]: result = pd.concat([df1, df4], axis=1, sort=False)Copy the code
The default join=’outer’ and NaN will be added where index does not exist after the merge.
Let’s look at a join=’inner’ :
In [10]: result = pd.concat([df1, df4], axis=1, join='inner')
Copy the code
Join =’inner’ only selects the same index for display.
If we want to save the index of the original frame after the merge, we can use reindex:
In [11]: result = pd.concat([df1, df4], axis=1).reindex(df1.index)
Copy the code
Or this:
In [12]: pd.concat([df1, df4.reindex(df1.index)], axis=1)
Out[12]:
A B C D B D F
0 A0 B0 C0 D0 NaN NaN NaN
1 A1 B1 C1 D1 NaN NaN NaN
2 A2 B2 C2 D2 B2 D2 F2
3 A3 B3 C3 D3 B3 D3 F3
Copy the code
Take a look at the results:
You can combine DF and Series:
In [18]: s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')
In [19]: result = pd.concat([df1, s1], axis=1)
Copy the code
If there are multiple Series, use concat to specify column names:
In [23]: s3 = pd.Series([0, 1, 2, 3], name='foo')
In [24]: s4 = pd.Series([0, 1, 2, 3])
In [25]: s5 = pd.Series([0, 1, 4, 5])
Copy the code
In [27]: pd.concat([s3, s4, s5], axis=1, keys=['red', 'blue', 'yellow'])
Out[27]:
red blue yellow
0 0 0 0
1 1 1 1
2 2 2 4
3 3 3 5
Copy the code
Use append
Append can be seen as a simplified version of concat, concat along Axis =0:
In [13]: result = df1.append(df2)
Copy the code
NaN is automatically completed if the columns of two DFS in append are different:
In [14]: result = df1.append(df4, sort=False)
Copy the code
If ignore_index=True, we can ignore the original index and rewrite the assigned index:
In [17]: result = df1.append(df4, ignore_index=True, sort=False)
Copy the code
Add a Series to DF append:
In [35]: s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])
In [36]: result = df1.append(s2, ignore_index=True)
Copy the code
Use the merge
The closest analogy to DF is database tables. You can use merge to merge DF operations similar to database operations.
Let’s look at the definition of merge:
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=True,
suffixes=('_x', '_y'), copy=True, indicator=False,
validate=None)
Copy the code
Left and right are the two DFS or Series to merge.
On represents the column or index name of the join.
Left_on: left connection
Right_on: connect right
Left_index: After joining, choose to use the left index or column.
Right_index: After joining, choose to use the index or column on the right.
How: Connection mode, ‘left’, ‘right’, ‘outer’, ‘inner’. Inner by default.
Sort: indicates whether to sort.
Suffixes: Handle duplicate columns.
Copy: Indicates whether to copy data
Let’s start with a simple merge example:
In [39]: left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], .... : 'A': ['A0', 'A1', 'A2', 'A3'], .... : 'B': ['B0', 'B1', 'B2', 'B3']}) .... : In [40]: right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], .... : 'C': ['C0', 'C1', 'C2', 'C3'], .... : 'D': ['D0', 'D1', 'D2', 'D3']}) .... : In [41]: result = pd.merge(left, right, on='key')Copy the code
The above two DFS are connected by key.
Let’s look at a more than one key join example:
In [42]: left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], .... : 'key2': ['K0', 'K1', 'K0', 'K1'], .... : 'A': ['A0', 'A1', 'A2', 'A3'], .... : 'B': ['B0', 'B1', 'B2', 'B3']}) .... : In [43]: right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], .... : 'key2': ['K0', 'K0', 'K0', 'K0'], .... : 'C': ['C0', 'C1', 'C2', 'C3'], .... : 'D': ['D0', 'D1', 'D2', 'D3']}) .... : In [44]: result = pd.merge(left, right, on=['key1', 'key2'])Copy the code
How can be used to specify merge mode, like a database, can be used to specify inner join, outer join, etc:
Combined methods | SQL methods |
---|---|
left |
LEFT OUTER JOIN |
right |
RIGHT OUTER JOIN |
outer |
FULL OUTER JOIN |
inner |
INNER JOIN |
In [45]: result = pd.merge(left, right, how='left', on=['key1', 'key2'])
Copy the code
Specify indicator=True to indicate specific line connection mode:
In [60]: df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
In [61]: df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})
In [62]: pd.merge(df1, df2, on='col1', how='outer', indicator=True)
Out[62]:
col1 col_left col_right _merge
0 0 a NaN left_only
1 1 b 2.0 both
2 2 NaN 2.0 right_only
3 2 NaN 2.0 right_only
Copy the code
If you pass an indicator a string, rename the indicator column:
In [63]: pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column') Out[63]: Col1 col_left COL_right indicator_column 0 0 A NaN left_only 1 1 B 2.0 Both 2 NaN 2.0 right_only 3 2 NaN 2.0 right_onlyCopy the code
Merge multiple indexes:
In [112]: leftindex = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), .....: ('K1', 'X2')], ..... : names=['key', 'X']) ..... : In [113]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], ..... : 'B': ['B0', 'B1', 'B2']}, ..... : index=leftindex) ..... : In [114]: rightindex = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), .....: ('K2', 'Y2'), ('K2', 'Y3')], ..... : names=['key', 'Y']) ..... : In [115]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], ..... : 'D': ['D0', 'D1', 'D2', 'D3']}, ..... : index=rightindex) ..... : In [116]: result = pd.merge(left.reset_index(), right.reset_index(), ..... : on=['key'], how='inner').set_index(['key', 'X', 'Y'])Copy the code
Support merging of multiple columns:
In [117]: left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1') In [118]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], ..... : 'B': ['B0', 'B1', 'B2', 'B3'], ..... : 'key2': ['K0', 'K1', 'K0', 'K1']}, ..... : index=left_index) ..... : In [119]: right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1') In [120]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], ..... : 'D': ['D0', 'D1', 'D2', 'D3'], ..... : 'key2': ['K0', 'K0', 'K0', 'K1']}, ..... : index=right_index) ..... : In [121]: result = left.merge(right, on=['key1', 'key2'])Copy the code
Using the join
Join merges two DFS with different indexes into one. It’s short for merge.
In [84]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], .... : 'B': ['B0', 'B1', 'B2']}, .... : index=['K0', 'K1', 'K2']) .... : In [85]: right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], .... : 'D': ['D0', 'D2', 'D3']}, .... : index=['K0', 'K2', 'K3']) .... : In [86]: result = left.join(right)Copy the code
You can specify how to specify the connection mode:
In [87]: result = left.join(right, how='outer')
Copy the code
The default join is index.
You can also join by column:
In [91]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], .... : 'B': ['B0', 'B1', 'B2', 'B3'], .... : 'key': ['K0', 'K1', 'K0', 'K1']}) .... : In [92]: right = pd.DataFrame({'C': ['C0', 'C1'], .... : 'D': ['D0', 'D1']}, .... : index=['K0', 'K1']) .... : In [93]: result = left.join(right, on='key')Copy the code
Join single index and multiple indexes:
In [100]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], ..... : 'B': ['B0', 'B1', 'B2']}, ..... : index=pd.Index(['K0', 'K1', 'K2'], name='key')) ..... : In [101]: index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), .....: ('K2', 'Y2'), ('K2', 'Y3')], ..... : names=['key', 'Y']) ..... : In [102]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], ..... : 'D': ['D0', 'D1', 'D2', 'D3']}, ..... : index=index) ..... : In [103]: result = left.join(right, how='inner')Copy the code
Duplicate column names:
In [122]: left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
In [123]: right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
In [124]: result = pd.merge(left, right, on='k')
Copy the code
You can customize the naming rules for repeating column names:
In [125]: result = pd.merge(left, right, on='k', suffixes=('_l', '_r'))
Copy the code
Overwrite the data
Sometimes we need to use DF2 data to fill DF1 data, we can use combine_first:
In [131] : df1 = pd DataFrame ([[np. Nan, (3), 5.], [4.6, np. Nan, np. Nan],... : [np. Nan, 7., np, nan]])... : In [132] : df2 = pd. DataFrame ([[42.6, np. Nan, 8.2], [5., 1.6, 4]],... : index=[1, 2]) ..... :Copy the code
In [133]: result = df1.combine_first(df2)
Copy the code
Or use update:
In [134]: df1.update(df2)
Copy the code
This article is available at www.flydean.com/04-python-p…
The most popular interpretation, the most profound dry goods, the most concise tutorial, many tips you didn’t know waiting for you to discover!
Welcome to pay attention to my public number: “procedures those things”, understand technology, more understand you!