This is the 14th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge
In the previous two articles, regardless of the type of the join table or the type of the join key, the two tables to be joined always have at least one column whose contents are identical. Of course, the column names may not be the same. If the column names are the same, it is a common column and you can just take the join table. If not, use the connection key. If there are no columns of the same content, the only way to do this is to join.
Table joins are inner, outer, left, and right
The schematic diagram of the four connection modes is as follows:
The connection mode is specified by the How parameter
For the sake of illustration, assume that the two tables to operate on are A and B
The two tables used in this article are as follows:
League tables
Student Information Sheet
Let’s first look at the data situation for two tables
Import pandas as pd score_df = pd.read_excel(r 'c :\Users\admin\ desktop.xlsx ') print(score_df)Copy the code
result:
Student Id name score ranking 0 0 Xiaoming 90 3 1 1 Xiaogang 89 4 2 2 Xiaohong 98 1 3 3 Xiaolan 63 7 4 4 Li Hua 95 2 5 5 Zhang SAN 23 9 6 6 Li Si 56 8 7 7 Wang Wu 85 5 8 8 Xiao Zhang 72 6Copy the code
Stu_df = pd.read_excel(r'C:\Users\admin\Desktop\ XLSX ') print(stu_df)Copy the code
result:
0 Grade 3 Class 2 1 1 Grade 5 Class 3 2 2 Grade 6 Class 1 3 9 Grade 6 Class 1Copy the code
1. Inner connection
An inner join is the intersection of two tables, namely A∩B
Score_df = pd.read_excel(r 'c :\Users\admin\ desktop.xlsx ') stu_df = Pd.read_excel (r'C:\Users\admin\Desktop\ student info table. XLSX ') new_df = pd.merge(score_df, stu_df, how='inner') print(new_df)Copy the code
result:
Student Id Name Result Rank Grade Class 0 xiaoming 90 3 Grade 3 Class 2 11 Xiaogang 89 4 Grade 5 Class 3 2 2 Xiahong 98 1 Grade 6 Class 1Copy the code
The results only show the corresponding grades of the students in the student information table. Neither the record of student number 9 in the student information table nor the record of Xiaolan, Li Hua and others in the student information table are displayed.
(2) outer connection
An internal join is A union of two tables, A∪B
Score_df = pd.read_excel(r 'c :\Users\admin\ desktop.xlsx ') stu_df = Pd.read_excel (r'C:\Users\admin\Desktop\ students.xlsx ') new_df = pd.merge(score_df, stu_df, how='outer') print(new_df)Copy the code
result:
Student No name Result Rank Grade Class 0 0 Xiaoming 90.0 3.0 Class 2, Grade 3 1 1 Xiaogang 89.0 4.0 Class 3, Grade 5 2 2 Xiaohong 98.0 1.0 Class 1, Grade 63 3 Xiaolan 63.0 7.0 NaN NaN 4 4 Li Hua 95.0 2.0 NaN NaN 5 5 zhang SAN 23.0 9.0 NaN NaN 6 6 Li Si 56.0 8.0 NaN NaN 7 7 Wang Wu 85.0 5.0 NaN NaN 8 8 Xiao Zhang 72.0 6.0 NaN NaN 9 9 NaN NaN NaN Class One, Grade sixCopy the code
The results will show information for all students, even if they don’t have grades (e.g. Xiao LAN, Li Hua, etc.). Similarly, for the score sheet, it will be displayed even if no student matches it (for example, records with student number 9).
3. Left connection
The left connection is to get A∪(A∩B)
Score_df = pd.read_excel(r 'c :\Users\admin\ desktop.xlsx ') stu_df = Pd.read_excel (r'C:\Users\admin\Desktop\ students.xlsx ') new_df = pd.merge(score_df, stu_df, how='left') print(new_df)Copy the code
result:
Student No. Name Result Rank Grade Class 0 0 Xiaoming 90 3 Grade 3 Class 2 1 Xiaogang 89 4 Grade 5 Class 3 2 2 Xiahong 98 1 Grade 6 Class 1 3 Xiaolan 63 7 NaN NaN 4 4 Li Hua 95 2 NaN NaN 5 5 Zhang 3 23 9 NaN NaN 6 6 Li 4 56 8 NaN NaN 7 7 Wang 5 85 5 NaN NaN 8 8 Xiao Zhang 72 6 NaN NaNCopy the code
The results show all the data from the score sheet, regardless of whether the student information sheet has a matching record. The record for school number 9 in the student information sheet is not displayed because the corresponding record is not found in the score sheet
4. Right connection
The right connection is going to get B∪(A∩B)
Score_df = pd.read_excel(r 'c :\Users\admin\ desktop.xlsx ') stu_df = Pd.read_excel (r'C:\Users\admin\ desktop.xlsx ') new_df = pd.merge(score_df, stu_df, how='right') print(new_df)Copy the code
result:
Student No. Name Result Rank Grade Class 0 0 Xiaoming 90.0 3.0 Class 2, Grade 3 1 1 Xiaogang 89.0 4.0 Class 3, Grade 5 2 2 Xiaohong 98.0 1.0 Class 1, Grade 6 3 9 NaN NaN NaN Class 1, Grade 6Copy the code
The result displays all the data from the student information table, regardless of whether the score table has a matching record. However, xiaolan and Li Hua were not shown because they did not find their corresponding records in the student information sheet