An overview,
In fact, Excel merge this need, should be a very common need. Today we will use Python to complete the Excel merge (split) operation, as follows:
- ① Merge multiple Excel sheets into one Excel (there is only one sheet in each Excel);
- ② Merge multiple Excel sheets into one Excel (each Excel contains more than one sheet);
- ③ Merge multiple sheets in an Excel table and save them in the same Excel;
- ④ Divide an Excel table into multiple tables according to a certain column;
- Click the link to join the Group chat
Two, knowledge point explanation
In fact, there are too many knowledge points involved in completing these operations, so before we talk about this knowledge point, we will take you to review some common knowledge points.
- (1) Common knowledge points of OS module;
- The pandas module is used for reading.
- ③ Explain the common knowledge points of xlsxwriter module;
- ④ XLRD commonly used knowledge points to explain;
- Click the link to join the Group chat
1. Explain the OS module
Os.walk (), os.path.join() and so on.
1.1 OS. Walk ()
For this knowledge, we need to explain the following points:
- The return value of os.walk() is a generator that we need to loop through to retrieve its contents;
- Each traversal returns a triple (path, dirs, files);
- Path: Returns the path address of the folder being traversed;
- Dirs: returns the names of all directories in this folder (excluding subdirectories). The number of directories is returned as “list”.
- Files: Returns all files in this folder (excluding files in subdirectories). The number of files is returned as “list”.
Let’s say I have a folder like this.
What do we get with the code below?
pwd = "G:\\a"
print(os.walk(pwd))
for i in os.walk(pwd):
print(i)
for path,dirs,files in os.walk(pwd):
print(files)```
123456
Copy the code
The results are as follows:
<generator object walk at 0x0000029BB5AEAB88>
('G:\\a', [], ['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx'])
['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx']
123
Copy the code
1.2 OS. Path. The join ()
This function, which is used to combine multiple paths and return, is super simple and won’t be explained too much.
path1 = 'G:\\a'
path2 = 'aa.txt'
print(os.path.join(path1,path2))
123
Copy the code
The results are as follows:
G:\a\aa.txt
1
Copy the code
2. I like pandas
Pandas is used to merge Excel, so we need to learn how to use Pandas to merge data vertically.
Let’s create two dataframes:
0 0 import numpy as np xx = np.arange(15).0 (5,3) yy = np.arange(1,16).0 (5,3) xx = 0 Pd. DataFrame (xx, the columns = [" language ", "mathematics", "foreign language"]) yy = pd. The DataFrame (yy, columns = [" language ", "mathematics", "foreign language"]) print (xx) print (yy) 1234567Copy the code
The effect is as follows:
The vertical concatenation can then be done using the concat() function in Pandas.
-
In pd.concat(list) [default axis=0] the default is vertical merge of data;
-
Pd. concat(list) parentheses pass in a list;
-
Ignore_list =True: ignores original indexes and generates a new set of indexes.
-
Or z = pd.concat([xx,yy],ignore_list=True);
concat_list = [] concat_list.append(xx) concat_list.append(yy) z = pd.concat(concat_list,ignore_list=True) print(z) 12345
The effect is as follows:
3. Explain the xlsxWriter module
The XlsxWriter module is used together with the XLRD module. Xlsxwriter: writes data, and XLRD: reads data. Next, we introduce the common uses of each library.
1) How to create a “workbook”?
Import xlsxwriter # this step creates a new "workbook"; # "demo. XLSX "file does not exist, indicating new" workbook "; # "demo. XLSX "file exists, indicating that a new" workbook "overwrites the original" workbook "; Workbook = xlsxwriter. workbook ("demo.xlsx") # close workbook.close() 123456789Copy the code
2) How to add a “Sheet”
We know that an Excel file is an Excel workbook, and in each workbook, there are many “sheets”. Now, how do we do this in code?
Import xlsxwriter workbook = xlsxwriter.workbook ("cc.xlsx") worksheet = workbook.add_worksheet("2018 sales volume ") workbook.close() 12345Copy the code
The effect is as follows:
3) How do I insert data into the table?
Import xlsxwriter # create a workbook named demo. XLSX; Workbook = xlsxwriter. workbook ("demo. XLSX ") Worksheet = workbook.add_worksheet("2018年 sales ") # add a header to worksheet("2018年 sales "); Keywords/keywords = [' product ',' sales volume '," unit price "] Keywords/keywords keywords = "keywords ",500,8.9) worksheet.write(1,i,data[i]) workbook.close() 123456789101112131415161718Copy the code
The effect is as follows:
4. Explain the XLRD module
Here is a workbook “test.xlsx”, in which there are two “Sheet worksheets” named “2018 Sales volume” and “2019 Sales Volume”, as shown in the figure.
1) How to open a workbook? – > open_workbook ()
Load the table into memory and open it. Import XLRD file = r"G:\Jupyter\test.xlsx" xlrd.open_workbook(file) 12345Copy the code
The results are as follows:
<xlrd.book.Book at 0x29bb8e4eda0>
1
Copy the code
2) How do I get all the “Sheet” names in a workbook? – > sheet_names ()
import xlrd
file = r"G:\Jupyter\test.xlsx"
fh = xlrd.open_workbook(file)
fh.sheet_names()
1234
Copy the code
The results are as follows:
['2018 Sales volume ', '2019 Sales Volume '] 1Copy the code
3) How to obtain the object list of all “sheets”? – > sheets ()
import xlrd
file = r"G:\Jupyter\test.xlsx"
fh = xlrd.open_workbook(file)
fh.sheets()
1234
Copy the code
The results are as follows:
[<xlrd.sheet.Sheet at 0x29bb8f07a90>, <xlrd.sheet.Sheet at 0x29bb8ef1390>]
1
Copy the code
We can use the index to get the object of each sheet, and then we can operate on each of those objects.
fh.sheets()[0]
<xlrd.sheet.Sheet at 0x29bb8f07a90>
fh.sheets()[1]
<xlrd.sheet.Sheet at 0x29bb8ef1390>
1234
Copy the code
4) How to obtain the number of rows and columns in each Sheet? – > Nrows and nCOLs properties
Import XLRD file = r"G:\Jupyter\test.xlsx" fh = xlrd.open_workbook(file) fh.sheets() fH.sheets ()[0].nrows # 4 fh.sheets()[0].ncols # result: 3 fh.sheets()[1].nrows # result: 4 fH.sheets ()[1].ncols # result: 3 12345678Copy the code
5) Obtain data from each Sheet by row — >row_values()
import xlrd
file = r"G:\Jupyter\test.xlsx"
fh = xlrd.open_workbook(file)
sheet1 = fh.sheets()[0]
for row in range(fh.sheets()[0].nrows):
value = sheet1.row_values(row)
print(value)
1234567
Copy the code
The effect is as follows:
Iii. Case description
1. Merge multiple Excel sheets into one Excel (there is only one sheet in each Excel)
There are four tables, and I don’t want to explain them too much.
The implementation code is as follows:
import pandas as pd
import os
pwd = "G:\\b"
df_list = []
for path,dirs,files in os.walk(pwd):
for file in files:
file_path = os.path.join(path,file)
df = pd.read_excel(file_path)
df_list.append(df)
result = pd.concat(df_list)
print(result)
result.to_excel('G:\\b\\result.xlsx',index=False)
123456789101112
Copy the code
The results are as follows:
2. Merge multiple Excel sheets into one Excel (each Excel contains more than one sheet)
There are two workbooks, as shown. One workbook is pp.xlsx and the other is Q.xlsx. Sheet1 and Sheet2 worksheets are available in the workbook PP. XLSX. Sheet1 and Sheet2 worksheets are also available under Qq.xlsX.
The implementation code is as follows:
Click the link to join the Group chat
Import XLRD import xlsxwriter import OS # Open an Excel file and create a workbook object def open_xlsx(file): Fh =xlrd.open_workbook(file) return fh # def get_sheet_num(fh): X = len(fh.sheets()) return x # def get_file_content(file,shnum): fh=open_xlsx(file) table=fh.sheets()[shnum] num=table.nrows for row in range(num): rdata=table.row_values(row) datavalue.append(rdata) return datavalue def get_allxls(pwd): allxls = [] for path,dirs,files in os.walk(pwd): for file in files: Append (os.path.join(path,file)) return allxls # Datavalue = [] PWD = "G:\\d" for fl in get_allxls(PWD): Fh = open_xlsx(fl) x = get_sheet_num(fH) for shnum in range(x): print(" + STR (FL)+" "+ STR (shnum)+"... ) rvalue = get_file_content(fl,shnum) # Define the new file generated after the final merge endFile = "G:\\d\\ concate.xlsx "wb1= xlsxwriter.workbook (endfile) # Create a sheet object ws=wb1.add_worksheet() for a in range(len(rvalue)): for b in range(len(rvalue[a])): C =rvalue[a][b] ws.write(a,b,c) wb1.close() print(" file merge complete ") 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152Copy the code
The effect is as follows:
3. Merge multiple sheets in an Excel table and save them in the same Excel
The implementation code is as follows:
Click the link to join the Group chat
import xlrd import pandas as pd from pandas import DataFrame from openpyxl import load_workbook excel_name = r"D:\pp.xlsx" wb = xlrd.open_workbook(excel_name) sheets = wb.sheet_names() alldata = DataFrame() for i in range(len(sheets)): df = pd.read_excel(excel_name, sheet_name=i, index=False, encoding='utf8') alldata = alldata.append(df) writer = pd.ExcelWriter(r"C:\Users\Administrator\Desktop\score.xlsx",engine='openpyxl') book = load_workbook(writer.path) Writer. book = book # must have the above two lines, if not, will delete the rest of the sheet, Only the final merged sheet alldata.to_excel(excel_writer=writer,sheet_name=" alldata ") writer.save() writer.close() 12345678910111213141516171819202122Copy the code
The effect is as follows:
4. Divide an Excel table into multiple tables according to a certain column
Click the link to join the Group chat
The implementation code is as follows:
Click the link to join the Group chat
import pandas as pd import xlsxwriter data=pd.read_excel(r"C:\Users\Administrator\Desktop\chaifen.xlsx",encoding='gbk') ExcelWriter(r"C:\Users\Administrator\Desktop\ XLSX ",engine='xlsxwriter'); area_list=list(set(data[' store '])); Data. To_excel (writer,sheet_name=" total table ",index=False) for j in area_list: Df =data[data[' store ']==j] df.to_excel(writer,sheet_name=j,index=False) writer.save() # "Dismantled table" will display 1234567891011121314Copy the code
The effect is as follows: