background
The story goes that a friend of mine who had more than ten Excel files to merge gave him the idea to do so. But then I thought, what if one of my friends needs to do this again and again and miss a date with my little sister? No, you can’t be like me. So study hard today for a better tomorrow
In actual combat
“Requirement” merges the data of all files in the files folder into one total file (note: the header of each file is the same)
“Implementation steps“
- Gets the names of all files in the files folder
- Read the data for each file
- Concatenate the data for each file
- Save to a new file
“Gets the names of all files in the files folder“
First, we should consider whether Python has a module that helps us get the names of all files in a given folder, which is actually on the operating system (e.g., Windows, MAC), so we can use Python to interact with the operating system: the OS module
# import OS
import os
Change the current working path to the folder where all files are located
need_dir = os.chdir(r"C:\Users\lenovo\Desktop\data_ana02\04-auto_merge_files\files")
List all file names to render in a list
need_file_name = os.listdir(need_dir)
need_file_name
Copy the code
perform
“Read the data for each file“
So here, we need to think about, how do we read the data from the file? Wow ~ in Python, there are quite a number of modules can be read, such as: XLRD, openpyxl, numpy, pandas.. So who to choose? In this context, I highly recommend pandas
# import pandas
import pandas as pd
# Read Yunnan Baiyao. CSV
ylby_df = pd.read_csv("Yunnan Baiyao. CSV")
# display the first five
ylby_df.head()
Copy the code
perform
As you can see, all you need to do to read different files is pass in a different file name, so how do we read all the files in the table below folder?
- Use the for loop to iterate through the list of files
- Read data again
Extract the name of each file
for file_name in need_file_name:
Read data from each file
data = pd.read_csv(f"{file_name}")
print(data.head())
Copy the code
Intercept a portion of it
“Concatenate the data for each file“
How do you concatenate the data as follows?
Pandas. Concat () For example, the concatenation code for two tables is as follows
# Obtain yunnan Baiyao. CSV data
ylby_df = pd.read_csv("Yunnan Baiyao. CSV")
# Obtain wuliangye. CSV data
wly_df = pd.read_csv("Wuliangye. CSV")
# Stack concatenation: Note that concatenation of multiple tables is passed into a tuple or list
pd.concat((ylby_df,wly_df))
Copy the code
perform
So how do you put all the data together? Concat () concatenates the data into a list or tuple
Define an empty list of data
data_li = []
Extract the name of each file
for file_name in need_file_name:
Read data from each file
data = pd.read_csv(f"{file_name}")
Add data to the empty data list
data_li.append(data)
Len (data_li)) # print(len(data_li)
# concatenate all file data
all_data = pd.concat(data_li)
Row 620 is used to obtain data
all_data.shape
Copy the code
“Save to a new file“
Save the file directly using df.to_csv()
all_data.to_csv("Total data.csv")
Copy the code
. No matter how many files in your folder are the same, play with them
“The complete code“
import os
import pandas as pd
Change the current working path to the folder where all files are located
need_dir = os.chdir(r"C:\Users\lenovo\Desktop\data_ana02\04-auto_merge_files\files")
List all file names to render in a list
need_file_name = os.listdir(need_dir)
def concat_file(file_name_li):
data_li = []
# 3. Loop through each filename
for file_name in file_name_li:
# fetch data
# print(file_name)
data = pd.read_csv(f"{file_name}")
Add the read data to a list
data_li.append(data)
# splice
all_data = pd.concat(data_li)
# save file
all_data.to_csv("Total data.csv")
concat_file(need_file_name)
Copy the code
Code is not a lot, want to learn automation office students. This article can be regarded as a case to knock on their own, our learning, as long as the threshold is lowered, I believe that every student can learn Internet programming.