Python Office Automation 04

background

The story goes that a friend of mine who had more than ten Excel files to merge gave him the idea to do so. But then I thought, what if one of my friends needs to do this again and again and miss a date with my little sister? No, you can’t be like me. So study hard today for a better tomorrow

In actual combat

“Requirement” merges the data of all files in the files folder into one total file (note: the header of each file is the same)

“Implementation steps“

Gets the names of all files in the files folder
Read the data for each file
Concatenate the data for each file
Save to a new file

“Gets the names of all files in the files folder“

First, we should consider whether Python has a module that helps us get the names of all files in a given folder, which is actually on the operating system (e.g., Windows, MAC), so we can use Python to interact with the operating system: the OS module

# import OS

import os



Change the current working path to the folder where all files are located

need_dir = os.chdir(r"C:\Users\lenovo\Desktop\data_ana02\04-auto_merge_files\files")



List all file names to render in a list

need_file_name = os.listdir(need_dir)

need_file_name

Copy the code

perform

“Read the data for each file“

So here, we need to think about, how do we read the data from the file? Wow ~ in Python, there are quite a number of modules can be read, such as: XLRD, openpyxl, numpy, pandas.. So who to choose? In this context, I highly recommend pandas

# import pandas

import pandas as pd

# Read Yunnan Baiyao. CSV

ylby_df = pd.read_csv("Yunnan Baiyao. CSV")

# display the first five

ylby_df.head()

Copy the code

perform

As you can see, all you need to do to read different files is pass in a different file name, so how do we read all the files in the table below folder?

Use the for loop to iterate through the list of files
Read data again

Extract the name of each file

for file_name in need_file_name:

    Read data from each file

    data = pd.read_csv(f"{file_name}")

    print(data.head())

Copy the code

Intercept a portion of it

“Concatenate the data for each file“

How do you concatenate the data as follows?

Pandas. Concat () For example, the concatenation code for two tables is as follows

# Obtain yunnan Baiyao. CSV data

ylby_df = pd.read_csv("Yunnan Baiyao. CSV")



# Obtain wuliangye. CSV data

wly_df = pd.read_csv("Wuliangye. CSV")



# Stack concatenation: Note that concatenation of multiple tables is passed into a tuple or list

pd.concat((ylby_df,wly_df))

Copy the code

perform

So how do you put all the data together? Concat () concatenates the data into a list or tuple

Define an empty list of data

data_li = []



Extract the name of each file

for file_name in need_file_name:

    Read data from each file

    data = pd.read_csv(f"{file_name}")

    

    Add data to the empty data list

    data_li.append(data)

    

Len (data_li)) # print(len(data_li)



# concatenate all file data

all_data = pd.concat(data_li)  



Row 620 is used to obtain data

all_data.shape

Copy the code

“Save to a new file“

Save the file directly using df.to_csv()

all_data.to_csv("Total data.csv")  

Copy the code

. No matter how many files in your folder are the same, play with them

“The complete code“

import os

import pandas as pd



Change the current working path to the folder where all files are located

need_dir = os.chdir(r"C:\Users\lenovo\Desktop\data_ana02\04-auto_merge_files\files")

List all file names to render in a list

need_file_name = os.listdir(need_dir)

def concat_file(file_name_li):

    data_li = []   

    # 3. Loop through each filename

    for file_name in file_name_li:

        # fetch data

        # print(file_name)

        data = pd.read_csv(f"{file_name}")

        Add the read data to a list

        data_li.append(data)  

    # splice

    all_data = pd.concat(data_li)

    # save file

    all_data.to_csv("Total data.csv")  

concat_file(need_file_name) 

Copy the code

Code is not a lot, want to learn automation office students. This article can be regarded as a case to knock on their own, our learning, as long as the threshold is lowered, I believe that every student can learn Internet programming.

background

In actual combat

Related Posts

Graph theory correlation algorithm foundation

Algorithm problem – leetCode – Find the median of two positive ordinal groups

Process group, session, control terminal concepts, how to create a daemon process?