The business requirements
An Excel table with hundreds of thousands of rows of data now needs to concatenate all the data in one column into a single string, such as the following short rows of table data:
id | code | price | num |
---|---|---|---|
11 | 22 | 33 | 44 |
22 | 33 | 44 | 55 |
33 | 44 | 55 | 66 |
44 | 55 | 66 | 77 |
55 | 66 | 77 | 88 |
66 | 77 | 88 | 99 |
Now we need to concatenate the code column with a comma into a string, and each cell is enclosed in single quotes. We need to concatenate the string ’22’,’33’,’44’,’55’,’66’,’77’. What do we do in this case? Of course there are many ways…
Batch processing of multi-line text
Many Text editors support batch manipulation of multiple lines of Text. I would like to focus on Sublime Text here. Here are some shortcuts for manipulating Text that you can use if you need them.
- Select the rows that you want to operate on and press
Ctr+Shift+L
You can edit these lines at the same time - Select the text and press repeatedly
CTRL+D
You can go down and select the next same text for simultaneous editing - Select the text and press
Alt+F3
You can select all the same text at once for simultaneous editing
How to save efficiency
At work, there may be some tabular data processing situations. For example, an operation may give you a table with something like: Order number ah, product ID, product SKU, etc., need you to assist the export these analysis using the detailed data in the data so that they do, one or two, we can quickly with the above approach, but this way for the processing of large text may exist caton, operational efficiency is low, if small text, it is very convenient.
If we encounter this situation many times, do you want to make a tool to deal with it quickly? That is, this batch splicing of the same format of data, we can write a small tool to achieve it, which is fast and easy, can greatly reduce the repetitive work consumption.
Pandas reads the table data and processes it
We use Python’s pandas module to read the table column, loop through the concatenation format, and write the concatenated string to a text file.
sheet = pandas.read_excel(io=file_name, Usecols =[line_num]) data = sheet.values. Tolist () str_data = '' Start processing the data... ') for x in range(len(data)): if str(data[x][0]) ! = 'nan': str_data += "'" + str(data[x][0]) + "',"Copy the code
Complete source code
Because the script need to use for many times, and for different files of different columns, so we have to accept the form of key parameters, we could not change any code, you can directly use this script to complete our data splicing, at the same time, we can also use pyinstaller module to the script window of packaged into exe executable file, To make it available even on a non-Python runtime, the package command is: Pyinstaller -f -i favicon.ico join_excel_data.py, I have a package to upload to Github, you can click to view, make a friend address: github.com/gxcuizy
#! /usr/bin/env python # -* -coding: utf-8 -*- "" 2021-03-01 """ import pandas import random import os import time def print_msg(msg=''): "" "print information "is" "now_time = time. Strftime (" % % Y - m - H: % d % % m: % S", Time. The localtime ()) print ('/' + now_time + ' '+ MSG) # program main entry if __name__ = = "__main__" : File_num = input(' please enter the name of the table in the current directory (e.g. 01. XLSX) : ') line_num = input(' Please enter the number of columns to assemble (e.g. '1 ') : ') # Check whether the file exists if os.path.exists(file_name) == False: Print_msg (' file does not exist ') os.system("pause") exit(0) Print_msg (' please enter column number ') os.system("pause") exit(0) try: Line_num = int(line_num) - 1 sheet = line_num = 1 pandas.read_excel(io=file_name, Usecols =[line_num]) data = sheet.values. Tolist () str_data = '' Start processing the data... ') for x in range(len(data)): if str(data[x][0]) ! : = 'nan' str_data + = "'" + STR (data [x] [0]) + ""," # written text file print_msg (' data processing is completed, began to write... ') random_num = random.randint(1000, 9999) with open('str_' + str(random_num) + '.txt', 'w') as f: F. Write (str_data.strip(',')) print_msg(' data is written.') except Exception as err_info: Print_msg (STR (err_info))Copy the code
The last
If you have other fun, easy to use welcome to share, we learn and exchange together. By the way, if there is something wrong or wrong, please point it out. I will redouble my study and strive for improvement. I hope to make progress with you.