Convert excel files to CSV and export them to the CSV directory

Why do you want to go from Excel to CSV,

CSV is a plain text file, Excel is a binary file, Excel contains a lot of formatting information in it. Therefore, CSV files can be opened with Notepad, while Excel files can be opened with Notepad will report an error.

Note: CSV files are essentially plain text files without compression, so they are generally larger than excel files with the same content. It is not clear whether exported CSV files are larger or smaller than Excel files

The chart below compares the differences between the two, which looks more direct.

Py: async_execute_csv: async_execute_csv: async_execute_csv: async_execute_csv: async_execute_csv Execute the execute function in the export_csv.py file.

Here is the full code of the export_csv.py file

import traceback
import os
import sys
import xlrd
import codecs
import unicodecsv
import debug
import checker
import setting
import Parser
from tableData import tableManager
import excelCellData
import re
import checkAll
import globValue
import json
import start
 
def getTableColCount(sheet):
    title = []
    try:
        for i in range(100):
            title.append(sheet.cell_value(3, i))
    except:
        return i
    return 0
 
 
def getTableName(sheet):
    if(sheet.cell(1,0).ctype == 0):
        return None
    return sheet.cell_value(1,0)
 
 
def getTablePart(sheet):
    if(sheet.cell(1, 1).ctype == 0):
        return None
    return int(sheet.cell_value(1, 1))
 
 
def getTableUser(sheet, endCol):
    firstCell = sheet.cell(2, 0).value
    for i in range(1, endCol):
        if(sheet.cell(3, i).ctype == 0):
            continue
        if(sheet.cell(2, i).value != firstCell):
            return u"common"
 
    if(firstCell == "c"):
        return u"client"
    elif(firstCell == "s"):
        return u"server"
    return u"common"
 
 
def getTableKey(sheet):
    if(sheet.cell(4, 0).ctype == 0):
        return None
    return sheet.cell_value(4, 0)
 
 
def toUTF8(data):
    if not (type(data) is unicode):
        data = unicode(data)
    return data.encode('utf-8')
 
 
# 导出XML
def execute(excelName):
    return_csvs = []
    xlrd.Book.encoding = "gbk"
    excelPath = setting.dirPath_excel + excelName
    
    # 打开Excel
    excel = xlrd.open_workbook(excelPath)
    # 遍历子表,导出XML
    for sheet_name in excel.sheet_names():
        # 检测表名,是否符合规范,不符合则跳过
        if(sheet_name[-1] != "="):
            name = sheet_name.encode('utf-8').decode('unicode_escape')
            continue
        # 打开一个表单
        sheet = excel.sheet_by_name(sheet_name)
 
        checkTableHead(sheet, debug.toUTF8(excelName), toUTF8(sheet_name))
 
        # 第三行开始是字段
        # 第七行开始是数据
        # 导出每个子表
        tableName = getTableName(sheet)
 
        csvName = tableName
        tablePart = getTablePart(sheet)
 
        if(tablePart is not None):
            csvName = "{}_{}".format(tableName, tablePart)
 
        return_csvs.append("{}".format(csvName))
 
        # 当前sheet的列数
        endCol = getTableColCount(sheet)
        filePath = setting.dirPath_csv + csvName + ".csv"
        fileObj = codecs.open(filePath, "wb")
        fileObj.seek(0)
        csv_writer = unicodecsv.writer(fileObj, encoding='utf-8-sig')
 
        # 写入配表使用者
        tableUser = getTableUser(sheet, endCol)
        csv_writer.writerow([tableUser, excelName.decode("gbk"), sheet_name])
 
        # 如果一行数据都没有,写入一行空数据
        fillEmpty = False
        rowCount = sheet.nrows
        if rowCount < 9:
            fillEmpty = True
            rowCount = 9
 
        # 每个子表每行数据
        for i in range(8, rowCount):
            rowData = excelCellData.ExcelRowData(debug.toUTF8(excelName), toUTF8(sheet_name), i)
            # 每个子表每列数据
            for j in range(0, endCol):     # endCol
                if(sheet.cell(3, j).ctype == 0):
                    continue
                sign = toUTF8(sheet.cell(2, j).value)
                title = toUTF8(sheet.cell(3, j).value)
                word = toUTF8(sheet.cell(4, j).value)
                link = toUTF8(sheet.cell(7, j).value)   #检测是否有link标识
                cellData = toUTF8(readCell(sheet, i, j, fillEmpty))
                rowData.addCellData(sign, title, word, cellData, i, j, i == 8)
                if globValue.getIsCheck() == 1:
                    if link != "" and link != "__options__" and link.startswith("link"):    #拿到要Link的数据
                        link = re.split(':|：',link)[1]
                        if title == "INT":
                            if len(cellData) != 0:
                                    if(len(cellData) >= 3 and cellData[-2:] == ".0"):
                                        cellData = cellData[:-2]
                                    if cellData != '-1' and cellData != '0' and cellData != '':
                                        if checkAll.CheckLink(link,cellData) == 0:
                                            debug.error("表{}的子表[{}]中第{}行要link的ID:{}在对应的表[{}]中不存在\n".format(debug.toUTF8(excelName),csvName,i+1,cellData,link))
                        elif title == "INT[]":
                            if len(cellData) != 0:
                                data = cellData.split('*')
                                for v in data:
                                    if(len(v) >= 3 and v[-2:] == ".0"):
                                        v = cellData[:-2]
                                    if v != '-1' and v != '0' and v != '':
                                        if checkAll.CheckLink(link,v) == 0:
                                            debug.error("表{}的子表[{}]中第{}行要link的ID:{}内的{}在对应的表[{}]中不存在\n".format(debug.toUTF8(excelName),csvName,i+1,cellData,v,link))
                        elif title == "STRING":
                            if len(cellData) != 0:
                                if(len(cellData) >= 3 and cellData[-2:] == ".0"):
                                    cellData = cellData[:-2]
                                if cellData != '-1' and cellData != '0' and cellData != '':
                                    if checkAll.CheckLink(link,cellData) == 0:
                                        debug.error("表{}的子表[{}]中第{}行要link的ID:{}在对应的表[{}]中不存在\n".format(debug.toUTF8(excelName),csvName,i+1,cellData,link))
                        elif title == "STRING[]":
                                if len(cellData) != 0:
                                    data = cellData.split('*')
                                for v in data:
                                    if v != '-1' and v != '0' and v != '':
                                        if checkAll.CheckLink(link,v) == 0:
                                            debug.error("表{}的子表[{}]中第{}行要link的ID:{}内的{}在对应的表[{}]中不存在\n".format(debug.toUTF8(excelName),csvName,i+1,cellData,v,link))
                        elif title == "JSON":
                            data = json.loads(cellData)
                            for v in data:
                                for value in v:
                                    value = value.decode("unicode_escape").encode("utf8")
                                    if len(value) != 0:
                                        if checkAll.CheckLink(link,value) == 0:
                                            debug.error("表{}的子表[{}]中第{}行要link的ID:{}在对应的表[{}]中不存在\n".format(debug.toUTF8(excelName),csvName,i+1,value,link))
                        
                
            # 如果是第一行数据
            if i == 8:
                rowSigns = rowData.getAllCellDataSigns()
                csv_writer.writerow(rowSigns)   # 写入归属标识
                rowTitles = rowData.getAllCellDataTitles()
                csv_writer.writerow(rowTitles)  # 写入数据类型
                rowWords = rowData.getAllCellDataWords()
                csv_writer.writerow(rowWords)   # 写入数据字段
            
            rowDatas = rowData.getAllCellDataContents()
            csv_writer.writerow(rowDatas)
        fileObj.close()
 
        # 添加到配表管理器
        tableManager.addTableData(tableName, tablePart, getTableKey(sheet), tableUser)
 
    return return_csvs
 
 
#检查Excel单元格是否是空的
def isEmpty(sheet_cell):
    return sheet_cell.ctype == 0 or sheet_cell.value == u''
 
 
#检查表头的合法性
def checkTableHead(sheet, excelName, sheetName):
    cell_value = readCell(sheet, 0, 0)
    if cell_value != u"TableName":
        debug.throwError("表头不合法:Excel:{}分页{} 单元格A1 {} 不是\'TableName\'".format(excelName, sheetName, toUTF8(cell_value)))
 
    cell_value = readCell(sheet, 0, 1)
    if cell_value != u"Part":
        debug.throwError("表头不合法:Excel:{}分页{} 单元格B1 {} 不是\'Part\'".format(excelName, sheetName, toUTF8(cell_value)))
 
    cell_value = readCell(sheet, 1, 0)
    if cell_value == u"":
        debug.throwError("表头不合法:Excel:{}分页{} 单元格A2 没有填写表格名称".format(excelName, sheetName))
    else:
        if checker.isNumber(toUTF8(cell_value)):
            debug.throwError("表头不合法:Excel:{}分页{} 单元格A2 表格名称:{} 不能为数字".format(excelName,sheetName,toUTF8(cell_value)))
 
 
# 读取单元格数据
# fillEmpty:True 强行返回空
def readCell(sheet, rowx, colx, fillEmpty=False):
    if not fillEmpty:
        if sheet.cell(rowx, colx).ctype == 0:
            return u''
        return sheet.cell(rowx, colx).value
    return u''
Copy the code

The following is the Excel table data of our project

The core content of the code is still readCell reading the contents of the grid and writing them into the CSV grid. The table of our project has many unspoken rules which are commonly known as our own conventions. Since many contents of the code are related to the unspoken rules, I will make a brief introduction.

Excel lines 1-8 are configuration lines, and line 9 is the official data, so in the code for loop we start at 8 (reading the table is line 0, so 8 is line 9) to loop through each cell.

First line: fixed write dead, is a hint of no meaning

The second line: the name of the table and whether the table has other parts. If a table is composed of multiple sheets, write the first sheet as Part 1 and the following sheet as Part 23456 instead of TableName. After that, multiple CSV files will be generated and these data will be merged externally

The third row: the data in this column is for the client or the server, marked with C and S. The data on the client and the server are typed separately in the tabulated process. That is to say, the same CSV file generates two bytes files, which can save a lot of memory.

Row 4: The data type of each column, in addition to the standard STRING INT FLOAT, also support JSON, INT[] STING[] FLOAT[] array, etc. We’ll see how arrays are implemented later, but in this process, you just read each grid and stuff it into CSV, so there’s no type manipulation involved.

Row 5: The variable names of each column of data. This is nothing to say, as long as the variable names of the sheet are not repeated. If they are repeated, an exception will be thrown.

Line 6: Chinese interpretation of variable names

Line 7: Details and rules

Line 8: If there is an associated table, if this column of data has an associated table, write the name of the associated table, and then check the associated table for duplicate Id and other operations. There’s no need to delete this line of items and at the same time change the code and start looping over from 7.

You can open a CSV file in Notepad like this, you can open a CSV file in Excel, it looks like Excel.

Pay attention to the output directory when exporting. CSV for all tables is output to a folder

Convert excel files to CSV and export them to the CSV directory

Convert excel files to CSV and export them to the CSV directory

Related Posts

How wrong did I get about seckilling? Until I looked at this high-concurrency architecture

Windows 10 system installation Oracle11g database and client client method

Pulumi AWS continued to display Key errors when performing preview updates