How easy is it to manipulate Excel documents? Let’s see how Python handles this.
1 introduction
Recently, I need to read and write Excel files frequently, and I want to automate the processing of Excel files through the program. I find it is very convenient to use Python OpenPyXL library to read and write Excel files, with clear structure and simple operation. This article summarizes the use of OpenPyXL, including the following:
- Introduction to OpenPyXL and excel file structure
- Worksheet read and write processing
- Read and write processing of columns and columns
- Read and write processing of cells
2 OpenPyXL and Excel file structure
Openpyxl is a Python library for reading and writing 2010 Excel documents in XLSX/XLSM/XLTX/XLTM format. It is well documented on the official website. It must be installed and introduced before use
# installation
pip install openpyxl
# Introduce the OpenPyXL module
import openpyxl
Copy the code
Before performing Excel operations, make a brief understanding of the Excel file structure to familiarize yourself with subsequent operations. The diagram below:
An Excel file is divided into workbook (file) -> sheet -> row -> cells according to the hierarchy, corresponding to the above figure, the whole Excel file is a workbook; There can be multiple worksheets under a workbook (Sheet1/ Test1, etc.). The worksheet is the corresponding table data, divided into rows and columns, rows are represented by serial numbers, columns are represented by capital letters (also available serial numbers); The intersection of rows and columns is each cell where data is stored. So, when we read and write excel tables, we basically follow this hierarchical approach: read files, find worksheets, traverse rows and columns, locate cells, and read and write cells. Therefore, read and write operations on worksheets, columns, and cells are involved. To do this, you need to load the file in. An Excel file is a workbook. The loading operation is as follows (in this example, the Excel file is text.xlsx) :
Load the Excel file
file_path = "E:/pythontest/test.xlsx"
workbook = openpyxl.load_workbook(file_path)
Copy the code
3. Work sheet processing
3.1 Worksheet reading
There can be multiple sheets, and all sheets can be read. When reading a single sheet, it can be read by sheet name or by subscript (subscript starts at 0).
- All worksheet objects:
workbook.worksheets
- Name of all worksheets:
workbook.sheetnames
- Get the worksheet by name (sheet_name) :
workbook[sheet_name]
- Press the subscript (I starts at 0) to get the worksheet:
workbook.worksheets[i]
- Get the worksheet in use:
workbook.active
- Get the properties of the worksheet (such as the worksheet name, maximum number of rows, number of columns, and so on) :
sheet.title
,sheet.max_row
,sheet.max_column
As follows:
# all sheet objects
>>> workbook.worksheets
[<Worksheet "Sheet1">, <Worksheet "test1">, <Worksheet "test2">]
# all sheet names
>>> workbook.sheetnames
['Sheet1'.'test1'.'test2']
Read sheet by name
>>> workbook["Sheet1"]
<Worksheet "Sheet1">
Read by subscript
>>> workbook.worksheets[0]
<Worksheet "Sheet1">
Get the sheet that is currently in use
>>> workbook.active
<Worksheet "Sheet1">
Get the sheet property
>>> sheet_active.title
Sheet1
>>> sheet_active.max_row
6
>>> sheet_active.max_column
3
Copy the code
3.2 Adding worksheets
If you need to add a worksheet, follow the procedure to add the worksheet and then save the file. Create the create_sheet file and save the file to take effect.
- Create a worksheet with the same name and rename it automatically:
workbook.create_sheet("test3")
- Creates a worksheet at the specified subscript:
workbook.create_sheet("test4",1)
- Save the file. If the file path is the same as the open file path, the file is overwritten. Otherwise, the original file will be copied and saved (equivalent to saved as) :
workbook.save(file_path)
3.3 Worksheet modification
To change the worksheet name, simply set the title of the worksheet and save the file.
Change the worksheet name
>>> sheet1 = workbook['test1']
>>> sheet1.title = 'test11'
# save file
>>> workbook.save(file_path)
Copy the code
3.4 Worksheet deletion
To delete a worksheet, obtain the sheet object and then delete it. There are two ways to remove a workbook, either by using the remove method provided by workbook, or by using python del directly. After deleting, you also need to save the file:
# remove Delete worksheet
sheet = workbook["test-1"]
workbook.remove(sheet)
Delete with # del
del workbook["test2"]
# save file
workbook.save(file_path)
Copy the code
4 Row and Column Processing
After the sheet object is obtained, the row operations can be performed, including reading, adding, and deleting rows.
4.1 read the ranks
- Get all rows and columns, which can then be traversed:
sheet.rows
,sheet.columns
- Read part row: Read a line
sheet[1]
, to read moresheet[2:3]
, read a columnsheet['A']
Read more columnssheet['B:C']
Pass through all rows
>>> for row in sheet.rows:
. print(row)
...
(<Cell 'Sheet1'.A1>, <Cell 'Sheet1'.B1>, <Cell 'Sheet1'.C1>)
(<Cell 'Sheet1'.A2>, <Cell 'Sheet1'.B2>, <Cell 'Sheet1'.C2>)
....
Read part of the column
>>> sheet[1]
(<Cell 'Sheet1'.A1>, <Cell 'Sheet1'.B1>, <Cell 'Sheet1'.C1>)
>>> sheet["A:B"]
((<Cell 'Sheet1'.A1>, <Cell 'Sheet1'.A2>, <Cell 'Sheet1'.A3>, <Cell 'Sheet1'.A4>, <Cell 'Sheet1'.A5>, <Cell 'Sheet1'.A6>), (<Cell 'Sheet1'.B1>, <Cell 'Sheet1'.B2>, <Cell 'Sheet1'.B3>, <Cell 'Sheet1'.B4>, <Cell 'Sheet1'.B5>, <Cell 'Sheet1'.B6>))
Copy the code
4.2 Adding Rows and Columns
To add a row, you can specify where to add a single row or more rows.
- Append row data directly to worksheet:
sheet.append(rowdata)
- Add rows and columns at the specified index (counting from 1) :
sheet.insert_rows
.sheet.insert_cols
Insert 1 blank line at line 4
>>> sheet.insert_rows(4)
Insert 2 blank lines at line 2
>>> sheet.insert_rows(idx=2,amount=2)
Add a row of data to the table
>>> row_data = ["tom".15."[email protected]"]
>>> sheet.append(row_data)
# Save the changes
>>> workbook.save(file_path)
Copy the code
4.3 Deleting Rows and Columns
The delete operation is the same as the insert row operation, using the delete_rows and delete_cols methods.
# delete rows
>>> sheet.delete_rows(2.2)
>>> workbook.save(file_path)
Copy the code
5 cell processing
Our data is ultimately stored in each Cell, so ultimately we manipulate the data in the Cell, where OpenPyXL uses Cell objects. As we walked through the row data, we could see the output
, which corresponds to the Cell object. Operations on cells are described below.
5.1 Obtaining cell data values and attribute values
There are two ways to locate and retrieve cells:
- Specify row names directly:
sheet[A1]
- use
cell
Functions (row and column subscripts start at 1) :sheet.cell(row=2,column=1)
Get the cell by specifying the row coordinates
>>> sheet["A1"]
<Cell 'Sheet1'.A1>
The cell function retrieves the cell
>>> sheet.cell(row=1, column=1)
<Cell 'Sheet1'.A1>
Copy the code
After obtaining the cell object, you can obtain the data value and its attributes, including the number of rows it is in, coordinates, values, and so on.
>>> cell = sheet["A2"]
>>> cell.value
'Joe'
>>> cell.coordinate
'A2'
>>> cell.column
1
>>> cell.row
2
Copy the code
5.2 Moving cells
You can move a range of cells up, down, left, or right using move_range(range, Rows, COLs), where rows and COLs are integers, positive integers are down or right, and negative integers are up or left.
Move the data area (up 2 rows, right 3 columns), positive integers are down or right, negative integers are up or left
sheet.move_range("A3:C3", rows=2 -, cols=3)
wb.save(file_path)
Copy the code
5.3 Merging and Splitting Cells
For rows and columns, cells need to be merged using merge_cells(range_string, STARt_ROW, start_column, end_ROW, end_column). If all the cells to be merged have data, only the data in the upper left corner is retained, and the rest is discarded. Both merge and split can be done by row and column coordinates (e.g. A1) or row and column subscripts (e.g. 1,2).
# Cell merge, using range coordinates
sheet.merge_cells("A2:B3")
# cell merge, specify row and column subscripts (subscripts start at 1)
sheet.merge_cells(start_row=5, start_column=3, end_row=7, end_column=4)
wb.save(file_path)
# Split cells
sheet.unmerge_cells("A2:B3")
sheet.unmerge_cells(start_row=5, start_column=3, end_row=7, end_column=4)
# save file
wb.save(file_path)
Copy the code
5.4 Writing cells
To modify and write cell values, assign cell. Value directly. Note that you can write excel formulas that are identical to those used in Excel, and if you write a formula, the value you get when you read it is also a formula, not a formula value.
# write value
cell.value = "Zhang"
# write the formula (average)
cell.value = "=AVERAGE(B2:B6)"
Copy the code
5.5 Setting the cell format
Cell formats include row height, column width, font, border, alignment, fill color, and so on. This is all in the Styles module of OpenPyXL.
- Row height/column width:
row_dimensions[row_num].height = xx
.sheet.column_dimensions[col_name].width = xx
- Font (Font object) : includes field name, size, bold, italic, color, etc.
Font(name="微软雅黑", size=20, bold=True, italic=True, color="000000")
- Border (Border object and Side object) : The format size/color of each Side of the Border
Side(style="thin", color="000000")
, build border objects with edges:Border(left=side, right=side, top=side, bottom=side)
- Alignment object: Indicates whether the Alignment is vertical or horizontal.
Alignment(horizontal="center", vertical="center", wrap_text=True)
- Fill color, divided into normal color fill and gradient color fill:
PatternFill(fill_type="solid", fgColor="FF0000")
andGradientFill(stop=("FF0000", "FD1111", "000000"))
Set the row height and column width
sheet.row_dimensions[1].height = 50
sheet.column_dimensions["A"].width = 20
Set the cell font
cell = sheet["A1"]
current_font = cell.font
font = Font(name=Microsoft Yahei, size=20, bold=True, italic=True, color="000000")
cell.font = font
# Set border (thin, black)
side_style = Side(style="thin", color="000000")
border = Border(left=side_style, right=side_style, top=side_style, bottom=side_style)
cell.border = border
# center align, wrap
cell_alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
cell.alignment = cell_alignment
# Fill color (red fill, and red to black gradient fill)
p_fill = PatternFill(fill_type="solid", fgColor="FF0000")
g_fill = GradientFill(stop=("FF0000"."FD1111"."000000"))
cell.fill = p_fill
sheet["B1"].fill = g_fill
Copy the code
Finally, note that these changes require a save operation (wb.save(file_path)) to take effect.
6 summarizes
Through the above explanation, understand how to use Python OpenPyXL library for Excel document processing operations, you can find that its operation logic is quite clear and simple, in line with our habits of using Excel. The process is basically loading files and locating worksheets, rows, columns, and cells that need to be processed. Read, write, and format them. So, if you need to automate excel files, use OpenPyXL, but it limits you to excel documents in 2010 format. The recommendations for old formats (XLS) can be replaced with new formats, or you can use XLRD and XLWT modules.
The resources
- Openpyxl official documentation:
https://openpyxl.readthedocs.io/
- Excel for Python Office Automation series
The articles
- Basic Python operations – files, directories, and paths
- Distributed deployment of MinIO
- Use MinIO to easily build static resource services
- SpringBoot Multiple data sources (3) : Parameterize change sources
- SpringBoot Multiple Data Sources (2) : dynamic data sources
- SpringBoot Multiple Data Sources (1) : Multiple source policies
- Java development must learn: dynamic proxy
- Good books to read in 2019
My official account (search Mason technical records) for more technical records: