For more, please visit my personal blog.
preface
A few general libraries for handling Excel tables in Python:
- XLRD used to read
- XLWT used to write
- Xlutils is used to copy and filter documents
None of these libraries is as powerful as OpenPyXL, which can both read and write Excel 2010+ XLSX XLSM XLTX XLTM files. However, the OpenPyXL library is also memory intensive, about 50 times larger than the original file. For example, a 50M Excel file requires 2.5g of memory to run. For a performance comparison of the above libraries, go to The OpenPyXL Performance Test.
The installation
Enter the following command in the terminal:
pip install openpyxl
Copy the code
Openpyxl loses images and charts of the original file when saving Excel tables. If you want to manipulate images, you need the Pillow library, which is installed as follows:
pip install pillow
Copy the code
As the recent military games were held in China, the wall had to be scaled to be installed.
Method of use
Creating an Excel table
Create a new Excel table. By default, there will be a table named Sheet as follows:
from openpyxl import Workbook
wb = Workbook() Create a file object
ws = wb.active Get the default sheet
wb.save("sample.xlsx")
Copy the code
Open an existing Excel table
Perform the following operations on the existing Excel table:
from openpyxl import Workbook, load_workbook
wb = load_workbook('sample.xlsx')
wb.save("sample.xlsx")
Copy the code
Create/obtain the Sheet
Create a new Sheet with the workbook.create_sheet () method. Sheet1 Sheet2 Sheet3… Way naming; The second argument is where to insert the Sheet, starting with 0 and ending if left blank. As follows:
ws1 = wb.create_sheet("Mysheet") Insert at the end by default
ws2 = wb.create_sheet("Mysheet", 0) Insert in the first position
wb.remove(ws1) # delete sheet
Copy the code
You can also change the name of the sheet at any time later, as follows:
ws.title = "New Title"
Copy the code
Modify the sheet label color as follows:
ws.sheet_properties.tabColor = "1072BA"
Copy the code
If you know the name of the sheet, you can get it as follows:
ws = wb.get_sheet_by_name("New Title")
ws = wb["New Title"]
Copy the code
You can also get the names of all the sheets, traversing the sheet names as follows:
sheets = wb.sheetnames
for sheet in sheets:
print(sheets)
for sheet in wb:
print sheet.title
['Sheet1'.'New Title'.'Sheet2']
Copy the code
You can also locate the corresponding sheet page, where [0] is the sheet page index, as follows:
sheet_names = wb.sheetnames Get all sheet page names
ws = wb[sheet_names[0]]
Copy the code
Copy Sheet
Only the value style of the cell can be copied, hyperlink comment blocks, etc., while images and tables, etc., cannot be copied, as follows:
source = wb.active
target = wb.copy_worksheet(source)
Copy the code
Operation cell
Get the cell from Worksheet, or assign a value to the cell directly, as follows:
cell = ws['A4'] Get the cell in row 4, column A
ws['A4'] = 4 # assign 4 to the cell in row 4, column A
ws.cell(row=4, column=2, value=10) # assign 10 to the cell in row 4, column 2
ws.cell(4, 2, 10) # same as above
Copy the code
Gets the cells in the range as follows:
cell_range = ws['A1':'C2'] # fetch the region within A1-C2
colC = ws['C'] Get column C
col_range = ws['C:D'] Get columns C-d
row10 = ws[10] Get column 10
row_range = ws[5:10] Get columns 5 through 10
Copy the code
If you get a cell, you can assign it as follows:
cell.value = 'hello, world'Cell = ws. Cell (row= I, column=j, value="The amount")
Copy the code
Gets the value of the cell as follows:
cellValue = ws.cell(row=i, column=j).value
Copy the code
Get the number of rows and columns as follows:
row = ws.max_row # maximum number of rows
column = ws.max_column # maximum number of columns
Copy the code
Get the data line by line as follows:
>>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2):
... for cell in row:
... print(cell)
<Cell Sheet1.A1>
<Cell Sheet1.B1>
<Cell Sheet1.C1>
<Cell Sheet1.A2>
<Cell Sheet1.B2>
<Cell Sheet1.C2>
Copy the code
Columns of data are obtained as follows:
>>> for col in ws.iter_cols(min_row=1, max_col=3, max_row=2):
... for cell in col:
... print(cell)
<Cell Sheet1.A1>
<Cell Sheet1.A2>
<Cell Sheet1.B1>
<Cell Sheet1.B2>
<Cell Sheet1.C1>
<Cell Sheet1.C2>
Copy the code
For performance reasons, the worksheet.iter_cols () method cannot be used in read-only mode.
Get all columns or rows as follows:
rows = ws.rows
columns = ws.columns
Copy the code
The worksheet. columns method cannot be used in read-only mode for performance reasons.
If you only want to get values from worksheet, you can use the worksheet. values attribute as follows:
for row in ws.values:
for value in row:
print(value)
Copy the code
The worksheet.iter_rows () and worksheet.iter_cols () methods can both add values_only arguments to get only values, as follows:
for row in ws.iter_rows(min_row=1, max_col=3, max_row=2, values_only=True):
print(row)
Copy the code
Save the file
Save the Workbook using the workbook.save () method, which overwrites the original file without prompting as follows:
wb = Workbook()
wb.save('balances.xlsx')
Copy the code
Gets the cell type
from openpyxl import Workbook, load_workbook
import datetime
wb = load_workbook('sample.xlsx')
ws=wb.active
wb.guess_types = True Get cell type enabled
ws["A1"]=datetime.datetime(2010, 7, 21)
print ws["A1"].number_format
ws["A2"] ="12%"
print ws["A2"].number_format
ws["A3"] = 1.1print ws["A4"].number_format
ws["A4"] ="China"
print ws["A5"].number_format
wb.save("sample.xlsx")
# Execution result:
# yyyy-mm-dd h:mm:ss
# 0%
# General
# General
# display general for general, '0.00_ 'for number, 0% for percentage
Numbers need to be typed in Excel. Numbers written directly are normal types
Copy the code
Using the formula
from openpyxl import Workbook, load_workbook
wb = load_workbook('sample.xlsx')
ws1=wb.active
ws1["A1"]=1
ws1["A2"]=2
ws1["A3"]=3
ws1["A4"] = "=SUM(1, 1)"
ws1["A5"] = "=SUM(A1:A3)"
print ws1["A4"].value # print the contents of the formula, not the calculated value of the formula, the program can not get the calculated value
print ws1["A5"].value # print the contents of the formula, not the calculated value of the formula, the program can not get the calculated value
wb.save("sample.xlsx")
Copy the code
Merged cell
from openpyxl import Workbook, load_workbook
wb = load_workbook('sample.xlsx')
ws1=wb.active
ws.merge_cells('A2:D2')
ws.unmerge_cells('A2:D2') The script will report an error if it splits the cells separately, so it needs to merge and split the cells again
# or equivalently
ws.merge_cells(start_row=2,start_column=1,end_row=2,end_column=4)
ws.unmerge_cells(start_row=2,start_column=1,end_row=2,end_column=4)
wb.save("sample.xlsx")
Copy the code
Insert a picture
The Pillow library is required, and the installation is as follows:
pip install pillow
Copy the code
from openpyxl import load_workbook
from openpyxl.drawing.image import Image
wb = load_workbook('sample.xlsx')
ws1=wb.active
img = Image('1.png')
ws1.add_image(img, 'A1')
wb.save("sample.xlsx")
Copy the code
Hidden cell
from openpyxl import load_workbook
wb = load_workbook('sample.xlsx')
ws = wb.active
ws.column_dimensions.group('A'.'D', hidden=True) # hide columns A through D
ws.row_dimensions.group(1, 10, hidden=True) # Hide lines 1 to 10
ws.row_dimensions[2].hidden Get if the second line is hidden
wb.save("sample.xlsx")
Copy the code
Optimization model
When dealing with very large XLSX files, OpenPyXL’s normal mode cannot handle this load. Fortunately, there are two modes that can read and write an unlimited amount of data with (almost) constant memory consumption.
Read-only mode
from openpyxl import load_workbook
wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']
for row in ws.rows:
for cell in row:
print(cell.value)
Copy the code
Only write mode
from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()
# now we'll fill it with 100 rows x 200 columns
for irow in range(100):
... ws.append(['%d' % i for i in range(200)])
# save the file
wb.save('new_big_file.xlsx') # doctest: +SKIP
Copy the code
- Unlike a normal workbook, a newly created write-only workbook does not contain any worksheets; You must use the
create_sheet()
Method specifically to create a worksheet. - In a write-only workbook, only
append()
Add a line. usecell()
或iter_rows()
It is not possible to write (or read) cells anywhere. - It can export an unlimited amount of data (even more than Excel can actually handle) while keeping memory usage under 10Mb.
Insert/delete rows/columns, move zone cells
Insert rows/columns
Insert a line above line 7 as follows:
ws.insert_rows(7)
Copy the code
Insert a column to the left of column 7 as follows:
ws.insert_cols(7)
Copy the code
Deletes rows/columns
Starting with column 6, delete 3 columns, i.e., columns 6, 7, 8, as follows:
ws.delete_cols(6, 3)
Copy the code
Move area cells
Move the D4:F10 region up one row and to the right 2 columns as follows:
ws.move_range("D4:F10", rows=-1, cols=2)
Copy the code
If the region contains a formula, then the following method can be moved along with the formula:
ws.move_range("G4:H10", rows=1, cols=1, translate=True)
Copy the code
Use Pandas and NumPy
For details, see Working with Pandas and NumPy
The chart
A chart consists of one or more cell area data points in at least one series. For more information, please refer to the chart
annotation
Openpyxl can read/write comments, but formatting information is lost. Action comments are not supported in read-only mode. Comments must include content and author.
Read the comments as follows:
comment = ws["A1"].comment
comment.text # Comment content
comment.author # Comment author
Copy the code
Write a comment like this:
comment = Comment("Text"."Author")
comment.width = 300 # set width
comment.height = 50 # set height
ws["A1"].comment = comment
ws["B2"].comment = comment
Copy the code
Table style
Font style
Font name, font size, font color, bold, italic, vertical alignment (baseline, superscript, subscript), underline, strikeout, as follows:
from openpyxl.styles import Font
font = Font(name='Calibri',
size=11,
color='FF000000',
bold=False,
italic=False,
vertAlign=None,
underline='none',
strike=False)
ws['A1'].font = font
cell2.font = Font(name=cell1.font.name, sz=cell1.font.sz, b=cell1.font.b, i=cell1.font.i)
Copy the code
The font color can be RGB or aRGB, as follows:
font = Font(color="FFBB00")
font = Font(color="FFFFBB00")
Copy the code
Inherit and rewrite the style as follows:
ft1 = Font(name='Arial', size=14)
ft2 = copy(ft1)
ft2.name = "Tahoma"
Copy the code
Fill the style
For details, move to fill styles
from openpyxl.styles import PatternFill
The style of # fill_type is None or solid
cell2.fill = PatternFill(fill_type=cell1.fill.fill_type, fgColor=cell1.fill.fgColor)
Copy the code
Border style
For details, move to border styles
from openpyxl.styles import Border, Side
border = Border(left=Side(border_style=None, color='FF000000'),
right=Side(border_style=None, color='FF000000'),
top=Side(border_style=None, color='FF000000'),
bottom=Side(border_style=None, color='FF000000'),
diagonal=Side(border_style=None, color='FF000000'),
diagonal_direction=0,
outline=Side(border_style=None, color='FF000000'),
vertical=Side(border_style=None, color='FF000000'),
horizontal=Side(border_style=None, color='FF000000'))Copy the code
The alignment style
Horizontal: distributed, justify, center, left, fill, centerContinuous, right, general vertical bottom, distributed, justify, center, top
from openpyxl.styles import Alignment
alignment=Alignment(horizontal='general',
vertical='bottom',
text_rotation=0,
wrap_text=False,
shrink_to_fit=False,
indent=0)
Copy the code
To protect the style
Lock and hide
from openpyxl.styles import Protection
protection = Protection(locked=True, hidden=False)
Copy the code
Apply styles to entire rows or columns
col = ws.column_dimensions['A']
col.font = Font(bold=True)
row = ws.row_dimensions[1]
row.font = Font(underline="single")
Copy the code
Change the merged cell style
The merged cell can be manipulated by thinking of being the cell in the upper left corner.
Filter and sort
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
data = [
["Fruit"."Quantity"],
["Kiwi", 3],
["Grape", 15],
["Apple", 3],
["Peach", 3],
["Pomegranate", 3],
["Pear", 3],
["Tangerine", 3],
["Blueberry", 3],
["Mango", 3],
["Watermelon", 3],
["Blackberry", 3],
["Orange", 3],
["Raspberry", 3],
["Banana", 3]]for r in data:
ws.append(r)
ws.auto_filter.ref = "A1:B15"
ws.auto_filter.add_filter_column(0, ["Kiwi"."Apple"."Mango"])
ws.auto_filter.add_sort_condition("B2:B15")
wb.save("filtered.xlsx")
Copy the code
The generated Excel table has the operation of filtering and sorting, but does not show the actual effect, as shown below:
You need to manually click on the Rewrite app to see the effect, as shown below:
Password protection
This function can only provide a very basic password protection, there is no encryption processing, online common cracking software can crack the password. However, daily use is ok.
This function only applies to new Excel tables, not existing Excel tables.
Workbook Workbook protection
The workbook structure can be protected by preventing the view of the hidden sheet, adding, moving, deleting, hiding, or renaming the sheet, as follows:
wb.security.workbookPassword = '... '
wb.security.lockStructure = True
Copy the code
Worksheet protection
Worksheet protection does not require a password, as follows:
ws = wb.active
wb.protection.sheet = True
Copy the code
Openpyxl package files
For more details, visit OpenPyXL Package
More programming teaching please pay attention to the public account: Pan Gao accompany you to learn programming