For more, please visit my personal blog.


preface

A few general libraries for handling Excel tables in Python:

  • XLRD used to read
  • XLWT used to write
  • Xlutils is used to copy and filter documents

None of these libraries is as powerful as OpenPyXL, which can both read and write Excel 2010+ XLSX XLSM XLTX XLTM files. However, the OpenPyXL library is also memory intensive, about 50 times larger than the original file. For example, a 50M Excel file requires 2.5g of memory to run. For a performance comparison of the above libraries, go to The OpenPyXL Performance Test.

The installation

Enter the following command in the terminal:

pip install openpyxl
Copy the code

Openpyxl loses images and charts of the original file when saving Excel tables. If you want to manipulate images, you need the Pillow library, which is installed as follows:

pip install pillow
Copy the code

As the recent military games were held in China, the wall had to be scaled to be installed.

Method of use

Creating an Excel table

Create a new Excel table. By default, there will be a table named Sheet as follows:

from openpyxl import Workbook

wb = Workbook() Create a file object
ws = wb.active  Get the default sheet

wb.save("sample.xlsx")
Copy the code

Open an existing Excel table

Perform the following operations on the existing Excel table:

from openpyxl import Workbook, load_workbook

wb = load_workbook('sample.xlsx')

wb.save("sample.xlsx")
Copy the code

Create/obtain the Sheet

Create a new Sheet with the workbook.create_sheet () method. Sheet1 Sheet2 Sheet3… Way naming; The second argument is where to insert the Sheet, starting with 0 and ending if left blank. As follows:

ws1 = wb.create_sheet("Mysheet") Insert at the end by default

ws2 = wb.create_sheet("Mysheet", 0) Insert in the first position

wb.remove(ws1)    # delete sheet
Copy the code

You can also change the name of the sheet at any time later, as follows:

ws.title = "New Title"
Copy the code

Modify the sheet label color as follows:

ws.sheet_properties.tabColor = "1072BA"
Copy the code

If you know the name of the sheet, you can get it as follows:

ws = wb.get_sheet_by_name("New Title")


ws = wb["New Title"]
Copy the code

You can also get the names of all the sheets, traversing the sheet names as follows:

sheets = wb.sheetnames
for sheet in sheets:
   print(sheets)


for sheet in wb:
   print sheet.title
   
   
['Sheet1'.'New Title'.'Sheet2']
Copy the code

You can also locate the corresponding sheet page, where [0] is the sheet page index, as follows:

sheet_names = wb.sheetnames  Get all sheet page names
ws = wb[sheet_names[0]]
Copy the code

Copy Sheet

Only the value style of the cell can be copied, hyperlink comment blocks, etc., while images and tables, etc., cannot be copied, as follows:

source = wb.active
target = wb.copy_worksheet(source)
Copy the code

Operation cell

Get the cell from Worksheet, or assign a value to the cell directly, as follows:

cell = ws['A4'] Get the cell in row 4, column A

ws['A4'] = 4 # assign 4 to the cell in row 4, column A

ws.cell(row=4, column=2, value=10) # assign 10 to the cell in row 4, column 2
ws.cell(4, 2, 10) # same as above
Copy the code

Gets the cells in the range as follows:

cell_range = ws['A1':'C2']  # fetch the region within A1-C2

colC = ws['C']  Get column C
col_range = ws['C:D']  Get columns C-d
row10 = ws[10]  Get column 10
row_range = ws[5:10]  Get columns 5 through 10
Copy the code

If you get a cell, you can assign it as follows:

cell.value = 'hello, world'Cell = ws. Cell (row= I, column=j, value="The amount")
Copy the code

Gets the value of the cell as follows:

cellValue = ws.cell(row=i, column=j).value
Copy the code

Get the number of rows and columns as follows:

row = ws.max_row # maximum number of rows
column = ws.max_column # maximum number of columns
Copy the code

Get the data line by line as follows:

>>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2):
...    for cell in row:
...        print(cell)

<Cell Sheet1.A1>
<Cell Sheet1.B1>
<Cell Sheet1.C1>
<Cell Sheet1.A2>
<Cell Sheet1.B2>
<Cell Sheet1.C2>
Copy the code

Columns of data are obtained as follows:

>>> for col in ws.iter_cols(min_row=1, max_col=3, max_row=2):
...     for cell in col:
...         print(cell)
<Cell Sheet1.A1>
<Cell Sheet1.A2>
<Cell Sheet1.B1>
<Cell Sheet1.B2>
<Cell Sheet1.C1>
<Cell Sheet1.C2>
Copy the code

For performance reasons, the worksheet.iter_cols () method cannot be used in read-only mode.

Get all columns or rows as follows:

rows = ws.rows

columns = ws.columns
Copy the code

The worksheet. columns method cannot be used in read-only mode for performance reasons.

If you only want to get values from worksheet, you can use the worksheet. values attribute as follows:

for row in ws.values:
   for value in row:
     print(value)
Copy the code

The worksheet.iter_rows () and worksheet.iter_cols () methods can both add values_only arguments to get only values, as follows:

for row in ws.iter_rows(min_row=1, max_col=3, max_row=2, values_only=True):
    print(row)
Copy the code

Save the file

Save the Workbook using the workbook.save () method, which overwrites the original file without prompting as follows:

wb = Workbook()
wb.save('balances.xlsx')
Copy the code

Gets the cell type

from openpyxl import Workbook, load_workbook
import datetime

wb = load_workbook('sample.xlsx')

ws=wb.active
wb.guess_types = True   Get cell type enabled

ws["A1"]=datetime.datetime(2010, 7, 21)
print ws["A1"].number_format

ws["A2"] ="12%"
print ws["A2"].number_format

ws["A3"] = 1.1print ws["A4"].number_format

ws["A4"] ="China"
print ws["A5"].number_format

wb.save("sample.xlsx")

# Execution result:
# yyyy-mm-dd h:mm:ss
# 0%
# General
# General
# display general for general, '0.00_ 'for number, 0% for percentage
Numbers need to be typed in Excel. Numbers written directly are normal types
Copy the code

Using the formula

from openpyxl import Workbook, load_workbook

wb = load_workbook('sample.xlsx')
ws1=wb.active

ws1["A1"]=1
ws1["A2"]=2
ws1["A3"]=3

ws1["A4"] = "=SUM(1, 1)"
ws1["A5"] = "=SUM(A1:A3)"

print ws1["A4"].value  # print the contents of the formula, not the calculated value of the formula, the program can not get the calculated value
print ws1["A5"].value  # print the contents of the formula, not the calculated value of the formula, the program can not get the calculated value

wb.save("sample.xlsx")
Copy the code

Merged cell

from openpyxl import Workbook, load_workbook

wb = load_workbook('sample.xlsx')
ws1=wb.active

ws.merge_cells('A2:D2')
ws.unmerge_cells('A2:D2')  The script will report an error if it splits the cells separately, so it needs to merge and split the cells again

# or equivalently
ws.merge_cells(start_row=2,start_column=1,end_row=2,end_column=4)
ws.unmerge_cells(start_row=2,start_column=1,end_row=2,end_column=4)

wb.save("sample.xlsx")
Copy the code

Insert a picture

The Pillow library is required, and the installation is as follows:

pip install pillow
Copy the code
from openpyxl import load_workbook
from openpyxl.drawing.image import Image

wb = load_workbook('sample.xlsx')
ws1=wb.active

img = Image('1.png')
ws1.add_image(img, 'A1')

wb.save("sample.xlsx")
Copy the code

Hidden cell

from openpyxl import load_workbook

wb = load_workbook('sample.xlsx')
ws = wb.active

ws.column_dimensions.group('A'.'D', hidden=True)    # hide columns A through D

ws.row_dimensions.group(1, 10, hidden=True)    # Hide lines 1 to 10

ws.row_dimensions[2].hidden Get if the second line is hidden

wb.save("sample.xlsx")
Copy the code

Optimization model

When dealing with very large XLSX files, OpenPyXL’s normal mode cannot handle this load. Fortunately, there are two modes that can read and write an unlimited amount of data with (almost) constant memory consumption.

Read-only mode

from openpyxl import load_workbook
wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']

for row in ws.rows:
    for cell in row:
        print(cell.value)
Copy the code

Only write mode

from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()

# now we'll fill it with 100 rows x 200 columns

for irow in range(100):
...  ws.append(['%d' % i for i in range(200)])
# save the file
wb.save('new_big_file.xlsx') # doctest: +SKIP
Copy the code
  • Unlike a normal workbook, a newly created write-only workbook does not contain any worksheets; You must use thecreate_sheet()Method specifically to create a worksheet.
  • In a write-only workbook, onlyappend()Add a line. usecell()iter_rows()It is not possible to write (or read) cells anywhere.
  • It can export an unlimited amount of data (even more than Excel can actually handle) while keeping memory usage under 10Mb.

Insert/delete rows/columns, move zone cells

Insert rows/columns

Insert a line above line 7 as follows:

ws.insert_rows(7)
Copy the code

Insert a column to the left of column 7 as follows:

ws.insert_cols(7)
Copy the code

Deletes rows/columns

Starting with column 6, delete 3 columns, i.e., columns 6, 7, 8, as follows:

ws.delete_cols(6, 3)
Copy the code

Move area cells

Move the D4:F10 region up one row and to the right 2 columns as follows:

ws.move_range("D4:F10", rows=-1, cols=2)
Copy the code

If the region contains a formula, then the following method can be moved along with the formula:

ws.move_range("G4:H10", rows=1, cols=1, translate=True)
Copy the code

Use Pandas and NumPy

For details, see Working with Pandas and NumPy

The chart

A chart consists of one or more cell area data points in at least one series. For more information, please refer to the chart

annotation

Openpyxl can read/write comments, but formatting information is lost. Action comments are not supported in read-only mode. Comments must include content and author.

Read the comments as follows:

comment = ws["A1"].comment
comment.text # Comment content
comment.author # Comment author
Copy the code

Write a comment like this:

comment = Comment("Text"."Author")
comment.width = 300 # set width
comment.height = 50 # set height
ws["A1"].comment = comment
ws["B2"].comment = comment
Copy the code

Table style

Font style

Font name, font size, font color, bold, italic, vertical alignment (baseline, superscript, subscript), underline, strikeout, as follows:

from openpyxl.styles import Font

font = Font(name='Calibri',
            size=11,
            color='FF000000',
            bold=False,
            italic=False,
            vertAlign=None,
            underline='none',
            strike=False)

ws['A1'].font = font

cell2.font = Font(name=cell1.font.name, sz=cell1.font.sz, b=cell1.font.b, i=cell1.font.i)
Copy the code

The font color can be RGB or aRGB, as follows:

font = Font(color="FFBB00")

font = Font(color="FFFFBB00")
Copy the code

Inherit and rewrite the style as follows:

ft1 = Font(name='Arial', size=14)
ft2 = copy(ft1)
ft2.name = "Tahoma"
Copy the code

Fill the style

For details, move to fill styles

from openpyxl.styles import PatternFill

The style of # fill_type is None or solid
cell2.fill = PatternFill(fill_type=cell1.fill.fill_type, fgColor=cell1.fill.fgColor)
Copy the code

Border style

For details, move to border styles

from openpyxl.styles import Border, Side

border = Border(left=Side(border_style=None, color='FF000000'),
                right=Side(border_style=None, color='FF000000'),
                top=Side(border_style=None, color='FF000000'),
                bottom=Side(border_style=None, color='FF000000'),
                diagonal=Side(border_style=None, color='FF000000'),
                diagonal_direction=0,
                outline=Side(border_style=None, color='FF000000'),
                vertical=Side(border_style=None, color='FF000000'),
                horizontal=Side(border_style=None, color='FF000000'))Copy the code

The alignment style

Horizontal: distributed, justify, center, left, fill, centerContinuous, right, general vertical bottom, distributed, justify, center, top

from openpyxl.styles import Alignment

alignment=Alignment(horizontal='general',
                    vertical='bottom',
                    text_rotation=0,
                    wrap_text=False,
                    shrink_to_fit=False,
                    indent=0)
Copy the code

To protect the style

Lock and hide

from openpyxl.styles import Protection

protection = Protection(locked=True, hidden=False)
Copy the code

Apply styles to entire rows or columns

col = ws.column_dimensions['A']
col.font = Font(bold=True)
row = ws.row_dimensions[1]
row.font = Font(underline="single")
Copy the code

Change the merged cell style

The merged cell can be manipulated by thinking of being the cell in the upper left corner.

Filter and sort

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

data = [
    ["Fruit"."Quantity"],
    ["Kiwi", 3],
    ["Grape", 15],
    ["Apple", 3],
    ["Peach", 3],
    ["Pomegranate", 3],
    ["Pear", 3],
    ["Tangerine", 3],
    ["Blueberry", 3],
    ["Mango", 3],
    ["Watermelon", 3],
    ["Blackberry", 3],
    ["Orange", 3],
    ["Raspberry", 3],
    ["Banana", 3]]for r in data:
    ws.append(r)

ws.auto_filter.ref = "A1:B15"
ws.auto_filter.add_filter_column(0, ["Kiwi"."Apple"."Mango"])
ws.auto_filter.add_sort_condition("B2:B15")

wb.save("filtered.xlsx")
Copy the code

The generated Excel table has the operation of filtering and sorting, but does not show the actual effect, as shown below:

You need to manually click on the Rewrite app to see the effect, as shown below:

Password protection

This function can only provide a very basic password protection, there is no encryption processing, online common cracking software can crack the password. However, daily use is ok.

This function only applies to new Excel tables, not existing Excel tables.

Workbook Workbook protection

The workbook structure can be protected by preventing the view of the hidden sheet, adding, moving, deleting, hiding, or renaming the sheet, as follows:

wb.security.workbookPassword = '... '
wb.security.lockStructure = True
Copy the code

Worksheet protection

Worksheet protection does not require a password, as follows:

ws = wb.active
wb.protection.sheet = True
Copy the code

Openpyxl package files

For more details, visit OpenPyXL Package


More programming teaching please pay attention to the public account: Pan Gao accompany you to learn programming