Microsoft Office is widely used in business and operational analytics, with Excel being especially popular. Excel can be used to store tabular data, create reports, graph trends, and more. Before delving into processing Excel documents in Python, let’s look at some basic terms:
- Spreadsheet or Workbook – refers to the document itself (.xls or.xlsx).
- Worksheet or Sheet – A single table of contents in a workbook. A spreadsheet can contain multiple worksheets.
- Column – A vertical sequence of numbers marked with an English letter, beginning with “A”.
- Row – a horizontal sequence of numbers starting at 1.
- Cell – combination of columns and rows, such as A1.
In this article, we’ll work with Excel spreadsheets using Python. You will learn the following:
- Python reads and writes Excel third-party libraries
- Get the worksheet from the workbook
- Read cell data
- Iterate over rows and columns
- Write to an Excel spreadsheet
- Add and delete worksheets
- Add and delete rows and columns
Most companies and universities use Excel, which can be used in a number of different ways and can be enhanced with Visual Basic for Applications (VBA). However, VBA is a bit clunky, which is why it’s important to learn how to use Excel with Python.
Now let’s learn how to use Python to process Microsoft Excel spreadsheets!
Python is a third-party library for handling Excel
You can use Python to create, read, and write Excel spreadsheets. However, The Python standard library does not support using Excel, so you need to install third-party packages. The most popular of these is OpenPyXL, whose documentation you can read here:
https://openpyxl.readthedocs.io/en/stable/
OpenPyXL isn’t your only option, there are several other software packages that support Microsoft Excel:
- XLRD – Used to read Excel (.xls) files in older formats
- XLWT – Used to write old format Excel (.xls) files
- Xlwings – Excel format for new formats with macro capabilities
A few years ago, the first two were Python’s most popular libraries for manipulating Excel documents. However, the authors of these packages have stopped maintaining them. The XLwings package has great potential, but it doesn’t work on all platforms and requires Microsoft Excel to be installed.
You will use OpenPyXL in this article because it is under continuous development and maintenance. OpenPyXL does not require Microsoft Excel to be installed and can be used on all platforms.
You can install OpenPyXL with the PIP command:
$ python -m pip install openpyxl
Copy the code
With the installation complete, let’s learn how to read Excel spreadsheets using OpenPyXL!
Get the worksheet from the workbook
The first step is to find an Excel file to use with OpenPyXL, and a books.xlsx file is available to you in the Github repository for this article project. You can download it at:
https://github.com/driscollis/python101code/tree/master/chapter38_excel
You can also use your own files, although the output of your own files is different from the examples in this article.
The next step is to write some code to open the spreadsheet. To do this, create a new file called open_workbook.py and add the following code to it:
# open_workbook.py
from openpyxl import load_workbook
def open_workbook(path):
workbook = load_workbook(filename=path)
print(f'Worksheet names: {workbook.sheetnames}')
sheet = workbook.active
print(sheet)
print(f'The title of the Worksheet is: {sheet.title}')
if __name__ == '__main__':
open_workbook('books.xlsx')
Copy the code
In the example above, you import the load_workbook () function from OpenPyXL, and then create the open_workbook () function to import it into the path of the Excel spreadsheet. Next, use the load_workbook () to create a openpyxl workbook. Workbook. The workbook object. This object gives you access to worksheets and cells in a spreadsheet. It does indeed have a double workbook name, and that’s not a typo!
The rest of the open_workbook () function shows how to print out all the currently defined worksheets in the spreadsheet, how to get the currently active worksheet, and how to print the title of the worksheet.
When you run this code, you see the following output:
Worksheet names: ['Sheet 1 - Books']
<Worksheet "Sheet 1 - Books">
The title of the Worksheet is: Sheet 1 - Books
Copy the code
Now that you know how to access the worksheet in the spreadsheet, you can proceed to access the cell data!
Read cell data
With Microsoft Excel, data is stored in cells. You need to make Python accessible to these cells in order to extract this data. OpenPyXL makes this process easy.
Create a new file called workbook_cells.py and add the following code:
# workbook_cells.py
from openpyxl import load_workbook
def get_cell_info(path):
workbook = load_workbook(filename=path)
sheet = workbook.active
print(sheet)
print(f'The title of the Worksheet is: {sheet.title}')
print(f'The value of {sheet["A2"].value=}')
print(f'The value of {sheet["A3"].value=}')
cell = sheet['B3']
print(f'{cell.value=}')
if __name__ == '__main__':
get_cell_info('books.xlsx')
Copy the code
This script loads Excel files in the OpenPyXL workbook. You get the current worksheet and print out its title and several different cell values. You can access cells using a worksheet object, followed by square brackets and the column name and row number within it. For example, Sheet [“A2”] will get you the cell in row 2, column A. To get the value of the cell, use the value attribute.
Note: This code uses the new f-string formatting feature in Python 3.8. If you run it with an earlier version, you will receive an error message.
When you run this code, you get the following output:
<Worksheet "Sheet 1 - Books">
The title of the Worksheet is: Sheet 1 - Books
The value of sheet["A2"].value='Title'
The value of sheet["A3"].value='Python 101'
cell.value='Mike Driscoll'
Copy the code
You can try to use some of its other properties to get additional information about the cell. Add the following function to the file and update the conditional statement at the end to run it:
def get_info_by_coord(path):
workbook = load_workbook(filename=path)
sheet = workbook.active
cell = sheet['A2']
print(f'Row {cell.row}, Col {cell.column} = {cell.value}')
print(f'{cell.value=} is at {cell.coordinate=}')
if __name__ == '__main__':
get_info_by_coord('books.xlsx')
Copy the code
In this example, you will use the row and column properties of the cell object to get row and column information. Notice that the “A” column maps to “1”, “B” maps to “2”, and so on. If you want to traverse an Excel document, you can use the coordinate attribute to get the cell name.
When you run this code, the output looks like this:
Row 2, Col 1 = Title
cell.value='Title' is at cell.coordinate='A2'
Copy the code
Speaking of traversal, let’s take a look at the next step!
Iterate over rows and columns
Sometimes, you will need to traverse the entire Excel spreadsheet or parts of the spreadsheet. OpenPyXL allows you to do this in several different ways. Create a new file called iterating_over_cells.py and write the following code to it:
# iterating_over_cells.py
from openpyxl import load_workbook
def iterating_range(path):
workbook = load_workbook(filename=path)
sheet = workbook.active
for cell in sheet['A'] :print(cell)
if __name__ == '__main__':
iterating_range('books.xlsx')
Copy the code
Here, you load the spreadsheet and iterate through all the cells in the “A” column. For each cell, the cell object is printed out. If you want to format the output more finely, you can use some of the cell attributes mentioned in the previous section.
This is achieved by running this code:
<Cell 'Sheet 1 - Books'.A1>
<Cell 'Sheet 1 - Books'.A2>
<Cell 'Sheet 1 - Books'.A3>
<Cell 'Sheet 1 - Books'.A4>
<Cell 'Sheet 1 - Books'.A5>
<Cell 'Sheet 1 - Books'.A6>
<Cell 'Sheet 1 - Books'.A7>
<Cell 'Sheet 1 - Books'.A8>
<Cell 'Sheet 1 - Books'.A9>
<Cell 'Sheet 1 - Books'.A10>
# output truncated for brevity
Copy the code
The output is truncated because it will print many cells by default. OpenPyXL provides additional ways to traverse rows and columns by using the iter_rows () and iter_cols () functions. These methods take the following parameters:
min_row
max_row
min_col
max_col
You can also add a values_only parameter that tells OpenPyXL to return the value of the cell instead of the cell object. Go ahead and create a new file called iterating_over_cell_values.py and add the following code to it:
# iterating_over_cell_values.py
from openpyxl import load_workbook
def iterating_over_values(path):
workbook = load_workbook(filename=path)
sheet = workbook.active
for value in sheet.iter_rows(
min_row=1, max_row=3,
min_col=1, max_col=3,
values_only=True,
):
print(value)
if __name__ == '__main__':
iterating_over_values('books.xlsx')
Copy the code
This code demonstrates how to use iter_rows () to iterate over rows in an Excel spreadsheet and print out the values for those rows. When you run this code, you get the following output:
('Books', None, None)
('Title'.'Author'.'Publisher')
('Python 101'.'Mike Driscoll'.'Mouse vs Python')
Copy the code
The output is a Python tuple containing the data in each column. At this point, you’ve seen how to open a spreadsheet and read data from specific cells and through iteration. Now you’re ready to learn how to create Excel spreadsheets using OpenPyXL!
Write to an Excel spreadsheet
Writing Excel spreadsheets using OpenPyXL doesn’t require a lot of code. You can create spreadsheets using the Workbook () class. Go ahead and create a new file called writing_hello.py and add the following code:
# writing_hello.py
from openpyxl import Workbook
def create_workbook(path):
workbook = Workbook()
sheet = workbook.active
sheet['A1'] = 'Hello'
sheet['A2'] = 'from'
sheet['A3'] = 'OpenPyXL'
workbook.save(path)
if __name__ == '__main__':
create_workbook('hello.xlsx')
Copy the code
Here, you initialize the Workbook () and get the current worksheet. Then set the first three lines in the “A” column to different strings. Finally, the save () function is called and passed the path to which the new document is saved. Congratulations to you! You have just created an Excel spreadsheet using Python.
Next, let’s look at how to add and remove worksheets from a workbook!
Add and delete worksheets
Many people prefer to work with data in multiple worksheets in a workbook. OpenPyXL supports adding a new worksheet to a Workbook () object through its create_sheet () method.
Create a new file called creating_sheets.py and add the following code:
# creating_sheets.py
import openpyxl
def create_worksheets(path):
workbook = openpyxl.Workbook()
print(workbook.sheetnames)
# Add a new worksheet
workbook.create_sheet()
print(workbook.sheetnames)
# Insert a worksheet
workbook.create_sheet(index=1,
title='Second sheet')
print(workbook.sheetnames)
workbook.save(path)
if __name__ == '__main__':
create_worksheets('sheets.xlsx')
Copy the code
Here, you use create_sheet () twice to add two new worksheets to the workbook. The second example shows how to set the title of the worksheet and at which index to insert the worksheet. The index = 1 parameter indicates that the worksheet will be added after the first existing worksheet because their indexes start at 0.
When you run this code, you see the following output:
['Sheet']
['Sheet'.'Sheet1']
['Sheet'.'Second sheet'.'Sheet1']
Copy the code
You can see that the new worksheet has been gradually added to your workbook. Once the file is saved, you can verify the existence of multiple worksheets by opening Excel or another Excel compatible application.
After completing the automatic worksheet creation process, there are suddenly too many worksheets, so let’s delete some of them. There are two ways to delete a worksheet, go ahead and create the delete_sheets.py file to see how to delete a worksheet using Python’s del method:
# delete_sheets.py
import openpyxl
def create_worksheets(path):
workbook = openpyxl.Workbook()
workbook.create_sheet()
# Insert a worksheet
workbook.create_sheet(index=1,
title='Second sheet')
print(workbook.sheetnames)
del workbook['Second sheet']
print(workbook.sheetnames)
workbook.save(path)
if __name__ == '__main__':
create_worksheets('del_sheets.xlsx')
Copy the code
This code creates a new workbook and then adds two new worksheets to it. Then delete the workbook[‘Second sheet’] using Python’s del method. You can verify that it works as expected by looking at the printout of the worksheet list before and after using the del command:
['Sheet'.'Second sheet'.'Sheet1']
['Sheet'.'Sheet1']
Copy the code
Another way to remove a worksheet from a workbook is to use the remove () method. Create a new file called remove_sheets.py and enter the following code to see how it works:
# remove_sheets.py
import openpyxl
def remove_worksheets(path):
workbook = openpyxl.Workbook()
sheet1 = workbook.create_sheet()
# Insert a worksheet
workbook.create_sheet(index=1,
title='Second sheet')
print(workbook.sheetnames)
workbook.remove(sheet1)
print(workbook.sheetnames)
workbook.save(path)
if __name__ == '__main__':
remove_worksheets('remove_sheets.xlsx')
Copy the code
At this point you can retain references to the first worksheet created by assigning the results to Sheet1. Then remove it later in the code. Alternatively, you can delete the worksheet using the same syntax as before, as shown below:
workbook.remove(workbook['Sheet1'])
Copy the code
Regardless of which method you choose to delete the worksheet, the output will be the same:
['Sheet'.'Second sheet'.'Sheet1']
['Sheet'.'Second sheet']
Copy the code
Now, learn how to add and remove rows and columns.
Add and delete rows and columns
OpenPyXL has several methods for adding and removing rows and columns in a spreadsheet. These are the four methods to be studied in this section:
.insert_rows()
.delete_rows()
.insert_cols()
.delete_cols()
Each can use the following two parameters:
idx
– Insert indexes for rows or columnsamount
– Number of rows or columns to be added
To see how this works, create a file called insert_demo.py and add the following code to it:
# insert_demo.py
from openpyxl import Workbook
def inserting_cols_rows(path):
workbook = Workbook()
sheet = workbook.active
sheet['A1'] = 'Hello'
sheet['A2'] = 'from'
sheet['A3'] = 'OpenPyXL'
# insert a column before A
sheet.insert_cols(idx=1)
# insert 2 rows starting on the second row
sheet.insert_rows(idx=2, amount=2)
workbook.save(path)
if __name__ == '__main__':
inserting_cols_rows('inserting.xlsx')
Copy the code
Here, you will create A worksheet and insert A new column before the “A” column. Indexes for columns start at 1, while indexes for worksheets start at 0. This effectively moves all the cells in column A to column B. Then insert two new rows starting at line 2.
Now that you know how to insert columns and rows, it’s time to learn how to delete them.
To learn how to delete columns or rows, create a new file called delete_demo.py and add the following code:
# delete_demo.py
from openpyxl import Workbook
def deleting_cols_rows(path):
workbook = Workbook()
sheet = workbook.active
sheet['A1'] = 'Hello'
sheet['B1'] = 'from'
sheet['C1'] = 'OpenPyXL'
sheet['A2'] = 'row 2'
sheet['A3'] = 'row 3'
sheet['A4'] = 'row 4'
# Delete column A
sheet.delete_cols(idx=1)
# delete 2 rows starting on the second row
sheet.delete_rows(idx=2, amount=2)
workbook.save(path)
if __name__ == '__main__':
deleting_cols_rows('deleting.xlsx')
Copy the code
This code creates text in multiple cells and then deletes column A using delete_cols (). It also deletes two rows starting with the second by delete_rows (). Being able to add and remove columns and rows can be very useful when working with data.
conclusion
Because Excel is widely used in many industries, being able to use Python to interact with Excel files is a very useful skill, such as helping a girl with operational data. In this article, you learned the following:
- Python a third-party software package for handling Excel
- Get the worksheet from the workbook
- Read cell data
- Iterate over rows and columns
- Write to an Excel spreadsheet
- Add and delete worksheets
- Add and delete rows and columns
OpenPyXL can do even more than described here. For example, you can use OpenPyXL to add formulas to cells, change fonts, and apply other types of styles to cells. Read the documentation honestly, and try OpenPyXL on some of your spreadsheets to get the most out of it.
Phase to recommend
Complete PyPy mastery in 5 minutes
5 minutes to master common Configuration files in Python
OpenCV artificial intelligence image recognition technology practical case
Click below to read the article and join the community
Give it a thumbs up