Python operation Excel

Commonly used tools

Data processing is a big use of Python, and Excel is the most popular data processing software. So when you do data processing in Python, it’s easy to get involved with Excel. Thanks to the hard work of previous generations, There are many wheels in Python for Excel processing, such as XLRD & XLWT & Xlutils, XlsxWriter, OpenPyXL, On The Windows platform, you can directly call the open interface of Microsoft Excel. These are commonly used tools, and there are some other excellent tools here. Next, we will show the characteristics of each tool through a table:

type xlrd&xlwt&xlutils XlsxWriter OpenPyXL Excel Open Interface
read support Does not support support support
write support support support support
Modify the support Does not support support support
xls support Does not support Does not support support
xlsx High version support support support
A large file Does not support support support Does not support
The efficiency of fast fast fast Super slow
function The weaker strong general Super strong

The above tools can be selected according to different needs. Now we mainly introduce the use of the most commonly used XLRD & XLWT & Xlutils series tools.

XLRD & XLWT & Xlutils introduction

Xlrd&xlwt &xlutils consists of the following three libraries:

  • xlrd: used to read Excel files;
  • xlwt: used to write Excel files.
  • xlutils: Utility for manipulating Excel files, such as copying, splitting, filtering, etc.

Install the library

It is easy to install the three libraries directly with the PIP tool. The installation command is as follows:

$ pip install xlrd xlwt xlutils
1
Copy the code





Write to Excel

Let’s start by writing Excel. Without further ado, let’s look at the code as follows:

Wb = xlwt.workbook () # Add two form pages sh1 = wb.add_sheet(' score ') sh2 = wb.add_sheet(' summary ') # Then add the data by position, the first argument is row, Write (0, 1, 'professional ') sh1.write(0, 2,' subject ') sh1.write(0, 3, 'subject ') sh1.write(0, 3,' subject ') 'results') sh1. Write (1, 0,' zhang ') sh1. Write (1, 1, 'information and communication engineering) sh1. Write (1, 2,' numerical analysis) sh1. Write (1, 3, 88) sh1. Write (2, 0, 'bill') sh1. Write (2, 1, 'Internet of things engineering) sh1. Write (2, 2,' digital signal processing analysis) sh1. Write (2, 3, 95) sh1. Write (3, 0, 'wang hua) sh1. Write (3, 1, 'Electronics & Communication Engineering ') sh1.write(3, 2,' fuzzy mathematics ') sh1.write(3, 3, 90) # Write the second sheet sh2.write(0, 0, 'total score ') sh2.write(1, 0, Wb. save('test.xls')Copy the code

Run the code and you will see an Excel file named test.xls generated. Open the file and view it as shown below:





The above is the code to write Excel, is not very simple, let’s see how to read Excel.

Reading Excel

It is not difficult to read Excel, as shown in the following code:

Wb = XLRD. Open_workbook ("test_w.xls") # print("sheet ") # print("sheet" Print ("sheet :", wb.nsheets) # print("sheet :", Sheet_by_index (0) # sh = Wb.sheet_by_name (' sheet_name ') # Print (u"sheet %s %d row %d row "% (sh1.name, sh1.nrows, Print (" sh1.cell_value(0, sh1.cell_value(0, sh1.cell_value(0, sh1.cell_value) ") Rows = sh1.row_values(0) cols = sh1.col_values(1) Print (" sh1.cell(1, 1, 1, 1) print(" sh1.cell(1, 1, 2, 1) 0).ctype) # For sh in wb.sheets(): for rin range(sh.nrows):Copy the code

The following output is displayed:



Careful friends may notice that we can get hereThe type of cell, which we got when we read the type aboveThe number 1So what type does 1 represent and what types are there? Don’t worry. Let’s show it in a table:

The numerical type instructions
0 empty empty
1 string string
2 number digital
3 date The date of
4 boolean Boolean value
5 error error

From the table above, we can see that the number 1 just returned by the cell type is a string type.

Modify the excel

Now that we’ve talked about writing and reading Excel content, we’re going to talk about how to update and modify Excel, using the methods in Xlutils. Go straight to the code and see the simplest modification:

Readbook = XLRD. Open_workbook ("test_w.xls") # copy a copy of wb = Copy (readbook) # select first form sh1 = wb.get_sheet(0) # add data to fifth line sh1.write(4, 0, 'w ') sh1.write(4, 1,' communication project ') sh1.write(4, 1, 'communication project ') sh1.write(4, 1,' communication project ') sh1.write(4, 1, 'communication project ') Sh1. write(1, 0, 362) # Save wb.save('test.xls')Copy the code

As can be seen from the above code, the Excel modification here is to copy the entire original Excel through the copy method of Xlutils library, and then modify the operation, and finally save. The modification result is as follows:



Format conversion operation

When we use Excel, we will format the data or set the style. Here, we will simply modify the code described above to change the output format slightly. The code is as follows:

StyleBR = xlwt.easyxf('font: name Times New Roman, color-index red, Easyxf (num_format_str='#,##0.00') # Set date format to yyyY-MM-DD styleDate = Xlwt.easyxf (num_format_str='YYYY-MM-DD') # create XLS file object wb = xlwt.workbook () # add two form pages sh1 = wb.add_sheet(' result ') sh2 = Wb.add_sheet (' summary ') # Then add data by position, the first argument is row, Sh1. write(0, 1, 'date ', styleBR) # set the header font to bold red sh1.write(0, 2,' result ', styleBR) # set the header font to bold red Write (1, 1, '2020-07-01', styleDate) sh1.write(1, 1, '2020-07-01', styleDate) sh1.write(1, 2, 90, styleDate) sh1.write(1, 2, 90, styleDate) sh1.write(1, 2, 90, styleDate) StyleNum) sh1. Write (2, 1, '2020-08-02') sh1. Write (2, 2, 2, 95) sh1. Align.horz = xlwt.align.horz_center style = xlwt.xfstyle () Merge (3, 3, 0, 1, 'score ', style) # merge(3, 3, 0, 1,' score ', style) # merge(3, 3, 0, 1, 'score ', style) Write (3, 2, xlwt.Formula("C2+C3")) # write data to sheet2 sh2.write(0, 0, 'total score ', styleBR) sh2.write(1, 0, Wb. save('test.xls')Copy the code

Output result:



As you can see, we can use code to set fonts, colors, alignments, merges and other normal Excel operations, as well as format date and number type data. Of course, this is only a partial introduction of the functions, but it is enough for our daily use, if you want to know more functions, you can refer to the official website.

Python-excel official website: www.python-excel.org/

Python action Word

Install python – docx

The python-docx library (0.8.10) is used to process Word.

$PIP install python - docx # # # # # # # # # # # # # # # # # operation result # # # # # # # # # # # # # # # # C: \ Users \ Y > PIP install python - docx & indexes in:  https://pypi.doubanio.com/simple Collecting python-docx Downloading https://pypi.doubanio.com/packages/e4/83/c66a1934ed5ed8ab1dbb9931f1779079f8bca0f6bbc5793c06c4b5e7d671/python-docx-0.8.10 . Tar. Gz (5.5 MB) | █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ █ | 5.5 MB 3.2 MB/s Requirement already satisfied: LXML >=2.3.2 in C :\users\y\ AppData \local\programs\python\python37\lib\site-packages (from python-docx) (4.5.0) Building wheels for collected packages: python-docx Building wheel for python-docx (setup.py) ... done Created wheel for python-docx: Filename = python_docx - 0.8.10 cp37 - none - any. WHL size = 184496 sha256=7ac76d3eec848a255b4f197d07e7b78ab33598c814d536d9b3c90b5a3e2a57fb Stored in directory: C:\Users\Y\AppData\Local\pip\Cache\wheels\05\7d\71\bb534b75918095724d0342119154c3d0fc035cedfe2f6c9a6c Successfully built Python-docx Installing COLLECTED packages: python-docx Successfully installed python-docX-0.8.10Copy the code

OK. If the preceding information is displayed, the installation is successful.

Write the Word

When we use Word to write a document, we usually divide it into several parts: title, chapter, paragraph, picture, table, quotation and bullet number. Here’s how to use Python in each of these sections.

The title

Creating the Document title is relatively simple, creating a blank Document with Document() and simply calling the add_heading method to create the title.

From docx import Document from docx.shared import Pt from docx.shared import Inches from docx.oxml. Ns import qn # Add_heading (' How to use Python to create and manipulate Word',0) # Save file doc1.save('word1.docx')Copy the code

This completes the creation of the document and the title of the article. Run the program and it will generate a document named word1.docx. Open the article and it will look like the image below:

Chapters and paragraphs

With the title of the article, let’s look at how the section and paragraph operate. Add the section and paragraph operation after the above code as follows:

From docx import Document from docx.shared import Pt from docx.shared import Inches from docx.oxml. Ns import qn # Doc1.add_heading (' How to use Python to create and manipulate Word',0) # create paragraph description doc1.add_paragraph(' Word Documents are often used in our daily life and work. We usually use WPS or Office to process Word. We may not have thought that it can be generated in Python. Add_headings (' Install python-docx library ',1) # create paragraph description of doc1.add_paragraph(' Now we will introduce how to install python-docx library, in the following two steps: Doc1. add_headings (' Step 1: Install Python',2) # create a paragraph describing doc1.add_paragraph(' Download the Python installation package from the Python website and install it. ') # create level 3 heading doc1.add_heading(' step 2: Doc1.add_paragraph ('window win+R ', enter CMD to open the command line, enter PIP install python-docx to download. ') # Save file doc1.save('word2.docx')Copy the code

Above we said that the add_heading method is used to add the article title, but we can see from the above code that this method is usedThe second parameter is a numberIn fact, this is to useTo mark several levels of headings“Is used to mark chapters in our daily life. The add_Paragraph method is used to add paragraphs to the text. Run the program to see what happens:

Fonts and references

We added three paragraphs using the add_Paragraph method. Now let’s look at how we operate with the font in the paragraph and refer to the paragraph. Continue to modify the above code, add the article font size, bold, tilt and other operations, the specific code is as follows:

From docx import Document from docx.shared import Pt from docx.shared import Inches from docx.oxml. Ns import qn From docx.shared import RGBColor # create a blank doc1 = Document() # add doc1.add_heading(' How to use Python to create and operate Word',0) # Doc1.add_paragraph (' Word document is often used in our daily life and work. We usually use WPS or Office to process Word. We may not think that it can be generated in Python. Here's how to do it... Add_headings (' Install python-docx library ',1) # create paragraph description of doc1.add_paragraph(' Now we will introduce how to install python-docx library, in the following two steps: Doc1. add_headings (' Step 1: Install Python',2) # create a paragraph describing doc1.add_paragraph(' Download the Python installation package from the Python website and install it. ') # create level 3 heading doc1.add_heading(' step 2: Doc1.add_paragraph ('window win+R ', enter CMD to open the command line, enter PIP install python-docx to download. Paragraph = doc1.add_PARAGRAPH (' This is the installation description for step 2! ') # add text to paragraph and set font size run = paragraph.add_run('(note: Size = Pt(20) # run = doc1.add_paragraph(' run = doc1.add_paragraph ') ').add_run('This Font is Times New Roman') run.font. Name = 'Times New Roman' # Doc1.add_paragraph (' here set the Chinese font: ').add_run(' current font is bold ') run.font. Name =' bold 'r = run._element r.pr.rfonts. Set (qn('w:eastAsia'), 'boldface ') # set italic run = doc1.add_paragraph(' ').add_run(' italic ') run.italic = True # run = doc1.add_paragraph(' ').add_run(' set this to bold '). Bold = True # Set the font to be underlined. Run = doc1.add_paragraph(' This is underlined: ').add_run(' underline ').underline = True run = doc1.add_paragraph(' ').add_run(' here set font to red ') run.font-color.rgb = RGBColor(0xFF, 0x00, 0x00) # add reference to doc1.add_paragraph(' here we quote the following paragraph: Change your life and change the world with Python. FIGHTING. ', style='Intense Quote') # save file doc1.save('word2.docx')Copy the code

The above code is mainly for paragraph font Settings, each paragraph is marked with comments should be easier to understand, run the program to see the effect:

A list of items

In order to display the contents in a list, we usually use bullets and numbers to display the contents in a list. Let’s create a new file word1.py and write the following code:

From docx import Document from docx.shared import Pt from docx.shared import Inches from docx.oxml. Ns import qn # Doc2 = Document() doc2.add_paragraph(' Which is not an animal: ') # add unordered List doc2.add_paragraph(' apple ', style='List Bullet') Style =' Bullet') doc2.add_paragraph(' lazy ', style=' Bullet') doc2.add_paragraph(' Bullet', style=' Bullet') Style ='List Bullet') doc2.add_paragraph(' grey Wolf ', style='List Bullet') doc2.add_paragraph('2020 annual plan: Add_paragraph ('CSDN reach blog expert ', style='List Number') Style ='List Number') doc2.add_paragraph(' learn a new point every day ', style='List Number') doc2.add_paragraph(' learn 50 books ', Style ='List Number') doc2.add_paragraph(' reduce hours ', style='List Number') # save file doc2.save('word1.docx')Copy the code

Pictures and tables

How to insert images and tables in Python? First we found a random image. I used the Python logo image, named python-logo. PNG, and added the image with add_picture. Add the table using add_table and add the following code to the word1.py file:

From docx import Document from docx.shared import Pt from docx.shared import Inches from docx.oxml. Ns import qn # Doc2 = Document() doc2.add_paragraph(' Which is not an animal: ') # add unordered List doc2.add_paragraph(' apple ', style='List Bullet') Style =' Bullet') doc2.add_paragraph(' lazy ', style=' Bullet') doc2.add_paragraph(' Bullet', style=' Bullet') Style ='List Bullet') doc2.add_paragraph(' grey Wolf ', style='List Bullet') doc2.add_paragraph('2020 annual plan: Add_paragraph ('CSDN reach blog expert ', style='List Number') Style ='List Number') doc2.add_paragraph(' learn a new point every day ', style='List Number') doc2.add_paragraph(' learn 50 books ', Style ='List Number') doc2.add_paragraph(' reduce work hours ', Style = "List Number ') doc2. Add_heading (' images', 2) increased image doc2 # add_picture (' C: / Users/Y/Pictures/python - logo. PNG ', Width =Inches(5.5)) doc2.add_heading(' table ',2) # Table = doc2.add_table(rows=1, Cols =4) hdr_cells = table.rows[0].cells hdr_cells[0].text = 'id' hdr_cells[1].text = 'id' hdr_cells[2].text = 'career' # Is this table data records = ((1, 'zhang', 'the electrician), (2,' zhang five ', 'boss'), (3, 'the horse 6', 'IT'), (4, 'bill', 'engineer')) # traversal data and display the for id, name, work in records: Row_cells = table.add_row().cells row_cells[0]. Text = STR (id) row_cells[1]. Text = name row_cells[2] Doc2.add_page_break () # Save file doc2.save('word1.docx')Copy the code

Reading Word files

Use Python to create a blank Word file, format the font, and save it to the file.

Doc1 = Document('word1.docx') pl = [paragraph Print ('###### print the contents of word1's article ######') Doc2 = Document('word2.docx') print('\n###### ') pl2 = [paragraph. Text for paragraph / / What's more... / / what's more... / / What's more... Tables = [table for table in doc2.tables] for table in tables: for row in table.rows: for cell in row.cells: print (cell.text,end=' ') print() print('\n')Copy the code

The above code is to read out the contents of the two documents we printed earlier, of course, just print to the console, and do nothing else. Now let’s execute the result:

Python CSV operation

Introduction to the

CSV

CSV, short for comma-separated Values, stores table data (numbers and text) in plain text format. Essentially, it is a sequence of characters that can consist of any number of records Separated by a newline character. Each record consists of fields. Typically, all records have exactly the same sequence of fields, separated by commas or tabs. CSV file format is simple, universal, has a wide range of applications in reality, among which the most used is to transfer table data between programs.

CSV and Excel

Because CSV files and Excel files are opened with Excel tools by default, what’s the difference between them? Let’s take a look at the table below.

CSV Excel
The file suffix is.csv The file extension is.xls or.xlsx
Plain text file Binary file
The stored data does not contain formats, formulas, etc You can not only store data, but also manipulate it
It can be opened using Excel tools or a text editor It can only be opened through the Excel tool
Column headers can only be written once Each column in each row has a start tag and an end tag
Data import consumes less memory Data consumes a lot of memory

The basic use

Python uses the CSV module to read and write data in CSV files. The module provides the function of exporting and reading data files in Excel mode, so we do not need to know the details of THE CSV format used in Excel. It can also define CSV formats available to other applications or specific requirements.

CSV module uses reader class and Writer class to read and write serialized data, and uses DictReader class and DictWriter class to read and write data in the form of dictionary. The following is a detailed look at corresponding functions. First, take a look at the constant information of the CSV module, as follows:

attribute instructions
QUOTE_ALL Instructs the Writer object to quote all fields
QUOTE_MINIMAL Instructs the Writer object to quote only fields that contain special characters, such as delimiters, quote characters, and line terminators
QUOTE_NONNUMERIC Instructs the Writer object to quote all non-numeric fields
QUOTE_NONE Indicates that writer objects do not use quotes to elicit fields

Writer (csvfile, the dialect = “excel”, * * fmtparams)

Returns a Writer object that converts the user’s data into a delimited string on the given file-like object.

  • csvfileIt can be any object with a write() method, or if csvFile is a file object, open it with newline= ‘;
  • Optional parametersdialecT is a specific set of parameters for different CSV variants;
  • Optional keyword parameterfmtparamsYou can override individual formatting Settings in the current variant format.

Take a look at an example:

import csv with open('test.csv', 'w', newline='') as csvfile: Writer. = the CSV writer (csvfile) writer. Writerow ([' id ', 'name', 'age']) # write multiline data = [(' 1001 ', 'zhang', '21'), (' 1002 ', 'bill', '31')] writer.writerows(data)Copy the code

Reader (csvfile, the dialect = “excel”, * * fmtparams)

Return a Reader object that iterates through the csvFile line by line. The csvFile can be a file object or a list object. If it is a file object, it is opened with newline= “. Take a look at an example:

import csv

with open('test.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        print(', '.join(row))
Copy the code

Sniffer class

Used to infer the format of a CSV file, this class provides the following two methods:

sniff(sample, delimiters=None)

The given sample is parsed, and if the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiters.

has_header(sample)

Parse the sample text (assumed in CSV format) and return True if the first line is likely to be a series of column headings.

This class and its methods are rarely used. Here is a simple example.

import csv

with open('test.csv', newline='') as csvfile:
     dialect = csv.Sniffer().sniff(csvfile.read(1024))
     csvfile.seek(0)
     reader = csv.reader(csvfile, dialect)
     for row in reader:
         print(row)
Copy the code

Reader object

The Reader object refers to the object returned by the DictReader instance and the Reader () function. Let’s look at its public properties and methods.

next()

Returns the next line of reader’s iterable. The return value may be a list or dictionary.

dialect

Dialect describes, read-only, for use by parsers.

line_num

The number of rows that the source iterator has read.

fieldnames

Field name, which is the DictReader object property.

The Writer object

The Writer object refers to the object returned by DictWriter instances and the Writer () function. Let’s look at its public properties and methods.

writerow(row)

Writes the row argument to writer’s file object.

writerows(rows)

Write all the elements in rows* (that is, the iterator that iterates over multiple of the above *row objects) to writer’s file object.

writeheader()

In the file object of Writer, write a line of field names. This method is the DictWriter object method.

dialect

Dialect Describes the dialect, which is read-only and used by writers.

Write read append state

'r' : read 'w' : write 'a' : 'w+' == w+r 'a+' ==a+r 'a+ r '==a+r 'a+ r' ==a+r 'rb' 'wb' 'ab' rb + ' 'wb +' 'ab +'Copy the code

Resource portal

1\. Follow [do a gentle program ape] public account 2\. In [do a gentle program ape] public account background reply [Python information] [2020 autumn recruit] can obtain corresponding surprise oh!Copy the code