This is the sixth day of my participation in the First Challenge 2022. For details: First Challenge 2022.
preface
A few days ago, a friend of mine asked me about learning Python and VBS. He wanted to automate his complicated Excel operation. He realized that both Python and VBS could achieve his goal, so he was wondering which one to learn? Since I’m a Python guy, I suggested Python on the spot.
In order to experience the simplicity and efficiency of Python in practice, we will enter the Excel world of Pandas.
The installation
pip install pandas
Copy the code
When you encounter installation errors, you can usually resolve them by upgrading PIP and setupTools.
practice
About the DataFrame
A DataFrame is an abstract data object (table type) in Pandas. Data in Excel can be converted to a DataFrame object.
DataFrame and Excel attribute comparison table
pandas | Excel |
---|---|
DataFrame | Sheet page |
Series | column |
Index | The line Numbers |
row | line |
NaN | An empty cell |
Data input output
In Jupyter, we used Pandas to print a table with two rows and two columns.
Then we write the above data into excel file, and finally read the data of excel file and print it out.
Data limit
First we increase the data in the test table to 10 rows and 10 columns.
We can re-read the table to show only the first five rows.
If you want to specify an action column name (the starting line), you can use the header argument, which defaults to 0.
As you can see above, the data starts at row 5 and only shows the “first five rows”.
We can also show only data with columns 10 greater than 10.
You’ll see how simple and elegant it all is.
The data processing
If we want to manipulate the data in the table, we can do so. Let’s say I divide column 6 by 2.
The other operations are similar and will not be shown.
All of the above are column-based operations. How do you implement row operations? Simple, you can see that all operations on columns use column names, so that operations on rows use row indexes.
The operation
Show the third row of data.
Add 10 to all the data in row 4.
I’m sure you’ve got the rest of the operation figured out.
Operations on cells
In general, we may need to operate on a cell in addition to rows and columns.
DataFrame is used to locate cells. Pandas, however, provides more than one method. See the following figure for the differences.
methods | explain |
---|---|
at | Uniquely locate cells by row/column labels |
loc | Access cells through labels or arrays |
iat | Row/column cells are accessed through integer positions |
Locate the cell in row 2, column 2.
Suppose we want to square data 4 on column 3, row 3.
The last
In addition, there are many other features that are more flexible than the original Excel: formulas, data formats, sorting, look-and-replace, and perspectives. I strongly recommend that any student who is interested in Pandas try it. It will surprise you.