Author: Unreal good
Source: Hang Seng LIGHT Cloud Community
Background
Quantitative analysis always involves working through large volumes of data to uncover the relationships within it and extract the information we need. Doing this in plain Python alone is cumbersome. Is there a simpler tool that lets us analyze data quickly and efficiently?
Pandas is a powerful tool set for analyzing structured data.
This article is aimed at readers who already know basic Python syntax. If you still need to learn Python, you can find tutorials in the community (developer.hs.net/course/?nav…).
Basic concepts
The Pandas library is a free, open source, third-party Python library that provides high-performance, easy-to-use data structures for Python data analysis, including Series and DataFrame.
Pandas is built on NumPy (high-performance array and matrix computing) and can be used together with Python's other scientific computing libraries. It is used for data mining and analysis, and it also provides data cleaning functions.
Pandas has been used in many fields since its release, including finance, statistics, social sciences, and engineering.
Before using Pandas, it is important to understand what it does. Pandas is roughly the Python equivalent of Excel: it works with tables (known as DataFrames) and can apply a wide variety of transformations to the data, alongside many other functions.
Data structures
DataFrame
A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can be of a different value type (numeric, string, Boolean). DataFrame has both row and column indexes, and can be thought of as a dictionary of Series (with a common index).
The DataFrame constructor is as follows:
pandas.DataFrame( data, index, columns, dtype, copy)
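Parameter description:
- data: the input data (ndarray, Series, list, dict, and so on).
- index: the row labels. The default is a RangeIndex (0, 1, 2, …, n).
- columns: the column labels. The default is a RangeIndex (0, 1, 2, …, n).
- dtype: the data type to force. If omitted, it is inferred from the data.
- copy: whether to copy the input data. The default was False in older releases; recent releases default to None, which copies dict input but not DataFrame or 2-D ndarray input.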
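As a minimal sketch of how these constructor parameters fit together, the example below builds a small table from a list of rows. The dates, column names, and values are made up for illustration and do not come from any real data set.

```python
import pandas as pd

# Hypothetical sample data: two rows of (price, volume).
rows = [[10.5, 200], [11.0, 150]]

df = pd.DataFrame(
    data=rows,
    index=["2023-01-03", "2023-01-04"],  # row labels
    columns=["price", "volume"],         # column labels
    dtype="float64",                     # force every column to float
)

print(df)
print(df.index.tolist())    # ['2023-01-03', '2023-01-04']
print(df.columns.tolist())  # ['price', 'volume']
print(df.loc["2023-01-04", "volume"])  # 150.0
```

If index and columns are left out, both fall back to the default 0, 1, 2, … labels.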
Series
A Series is like a column in a table, like a one-dimensional array, and can hold any data type.
A Series consists of an index and a set of values. Its constructor is as follows:
pandas.Series( data, index, dtype, name, copy)
Parameter description:
- data: the input data (ndarray, list, dict, or a scalar value).
- index: the index labels of the data. If omitted, the labels start from 0 by default.
- dtype: the data type. If omitted, it is inferred from the data.
- name: the name of the Series.
- copy: whether to copy the input data. The default value is False.
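The following short sketch shows the Series parameters in use. The labels, values, and the name "weights" are arbitrary choices made for this example.

```python
import pandas as pd

s = pd.Series(
    data=[1, 2, 3],
    index=["a", "b", "c"],  # custom index labels
    dtype="float64",        # store the values as floats
    name="weights",         # name attached to the Series
)

print(s)
print(s["b"])  # 2.0, looked up by label
print(s.name)  # weights

# Without an explicit index, labels start from 0.
default_index = pd.Series([10, 20, 30])
print(default_index.index.tolist())  # [0, 1, 2]
```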
Quick start
Importing the library
Import Pandas in your code:
import pandas as pd
If the import fails, either the environment is misconfigured or Pandas has not been installed. Install it with:
pip install pandas
Series object manipulation
The Series() function creates a Series object from which the corresponding methods and properties can be called:
import pandas as pd
import numpy as np

# Build a Series from a NumPy array; the index defaults to 0, 1, 2, 3
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
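To illustrate what calling "the corresponding methods and properties" can look like, here is a small sketch using a numeric Series. The numbers are arbitrary, and only a handful of the available attributes and methods are shown.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([3, 1, 4, 1, 5]))

# Properties describe the object itself.
print(s.shape)   # (5,)
print(s.values)  # the underlying NumPy array

# Methods compute something from the values.
print(s.sum())   # 14
print(s.max())   # 5

# A boolean condition selects the matching elements.
print(s[s > 2].tolist())  # [3, 4, 5]
```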
DataFrame object operation
A DataFrame object is created with DataFrame() as follows:
import pandas as pd

# A flat list becomes a single column labelled 0
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)
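A DataFrame is more often built from a dict, where each key becomes a column. The sketch below assumes made-up stock codes and prices, and shows a few common operations: selecting a column, deriving a new column, and filtering rows.

```python
import pandas as pd

# Hypothetical data; each dict key becomes a column.
df = pd.DataFrame({
    "code": ["A", "B", "C"],
    "close": [10.0, 20.0, 30.0],
    "volume": [100, 200, 300],
})

closes = df["close"]                         # a single column is a Series
df["turnover"] = df["close"] * df["volume"]  # element-wise arithmetic
high = df[df["close"] > 15]                  # keep rows matching a condition

print(type(closes).__name__)    # Series
print(df["turnover"].tolist())  # [1000.0, 4000.0, 9000.0]
print(high["code"].tolist())    # ['B', 'C']
```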
Reading file data
Local .csv files can be read using the read_csv() function:
data = pd.read_csv('file.csv')
# Read at most 1000 data rows, skip lines 1 and 5, and decode the file as GBK
data = pd.read_csv('file.csv', nrows=1000, skiprows=[1, 5], encoding='gbk')
Parameter meanings:
- 'file.csv': the name of the file to read. A directory path can be prepended.
- nrows: the number of rows to read from the start of the file.
- skiprows: the rows to skip while reading the file. When given as a list, it holds line numbers counted from 0.
- encoding: the encoding of the file being read.
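The effect of these parameters can be tried without a file on disk. The sketch below feeds read_csv() an in-memory buffer instead of 'file.csv'; the CSV content is invented for illustration. Note that line 0 is the header row here, so skiprows=[1, 3] drops the first and third data rows.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nann,90\nbob,85\ncat,70\ndan,60\n"

full = pd.read_csv(io.StringIO(csv_text))
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3])

print(len(full))                   # 4
print(first_two["name"].tolist())  # ['ann', 'bob']
print(skipped["name"].tolist())    # ['bob', 'dan']

# encoding matters when the input is bytes, as with a real file.
gbk_bytes = "name,city\nann,北京\n".encode("gbk")
decoded = pd.read_csv(io.BytesIO(gbk_bytes), encoding="gbk")
print(decoded["city"].tolist())    # ['北京']
```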
Similar to read_csv, read_excel reads data from Excel files.
Writing file data
Pandas provides the to_csv() function to convert a DataFrame to CSV data. To write the CSV data to a file, pass a file path or file object to the function. Otherwise, the CSV data is returned as a string.
data.to_csv('my_new_file.csv', index=False)
Parameter meanings:
- index: whether to write the row index. The index is written by default.
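The difference the index parameter makes is easiest to see by calling to_csv() without a file target, so that it returns the CSV text as a string. The following sketch uses a small made-up table.

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob"], "score": [90, 85]})

# No path given, so the CSV text is returned instead of written to disk.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines())
# [',name,score', '0,ann,90', '1,bob,85']
print(without_index.splitlines())
# ['name,score', 'ann,90', 'bob,85']
```

With the default setting, the row labels appear as an unnamed first column, which is often unwanted when the index is just 0, 1, 2, ….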
Similar to to_csv, to_excel writes data to Excel files.
Conclusion
The Pandas tool set is designed to help us process and analyze data quickly. This article will be updated in the future.