Introduction to Pandas
The Pandas library, built on NumPy, provides easy-to-use data structures and data analysis tools for the Python programming language.
Import Pandas using the following convention:
import pandas as pd
Help
help(pd.Series.loc)
Pandas data structures
Series
A one-dimensional labeled array that can hold any data type.
s = pd.Series([1, 3, 5, 7], index=['sky', 'earth', 'xuan', 'yellow'])
The left column is the index.
s
sky       1
earth     3
xuan      5
yellow    7
dtype: int64
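As a minimal sketch of the labeled-array idea, the snippet below builds a small Series and reads its index and its values separately. The English labels are an assumption made for this example.

```python
import pandas as pd

# The labels are illustrative; any hashable values can serve as an index.
s = pd.Series([1, 3, 5, 7], index=['sky', 'earth', 'xuan', 'yellow'])

labels = list(s.index)   # the index labels, in order
values = s.to_numpy()    # the underlying NumPy array
print(labels)
print(values)
```

Keeping labels and values apart is what lets Pandas align data by label rather than by position.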
DataFrame
A two-dimensional labeled data structure whose columns can hold different types, similar to an Excel spreadsheet.
The top row holds the column names.
The left column is the index.
– | Surname | Name | Ethnicity | Gender | Age |
---|---|---|---|---|---|
1 | jia | Small arms | han | male | 3 |
2 | jia | Little long | han | male | 1 |
3 | zhang | The duckling | han | female | – |
data = {'surname': ['jia', 'jia', 'zhang'],
        'name': ['small arms', 'little long', 'the duckling'],
        'ethnicity': ['han', 'han', 'han'],
        'age': [3, 1, None]}
data
{'surname': ['jia', 'jia', 'zhang'], 'name': ['small arms', 'little long', 'the duckling'], 'ethnicity': ['han', 'han', 'han'], 'age': [3, 1, None]}
df = pd.DataFrame(data, columns=['surname', 'name', 'age'])
df
– | surname | name | age |
---|---|---|---|
0 | jia | small arms | 3.0 |
1 | jia | little long | 1.0 |
2 | zhang | the duckling | NaN |
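To illustrate that each column keeps its own type, here is a small sketch that builds a similar DataFrame and inspects the column dtypes. The column names and values are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative data; the key names are assumed for this sketch.
data = {'surname': ['jia', 'jia', 'zhang'],
        'name': ['small arms', 'little long', 'the duckling'],
        'age': [3, 1, None]}
df = pd.DataFrame(data, columns=['surname', 'name', 'age'])

# Each column carries its own dtype. The missing age becomes NaN,
# which forces the 'age' column to a floating-point dtype.
print(df.dtypes)
missing_ages = int(df['age'].isna().sum())
print(missing_ages)
```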
File I/O
Read and write CSV
pd.read_csv('file.csv', header=None, nrows=5)
df.to_csv('myDataFrame.csv')
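A CSV round trip can be tried without touching the disk. The sketch below writes to an in-memory buffer and reads it back; the frame contents and the use of io.StringIO are assumptions for illustration only.

```python
import io
import pandas as pd

df_out = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)   # index=False omits the row labels
buffer.seek(0)

# nrows limits how many data rows are parsed.
df_in = pd.read_csv(buffer, nrows=2)
print(df_in)
```

Without index=False, to_csv also writes the index as an unnamed first column, which read_csv would then load as an ordinary column.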
Read and write Excel
pd.read_excel('file.xlsx')
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
xlsx = pd.ExcelFile('file.xls')
df = pd.read_excel(xlsx, 'Sheet1')
Read from and write to a database
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql("SELECT * FROM my_table;", engine)
pd.read_sql_table('my_table', engine)
pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query().
df.to_sql('myDf', engine)
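For a self-contained round trip, the sketch below swaps the SQLAlchemy engine for a standard-library sqlite3 connection, which Pandas also accepts for to_sql() and read_sql_query(). The table and column names are made up for the example.

```python
import sqlite3
import pandas as pd

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(':memory:')

df_out = pd.DataFrame({'id': [1, 2], 'label': ['a', 'b']})
df_out.to_sql('my_table', conn, index=False)   # creates the table and inserts the rows

df_in = pd.read_sql_query('SELECT * FROM my_table ORDER BY id;', conn)
conn.close()
print(df_in)
```

Note that read_sql_table() needs SQLAlchemy, so it is left out of this sketch.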
Selection
Getting
Get one element
s['sky']
1
Get a subset of the DataFrame
df[1:]
Selecting, Boolean indexing & setting
By position
Select a single value by row and column position
df.iloc[[0], [1]]
df.iat[0, 1]
'small arms'
By label
Select a single value by row and column labels
df.loc[0, 'surname']
'jia'
df.at[0, 'surname']
'jia'
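The position-based and label-based accessors can be compared side by side. This is a small sketch with assumed column names; it also shows that passing lists to iloc returns a DataFrame rather than a scalar.

```python
import pandas as pd

# Illustrative frame; the column names are assumptions for this sketch.
df = pd.DataFrame({'surname': ['jia', 'jia', 'zhang'],
                   'name': ['small arms', 'little long', 'the duckling']})

by_position = df.iat[0, 1]        # scalar: first row, second column
by_label = df.at[0, 'surname']    # scalar: row label 0, column 'surname'
sub_frame = df.iloc[[0], [1]]     # list arguments return a 1x1 DataFrame
print(by_position, by_label, sub_frame.shape)
```

iat and at are meant for fast scalar access, while iloc and loc also accept slices and lists.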
Boolean indexing
s[~(s > 1)]
sky    1
dtype: int64
s[(s < -1) | (s > 2)]
earth     3
xuan      5
yellow    7
dtype: int64
df[df['age'] > 1]
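Boolean indexing works in two steps: a comparison produces a Boolean mask, and the mask then selects rows. A minimal sketch, with labels assumed for illustration:

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7], index=['sky', 'earth', 'xuan', 'yellow'])

mask = s > 2                      # element-wise comparison yields a Boolean Series
selected = s[mask]                # keeps only the entries where mask is True
inverted = s[~mask]               # ~ negates the mask
combined = s[(s < 2) | (s > 5)]   # each condition needs its own parentheses
print(mask.tolist())
print(selected.tolist(), inverted.tolist(), combined.tolist())
```

The parentheses matter because | and & bind more tightly than the comparison operators in Python.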
Setting
Set the value at index 'yu' of Series s to 9
s['yu'] = 9
s
sky       1
earth     3
xuan      5
yellow    7
yu        9
dtype: int64
Dropping
Drop values from rows (axis=0)
s.drop(['sky', 'earth'])
xuan      5
yellow    7
yu        9
dtype: int64
Drop values from columns (axis=1)
df.drop('name', axis=1)
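One detail worth seeing in code is that drop() returns a new object and leaves the original untouched. A short sketch with made-up data:

```python
import pandas as pd

s = pd.Series([1, 3, 5], index=['sky', 'earth', 'xuan'])
df = pd.DataFrame({'name': ['a', 'b'], 'age': [3, 1]})

s_dropped = s.drop(['sky', 'earth'])    # drops by index label (axis=0)
df_dropped = df.drop('name', axis=1)    # equivalent to df.drop(columns='name')

# The originals are unchanged.
print(len(s), len(s_dropped))
print(list(df.columns), list(df_dropped.columns))
```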
Sorting and ranking
Sort by labels along an axis
df.sort_index()
Sort by the values along an axis
df.sort_values(by='age')
Assign ranks to entries, from smallest to largest
df.rank()
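The three calls can be compared on a tiny frame. The data below is an assumption chosen so that label order and value order differ:

```python
import pandas as pd

df = pd.DataFrame({'age': [3, 1, 2]}, index=['c', 'a', 'b'])

by_label = df.sort_index()             # rows reordered by index label
by_value = df.sort_values(by='age')    # rows reordered by the 'age' column
ranks = df.rank()                      # rank of each value within its column
print(by_label.index.tolist())
print(by_value['age'].tolist())
print(ranks['age'].tolist())
```

rank() keeps the original row order and reports each value's position in sorted order as a float.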
Retrieving Series/DataFrame information
Basic information
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])
(rows, columns)
df.shape
(3, 2)
Describe the index
df.index
Index(['cobra', 'viper', 'sidewinder'], dtype='object')
Describe the DataFrame columns
df.columns
Index(['max_speed', 'shield'], dtype='object')
Info on the DataFrame
df.info()
Number of non-NA values
df.count()
max_speed    3
shield       3
dtype: int64
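The difference between shape and count() only shows up when values are missing. The sketch below uses a hypothetical frame with one missing entry:

```python
import pandas as pd

# None in a numeric column is stored as NaN, which count() skips.
df = pd.DataFrame({'max_speed': [1.0, None, 7.0], 'shield': [2, 5, 8]})

n_rows, n_cols = df.shape   # counts every cell, missing or not
non_na = df.count()         # per-column count of non-NA values
print(n_rows, n_cols)
print(non_na)
```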
Summary
Sum of values
df.sum()
max_speed    12
shield       15
dtype: int64
Cumulative sum of values
df.cumsum()
Minimum values
df.min()
max_speed    1
shield       2
dtype: int64
Maximum values
df.max()
max_speed    7
shield       8
dtype: int64
Index label of the minimum value
df.idxmin()
max_speed    cobra
shield       cobra
dtype: object
Index label of the maximum value
df.idxmax()
max_speed    sidewinder
shield       sidewinder
dtype: object
Summary statistics
df.describe()
Mean of values
df.mean()
max_speed    4.0
shield       5.0
dtype: float64
Median of values
df.median()
max_speed    4.0
shield       5.0
dtype: float64
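By default these reductions run down each column. The following sketch, using the same small frame, shows the axis argument and the running total that cumsum() produces:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

col_sums = df.sum()          # one value per column
row_sums = df.sum(axis=1)    # one value per row
running = df.cumsum()        # running total down each column
print(col_sums.tolist(), row_sums.tolist())
print(running)
```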
Applying functions
f = lambda x: x*2
Apply a function
df.apply(f)
Apply a function element-wise (pandas 2.1 and later deprecate applymap in favor of df.map)
df.applymap(f)
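What apply() hands to the function depends on the axis. A brief sketch, with lambdas that are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=['max_speed', 'shield'])

# By default the function receives each column as a Series.
doubled = df.apply(lambda col: col * 2)
# A function that returns a scalar reduces each column to one value.
col_range = df.apply(lambda col: col.max() - col.min())
# axis=1 passes each row instead.
row_total = df.apply(lambda row: row.sum(), axis=1)
print(doubled.values.tolist(), col_range.tolist(), row_total.tolist())
```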
Data alignment
Internal data alignment
NA values are introduced at index labels that do not overlap.
s3 = pd.Series([7, -2, 3], index=['xuan', 'yellow', 'yu'])
s + s3
earth      NaN
sky        NaN
xuan      12.0
yellow     5.0
yu        12.0
dtype: float64
Arithmetic operations with fill methods
You can also control alignment yourself by using the arithmetic methods with a fill value.
s.add(s3, fill_value=0)
earth      3.0
sky        1.0
xuan      12.0
yellow     5.0
yu        12.0
dtype: float64
s.sub(s3, fill_value=2)
earth     1.0
sky      -1.0
xuan     -2.0
yellow    9.0
yu        6.0
dtype: float64
s.div(s3, fill_value=4)
earth     0.750000
sky       0.250000
xuan      0.714286
yellow   -3.500000
yu        3.000000
dtype: float64
s.mul(s3, fill_value=3)
earth      9.0
sky        3.0
xuan      35.0
yellow   -14.0
yu        27.0
dtype: float64
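The effect of fill_value is easiest to see next to plain addition. A minimal sketch, assuming the two Series below:

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9], index=['sky', 'earth', 'xuan', 'yellow', 'yu'])
s3 = pd.Series([7, -2, 3], index=['xuan', 'yellow', 'yu'])

plain = s + s3                      # labels missing from s3 produce NaN
filled = s.add(s3, fill_value=0)    # the missing side is treated as 0 before adding
print(plain)
print(filled)
```

The fill value stands in only where one of the two Series lacks a label; where both have it, the operation is unchanged.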
For the ipynb notebook, see: github.com/iOSDevLog/A…