Pandas Basics

Introduction to Pandas

The Pandas library, built on NumPy, provides easy-to-use data structures and data analysis tools for the Python programming language.

Import Pandas using the following convention:

import pandas as pd

Help

help(pd.Series.loc)

Pandas data structures

Series

A one-dimensional labeled array that can hold any data type.

s = pd.Series([1, 3, 5, 7], index=['sky', 'earth', 'xuan', 'yellow'])

The left column is the index:
s

sky       1
earth     3
xuan      5
yellow    7
dtype: int64
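As a minimal sketch of how the index relates to the values, the labels and the data of a Series can be pulled apart. The variable names `labels` and `values` are illustrative and not part of the original notebook.

```python
import pandas as pd

# A Series pairs each value with an index label.
s = pd.Series([1, 3, 5, 7], index=['sky', 'earth', 'xuan', 'yellow'])

labels = list(s.index)   # the index labels (left column when printed)
values = s.tolist()      # the underlying values (right column when printed)

print(labels)
print(values)
```

The printed Series shows these two pieces side by side, followed by the dtype of the values.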

DataFrame

A two-dimensional labeled data structure whose columns can hold different types, similar to an Excel spreadsheet.

The top row holds the column names.

The left column holds the index.

–	surname	name	nationality	gender	age
1	jia	small arms	han	male	3
2	jia	little long	han	male	1
3	zhang	duckling	han	female	–

data = {'surname': ['jia', 'jia', 'zhang'],
        'name': ['small arms', 'little long', 'duckling'],
        'nationality': ['han', 'han', 'han'],
        'age': [3, 1, None]}

data

{'surname': ['jia', 'jia', 'zhang'], 'name': ['small arms', 'little long', 'duckling'], 'nationality': ['han', 'han', 'han'], 'age': [3, 1, None]}

df = pd.DataFrame(data, columns=['surname', 'name', 'age'])

df

	surname	name	age
0	jia	small arms	3.0
1	jia	little long	1.0
2	zhang	duckling	NaN
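The following sketch, under the assumption that the dictionary keys are `surname`, `name`, `nationality` and `age`, shows how the `columns` argument selects a subset of keys and how the missing age is stored.

```python
import pandas as pd

data = {
    'surname': ['jia', 'jia', 'zhang'],
    'name': ['small arms', 'little long', 'duckling'],
    'nationality': ['han', 'han', 'han'],
    'age': [3, 1, None],
}

# Passing `columns` selects and orders a subset of the dictionary keys.
df = pd.DataFrame(data, columns=['surname', 'name', 'age'])

print(list(df.columns))          # only the three requested columns
print(df['age'].isna().sum())    # the None entry is stored as a missing value
```

Because `nationality` is not listed in `columns`, it is left out of the resulting DataFrame.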

File I/O

Read and write CSV

pd.read_csv('file.csv', header=None, nrows=5)
df.to_csv('myDataFrame.csv')
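A CSV round trip can be tried without touching the disk. This is a minimal sketch that assumes an in-memory `io.StringIO` buffer in place of the `file.csv` path used above; the column names `a` and `b` are made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False omits the row labels

# Rewind the buffer and read back only the first two data rows.
buffer.seek(0)
restored = pd.read_csv(buffer, nrows=2)

print(restored)
```

Here the first line of the buffer is used as the header; passing `header=None`, as in the call above, would instead treat it as data.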

Read and write Excel

pd.read_excel('file.xlsx')
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
xlsx = pd.ExcelFile('file.xls')
df = pd.read_excel(xlsx, 'Sheet1')

Read from a database

from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql("SELECT * FROM my_table;", engine)
pd.read_sql_table('my_table', engine)
pd.read_sql_query("SELECT * FROM my_table;", engine)

read_sql() is a convenience wrapper around read_sql_table() and read_sql_query().

df.to_sql('myDf', engine)
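The write and read steps can be combined into one round trip. The sketch below swaps the SQLAlchemy engine for a plain `sqlite3` connection from the standard library, which pandas also accepts for SQLite; the table contents are invented for illustration.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(':memory:')
df.to_sql('my_table', conn, index=False)   # index=False skips the row labels

# Read back a filtered subset with a SQL query.
result = pd.read_sql_query('SELECT * FROM my_table WHERE id > 1;', conn)
conn.close()

print(result)
```

Note that `read_sql_table` needs SQLAlchemy, so this sketch uses `read_sql_query` only.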

Selection

Getting

Get one element
s['sky']

Get a subset of the DataFrame
df[1:]

Selecting, Boolean indexing & setting

By position

Select a single value by row and column position

df.iloc[[0], [1]]

df.iat[0, 1]

'small arms'

By label

Select a single value by row and column labels

df.loc[0, 'surname']

'jia'

df.at[0, 'surname']

'jia'
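The difference between position-based and label-based access is easier to see when the row labels are not integers. In this sketch the string index `['a', 'b', 'c']` is a hypothetical choice made only to highlight the contrast.

```python
import pandas as pd

df = pd.DataFrame(
    {'surname': ['jia', 'jia', 'zhang'],
     'name': ['small arms', 'little long', 'duckling']},
    index=['a', 'b', 'c'],   # hypothetical string labels
)

by_position = df.iat[0, 1]       # row 0, column 1, both counted from zero
by_label = df.at['a', 'name']    # row labeled 'a', column labeled 'name'
sub_frame = df.iloc[[0], [1]]    # list arguments return a 1x1 DataFrame, not a scalar

print(by_position, by_label)
print(sub_frame)
```

Both scalar lookups reach the same cell, while the list-based `iloc` call keeps the result as a DataFrame.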

Boolean indexing

s[~(s > 1)]

sky    1
dtype: int64

s[(s < -1) | (s > 2)]

earth     3
xuan      5
yellow    7
dtype: int64

df[df['age'] > 1]
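Each of the expressions above builds a Boolean mask first and then uses it to filter. A minimal sketch that makes the mask explicit (the names `mask`, `selected`, `inverted` and `combined` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7], index=['sky', 'earth', 'xuan', 'yellow'])

mask = s > 2                      # a Boolean Series aligned with s
selected = s[mask]                # keeps entries where the mask is True
inverted = s[~mask]               # ~ negates the mask
combined = s[(s < 2) | (s > 5)]   # each condition needs its own parentheses

print(selected)
print(inverted)
print(combined)
```

The parentheses matter because `|` and `&` bind more tightly than the comparison operators in Python.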

Setting

Set the value at index 'yu' of Series s to 9

s['yu'] = 9
s

sky       1
earth     3
xuan      5
yellow    7
yu        9
dtype: int64

Dropping

Remove values by row label (axis=0)

s.drop(['sky', 'earth'])

xuan      5
yellow    7
yu        9
dtype: int64

Remove values by column label (axis=1)

df.drop('name', axis=1)
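By default `drop` returns a new object and leaves the original untouched. The sketch below, using small made-up data, illustrates this for both axes.

```python
import pandas as pd

s = pd.Series([1, 3, 5], index=['sky', 'earth', 'xuan'])
trimmed = s.drop(['sky', 'earth'])        # returns a new Series; s is unchanged

df = pd.DataFrame({'name': ['a', 'b'], 'age': [3, 1]})
without_name = df.drop('name', axis=1)    # axis=1 targets columns

print(trimmed)
print(without_name)
```

To keep the result, assign it to a name as shown here.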

Sorting and ranking

Sort by axis label

df.sort_index()

Sort by the values of a column

df.sort_values(by='age')

Assign ranks to the entries, from smallest to largest

df.rank()
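The three operations answer different questions, which a small example can make concrete. The data and index labels below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'age': [3, 1, 2]}, index=['b', 'c', 'a'])

by_label = df.sort_index()            # rows reordered by index label
by_value = df.sort_values(by='age')   # rows reordered by the 'age' column
ranks = df['age'].rank()              # rank of each value, in the original row order

print(by_label)
print(by_value)
print(ranks)
```

Sorting moves rows, whereas `rank` keeps the rows in place and reports where each value would fall in sorted order.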

Retrieve Series/DataFrame information

Basic information

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

(rows, columns)

df.shape

(3, 2)

Describe the index

df.index

Index(['cobra', 'viper', 'sidewinder'], dtype='object')

Describe the DataFrame columns

df.columns

Index(['max_speed', 'shield'], dtype='object')

DataFrame information

df.info()

Number of non-NA values

df.count()

max_speed    3
shield       3
dtype: int64

Summary

Sum of values

df.sum()

max_speed    12
shield       15
dtype: int64

Cumulative sum of values

df.cumsum()

Minimum value

df.min()

max_speed    1
shield       2
dtype: int64

Maximum value

df.max()

max_speed    7
shield       8
dtype: int64

Index label of the minimum value

df.idxmin()

max_speed    cobra
shield       cobra
dtype: object

Index label of the maximum value

df.idxmax()

max_speed    sidewinder
shield       sidewinder
dtype: object

Summary statistics

df.describe()

Mean

df.mean()

max_speed    4.0
shield       5.0
dtype: float64

Median

df.median()

max_speed    4.0
shield       5.0
dtype: float64
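All of the reductions above work column by column by default. As a small sketch, passing `axis=1` reduces across each row instead (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

col_totals = df.sum()                 # default axis=0: one result per column
row_totals = df.sum(axis=1)           # axis=1: one result per row
fastest = df['max_speed'].idxmax()    # index label of the largest max_speed

print(col_totals)
print(row_totals)
print(fastest)
```

The same `axis` argument is accepted by `min`, `max`, `mean` and `median`.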

Applying functions

f = lambda x: x*2

Apply a function to each column

df.apply(f)

Apply a function element-wise

df.applymap(f)
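With a function like `x*2` both calls give the same result, so the distinction is clearer with a function that needs a whole column. The sketch below is one way to show it; it uses `Series.map` inside `apply` for the element-wise case because `DataFrame.applymap` is deprecated in recent pandas releases in favor of `DataFrame.map`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]], columns=['max_speed', 'shield'])

# apply passes each column (a Series) to the function, so column-level
# operations such as max() and min() are available.
col_range = df.apply(lambda col: col.max() - col.min())

# Element-wise: the inner function receives one scalar at a time.
doubled = df.apply(lambda col: col.map(lambda x: x * 2))

print(col_range)
print(doubled)
```

The column-wise call collapses each column to a single number, while the element-wise call keeps the original shape.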

Data alignment

Internal data alignment

NA values are introduced at index labels that do not overlap

s3 = pd.Series([7, -2, 3], index=['xuan', 'yellow', 'yu'])

s + s3

earth      NaN
sky        NaN
xuan      12.0
yellow     5.0
yu        12.0
dtype: float64

Arithmetic operations with fill methods

You can also control alignment yourself by using the arithmetic methods with a fill value

s.add(s3, fill_value=0)

earth      3.0
sky        1.0
xuan      12.0
yellow     5.0
yu        12.0
dtype: float64

s.sub(s3, fill_value=2)

earth     1.0
sky      -1.0
xuan     -2.0
yellow    9.0
yu        6.0
dtype: float64

s.div(s3, fill_value=4)

earth     0.750000
sky       0.250000
xuan      0.714286
yellow   -3.500000
yu        3.000000
dtype: float64

s.mul(s3, fill_value=3)

earth      9.0
sky        3.0
xuan      35.0
yellow   -14.0
yu        27.0
dtype: float64
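The effect of `fill_value` is easiest to see next to the plain operator. This sketch rebuilds the two Series with the labels assumed in this article and compares the two results.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9], index=['sky', 'earth', 'xuan', 'yellow', 'yu'])
s3 = pd.Series([7, -2, 3], index=['xuan', 'yellow', 'yu'])

plain = s + s3                      # labels missing from s3 become NaN
filled = s.add(s3, fill_value=0)    # missing labels are treated as 0 instead

print(plain)
print(filled)
```

The fill value stands in for the missing operand before the operation runs, so `sky` and `earth` keep their original values in `filled`.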

The ipynb notebook is available at: github.com/iOSDevLog/A…

