This article is very long, so bear with it. If you can't get through it in one sitting, bookmark it; you will come back to it again and again.
Brother Radish has also thoughtfully prepared a PDF version, which you can get at the end of the article!
How do I create Series using lists and dictionaries
Create a Series using a list
import pandas as pd
ser1 = pd.Series([1.5, 2.5, 3, 4.5, 5.0, 6])
print(ser1)
Output:
0 1.5
1 2.5
2 3.0
3 4.5
4 5.0
5 6.0
dtype: float64
Create Series with the name parameter
import pandas as pd
ser2 = pd.Series(["India", "Canada", "Germany"], name="Countries")
print(ser2)
Output:
0 India
1 Canada
2 Germany
Name: Countries, dtype: object
Create a Series using list repetition shorthand
import pandas as pd
ser3 = pd.Series(["A"] * 4)
print(ser3)
Output:
0 A
1 A
2 A
3 A
dtype: object
Create Series using dictionaries
import pandas as pd
ser4 = pd.Series({"India": "New Delhi", "Japan": "Tokyo", "UK": "London"})
print(ser4)
Output:
India New Delhi
Japan Tokyo
UK London
dtype: object
How to create a Series using Numpy functions
import pandas as pd
import numpy as np
ser1 = pd.Series(np.linspace(1, 10, 5))
print(ser1)
ser2 = pd.Series(np.random.normal(size=5))
print(ser2)
Output:
0 1.00
1 3.25
2 5.50
3 7.75
4 10.00
dtype: float64
0 -1.694452
1 -1.570006
2 1.713794
3 0.338292
4 0.803511
dtype: float64
How do I get the index and value of a Series
import pandas as pd
import numpy as np
ser1 = pd.Series({"India": "New Delhi", "Japan": "Tokyo", "UK": "London"})
print(ser1.values)
print(ser1.index)
print("\n")
ser2 = pd.Series(np.random.normal(size=5))
print(ser2.index)
print(ser2.values)
Output:
['New Delhi' 'Tokyo' 'London']
Index(['India', 'Japan', 'UK'], dtype='object')
RangeIndex(start=0, stop=5, step=1)
[ 0.66265478 -0.72222211 0.3608642 1.40955436 1.3096732 ]
How do I specify an index when I create a Series
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print(ser1)
Output:
IND India
CAN Canada
AUS Australia
JAP Japan
GER Germany
FRA France
dtype: object
How do I get the size and shape of a Series
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print(len(ser1))
print(ser1.shape)
print(ser1.size)
Output:
6
(6,)
6
How do I get the first or last rows of a Series
head()
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Head()-----")
print(ser1.head())
print("\n\n-----Head(2)-----")
print(ser1.head(2))
Output:
-----Head()-----
IND India
CAN Canada
AUS Australia
JAP Japan
GER Germany
dtype: object
-----Head(2)-----
IND India
CAN Canada
dtype: object
tail()
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Tail()-----")
print(ser1.tail())
print("\n\n-----Tail(2)-----")
print(ser1.tail(2))
Output:
-----Tail()-----
CAN Canada
AUS Australia
JAP Japan
GER Germany
FRA France
dtype: object
-----Tail(2)-----
GER Germany
FRA France
dtype: object
take()
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Take()-----")
print(ser1.take([2, 4, 5]))
Output:
-----Take()-----
AUS Australia
GER Germany
FRA France
dtype: object
Get a subset of a Series using slices
import pandas as pd
num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
idx = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
series = pd.Series(num, index=idx)
print("\n [2:4] \n")
print(series[2:4])
print("\n [1:6:2] \n")
print(series[1:6:2])
print("\n [:6] \n")
print(series[:6])
print("\n [4:] \n")
print(series[4:])
print("\n [:4:2] \n")
print(series[:4:2])
print("\n [4::2] \n")
print(series[4::2])
print("\n [::-1] \n")
print(series[::-1])
Output:
[2:4]
C 200
D 300
dtype: int64
[1:6:2]
B 100
D 300
F 500
dtype: int64
[:6]
A 0
B 100
C 200
D 300
E 400
F 500
dtype: int64
[4:]
E 400
F 500
G 600
H 700
I 800
J 900
dtype: int64
[:4:2]
A 0
C 200
dtype: int64
[4::2]
E 400
G 600
I 800
dtype: int64
[::-1]
J 900
I 800
H 700
G 600
F 500
E 400
D 300
C 200
B 100
A 0
dtype: int64
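The integer slices above select by position even though the Series has letter labels. As a rough sketch (the Series is rebuilt here so the snippet runs on its own), the same selection can be made explicit with .iloc, while .loc slices by label and, unlike positional slicing, includes the stop label:

```python
import pandas as pd

# Same data as the example above, rebuilt so the sketch is self-contained.
series = pd.Series(range(0, 1000, 100), index=list("ABCDEFGHIJ"))

by_position = series.iloc[2:4]   # positions 2 and 3; the stop position is excluded
by_label = series.loc["C":"E"]   # labels C through E; the stop label is included

print(by_position.index.tolist())
print(by_label.index.tolist())
```

Spelling out .iloc or .loc keeps the intent clear when an index happens to contain integers.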
How do I create a DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp00'],
'Name': ['John Doe', 'William Spark'],
'Occupation': ['Chemist', 'Statistician'],
'Date Of Join': ['2018-01-25', '2018-01-26'],
'Age': [23, 24]})
print(employees)
Output:
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Doe Chemist
1 24 2018-01-26 Emp00 William Spark Statistician
How to set DataFrame index and column information
import pandas as pd
employees = pd.DataFrame(
    data={'Name': ['John Doe', 'William Spark'],
          'Occupation': ['Chemist', 'Statistician'],
          'Date Of Join': ['2018-01-25', '2018-01-26'],
          'Age': [23, 24]},
    index=['Emp001', 'Emp002'],
    columns=['Name', 'Occupation', 'Date Of Join', 'Age'])
print(employees)
Output:
Name Occupation Date Of Join Age
Emp001 John Doe Chemist 2018-01-25 23
Emp002 William Spark Statistician 2018-01-26 24
How to rename DataFrame column names
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp00'],
'Name': ['John Doe', 'William Spark'],
'Occupation': ['Chemist', 'Statistician'],
'Date Of Join': ['2018-01-25', '2018-01-26'],
'Age': [23, 24]})
employees.columns = ['EmpCode', 'EmpName', 'EmpOccupation', 'EmpDOJ', 'EmpAge']
print(employees)
Output:
EmpCode EmpName EmpOccupation EmpDOJ EmpAge
0 23 2018-01-25 Emp001 John Doe Chemist
1 24 2018-01-26 Emp00 William Spark Statistician
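Assigning a list to columns relabels by position, so the result depends on the order in which pandas stores the columns. A possible alternative, sketched here on a trimmed-down frame made up for illustration, is DataFrame.rename with an old-name to new-name mapping, which does not depend on column order:

```python
import pandas as pd

# Illustrative frame with fewer columns than the example above.
employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002"],
    "Name": ["John Doe", "William Spark"],
    "Age": [23, 24],
})

# Map old names to new names; columns not listed keep their names.
renamed = employees.rename(columns={"Name": "EmpName", "Age": "EmpAge"})
print(renamed.columns.tolist())
```

rename returns a new DataFrame here, so the original employees frame keeps its column names.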
How do I select or filter rows from the DataFrame based on the values in the Pandas column
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print("\nUse == operator\n")
print(employees.loc[employees['Age'] == 23])
print("\nUse < operator\n")
print(employees.loc[employees['Age'] < 30])
print("\nUse != operator\n")
print(employees.loc[employees['Occupation'] != 'Statistician'])
print("\nMultiple Conditions\n")
print(employees.loc[(employees['Occupation'] != 'Statistician') &
                    (employees['Name'] == 'John')])
Output:
Use == operator
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
Use < operator
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
1 24 2018-01-26 Emp002 Doe Statistician
3 29 2018-02-26 Emp004 Spark Statistician
Use != operator
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
4 40 2018-03-16 Emp005 Mark Programmer
Multiple Conditions
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
Use "isin" in a DataFrame to filter rows by multiple values
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print("\nUse isin operator\n")
print(employees.loc[employees['Occupation'].isin(['Chemist', 'Programmer'])])
print("\nMultiple Conditions\n")
print(employees.loc[(employees['Occupation'] == 'Chemist') |
                    (employees['Name'] == 'John') &
                    (employees['Age'] < 30)])
Output:
Use isin operator
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
4 40 2018-03-16 Emp005 Mark Programmer
Multiple Conditions
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
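When | and & are mixed in one filter, as in the multiple-conditions example above, Python evaluates & first. The following sketch uses a reduced frame and condition names invented for illustration to show how explicit grouping can change the rows that come back:

```python
import pandas as pd

# Reduced version of the employees frame, for illustration only.
employees = pd.DataFrame({
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician", "Statistician", "Programmer"],
    "Age": [23, 24, 34, 29, 40],
})

is_programmer = employees["Occupation"] == "Programmer"
is_john = employees["Name"] == "John"
under_30 = employees["Age"] < 30

# No extra parentheses: read as is_programmer | (is_john & under_30).
implicit = employees.loc[is_programmer | is_john & under_30]
# Grouping the | first applies the age filter to both conditions.
grouped = employees.loc[(is_programmer | is_john) & under_30]

print(implicit.index.tolist())
print(grouped.index.tolist())
```

Adding the parentheses you mean, even when they match the default precedence, makes such filters easier to read.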
Iterate over rows and columns of the DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print("\n Example iterrows \n")
# iterrows yields (index label, row as a Series) pairs
for index, row in employees.iterrows():
    print(row['Name'], "--", row['Age'])
print("\n Example itertuples \n")
# itertuples yields one namedtuple per row
for row in employees.itertuples(index=True, name='Pandas'):
    print(getattr(row, "Name"), "--", getattr(row, "Age"))
Output:
Example iterrows
John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40
Example itertuples
John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40
How do I delete DataFrame columns by name or index
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print(employees)
print("\n Drop Column by Name \n")
employees.drop('Age', axis=1, inplace=True)
print(employees)
print("\n Drop Column by Index \n")
employees.drop(employees.columns[[0, 1]], axis=1, inplace=True)
print(employees)
Output:
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
1 24 2018-01-26 Emp002 Doe Statistician
2 34 2018-01-26 Emp003 William Statistician
3 29 2018-02-26 Emp004 Spark Statistician
4 40 2018-03-16 Emp005 Mark Programmer
Drop Column by Name
Date Of Join EmpCode Name Occupation
0 2018-01-25 Emp001 John Chemist
1 2018-01-26 Emp002 Doe Statistician
2 2018-01-26 Emp003 William Statistician
3 2018-02-26 Emp004 Spark Statistician
4 2018-03-16 Emp005 Mark Programmer
Drop Column by Index
Name Occupation
0 John Chemist
1 Doe Statistician
2 William Statistician
3 Spark Statistician
4 Mark Programmer
Add columns to the DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
employees['City'] = ['London', 'Tokyo', 'Sydney', 'London', 'Toronto']
print(employees)
Output:
Age Date Of Join EmpCode Name Occupation City
0 23 2018-01-25 Emp001 John Chemist London
1 24 2018-01-26 Emp002 Doe Statistician Tokyo
2 34 2018-01-26 Emp003 William Statistician Sydney
3 29 2018-02-26 Emp004 Spark Statistician London
4 40 2018-03-16 Emp005 Mark Programmer Toronto
How do I get a list of column headers from a DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print(list(employees))
print(list(employees.columns.values))
print(employees.columns.tolist())
Output:
['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']
['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']
['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']
How do I randomly generate dataframes
import pandas as pd
import numpy as np
np.random.seed(5)
df_random = pd.DataFrame(np.random.randint(100, size=(10, 6)),
columns=list('ABCDEF'),
index=['Row-{}'.format(i) for i in range(10)])
print(df_random)
Output:
A B C D E F
Row-0 99 78 61 16 73 8
Row-1 62 27 30 80 7 76
Row-2 15 53 80 27 44 77
Row-3 75 65 47 30 84 86
Row-4 18 9 41 62 1 82
Row-5 16 78 5 58 0 80
Row-6 4 36 51 27 31 2
Row-7 68 38 83 19 18 7
Row-8 30 62 11 67 65 55
Row-9 3 91 78 27 29 33
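The example above seeds NumPy's global random state. As a hedged alternative, newer NumPy code often uses a local generator created with np.random.default_rng; the sketch below builds a frame of the same shape, but its values will not match the listing above because a different generator is used:

```python
import numpy as np
import pandas as pd

# Local seeded generator; it does not touch the global np.random state.
rng = np.random.default_rng(5)

df_random = pd.DataFrame(
    rng.integers(0, 100, size=(10, 6)),       # integers in the range [0, 100)
    columns=list("ABCDEF"),
    index=[f"Row-{i}" for i in range(10)],
)
print(df_random)
```

Re-creating the generator with the same seed reproduces the same frame.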
How do I select multiple columns of a DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
df = employees[['EmpCode', 'Age', 'Name']]
print(df)
Output:
EmpCode Age Name
0 Emp001 23 John
1 Emp002 24 Doe
2 Emp003 34 William
3 Emp004 29 Spark
4 Emp005 40 Mark
How do I convert a dictionary to a DataFrame
import pandas as pd
data = {'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']}
print(data)
df = pd.DataFrame(data)
print(df)
Output:
{'Height': [165, 70, 120, 80, 180, 172, 150], 'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'], 'Age': [30, 20, 22, 40, 32, 28, 39], 'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2], 'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'], 'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']}
Age Color Food Height Score State
0 30 Blue Steak 165 4.6 NY
1 20 Green Lamb 70 8.3 TX
2 22 Red Mango 120 9.0 FL
3 40 White Apple 80 3.3 AL
4 32 Gray Cheese 180 1.8 AK
5 28 Black Melon 172 9.5 TX
6 39 Red Beans 150 2.2 TX
Slice using loc
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n -- Selecting a single row with .loc with a string -- \n")
print(df.loc['Penelope'])
print("\n -- Selecting multiple rows with .loc with a list of strings -- \n")
print(df.loc[['Cornelia', 'Jane', 'Dean']])
print("\n -- Selecting multiple rows with .loc with slice notation -- \n")
print(df.loc['Aaron':'Dean'])
Output:
-- Selecting a single row with .loc with a string --
Age 40
Color White
Food Apple
Height 80
Score 3.3
State AL
Name: Penelope, dtype: object
-- Selecting multiple rows with .loc with a list of strings --
Age Color Food Height Score State
Cornelia 39 Red Beans 150 2.2 TX
Jane 30 Blue Steak 165 4.6 NY
Dean 32 Gray Cheese 180 1.8 AK
-- Selecting multiple rows with .loc with slice notation --
Age Color Food Height Score State
Aaron 22 Red Mango 120 9.0 FL
Penelope 40 White Apple 80 3.3 AL
Dean 32 Gray Cheese 180 1.8 AK
Check if the DataFrame is empty
import pandas as pd
df = pd.DataFrame()
if df.empty:
    print('DataFrame is empty!')
Output:
DataFrame is empty!
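One detail worth knowing about the check above: empty looks at the number of rows and columns, not at whether the cells hold real values. A small sketch, with frame names made up for illustration:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=["A", "B"])   # has columns but zero rows
only_nan = pd.DataFrame({"A": [np.nan]})     # one row holding a missing value

print(no_rows.empty)
print(only_nan.empty)
print(only_nan.dropna().empty)               # dropping the NaN row leaves no rows
```

If rows that contain only missing values should count as empty, drop them before checking.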
Specify index and column names when creating a DataFrame
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
df = pd.DataFrame(values, index=code, columns=['Country'])
print(df)
Output:
Country
IND India
CAN Canada
AUS Australia
JAP Japan
GER Germany
FRA France
Slice using iloc
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n -- Selecting a single row with .iloc with an integer -- \n")
print(df.iloc[4])
print("\n -- Selecting multiple rows with .iloc with a list of integers -- \n")
print(df.iloc[[2, -2]])
print("\n -- Selecting multiple rows with .iloc with slice notation -- \n")
print(df.iloc[:5:3])
Output:
-- Selecting a single row with .iloc with an integer --
Age 32
Color Gray
Food Cheese
Height 180
Score 1.8
State AK
Name: Dean, dtype: object
-- Selecting multiple rows with .iloc with a list of integers --
Age Color Food Height Score State
Aaron 22 Red Mango 120 9.0 FL
Christina 28 Black Melon 172 9.5 TX
-- Selecting multiple rows with .iloc with slice notation --
Age Color Food Height Score State
Jane 30 Blue Steak 165 4.6 NY
Penelope 40 White Apple 80 3.3 AL
Difference between iloc and loc
- loc can also perform Boolean selection. For example, to find all rows with Age less than 30 and return only the Color and Height columns, pass a Boolean Series as the row selector. The same selection can be reproduced with iloc, but iloc does not accept a Boolean Series, so the mask must first be converted to a NumPy array.
- loc retrieves rows (or columns) with specific labels from the index.
- iloc retrieves rows (or columns) at specific positions in the index (so it only takes integers).
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n -- loc -- \n")
print(df.loc[df['Age'] < 30, ['Color', 'Height']])
print("\n -- iloc -- \n")
print(df.iloc[(df['Age'] < 30).values, [1, 3]])
Output:
-- loc --
Color Height
Nick Green 70
Aaron Red 120
Christina Black 172
-- iloc --
Color Height
Nick Green 70
Aaron Red 120
Christina Black 172
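The label-versus-position distinction is easiest to see when the labels are themselves integers. A minimal sketch, using a small Series whose labels are chosen only for illustration:

```python
import pandas as pd

# Integer labels that deliberately do not match the positions.
ser = pd.Series(["a", "b", "c"], index=[2, 0, 1])

print(ser.loc[2])    # the element labelled 2
print(ser.iloc[2])   # the element at the third position
```

Here the same number selects different elements depending on the indexer.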
Create an empty DataFrame using the time index
import datetime
import pandas as pd
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date, periods=10, freq='D')
columns = ['A', 'B', 'C']
df = pd.DataFrame(index=index, columns=columns)
df = df.fillna(0)
print(df)
Output:
A B C
2018-09-30 0 0 0
2018-10-01 0 0 0
2018-10-02 0 0 0
2018-10-03 0 0 0
2018-10-04 0 0 0
2018-10-05 0 0 0
2018-10-06 0 0 0
2018-10-07 0 0 0
2018-10-08 0 0 0
2018-10-09 0 0 0
How do I change the ordering of DataFrame columns
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n -- Change order using columns -- \n")
new_order = [3, 2, 1, 4, 5, 0]
df = df[df.columns[new_order]]
print(df)
print("\n -- Change order using reindex -- \n")
df = df.reindex(['State', 'Color', 'Age', 'Food', 'Score', 'Height'], axis=1)
print(df)
Output:
-- Change order using columns --
Height Food Color Score State Age
Jane 165 Steak Blue 4.6 NY 30
Nick 70 Lamb Green 8.3 TX 20
Aaron 120 Mango Red 9.0 FL 22
Penelope 80 Apple White 3.3 AL 40
Dean 180 Cheese Gray 1.8 AK 32
Christina 172 Melon Black 9.5 TX 28
Cornelia 150 Beans Red 2.2 TX 39
-- Change order using reindex --
State Color Age Food Score Height
Jane NY Blue 30 Steak 4.6 165
Nick TX Green 20 Lamb 8.3 70
Aaron FL Red 22 Mango 9.0 120
Penelope AL White 40 Apple 3.3 80
Dean AK Gray 32 Cheese 1.8 180
Christina TX Black 28 Melon 9.5 172
Cornelia TX Red 39 Beans 2.2 150
Check the data type of the DataFrame column
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print(df.dtypes)
Output:
Age int64
Color object
Food object
Height int64
Score float64
State object
dtype: object
Change the data type of a DataFrame column
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'],
'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'Height': [165, 70, 120, 80, 180, 172, 150],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print(df.dtypes)
df['Age'] = df['Age'].astype(str)
print(df.dtypes)
Output:
Age int64
Color object
Food object
Height int64
Score float64
State object
dtype: object
Age object
Color object
Food object
Height int64
Score float64
State object
dtype: object
How do I convert the data type of a column to DateTime
import pandas as pd
df = pd.DataFrame({'DateOFBirth': [1349720105, 1349806505, 1349892905, 1349979305, 1350065705, 1349792905, 1349730105],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)
df['DateOFBirth'] = pd.to_datetime(df['DateOFBirth'], unit='s')
print("\n----------------After----------------\n")
print(df.dtypes)
print(df)
Output:
----------------Before---------------
DateOFBirth int64
State object
dtype: object
DateOFBirth State
Jane 1349720105 NY
Nick 1349806505 TX
Aaron 1349892905 FL
Penelope 1349979305 AL
Dean 1350065705 AK
Christina 1349792905 TX
Cornelia 1349730105 TX
----------------After----------------
DateOFBirth datetime64[ns]
State object
dtype: object
DateOFBirth State
Jane 2012-10-08 18:15:05 NY
Nick 2012-10-09 18:15:05 TX
Aaron 2012-10-10 18:15:05 FL
Penelope 2012-10-11 18:15:05 AL
Dean 2012-10-12 18:15:05 AK
Christina 2012-10-09 14:28:25 TX
Cornelia 2012-10-08 21:01:45 TX
Convert DataFrame columns from floats to ints
import pandas as pd
df = pd.DataFrame({'DailyExp': [75.7, 56.69, 55.69, 96.5, 84.9, 110.5, 58.9],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)
df['DailyExp'] = df['DailyExp'].astype(int)
print("\n----------------After----------------\n")
print(df.dtypes)
print(df)
Output:
----------------Before---------------
DailyExp float64
State object
dtype: object
DailyExp State
Jane 75.70 NY
Nick 56.69 TX
Aaron 55.69 FL
Penelope 96.50 AL
Dean 84.90 AK
Christina 110.50 TX
Cornelia 58.90 TX
----------------After----------------
DailyExp int32
State object
dtype: object
DailyExp State
Jane 75 NY
Nick 56 TX
Aaron 55 FL
Penelope 96 AL
Dean 84 AK
Christina 110 TX
Cornelia 58 TX
How do I convert the Dates column to DateTime
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print("\n----------------Before---------------\n")
print(df.dtypes)
# pandas 2.x rejects the unit-less 'datetime64', so the unit is spelled out
df['DateOfBirth'] = df['DateOfBirth'].astype('datetime64[ns]')
print("\n----------------After----------------\n")
print(df.dtypes)
Output:
----------------Before---------------
DateOfBirth object
State object
dtype: object
----------------After----------------
DateOfBirth datetime64[ns]
State object
dtype: object
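For date strings, pd.to_datetime is another common route to the same dtype. The sketch below is an illustration on a made-up three-row column, including one deliberately invalid entry to show errors="coerce":

```python
import pandas as pd

# Small illustrative column; the last entry is intentionally not a date.
df = pd.DataFrame({"DateOfBirth": ["1986-11-11", "1999-05-12", "not a date"]})

# errors="coerce" turns entries that cannot be parsed into NaT instead of raising.
df["DateOfBirth"] = pd.to_datetime(df["DateOfBirth"], format="%Y-%m-%d", errors="coerce")

print(df.dtypes)
print(df)
```

Passing an explicit format avoids ambiguity about day and month order.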
Append one DataFrame to another
import pandas as pd
df1 = pd.DataFrame({'Age': [30, 20, 22, 40],
'Height': [165, 70, 120, 80],
'Score': [4.6, 8.3, 9.0, 3.3],
'State': ['NY', 'TX', 'FL', 'AL']},
index=['Jane', 'Nick', 'Aaron', 'Penelope'])
df2 = pd.DataFrame({'Age': [32, 28, 39],
'Color': ['Gray', 'Black', 'Red'],
'Food': ['Cheese', 'Melon', 'Beans'],
'Score': [1.8, 9.5, 2.2],
'State': ['AK', 'TX', 'TX']},
index=['Dean', 'Christina', 'Cornelia'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df3 = pd.concat([df1, df2], sort=True)
print(df3)
Output:
Age Color Food Height Score State
Jane 30 NaN NaN 165.0 4.6 NY
Nick 20 NaN NaN 70.0 8.3 TX
Aaron 22 NaN NaN 120.0 9.0 FL
Penelope 40 NaN NaN 80.0 3.3 AL
Dean 32 Gray Cheese NaN 1.8 AK
Christina 28 Black Melon NaN 9.5 TX
Cornelia 39 Red Beans NaN 2.2 TX
Add additional rows at the end of the DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
'Age': [23, 24, 34, 29, 40]})
print("\n------------ BEFORE ----------------\n")
print(employees)
# The list must follow the frame's column order. The order below matches the
# output shown; adjust it if your pandas version orders the columns differently.
employees.loc[len(employees)] = [45, '2018-01-25', 'Emp006', 'Sunny', 'Programmer']
print("\n------------ AFTER ----------------\n")
print(employees)
Output:
------------ BEFORE ----------------
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
1 24 2018-01-26 Emp002 Doe Statistician
2 34 2018-01-26 Emp003 William Statistician
3 29 2018-02-26 Emp004 Spark Statistician
4 40 2018-03-16 Emp005 Mark Programmer
------------ AFTER ----------------
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp001 John Chemist
1 24 2018-01-26 Emp002 Doe Statistician
2 34 2018-01-26 Emp003 William Statistician
3 29 2018-02-26 Emp004 Spark Statistician
4 40 2018-03-16 Emp005 Mark Programmer
5 45 2018-01-25 Emp006 Sunny Programmer
Add a new row at a specified index
import pandas as pd
employees = pd.DataFrame(
    data={'Name': ['John Doe', 'William Spark'],
          'Occupation': ['Chemist', 'Statistician'],
          'Date Of Join': ['2018-01-25', '2018-01-26'],
          'Age': [23, 24]},
    index=['Emp001', 'Emp002'],
    columns=['Name', 'Occupation', 'Date Of Join', 'Age'])
print("\n------------ BEFORE ----------------\n")
print(employees)
employees.loc['Emp003'] = ['Sunny', 'Programmer', '2018-01-25', 45]
print("\n------------ AFTER ----------------\n")
print(employees)
Output:
------------ BEFORE ----------------
Name Occupation Date Of Join Age
Emp001 John Doe Chemist 2018-01-25 23
Emp002 William Spark Statistician 2018-01-26 24
------------ AFTER ----------------
Name Occupation Date Of Join Age
Emp001 John Doe Chemist 2018-01-25 23
Emp002 William Spark Statistician 2018-01-26 24
Emp003 Sunny Programmer 2018-01-25 45
How do I add rows using a for loop
import pandas as pd
cols = ['Zip']
lst = []
zip_code = 32100  # renamed from zip so the built-in zip() is not shadowed
for _ in range(10):
    lst.append([zip_code])
    zip_code = zip_code + 1
df = pd.DataFrame(lst, columns=cols)
print(df)
Output:
Zip
0 32100
1 32101
2 32102
3 32103
4 32104
5 32105
6 32106
7 32107
8 32108
9 32109
Add a row at the top of the DataFrame
import pandas as pd
employees = pd.DataFrame({
'EmpCode': ['Emp002', 'Emp003', 'Emp004'],
'Name': ['John', 'Doe', 'William'],
'Occupation': ['Chemist', 'Statistician', 'Statistician'],
'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26'],
'Age': [23, 24, 34]})
print("\n------------ BEFORE ----------------\n")
print(employees)
# New row
line = pd.DataFrame({'Name': 'Dean', 'Age': 45, 'EmpCode': 'Emp001',
'Date Of Join': '2018-02-26', 'Occupation': 'Chemist'
}, index=[0])
# Concatenate the two DataFrames (the removed .ix indexer is not needed here)
employees = pd.concat([line, employees]).reset_index(drop=True)
print("\n------------ AFTER ----------------\n")
print(employees)
Output:
------------ BEFORE ----------------
Age Date Of Join EmpCode Name Occupation
0 23 2018-01-25 Emp002 John Chemist
1 24 2018-01-26 Emp003 Doe Statistician
2 34 2018-01-26 Emp004 William Statistician
------------ AFTER ----------------
Age Date Of Join EmpCode Name Occupation
0 45 2018-02-26 Emp001 Dean Chemist
1 23 2018-01-25 Emp002 John Chemist
2 24 2018-01-26 Emp003 Doe Statistician
3 34 2018-01-26 Emp004 William Statistician
How do I dynamically add rows to a DataFrame
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
df.loc[1, 'Name'] = 'Rocky'
df.loc[1, 'Age'] = 23
df.loc[2, 'Name'] = 'Sunny'
print(df)
Output:
Name Age
1 Rocky 23
2 Sunny NaN
Insert rows at any position
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
df.loc[1, 'Name'] = 'Rocky'
df.loc[1, 'Age'] = 21
df.loc[2, 'Name'] = 'Sunny'
df.loc[2, 'Age'] = 22
df.loc[3, 'Name'] = 'Mark'
df.loc[3, 'Age'] = 25
df.loc[4, 'Name'] = 'Taylor'
df.loc[4, 'Age'] = 28
print("\n------------ BEFORE ----------------\n")
print(df)
# The label 2.5 sorts between the existing labels 2 and 3
line = pd.DataFrame({"Name": "Jack", "Age": 24}, index=[2.5])
# DataFrame.append was removed in pandas 2.0; pd.concat keeps both indexes
df = pd.concat([df, line], ignore_index=False)
df = df.sort_index().reset_index(drop=True)
df = df.reindex(['Name', 'Age'], axis=1)
print("\n------------ AFTER ----------------\n")
print(df)
Output:
------------ BEFORE ----------------
Name Age
1 Rocky 21
2 Sunny 22
3 Mark 25
4 Taylor 28
------------ AFTER ----------------
Name Age
0 Rocky 21
1 Sunny 22
2 Jack 24
3 Mark 25
4 Taylor 28
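The fractional-label trick above relies on sortable labels. Another way to insert at a position, sketched here with a variable name (position) introduced only for illustration, is to split the frame with iloc and concatenate the pieces around the new row:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rocky", "Sunny", "Mark", "Taylor"],
                   "Age": [21, 22, 25, 28]})
new_row = pd.DataFrame({"Name": ["Jack"], "Age": [24]})

position = 2  # hypothetical insertion point: the new row becomes row 2

# Rows before the position, then the new row, then the remaining rows.
result = pd.concat([df.iloc[:position], new_row, df.iloc[position:]],
                   ignore_index=True)
print(result)
```

This version works regardless of what the index labels are, since ignore_index=True renumbers the result.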
Add rows to the DataFrame using a timestamp index
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
df.loc['2014-05-01 18:47:05', 'Name'] = 'Rocky'
df.loc['2014-05-01 18:47:05', 'Age'] = 21
df.loc['2014-05-02 18:47:05', 'Name'] = 'Sunny'
df.loc['2014-05-02 18:47:05', 'Age'] = 22
df.loc['2014-05-03 18:47:05', 'Name'] = 'Mark'
df.loc['2014-05-03 18:47:05', 'Age'] = 25
print("\n------------ BEFORE ----------------\n")
print(df)
line = pd.to_datetime("2014-05-01 18:50:05", format="%Y-%m-%d %H:%M:%S")
new_row = pd.DataFrame([['Bunny', 26]], columns=['Name', 'Age'], index=[line])
df = pd.concat([df, pd.DataFrame(new_row)], ignore_index=False)
print("\n------------ AFTER ----------------\n")
print(df)
Output:
------------ BEFORE ----------------
Name Age
2014-05-01 18:47:05 Rocky 21
2014-05-02 18:47:05 Sunny 22
2014-05-03 18:47:05 Mark 25
------------ AFTER ----------------
Name Age
2014-05-01 18:47:05 Rocky 21
2014-05-02 18:47:05 Sunny 22
2014-05-03 18:47:05 Mark 25
2014-05-01 18:50:05 Bunny 26
Fill in missing values for different rows
import pandas as pd
a = {'A': 10, 'B': 20}
b = {'B': 30, 'C': 40, 'D': 50}
df1 = pd.DataFrame(a, index=[0])
df2 = pd.DataFrame(b, index=[1])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
df = pd.concat([df1, df2]).fillna(0)
print(df)
Output:
A B C D
0 10.0 20 0.0 0.0
1 0.0 30 40.0 50.0
Append, concat, and combine_first examples
import pandas as pd
a = {'A': 10, 'B': 20}
b = {'B': 30, 'C': 40, 'D': 50}
df1 = pd.DataFrame(a, index=[0])
df2 = pd.DataFrame(b, index=[1])
d1 = pd.DataFrame()
# DataFrame.append needs pandas < 2.0 (it was removed in 2.0)
d1 = d1.append(df1)
d1 = d1.append(df2).fillna(0)
print("\n------------ append ----------------\n")
print(d1)
d2 = pd.concat([df1, df2]).fillna(0)
print("\n------------ concat ----------------\n")
print(d2)
d3 = pd.DataFrame()
d3 = d3.combine_first(df1).combine_first(df2).fillna(0)
print("\n------------ combine_first ----------------\n")
print(d3)
Output:
------------ append ----------------
A B C D
0 10.0 20 0.0 0.0
1 0.0 30 40.0 50.0
------------ concat ----------------
A B C D
0 10.0 20 0.0 0.0
1 0.0 30 40.0 50.0
------------ combine_first ----------------
A B C D
0 10.0 20.0 0.0 0.0
1 0.0 30.0 40.0 50.0
Get the average of rows and columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
df['Mean Basket'] = df.mean(axis=1)  # mean of each row
df.loc['Mean Fruit'] = df.mean()     # mean of each column
print(df)
Output:
Apple Orange Banana Pear Mean Basket
Basket1 10.000000 20.0 30.0 40.000000 25.0
Basket2 7.000000 14.0 21.0 28.000000 17.5
Basket3 5.000000 5.0 0.0 0.000000 2.5
Mean Fruit 7.333333 13.0 17.0 22.666667 15.0
Calculate the sum of rows and columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
df['Sum Basket'] = df.sum(axis=1)  # sum of each row
df.loc['Sum Fruit'] = df.sum()     # sum of each column
print(df)
Output:
Apple Orange Banana Pear Sum Basket
Basket1 10 20 30 40 100
Basket2 7 14 21 28 70
Basket3 5 5 0 0 10
Sum Fruit 22 39 51 68 180
Concatenate two columns
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
df.loc[1, 'Name'] = 'Rocky'
df.loc[1, 'Age'] = 21
df.loc[2, 'Name'] = 'Sunny'
df.loc[2, 'Age'] = 22
df.loc[3, 'Name'] = 'Mark'
df.loc[3, 'Age'] = 25
df.loc[4, 'Name'] = 'Taylor'
df.loc[4, 'Age'] = 28
print('\n------------ BEFORE ----------------\n')
print(df)
# Convert both columns to strings and join them with ' - '
df['Employee'] = df['Name'].map(str) + ' - ' + df['Age'].map(str)
# Keep only the new column
df = df.reindex(['Employee'], axis=1)
print('\n------------ AFTER ----------------\n')
print(df)
Output:
------------ BEFORE ----------------
Name Age
1 Rocky 21
2 Sunny 22
3 Mark 25
4 Taylor 28
------------ AFTER ----------------
Employee
1 Rocky - 21
2 Sunny - 22
3 Mark - 25
4 Taylor - 28
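As a hedged alternative to `map(str)`, the same join can be sketched with `astype(str)` and `Series.str.cat`. The two-row frame below is made up for illustration and is not part of the original example.

```python
import pandas as pd

# Hypothetical sample data mirroring the example above
df = pd.DataFrame({'Name': ['Rocky', 'Sunny'], 'Age': [21, 22]}, index=[1, 2])

# Convert Age to strings, then join the two columns with a separator
df['Employee'] = df['Name'].str.cat(df['Age'].astype(str), sep=' - ')
print(df['Employee'].tolist())
```

`str.cat` keeps the separator in one place, which may read more clearly when more than two columns are joined.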
Filter rows that contain a string
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print(df)
print("\n---- Filter with State contains TX ----\n")
df1 = df[df['State'].str.contains("TX")]
print(df1)
Output:
DateOfBirth State
Jane 1986-11-11 NY
Nick 1999-05-12 TX
Aaron 1976-01-01 FL
Penelope 1986-06-01 AL
Dean 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
---- Filter with State contains TX ----
DateOfBirth State
Nick 1999-05-12 TX
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
Filter rows whose index contains a certain string
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane', 'Christina', 'Cornelia'])
print(df)
print("\n---- Filter Index contains ane ----\n")
df.index = df.index.astype('str')
df1 = df[df.index.str.contains('ane')]
print(df1)
Output:
DateOfBirth State
Jane 1986-11-11 NY
Pane 1999-05-12 TX
Aaron 1976-01-01 FL
Penelope 1986-06-01 AL
Frane 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
---- Filter Index contains ane ----
DateOfBirth State
Jane 1986-11-11 NY
Pane 1999-05-12 TX
Frane 1983-06-04 AK
Use the AND operator to filter rows that contain a particular string value
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane', 'Christina', 'Cornelia'])
print(df)
print("\n---- Filter DataFrame using & ----\n")
df.index = df.index.astype('str')
# Both conditions must hold: index contains 'ane' AND State contains 'TX'
df1 = df[df.index.str.contains('ane') & df['State'].str.contains("TX")]
print(df1)
Output:
DateOfBirth State
Jane 1986-11-11 NY
Pane 1999-05-12 TX
Aaron 1976-01-01 FL
Penelope 1986-06-01 AL
Frane 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
---- Filter DataFrame using & ----
DateOfBirth State
Pane 1999-05-12 TX
Use the OR operator to find all rows that contain a string
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane', 'Christina', 'Cornelia'])
print(df)
print("\n---- Filter DataFrame using | ----\n")
df.index = df.index.astype('str')
# Either condition may hold: index contains 'ane' OR State contains 'TX'
df1 = df[df.index.str.contains('ane') | df['State'].str.contains("TX")]
print(df1)
Output:
DateOfBirth State
Jane 1986-11-11 NY
Pane 1999-05-12 TX
Aaron 1976-01-01 FL
Penelope 1986-06-01 AL
Frane 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
---- Filter DataFrame using | ----
DateOfBirth State
Jane 1986-11-11 NY
Pane 1999-05-12 TX
Frane 1983-06-04 AK
Christina 1990-03-07 TX
Cornelia 1999-07-09 TX
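Real data often has mixed case or missing values, which the filters above do not handle. The sketch below, on a small made-up frame, shows the `case`, `na`, and `regex` parameters of `str.contains` as one possible way to cope with that.

```python
import pandas as pd

# Hypothetical data with a lower-case entry and a missing value
df = pd.DataFrame({'State': ['NY', 'tx', None, 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Cornelia'])

# case=False ignores letter case; na=False treats missing values as "no match";
# regex=False matches the text literally instead of as a regular expression
mask = df['State'].str.contains('TX', case=False, na=False, regex=False)
print(df[mask])
```

Without `na=False`, the missing entry would produce a missing value in the mask, and boolean indexing with it would typically fail.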
If the value in a row contains a string, create another column based on that string
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Accountant', 'Statistician', 'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
# pd.np was removed from pandas; call NumPy directly
df['Department'] = np.where(df.Occupation.str.contains("Chemist"), "Science",
                   np.where(df.Occupation.str.contains("Statistician"), "Economics",
                   np.where(df.Occupation.str.contains("Programmer"), "Computer", "General")))
print(df)
Output:
Age Date Of Join EmpCode Name Occupation Department
0 23 2018-01-25 Emp001 John Chemist Science
1 24 2018-01-26 Emp002 Doe Accountant General
2 34 2018-01-26 Emp003 William Statistician Economics
3 29 2018-02-26 Emp004 Spark Statistician Economics
4 40 2018-03-16 Emp005 Mark Programmer Computer
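Nested `np.where` calls become hard to read as the number of cases grows. A possible alternative, sketched here on a reduced, made-up version of the frame, is `np.select`, which takes a list of conditions and a matching list of results.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced frame with only the column that drives the mapping
df = pd.DataFrame({'Occupation': ['Chemist', 'Accountant', 'Statistician', 'Programmer']})

# Conditions are checked in order; the first match wins
conditions = [
    df['Occupation'].str.contains('Chemist'),
    df['Occupation'].str.contains('Statistician'),
    df['Occupation'].str.contains('Programmer'),
]
choices = ['Science', 'Economics', 'Computer']

# Rows that match no condition receive the default value
df['Department'] = np.select(conditions, choices, default='General')
print(df)
```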
Count the rows in each group with groupby
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0],
                   [6, 6, 6, 6], [8, 8, 8, 8], [5, 5, 0, 0]],
                  columns=['Apple', 'Orange', 'Rice', 'Oil'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print(df)
print("\n ----------------------------- \n")
print(df[['Apple', 'Orange', 'Rice', 'Oil']]
      .groupby(['Apple']).agg(['mean', 'count']))
Output:
Apple Orange Rice Oil
Basket1 10 20 30 40
Basket2 7 14 21 28
Basket3 5 5 0 0
Basket4 6 6 6 6
Basket5 8 8 8 8
Basket6 5 5 0 0
-----------------------------
Orange Rice Oil
mean count mean count mean count
Apple
5 5 2 0 2 0 2
6 6 1 6 1 6 1
7 14 1 21 1 28 1
8 8 1 8 1 8 1
10 20 1 30 1 40 1
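If only the row count per group is needed, `groupby(...).size()` is a shorter route than `agg(['mean', 'count'])`. A minimal sketch on a made-up two-column frame:

```python
import pandas as pd

# Hypothetical frame; 'Apple' is the grouping key
df = pd.DataFrame({'Apple': [10, 7, 5, 6, 8, 5],
                   'Orange': [20, 14, 5, 6, 8, 5]})

# size() returns the number of rows in each group as a Series
counts = df.groupby('Apple').size()
print(counts)
```

Unlike `count`, `size` counts rows regardless of missing values in the other columns.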
Check if a string is in the DataFrame
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane', 'Christina', 'Cornelia'])
if df['State'].str.contains('TX').any():
    print("TX is there")
Output:
TX is there
Get the unique values of a DataFrame column
import pandas as pd
df = pd.DataFrame({'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print(df)
print("\n----------------\n")
print(df["State"].unique())
Output:
State
Jane NY
Nick TX
Aaron FL
Penelope AL
Dean AK
Christina TX
Cornelia TX
----------------
['NY' 'TX' 'FL' 'AL' 'AK']
Count the distinct values in a DataFrame column
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 20, 30, 20, 25],
                   'Height': [165, 70, 120, 80, 162, 72, 124, 81],
                   'Score': [4.6, 8.3, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'TX', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print(df.Age.value_counts())
Output:
20 3
30 2
25 1
22 1
40 1
Name: Age, dtype: int64
Delete rows with duplicate indexes
import pandas as pd
df = pd.DataFrame({'Age': [30, 30, 22, 40, 20, 30, 20, 25],
                   'Height': [165, 165, 120, 80, 162, 72, 124, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print("\n -------- Duplicate Rows ----------- \n")
print(df)
# Move the index into a column, drop repeated labels, then restore the index
df1 = df.reset_index().drop_duplicates(subset='index',
                                       keep='first').set_index('index')
print("\n ------- Unique Rows ------------ \n")
print(df1)
Output:
-------- Duplicate Rows -----------
Age Height Score State
Jane 30 165 4.6 NY
Jane 30 165 4.6 NY
Aaron 22 120 9.0 FL
Penelope 40 80 3.3 AL
Jaane 20 162 4.0 NY
Nicky 30 72 8.0 TX
Armour 20 124 9.0 FL
Ponting 25 81 3.0 AL
------- Unique Rows ------------
Age Height Score State
index
Jane 30 165 4.6 NY
Aaron 22 120 9.0 FL
Penelope 40 80 3.3 AL
Jaane 20 162 4.0 NY
Nicky 30 72 8.0 TX
Armour 20 124 9.0 FL
Ponting 25 81 3.0 AL
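The reset/drop/set round trip above can also be written with `Index.duplicated`, which flags repeated labels directly. The sketch below uses a made-up three-row frame.

```python
import pandas as pd

# Hypothetical frame with a repeated index label
df = pd.DataFrame({'Age': [30, 30, 22]}, index=['Jane', 'Jane', 'Aaron'])

# duplicated(keep='first') marks every repeat after the first occurrence;
# negating the mask keeps only the first row for each label
unique_rows = df[~df.index.duplicated(keep='first')]
print(unique_rows)
```

This variant leaves the index unnamed, whereas the round trip above names it `index`.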
Delete rows with duplicate values in some columns
import pandas as pd
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print("\n -------- Duplicate Rows ----------- \n")
print(df)
df1 = df.reset_index().drop_duplicates(subset=['Age', 'Height'],
                                       keep='first').set_index('index')
print("\n ------- Unique Rows ------------ \n")
print(df1)
Output:
-------- Duplicate Rows -----------
Age Height Score State
Jane 30 120 4.6 NY
Jane 40 162 4.6 NY
Aaron 30 120 9.0 FL
Penelope 40 120 3.3 AL
Jaane 30 120 4.0 NY
Nicky 30 72 8.0 TX
Armour 20 120 9.0 FL
Ponting 25 81 3.0 AL
------- Unique Rows ------------
Age Height Score State
index
Jane 30 120 4.6 NY
Jane 40 162 4.6 NY
Penelope 40 120 3.3 AL
Nicky 30 72 8.0 TX
Armour 20 120 9.0 FL
Ponting 25 81 3.0 AL
Get the value of a DataFrame cell
import pandas as pd
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print(df.loc['Nicky', 'Age'])
Output:
30
Use a conditional index in the DataFrame to get the scalar value of a cell
import pandas as pd
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print("\nGet Height where Age is 20")
print(df.loc[df['Age'] == 20, 'Height'].values[0])
print("\nGet State where Age is 30")
print(df.loc[df['Age'] == 30, 'State'].values[0])
Output:
Get Height where Age is 20
120
Get State where Age is 30
NY
Set the value of a specific DataFrame cell
import pandas as pd
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81]},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print("\n--------------Before------------\n")
print(df)
# iat takes integer positions: (row, column)
df.iat[0, 0] = 90
df.iat[0, 1] = 91
df.iat[1, 1] = 92
df.iat[2, 1] = 93
df.iat[7, 1] = 99
print("\n--------------After------------\n")
print(df)
Output:
--------------Before------------
Age Height
Jane 30 120
Jane 40 162
Aaron 30 120
Penelope 40 120
Jaane 30 120
Nicky 30 72
Armour 20 120
Ponting 25 81
--------------After------------
Age Height
Jane 90 91
Jane 40 92
Aaron 30 93
Penelope 40 120
Jaane 30 120
Nicky 30 72
Armour 20 120
Ponting 25 99
Get cell values from DataFrame rows that match a condition
import pandas as pd
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81]},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])
print(df.loc[df.Age == 30, 'Height'].tolist())
Output:
[120, 120, 120, 72]
Replace the values in a DataFrame column using a dictionary
import pandas as pd
df = pd.DataFrame({'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print(df)
# Avoid naming the variable "dict", which would shadow the built-in type
state_codes = {"NY": 1, "TX": 2, "FL": 3, "AL": 4, "AK": 5}
df1 = df.replace({"State": state_codes})
print("\n\n")
print(df1)
Output:
State
Jane NY
Nick TX
Aaron FL
Penelope AL
Dean AK
Christina TX
Cornelia TX
State
Jane 1
Nick 2
Aaron 3
Penelope 4
Dean 5
Christina 2
Cornelia 2
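`Series.map` is a common alternative to `replace` for dictionary lookups, with one difference worth knowing: values missing from the dictionary become NaN with `map`, while `replace` leaves them unchanged. A small sketch with made-up data, where `ZZ` is deliberately absent from the dictionary:

```python
import pandas as pd

# Hypothetical data; 'ZZ' has no entry in the lookup dictionary
states = pd.Series(['NY', 'TX', 'ZZ'])
codes = {'NY': 1, 'TX': 2}

# map() looks every value up in the dictionary; unmatched values become NaN
mapped = states.map(codes)
print(mapped)
```

This makes `map` useful for spotting unexpected categories, since they show up as missing values.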
Count the distinct values of one column within each group of another
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
print(df.groupby('State').DateOfBirth.nunique())
Output:
State
AK 1
AL 1
FL 1
NY 1
TX 3
Name: DateOfBirth, dtype: int64
Handle missing values in a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5,]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame ---------\n")
print(df)
print("\n--------- Use of isnull() ---------\n")
print(df.isnull())
print("\n--------- Use of notnull() ---------\n")
print(df.notnull())
Output:
--------- DataFrame ---------
Apple Orange Banana Pear
Basket1 10 20.0 30.0 40.0
Basket2 7 14.0 21.0 28.0
Basket3 5 NaN NaN NaN
--------- Use of isnull() ---------
Apple Orange Banana Pear
Basket1 False False False False
Basket2 False False False False
Basket3 False True True True
--------- Use of notnull() ---------
Apple Orange Banana Pear
Basket1 True True True True
Basket2 True True True True
Basket3 True False False False
Delete rows containing any missing data
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5,]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame ---------\n")
print(df)
print("\n--------- Use of dropna() ---------\n")
print(df.dropna())
Output:
--------- DataFrame ---------
Apple Orange Banana Pear
Basket1 10 20.0 30.0 40.0
Basket2 7 14.0 21.0 28.0
Basket3 5 NaN NaN NaN
--------- Use of dropna() ---------
Apple Orange Banana Pear
Basket1 10 20.0 30.0 40.0
Basket2 7 14.0 21.0 28.0
Drop columns that contain missing values
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5,]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame ---------\n")
print(df)
print("\n--------- Drop Columns) ---------\n")
# axis must be passed by keyword in pandas 2.0 and later
print(df.dropna(axis=1))
Output:
--------- DataFrame ---------
Apple Orange Banana Pear
Basket1 10 20.0 30.0 40.0
Basket2 7 14.0 21.0 28.0
Basket3 5 NaN NaN NaN
--------- Drop Columns) ---------
Apple
Basket1 10
Basket2 7
Basket3 5
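Dropping every column that has even one missing value can be too aggressive. The sketch below, on a made-up frame, shows two gentler options of `dropna`: `how='all'` and `thresh`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one complete column, one partly missing, one entirely missing
df = pd.DataFrame({'Apple': [10, 7, 5],
                   'Orange': [20.0, 14.0, np.nan],
                   'Pear': [np.nan, np.nan, np.nan]})

# how='all' drops a column only if every value in it is missing
all_missing_dropped = df.dropna(axis=1, how='all')

# thresh=2 keeps columns that have at least two non-missing values
enough_values = df.dropna(axis=1, thresh=2)

print(list(all_missing_dropped.columns))
print(list(enough_values.columns))
```

In both cases only the entirely empty `Pear` column is removed, while `Orange` survives despite its single gap.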
Sort index values in descending order
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane', 'Christina', 'Cornelia'])
print(df.sort_index(ascending=False))
Output:
DateOfBirth State
Penelope 1986-06-01 AL
Pane 1999-05-12 TX
Jane 1986-11-11 NY
Frane 1983-06-04 AK
Cornelia 1999-07-09 TX
Christina 1990-03-07 TX
Aaron 1976-01-01 FL
Sort the columns in descending order
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
# axis=1 sorts by column name rather than by row label
print(employees.sort_index(axis=1, ascending=False))
Output:
Occupation Name EmpCode Date Of Join Age
0 Chemist John Emp001 2018-01-25 23
1 Statistician Doe Emp002 2018-01-26 24
2 Statistician William Emp003 2018-01-26 34
3 Statistician Spark Emp004 2018-02-26 29
4 Programmer Mark Emp005 2018-03-16 40
Use the rank method to find the rank of the elements in the DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [5, 5, 0, 0]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
print("\n--------- DataFrame Values--------\n")
print(df)
print("\n--------- DataFrame Values by Rank--------\n")
print(df.rank())
Output:
--------- DataFrame Values--------
Apple Orange Banana Pear
Basket1 10 20 30 40
Basket2 7 14 21 28
Basket3 5 5 0 0
--------- DataFrame Values by Rank--------
Apple Orange Banana Pear
Basket1 3.0 3.0 3.0 3.0
Basket2 2.0 2.0 2.0 2.0
Basket3 1.0 1.0 1.0 1.0
Set an index on multiple columns
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician', 'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26', '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print("\n --------- Before Index ----------- \n")
print(employees)
print("\n --------- Multiple Indexing ----------- \n")
print(employees.set_index(['Occupation', 'Age']))
Output:
Date Of Join EmpCode Name
Occupation Age
Chemist 23 2018-01-25 Emp001 John
Statistician 24 2018-01-26 Emp002 Doe
34 2018-01-26 Emp003 William
29 2018-02-26 Emp004 Spark
Programmer 40 2018-03-16 Emp005 Mark
Give a DataFrame a PeriodIndex and a column name
import pandas as pd
values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
pidx = pd.period_range('2015-01-01', periods=6)
df = pd.DataFrame(values, index=pidx, columns=['Country'])
print(df)
Output:
Country
2015-01-01 India
2015-01-02 Canada
2015-01-03 Australia
2015-01-04 Japan
2015-01-05 Germany
2015-01-06 France
Import a CSV and use a specific column as the index
import pandas as pd
df = pd.read_csv('test.csv', index_col="DateTime")
print(df)
Output:
Wheat Rice Oil
DateTime
10/10/2016 10.500 12.500 16.500
10/11/2016 11.250 12.750 17.150
10/12/2016 10.000 13.150 15.500
10/13/2016 12.000 14.500 16.100
10/14/2016 13.000 14.825 15.600
10/15/2016 13.075 15.465 15.315
10/16/2016 13.650 16.105 15.030
10/17/2016 14.225 16.745 14.745
10/18/2016 14.800 17.385 14.460
10/19/2016 15.375 18.025 14.175
Write the DataFrame to CSV
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01', '1986-06-01', '1983-06-04', '1990-03-07', '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Pane', 'Aaron', 'Penelope', 'Frane', 'Christina', 'Cornelia'])
df.to_csv('test.csv', encoding='utf-8', index=True)
Output:
Check the local file test.csv
Use Pandas to read specific columns of a CSV file
import pandas as pd
df = pd.read_csv("test.csv", usecols=['Wheat', 'Oil'])
print(df)
Get the list of columns in a CSV file with Pandas
import pandas as pd
# Reading a single row is enough to obtain the header
cols = list(pd.read_csv("test.csv", nrows=1))
print(cols)
Output:
['DateTime', 'Wheat', 'Rice', 'Oil']
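The CSV recipes above depend on a `test.csv` file on disk. To try the same parameters without a file, one option is to feed `read_csv` an in-memory buffer. The CSV text below is a shortened, made-up stand-in for that file.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for test.csv
csv_text = (
    "DateTime,Wheat,Rice,Oil\n"
    "10/10/2016,10.5,12.5,16.5\n"
    "10/11/2016,11.25,12.75,17.15\n"
)

# index_col and usecols combined: the index column must be listed in usecols
df = pd.read_csv(io.StringIO(csv_text), index_col='DateTime',
                 usecols=['DateTime', 'Wheat', 'Oil'])

# nrows=0 reads only the header, which is enough to list the column names
cols = list(pd.read_csv(io.StringIO(csv_text), nrows=0).columns)

print(df)
print(cols)
```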
Find the row with the largest column value
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
# .ix was removed in pandas 1.0; .loc selects by label
print(df.loc[df['Apple'].idxmax()])
Output:
Apple 55
Orange 15
Banana 8
Pear 12
Name: Basket3, dtype: int64
Use the query method for complex condition selection
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
print(df)
print("\n ----------- Filter data using query method ------------- \n")
# .ix was removed in pandas 1.0; .loc selects the matching labels
df1 = df.loc[df.query('Apple > 50 & Orange <= 15 & Banana < 15 & Pear == 12').index]
print(df1)
Output:
Apple Orange Banana Pear
Basket1 10 20 30 40
Basket2 7 14 21 28
Basket3 55 15 8 12
----------- Filter data using query method -------------
Apple Orange Banana Pear
Basket3 55 15 8 12
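Thresholds are often held in variables rather than hard-coded in the query string. As a small sketch, `query` can reference a Python variable with the `@` prefix. The variable name `min_apple` is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])

# Hypothetical threshold kept in a Python variable
min_apple = 50

# @min_apple refers to the variable above; query returns the matching rows directly
result = df.query('Apple > @min_apple and Pear == 12')
print(result)
```

Because `query` already returns the filtered rows, no extra indexing step is needed here.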
Check whether columns exist in a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3'])
# Check a single column
if 'Apple' in df.columns:
    print("Yes")
else:
    print("No")
# Check several columns at once
if {'Apple', 'Orange'}.issubset(df.columns):
    print("Yes")
else:
    print("No")
Find the n smallest and n largest values of a specific column in a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- nsmallest -----------\n")
print(df.nsmallest(2, ['Apple']))
print("\n----------- nlargest -----------\n")
print(df.nlargest(2, ['Apple']))
Output:
----------- nsmallest -----------
Apple Orange Banana Pear
Basket6 5 4 9 2
Basket2 7 14 21 28
----------- nlargest -----------
Apple Orange Banana Pear
Basket3 55 15 8 12
Basket4 15 14 1 8
Find the minimum and maximum values of all columns in a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Minimum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].min())
print("\n----------- Maximum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].max())
Output:
----------- Minimum -----------
Apple 5
Orange 1
Banana 1
Pear 2
dtype: int64
----------- Maximum -----------
Apple 55
Orange 20
Banana 30
Pear 40
dtype: int64
Find the index labels of the minimum and maximum values in a DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Minimum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].idxmin())
print("\n----------- Maximum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].idxmax())
Output:
----------- Minimum -----------
Apple Basket6
Orange Basket5
Banana Basket4
Pear Basket6
dtype: object
----------- Maximum -----------
Apple Basket3
Orange Basket1
Banana Basket1
Pear Basket1
dtype: object
Compute the cumulative product and cumulative sum of DataFrame columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Cumulative Product -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].cumprod())
print("\n----------- Cumulative Sum -----------\n")
print(df[['Apple', 'Orange', 'Banana', 'Pear']].cumsum())
Output:
----------- Cumulative Product -----------
Apple Orange Banana Pear
Basket1 10 20 30 40
Basket2 70 280 630 1120
Basket3 3850 4200 5040 13440
Basket4 57750 58800 5040 107520
Basket5 404250 58800 5040 860160
Basket6 2021250 235200 45360 1720320
----------- Cumulative Sum -----------
Apple Orange Banana Pear
Basket1 10 20 30 40
Basket2 17 34 51 68
Basket3 72 49 59 80
Basket4 87 63 60 88
Basket5 94 64 61 96
Basket6 99 68 70 98
Summary statistics
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Describe DataFrame -----------\n")
print(df.describe())
print("\n----------- Describe Column -----------\n")
print(df[['Apple']].describe())
Output:
----------- Describe DataFrame -----------
Apple Orange Banana Pear
count 6.000000 6.000000 6.000000 6.000000
mean 16.500000 11.333333 11.666667 16.333333
std 19.180719 7.257180 11.587349 14.555640
min 5.000000 1.000000 1.000000 2.000000
25% 7.000000 6.500000 2.750000 8.000000
50% 8.500000 14.000000 8.500000 10.000000
75% 13.750000 14.750000 18.000000 24.000000
max 55.000000 20.000000 30.000000 40.000000
----------- Describe Column -----------
Apple
count 6.000000
mean 16.500000
std 19.180719
min 5.000000
25% 7.000000
50% 8.500000
75% 13.750000
max 55.000000
Find the mean, median, and mode of the DataFrame
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Calculate Mean -----------\n")
print(df.mean())
print("\n----------- Calculate Median -----------\n")
print(df.median())
print("\n----------- Calculate Mode -----------\n")
print(df.mode())
Output:
----------- Calculate Mean -----------
Apple 16.500000
Orange 11.333333
Banana 11.666667
Pear 16.333333
dtype: float64
----------- Calculate Median -----------
Apple 8.5
Orange 14.0
Banana 8.5
Pear 10.0
dtype: float64
----------- Calculate Mode -----------
Apple Orange Banana Pear
0 7 14 1 8
Measure the variance and standard deviation of DataFrame columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Measure Variance -----------\n")
print(df.var())
print("\n----------- Standard Deviation -----------\n")
print(df.std())
Output:
----------- Measure Variance -----------
Apple 367.900000
Orange 52.666667
Banana 134.266667
Pear 211.866667
dtype: float64
----------- Standard Deviation -----------
Apple 19.180719
Orange 7.257180
Banana 11.587349
Pear 14.555640
dtype: float64
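The figures above are sample statistics: pandas divides by n - 1 by default (`ddof=1`). The sketch below uses the Apple column to show how the sample variance, the population variance (`ddof=0`), and the standard deviation relate.

```python
import numpy as np
import pandas as pd

# The Apple column from the example above
apple = pd.Series([10, 7, 55, 15, 7, 5])

sample_var = apple.var()            # divides by n - 1 (ddof=1, the pandas default)
population_var = apple.var(ddof=0)  # divides by n

# The standard deviation is the square root of the variance
sample_std = np.sqrt(sample_var)

print(sample_var, population_var, sample_std)
```

With six values, the population variance is the sample variance multiplied by 5/6.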
Calculate the covariance between DataFrame columns
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----------- Calculating Covariance -----------\n")
print(df.cov())
print("\n----------- Between 2 columns -----------\n")
# Covariance of Apple vs Orange
print(df.Apple.cov(df.Orange))
Output:
----------- Calculating Covariance -----------
Apple Orange Banana Pear
Apple 367.9 47.600000 -40.200000 -35.000000
Orange 47.6 52.666667 54.333333 77.866667
Banana -40.2 54.333333 134.266667 154.933333
Pear -35.0 77.866667 154.933333 211.866667
----------- Between 2 columns -----------
47.60000000000001
Calculate the correlation between two DataFrame objects in Pandas
import pandas as pd
df1 = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                    [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                   columns=['Apple', 'Orange', 'Banana', 'Pear'],
                   index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n------ Calculating Correlation of one DataFrame Columns -----\n")
print(df1.corr())
df2 = pd.DataFrame([[52, 54, 58, 41], [14, 24, 51, 78], [55, 15, 8, 12],
                    [15, 14, 1, 8], [7, 17, 18, 98], [15, 34, 29, 52]],
                   columns=['Apple', 'Orange', 'Banana', 'Pear'],
                   index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n----- Calculating correlation between two DataFrame -------\n")
# corrwith pairs up columns with the same name in both frames
print(df2.corrwith(other=df1))
Output:
------ Calculating Correlation of one DataFrame Columns -----
Apple Orange Banana Pear
Apple 1.000000 0.341959 -0.180874 -0.125364
Orange 0.341959 1.000000 0.646122 0.737144
Banana -0.180874 0.646122 1.000000 0.918606
Pear -0.125364 0.737144 0.918606 1.000000
----- Calculating correlation between two DataFrame -------
Apple 0.678775
Orange 0.354993
Banana 0.920872
Pear 0.076919
dtype: float64
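Correlation and covariance are closely linked: the Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below checks this for the Apple and Orange columns.

```python
import numpy as np
import pandas as pd

# The Apple and Orange columns from the first frame above
apple = pd.Series([10, 7, 55, 15, 7, 5])
orange = pd.Series([20, 14, 15, 14, 1, 4])

# Pearson correlation = covariance / (std of x * std of y)
manual_corr = apple.cov(orange) / (apple.std() * orange.std())

# Series.corr computes the same quantity directly
builtin_corr = apple.corr(orange)

print(manual_corr, builtin_corr)
```

Both values should agree with the Apple/Orange entry of the correlation matrix shown above.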
Calculate the percentage change for each cell of a DataFrame column
import pandas as pd
df = pd.DataFrame([[10, 20, 30, 40], [7, 14, 21, 28], [55, 15, 8, 12],
                   [15, 14, 1, 8], [7, 1, 1, 8], [5, 4, 9, 2]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n------ Percent change at each cell of a Column -----\n")
print(df[['Apple']].pct_change()[:3])
print("\n------ Percent change at each cell of a DataFrame -----\n")
print(df.pct_change()[:5])
Output:
------ Percent change at each cell of a Column -----
Apple
Basket1 NaN
Basket2 -0.300000
Basket3 6.857143
------ Percent change at each cell of a DataFrame -----
Apple Orange Banana Pear
Basket1 NaN NaN NaN NaN
Basket2 -0.300000 -0.300000 -0.300000 -0.300000
Basket3 6.857143 0.071429 -0.619048 -0.571429
Basket4 -0.727273 -0.066667 -0.875000 -0.333333
Basket5 -0.533333 -0.928571 0.000000 0.000000
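To see what `pct_change` computes, the same numbers can be reproduced by dividing each value by the previous one and subtracting 1. The sketch below does this for the first three Apple values.

```python
import numpy as np
import pandas as pd

# First three Apple values from the example above
apple = pd.Series([10, 7, 55], index=['Basket1', 'Basket2', 'Basket3'])

# shift(1) moves every value down one row, so each cell is divided by its predecessor
manual = apple / apple.shift(1) - 1

print(manual)
print(apple.pct_change())
```

The first row is NaN in both cases because it has no predecessor.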
Forward-fill and backward-fill missing values in DataFrame columns
import pandas as pd
df = pd.DataFrame([[10, 30, 40], [], [15, 8, 12],
                   [15, 14, 1, 8], [7, 8], [5, 4, 1]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n------ DataFrame with NaN -----\n")
print(df)
print("\n------ DataFrame with Forward Filling -----\n")
print(df.ffill())
print("\n------ DataFrame with Backward Filling -----\n")
print(df.bfill())
Output:
------ DataFrame with NaN -----
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 NaN
Basket2 NaN NaN NaN NaN
Basket3 15.0 8.0 12.0 NaN
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 NaN NaN
Basket6 5.0 4.0 1.0 NaN
------ DataFrame with Forward Filling -----
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 NaN
Basket2 10.0 30.0 40.0 NaN
Basket3 15.0 8.0 12.0 NaN
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 1.0 8.0
Basket6 5.0 4.0 1.0 8.0
------ DataFrame with Backward Filling -----
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 8.0
Basket2 15.0 8.0 12.0 8.0
Basket3 15.0 8.0 12.0 8.0
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 1.0 NaN
Basket6 5.0 4.0 1.0 NaN
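Filling can carry a value across a long run of gaps, which is not always wanted. One way to cap this, sketched on a made-up Series, is the `limit` parameter of `ffill`.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two consecutive gaps
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# limit=1 fills at most one consecutive missing value after each valid one
filled = s.ffill(limit=1)
print(filled)
```

Here only the first gap is filled; the second stays missing because the limit has been reached.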
Use stacking in Pandas for non-hierarchical indexes
import pandas as pd
df = pd.DataFrame([[10, 30, 40], [], [15, 8, 12],
                   [15, 14, 1, 8], [7, 8], [5, 4, 1]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n------ DataFrame-----\n")
print(df)
print("\n------ Stacking DataFrame -----\n")
# stack moves the column labels into an inner index level
print(df.stack(level=-1))
Output:
------ DataFrame-----
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 NaN
Basket2 NaN NaN NaN NaN
Basket3 15.0 8.0 12.0 NaN
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 NaN NaN
Basket6 5.0 4.0 1.0 NaN
------ Stacking DataFrame -----
Basket1 Apple 10.0
Orange 30.0
Banana 40.0
Basket3 Apple 15.0
Orange 8.0
Banana 12.0
Basket4 Apple 15.0
Orange 14.0
Banana 1.0
Pear 8.0
Basket5 Apple 7.0
Orange 8.0
Basket6 Apple 5.0
Orange 4.0
Banana 1.0
dtype: float64
Unstack in Pandas using hierarchical indexes
import pandas as pd
df = pd.DataFrame([[10, 30, 40], [], [15, 8, 12],
                   [15, 14, 1, 8], [7, 8], [5, 4, 1]],
                  columns=['Apple', 'Orange', 'Banana', 'Pear'],
                  index=['Basket1', 'Basket2', 'Basket3', 'Basket4', 'Basket5', 'Basket6'])
print("\n------ DataFrame-----\n")
print(df)
print("\n------ Unstacking DataFrame -----\n")
print(df.unstack(level=-1))
Output:
------ DataFrame-----
Apple Orange Banana Pear
Basket1 10.0 30.0 40.0 NaN
Basket2 NaN NaN NaN NaN
Basket3 15.0 8.0 12.0 NaN
Basket4 15.0 14.0 1.0 8.0
Basket5 7.0 8.0 NaN NaN
Basket6 5.0 4.0 1.0 NaN
------ Unstacking DataFrame -----
Apple Basket1 10.0
Basket2 NaN
Basket3 15.0
Basket4 15.0
Basket5 7.0
Basket6 5.0
Orange Basket1 30.0
Basket2 NaN
Basket3 8.0
Basket4 14.0
Basket5 8.0
Basket6 4.0
Banana Basket1 40.0
Basket2 NaN
Basket3 12.0
Basket4 1.0
Basket5 NaN
Basket6 1.0
Pear Basket1 NaN
Basket2 NaN
Basket3 NaN
Basket4 8.0
Basket5 NaN
Basket6 NaN
dtype: float64
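Stacking and unstacking are inverse operations, which is easiest to see on a frame without missing values. The sketch below, on a made-up two-by-two frame, stacks the columns into an inner index level and then unstacks them again.

```python
import pandas as pd

# Hypothetical complete frame (no NaN) so the round trip is exact
df = pd.DataFrame({'Apple': [10, 15], 'Orange': [30, 8]},
                  index=['Basket1', 'Basket3'])

# stack: columns become the inner level of a two-level row index
stacked = df.stack()

# unstack: the inner index level becomes columns again
restored = stacked.unstack()

print(stacked)
print(restored)
```

Individual cells of the stacked Series can be addressed with a tuple, for example `stacked.loc[('Basket1', 'Apple')]`.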
Get table data from an HTML page with Pandas
import pandas as pd
# read_html returns a list of DataFrames, one per table found on the page
tables = pd.read_html("url")
Pandas – PDF Download the full PDF file