Requirements
1. Master read and write operations of common file formats
2. Understand and become familiar with important properties and methods of Series and DataFrame
3. Master various kinds of sorting (index sorting and value sorting, single-level sorting and multi-level sorting)
import pandas as pd
import numpy as np
1. View the Pandas version
pd.__version__
I. File reading and writing
1. Read
(a) CSV format
df = pd.read_csv('data/table.csv')
df.head()
(b) TXT format
df_txt = pd.read_table('data/table.txt')  # the sep parameter sets the delimiter (tab by default)
df_txt.head()
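As a minimal sketch of the sep parameter, the following reads a small in-memory string rather than a file on disk. The semicolon-separated sample text and its values are made up for illustration; they are not the contents of data/table.txt.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated text standing in for a file on disk
raw = "Name;Math;Physics\nAnn;34.0;A+\nBob;80.4;B-\n"

# read_table expects tab-separated input by default,
# so each line ends up in a single column
one_col = pd.read_table(io.StringIO(raw))

# Passing sep tells pandas how to split each line into fields
parsed = pd.read_table(io.StringIO(raw), sep=';')

print(one_col.shape)  # (2, 1)
print(parsed.shape)   # (2, 3)
```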
(c) XLS or XLSX format
Reading .xls files requires the xlrd package; in recent pandas versions, reading .xlsx files requires openpyxl
df_excel = pd.read_excel('data/table.xlsx')
df_excel.head()
2. Write
(a) CSV format
df.to_csv('data/new_table.csv')
# df.to_csv('data/new_table.csv', index=False)  # index=False omits the row index when saving
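The effect of index=False is easiest to see in a round trip. The sketch below writes to an in-memory buffer instead of data/new_table.csv, and the two-row frame is a made-up example.

```python
import io
import pandas as pd

# Hypothetical two-row frame used only for this demonstration
df_demo = pd.DataFrame({'Math': [34.0, 80.4], 'Physics': ['A+', 'B-']})

# Default behaviour: the row index is written as an unnamed first column
buf = io.StringIO()
df_demo.to_csv(buf)
with_index = pd.read_csv(io.StringIO(buf.getvalue()))

# index=False: only the data columns are written
buf = io.StringIO()
df_demo.to_csv(buf, index=False)
without_index = pd.read_csv(io.StringIO(buf.getvalue()))

print(list(with_index.columns))     # ['Unnamed: 0', 'Math', 'Physics']
print(list(without_index.columns))  # ['Math', 'Physics']
```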
(b) XLS or XLSX format
The openpyxl package needs to be installed
df.to_excel('data/new_table2.xlsx', sheet_name='Sheet1')
II. Basic data structures
1. Series
(a) Create a Series
For a Series, the most commonly used attributes are values, index, name, and dtype.
s = pd.Series(np.random.randn(5),index=['a','b','c','d','e'],name='This is a Series.',dtype='float64')
s
(b) Access Series attributes
s.values
s.name
s.index
(c) Take an element
Indexing is discussed in more detail in Chapter 2; this is only a brief overview
s['a']
(d) Call methods
s.mean()
print([attr for attr in dir(s) if not attr.startswith('_')])  # list the public attributes and methods
2. DataFrame
(a) Create a DataFrame
df = pd.DataFrame({'col1':list('abcde'),'col2':range(5,10),'col3':[1.3,2.5,3.6,4.6,5.8]},
                  index=list('一二三四五'))
df
(b) Fetch a Series column from DataFrame
df['col1']
type(df)
type(df['col3'])
(c) Modify row or column names
df.rename(index={'一':'one'},columns={'col1':'new_col1'})
(d) Invoke properties and methods
df.index
df.columns
df.values
df.shape
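df.mean(numeric_only=True)  # numeric_only=True skips the string column col1, which recent pandas versions no longer drop silently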
(e) Index alignment features
This is a very powerful feature in Pandas, and failure to understand it can sometimes cause problems
df1 = pd.DataFrame({'A':[1,2,3]},index=[1,2,3])
df2 = pd.DataFrame({'A':[1,2,3]},index=[3,1,2])
df1 - df2
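The alignment becomes visible when the label-based result is compared with a purely positional subtraction. In the sketch below, df3 is a hypothetical extra frame, added only to show what happens when a label is missing on one side.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]}, index=[1, 2, 3])
df2 = pd.DataFrame({'A': [1, 2, 3]}, index=[3, 1, 2])

# Arithmetic between DataFrames matches rows by index label, not by position:
# label 1: 1 - 2, label 2: 2 - 3, label 3: 3 - 1
aligned = df1 - df2
print(aligned.loc[[1, 2, 3], 'A'].tolist())  # [-1, -1, 2]

# Dropping to NumPy arrays ignores the labels and subtracts by position
positional = df1['A'].to_numpy() - df2['A'].to_numpy()
print(positional.tolist())  # [0, 0, 0]

# Hypothetical frame that shares only label 3 with df1:
# labels without a partner produce NaN in the result
df3 = pd.DataFrame({'A': [10]}, index=[3])
partial = df1 - df3
print(partial)
```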
(f) Deletion and addition of columns
For deletion, you can use the drop method, the del statement, or the pop method
df.drop(index='五',columns='col1')  # setting inplace=True modifies the original DataFrame directly
df['col1'] = [1,2,3,4,5]
del df['col1']
df
df['col1'] = [1,2,3,4,5]
df.pop('col1')
df
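The three deletion approaches differ in what they return and in whether they touch the original object. A small sketch on a made-up frame (demo and its values are illustrative only):

```python
import pandas as pd

# Hypothetical frame used only to compare the three approaches
demo = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# drop returns a new DataFrame and leaves demo unchanged (unless inplace=True)
dropped = demo.drop(columns='col1')
print(list(dropped.columns))  # ['col2', 'col3']
print(list(demo.columns))     # ['col1', 'col2', 'col3']

# del removes the column from demo itself and returns nothing
del demo['col2']

# pop also removes the column from demo, and returns it as a Series
popped = demo.pop('col3')
print(popped.tolist())     # [7, 8, 9]
print(list(demo.columns))  # ['col1']
```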
You can add new columns directly or use the assign method
df1['B']=list('abc')
df1.assign(C=pd.Series(list('def')))
df1
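Two details of this example are easy to miss: assign returns a new DataFrame rather than modifying df1, and the Series passed to it is aligned by index label. The sketch below repeats the same calls and stores the result in a variable named result, which is introduced here only for inspection.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]}, index=[1, 2, 3])

# Direct assignment modifies df1 in place; a plain list is matched by position
df1['B'] = list('abc')

# assign returns a new DataFrame and leaves df1 untouched
result = df1.assign(C=pd.Series(list('def')))

# The new Series has the default index 0, 1, 2, while df1 uses 1, 2, 3.
# Only labels 1 and 2 match, so 'd' is discarded and label 3 gets a missing value.
print(result)
print('C' in df1.columns)  # False
```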
(g) Select columns by type
df.select_dtypes(include=['number']).head()
df.select_dtypes(include=['float']).head()
(h) Convert Series to DataFrame
s = df.mean()
s.name='to_DataFrame'
s
s.to_frame()
s.to_frame().T
III. Common basic functions
From here on, and throughout the following chapters, we will use this synthetic data set
df = pd.read_csv('data/table.csv')
1. head and tail
df.head()
df.tail()
df.head(3)
2. unique and nunique
df['Physics'].nunique()
df['Physics'].unique()
3. count and value_counts
count returns the number of non-missing elements; value_counts returns how often each distinct value occurs
df['Physics'].count()
df['Physics'].value_counts()
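These four functions treat missing values differently, which a short example makes concrete. The Series below is made up for illustration; only the column name Physics comes from the data set used above.

```python
import numpy as np
import pandas as pd

# Hypothetical grades with one missing entry
physics = pd.Series(['A+', 'B-', 'A+', np.nan, 'C'], name='Physics')

print(physics.nunique())  # 3: distinct values, missing entry excluded
print(physics.unique())   # the distinct values, including the missing one
print(physics.count())    # 4: non-missing entries only

# value_counts skips missing values unless dropna=False is passed
counts = physics.value_counts()
counts_with_na = physics.value_counts(dropna=False)
print(counts['A+'])          # 2
print(counts_with_na.sum())  # 5
```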
4. describe and info
The info function reports which columns exist, how many non-missing values each has, and the type of each column
df.info()
By default, describe computes summary statistics for the numeric columns
df.describe()
df.describe(percentiles=[.05, .25, .75, .95])
The describe function can also be used for non-numerical types
df['Physics'].describe()
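The two flavours of describe return different rows. The following sketch uses a made-up five-row frame (the column names match the data set, the values do not) to show which entries appear in each summary.

```python
import pandas as pd

# Hypothetical values; only the column names come from the data set above
demo = pd.DataFrame({
    'Math': [34.0, 32.5, 87.2, 80.4, 84.8],
    'Physics': ['A+', 'B+', 'B+', 'A-', 'B-'],
})

# Numeric column: count, mean, std, min, max plus the requested percentiles
num_summary = demo['Math'].describe(percentiles=[.05, .95])
print(num_summary)

# Non-numeric column: count, number of unique values,
# the most frequent value (top) and how often it occurs (freq)
cat_summary = demo['Physics'].describe()
print(cat_summary)
```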
5. idxmax and nlargest
The idxmax function returns the index label of the maximum value, which is particularly useful in some cases; idxmin works the same way for the minimum. nlargest returns the n largest elements
df['Math'].idxmax()
df['Math'].nlargest(3)
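The difference between a maximum value and its index label is clearer with a non-default index. The index labels and scores below are hypothetical.

```python
import pandas as pd

# Hypothetical scores with made-up ID labels as the index
math = pd.Series([34.0, 32.5, 87.2, 80.4, 84.8],
                 index=[1101, 1102, 1103, 1104, 1105], name='Math')

print(math.max())     # 87.2, the largest value
print(math.idxmax())  # 1103, the label where that value sits
print(math.idxmin())  # 1102, the label of the smallest value

# nlargest keeps the labels and sorts the values in descending order
top3 = math.nlargest(3)
print(top3.tolist())        # [87.2, 84.8, 80.4]
print(top3.index.tolist())  # [1103, 1105, 1104]
```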
6. clip and replace
clip and replace are two kinds of replacement functions. clip truncates values that exceed an upper bound or fall below a lower bound; replace substitutes specified values with new ones
df['Math'].head()
df['Math'].clip(33,80).head()
(df['Math'] - df['Math'].mean()).abs().mean()  # mean absolute deviation; Series.mad() was removed in pandas 2.0
df['Address'].head()
df['Address'].replace(['street_1','street_2'],['one','two']).head()
df.replace({'Address':{'street_1':'one','street_2':'two'}}).head()
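As a small worked example of both functions, the sketch below uses made-up values under the same column names, so the effect of each call can be checked by eye.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the data set above
demo = pd.DataFrame({
    'Math': [34.0, 32.5, 87.2, 80.4, 84.8],
    'Address': ['street_1', 'street_2', 'street_2', 'street_1', 'street_4'],
})

# clip(lower, upper): values below 33 become 33, values above 80 become 80
clipped = demo['Math'].clip(33, 80)
print(clipped.tolist())  # [34.0, 33.0, 80.0, 80.0, 80.0]

# replace with two lists: the i-th old value maps to the i-th new value
by_lists = demo['Address'].replace(['street_1', 'street_2'], ['one', 'two'])
print(by_lists.tolist())  # ['one', 'two', 'two', 'one', 'street_4']

# replace with a nested dict: {column: {old: new}}, other columns untouched
by_dict = demo.replace({'Address': {'street_1': 'one', 'street_2': 'two'}})
print(by_dict['Address'].tolist())  # ['one', 'two', 'two', 'one', 'street_4']
```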
7. The apply function
apply is a very flexible function, and Chapter 3 returns to it. For a Series, it applies the given function to each value in turn:
df['Math'].apply(lambda x:str(x)+'!').head()  # a lambda expression or a named function can be used
For a DataFrame, it applies the given function to each column in turn:
df.apply(lambda x:x.apply(lambda x:str(x)+'!')).head()  # a slightly more complex example to help you understand what apply does
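To make the element-wise versus column-wise distinction concrete, here is a sketch on a made-up two-row frame. The helper add_bang is a hypothetical named function, included to show that apply is not limited to lambdas.

```python
import pandas as pd

# Hypothetical values; the column names follow the data set above
demo = pd.DataFrame({'Math': [34.0, 32.5], 'Height': [173, 192]})

def add_bang(x):
    """Hypothetical helper: convert a value to text and append '!'."""
    return str(x) + '!'

# Series.apply calls the function once per value
per_value = demo['Math'].apply(add_bang)
print(per_value.tolist())  # ['34.0!', '32.5!']

# DataFrame.apply calls the function once per column (each column is a Series),
# so a function that reduces a column yields one result per column
col_range = demo.apply(lambda col: col.max() - col.min())
print(col_range['Math'])    # 1.5
print(col_range['Height'])  # 19
```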
IV. Sorting
1. Sorting by index
df.set_index('Math').head()  # set_index sets the index; it is covered in the next chapter
df.set_index('Math').sort_index().head() # You can set the ascending parameter, which defaults to True
2. Sorting by value
df.sort_values(by='Class').head()
Sorting by multiple columns: rows are ordered by the first column, and rows with equal values in the first column are then ordered by the second
df.sort_values(by=['Address','Height']).head()
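A four-row example makes the tie-breaking order easy to follow. The values below are made up; the list form of ascending, one flag per sort key, is shown as an optional variation.

```python
import pandas as pd

# Hypothetical rows; the column names follow the data set above
demo = pd.DataFrame({
    'Address': ['street_2', 'street_1', 'street_2', 'street_1'],
    'Height': [192, 173, 186, 167],
})

# Sort by Address first; within equal addresses, sort by Height
by_two = demo.sort_values(by=['Address', 'Height'])
print(by_two.index.tolist())  # [3, 1, 2, 0]

# ascending accepts one flag per key: Address ascending, Height descending
mixed = demo.sort_values(by=['Address', 'Height'], ascending=[True, False])
print(mixed.index.tolist())  # [1, 3, 0, 2]
```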
Code and data address: github.com/XiangLinPro…
About Datawhale
Datawhale is an open source organization focused on data science and AI. It brings together learners from universities and well-known companies across many fields, along with team members who share an open source and exploratory spirit. With the vision of “for the learner, grow with learners”, Datawhale encourages genuine self-expression, openness and inclusiveness, mutual trust and support, and the courage to experiment and to take responsibility. Guided by open source principles, it also explores open source content, open source learning, and open source solutions in order to support talent development and to build connections between people, between people and knowledge, between people and companies, and between people and the future.