Requirements

1. Master read and write operations of common file formats

2. Understand and become familiar with important properties and methods of Series and DataFrame

3. Master various kinds of sorting (index sorting and value sorting, single-level sorting and multi-level sorting)

import pandas as pd
import numpy as np

1. View the Pandas version

pd.__version__

I. File reading and writing

1. Read

(a) CSV format

df = pd.read_csv('data/table.csv')
df.head()

(b) TXT format

df_txt = pd.read_table('data/table.txt')  # the sep parameter sets the delimiter (tab by default)
df_txt.head()

(c) XLS or XLSX format

# Requires an Excel engine: openpyxl for .xlsx in recent pandas versions (xlrd reads only legacy .xls)
df_excel = pd.read_excel('data/table.xlsx')
df_excel.head()
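
The sep parameter mentioned for read_table can be illustrated with a minimal sketch. The in-memory strings below are made-up stand-ins for files such as data/table.txt, and the column names are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents, held in memory instead of on disk
tab_text = "Name\tMath\nAnn\t34.0\nBob\t32.5\n"
semi_text = "Name;Math\nAnn;34.0\nBob;32.5\n"

# read_table splits on tabs by default
df_tab = pd.read_table(io.StringIO(tab_text))

# read_csv splits on commas by default; sep overrides the delimiter
df_semi = pd.read_csv(io.StringIO(semi_text), sep=';')

print(df_tab.equals(df_semi))  # True: both parse to the same table
```

Both functions accept sep, so the choice between them is mostly a matter of which default delimiter matches the file.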

2. Write

(a) CSV format

df.to_csv('data/new_table.csv')
# df.to_csv('data/new_table.csv', index=False)  # omit the index column when saving

(b) XLS or XLSX format

# Requires the openpyxl package
df.to_excel('data/new_table2.xlsx', sheet_name='Sheet1')
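
To see what index=False changes, here is a small sketch that writes to in-memory buffers rather than to data/new_table.csv. The DataFrame contents are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data standing in for the tutorial's table
df_demo = pd.DataFrame({'Math': [34.0, 32.5], 'Physics': ['A+', 'B+']})

# Default: the index is written as an unnamed first column
with_index = df_demo.to_csv()
# index=False: only the data columns are written
without_index = df_demo.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'Math', 'Physics']
print(cols_without)  # ['Math', 'Physics']
```

Reading the default output back produces an extra 'Unnamed: 0' column, which is usually why index=False is preferred when the index carries no information.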

II. Basic data structures

1. Series

(a) Create a Series

For a Series, the most commonly used attributes are values, index, name, and dtype.

s = pd.Series(np.random.randn(5),index=['a','b','c','d','e'],name='This is a Series.',dtype='float64')
s

(b) Access Series attributes

s.values

s.name

s.index
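
As a quick sketch of what each of these attributes holds, including dtype, consider a small Series with fixed values. The name s_demo and its contents are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'], name='demo')

print(type(s_demo.values))   # the underlying data as a NumPy array
print(list(s_demo.index))    # ['a', 'b', 'c']
print(s_demo.name)           # demo
print(s_demo.dtype)          # float64
```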

(c) Access an element

Indexing is covered in detail in Chapter 2; this is only a brief preview.

s['a']

(d) Call methods

s.mean()

# List the public attributes and methods (names that do not start with an underscore)
print([attr for attr in dir(s) if not attr.startswith('_')])

2. DataFrame

(a) Create a DataFrame

df = pd.DataFrame({'col1':list('abcde'),'col2':range(5,10),'col3':[1.3,2.5,3.6,4.6,5.8]},
                 index=['one','two','three','four','five'])
df

(b) Select a column of a DataFrame as a Series

df['col1']

type(df)

type(df['col3'])

(c) Modify row or column names

df.rename(index={'one':'first'},columns={'col1':'new_col1'})  # returns a renamed copy; df itself is unchanged

(d) Access attributes and call methods

df.index

df.columns

df.values

df.shape

df.mean(numeric_only=True)  # numeric_only=True skips the string column col1

(e) Index alignment features

Index alignment is a very powerful feature of pandas: arithmetic matches rows by index label, not by position. Overlooking this can lead to unexpected results.

df1 = pd.DataFrame({'A':[1,2,3]},index=[1,2,3])
df2 = pd.DataFrame({'A':[1,2,3]},index=[3,1,2])
df1 - df2
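
The effect of alignment can be checked with a short sketch. It uses the same two frames under the assumed names left and right, compares label-based subtraction with purely positional subtraction, and shows what happens when a label is missing on one side.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=[1, 2, 3])
right = pd.DataFrame({'A': [1, 2, 3]}, index=[3, 1, 2])

# Rows are matched by label: 1-2, 2-3, 3-1
diff = left - right
aligned = diff['A'].sort_index().tolist()
print(aligned)      # [-1, -1, 2]

# Subtracting the raw arrays ignores labels and works by position
positional = (left['A'].to_numpy() - right['A'].to_numpy()).tolist()
print(positional)   # [0, 0, 0]

# Labels present on only one side produce missing values
partial = left - pd.DataFrame({'A': [10]}, index=[1])
missing = partial['A'].sort_index().isna().tolist()
print(missing)      # [False, True, True]
```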

(f) Deletion and addition of columns

To delete rows or columns, you can use the drop method, the del statement, or the pop method.

df.drop(index='five',columns='col1')  # returns a new DataFrame; pass inplace=True to modify df directly

df['col1'] = [1,2,3,4,5]
del df['col1']
df

df['col1'] = [1,2,3,4,5]
df.pop('col1')

df
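
The three deletion styles differ in what they return and whether they modify the DataFrame. A minimal sketch on a made-up frame (the name demo and its columns are assumptions):

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 2], 'y': [3, 4], 'z': [5, 6]})

# drop returns a new DataFrame and leaves demo untouched
dropped = demo.drop(columns='x')
cols_after_drop = list(demo.columns)
print(cols_after_drop)        # ['x', 'y', 'z']
print(list(dropped.columns))  # ['y', 'z']

# del removes the column in place and returns nothing
del demo['x']

# pop removes the column in place and returns it as a Series
popped = demo.pop('y')
print(list(demo.columns))     # ['z']
print(popped.tolist())        # [3, 4]
```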

You can add a new column by direct assignment or with the assign method. Note that assign returns a new DataFrame and leaves the original unchanged.

df1['B']=list('abc')
df1.assign(C=pd.Series(list('def')))

df1
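
Because the Series passed to assign carries its own default index (0, 1, 2), it is aligned against the DataFrame's index before being attached. The sketch below, using an assumed frame named base, shows the resulting missing value and one way to avoid it.

```python
import pandas as pd

base = pd.DataFrame({'A': [1, 2, 3]}, index=[1, 2, 3])

# The Series has labels 0, 1, 2; only labels 1 and 2 match base
extended = base.assign(C=pd.Series(list('def')))
print(list(base.columns))       # ['A'] (base is unchanged)
print(extended['C'].tolist())   # ['e', 'f', nan]

# A plain list has no index, so it is attached by position
by_position = base.assign(C=list('def'))
print(by_position['C'].tolist())  # ['d', 'e', 'f']
```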

(g) Select columns by type

df.select_dtypes(include=['number']).head()

df.select_dtypes(include=['float']).head()
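
A small sketch of how include (and its counterpart exclude) filter columns by dtype, using a made-up frame with one string, one integer, and one float column:

```python
import pandas as pd

mixed = pd.DataFrame({'name': ['a', 'b'], 'count': [1, 2], 'score': [1.5, 2.5]})

numeric_cols = list(mixed.select_dtypes(include=['number']).columns)
float_cols = list(mixed.select_dtypes(include=['float']).columns)
other_cols = list(mixed.select_dtypes(exclude=['number']).columns)

print(numeric_cols)  # ['count', 'score']
print(float_cols)    # ['score']
print(other_cols)    # ['name']
```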

(h) Convert Series to DataFrame

s = df.mean()
s.name='to_DataFrame'
s

s.to_frame()

s.to_frame().T
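
The shapes involved make the conversion easier to follow. In this sketch the Series col_means is built by hand as an assumed stand-in for the result of df.mean():

```python
import pandas as pd

col_means = pd.Series({'col2': 7.0, 'col3': 3.56}, name='to_DataFrame')

# to_frame: the Series becomes one column, named after the Series
as_column = col_means.to_frame()
print(as_column.shape)          # (2, 1)
print(list(as_column.columns))  # ['to_DataFrame']

# .T transposes, turning that column into a single row
as_row = as_column.T
print(as_row.shape)             # (1, 2)
print(list(as_row.index))       # ['to_DataFrame']
```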

III. Common basic functions

From here on, and throughout the following chapters, we use this fictional dataset:

df = pd.read_csv('data/table.csv')

1. head and tail

df.head()

df.tail()

df.head(3)  # show the first 3 rows instead of the default 5

2. unique and nunique

df['Physics'].nunique()

df['Physics'].unique()

3. count and value_counts

count returns the number of non-missing values.

df['Physics'].count()

df['Physics'].value_counts()
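
How these four functions treat missing values is easiest to see on a Series that contains one. The grades below are made up for illustration and are not taken from the tutorial's dataset.

```python
import numpy as np
import pandas as pd

grades = pd.Series(['A', 'B', 'A', np.nan, 'C', 'A'])

total = len(grades)                          # all entries, including NaN
non_missing = grades.count()                 # NaN is not counted
distinct = grades.nunique()                  # NaN is ignored by default
distinct_with_nan = len(grades.unique())     # unique keeps NaN as a value
frequencies = grades.value_counts().to_dict()  # NaN is dropped by default

print(total, non_missing, distinct, distinct_with_nan)  # 6 5 3 4
print(frequencies)
```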

4. describe and info

The info method reports which columns are present, how many non-missing values each contains, and the dtype of each column.

df.info()

By default, describe computes summary statistics for the numeric columns.

df.describe()

df.describe(percentiles=[.05, .25, .75, .95])

The describe method can also be applied to non-numeric data.

df['Physics'].describe()
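
The rows that describe returns depend on both the percentiles argument and the type of data. A minimal sketch on made-up Series (scores and letters are assumed names):

```python
import pandas as pd

scores = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
summary = scores.describe(percentiles=[.05, .95])
# The requested percentiles appear as rows labelled '5%' and '95%'
print(summary.index.tolist())

letters = pd.Series(['A', 'B', 'A'])
text_summary = letters.describe()
# Non-numeric data gets count, unique, top (most frequent) and freq instead
print(text_summary.index.tolist())  # ['count', 'unique', 'top', 'freq']
print(text_summary['top'], text_summary['freq'])  # A 2
```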

5. idxmax and nlargest

The idxmax method returns the index label of the maximum value, which is particularly useful in some cases. idxmin works the same way for the minimum. nlargest returns the n largest values.

df['Math'].idxmax()

df['Math'].nlargest(3)
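
The difference between the maximum value and its index label can be sketched on a small Series. The scores and the ID-like index labels are invented for illustration.

```python
import pandas as pd

math = pd.Series([34.0, 87.2, 58.8], index=[1101, 1102, 1103])

print(math.max())      # 87.2 (the value itself)
print(math.idxmax())   # 1102 (the label where the maximum occurs)
print(math.idxmin())   # 1101

top_two = math.nlargest(2)
print(top_two.index.tolist())  # [1102, 1103]
print(top_two.tolist())        # [87.2, 58.8]
```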

6. clip and replace

clip and replace are two kinds of substitution functions. clip truncates values that fall below a lower bound or above an upper bound.

df['Math'].head()

df['Math'].clip(33,80).head()

(df['Math'] - df['Math'].mean()).abs().mean()  # mean absolute deviation; Series.mad() was removed in pandas 2.0

df['Address'].head()

df['Address'].replace(['street_1','street_2'],['one','two']).head()

df.replace({'Address':{'street_1':'one','street_2':'two'}}).head()
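
A compact sketch of both functions on made-up data: clip bounds the values on each side, while replace substitutes specific values and accepts either parallel lists or a dictionary.

```python
import pandas as pd

math = pd.Series([20.0, 50.0, 95.0])
clipped = math.clip(33, 80)
print(clipped.tolist())   # [33.0, 50.0, 80.0]

address = pd.Series(['street_1', 'street_2', 'street_3'])
by_lists = address.replace(['street_1', 'street_2'], ['one', 'two'])
by_dict = address.replace({'street_1': 'one', 'street_2': 'two'})

print(by_lists.tolist())          # ['one', 'two', 'street_3']
print(by_lists.equals(by_dict))   # True: both forms give the same result
```

Values not named in the mapping, such as 'street_3' here, are left as they are.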

7. The apply function

apply is a very flexible function, which we will revisit in Chapter 3. For a Series, it applies the function to each value:

df['Math'].apply(lambda x:str(x)+'!').head()  # a lambda expression or a named function can be used

For a DataFrame, it applies the function to each column:

df.apply(lambda x:x.apply(lambda x:str(x)+'!')).head()  # a slightly more complex example to help you understand what apply does
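
The distinction between the two forms is what the function receives. In this sketch (the frame and its values are made up), Series.apply passes one value at a time, while DataFrame.apply passes one whole column at a time by default.

```python
import pandas as pd

frame = pd.DataFrame({'Math': [34.0, 32.5], 'Height': [173, 192]})

# Series.apply: the lambda sees a single value
labelled = frame['Math'].apply(lambda value: str(value) + '!')
print(labelled.tolist())     # ['34.0!', '32.5!']

# DataFrame.apply: the lambda sees a column (a Series), so column methods work
col_range = frame.apply(lambda column: column.max() - column.min())
print(col_range.to_dict())   # {'Math': 1.5, 'Height': 19.0}
```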

IV. Sorting

1. Index sort

df.set_index('Math').head()  # set_index sets the index; it is covered in the next chapter

df.set_index('Math').sort_index().head()  # the ascending parameter defaults to True

2. Value sort

df.sort_values(by='Class').head()

To sort by multiple columns, pass a list: rows are sorted by the first column, and ties are broken by the second.

df.sort_values(by=['Address','Height']).head()
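
Multi-level sorting can be traced by hand on a small made-up frame. The sketch also passes a list to ascending, which sets the direction per column, and then restores the original order with sort_index.

```python
import pandas as pd

people = pd.DataFrame({
    'Address': ['street_2', 'street_1', 'street_2', 'street_1'],
    'Height': [173, 192, 186, 167],
})

# Sort by Address first, then by Height within each Address
multi = people.sort_values(by=['Address', 'Height'])
print(multi['Height'].tolist())   # [167, 192, 173, 186]

# Address ascending, Height descending
mixed = people.sort_values(by=['Address', 'Height'], ascending=[True, False])
print(mixed['Height'].tolist())   # [192, 167, 186, 173]

# sort_index orders rows by their index labels
restored = multi.sort_index()
print(restored.index.tolist())    # [0, 1, 2, 3]
print(multi.sort_index(ascending=False).index.tolist())  # [3, 2, 1, 0]
```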

Code and data address: github.com/XiangLinPro…

