Preface

I often use pandas for data cleaning and processing, but I am not yet fluent in it, so some of my processing steps end up more complicated than they need to be.

I also keep going back to the documentation because I cannot remember how to do something in pandas.

So here is a brief note on the operations I use most often.

This is not an introduction to pandas itself. It is only a personal record, and it will continue to be updated.

Environment

Jupyter Notebook

Importing required packages

import pandas as pd
import numpy as np

Build Data 1

# Build the student information data
data = {'number': ['01', '02', '03', '04'],
        'name': ['Ming', 'little red', 'xiao LAN', 'zhang'],
        'mathematics': [80, 90, 60, 90],
        'Chinese': [70, 80, 90, 70]}
df = pd.DataFrame(data)

1. Change the value of a column as a whole

# Raise every student's Chinese score by 10 points
df_ = df.copy()
df_['Chinese'] = df_['Chinese'] + 10
df_

2. Modify the value of a column that meets the condition

# Subtract 10 points from Chinese scores of 90 or more
df_ = df.copy()
df_.loc[df_['Chinese'] >= 90, 'Chinese'] = df_['Chinese'] - 10
df_
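
The conditional update can be written in more than one way. Below is a minimal sketch on a cut-down copy of the table (only name and Chinese), comparing an in-place update through .loc with np.where. The variable names mask, by_loc and by_where are mine, chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Ming', 'little red', 'xiao LAN', 'zhang'],
                   'Chinese': [70, 80, 90, 70]})

# Boolean mask: True for rows whose Chinese score is 90 or more
mask = df['Chinese'] >= 90

# Variant 1: subtract in place, restricted to the masked rows
by_loc = df.copy()
by_loc.loc[mask, 'Chinese'] -= 10

# Variant 2: np.where takes the lowered score where the mask is True
# and the original score everywhere else
by_where = df.copy()
by_where['Chinese'] = np.where(mask, df['Chinese'] - 10, df['Chinese'])
```

Both variants leave the rows below 90 untouched, and neither modifies the original df.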

3. Add a column

# Add the average score of each student
df_ = df.copy()
df_['Average score'] = df_[['mathematics', 'Chinese']].mean(axis=1)
df_

# Derive a new column from the values of another column, e.g. a score of 60-70 is 'pass' ... 90-100 is 'excellent'
def score_level(x):
    if x < 60:
        return 'Fail'
    elif x < 70:
        return 'pass'
    elif x < 80:
        return 'medium'
    elif x < 90:
        return 'good'
    elif x <= 100:
        return 'excellent'
    else:
        raise ValueError(f'{x}: this score is wrong')
df_ = df.copy()
df_['Math level'] = df_['mathematics'].apply(score_level)
df_['Language Levels'] = df_['Chinese'].apply(score_level)
df_
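
As a possible alternative to apply() with a hand-written function, the same kind of banding can be sketched with pd.cut. This is only an illustration: the sample scores, the bin edges and the label names are my own choices, and it assumes integer scores from 0 to 100. Unlike score_level, values outside the bins become NaN instead of raising an error.

```python
import pandas as pd

scores = pd.Series([80, 90, 60, 90, 59, 100], name='mathematics')

# Left-closed bins: [0, 60), [60, 70), [70, 80), [80, 90), [90, 101)
# The upper edge 101 is an assumption so that an integer score of 100 is included
bins = [0, 60, 70, 80, 90, 101]
labels = ['Fail', 'pass', 'medium', 'good', 'excellent']

# right=False makes each bin include its left edge and exclude its right edge
levels = pd.cut(scores, bins=bins, labels=labels, right=False)
```

The result is a categorical Series, so the levels keep their order when sorting or grouping.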

Note: you may prefer the 'Math level' column to come right after 'mathematics'. Just reorder the columns; there are two ways to do this.

# The first way
df_ = df_.reindex(columns=['number', 'name', 'mathematics', 'Math level', 'Chinese', 'Language Levels'])
# The second way
# df_ = df_[['number', 'name', 'mathematics', 'Math level', 'Chinese', 'Language Levels']]
df_
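
The two ways of reordering behave differently when a column label is misspelled or missing. A small sketch, using a hypothetical 'physics' column that does not exist in the table:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ming'], 'mathematics': [80], 'Chinese': [70]})

# reindex silently creates an all-NaN column for a label that does not exist
reindexed = df.reindex(columns=['name', 'Chinese', 'physics'])

# Plain selection with a list raises KeyError for the same label
try:
    df[['name', 'Chinese', 'physics']]
    selection_failed = False
except KeyError:
    selection_failed = True
```

So the list selection is the stricter choice when a typo in a column name should be caught early.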

4. Add a row

# Add the average score of each subject
df_ = df.copy()
df_.loc['Average score'] = df[['mathematics', 'Chinese']].mean()
df_
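
One thing to be aware of with this approach is what happens to the columns that are not part of the average. A minimal sketch on a smaller table (the name with_avg is mine):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ming', 'zhang'],
                   'mathematics': [80, 90],
                   'Chinese': [70, 70]})

with_avg = df.copy()
# mean() returns a Series indexed by column name; assigning it through .loc
# aligns it on the columns, so columns missing from the Series get NaN
with_avg.loc['Average score'] = df[['mathematics', 'Chinese']].mean()

math_avg = with_avg.loc['Average score', 'mathematics']
name_cell = with_avg.loc['Average score', 'name']
```

Here the 'name' cell of the new row is left empty (NaN), and the row index now mixes integers with the string 'Average score'.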

5. Columns to rows

df_col = df.set_index(['number', 'name'])\
    .stack()\
    .reset_index()\
    .rename({'level_2': 'subjects', 0: 'scores'}, axis=1)
df_col

1. set_index(): sets 'number' and 'name' as the index.
2. stack(): moves the column labels into the innermost index level, giving a Series indexed by ('number', 'name', column label) whose values are the scores.
3. reset_index(): turns the index levels back into columns, giving a DataFrame whose two new columns are named level_2 and 0.
4. rename(): renames those two columns.
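
To see what each step produces, the chain can be split into intermediate variables. A sketch on a two-student version of the table; the names indexed, stacked, flat and long_df are mine:

```python
import pandas as pd

df = pd.DataFrame({'number': ['01', '02'],
                   'name': ['Ming', 'little red'],
                   'mathematics': [80, 90],
                   'Chinese': [70, 80]})

# Step 1: DataFrame with a two-level row index (number, name)
indexed = df.set_index(['number', 'name'])

# Step 2: Series with a three-level index (number, name, column label)
stacked = indexed.stack()

# Step 3: back to a DataFrame; the unnamed level becomes 'level_2'
# and the unnamed values become the column 0
flat = stacked.reset_index()

# Step 4: give the two generated columns readable names
long_df = flat.rename(columns={'level_2': 'subjects', 0: 'scores'})
```

Printing each variable in a notebook cell shows how the shape changes at every step.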

6. Rows to columns

df_row = df_col.set_index(['number', 'name', 'subjects'])['scores']\
    .unstack()\
    .rename_axis(columns=None)\
    .reset_index()

1. set_index(): sets 'number', 'name' and 'subjects' as the index.
2. ['scores']: selects the 'scores' column, which is a Series.
3. unstack(): moves the innermost index level of the Series into the column labels and fills the cells with the values, giving a DataFrame.
4. rename_axis(columns=None): sets the name of the column index to None.
5. reset_index(): turns the row index levels back into columns.

How to use stack and unstack

Note:
1) Not every Series or DataFrame can be unstacked. Each combination of index values must be unique; if the index contains duplicate entries, unstack() raises ValueError: Index contains duplicate entries, cannot reshape.
2) unstack() can also be given an index level number or name to unstack a different level.
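
Both points in the note can be reproduced on a tiny Series. This is only a sketch; the sample values and the names by_subject, by_name, dup and error_message are mine.

```python
import pandas as pd

s = pd.Series([80, 70, 90, 80],
              index=pd.MultiIndex.from_tuples(
                  [('Ming', 'mathematics'), ('Ming', 'Chinese'),
                   ('zhang', 'mathematics'), ('zhang', 'Chinese')],
                  names=['name', 'subjects']))

# Default: the last level ('subjects') becomes the columns
by_subject = s.unstack()

# Passing a level name unstacks that level instead
by_name = s.unstack(level='name')

# A duplicated (name, subjects) pair cannot be placed in a single cell
dup = pd.Series([80, 85],
                index=pd.MultiIndex.from_tuples(
                    [('Ming', 'mathematics'), ('Ming', 'mathematics')],
                    names=['name', 'subjects']))
try:
    dup.unstack()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

When duplicates are expected, an aggregating function such as pivot_table() is usually the better fit than unstack().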

7. pivot: rows to columns

pivot(): turns the distinct values of one column into multiple columns of a new DataFrame.

DataFrame.pivot(index=None, columns=None, values=None) is equivalent to building a hierarchical index with set_index() and then calling unstack().

  • index: the column to use as the row index of the reshaped table.
  • columns: the column whose values become the column names of the reshaped table.
  • values: the column used to fill the cells of the reshaped table.
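
The equivalence with set_index() followed by unstack() can be checked on a small table. A minimal sketch, assuming a long-format table with the columns name, subjects and scores; the sample rows and variable names are mine. Keyword arguments are used because pandas 2.x no longer accepts positional arguments to pivot().

```python
import pandas as pd

long_df = pd.DataFrame({
    'name': ['Ming', 'Ming', 'zhang', 'zhang'],
    'subjects': ['mathematics', 'Chinese', 'mathematics', 'Chinese'],
    'scores': [80, 70, 90, 70],
})

# Reshape with pivot()
via_pivot = long_df.pivot(index='name', columns='subjects', values='scores')

# Reshape with set_index() + unstack()
via_unstack = long_df.set_index(['name', 'subjects'])['scores'].unstack()

# True when both results hold the same labels and values
same = via_pivot.equals(via_unstack)
```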

Error: executing the following code, which passes a list as index, failed:

df_col.pivot(index=['number', 'name'], columns='subjects', values='scores')

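ValueError: Length of passed values is 8, index implies 2.

Note: pandas 1.1.0 and later accept a list for index in pivot(), so this error should only appear on older versions. If you know another way to do this with pivot(), please leave a comment. Thanks!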

7.1. Rows to columns with a single-column index
df_col.pivot(index='name', columns='subjects', values='scores')

7.2. Rows to columns with a multi-column index
df_col.pivot_table(index=['number', 'name'], columns='subjects', values='scores')

Note: of course, pivot_table() also accepts a single column as index.
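
One difference is worth keeping in mind when using pivot_table() in place of pivot(): pivot_table() aggregates duplicate entries, and its default aggregation is the mean. A small sketch with made-up scores, where one student has two mathematics entries:

```python
import pandas as pd

long_df = pd.DataFrame({
    'name': ['Ming', 'Ming', 'Ming'],
    'subjects': ['mathematics', 'mathematics', 'Chinese'],
    'scores': [80, 90, 70],
})

# Default aggfunc is the mean, so the two mathematics scores collapse into one cell
averaged = long_df.pivot_table(index='name', columns='subjects', values='scores')

# A different aggregation can be requested explicitly
best = long_df.pivot_table(index='name', columns='subjects',
                           values='scores', aggfunc='max')
```

With the student table in this note there are no duplicates, so the mean of a single value simply returns that value (as a float).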

8. melt: columns to rows

melt() is the reverse operation of pivot(): it merges several columns into a single column, producing a new table.

  • id_vars: optional. Columns that are kept as identifier columns and not converted.
  • value_vars: optional. Existing columns to convert. If not specified, all columns except id_vars are converted.
  • var_name: name of the new column holding the former column names; defaults to 'variable'.
  • value_name: name of the new column holding the values; defaults to 'value'.
  • col_level: optional. The level to melt if the columns are a MultiIndex.

df.melt(['number', 'name'], var_name='subjects', value_name='scores')
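
The effect of the defaults and of value_vars can be seen on a two-student version of the table. A sketch only; the names default_names and math_only are mine.

```python
import pandas as pd

df = pd.DataFrame({'number': ['01', '02'],
                   'name': ['Ming', 'little red'],
                   'mathematics': [80, 90],
                   'Chinese': [70, 80]})

# Defaults: the generated columns are called 'variable' and 'value'
default_names = df.melt(id_vars=['number', 'name'])

# Melt only one column and choose the names of the generated columns
math_only = df.melt(id_vars=['number', 'name'], value_vars=['mathematics'],
                    var_name='subjects', value_name='scores')
```

Note that melt() lists the rows column by column (all mathematics rows, then all Chinese rows), whereas the stack() chain in section 5 lists them student by student.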

So converting between rows and columns is easy with pivot() and melt()!