This article is participating in Python Theme Month. See the link to the event for more details

The Excel screenshot used in this article is as follows:

Numerical substitution is the substitution of the value A for the value B, and can be used in outlier processing, missing value filling processing. It mainly includes one to one substitution, many to one substitution and many to many substitution.

Take a look at the original data structure before the replacement begins

Import pandas as pd df = pd.read_excel(r 'c :\Users\admin\Desktop\ test.xlsx ') print(df)Copy the code

result:

Name age 0 Xiaoming 23.0 1 Xiaogang 98.0 2 Xiaohong 876.0 3 Li Hua 65.0 4 Xiaomei-6.0 5 Zhang SAN 34.0 6 Li Si 13.0 7 Wang Wu 98.5Copy the code

We can see that 876, -6, and 98.5 in the age column are not valid arrays, so let’s replace them

1. One-to-one substitution

One to one substitution is to replace all values in a certain area with another value.

Df = pd.read_excel(r'C:\Users\admin\Desktop\ test.xlsx ') print(df[' age '].replace(-6, 6))Copy the code

result:

0 23.0 1 98.0 2 876.0 3 65.0 4 6.0 5 34.0 6 13.0 7 98.5 Name: age, dType: float64Copy the code

For regular replacement objects, you can use the replace method for batch replacement.

eg:

Import pandas as pd df = pd.read_excel(r'C:\Users\admin\Desktop\ test.xlsx ', sheet_name='Sheet2') print(df)Copy the code

result:

Name age 0 Xiao Ming 23.0 1 Xiao Gang NaN 2 Xiao Hong 876.0 3 Li Hua 65.0 4 Xiao Mei NaN 5 Zhang SAN 34.0 6 Li Si NaN 7 Wang Wu 98.5Copy the code

After the replacement:

Import pandas as pd import numpy as np df = pd.read_excel(r 'c :\Users\admin\Desktop\ test.xlsx ', sheet_name='Sheet2') print(df.replace(np.NaN, 0))Copy the code

result:

Name age 0 Xiaoming 23.0 1 Xiaogang 0.0 2 Xiaohong 876.0 3 Li Hua 65.0 4 Xiaomei 0.0 5 Zhang SAN 34.0 6 Li Si 0.0 7 Wang Wu 98.5Copy the code

Sometimes it is necessary to replace a full table, and the replace method is equivalent to the fillna method used to fill in the previous missing values (you can see this method in the article on manipulation of missing values in Pandas | Python topic month).

Note: Np. NaN is a Python representation for missing values

2. Many-to-one substitution

Again, replace 876, -6, and 98.5 in the age column

Df = pd.read_excel(r'C:\Users\admin\Desktop\ test.xlsx ') print(df. Replace ([876, -6, 98.5], 25))Copy the code

result:

Name age 0 Xiao Ming 23.0 1 Xiao Gang 98.0 2 Xiao Hong 25.0 3 Li Hua 65.0 4 Xiao Mei 25.0 5 Zhang SAN 34.0 6 Li Si 13.0 7 Wang Wu 25.0Copy the code

Replace [876, -6, 98.5] with 25

3. Many-to-many substitution

Many-to-many substitutions are just multiple one-to-one substitutions in a region. For example, if 876 is divisible by 10, negative 6 becomes negative, and 98.5 is left as an integer

The Boolean selection of the table selects rows, and then selects columns by normal indexes

Import pandas as pd df = pd.read_excel(r 'c :\Users\admin\Desktop\ test.xlsx ') print(df.replace({876:87, -6:6, 98.5:98}))Copy the code

result:

Name age 0 Xiao Ming 23.0 1 Xiao Gang 98.0 2 Xiao Hong 87.0 3 Li Hua 65.0 4 Xiao Mei 6.0 5 Zhang SAN 34.0 6 Li Si 13.0 7 Wang Wu 98.0Copy the code