This is the 25th day of my participation in the August Genwen Challenge.
This post describes common pandas operations for cleaning a DataFrame.
Add a header to the table
If the file has no header row, be sure to pass header=None when reading it; otherwise the first data row is used as the header.
- Name the columns while reading the file:
df = pd.read_excel('file path', header=None, names=['column 1', 'column 2'])
- Name the columns after reading (name is a list of column names):
df.columns = name
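A minimal sketch of both approaches. To run without a file on disk, it reads an in-memory CSV string with pd.read_csv, on the assumption that header and names behave the same way in pd.read_excel. The sample data and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical headerless data: two columns, no header row.
raw = "Alice,30\nBob,25\n"

# Option 1: name the columns while reading.
df_named = pd.read_csv(io.StringIO(raw), header=None, names=["name", "age"])

# Option 2: read first, then assign the column names.
df_later = pd.read_csv(io.StringIO(raw), header=None)
df_later.columns = ["name", "age"]

# Without header=None, the first data row is consumed as the header.
df_wrong = pd.read_csv(io.StringIO(raw))

print(df_named)
print(list(df_wrong.columns))
```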
Reset the index, overwriting the original data (drop=True discards the old index instead of keeping it as a column)
df.reset_index(drop = True, inplace = True)
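As a small illustration, assuming a frame whose index has gaps after filtering (the data is invented):

```python
import pandas as pd

# Hypothetical frame whose index no longer starts at 0.
df = pd.DataFrame({"value": [20, 30, 40]}, index=[1, 2, 3])

# Without drop=True the old index is kept as a new column named "index".
with_old = df.reset_index()

# With drop=True the old index is discarded; inplace=True modifies df itself.
df.reset_index(drop=True, inplace=True)

print(df)
print(with_old)
```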
Delete a column
- del df['column name']
- df.drop('column name', axis=1): returns a new DataFrame and does not change the original data
- df.drop('column name', axis=1, inplace=True): overwrites the original data
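The three variants can be compared side by side in a short sketch (the column names a, b, c are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new frame without "b"; df itself still has all three columns.
without_b = df.drop("b", axis=1)

# Removes "c" from df itself.
df.drop("c", axis=1, inplace=True)

# del also modifies df in place.
del df["a"]

print(list(without_b.columns))
print(list(df.columns))
```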
Drop the rows in which every value is missing
df.dropna(how='all', inplace=True)
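A quick sketch with invented data shows that how='all' removes only the fully empty row and keeps a row that is partly filled:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", None, "Bob"],
    "age": [31, None, None],
})

# Row 1 is entirely missing and is dropped; row 2 has a name, so it stays.
df.dropna(how="all", inplace=True)

print(df)
```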
Handle missing values: delete them, or fill them with the mean or the most frequent value
- Mean: df['column name'] = df['column name'].fillna(df['column name'].mean())
- Most frequent value: df['column name'].value_counts().index[0] returns the value that occurs most often
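A minimal sketch of both fills, assuming a numeric column filled with its mean and a text column filled with its most frequent value (columns and data are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Paris", "Paris", None],
})

# Numeric column: fill the gap with the mean of the known values.
df["age"] = df["age"].fillna(df["age"].mean())

# Text column: value_counts() sorts by frequency, so index[0] is the mode.
most_common = df["city"].value_counts().index[0]
df["city"] = df["city"].fillna(most_common)

print(df)
```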
Convert the format of some rows (here, weights in lbs to kgs)
- Find the rows to convert by adding a flag column to the table:
df['rows_with_lbs'] = df['weight'].str.contains('lbs').fillna(False)
df['weight'].str.contains('lbs') marks the rows whose weight value contains "lbs"; fillna(False) replaces the resulting NaN values with False.
- Loop over the flagged rows, convert each value, and write it back with df.at:
for i, lbs_row in df[df['rows_with_lbs']].iterrows():
    weight = int(float(lbs_row['weight'][:-3]) / 2.2)
    df.at[i, 'weight'] = '{}kgs'.format(weight)
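The conversion can be tried end to end on a small invented frame. This sketch passes na=False to str.contains instead of calling fillna(False) afterwards; that is a substitution made here so the flag column comes out as plain booleans.

```python
import pandas as pd

# Hypothetical weights: one in kgs, one in lbs, one missing.
df = pd.DataFrame({"weight": ["70kgs", "160lbs", None]})

# Flag the rows that need converting; missing values count as False.
df["rows_with_lbs"] = df["weight"].str.contains("lbs", na=False)

for i, lbs_row in df[df["rows_with_lbs"]].iterrows():
    # Strip the 3-character "lbs" suffix, convert to kg, truncate to an int.
    weight = int(float(lbs_row["weight"][:-3]) / 2.2)
    df.at[i, "weight"] = "{}kgs".format(weight)

print(df)
```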
Remove non-ASCII characters
df['first_name'] = df['first_name'].replace({r'[^\x00-\x7F]+': ''}, regex=True)
- replace(old value, new value, regex=True): regex=True turns on regular-expression matching.
- r: marks a raw string, so the backslashes reach the regular-expression engine unchanged.
- [\x00-\x7F]: the ASCII characters, code points 0 to 127. The leading ^ inside the brackets negates the class, so the pattern matches runs of non-ASCII characters.
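For illustration, a short sketch with made-up names that contain accented letters (written as escapes so the source stays ASCII):

```python
import pandas as pd

# "Jos\u00e9" is Jose with an accented e, "Zo\u00eb" is Zoe with a diaeresis.
df = pd.DataFrame({"first_name": ["Jos\u00e9", "Zo\u00eb", "Anna"]})

# Replace every run of non-ASCII characters with an empty string.
df["first_name"] = df["first_name"].replace({r"[^\x00-\x7F]+": ""}, regex=True)

print(df["first_name"].tolist())
```

Note that this deletes the accented characters outright; it does not transliterate them to their closest ASCII letters.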
Uniqueness: split a column that holds several values with str.split(expand=True); expand=True returns each piece as its own column
df[['first_name', 'last_name']] = df['name'].str.split(expand=True)
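A minimal sketch, assuming every name consists of exactly two whitespace-separated words (the names are invented):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# Split on whitespace; expand=True yields one column per piece.
df[["first_name", "last_name"]] = df["name"].str.split(expand=True)

print(df)
```

If some names have more than two words, the split produces more than two columns and the assignment fails; passing n=1 to str.split limits it to a single split.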
Delete duplicate rows: df.duplicated() flags them and df.drop_duplicates() removes them
df.drop_duplicates(['first_name', 'last_name'], inplace=True)
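As a small example with invented data, duplicates are judged only on the listed columns, and the first occurrence is kept by default:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Ada", "Ada", "Alan"],
    "last_name": ["Lovelace", "Lovelace", "Turing"],
    "age": [36, 37, 41],
})

# True for each row that repeats an earlier (first_name, last_name) pair.
dup_mask = df.duplicated(["first_name", "last_name"])

# Drop the repeats in place; "age" is ignored when comparing rows.
df.drop_duplicates(["first_name", "last_name"], inplace=True)

print(dup_mask.tolist())
print(df)
```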