Data analysis - Pandas Basic operations

This is the 25th day of my participation in the August Genwen Challenge.

This post describes common pandas operations on a DataFrame.

Add a header to the table

If the file has no header row, be sure to pass header=None when reading it; otherwise the first data row is used as the header.

Name the columns while reading the file

name = ['column 1', 'column 2']
df = pd.read_excel('file path', header=None, names=name)

Name the columns after reading

df.columns = name
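
A minimal sketch of both approaches. To stay self-contained it reads a small in-memory CSV with pd.read_csv instead of an Excel file (an assumption made for illustration; pd.read_excel accepts the same header and names arguments). The sample data and column names are invented.

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for the real file.
raw = "Alice,30\nBob,25\n"
name = ["name", "age"]  # illustrative column names

# Option 1: name the columns while reading.
df1 = pd.read_csv(io.StringIO(raw), header=None, names=name)

# Option 2: read first, then assign the column names.
df2 = pd.read_csv(io.StringIO(raw), header=None)
df2.columns = name

# Without header=None the first data row is consumed as the header,
# so one row of data is lost.
df_wrong = pd.read_csv(io.StringIO(raw))
```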

Reset the index, overwriting the original data

df.reset_index(drop = True, inplace = True)
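
A short sketch of what drop=True changes, using an invented frame whose index is out of order. With drop=True the old index is discarded; without it, the old index is kept as a new column named index.

```python
import pandas as pd

# Illustrative frame with a non-sequential index.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[5, 8, 2])

# Default drop=False: returns a new frame and keeps the old index as a column.
kept = df.reset_index()

# drop=True discards the old index; inplace=True modifies df itself.
df.reset_index(drop=True, inplace=True)
```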

Delete a column

del df['column name']
df.drop('column name', axis=1) - returns a new DataFrame and does not change the original data
df.drop('column name', axis=1, inplace=True) - overwrites the original data
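
The three variants side by side, as a sketch on an invented three-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["a"]                          # removes column "a" from df itself

smaller = df.drop("b", axis=1)       # returns a new frame; df is unchanged
cols_after_drop = list(df.columns)   # df still has "b" and "c"

df.drop("c", axis=1, inplace=True)   # removes "c" from df itself
```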

Drop rows that are entirely empty

df.dropna(how='all', inplace=True)
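
A small sketch showing that how='all' removes only rows in which every value is missing; a row with at least one value is kept. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, None, None],
    "b": ["x", None, "z"],
})

# Row 1 is missing in every column and is dropped.
# Row 2 is missing only in "a", so it stays.
df.dropna(how="all", inplace=True)
```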

Handle abnormal (missing) values: delete them, or fill them with the mean or the most frequent value

Mean: df['column name'] = df['column name'].fillna(df['column name'].mean())
Most frequent value: df['column name'].value_counts().index[0] returns the value that occurs most often
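
A sketch of both fill strategies on invented data. It assigns the result of fillna back to the column rather than calling fillna(inplace=True) on a column selection, since the in-place form on a selection is not guaranteed to update the parent frame in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Paris", "Paris", None],
})

# Numeric column: fill the gap with the mean of the known values.
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical column: value_counts() sorts by frequency (ignoring NaN),
# so the first index entry is the most frequent value.
most_common = df["city"].value_counts().index[0]
df["city"] = df["city"].fillna(most_common)
```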

Convert the format of some rows

Find all the rows to convert by adding a boolean flag column to the table

df['rows_with_lbs'] = df['weight'].str.contains('lbs').fillna(False)

df['weight'].str.contains('lbs') finds the rows whose weight column contains 'lbs'; fillna(False) replaces NaN with False so the column can be used as a mask.

for i, lbs_row in df[df['rows_with_lbs']].iterrows():
    # strip the trailing 'lbs' and convert pounds to kilograms (1 kg is about 2.2 lbs)
    weight = int(float(lbs_row['weight'][:-3]) / 2.2)
    # df.at[i, 'weight'] addresses the single cell at row i, column 'weight'
    df.at[i, 'weight'] = '{}kgs'.format(weight)
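
A runnable sketch of the conversion loop on invented sample data. It assumes the weights are strings ending in a three-character unit such as lbs. It passes na=False to str.contains, which has the same effect here as chaining fillna(False), and drops the helper column once the conversion is done.

```python
import pandas as pd

# Illustrative data: one weight in pounds, one in kilograms, one missing.
df = pd.DataFrame({"weight": ["150lbs", "70kgs", None]})

# Flag the rows to convert; missing weights are flagged False.
df["rows_with_lbs"] = df["weight"].str.contains("lbs", na=False)

for i, lbs_row in df[df["rows_with_lbs"]].iterrows():
    # Strip the "lbs" suffix, convert to kilograms, truncate to an integer.
    weight = int(float(lbs_row["weight"][:-3]) / 2.2)
    df.at[i, "weight"] = "{}kgs".format(weight)

# The helper column is no longer needed.
df.drop("rows_with_lbs", axis=1, inplace=True)
```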

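Remove non-ASCII characters

df['first_name'] = df['first_name'].replace({r'[^\x00-\x7F]+': ''}, regex=True)

replace({old value: new value}, regex=True) treats the old value as a regular expression. The r prefix makes the pattern a raw string, so backslashes reach the regex engine unchanged. [\x00-\x7F] matches the ASCII characters 0 to 127; the leading ^ negates the class, so the pattern matches every run of non-ASCII characters, which is then replaced with an empty string.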
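
A sketch of the replacement on invented names. The non-ASCII letters are written as Unicode escapes so the example does not depend on how the source file is encoded.

```python
import pandas as pd

# "Zo\u00eb" is Zoe with a diaeresis, "Jos\u00e9" is Jose with an acute accent.
df = pd.DataFrame({"first_name": ["Zo\u00eb", "Jos\u00e9", "Anna"]})

# [^\x00-\x7F]+ matches one or more characters outside the ASCII range.
df["first_name"] = df["first_name"].replace({r"[^\x00-\x7F]+": ""}, regex=True)
```

Note that this deletes the accented letters outright rather than converting them to their unaccented forms.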

Uniqueness: split a column that holds multiple values. str.split(expand=True) returns each piece as a separate column

df[['first_name', 'last_name']] = df['name'].str.split(expand=True)
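
A sketch of the split on invented two-part names. With no separator given, str.split splits on whitespace.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# expand=True returns a DataFrame with one column per piece,
# which can be assigned to two new columns at once.
df[["first_name", "last_name"]] = df["name"].str.split(expand=True)
```

If some names have more than two parts, the split yields more than two columns and the assignment fails; passing n=1 limits the split to the first whitespace.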

Delete duplicate rows with df.drop_duplicates()

df.drop_duplicates(['first_name', 'last_name'], inplace=True)
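
A sketch on invented data. The list of column names tells pandas which columns define a duplicate; by default the first occurrence is kept. df.duplicated() with the same columns returns the boolean mask of the rows that would be removed.

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Ada", "Ada", "Alan"],
    "last_name": ["Lovelace", "Lovelace", "Turing"],
    "age": [36, 37, 41],
})

# True for every repeat after the first occurrence of a name pair.
dup_mask = df.duplicated(["first_name", "last_name"])

# Rows 0 and 1 share the same name pair, so row 1 is dropped
# even though its age differs.
df.drop_duplicates(["first_name", "last_name"], inplace=True)
```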
