Hello, I’m Peter

Wishing everyone a happy Qixi ❤️

The three treasure functions in Pandas

When manipulating data in Pandas, it is often necessary to perform the same operation on a row or column of data, or even on all elements.

The map, apply, and applymap methods in Pandas cover most of these data processing needs without writing explicit loops. This article uses concrete examples to show how to use these three functions.

Pandas article series

Seventeen articles in this Pandas series have been published so far. Articles 1-16 make up the first edition of Pandas Data Analysis in Depth; Article 17 picks up with pivot tables and crosstabs.

Simulated data

Below is the simulated data used throughout this article: personal information for several students (the data is for learning purposes only).

import numpy as np
import pandas as pd

data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang", "Chou", "Note", "Wang"],
    "sex": ["Male", "Female", "Female", "Male", "Male", "Male"],
    "birthday": ["2003-07-07", "1993-08-09", "1999-03-05", "1995-08-19", "2002-11-18", "1996-07-01"],
    "address": ["Nanshan District, Shenzhen", "Guangzhou Yuexiu District", "Hangzhou, Zhejiang", "Shanghai", "Beijing Haidian", "Wuchang, Wuhan city, Hubei Province"],
    "age": [18, 28, 22, 26, 19, 25],
    "height": [189, 178, 167, 172, 182, 185],
    "weight": [89, 72, 62, 68, 79, 81],
})
data

Let’s look at the data types: the first four columns hold strings, and the last three are numeric.
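One way to check the column types is the dtypes attribute. The sketch below uses a trimmed-down, two-row version of the simulated data (an assumption made only to keep the example short).

```python
import pandas as pd

# Trimmed-down version of the simulated data (two rows, three columns)
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red"],
    "birthday": ["2003-07-07", "1993-08-09"],
    "age": [18, 28],
})

# One dtype per column: string columns are not numeric, age is an integer type
print(data.dtypes)
```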

map

Suppose we need to replace Male with 1 and Female with 0 in the sex column, because sometimes we have to work with numerical data.

How do you do that?

Method 1: loop

The simplest approach: loop over the rows, check whether each entry is Male or Female, and assign the corresponding value directly.

Before each operation we make a copy of the simulated data, so the original data is not modified.

Write a loop to assign:
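A minimal sketch of the loop approach, assuming a trimmed-down version of the simulated data and a copy named data1 (both are illustrative choices, not the original code). It collects the numeric codes in a list and then overwrites the column.

```python
import pandas as pd

# Trimmed-down version of the simulated data (only the columns needed here)
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

data1 = data.copy()  # work on a copy so the original stays intact

# Walk through the column and translate each label into its numeric code
codes = []
for value in data1["sex"]:
    if value == "Male":
        codes.append(1)
    else:
        codes.append(0)

data1["sex"] = codes
print(data1)
```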

Method 2: Map implementation

The loop is easy to understand and easy to write, but it becomes too slow when there is a lot of data. How do we do the same thing with map?

As before, first make a copy:

1. Mapping through the dictionary
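A short sketch of the dictionary approach, assuming a trimmed-down version of the simulated data and a copy named data2 (illustrative names). Each value in the sex column is looked up as a key in the dictionary.

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

data2 = data.copy()

# Each value in the column is used as a key in the dictionary
data2["sex"] = data2["sex"].map({"Male": 1, "Female": 0})
print(data2)
```

Values that do not appear as keys in the dictionary become NaN, so the dictionary should cover every label in the column.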

2. Write a function to map
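The same mapping can be sketched with a function instead of a dictionary. The function name map_sex and the trimmed-down data are assumptions for illustration.

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

def map_sex(x):
    """Return 1 for "Male" and 0 for anything else."""
    return 1 if x == "Male" else 0

data2 = data.copy()

# map calls map_sex once per element of the column
data2["sex"] = data2["sex"].map(map_sex)
print(data2)
```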

Summary of map: when you pass a dictionary or a function to the map method, each value in the Series is looked up in the dictionary or passed to the function one by one, and the result is the mapped value.

apply

The apply method is similar to map in its use, except that apply is more comprehensive and powerful, and can pass in more complex functions.

Parameters

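DataFrame.apply(
  func,              # function to apply
  axis=0,            # default 0 (index): apply to each column; 1 (columns): apply to each row
  raw=False,         # if True, pass each row or column as a NumPy ndarray instead of a Series; default False
  result_type=None,  # 'expand', 'reduce', 'broadcast', or None; only takes effect when axis=1
  args=(),           # positional arguments passed on to func
  **kwargs)          # keyword arguments passed on to func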

Make a copy of data3 before the operation is performed:

Pass in different functions

In the apply method we can pass in various functions:

  • Custom function
  • Anonymous Functions in Python
  • Python built-in Functions
  • Functions in Pandas

1. Custom functions

We pass in the custom function: the function above that changes the gender representation
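A minimal sketch of passing a custom function to apply on a single column. The function name map_sex, the copy named data3, and the trimmed-down data are illustrative assumptions.

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "sex": ["Male", "Female", "Female"],
})

def map_sex(x):
    """Return 1 for "Male" and 0 for anything else."""
    return 1 if x == "Male" else 0

data3 = data.copy()

# On a Series, apply calls the function once per element, like map
data3["sex"] = data3["sex"].apply(map_sex)
print(data3)
```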

Now suppose the ages above are actual ages and we want the nominal age (the traditional East Asian reckoning), which means adding 1 to every value in age. How do we do that?
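One possible way to do this with a custom function is to pass the increment through the args parameter of apply. The helper name add_years and its bias parameter are hypothetical.

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "age": [18, 28, 22],
})

def add_years(x, bias):
    """Hypothetical helper: add a fixed number of years to one age value."""
    return x + bias

data3 = data.copy()

# Extra positional arguments for the function go in args
data3["age"] = data3["age"].apply(add_years, args=(1,))
print(data3)
```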

2. Anonymous Python functions: lambda
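For a one-line operation such as adding 1 to each age, a lambda avoids defining a named function. A sketch on a trimmed-down version of the simulated data:

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "age": [18, 28, 22],
})

data3 = data.copy()

# The lambda receives one age at a time and returns it plus 1
data3["age"] = data3["age"].apply(lambda x: x + 1)
print(data3)
```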

3. Python built-in functions

We pass in Python’s len function, which computes the length of each string:
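A sketch of this on the address column, using a trimmed-down version of the simulated data. The new column name address_length is an assumption for illustration.

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "address": ["Nanshan District, Shenzhen", "Guangzhou Yuexiu District", "Hangzhou, Zhejiang"],
})

data3 = data.copy()

# len is called on each address string in turn
data3["address_length"] = data3["address"].apply(len)
print(data3)
```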

4. Pandas built-in functions

When we simulated the data, the birthday field was a string; now we use a pandas built-in function to convert it to a datetime type:

Before the conversion:

After the conversion:
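A sketch of the conversion, assuming pd.to_datetime as the built-in function and a trimmed-down version of the simulated data. The dtype is printed before and after so the change is visible.

```python
import pandas as pd

# Trimmed-down version of the simulated data
data = pd.DataFrame({
    "name": ["Xiao Ming", "Little red", "Zhang"],
    "birthday": ["2003-07-07", "1993-08-09", "1999-03-05"],
})

data3 = data.copy()
print(data3["birthday"].dtype)  # string column before the conversion

# pd.to_datetime parses each string into a timestamp
data3["birthday"] = data3["birthday"].apply(pd.to_datetime)
print(data3["birthday"].dtype)  # datetime column after the conversion
```

For a whole column, calling pd.to_datetime(data3["birthday"]) directly does the same job without going through apply.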

Specifying the axis

The axis argument specifies which axis to operate along. The default is axis=0, which applies the function to each column; axis=1 applies it to each row.

To illustrate this parameter, we simulate a simple dataset:

Compare the default, axis=0, with axis=1:

In spreadsheet terms, axis=0 works down each column and axis=1 works across each row.
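The difference between the two axes can be sketched on a small DataFrame. The values and column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up dataset: three identical rows
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# axis=0 (default): the function receives each column, giving one result per column
col_sums = df.apply(np.sum, axis=0)
print(col_sums)  # A -> 12, B -> 27

# axis=1: the function receives each row, giving one result per row
row_sums = df.apply(np.sum, axis=1)
print(row_sums)  # 13 for every row
```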

result_type

The result_type parameter controls the shape and column names of the result. It only takes effect when axis=1. There are three values:

  • expand
  • broadcast
  • reduce

result_type="expand"
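A small sketch of "expand" on a made-up DataFrame: each row's list-like result is spread across new columns, which are numbered from 0.

```python
import pandas as pd

# Small made-up dataset
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# The function returns a two-element list per row;
# "expand" turns those lists into two columns
expanded = df.apply(lambda row: [1, 2], axis=1, result_type="expand")
print(expanded)
```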

result_type="broadcast"

The column names remain unchanged
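A sketch of "broadcast" on the same kind of made-up DataFrame: the per-row results are written back into the original shape, so the original column names are kept.

```python
import pandas as pd

# Small made-up dataset
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# The returned list must match the number of columns;
# the result keeps the original index and column names
broadcasted = df.apply(lambda row: [1, 2], axis=1, result_type="broadcast")
print(broadcasted)
```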

result_type="reduce"

The result is a Series of data
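A sketch of "reduce" on a made-up DataFrame: the list returned for each row is kept as a single value, so the result is a Series of lists rather than a DataFrame.

```python
import pandas as pd

# Small made-up dataset
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Each row's list is stored as one element of the resulting Series
reduced = df.apply(lambda row: [1, 2], axis=1, result_type="reduce")
print(reduced)
print(type(reduced))
```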

applymap

The applymap method is simpler in scope: it applies the same function to every element of a DataFrame. (From pandas 2.1 onward, applymap is deprecated in favor of the equivalent DataFrame.map.)

DataFrame.applymap(func, na_action=None, **kwargs)

To illustrate, we simulate a simple dataset:

As you can see, all the columns above are of type float64.
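A sketch of such a dataset, assuming random floats from NumPy with a fixed seed. The shape and column names are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Made-up dataset: random floats, seeded so the values are repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 2)), columns=["A", "B"])

print(df)
print(df.dtypes)  # every column is float64
```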

Add 1 to every value:
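A sketch of the element-wise increment on a small made-up DataFrame. Because applymap was renamed to DataFrame.map in pandas 2.1, the example picks whichever method the installed version provides; the variable name elementwise is an illustrative choice.

```python
import pandas as pd

# Small made-up float dataset
df = pd.DataFrame({"A": [1.23456, 2.0], "B": [3.14159, 0.5]})

# Use DataFrame.map when available (pandas >= 2.1), otherwise applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# The lambda is called once for every cell
plus_one = elementwise(lambda x: x + 1)
print(plus_one)
```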

Keep three decimal places for every value:
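One way to sketch this is with Python's built-in round, applied cell by cell. The data is made up, and the elementwise variable again selects DataFrame.map or applymap depending on the pandas version.

```python
import pandas as pd

# Small made-up float dataset
df = pd.DataFrame({"A": [1.23456, 2.0], "B": [3.14159, 0.5]})

# Use DataFrame.map when available (pandas >= 2.1), otherwise applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# round(x, 3) keeps the values numeric
rounded = elementwise(lambda x: round(x, 3))
print(rounded)
```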

Alternatively, format the output: render every value with three decimal places
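A sketch of the formatting variant using an f-string, on made-up data. Unlike rounding, this turns every cell into a string with exactly three decimal places.

```python
import pandas as pd

# Small made-up float dataset
df = pd.DataFrame({"A": [1.23456, 2.0], "B": [3.14159, 0.5]})

# Use DataFrame.map when available (pandas >= 2.1), otherwise applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# Each cell becomes a string such as "2.000"
formatted = elementwise(lambda x: f"{x:.3f}")
print(formatted)
```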

Changing the data type

The data in df is of type float64; now convert every value to str:
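A sketch of the type conversion by passing the built-in str as the function, on made-up data.

```python
import pandas as pd

# Small made-up float dataset
df = pd.DataFrame({"A": [1.5, 2.0], "B": [3.25, 0.5]})

# Use DataFrame.map when available (pandas >= 2.1), otherwise applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# str is called on each cell, so every value becomes a Python string
as_text = elementwise(str)
print(as_text)
print(type(as_text.iloc[0, 0]))
```

For a plain dtype change, df.astype(str) is the more direct route.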

Missing value processing

If the data contains missing values, use the na_action parameter to handle them:
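A sketch of na_action="ignore" on made-up data with two missing cells. With this setting the function is not called on NaN values, and they are passed through unchanged.

```python
import numpy as np
import pandas as pd

# Small made-up dataset with missing values
df = pd.DataFrame({"A": [1.5, np.nan], "B": [np.nan, 2.25]})

# Use DataFrame.map when available (pandas >= 2.1), otherwise applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# Missing cells are skipped; only real numbers are formatted
formatted = elementwise(lambda x: f"{x:.2f}", na_action="ignore")
print(formatted)
```

Without na_action="ignore", the same lambda would format the missing cells as the text "nan".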

Conclusion

Applying the same operation to the rows, columns, or elements of a dataset is very common. This article used a range of examples to explain:

  • map: applies the same operation to every element of a Series, which covers most Series needs
  • apply: can do everything map does and is more flexible; it accepts custom, anonymous, and built-in functions for data processing
  • applymap: applies the same operation to every element of a DataFrame