There are several ways to apply your own functions or third-party functions to Pandas objects. Which method to use depends on whether the function operates on an entire DataFrame or Series, on rows or columns, or on individual elements.

1. Table-level function application: pipe()

2. Row and column level function application: apply()

3. Aggregation API: agg() and transform()

4. Element level function application: applymap()

Table-level function application

DataFrames and Series can be passed to functions directly, but when several functions need to be called in a chain, the pipe() method is the better choice. Compare the following two approaches:

The following code is equivalent to the above code:
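The original listings are not reproduced here, so the following is only a minimal sketch of the two styles. The bodies of f, g and h and the sample data are assumptions made up for illustration; the text only says that each function takes a DataFrame as its first argument.

```python
import pandas as pd

# Hypothetical stand-ins for f, g and h: each takes a DataFrame as its
# first argument and returns a DataFrame.
def h(df):
    return df.dropna()

def g(df, arg1):
    return df + arg1

def f(df, arg2, arg3):
    return df * arg2 - arg3

df = pd.DataFrame({"a": [1.0, 2.0, None], "b": [4.0, 5.0, 6.0]})

# Style 1: nested calls, read from the inside out.
nested = f(g(h(df), arg1=1), arg2=2, arg3=3)

# Style 2: chained calls with pipe(), read in execution order.
chained = df.pipe(h).pipe(g, arg1=1).pipe(f, arg2=2, arg3=3)
```

Both expressions run the same three steps, so nested and chained hold the same values.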

Pandas recommends the second style, method chaining. With pipe(), your own functions and the functions of third-party libraries fit into a method chain as easily as the built-in Pandas methods do.

In the example above, the functions f, g, and h all take the DataFrame as their first argument. What if a function expects the data as, say, its second argument? In that case, give pipe() a tuple of (callable, data_keyword). pipe() then passes the DataFrame to the argument named in the tuple.
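A small sketch of the tuple form. The function scale_by and its parameter names are hypothetical, chosen only so that the data is not the first argument.

```python
import pandas as pd

# Hypothetical function whose first argument is NOT the data.
def scale_by(factor, data):
    return data * factor

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The tuple tells pipe() to pass df as the keyword argument "data";
# the remaining arguments are forwarded unchanged.
result = df.pipe((scale_by, "data"), factor=10)
```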

For example, statsmodels can be used this way to fit a regression. Its API expects a formula first and the DataFrame as the second argument, data. To call it through pipe(), pass the function and keyword as the pair (sm.ols, 'data').

The pipe method was inspired by Unix pipes and, more recently, by dplyr and magrittr, which introduced the popular %>% (read "pipe") operator for R. The implementation of pipe() is quite clean and feels native to Python.

Row and column level function application

The apply() method applies arbitrary functions along an axis of the DataFrame. Like the descriptive statistics methods, it accepts an optional axis argument.
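A minimal sketch of the axis argument, using a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# axis=0 (the default): the function receives each column as a Series.
col_means = df.apply(lambda col: col.mean())

# axis=1: the function receives each row as a Series.
row_means = df.apply(lambda row: row.mean(), axis=1)

# Any function that reduces a Series works, for example the value range.
col_range = df.apply(lambda col: col.max() - col.min())
```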

The apply() method also accepts the name of a method as a string and dispatches to that method.
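As a sketch with made-up data, passing "mean" gives the same result as calling the mean() method directly:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# The string is resolved to the DataFrame method of the same name.
by_name = df.apply("mean")
by_name_rows = df.apply("mean", axis=1)
```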

By default, the return type of the function passed to apply() determines the type of the final output.

  • When the function returns a Series, the final output is a DataFrame. The output columns match the index of the Series returned by the function.
  • When the function returns any other type, the output is a Series.
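The two cases can be sketched as follows. The data and the labels "low" and "high" are invented for illustration, and the first call is applied row-wise (axis=1) so that the returned Series index shows up as the output columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function returns a Series, so the result is a DataFrame whose
# columns come from the index of that Series.
summary = df.apply(
    lambda row: pd.Series({"low": row.min(), "high": row.max()}), axis=1
)

# The function returns a scalar, so the result is a Series.
totals = df.apply(lambda col: col.sum())
```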

The result_type argument overrides this default behavior. It accepts three options: reduce, broadcast, and expand.

These options determine whether list-like return values are expanded into a DataFrame.
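A brief sketch of how the options differ for a function that returns a list. The data is made up, and only the default, expand and broadcast cases are shown.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Default: list-like results are kept as a Series of lists.
as_lists = df.apply(lambda row: [1, 2], axis=1)

# "expand": each list is spread over new columns labelled 0, 1, ...
expanded = df.apply(lambda row: [1, 2], axis=1, result_type="expand")

# "broadcast": the result keeps the original shape and column labels.
broadcast = df.apply(lambda row: [1, 2], axis=1, result_type="broadcast")
```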

Used well, apply() can reveal a lot about a data set. For example, we can extract the date on which each column reaches its maximum value:
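A sketch of this idea on a tiny, hand-written time series (the values are arbitrary):

```python
import pandas as pd

tsdf = pd.DataFrame(
    {
        "A": [0.1, 2.5, -1.0, 0.3],
        "B": [3.0, 0.2, 0.1, -0.5],
        "C": [0.0, 0.4, 0.9, 1.7],
    },
    index=pd.date_range("2000-01-01", periods=4),
)

# idxmax() returns the index label (here a date) of each column's maximum.
peak_dates = tsdf.apply(lambda col: col.idxmax())
```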

You can also pass additional positional and keyword arguments to the apply() method. They are forwarded to the applied function after the row or column itself: positional arguments through the args parameter, keyword arguments by name.
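For illustration, assume a hypothetical function subtract_and_divide with one extra positional argument and one keyword argument; the name and the data are not taken from the original listing.

```python
import pandas as pd

# Hypothetical function taking extra arguments besides the data.
def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

df = pd.DataFrame({"a": [5.0, 8.0], "b": [11.0, 14.0]})

# sub is passed positionally through args, divide as a keyword argument.
result = df.apply(subtract_and_divide, args=(5,), divide=3)
```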

Another useful feature is the ability to pass a Series method and have it executed on each column or row:
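As a sketch, Series.interpolate can be passed directly to fill gaps column by column (the data is made up):

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0], "B": [10.0, np.nan, 30.0]},
    index=pd.date_range("2000-01-01", periods=3),
)

# The unbound Series method is called once per column.
filled = tsdf.apply(pd.Series.interpolate)
```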

apply() also has a raw parameter, which defaults to False. With the default, each row or column is converted to a Series before the function is applied. When raw is True, the function receives an ndarray instead, which can significantly improve performance if indexing is not required.
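A small sketch that records what the function actually receives when raw=True. The helper span and the list seen_types are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

seen_types = []

# Works on both a Series and an ndarray, since both have max() and min().
def span(values):
    seen_types.append(type(values))
    return values.max() - values.min()

# With raw=True each column arrives as a plain NumPy array.
raw_result = df.apply(span, raw=True)
```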

The aggregation API

The aggregation API lets you express multiple aggregation operations quickly and concisely. The same API is shared across Pandas objects, including the groupby API, the window functions API, and the resample API. The entry point is DataFrame.aggregate(), also available under the alias DataFrame.agg().

Here we use a DataFrame similar to the above example:

With a single function, agg() is equivalent to apply(). The aggregation can also be given as a string naming the method. Either way, the output is a Series of aggregated values:

A single aggregation on a Series returns a scalar value:
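A minimal sketch of both cases with made-up data:

```python
import pandas as pd

tsdf = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
)

# One aggregation on a DataFrame: a Series with one value per column.
col_sums = tsdf.agg("sum")

# One aggregation on a Series: a scalar.
a_sum = tsdf["A"].agg("sum")
```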

Aggregating with multiple functions

You can also pass multiple aggregation functions as a list. Each function produces one row in the output DataFrame, labelled with the name of that function.

Multiple functions produce multiple rows:

On a Series, multiple functions return a Series indexed by the function names:

Passing a lambda function produces a row named <lambda>:

Passing a named custom function uses the function name as the row label:
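The four cases above can be sketched as follows. The data and the custom function my_mean are assumptions for illustration.

```python
import pandas as pd

tsdf = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
)

# Hypothetical named custom function.
def my_mean(x):
    return x.mean()

# DataFrame plus a list of functions: one row per function.
multi = tsdf.agg(["sum", "mean"])

# Series plus a list of functions: a Series indexed by function name.
ser_multi = tsdf["A"].agg(["sum", "mean"])

# A lambda has no real name, so its row is labelled "<lambda>".
with_lambda = tsdf["A"].agg(["sum", lambda x: x.mean()])

# A named function contributes its own name as the label.
with_named = tsdf["A"].agg(["sum", my_mean])
```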

Use dictionaries for aggregation

To specify which aggregation functions apply to which columns, pass DataFrame.agg a dictionary that maps column names to a scalar (a single function or function name) or to a list of scalars.

Note, however, that the output order is not guaranteed. To make the output order match the input order, use an OrderedDict.

When a dictionary value is a list, the output is a DataFrame that lays out the results of all aggregation functions in matrix form, with one row for each unique function. Column and function combinations that were not requested are filled with NaN:
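A sketch of both dictionary forms on made-up data:

```python
import pandas as pd

tsdf = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
)

# Scalar values only: the result is a Series indexed by column name.
by_col = tsdf.agg({"A": "mean", "B": "sum"})

# At least one list value: the result is a DataFrame with one row per
# unique function and NaN where a function was not requested.
by_col_lists = tsdf.agg({"A": ["mean", "min"], "B": "sum"})
```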

Mixed dtypes

As with groupby's .agg operation, when a DataFrame contains columns whose data types cannot be aggregated, .agg computes only the columns that can be aggregated:

Custom describe

With .agg() you can build a custom summary function similar to the built-in describe() method.
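One possible sketch, assuming two hypothetical quantile helpers named q25 and q75. The choice of statistics and the data are illustrative only.

```python
import pandas as pd

tsdf = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# Hypothetical helpers; their names become row labels in the output.
def q25(x):
    return x.quantile(0.25)

def q75(x):
    return x.quantile(0.75)

custom_describe = tsdf.agg(["count", "mean", "min", q25, "median", q75, "max"])
```

Each row of custom_describe holds one statistic, much like the rows produced by describe().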

Transform API

The transform() method returns an object that has the same index and the same size as the original data. Like the .agg API, it lets you supply multiple operations at once rather than one at a time.

First, create a DataFrame:

Here the entire DataFrame is transformed. .transform() accepts NumPy functions, string function names, and custom functions.

Here transform() receives a single function, which is equivalent to applying a ufunc.

Passing a single function to .transform() on a Series returns a single Series.
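A sketch of the single-function forms, on made-up data that includes a missing value:

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({"A": [-1.0, 2.0, np.nan], "B": [4.0, -5.0, 6.0]})

# Three ways to name the same transformation.
abs_np = tsdf.transform(np.abs)
abs_str = tsdf.transform("abs")
abs_lambda = tsdf.transform(lambda x: x.abs())

# Applying the ufunc directly gives the same result.
abs_ufunc = np.abs(tsdf)

# On a Series, a single function returns a Series.
abs_series = tsdf["A"].transform(np.abs)
```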

Transform with multiple functions

When called with multiple functions, transform() generates a DataFrame with multi-level column labels. The first level holds the column names of the original data set, and the second level holds the names of the functions passed to transform().

When multiple functions are applied to a Series, the output is a DataFrame whose column names are the names of the functions passed to transform().
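A short sketch with made-up data and two arbitrary functions:

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({"A": [-1.0, 2.0, -3.0], "B": [4.0, -5.0, 6.0]})

# DataFrame plus a list of functions: two-level columns of
# (original column name, function name).
multi = tsdf.transform([np.abs, lambda x: x + 1])

# Series plus a list of functions: one column per function.
ser_multi = tsdf["A"].transform([np.abs, lambda x: x + 1])
```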

Transform with a dictionary

Passing a dictionary of functions applies the specified transform() operation to each column.

When the dictionary values are lists, transform() generates a DataFrame with multi-level column labels, where the second level holds the names of the functions called.
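Both dictionary forms, sketched on made-up data (column B is kept non-negative so that the square root is defined):

```python
import numpy as np
import pandas as pd

tsdf = pd.DataFrame({"A": [-1.0, 2.0, -3.0], "B": [4.0, 9.0, 16.0]})

# One function per column: ordinary single-level columns.
per_col = tsdf.transform({"A": np.abs, "B": lambda x: x + 1})

# A list for at least one column: multi-level column labels.
per_col_lists = tsdf.transform({"A": np.abs, "B": [np.sqrt, lambda x: x + 1]})
```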

Element level function application

Not every function can be vectorized, that is, written to accept a NumPy array and return another array or value. For such cases, DataFrame.applymap() (renamed DataFrame.map() in Pandas 2.1) and Series.map() accept any Python function that takes a single value and returns a single value.
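A sketch of element-wise application. The helper str_len and the data are illustrative; because the DataFrame method is called applymap in older Pandas releases and map from 2.1 onward, the example picks whichever one is available.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.5, 22.25], "two": [333.0, 4.0]})

# Takes a single value and returns a single value.
def str_len(x):
    return len(str(x))

# Series.map applies the function to every element of one column.
lengths_series = df["one"].map(str_len)

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
lengths_frame = elementwise(str_len)
```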

Series.map() has one more feature: it can "link" or "map" values through a second Series. This is closely related to the merging and joining functionality:
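A minimal sketch of mapping through a second Series, with made-up values:

```python
import pandas as pd

s = pd.Series(["six", "seven", "six", "seven", "six"], index=list("abcde"))

# The index of t acts as the lookup key, its values as the replacement.
t = pd.Series({"six": 6.0, "seven": 7.0})

mapped = s.map(t)
```

The result keeps the index of s and replaces each value with the entry of t found under that label.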
