Public account: You and the cabin by: Peter Editor: Peter

Pandas series: Filtering DataFrame data

This article describes how to filter and view data in pandas. Because pandas offers many different ways to filter data, this article covers only the more basic ones.

Further reading

For an introduction to pandas and how to create Series and DataFrame data, read:

1. Create Series type data

2. 10 ways to create DataFrame data

3. It all starts with the explode function

Simulated data

The examples in this article are based on simulated data. Some missing values are introduced when the data is created, using np.nan from the NumPy library:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ['Ming', 'wang', 'zhang fei', 'GuanYu', 'Sun Xiaoxiao', 'Wang Jianguo', 'pei liu'],
    "sex": ['male', 'woman', 'woman', 'male', 'woman', 'male', 'woman'],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],  # two values are missing
    "address": ["Nanshan District, Shenzhen City, Guangdong Province",
                np.nan,  # missing value
                "Yuhua District, Changsha City, Hunan Province",
                "Dongcheng District, Beijing",
                "Baiyun District, Guangzhou City, Guangdong Province",
                "Jiangxia District, Wuhan City, Hubei Province",
                "Longhua District, Shenzhen City, Guangdong Province"]
})

df

Let’s look at the data types of each field: three character types, one int64, and one float64
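The field types can be checked with the dtypes attribute. A minimal sketch, rebuilding the simulated data so the snippet runs on its own; how string columns are reported (object or a dedicated string type) can depend on the pandas version:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
    "address": ["Nanshan District, Shenzhen City, Guangdong Province", np.nan,
                "Yuhua District, Changsha City, Hunan Province",
                "Dongcheng District, Beijing",
                "Baiyun District, Guangzhou City, Guangdong Province",
                "Jiangxia District, Wuhan City, Hubei Province",
                "Longhua District, Shenzhen City, Guangdong Province"],
})

# dtypes returns one type per column
types = df.dtypes
print(types)
```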

Mind map

Let’s start with the various methods for filtering data:

Check the head and tail data

Check the head and tail data using the head and tail methods:

head

By default, this method returns the first five rows

You can specify how many rows to look at:
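A short sketch of both calls, using a copy of the simulated data with the address column left out for brevity:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

first_five = df.head()    # no argument: first 5 rows
first_three = df.head(3)  # explicit row count
print(first_three)
```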

tail

tail is used in the same way:

  • By default it returns the last 5 rows
  • You can specify the number of rows to view
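The two points above can be sketched as follows, again with a copy of the simulated data (address column omitted for brevity):

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

last_five = df.tail()   # no argument: last 5 rows
last_two = df.tail(2)   # explicit row count
print(last_two)
```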

Random sampling

The sample method is used for this. By default it returns one random row, and you can specify how many rows to return:
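A minimal sketch with a copy of the simulated data. The random_state argument is not mentioned in the article; it is added here only to show how a sample can be made repeatable:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

one_row = df.sample()       # one random row
three_rows = df.sample(3)   # three random rows, without replacement

# Fixing random_state makes the draw repeatable
repeatable = df.sample(3, random_state=0)
print(three_rows)
```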

Numerical data filtering

A single condition

1. Numerical data is usually filtered by comparing values:
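For instance, a comparison on a column produces a Boolean mask, and indexing the DataFrame with that mask keeps the rows where it is True. A sketch on a copy of the simulated data (the thresholds are chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

older_than_21 = df[df["age"] > 21]   # rows where age is greater than 21
exactly_21 = df[df["age"] == 21]     # rows where age equals 21
print(older_than_21)
```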

Multiple conditions

2. When we have multiple comparison conditions, we need to pay attention to:

  • Use & for "and" and the vertical bar | for "or", instead of the Python keywords and / or
  • Wrap each condition in parentheses

Here’s the correct way to write it:
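A sketch of both operators on a copy of the simulated data (the thresholds are illustrative). Note that a comparison against a missing score evaluates to False, so those rows drop out:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

# "and": both conditions must hold; each condition is parenthesized
both = df[(df["age"] > 20) & (df["score"] >= 600)]

# "or": at least one condition must hold
either = df[(df["age"] < 20) | (df["score"] > 620)]
print(both)
print(either)
```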

Using numerical functions

Commonly used numerical comparison functions are as follows:

df.eq()    # equal to: ==
df.ne()    # not equal to: !=
df.le()    # less than or equal to: <=
df.lt()    # less than: <
df.ge()    # greater than or equal to: >=
df.gt()    # greater than: >

1. Filter using a single numerical function:

2. Filter using multiple numerical functions:
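Both cases can be sketched on a copy of the simulated data; the same methods exist on a single column (Series), which is what is used here. The thresholds are illustrative:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

# 1. A single function: age >= 23
single = df[df["age"].ge(23)]

# 2. Several functions combined with &: age > 20 and score < 610
multiple = df[df["age"].gt(20) & df["score"].lt(610)]
print(single)
print(multiple)
```

Because each condition is a method call, no extra parentheses are needed around it when combining with & or |.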

Character data filtering

String data is filtered with the string methods that pandas provides through the str accessor, which mirror Python's own string methods:

  • Contains: str.contains
  • Starts with: str.startswith
  • Ends with: str.endswith

The three examples below illustrate the use of the above three functions:
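A sketch of the three methods on the name column of a copy of the simulated data (the search strings are chosen only for illustration). Matching is case-sensitive by default, so "Wang" does not match "wang":

```python
import pandas as pd

# The name and sex columns of the article's simulated data
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
})

contains_wang = df[df["name"].str.contains("Wang")]   # substring match
starts_with_zh = df[df["name"].str.startswith("zh")]  # prefix match
ends_with_liu = df[df["name"].str.endswith("liu")]    # suffix match
print(contains_wang)
```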

The fields used in the examples above contain no null values. What if a field does contain null values? For example, suppose we want to select the students whose address contains “Shenzhen”:

Solution 1: Add a parameter
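The article does not name the parameter here; a likely candidate is the na argument of str.contains, which sets the value used for missing entries. A minimal sketch under that assumption, on the name and address columns of the simulated data:

```python
import numpy as np
import pandas as pd

# The name and address columns of the article's simulated data
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "address": ["Nanshan District, Shenzhen City, Guangdong Province", np.nan,
                "Yuhua District, Changsha City, Hunan Province",
                "Dongcheng District, Beijing",
                "Baiyun District, Guangzhou City, Guangdong Province",
                "Jiangxia District, Wuhan City, Hubei Province",
                "Longhua District, Shenzhen City, Guangdong Province"],
})

# na=False treats a missing address as "does not contain"
mask = df["address"].str.contains("Shenzhen", na=False)
in_shenzhen = df[mask]
print(in_shenzhen)
```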

Solution 2: Compare the result with a Boolean value
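One way to read this solution: comparing the result of str.contains with True yields a clean Boolean mask, because a missing entry never compares equal to True. A sketch under that reading:

```python
import numpy as np
import pandas as pd

# The name and address columns of the article's simulated data
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "address": ["Nanshan District, Shenzhen City, Guangdong Province", np.nan,
                "Yuhua District, Changsha City, Hunan Province",
                "Dongcheng District, Beijing",
                "Baiyun District, Guangzhou City, Guangdong Province",
                "Jiangxia District, Wuhan City, Hubei Province",
                "Longhua District, Shenzhen City, Guangdong Province"],
})

# The explicit comparison turns missing entries into False
mask = df["address"].str.contains("Shenzhen") == True
in_shenzhen = df[mask]
print(in_shenzhen)
```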

Filtering by specified values

Filter data by specifying a specific value for a field:
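A sketch on a copy of the simulated data. The == comparison matches one value; isin, which the article does not mention, is shown as one way to match any of several values:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

males = df[df["sex"] == "male"]            # one specific value
age_18_or_25 = df[df["age"].isin([18, 25])]  # any of several values
print(males)
```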

Combining numeric and string conditions

Numeric comparison conditions and string conditions can be combined:

  • And: &
  • Or: |
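A sketch of both combinations on a copy of the simulated data (the conditions themselves are illustrative):

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

# And: older than 20 and sex equal to "woman"
and_result = df[(df["age"] > 20) & (df["sex"] == "woman")]

# Or: score above 600 or name starting with "G"
or_result = df[(df["score"] > 600) | df["name"].str.startswith("G")]
print(and_result)
print(or_result)
```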

Indexed access

Fetching data directly by its index is also possible, although this is rarely used:
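The original example is not reproduced here, so the following is only one plausible reading: fetching a row by its index label with loc, by its position with iloc, or by comparing against df.index. With the default index the labels and positions coincide:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

by_label = df.loc[2]           # row with index label 2, returned as a Series
by_position = df.iloc[-1]      # last row by position, returned as a Series
as_frame = df[df.index == 2]   # same row as by_label, kept as a one-row DataFrame
print(by_label)
```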

Slice access

Slicing in pandas works the same way as in Python:

  • Counting from the left, positions start at 0; counting from the right, they start at -1
  • Slicing rule: start:stop:step, giving the start position, the stop position, and the step size (which can be positive or negative)

The element at the stop position is not included: a slice contains its head but not its tail. Remember this important rule of index slicing!

Fetch rows using a slice with a single value:
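Reading "a single value" as a slice with only one bound given (an interpretation, since the original example is not shown), a sketch on a copy of the simulated data:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

up_to_2 = df[:2]   # only a stop: rows at positions 0 and 1
from_5 = df[5:]    # only a start: rows from position 5 to the end
print(up_to_2)
```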

Various ways of fetching rows with slices:
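A few illustrative cases on a copy of the simulated data, each showing that the stop position is excluded:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

middle = df[1:4]      # positions 1, 2, 3 (position 4 is excluded)
one_row = df[2:3]     # a one-row DataFrame for position 2
everything = df[:]    # no bounds: all rows
print(middle)
```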

Let’s take a look at the slice in this case:

When the step size is not 1 and the indexes are negative:
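A sketch of steps and negative positions on a copy of the simulated data (the particular slices are illustrative):

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

every_other = df[::2]       # step 2: positions 0, 2, 4, 6
last_three = df[-3:]        # negative start: the last three rows
reversed_rows = df[::-1]    # negative step: all rows in reverse order
backwards = df[-1:-6:-2]    # from the last row backwards in steps of 2: positions 6, 4, 2
print(backwards)
```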

Missing value filtering

The missing values in this article's example data are:

Viewing missing values

df.isnull()

Checking which fields have missing values

df25 = df.isnull().any()  # does each column contain a null value?
df25

Locating the rows that contain missing values

df26 = df[df.isnull().any(axis=1)]  # keep each row that has at least one missing value
df26

Selecting data by column

Specifying the column name

In the first case, we specify a column name directly, which returns Series data

In the second case, we retrieve DataFrame data:
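The two cases can be sketched as follows on a copy of the simulated data: a single name in brackets gives a Series, while a list of names gives a DataFrame, even when the list holds only one name:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

name_series = df["name"]          # first case: a Series
name_frame = df[["name"]]         # second case: a one-column DataFrame
two_columns = df[["name", "age"]]  # second case: several columns
print(type(name_series), type(name_frame))
```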

Specifying the field type

The field types of the example data are:

Select the columns whose type is object:

If you want to retrieve data containing multiple types:

Select the columns whose type is not ‘object’:
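Selection by field type is commonly done with select_dtypes, which takes include and exclude arguments. The original calls are not shown here, so this is a sketch rather than the article's exact code, and it assumes string columns are stored as object, as stated earlier; newer pandas versions may report a dedicated string type instead:

```python
import numpy as np
import pandas as pd

# Copy of the article's simulated data (address column omitted for brevity)
df = pd.DataFrame({
    "name": ["Ming", "wang", "zhang fei", "GuanYu", "Sun Xiaoxiao", "Wang Jianguo", "pei liu"],
    "sex": ["male", "woman", "woman", "male", "woman", "male", "woman"],
    "age": [20, 23, 18, 21, 25, 21, 24],
    "score": [np.nan, 600, 550, np.nan, 610, 580, 634],
})

# Columns of one type
object_columns = df.select_dtypes(include="object")

# Columns of several types at once
numeric_columns = df.select_dtypes(include=["int64", "float64"])

# Columns of every type except object
non_object_columns = df.select_dtypes(exclude="object")
print(numeric_columns.columns.tolist())
```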

Conclusion

pandas offers a wide variety of ways to fetch the data we want. This article introduced some of the basic ones, such as viewing head and tail data, conditional filtering, and slice filtering.