Pandas’ 10 indexes
Public account: You and the cabin by: Peter Editor: Peter
Hello, I’m Peter
Here are 10 index types in Pandas that you should know.
Indexes are quite common in daily life. For example:
- A book has a table of contents and specific chapters. When we want to find a particular topic, we can turn to the corresponding chapter.
- Books in a library are classified into literature, history, technology, fiction, and so on. Together with each book's number, this lets us quickly find the book we want.
- A restaurant's a la carte menu is organized into staples, drinks and soups, cold dishes, and so on, down to the names of specific dishes.
Each of the usages above can be considered a specific application of indexing.
Indexes built around actual requirements are therefore a real help in day-to-day work, and creating an appropriate index in Pandas makes data processing easier.
Official documentation: pandas.pydata.org/docs/refere…
Here are 10 common indexes in Pandas and how to create them.
pd.Index
pd.Index is the general-purpose class in Pandas for building many kinds of indexes.
pandas.Index(
    data=None,           # one-dimensional array or array-like data
    dtype=None,          # NumPy data type (default: object)
    copy=False,          # whether to make a copy
    name=None,           # index name
    tupleize_cols=True,  # if True, try to create a MultiIndex when possible
    **kwargs
)
Import the two required libraries:
import pandas as pd
import numpy as np
For integer data, the default dtype is int64:
In [2]:
pd.Index([1, 2, 3, 4])
Out[2]:
Int64Index([1, 2, 3, 4], dtype='int64')
At creation time, you can also specify the data type directly:
In [3]:
pd.Index([1, 2, 3, 4], dtype="float64")
Out[3]:
Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64')
Specify both the name and the dtype at creation time:
In [4]:
pd.Index([1, 2, 3, 4], dtype="float64", name="Peter")
Out[4]:
Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64', name='Peter')
In [5]:
pd.Index(list("ABCD"))
Out[5]:
Index(['A', 'B', 'C', 'D'], dtype='object')
Create an index from a tuple:
In [6]:
pd.Index(("a", "b", "c", "d"))
Out[6]:
Index(['a', 'b', 'c', 'd'], dtype='object')
Create an index from a set. A set is unordered, so the result does not necessarily keep the order in which the elements were written:
In [7]:
pd.Index({"x", "y", "z"})
Out[7]:
Index(['z', 'x', 'y'], dtype='object')
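Because a set has no defined order, one way to get a reproducible index is to sort the elements before building it. This is a minimal sketch, assuming the labels can be compared with each other; the variable names and the index name "label" are illustrative and not from the original examples.

```python
import pandas as pd

# A set has no guaranteed iteration order
labels = {"x", "y", "z"}

# Sorting first gives the same index on every run
idx = pd.Index(sorted(labels), name="label")

print(idx.tolist())  # ['x', 'y', 'z']
print(idx.name)      # label
```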
pd.RangeIndex
RangeIndex generates an index over a range of integers, following the semantics of Python's range function. Its syntax is:
pandas.RangeIndex(
    start=None,  # start value, default is 0
    stop=None,   # end value (exclusive)
    step=None,   # step size, default is 1
    dtype=None,  # data type
    copy=False,  # whether to make a copy
    name=None    # index name
)
Here are several examples:
In [8]:
pd.RangeIndex(8)  # default start is 0, step is 1
The start defaults to 0, the stop value is 8 (exclusive), and the step is 1:
Out[8]:
RangeIndex(start=0, stop=8, step=1)
In [9]:
pd.RangeIndex(0, 8)
Out[9]:
RangeIndex(start=0, stop=8, step=1)
Change the step size to 2:
In [10]:
pd.RangeIndex(0, 8, 2)
Out[10]:
RangeIndex(start=0, stop=8, step=2)
In [11]:
list(pd.RangeIndex(0, 8, 2))
Displayed as a list, the result does not include the stop value 8:
Out[11]:
[0, 2, 4, 6]
Change the step size to -1 in the following example:
In [12]:
pd.RangeIndex(8, 0, -1)
Out[12]:
RangeIndex(start=8, stop=0, step=-1)
In [13]:
list(pd.RangeIndex(8, 0, -1))
Out[13]:
[8, 7, 6, 5, 4, 3, 2, 1]
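RangeIndex uses the same start/stop/step rules as Python's built-in range, and it is also the index a DataFrame receives when none is specified. The short sketch below checks both points; the column name "value" is made up for illustration.

```python
import pandas as pd

# RangeIndex yields the same values as the built-in range
idx = pd.RangeIndex(8, 0, -1)
same_as_range = list(idx) == list(range(8, 0, -1))
print(same_as_range)  # True

# A DataFrame created without an explicit index uses a RangeIndex
df = pd.DataFrame({"value": [10, 20, 30]})
print(type(df.index).__name__)  # RangeIndex
print(df.index.tolist())        # [0, 1, 2]
```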
pd.Int64Index
An index whose data type is a 64-bit integer (int64):
pandas.Int64Index(
    data=None,   # data for the index
    dtype=None,  # data type, int64 by default
    copy=False,  # whether to make a copy
    name=None    # index name
)
In [14]:
pd.Int64Index([1, 2, 3, 4])
Out[14]:
Int64Index([1, 2, 3, 4], dtype='int64')
In [15]:
pd.Int64Index([1, 2.0, 3, 4])
Out[15]:
Int64Index([1, 2, 3, 4], dtype='int64')
In [16]:
pd.Int64Index([1, 2, 3, 4], name="Peter")
Out[16]:
Int64Index([1, 2, 3, 4], dtype='int64', name='Peter')
An error is raised if the data contains a value with a fractional part:
In [17]:
# pd.Int64Index([1, 2, 3, 4.4])  # raises an error
pd.UInt64Index
An index whose data type is an unsigned 64-bit integer (uint64):
pandas.UInt64Index(
    data=None,
    dtype=None,
    copy=False,
    name=None
)
In [18]:
pd.UInt64Index([1, 2, 3, 4])
Out[18]:
UInt64Index([1, 2, 3, 4], dtype='uint64')
In [19]:
pd.UInt64Index([1, 2, 3, 4], name="Tom")
Out[19]:
UInt64Index([1, 2, 3, 4], dtype='uint64', name='Tom')
In [20]:
pd.UInt64Index([1, 2.0, 3, 4], name="Tom")
Out[20]:
UInt64Index([1, 2, 3, 4], dtype='uint64', name='Tom')
As with Int64Index, an error is raised if a value has a fractional part:
In [21]:
# pd.UInt64Index([1, 2.4, 3, 4], name="Tom")  # raises an error
pd.Float64Index
The data type is Float64, allowing decimals:
pandas.Float64Index(
    data=None,   # data
    dtype=None,  # data type
    copy=False,  # whether to make a copy
    name=None    # index name
)
In [22]:
pd.Float64Index([1, 2, 3, 4])
Out[22]:
Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64')
In [23]:
pd.Float64Index([1.5, 2.4, 3.7, 4.9])
Out[23]:
Float64Index([1.5, 2.4, 3.7, 4.9], dtype='float64')
In [24]:
pd.Float64Index([1.5, 2.4, 3.7, 4.9], name="Peter")
Out[24]:
Float64Index([1.5, 2.4, 3.7, 4.9], dtype='float64', name='Peter')
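Note: starting with pandas 1.4.0, Int64Index, UInt64Index and Float64Index are deprecated in favor of a unified pd.NumericIndex. From pandas 2.0 onward they are removed, and pd.Index with the corresponding dtype is used instead.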
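On pandas versions where the dedicated numeric index classes are unavailable, passing dtype to pd.Index should give an equivalent index. This is a minimal sketch under that assumption. The class name shown in the repr depends on the installed version, so only the dtype is inspected here.

```python
import pandas as pd

# Same data as the examples above, built through the generic constructor
int_idx = pd.Index([1, 2, 3, 4], dtype="int64")
uint_idx = pd.Index([1, 2, 3, 4], dtype="uint64", name="Tom")
float_idx = pd.Index([1.5, 2.4, 3.7, 4.9], dtype="float64", name="Peter")

for idx in (int_idx, uint_idx, float_idx):
    # The dtype is stable across versions even if the class name is not
    print(idx.dtype, idx.tolist())
```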
pd.IntervalIndex
pd.IntervalIndex(
    data,                  # data to be indexed (one-dimensional)
    closed=None,           # {'left', 'right', 'both', 'neither'}, default 'right'
    dtype=None,            # data type
    copy=False,            # whether to make a copy
    name=None,             # index name
    verify_integrity=True  # check that the index is valid
)
A new IntervalIndex is usually constructed using the interval_range() function.
In [24]:
pd.interval_range(start=0, end=6)
Out[24]:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6]],
closed='right',  # closed on the right by default
dtype='interval[int64]')
In [25]:
pd.interval_range(start=0, end=6, closed="neither")
Out[25]:
IntervalIndex([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
closed='neither',
dtype='interval[int64]')
In [26]:
pd.interval_range(start=0, end=6, closed="both")
Out[26]:
IntervalIndex([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
closed='both',
dtype='interval[int64]')
In [27]:
pd.interval_range(start=0, end=6, closed="left")
Out[27]:
IntervalIndex([[0, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, 6)],
closed='left',
dtype='interval[int64]')
In [28]:
pd.interval_range(start=0, end=6, name="peter")
Out[28]:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6]],
closed='right',
name='peter',
dtype='interval[int64]')
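The closed side matters as soon as values are looked up in the intervals. The sketch below is an illustration, not part of the original examples: it uses get_loc to find which interval holds a value and pd.cut to bin a few made-up numbers with the same right-closed intervals.

```python
import pandas as pd

# Right-closed intervals: (0, 1], (1, 2], ..., (5, 6]
bins = pd.interval_range(start=0, end=6)

# Position of the interval that contains a value
pos_inside = int(bins.get_loc(2.5))  # 2.5 falls in (2, 3]
pos_edge = int(bins.get_loc(2))      # the right edge 2 belongs to (1, 2]
print(pos_inside, pos_edge)          # 2 1

# The same IntervalIndex can serve as the bins for pd.cut
binned = pd.cut([0.5, 2.5, 5.0], bins=bins)
print(binned.codes.tolist())         # [0, 2, 4]
```

With closed="left" the value 2 would instead fall in [2, 3), which is why the choice of closed side should match how boundary values are meant to be treated.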
pd.CategoricalIndex
pandas.CategoricalIndex(
    data=None,        # data
    categories=None,  # the categories
    ordered=None,     # whether the categories are ordered
    dtype=None,       # data type
    copy=False,       # whether to make a copy
    name=None         # index name
)
In the following examples, we use a batch of clothing sizes as sample data:
In [29]:
c1 = pd.CategoricalIndex(
    # specified data
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"]
)
c1
Out[29]:
CategoricalIndex(
    # data
    ['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
    # the distinct elements that appear
    categories=['L', 'M', 'S', 'XL', 'XS'],
    # not ordered by default
    ordered=False,
    # data type
    dtype='category'
)
In [30]:
c2 = pd.CategoricalIndex(
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],
    # specify the categories explicitly
    categories=["XS", "S", "M", "L", "XL"]
)
c2
Out[30]:
CategoricalIndex(
['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
categories=['XS', 'S', 'M', 'L', 'XL'],
ordered=False,
dtype='category'
)
In [31]:
c3 = pd.CategoricalIndex(
    # data
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],
    # category names
    categories=["XS", "S", "M", "L", "XL"],
    # treat the categories as ordered
    ordered=True
)
c3
Out[31]:
CategoricalIndex(
    ['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
    categories=['XS', 'S', 'M', 'L', 'XL'],
    ordered=True,  # the categories are now ordered
    dtype='category')
In [32]:
c4 = pd.CategoricalIndex(
    # data
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],
    # category names
    categories=["XS", "S", "M", "L", "XL"],
    # treat the categories as ordered
    ordered=True,
    # index name
    name="category"
)
c4
Out[32]:
CategoricalIndex(
['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
categories=['XS', 'S', 'M', 'L', 'XL'],
ordered=True,
name='category',
dtype='category'
)
A CategoricalIndex can also be built from a pd.Categorical object:
In [33]:
c5 = pd.Categorical(["a", "b", "c", "c", "b", "c", "a"])
pd.CategoricalIndex(c5)
Out[33]:
CategoricalIndex(
    ['a', 'b', 'c', 'c', 'b', 'c', 'a'],
    categories=['a', 'b', 'c'],
    ordered=False,
    dtype='category')
In [34]:
pd.CategoricalIndex(c5, ordered=True)
Out[34]:
CategoricalIndex(
    ['a', 'b', 'c', 'c', 'b', 'c', 'a'],
    categories=['a', 'b', 'c'],
    ordered=True,  # ordered
    dtype='category')
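One practical effect of specifying the categories is that sorting follows the category order rather than alphabetical order. The sketch below illustrates this with a shortened, made-up list of sizes; the variable names are hypothetical.

```python
import pandas as pd

sizes = pd.CategoricalIndex(
    ["S", "M", "L", "XS", "XL"],
    categories=["XS", "S", "M", "L", "XL"],
    ordered=True,
)

# Plain string sorting is alphabetical
print(sorted(sizes.tolist()))        # ['L', 'M', 'S', 'XL', 'XS']

# Sorting the CategoricalIndex follows the declared category order
by_size = sizes.sort_values()
print(by_size.tolist())              # ['XS', 'S', 'M', 'L', 'XL']
```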
pd.DatetimeIndex
A DatetimeIndex is usually generated with the date_range function. The constructor signature is:
pd.DatetimeIndex(
    data=None,                  # data
    freq=NoDefault.no_default,  # frequency
    tz=None,                    # time zone
    normalize=False,            # whether to normalize
    closed=None,                # whether the interval is closed
    # 'infer', bool-ndarray, 'NaT', default 'raise'
    ambiguous='raise',
    dayfirst=False,             # day first
    yearfirst=False,            # year first
    dtype=None,                 # data type
    copy=False,                 # whether to make a copy
    name=None                   # index name
)
The date_range function generates an index of dates and times, as the following examples show:
In [35]:
pd.date_range("2022-01-01", periods=6)
Out[35]:
DatetimeIndex(
['2022-01-01', '2022-01-02',
'2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='datetime64[ns]',
freq='D')
In [36]:
d1 = pd.date_range("2022-01-01", periods=6, freq="D")
d1
Out[36]:
DatetimeIndex(
['2022-01-01', '2022-01-02',
'2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='datetime64[ns]',
freq='D')
In [37]:
pd.date_range("2022-01-01", periods=6, freq="H")
Out[37]:
DatetimeIndex(
['2022-01-01 00:00:00', '2022-01-01 01:00:00',
'2022-01-01 02:00:00', '2022-01-01 03:00:00',
'2022-01-01 04:00:00', '2022-01-01 05:00:00'],
dtype='datetime64[ns]',
freq='H')
In [38]:
pd.date_range("2022-01-01", periods=6, freq="3M")
Out[38]:
DatetimeIndex(
['2022-01-31', '2022-04-30',
'2022-07-31','2022-10-31',
'2023-01-31', '2023-04-30'],
dtype='datetime64[ns]',
freq='3M')
In [39]:
pd.date_range("2022-01-01", periods=6, freq="Q")
With a quarterly frequency, each step spans three months:
Out[39]:
DatetimeIndex(
['2022-03-31', '2022-06-30',
'2022-09-30','2022-12-31',
'2023-03-31', '2023-06-30'],
dtype='datetime64[ns]',
freq='Q-DEC')
In [40]:
pd.date_range("2022-01-01", periods=6, tz="Asia/Calcutta")
Out[40]:
DatetimeIndex(
['2022-01-01 00:00:00+05:30', '2022-01-02 00:00:00+05:30',
'2022-01-03 00:00:00+05:30', '2022-01-04 00:00:00+05:30',
'2022-01-05 00:00:00+05:30', '2022-01-06 00:00:00+05:30'],
dtype='datetime64[ns, Asia/Calcutta]', freq='D')
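Once a Series or DataFrame carries a DatetimeIndex, rows can be selected with date strings. The following is a small sketch with made-up values, showing that a label slice on dates includes both endpoints.

```python
import pandas as pd

idx = pd.date_range("2022-01-01", periods=6, freq="D")
s = pd.Series(range(6), index=idx)

# Label-based slicing with date strings includes both endpoints
subset = s["2022-01-03":"2022-01-05"]
print(subset.tolist())   # [2, 3, 4]

# Date components are available as attributes of the index
print(idx.day.tolist())  # [1, 2, 3, 4, 5, 6]
```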
pd.PeriodIndex
PeriodIndex is an index dedicated to periodic data, which is convenient for processing data with a certain period. Its usage is as follows:
pd.PeriodIndex(
    data=None,     # data
    ordinal=None,  # ordinal
    freq=None,     # frequency
    dtype=None,    # data type
    copy=False,    # whether to make a copy
    name=None,     # index name
    **fields
)
Method 1 for creating a pd.PeriodIndex: specify the start time and the period frequency
In [41]:
pd.period_range('2022-01-01 09:00', periods=5, freq='H')
Out[41]:
PeriodIndex(
['2022-01-01 09:00', '2022-01-01 10:00',
'2022-01-01 11:00','2022-01-01 12:00', '2022-01-01 13:00'],
dtype='period[H]', freq='H')
In [42]:
pd.period_range('2022-01-01 09:00', periods=6, freq='2D')
Out[42]:
PeriodIndex(
['2022-01-01', '2022-01-03',
'2022-01-05', '2022-01-07',
'2022-01-09', '2022-01-11'],
dtype='period[2D]',
freq='2D')
In [43]:
pd.period_range('2022-01', periods=5, freq='M')
Out[43]:
PeriodIndex(
['2022-01', '2022-02',
'2022-03', '2022-04', '2022-05'],
dtype='period[M]', freq='M')
In [44]:
p1 = pd.DataFrame(
    {"name": ["xiaoming", "xiaohong", "Peter", "Mike", "Jimmy"]},
    # use a PeriodIndex as the row index
    index=pd.period_range('2022-01-01 09:00', periods=5, freq='3H')
)
p1
Method 2 for creating a pd.PeriodIndex: call the pd.PeriodIndex constructor directly
In [45]:
pd.PeriodIndex(
['2022-01-01', '2022-01-02',
'2022-01-03', '2022-01-04'],
freq = '2H')
Out[45]:
PeriodIndex(
['2022-01-01 00:00', '2022-01-02 00:00',
'2022-01-03 00:00','2022-01-04 00:00'],
dtype='period[2H]', freq='2H')
In [46]:
pd.PeriodIndex(
['2022-01', '2022-02',
'2022-03', '2022-04'],
freq = 'M')
Out[46]:
PeriodIndex(
['2022-01', '2022-02',
'2022-03', '2022-04'],
dtype='period[M]',
freq='M')
In [47]:
pd.PeriodIndex(['2022-01', '2022-07'], freq = 'Q')
Out[47]:
PeriodIndex(
['2022Q1', '2022Q3'],
dtype='period[Q-DEC]',
freq='Q-DEC')
Method 3 for creating a pd.PeriodIndex: first generate a DatetimeIndex with date_range, then pass it to pd.PeriodIndex
In [48]:
data = pd.date_range("2022-01-01",periods=6)
data
Out[48]:
DatetimeIndex(
['2022-01-01', '2022-01-02',
'2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='datetime64[ns]',
freq='D')
In [49]:
pd.PeriodIndex(data=data)
Out[49]:
PeriodIndex(
['2022-01-01', '2022-01-02',
'2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='period[D]', freq='D')
In [50]:
p2 = pd.DataFrame(
    np.random.randn(400, 1),
    columns=['number'],
    # use a daily PeriodIndex as the row index
    index=pd.period_range('2021-01-01 8:00', periods=400, freq='D')
)
p2
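Besides the three methods above, an existing DatetimeIndex can be converted with its to_period method, which is handy for grouping timestamps by the period that contains them. This is a minimal sketch and not one of the original examples; the dates are made up so that they straddle a month boundary.

```python
import pandas as pd

# Four consecutive days that straddle a month boundary
days = pd.date_range("2022-01-30", periods=4, freq="D")

# Convert each timestamp to the monthly period that contains it
months = days.to_period("M")
print([str(p) for p in months])  # ['2022-01', '2022-01', '2022-02', '2022-02']

# Count how many days fall in each monthly period
counts = pd.Series(1, index=months).groupby(level=0).sum()
print(counts.tolist())           # [2, 2]
```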
pd.TimedeltaIndex
pd.TimedeltaIndex(
    data=None,                  # data
    unit=None,                  # smallest unit
    freq=NoDefault.no_default,  # frequency
    closed=None,                # which side is closed
    dtype=dtype('<m8[ns]'),     # data type
    copy=False,                 # whether to make a copy
    name=None                   # index name
)
Creation method 1: Specify data and minimum unit
In [51]:
pd.TimedeltaIndex([12, 24, 36, 48], unit='s')
Out[51]:
TimedeltaIndex(
['0 days 00:00:12', '0 days 00:00:24',
'0 days 00:00:36','0 days 00:00:48'],
dtype='timedelta64[ns]',
freq=None)
In [52]:
pd.TimedeltaIndex([1, 2, 3, 4], unit='h')
Out[52]:
TimedeltaIndex(
['0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00','0 days 04:00:00'],
dtype='timedelta64[ns]',
freq=None)
In [53]:
pd.TimedeltaIndex([12, 24, 36, 48], unit='h')
Out[53]:
TimedeltaIndex(
['0 days 12:00:00', '1 days 00:00:00',
'1 days 12:00:00', '2 days 00:00:00'],
dtype='timedelta64[ns]',  # data type
freq=None)
In [54]:
pd.TimedeltaIndex([12, 24, 36, 48], unit='D')
Out[54]:
TimedeltaIndex(
['12 days', '24 days', '36 days', '48 days'],
dtype='timedelta64[ns]', freq=None)
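A TimedeltaIndex is mostly useful in arithmetic with timestamps. The sketch below is an illustration rather than the article's own method: it builds the same 12/24/36/48-second offsets with pd.to_timedelta (swapped in here for the constructor) and adds them to a made-up start time.

```python
import pandas as pd

# pd.to_timedelta builds a TimedeltaIndex from numbers plus a unit
deltas = pd.to_timedelta([12, 24, 36, 48], unit="s")
print(deltas.total_seconds().tolist())  # [12.0, 24.0, 36.0, 48.0]

# Adding the offsets to a timestamp produces a DatetimeIndex
start = pd.Timestamp("2022-01-01 09:00:00")
stamps = start + deltas
print([str(t) for t in stamps])
```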
Creation method 2: Use the timedelta_range function to generate data indirectly
In [55]:
data1 = pd.timedelta_range(start='1 day', periods=4)
data1
Out[55]:
TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
In [56]:
pt1 = pd.TimedeltaIndex(data1)
pt1
Out[56]:
TimedeltaIndex(
['1 days', '2 days', '3 days', '4 days'],
dtype='timedelta64[ns]', freq='D')
In [57]:
data2 = pd.timedelta_range(start='1 day', end='3 days', freq='6H')
data2
Out[57]:
TimedeltaIndex(
['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00',
'1 days 18:00:00', '2 days 00:00:00', '2 days 06:00:00',
'2 days 12:00:00', '2 days 18:00:00', '3 days 00:00:00'],
dtype='timedelta64[ns]', freq='6H')
In [58]:
pt2 = pd.TimedeltaIndex(data2)
pt2
Out[58]: