Public account: Youerhuts | Author: Peter | Editor: Peter
My name is Peter, and in this article I'll illustrate two important functions in Pandas: stack and unstack.
Stack and unstack are two ways of reshaping pandas data by moving labels between the row and column axes, and they are inverses of each other:
- Stack: pivots the column labels into the row index
- Unstack: pivots the row index into the columns
- Both operate on the innermost level by default
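As a minimal sketch of that inverse relationship (the DataFrame `df` and its labels are made up for illustration), stacking and then unstacking a small single-level DataFrame gives back the original data:

```python
import pandas as pd

# Hypothetical example data with single-level row and column labels
df = pd.DataFrame([[1, 2], [3, 4]],
                  index=["cat", "dog"],
                  columns=["weight", "height"])

stacked = df.stack()          # column labels -> innermost row index level
restored = stacked.unstack()  # innermost row index level -> column labels

print(stacked)
# Select the original column order before comparing, in case unstack reorders it
print(restored[df.columns].equals(df))  # True
```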
Pandas article series
This is the 16th article in my Pandas series.
Below are detailed examples of how to use both.
stack
The stack function moves the column labels into the innermost level of the row index, so the result has a multi-level (hierarchical) index. The official documentation describes it as:
Stack the prescribed level(s) from columns to index.
Usage:
DataFrame.stack(level=-1, dropna=True)
- level: the column level(s) to stack; the default -1 means the innermost level
- dropna: whether to drop rows of the result in which all values are missing (default True)
The official documentation illustrates this with a diagram: the column labels A and B become the innermost level of the row index.
Stack a single-level DataFrame
import pandas as pd
import numpy as np
First, take a look at the default behavior:
The index of df2 has become a multi-level index:
One more feature: stacking a single-level DataFrame returns a Series:
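A minimal sketch of the single-level case, assuming made-up values for df1 (the article's own data is not shown); df2 is the stacked result:

```python
import pandas as pd

# Hypothetical single-level DataFrame; the values are illustrative only
df1 = pd.DataFrame([[0, 1], [2, 3]],
                   index=["cat", "dog"],
                   columns=["weight", "height"])

df2 = df1.stack()  # default level=-1 moves the only column level into the index

print(type(df2).__name__)  # Series
print(df2.index.nlevels)   # 2: row labels plus former column labels
print(df2)
```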
Stack a DataFrame with multi-level columns
First we generate a MultiIndex to simulate multi-level column labels.
Take a closer look at the simulated data df3:
type(df3)
pandas.core.frame.DataFrame
df3.index
Index(['Ming', 'little red'], dtype='object')
df3.columns
MultiIndex([('information', 'sex'),
            ('information', 'weight')],
           )
Now look at the data after stacking:
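The stacking step can be sketched as follows. The row and column labels are taken from the df3 output above, but the cell values (and the names multicol1 and stacked3) are hypothetical:

```python
import pandas as pd

# Two-level column labels: an outer "information" group with "sex" and "weight"
multicol1 = pd.MultiIndex.from_tuples([("information", "sex"),
                                       ("information", "weight")])

# Hypothetical values; only the row and column labels come from the article
df3 = pd.DataFrame([["male", 60], ["female", 50]],
                   index=["Ming", "little red"],
                   columns=multicol1)

stacked3 = df3.stack()  # stacks the innermost column level ("sex"/"weight")
print(stacked3)
```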
Comparison
Compare the original data with the generated new data:
1. Index comparison
2. Column label comparison
3. Data type comparison
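The three comparisons can be checked directly in code. This sketch rebuilds a hypothetical df3 with the same labels as above (the cell values are invented):

```python
import pandas as pd

# Hypothetical df3: labels match the article, values are invented
multicol1 = pd.MultiIndex.from_tuples([("information", "sex"),
                                       ("information", "weight")])
df3 = pd.DataFrame([["male", 60], ["female", 50]],
                   index=["Ming", "little red"],
                   columns=multicol1)
stacked3 = df3.stack()

# 1. Index: one level before stacking, two levels after
print(df3.index.nlevels, stacked3.index.nlevels)      # 1 2
# 2. Column labels: two levels before, one level after
print(df3.columns.nlevels, stacked3.columns.nlevels)  # 2 1
# 3. Type: still a DataFrame, because one column level remains
print(type(df3).__name__, type(stacked3).__name__)    # DataFrame DataFrame
```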
Parameter level
level controls which column level(s) are stacked; you can specify them by position (integer) or by name.
Simulate some data with multi-level column labels:
multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),  # multi-level column labels
                                       ('height', 'm')],
                                      names=["col", "unit"])
data1 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                     index=['cat', 'dog'],
                     columns=multicol2
                     )
data1
We can see that data1's column labels have multiple levels:
data1.columns
# output
MultiIndex([('weight', 'kg'),
            ('height', 'm')],
           names=['col', 'unit'])
We can also stack by the name of a column level:
Do the same for the other level, "col":
You can also stack several levels at once by passing a list of level names or positions:
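A sketch of the three ways to choose levels, reusing the data1 definition above (the variable names by_name, by_position, outer and both are only for illustration):

```python
import pandas as pd

# data1 as defined above: two column levels named "col" and "unit"
multicol2 = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")],
                                      names=["col", "unit"])
data1 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                     index=["cat", "dog"],
                     columns=multicol2)

by_name = data1.stack("unit")         # same as the default level=-1
by_position = data1.stack(1)          # level 1 is "unit"
outer = data1.stack("col")            # stack the outer level instead
both = data1.stack(["col", "unit"])   # stack both levels -> Series

print(outer)
print(both)
```

Stacking "col" leaves "unit" as the columns, so cells for label combinations that never existed (such as weight in metres) show up as NaN.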
Parameter dropna
What happens when the original data contains missing values? Let's simulate some data with a missing value:
data2 = pd.DataFrame([[None, 2.0],  # introduce a missing value
                      [4.0, 6.0]],
                     index=['cat', 'dog'],
                     columns=multicol2)
data2
The default value is True, which drops the row in which both values are missing:
If we set it to False, rows whose values are all NaN are kept:
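A sketch of both settings on data2, assuming a pandas version (1.x or 2.x) whose stack still uses the legacy implementation by default and accepts dropna. In pandas 2.1+ the keyword is tied to that legacy implementation and is deprecated, so newer versions may warn about it or reject it:

```python
import pandas as pd

# Same column labels as data1
multicol2 = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")],
                                      names=["col", "unit"])
data2 = pd.DataFrame([[None, 2.0],   # missing weight for "cat"
                      [4.0, 6.0]],
                     index=["cat", "dog"],
                     columns=multicol2)

dropped = data2.stack()           # default dropna=True: the all-NaN ("cat", "kg") row is removed
kept = data2.stack(dropna=False)  # the all-NaN row is kept (assumes legacy stack)

print(dropped)
print(kept)
```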
unstack
unstack moves the innermost row index level into the columns: the row index labels A and B become the column labels A and B.
Usage
Unstack is the inverse of stack, turning the innermost row index level into columns.
unstack(level=-1, fill_value=None)
- level: the index level(s) to unstack, given as a position or a name (default -1, the innermost level)
- fill_value: the value used to fill any missing entries that the operation produces
Parameter level
We reuse data generated earlier and operate on df5.
1. Default unstack: operates on the innermost level
2. Unstack row index level 0 instead; the level's name can also be passed as the argument
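A sketch of these two calls on a small Series with a two-level row index. The article's df5 is not shown, so this stand-in (including the level names "first" and "second") is hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for df5: a Series with a two-level row index
index = pd.MultiIndex.from_tuples([("one", "a"), ("one", "b"),
                                   ("two", "a"), ("two", "b")],
                                  names=["first", "second"])
df5 = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

inner = df5.unstack()           # default level=-1: "second" becomes the columns
outer = df5.unstack(0)          # level 0: "first" becomes the columns
by_name = df5.unstack("first")  # same as unstack(0), using the level name

print(inner)
print(outer)
```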
Parameter fill_value
This parameter fills the missing values that unstack produces with the specified value.
We use the df6 DataFrame from earlier:
With the default unstack, two null values appear:
Fill in the resulting missing values:
- using the default level
- using the level name
- using the level index number
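A sketch of fill_value on a hypothetical stand-in for df6, a Series with two label combinations missing so that unstacking produces two empty cells (the article's own df6 is not shown):

```python
import pandas as pd

# Hypothetical df6: ("one", "b") and ("two", "a") are absent from the index
index = pd.MultiIndex.from_tuples([("one", "a"), ("two", "b")],
                                  names=["first", "second"])
df6 = pd.Series([1, 2], index=index)

default = df6.unstack()                       # the two missing cells become NaN
filled = df6.unstack(fill_value=0)            # default (innermost) level, gaps filled with 0
by_name = df6.unstack("first", fill_value=0)  # level chosen by name
by_pos = df6.unstack(0, fill_value=0)         # level chosen by position

print(default)
print(filled)
print(by_name)
```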
Conclusion
stack turns a DataFrame's column labels into the innermost level of the row index, returning a Series or DataFrame; unstack, available on both Series and DataFrame, does the reverse. Both operate on the innermost level by default, and they are inverses of each other.