What is a PivotTable
A PivotTable is made up of three parts: (1) index, (2) columns and (3) values. It is a table format that lets you dynamically arrange and summarize data by category. Most of you have used PivotTables in Excel and appreciate their power; in pandas the equivalent is pivot_table(). A PivotTable groups the data and computes an aggregate within each group, and the grouping can have two dimensions (rows via index, columns via columns). While pivot_table() is useful, I often have to look up its syntax to get the output I need.

pivot_table() syntax:

pd.pivot_table(data_df, values=None, index=None, columns=None, aggfunc="mean", fill_value=None, margins=False, margins_name="All")

aggfunc defaults to the mean (np.mean in older pandas versions). margins defaults to False, so no totals are added unless you ask for them; margins=True adds a totals row and column. margins_name defaults to "All", and you can rename it as you like.
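To make the three parts concrete, here is a minimal sketch that uses every parameter from the syntax above. The DataFrame and its column names (Rep, Product, Quantity, Price) are hypothetical, made up for illustration only.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob", "Bob", "Cid"],
    "Product": ["CPU", "GPU", "CPU", "CPU", "GPU"],
    "Quantity": [1, 2, 3, 4, 5],
    "Price": [100.0, 200.0, 100.0, 150.0, 300.0],
})

# index -> row groups, columns -> column groups, values -> the field being aggregated
table = pd.pivot_table(
    df,
    index="Rep",
    columns="Product",
    values="Quantity",
    aggfunc="sum",       # override the default "mean"
    fill_value=0,        # Bob has no GPU rows and Cid no CPU rows: show 0 instead of NaN
    margins=True,        # add a totals row and a totals column
    margins_name="All",  # label of the totals (this is also the default)
)
print(table)
```

Bob's CPU cell sums his two CPU rows (3 + 4), and the bottom-right "All" cell is the grand total of Quantity.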
reset_index() and reset_index(drop=True)
reset_index() moves the current index back into the DataFrame as a regular column and replaces it with a default integer index. reset_index(drop=True) discards the original index instead of keeping it as a column.
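A small sketch of the difference, using a hypothetical frame whose index carries information (student names):

```python
import pandas as pd

# Hypothetical data: the index holds the student name.
df = pd.DataFrame({"score": [90, 80, 70]}, index=["a", "b", "c"])
df.index.name = "student"

kept = df.reset_index()               # old index becomes a "student" column
dropped = df.reset_index(drop=True)   # old index is thrown away

print(kept)
print(dropped)
```

Both results get a fresh 0, 1, 2 index. A common use of the first form is calling reset_index() on a pivot_table() result to turn its row labels back into ordinary columns.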
The margins argument to pivot_table()
margins=True (margins=1 also works) adds totals, and margins_name="..." sets their label. When both index and columns are used, the result gets both a totals row and a totals column. margins only decides whether the totals appear; it does not decide how many total columns there are. That depends on values: with one value field there is one total column, with two value fields there are two (one per field), while there is always exactly one totals row.
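The counting rule above can be checked with a sketch like this one (hypothetical data; two value fields plus a columns field):

```python
import pandas as pd

# Hypothetical data with two numeric value fields.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob", "Bob"],
    "Product": ["CPU", "GPU", "CPU", "GPU"],
    "Quantity": [1, 2, 3, 4],
    "Price": [10, 20, 30, 40],
})

table = pd.pivot_table(
    df, index="Rep", columns="Product",
    values=["Quantity", "Price"], aggfunc="sum", margins=True,
)

# Columns are (value field, Product) pairs; each value field gets its own "All" column.
all_columns = [col for col in table.columns if col[1] == "All"]
print(all_columns)
print(table)
```

There are two "All" columns, one under Price and one under Quantity, but the rows still end with a single "All" row.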
The pd.concat() function
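pd.concat([df1, df2], sort=True)

sort controls the non-concatenation axis. When stacking rows from frames whose columns are not aligned, sort=True sorts the combined column labels alphabetically, while sort=False keeps them in order of first appearance. Setting sort=False (the default since pandas 1.0) is recommended, as it avoids the extra sorting work.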
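A minimal sketch with two hypothetical frames whose columns only partly overlap:

```python
import pandas as pd

# Hypothetical frames: column "a" is shared, "b" and "c" are not.
df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"c": [3], "a": [4]})

sorted_cols = pd.concat([df1, df2], sort=True)   # column union sorted alphabetically
kept_order = pd.concat([df1, df2], sort=False)   # column union in order of appearance

print(sorted_cols)
print(kept_order)
```

In both results the cells a frame did not have are filled with NaN; only the column order differs.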
Category is a pandas data type for variables that take a fixed, finite number of values. It is discrete and usually implies some degree or order. For example: (1) excellent, good, fair, poor, very poor (a change in degree); (2) development, integration, testing, ... (stages of a process).

df["Status"] = df["Status"].astype("category")
df["Status"] = df["Status"].cat.set_categories(["won", "pending", "presented", "declined"])

Status has an inherent order, so sorting it as plain text with sort_values() would give alphabetical order rather than the order we want. Converting the field to category and fixing the order with cat.set_categories() solves this: if we then use Status as the PivotTable index, the rows appear in our order. (Older examples pass inplace=True to set_categories(); that argument was removed in pandas 2.0, so assign the result back instead.)

Working with the data

Since we build PivotTables from the data, I think the easiest approach is to go one step at a time: add items and check each step to verify that you are getting the expected result. Don't be afraid to experiment with the order and the variables to see what best fits your needs.

The simplest PivotTable needs a DataFrame and an index. The index can be a single field or several fields. In this example, we use the Name column as our index. In the following three figures, only index is specified, not values. In that case, pivot_table() treats the remaining columns as values and computes their mean.
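Here is a sketch of the category trick on hypothetical data. The four status labels follow the list above; the names and prices are invented. values is passed explicitly so that only the numeric Price column is aggregated, and observed=True is passed only to avoid the deprecation warning some recent pandas versions emit when grouping by a categorical.

```python
import pandas as pd

# Hypothetical deals data.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Status": ["pending", "won", "declined", "presented"],
    "Price": [100, 200, 300, 400],
})

# As plain strings, Status would sort alphabetically; as a category it follows this order.
order = ["won", "pending", "presented", "declined"]
df["Status"] = df["Status"].astype("category").cat.set_categories(order)

table = pd.pivot_table(df, index="Status", values="Price", aggfunc="sum", observed=True)
print(table)
print(df.sort_values("Status"))
```

Both the PivotTable rows and sort_values() now follow the category order (won, pending, presented, declined) instead of the alphabetical one.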
In the figure below, there is a parameter we have rarely used before: fill_value. A PivotTable sometimes contains empty values (NaN), and fill_value lets us fill them. Before filling, though, we need to understand the data so that we know which value is appropriate.
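A minimal sketch of where the empty cells come from and how fill_value replaces them (hypothetical data in which Bob never sold a GPU):

```python
import pandas as pd

# Hypothetical data: the (Bob, GPU) combination does not exist.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob"],
    "Product": ["CPU", "GPU", "CPU"],
    "Quantity": [1, 2, 3],
})

without_fill = pd.pivot_table(df, index="Rep", columns="Product",
                              values="Quantity", aggfunc="sum")
with_fill = pd.pivot_table(df, index="Rep", columns="Product",
                           values="Quantity", aggfunc="sum", fill_value=0)

print(without_fill)  # Bob/GPU is NaN
print(with_fill)     # Bob/GPU is 0
```

Filling with 0 fits a sum or a count, where "no rows" really means zero; for a mean, 0 would suggest a real average of zero, which is why it pays to know the data first.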
margins=True adds the totals, and margins_name="XXX" gives the totals row/column a custom name.
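A short sketch of renaming the totals, with hypothetical data and only an index (no columns), so the result gets a single totals row:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob"],
    "Quantity": [1, 2, 3],
})

table = pd.pivot_table(df, index="Rep", values="Quantity", aggfunc="sum",
                       margins=True, margins_name="Total")
print(table)
```

The last row is labeled "Total" instead of the default "All".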
The values parameter can take multiple columns. To process different columns differently, pass a dictionary to aggfunc whose keys are the column names and whose values are the functions to apply to them.
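For example, a sketch that sums one column and averages another (hypothetical data and column names):

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob"],
    "Quantity": [1, 2, 3],
    "Price": [100.0, 300.0, 50.0],
})

# Key = column name, value = aggregation applied to that column.
table = pd.pivot_table(df, index="Rep", values=["Quantity", "Price"],
                       aggfunc={"Quantity": "sum", "Price": "mean"})
print(table)
```

Ann's Quantity is summed (1 + 2 = 3) while her Price is averaged ((100 + 300) / 2 = 200).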
Conclusion
Finally, the diagram below summarizes the pd.pivot_table() function.
pivot_table vs. groupby
pd.pivot_table(df, index="field1", values="field2", aggfunc=[function], fill_value=0)
df.groupby("field1")["field2"].agg([function]).fillna(0)

The two calls above are equivalent: they produce the same aggregated values. pivot_table(), however, also offers the columns and margins parameters, which makes it more flexible than groupby.
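A sketch of the equivalence on hypothetical data. It passes aggfunc as a single string rather than a list so the outputs are easy to compare; with a list such as ["sum"], pivot_table() would label its column ("sum", "Quantity") and groupby would label it "sum", but the numbers would be the same.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Rep": ["Ann", "Ann", "Bob", "Cid"],
    "Quantity": [1, 2, 3, 4],
})

via_pivot = pd.pivot_table(df, index="Rep", values="Quantity",
                           aggfunc="sum", fill_value=0)
via_groupby = df.groupby("Rep")["Quantity"].agg("sum").fillna(0)

print(via_pivot)    # a DataFrame with one "Quantity" column
print(via_groupby)  # a Series named "Quantity"
```

Both give 3, 3 and 4 for Ann, Bob and Cid; what pivot_table() adds on top is the columns and margins parameters.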