First, tutorial content
Multi-factor analysis of variance (multi-factor ANOVA) is used to study whether a dependent variable is influenced by several independent variables (also known as factors). It tests whether the mean of the dependent variable differs significantly across the combinations of levels of those factors. Multi-factor ANOVA can analyze the effect of a single factor (main effect), the interaction between factors (interaction effect), covariates (analysis of covariance), and the interaction between factors and covariates. According to the number of observed variables (i.e., dependent variables), multi-factor ANOVA is divided into univariate multi-factor ANOVA (one dependent variable) and multivariate multi-factor ANOVA (several dependent variables). This article focuses on univariate multi-factor ANOVA; the next article will cover multivariate multi-factor ANOVA.
Univariate multi-factor ANOVA: there is only one dependent variable, and the influence of several independent variables on it is investigated. For example, when analyzing the effects of crop variety and fertilizer amount on crop yield, yield is the observed variable, while variety and fertilizer amount are the control variables. Multi-factor ANOVA can show how different varieties and different fertilizer amounts affect yield, and which combination of variety and fertilizer level is best for improving yield.
01 Analysis Principle
The F test is performed by calculating the F statistic. The F statistic is the ratio of the between-group mean square to the within-group mean square, where each mean square is a sum of squares divided by its degrees of freedom.
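As a minimal sketch of this ratio for a single factor (not SPSS's own procedure), the Python below splits the total sum of squares into a between-group part and a within-group part, then divides each by its degrees of freedom before forming F. The function name one_way_f and the three small groups are made up for illustration.

```python
from statistics import mean

def one_way_f(groups):
    """Return (SST, SSA, SSE, F) for a list of groups, one group per factor level."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)  # number of levels, number of observations
    # Total sum of squares: every observation against the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in values)
    # Between-group sum of squares: each level mean against the grand mean.
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: each observation against its own level mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # F is the ratio of the two mean squares.
    f_value = (ssa / (k - 1)) / (sse / (n - k))
    return sst, ssa, sse, f_value

# Hypothetical data: three levels of one factor, three observations each.
groups = [[20, 22, 24], [30, 32, 34], [25, 27, 29]]
sst, ssa, sse, f_value = one_way_f(groups)
print(sst, ssa, sse, f_value)
```

For these made-up numbers the between-group part is 150, the within-group part is 24, their sum equals the total of 174, and F is 18.75.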
The total sum of squared deviations is denoted SST and is split into two parts: SSA (the between-group sum of squares), which is the deviation caused by the control variable, and SSE (the within-group sum of squares), which is the deviation caused by random variation. Thus SST = SSA + SSE. The between-group sum of squares is the sum of squared deviations between the mean of each level and the overall mean, and reflects the influence of the control variable. The within-group sum of squares is the sum of squared deviations between each observation and the mean of its level group, and reflects the size of the sampling error. In the multi-factor case, each main effect and each interaction has its own sum of squares and its own F value.
The F value can be read as follows. If the levels of a control variable have a significant effect on the observed variable, the between-group sum of squares will be large, and so will F. Conversely, if the levels have no significant effect, most of the variation will lie within groups and F will be small. SPSS also reports the associated probability, Sig. (the p-value), obtained from the F distribution. If Sig. is less than the significance level (usually set to 0.05, 0.01, or 0.001), the population means are considered to differ significantly across the levels of the control variable; otherwise they are not. For given degrees of freedom, the larger the F value, the smaller the Sig. value.
02 SPSS Case Analysis
A company's employee salary table is available, and we want to examine the influence of two control variables, gender and years of education (edu), on employees’ “current salary”.
In multi-factor ANOVA, the separate influences of gender and of edu on current salary are called main effects, and the influence of gender*edu on current salary is called the interaction effect.
(1) Analysis steps: after importing the data into SPSS, choose Analyze > General Linear Model > Univariate.
(2) Select “current salary” as the dependent variable (i.e., the observed variable), and select gender and years of education (edu) as fixed factors (i.e., the control variables).
(3) In the “Univariate” dialog box, click “Model” and select “Full factorial”, which means the ANOVA model includes the main effects of all factors as well as the interaction effects between them. Then click “Continue”.
(4) Open the “Plots” dialog box of “Univariate”, select “gender” as the Horizontal Axis variable and “edu” as the Separate Lines variable, then click “Add” to plot the interaction of these two factors, namely “gender*edu”.
In this case, “gender” has only two levels, male and female, while edu has many levels. A significant main effect indicates a significant difference between at least two levels of that factor. The mean differences between the levels of the same factor can then be compared, a process known as multiple comparison. In practice, however, if both the main effects and the interaction effect are significant, we are more concerned with how the dependent variable behaves under the joint action of the factors. Therefore, if the interaction effect is significant, a simple effect test is usually required.
A simple effect is the effect of one factor at a particular level of another factor. In our case, if there were a significant interaction between gender and edu, we could test the differences between the levels of edu when gender is female, which is called the simple effect of edu at the “female” level, and the differences between the levels of edu when gender is male, which is called the simple effect of edu at the “male” level. In other words, a simple effect test fixes one independent variable at a particular level and investigates the influence of the other independent variable on the dependent variable. In SPSS, the simple effect test is carried out with the “MANOVA” command.
Similarly, with three independent variables, if the interaction between them is significant, a simple simple effect test is needed: two factors are each fixed at a certain level, and the influence of the third factor on the dependent variable is investigated. This is also done with the “MANOVA” command.
Whether a simple effect is significant is judged from its F and Sig. values. The Sig. value is compared with the chosen significance level (0.05, 0.01, or 0.001). If Sig. is larger than that level, the simple effect is not significant; if Sig. is smaller, the simple effect is significant.
(5) Open the “Options” dialog box, move the three terms on the left (gender, edu, and gender*edu) to “Display Means for” on the right, and select “Descriptive statistics” and “Compare main effects”.
(6) After clicking “OK”, the results are displayed in the SPSS Viewer. The code at the top is the syntax corresponding to the steps carried out in the dialogs. The tables below it are the results from which conclusions are drawn.
(7) In the “Tests of Between-Subjects Effects” table below, compare the F and Sig. values for gender, edu, and the interaction gender*edu. edu has the largest F value and the smallest Sig. value, with Sig. < 0.05. The Sig. values of gender and gender*edu are both greater than 0.05. This indicates that the main effect of “gender” is not significant, the main effect of “edu” is significant, and the interaction effect of gender and edu is not significant. A simple effect test would follow only if the interaction were significant, so it is not needed here. Therefore, the employees’ years of education have a significant effect on “current salary”, while “gender” has no significant effect on “current salary”.
(8) The figure below shows the means of the dependent variable, employee salary, for each combination of edu and gender. Generally, if the interaction effect is not significant, the lines in the figure are roughly parallel. If the interaction effect is significant, the lines are not parallel. In this figure, gender is the horizontal axis variable and each line is one value of edu, which shows how years of education relate to the dependent variable “current salary”.
It can be seen from the graph that the salaries of men and women differ little at 20 years of education (generally the graduate level) and at 14 years of education (generally the junior college level). However, the salary gap between men and women is large at 8, 10, 12 and 17 years of education, especially at 8 and 17 years.
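The analysis in steps (1) to (8) can also be sketched outside SPSS. The Python below is a minimal illustration, not a reproduction of the SPSS output: it assumes a balanced design (the same number of observations in every gender-by-edu cell) and uses made-up salary figures, and the name two_way_anova is hypothetical. Real salary tables are usually unbalanced, and SPSS then defaults to Type III sums of squares, which these simple formulas do not reproduce.

```python
from statistics import mean

# Hypothetical salaries (in thousands) keyed by (gender, edu); 2 observations per cell.
data = {
    ("female", 12): [30, 32], ("female", 16): [40, 42], ("female", 20): [60, 62],
    ("male", 12): [31, 33], ("male", 16): [41, 43], ("male", 20): [61, 63],
}

def two_way_anova(data):
    """Balanced two-way ANOVA with interaction; returns {term: (SS, df, F)}."""
    a_levels = sorted({a for a, _ in data})  # levels of gender
    b_levels = sorted({b for _, b in data})  # levels of edu
    na, nb = len(a_levels), len(b_levels)
    r = len(data[(a_levels[0], b_levels[0])])  # observations per cell
    if r < 2 or any(len(data[(a, b)]) != r for a in a_levels for b in b_levels):
        raise ValueError("this sketch needs a balanced design with 2+ observations per cell")
    # Cell means, marginal means, and grand mean (equal cell sizes assumed).
    cell = {key: mean(obs) for key, obs in data.items()}
    a_mean = {a: mean(cell[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell[(a, b)] for a in a_levels) for b in b_levels}
    grand = mean(cell.values())
    # Sums of squares for the two main effects and the interaction.
    ss = {
        "gender": r * nb * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "edu": r * na * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "gender*edu": r * sum((cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                              for a in a_levels for b in b_levels),
    }
    df = {"gender": na - 1, "edu": nb - 1, "gender*edu": (na - 1) * (nb - 1)}
    # Error (within-cell) sum of squares and mean square.
    ss_error = sum((x - cell[key]) ** 2 for key, obs in data.items() for x in obs)
    df_error = na * nb * (r - 1)
    ms_error = ss_error / df_error
    # Each F value is the term's mean square divided by the error mean square.
    table = {t: (ss[t], df[t], (ss[t] / df[t]) / ms_error) for t in ss}
    table["error"] = (ss_error, df_error, None)
    return table

table = two_way_anova(data)
for term, (ss_term, df_term, f_value) in table.items():
    print(term, ss_term, df_term, f_value)
```

The made-up cell means were chosen to be exactly parallel, so the interaction sum of squares is zero, matching the parallel-lines reading in step (8). The F value for edu (about 466.7) is far larger than that for gender (1.5). Turning F into a Sig. value requires the F distribution, for example scipy.stats.f.sf if SciPy is available.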
Second, remarks
The relevant data have been uploaded to my resources; download link: https://blog.csdn.net/TIQCmatlab?spm=1011.2124.3001.5343