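Abstract: This article focuses on mediation models in which the dependent variable is ordinal. Through a simulation study, we compared logistic regression with ordinary linear regression, compared the product-of-coefficients and difference-in-coefficients methods commonly used within these two analytic frameworks, and examined how the number of categories of the dependent variable affects the estimates. The results show that mediation models with an ordinal dependent variable should be analyzed with logistic regression; using ordinary linear regression instead leads to underestimated mediation effects, underestimated standard errors, and confidence intervals whose coverage of the true value is too low. For estimating mediation effects with an ordinal dependent variable, the product-of-coefficients method performs better than the difference-in-coefficients method. As the number of categories increases, the difference between logistic regression and ordinary linear regression shrinks, and when the dependent variable has many categories (five or more), ordinary linear regression may be considered. Finally, an empirical example illustrates the application of the method.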
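The two estimators compared in this study can be illustrated for the binary case with a minimal sketch. Everything in it is an assumption made for illustration, not the simulation design of this study: the data-generating model (a latent outcome with logistic errors, cut at zero), the unstandardized use of 0.39 and 0.14 as path values, and the helper names `ols`, `logit`, `simulate`, and `mediation_estimates` are all hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def logit(X, y, iters=50):
    """Binary logistic regression fitted by Newton-Raphson; X holds an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        w = p * (1.0 - p)                            # Newton weights
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def simulate(n=500, a=0.39, b=0.39, c_prime=0.14, seed=0):
    """Illustrative data: X -> M -> Y with a binary Y (assumed design, not the paper's)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    m = a * x + rng.standard_normal(n)
    y_star = c_prime * x + b * m + rng.logistic(size=n)  # latent continuous outcome
    y = (y_star > 0).astype(float)                       # dichotomize at zero
    return x, m, y


def mediation_estimates(x, m, y):
    """Return product (a*b) and difference (c - c') estimates under both frameworks."""
    ones = np.ones_like(x)
    X1 = np.column_stack([ones, x])       # model without the mediator
    X2 = np.column_stack([ones, x, m])    # model with the mediator
    a_hat = ols(X1, m)[1]                 # M is continuous, so path a uses OLS
    out = {}
    for name, fit in (("linear", ols), ("logistic", logit)):
        c = fit(X1, y)[1]                 # total effect of X on Y
        beta = fit(X2, y)
        c_p, b_hat = beta[1], beta[2]     # direct effect c' and path b
        out[name] = {"product": a_hat * b_hat, "difference": c - c_p}
    return out


if __name__ == "__main__":
    x, m, y = simulate()
    for name, est in mediation_estimates(x, m, y).items():
        print(name, est)
```

Under ordinary linear regression the two estimates coincide algebraically in any given sample (ab = c − c'), whereas under logistic regression they generally differ, because the coefficients of the models with and without M are expressed on different scales.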
Statistical mediation analysis has been widely used to investigate the mechanism of mediating effects, in which a mediator M mediates the effect of an independent variable X on a dependent variable Y. For the last 25 years, the causal steps approach described by, for example, Baron and Kenny (1986) has dominated and become the standard procedure for statistical mediation analysis. However, most of this research has been conducted with a continuous dependent variable. In this article, building on the methods of MacKinnon (1993, 2007), we examined procedures suited to categorical rather than continuous data for mediation models in which the outcome variable is binary or ordinal. We argue that logistic regression should be used to analyze categorical outcomes, whereas ordinary linear regression is more appropriate for continuous outcomes. Two approaches are commonly used to estimate the mediation effect: one takes the product of coefficients, and the other takes the difference between the respective coefficients. In this study, therefore, we compared the performance of these two methods under logistic regression and under ordinary linear regression, using Monte Carlo simulation. The methods were compared with respect to three factors: sample size, size of the mediation effect, and number of categories in the outcome variable. These factors were varied systematically in the simulations: i) sample size was set at 50, 100, 200, 500, and 1000; ii) the number of categories in the outcome variable was set at 2, 3, and 5; and iii) the standardized regression coefficients a, b, and c' were each set at 0, 0.14, 0.39, and 0.59, yielding 63 of the 64 possible coefficient combinations (the combination with all three coefficients equal to 0.59 was dropped because it produced an improper solution).
Thus, a total of 5 sample sizes × 3 numbers of outcome categories × 63 regression coefficie