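Many studies have recommended using composite reliability to estimate test reliability and reporting its confidence interval. There are three approaches to computing the confidence interval of the composite reliability of a unidimensional test: the Bootstrap method, the Delta method, and direct calculation from the standard error output by statistical software (e.g., LISREL). A simulation study comparing these approaches found that the confidence intervals obtained by the Delta method were quite close to those obtained by the Bootstrap method, whereas those computed from the standard error output by LISREL differed greatly from the Bootstrap results. We recommend the Delta method for estimating the confidence interval of composite reliability (it is easy to implement in Mplus) and advise against computing the interval directly from the standard error output by LISREL. An example illustrates how to compute the composite reliability of a unidimensional test and its confidence interval using the Delta method.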
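As a rough illustration of the Delta method mentioned above, the following Python sketch computes composite reliability for a unidimensional test and a Wald-type confidence interval. It is not the authors' implementation. It assumes a congeneric one-factor model with the factor variance fixed at 1, so that composite reliability is (Σλ)² / ((Σλ)² + Σθ), where λ are the factor loadings and θ the error variances. The function names and the argument `acov` (the estimated covariance matrix of the parameter estimates, ordered as all loadings followed by all error variances) are hypothetical; in practice that matrix would be taken from the software used to fit the model.

```python
import numpy as np


def composite_reliability(loadings, error_vars):
    """Composite reliability of a one-factor model (factor variance assumed fixed at 1)."""
    a = np.sum(np.asarray(loadings, dtype=float))    # sum of loadings
    b = np.sum(np.asarray(error_vars, dtype=float))  # sum of error variances
    return a**2 / (a**2 + b)


def delta_method_ci(loadings, error_vars, acov, z=1.96):
    """Delta-method standard error and confidence interval for composite reliability.

    acov is a hypothetical input: the (2k x 2k) covariance matrix of the
    estimates, ordered as [loadings..., error variances...].
    """
    loadings = np.asarray(loadings, dtype=float)
    error_vars = np.asarray(error_vars, dtype=float)
    acov = np.asarray(acov, dtype=float)
    k = loadings.size

    a = loadings.sum()
    b = error_vars.sum()
    denom = (a**2 + b) ** 2

    # Gradient of a^2 / (a^2 + b):
    #   with respect to each loading:        2ab / (a^2 + b)^2
    #   with respect to each error variance: -a^2 / (a^2 + b)^2
    grad = np.concatenate([
        np.full(k, 2 * a * b / denom),
        np.full(k, -(a**2) / denom),
    ])

    # First-order (Delta method) approximation of the sampling variance.
    se = float(np.sqrt(grad @ acov @ grad))
    rho = composite_reliability(loadings, error_vars)
    return rho, se, (rho - z * se, rho + z * se)


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    lam = [0.7, 0.7, 0.7]
    theta = [0.51, 0.51, 0.51]
    acov = np.eye(6) * 0.01  # made-up covariance matrix of the estimates
    rho, se, (lower, upper) = delta_method_ci(lam, theta, acov)
    print(f"rho = {rho:.4f}, SE = {se:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval here is the symmetric form estimate ± z × SE. Whether a transformed interval would behave better in small samples is not addressed by this sketch.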
The widely used coefficient α may underestimate or overestimate reliability when its underlying assumptions are violated, and it is therefore not a good index of reliability. Composite reliability, estimated through confirmatory factor analysis, can estimate reliability better (see, e.g., Bentler, 2009; Green & Yang, 2009). As is well known, a point estimate carries limited information about a population parameter and does not indicate how far the estimate may be from that parameter. A confidence interval for the parameter provides more information. In evaluating the quality of a test, the confidence interval of composite reliability has received increasing attention in recent years. There are three approaches to estimating the confidence interval of the composite reliability of a unidimensional test: the Bootstrap method, the Delta method, and direct use of the standard error in the output of SEM software (e.g., LISREL). Each of the three approaches produces a standard error of composite reliability, from which the confidence interval can easily be formed. The Bootstrap method provides an empirical estimate of the standard error of composite reliability and is the most credible, but it requires data simulation techniques and is not easily mastered by most applied researchers. The Delta method computes the standard error of composite reliability by approximation and is much simpler than the Bootstrap method. LISREL can directly output the standard error of composite reliability, which makes this approach the simplest of the three. To evaluate the standard errors of composite reliability obtained by the Delta method and by LISREL, we compared them with the standard error obtained by the Bootstrap method, because the latter can in theory be treated as the true value. A simulation study was conducted for this comparison. Four factors were considered in the simulation design: (a) the number of items on each test (k = 3, 6, 10, and 15); (b) factor loading (high, medium, and low); (c) sample size (N = 100, 300, 500