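In practice, tests often have a multidimensional structure, and equating them with unidimensional IRT methods yields inaccurate results. Multidimensional tests therefore require multidimensional IRT (MIRT) equating methods to transform parameters onto a common scale. Using a common-item design, this simulation study examined how several MIRT equating methods performed under different anchor-test designs. It considered six factors: test length, the ratio of the numbers of items in the two dimensions, anchor-test length, the anchor-item selection strategy, the correlation between the two dimensions, and the difference in ability level between the groups being equated. The MIRT equating methods compared were the Mean/Mean (MM), Mean/Sigma (MS), Stocking-Lord (SL), Haebara (HB), and Least Squares (LS) methods. The results showed the following. (1) The SL, HB, and LS methods produced the smallest root mean square errors (RMSEs) of equating and were relatively stable across conditions. (2) The MM and MS methods produced very large RMSEs under nonequivalent-groups conditions. (3) The anchor-test design had no significant effect on the equating results of the SL, HB, and LS methods. (4) The SL, HB, and LS methods produced the smallest RMSEs when the correlation between the two dimensions was high, the test and the anchor test were long, and the equating groups did not differ in ability level.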
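All of the linking methods above place parameters from a new form onto a base scale. The minimal sketch below illustrates the idea under stated assumptions. It assumes a compensatory multidimensional 2PL model, P = 1 / (1 + exp(-(a'θ + d))), and a linear scale transformation θ* = Aθ + β, under which item parameters transform as a* = (A⁻¹)'a and d* = d − a'A⁻¹β. It then recovers A and β from anchor items by ordinary least squares and scores the result with RMSE. The function names (`m2pl_prob`, `transform_items`, `estimate_linking`, `rmse`) are hypothetical. The least-squares recovery is a generic illustration, not necessarily the LS method evaluated in the study, and the anchor parameters are noise-free rather than estimated by BMIRT.

```python
import numpy as np


def m2pl_prob(theta, a, d):
    """Compensatory M2PL (assumed model): P(X=1) = 1 / (1 + exp(-(a'theta + d))).
    theta: (n_persons, n_dims); a: (n_items, n_dims); d: (n_items,)."""
    return 1.0 / (1.0 + np.exp(-(theta @ a.T + d)))


def transform_items(a, d, A, beta):
    """Put item parameters on the scale theta* = A @ theta + beta.
    Row i of a_star is a_i' A^{-1}; d_star_i = d_i - a_i' A^{-1} beta."""
    A_inv = np.linalg.inv(A)
    a_star = a @ A_inv
    d_star = d - a_star @ beta
    return a_star, d_star


def estimate_linking(a_new, d_new, a_base, d_base):
    """Illustrative least-squares estimates of A and beta such that
    theta_base = A @ theta_new + beta, using anchor items only."""
    # a_base = a_new @ A^{-1}, so solving a_new @ B = a_base gives B ~ A^{-1}
    B, *_ = np.linalg.lstsq(a_new, a_base, rcond=None)
    A = np.linalg.inv(B)
    # d_base = d_new - a_base @ beta, so a_base @ beta = d_new - d_base
    beta, *_ = np.linalg.lstsq(a_base, d_new - d_base, rcond=None)
    return A, beta


def rmse(estimate, truth):
    """Root mean square error between estimated and true values."""
    return float(np.sqrt(np.mean((np.asarray(estimate) - np.asarray(truth)) ** 2)))


# Hypothetical demo: 10 anchor items measuring 2 dimensions
rng = np.random.default_rng(0)
a_new = rng.uniform(0.5, 2.0, size=(10, 2))
d_new = rng.normal(0.0, 1.0, size=10)
A_true = np.array([[1.2, 0.1], [0.0, 0.9]])
beta_true = np.array([0.5, -0.3])

# Anchor parameters as they would appear on the base scale
a_base, d_base = transform_items(a_new, d_new, A_true, beta_true)
A_hat, beta_hat = estimate_linking(a_new, d_new, a_base, d_base)
print("A RMSE:", rmse(A_hat, A_true), "beta RMSE:", rmse(beta_hat, beta_true))
```

Because the demo anchors contain no estimation error, both RMSEs are essentially zero. With estimated parameters, the RMSE of the recovered transformation (or of the rescaled parameters) is the kind of criterion used to compare linking methods across conditions.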
Many educational assessments measure more than one trait (Ackerman, 1992; DeMars, 2006; Reckase, 1985). Adjusting scores across different forms of such tests requires multidimensional item response theory (MIRT) and its linking procedures. Several researchers have extended unidimensional IRT (UIRT) linking methods to the multidimensional case (Davey et al., 1996; Hirsch, 1989; Li & Lissitz, 2000; Min, 2003; Yon, 2006), and numerous studies have compared MIRT linking methods. However, although the choice of anchor items is critical in common-item designs, few studies have compared MIRT linking methods under different common-item designs. It therefore remains unclear how common items should be chosen for different MIRT linking methods. The purpose of this study was to compare five MIRT linking methods under two common-item selection strategies in a variety of situations. The study used a mixed design, with simulation conditions as between-subjects factors and linking methods as a within-subjects factor. There were six between-subjects factors: (1) two test lengths (40 and 80 items); (2) two ratios of the number of items in one dimension to the other (1:1 and 1:3); (3) three anchor lengths (1/20, 1/5, and 1/3 of the total test); (4) two strategies for choosing common items (equally from all dimensions, or in proportion to the number of items in each dimension); (5) three correlations between the two ability dimensions (r = 0, 0.5, 0.9); and (6) equivalent versus nonequivalent ability levels between the two populations. The five MIRT linking methods investigated were the Mean/Mean (MM), Mean/Sigma (MS), Stocking-Lord (SL), Haebara (HB), and Least Squares (LS) methods. Under each condition, the number of examinees was fixed at I = 2000, and 30 replications were generated. BMIRT (Yao, 2003) was applied to estimate item and ability par