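Using the mathematics tests for fourth-grade primary school and second-year junior secondary school students from a large-scale educational assessment as examples, this study adopts the decision consistency (classification consistency) index based on item response theory (IRT) as one criterion for evaluating test reliability, and compares how the index differs when different cut scores and score scales are chosen. The results show that: overall test reliability estimated under IRT is higher than the reliability coefficient obtained under classical test theory (CTT); the fewer cut scores are set, the larger the decision consistency index; the location of a cut score affects the index, because examinees whose ability lies near a cut score are more likely to be assigned to different categories; and, after raw scores are converted to scale scores, a conversion rule that maps several raw scores to one scale score increases the decision consistency index.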
Two real data sets from a large-scale educational assessment program were used to investigate classification consistency indices and to explore the main factors that influence them. The overall reliability based on IRT was found to be higher than that based on CTT. With fewer cut scores and a many-to-one transformation rule, classification consistency indices were higher than under other conditions. In the future, it will be useful to apply IRT methods and classification consistency indices in actual educational measurement.
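To make the role of the cut scores concrete, the following is a minimal Python sketch of one common way to compute an IRT-based classification consistency index; it is not necessarily the procedure used in this study. It assumes dichotomous items under a 2PL model, raw-score cut scores, the Lord-Wingersky recursion for the conditional raw score distribution, and a standard normal ability distribution approximated on an evenly spaced grid. The item parameters, the cut scores, and all function names are hypothetical and chosen only for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (a, b are hypothetical item parameters)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def raw_score_dist(theta, items):
    """Lord-Wingersky recursion: P(raw score = x | theta) for dichotomous items."""
    dist = [1.0]  # with zero items, the raw score is 0 with probability 1
    for a, b in items:
        p = p_correct(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)   # item answered incorrectly
            new[x + 1] += prob * p       # item answered correctly
        dist = new
    return dist

def conditional_consistency(theta, items, cuts):
    """P(same category on two independent parallel administrations | theta).

    A raw score x falls in the upper category of a cut c when x >= c.
    """
    dist = raw_score_dist(theta, items)
    bounds = [0] + sorted(cuts) + [len(dist)]
    cat_probs = [sum(dist[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])]
    return sum(q * q for q in cat_probs)

def marginal_consistency(items, cuts, n_points=41, lo=-4.0, hi=4.0):
    """Average the conditional index over an assumed N(0, 1) ability distribution."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + i * step for i in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]  # unnormalised normal density
    total = sum(weights)
    return sum(w * conditional_consistency(t, items, cuts)
               for t, w in zip(thetas, weights)) / total

# Hypothetical 20-item test: discrimination 1.0, difficulties spread over [-2, 2].
items = [(1.0, -2.0 + 4.0 * i / 19) for i in range(20)]

print("one cut score   :", round(marginal_consistency(items, [10]), 3))
print("three cut scores:", round(marginal_consistency(items, [5, 10, 15]), 3))
print("theta at the cut:", round(conditional_consistency(0.0, items, [10]), 3))
print("theta far away  :", round(conditional_consistency(3.0, items, [10]), 3))
```

In this sketch, adding cut scores to an existing set can only split categories further, so the index cannot increase, and an examinee whose expected raw score sits at a cut score has a conditional index close to its minimum. Both behaviours are in line with the pattern reported above.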