Objective To determine the optimal number of imputations for multiple imputation (MI) of missing values, using the national schistosomiasis surveillance data of China as an example. Methods The Markov chain Monte Carlo (MCMC) method of MI was used to impute the missing values in the longitudinal schistosomiasis surveillance data from 1990 to 2003. The imputations were evaluated comprehensively with five indices: the between-imputation variance, the degrees of freedom (DF), the relative increase in variance, the estimated fraction of missing information about the population parameters, and the relative efficiency. The optimal number of imputations was then determined from these indices. Results The dataset contained 200 observations of 28 variables. Five variables had missing values: the infection rate in the human population (6%), the infection rate in livestock (6.5%), the density of infected snails in spring (17%), the density of live snails in spring (16.5%), and the contamination rate of cattle feces (53%). The missing data followed an arbitrary missing pattern, and the optimal number of imputations for this dataset was 31. Conclusion The MCMC method was suitable for the missing-data pattern of the national schistosomiasis surveillance data of China, and the imputation results were best when 31 imputations were used.
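The evaluation indices named in the Methods are the standard quantities from Rubin's rules for combining multiply imputed results. The sketch below computes them for a single scalar parameter from m completed-data estimates and their within-imputation variances. It assumes the classic Rubin (1987) degrees-of-freedom formula and the usual estimate of the fraction of missing information. The function name `rubin_pool` and the example numbers are hypothetical and do not come from the study. The abstract also does not state exactly how these indices were weighed to arrive at 31 imputations.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, within_vars):
    """Pool m completed-data results for one scalar parameter using Rubin's rules.

    estimates   -- point estimates Q_1..Q_m from the m imputed datasets
    within_vars -- their squared standard errors U_1..U_m
    """
    m = len(estimates)
    if m < 2 or len(within_vars) != m:
        raise ValueError("need at least two imputations and one variance per estimate")

    q_bar = mean(estimates)              # pooled point estimate
    w_bar = mean(within_vars)            # mean within-imputation variance
    if w_bar <= 0:
        raise ValueError("within-imputation variances must be positive")
    b = variance(estimates)              # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b          # total variance of q_bar
    r = (1 + 1 / m) * b / w_bar          # relative increase in variance due to missingness

    # Rubin (1987) degrees of freedom; infinite when all imputations agree exactly.
    df = math.inf if r == 0 else (m - 1) * (1 + 1 / r) ** 2
    # Estimated fraction of missing information about the parameter.
    gamma = (r + 2 / (df + 3)) / (r + 1)
    # Efficiency of m imputations relative to an infinite number of imputations.
    rel_eff = 1 / (1 + gamma / m)

    return {"q_bar": q_bar, "within": w_bar, "between": b, "total": t,
            "r": r, "df": df, "gamma": gamma, "rel_eff": rel_eff}


if __name__ == "__main__":
    # Hypothetical estimates from five imputed datasets (not the study's data).
    result = rubin_pool([0.052, 0.055, 0.049, 0.058, 0.051], [1e-4] * 5)
    for key, value in result.items():
        print(f"{key:>8}: {value:.6g}")
```

Because the relative efficiency is (1 + γ/m)^(-1), adding imputations yields diminishing gains once m is large relative to γ. For this reason, tracking these indices over a range of candidate m values is one way to choose the number of imputations.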