Objective To explore a two-stage correction method for missing data in the dependent variable of medical expenditure surveys, where the missingness arises from both random non-response and sample selection bias. Methods Multiple data sets were simulated in which data missing at random (MAR) and data not missing at random coexisted at different degrees. In the first stage, the MAR data were multiply imputed with each of four methods: predictive mean matching (PMM), propensity score (PS), the bootstrap-based EM algorithm (EMB) and Markov chain Monte Carlo (MCMC). In the second stage, a sample selection model was fitted to each imputed data set to correct for the missingness caused by selection bias. Finally, the multiple fitting results of the sample selection model were combined. Standardized bias, root-mean-square error and the average length of the confidence interval were used as criteria to evaluate the imputation methods. Results The PS method performed relatively poorly under every missing-data scenario. When the data not missing at random were mild in degree, the best method depended on the degree of MAR: MCMC when MAR was mild, EMB when MAR was moderate, and PMM when MAR was severe. When the data not missing at random were moderate in degree, MCMC was the best method regardless of the degree of MAR; when they were severe, PMM was the best method regardless of the degree of MAR. Conclusion PMM, EMB and MCMC are all good imputation methods for data missing at random, and the simulation results of this study can guide the choice among them in actual surveys with different missing-data situations.
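The combination step and the three evaluation criteria named above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the fitting results are combined with Rubin's rules (the usual choice for multiple imputation, though the abstract does not name the rule), and it assumes the common definition of standardized bias as the bias expressed as a percentage of the empirical standard deviation of the estimates. All function names are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m estimates and their squared standard errors.

    Assumes Rubin's rules; the source does not state which rule was used.
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # mean within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


def standardized_bias(estimates, truth):
    """Bias as a percentage of the empirical SD of the estimates (assumed definition)."""
    return 100 * (statistics.fmean(estimates) - truth) / statistics.stdev(estimates)


def rmse(estimates, truth):
    """Square root of the mean squared error across simulation replicates."""
    return math.sqrt(statistics.fmean([(e - truth) ** 2 for e in estimates]))


def mean_ci_length(intervals):
    """Average width of (lower, upper) confidence intervals across replicates."""
    return statistics.fmean([upper - lower for lower, upper in intervals])
```

In a simulation of this kind, `pool_rubin` would be called once per replicate on the coefficients from the m imputed data sets, and the three criteria would then be computed over the pooled estimates of all replicates for each imputation method.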