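Objective To compare the performance of design-based and model-based methods in the statistical description of complex samples. Methods Systolic blood pressure (SBP) and the prevalence of raised blood pressure from the 2010 China Chronic Disease and Risk Factors Surveillance served as source data. Multistage random sampling was simulated 1000 times, each sample containing 2000 individuals, and every sample was assigned a response rate that increased with age, so that its age structure deviated from that of the target population. The design-based method, the routine model-based method and the multilevel model were compared in their estimation of means and rates, using the mean squared error (MSE) and the probability that the 95% confidence interval (CI) covers the parameter as evaluation criteria. Results When estimating mean SBP, the MSE of the routine method, the design-based method and the multilevel model was 6.41, 1.38 and 5.86, respectively, so the design-based method performed best. The probability that the 95% CI covered the population parameter was 24.7%, 97.5% and 84.3%, respectively; both the routine method and the multilevel model can therefore inflate the probability of type I error in statistical inference. When estimating the prevalence of raised blood pressure, the MSE of the design-based method was 4.80, better than that of the routine method (20.9) and the multilevel model (17.2). The probability that the 95% CI contained the population parameter was only 29.4% for the routine method and 86.4% for the multilevel model, both lower than that for the design-based method (97.3%). Conclusion For the statistical description of complex sample data whose sample structure is systematically biased, the design-based method outperforms the routine method and the multilevel model in both the unbiasedness of estimation and the validity of statistical inference, and should be the method of first choice.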
Objective To compare design-based and model-based methods in the descriptive analysis of complex samples. Methods A total of 1000 samples were drawn by simulation, using a multistage random sampling design, from the data of the 2010 China Chronic Disease and Risk Factors Surveillance. In each simulated sample, cases were randomly deleted with an age-dependent probability (the response rate increased with age), so that the sample age structure deviated systematically from that of the target population. Mean systolic blood pressure (SBP) and the prevalence of raised blood pressure, together with their 95% confidence intervals (95% CI), were estimated using the design-based method and two model-based methods (the routine method and the multilevel model). The mean squared error (MSE) of the estimators produced by these 3 methods was computed to evaluate their validity. To compare the performance of the methods in statistical inference, the probability that the 95% CI covered the true parameter (the population mean SBP and the population prevalence of raised blood pressure) was used. Results The MSE of the mean estimator for the routine method, the design-based method and the multilevel model was 6.41, 1.38 and 5.86, respectively, and the probability that the 95% CI covered the true parameter was 24.7%, 97.5% and 84.3%, respectively. The routine method and the multilevel model could therefore both inflate the probability of type I error in statistical inference. The MSE of the prevalence estimator was 4.80 for the design-based method, far lower than that for the routine method (20.9) and the multilevel model (17.2). The probability that the 95% CI covered the true prevalence was only 29.4% for the routine method and 86.4% for the multilevel model, both lower than that for the design-based method (97.3%). Conclusion Compared with the routine method and the multilevel model, the design-based method performed best in both point estimation and confidence interval construction. The design-based method should be the first choice for the statistical description of complex samples with a systematically biased sample structure.
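The evaluation logic described in the Methods (repeated sampling, age-dependent response, then MSE and 95% CI coverage) can be illustrated with a minimal Python sketch. It is not the study's procedure: the population is synthetic, simple random sampling stands in for the multistage design, a post-stratified weighted mean stands in for the design-based estimator, and the multilevel model is omitted. All function names, group sizes, SBP means and response rates below are hypothetical values chosen for illustration.

```python
import math
import random
import statistics

# Hypothetical age groups: id -> (population count, mean SBP in mmHg).
GROUPS = {0: (10000, 115.0), 1: (6000, 125.0), 2: (4000, 138.0)}
# Assumed response rates that increase with age group.
RESPONSE = {0: 0.4, 1: 0.6, 2: 0.9}


def make_population(rng):
    """Build a synthetic population of (age_group, sbp) pairs."""
    pop = []
    for gid, (count, mu) in GROUPS.items():
        pop.extend((gid, rng.gauss(mu, 15.0)) for _ in range(count))
    return pop


def draw_respondents(rng, pop, n=2000):
    """Simple random sample of n, then age-dependent nonresponse."""
    sample = rng.sample(pop, n)
    return [(g, y) for g, y in sample if rng.random() < RESPONSE[g]]


def routine_estimate(resp):
    """Unweighted mean and its naive standard error."""
    ys = [y for _, y in resp]
    return statistics.fmean(ys), statistics.stdev(ys) / math.sqrt(len(ys))


def weighted_estimate(resp, shares):
    """Post-stratified mean: group means weighted by population shares."""
    est, var = 0.0, 0.0
    for g, w in shares.items():
        ys = [y for gg, y in resp if gg == g]
        est += w * statistics.fmean(ys)
        var += w * w * statistics.variance(ys) / len(ys)
    return est, math.sqrt(var)


def simulate(reps=200, seed=1):
    """Return {method: (MSE, coverage of the 95% CI)} over reps samples."""
    rng = random.Random(seed)
    pop = make_population(rng)
    truth = statistics.fmean([y for _, y in pop])
    shares = {g: count / len(pop) for g, (count, _) in GROUPS.items()}
    totals = {"routine": [0.0, 0], "weighted": [0.0, 0]}
    for _ in range(reps):
        resp = draw_respondents(rng, pop)
        results = {
            "routine": routine_estimate(resp),
            "weighted": weighted_estimate(resp, shares),
        }
        for name, (est, se) in results.items():
            totals[name][0] += (est - truth) ** 2
            # The interval est +/- 1.96*se covers the truth or it does not.
            totals[name][1] += int(abs(est - truth) <= 1.96 * se)
    return {name: (sq / reps, cov / reps) for name, (sq, cov) in totals.items()}


if __name__ == "__main__":
    for method, (mse, coverage) in simulate().items():
        print(f"{method}: MSE={mse:.2f}, coverage={coverage:.1%}")
```

Under these assumptions the unweighted mean is pulled toward the over-represented older groups, so its MSE is dominated by squared bias and its interval rarely covers the true mean, while the weighted estimator stays close to nominal coverage. The numbers it prints are not comparable to the results reported in the abstract.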