Objective To introduce and compare statistical methods for handling separation in logistic regression. Methods Human immunodeficiency virus (HIV) infection and its influencing factors among 75 intravenous drug users were analyzed using maximum likelihood estimation, exact logistic regression, and Firth's penalized maximum likelihood estimation, and the results of the three methods were compared. Results Both exact logistic regression and penalized maximum likelihood estimation produced valid parameter estimates; the confidence intervals from exact logistic regression were wider than those from penalized maximum likelihood estimation. Both methods showed that ethnicity was significantly associated with HIV infection among injecting drug users in the study city, with HIV infection significantly higher among Yi people than among Han people. Conclusions When separation or near-separation in the data renders the maximum likelihood estimate invalid, both exact logistic regression and Firth's penalized maximum likelihood estimation can yield valid estimates. Because exact logistic regression is computationally complex and may suffer from over-conditioning and degenerate conditional likelihoods, Firth's penalized maximum likelihood estimation is recommended for logistic regression with separated or nearly separated data.
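To make the recommended approach concrete, the following is a minimal sketch of Firth's penalized maximum likelihood estimation, which maximizes the log-likelihood plus half the log-determinant of the Fisher information. It is an illustration under stated assumptions, not the analysis reported here: the function name `firth_logistic`, the step cap, and the toy 2x2 data (one group with zero events, so the ordinary maximum likelihood estimate does not exist) are all hypothetical, and the study's data and software are not reproduced.

```python
import numpy as np


def firth_logistic(X, y, max_iter=200, tol=1e-8):
    """Firth-penalized logistic regression via a Newton-type iteration.

    Illustrative helper (hypothetical name). X must already contain an
    intercept column; y is a 0/1 outcome vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = pi * (1.0 - pi)                       # binomial variance weights
        info = X.T @ (X * w[:, None])             # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: the usual score plus a bias-reducing term
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        # Cap very large steps for numerical stability (an assumption of this sketch)
        largest = np.max(np.abs(step))
        if largest > 5.0:
            step *= 5.0 / largest
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical toy data with quasi-complete separation:
# group x=0 has 0 events out of 20, group x=1 has 8 events out of 20.
x = np.repeat([0.0, 1.0], 20)
y = np.concatenate([np.zeros(20), np.ones(8), np.zeros(12)])
X = np.column_stack([np.ones_like(x), x])

beta = firth_logistic(X, y)
print("intercept, slope:", beta)
```

For a saturated model such as this one, the penalized estimate coincides with the estimate obtained by adding 0.5 to each cell of the 2x2 table, which gives a simple check that the iteration has converged to a finite value where ordinary maximum likelihood would diverge.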