Logistic regression models have traditionally been fitted by maximum likelihood via the EM algorithm, whose fixed procedure and computational complexity bring practical drawbacks; for example, the algorithm is sensitive to the choice of initial values, and a poor starting point leads to slow convergence. Gibbs sampling, an efficient and flexible estimation method, is widely used for generalized linear regression models. In the probit model, the link function is based on the normal distribution, so the posterior distribution of the regression coefficients is conjugate normal and sampling is simple and fast. The posterior distribution of the logit model, however, is more complicated and cannot be sampled from directly. This paper adopts a data-augmentation Gibbs sampling approach: by introducing latent variables from the Pólya-Gamma distribution family, the full conditional distribution of the regression coefficients becomes conjugate normal, so a Markov chain for the coefficients is easy to construct, and the coefficients are estimated by their posterior means. On a set of real data, the model is fitted with R's glm function and with the BayesLogit package, and the two sets of estimates differ little, indicating that the Pólya-Gamma latent-variable Bayesian estimator is a usable and accurate method for logistic regression.
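As a minimal illustration of the sampler described above, the following R sketch alternates the two Pólya-Gamma Gibbs updates (draw ω_i ~ PG(1, x_i'β), then draw β from its conjugate normal full conditional) and compares the posterior-mean estimate with the maximum likelihood fit from glm. It assumes the BayesLogit package is installed (its rpg() function draws Pólya-Gamma variates); the simulated data, the N(0, 100·I) prior, and the iteration counts are illustrative choices, not values from the paper.

```r
## Sketch of the Polya-Gamma data-augmentation Gibbs sampler for
## Bayesian logistic regression (simulated data; illustrative settings).
library(BayesLogit)   # provides rpg() for Polya-Gamma draws

set.seed(1)
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))     # design matrix with intercept
beta_true <- c(-0.5, 1.0, -1.5)
y <- rbinom(n, 1, plogis(X %*% beta_true))       # binary response

## Prior: beta ~ N(b0, B0) with B0 = 100 * I (an assumed, diffuse prior)
b0 <- rep(0, p); B0inv <- diag(1e-2, p)
kappa <- y - 1/2                                 # kappa_i = y_i - 1/2

n_iter <- 2000
beta  <- rep(0, p)
draws <- matrix(NA, n_iter, p)

for (t in 1:n_iter) {
  ## 1. Latent variables: omega_i ~ PG(1, x_i' beta)
  omega <- rpg(n, h = 1, z = as.numeric(X %*% beta))
  ## 2. Regression coefficients: conjugate normal full conditional
  V <- chol2inv(chol(crossprod(X * omega, X) + B0inv))
  m <- V %*% (crossprod(X, kappa) + B0inv %*% b0)
  beta <- as.numeric(m + t(chol(V)) %*% rnorm(p))
  draws[t, ] <- beta
}

## Posterior-mean estimate (after burn-in) vs. maximum likelihood via glm()
colMeans(draws[-(1:500), ])
coef(glm(y ~ X - 1, family = binomial(link = "logit")))
```

With a diffuse prior such as the one assumed here, the posterior means and the glm coefficients are typically close, which is the kind of agreement the comparison in the abstract refers to.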