The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that serves as a probabilistic topic model for analyzing large document collections. It addresses a problem that latent Dirichlet allocation (LDA) cannot solve: dynamic clustering, in which the number of topics is not fixed in advance. Working from a factor-graph representation, this paper combines message passing (belief propagation) with Gibbs sampling to infer the posterior of the Bayesian nonparametric model, and compares the resulting sampling belief propagation algorithm with the belief propagation algorithm for LDA and the Gibbs sampling algorithm for HDP in terms of perplexity. Experimental results show that the proposed algorithm converges faster than Gibbs sampling for HDP and eventually reaches the perplexity that LDA attains at its optimal number of topics.
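Perplexity is the metric used for the comparison. As a minimal sketch, assuming the standard held-out definition (the exponential of the negative average per-token log-likelihood), it could be computed as follows. The function name `perplexity` and the arrays `counts`, `theta`, and `phi` are hypothetical names chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
import math


def perplexity(counts, theta, phi):
    """Perplexity of a topic model on a document-word count matrix.

    Assumed inputs (illustrative names, not from the paper):
      counts[d][w]: number of times word w occurs in document d
      theta[d][k]:  probability of topic k in document d
      phi[k][w]:    probability of word w under topic k
    """
    log_lik = 0.0
    n_tokens = 0
    n_topics = len(phi)
    for d, doc in enumerate(counts):
        for w, n in enumerate(doc):
            if n == 0:
                continue  # absent words contribute nothing
            # Word probability is a mixture over topics.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(n_topics))
            log_lik += n * math.log(p_w)
            n_tokens += n
    # Lower perplexity means the model predicts the words better.
    return math.exp(-log_lik / n_tokens)


# Toy data: two documents, two topics, two-word vocabulary.
counts = [[2, 0], [0, 3]]
theta = [[1.0, 0.0], [0.0, 1.0]]
phi = [[0.5, 0.5], [0.5, 0.5]]
print(perplexity(counts, theta, phi))  # about 2.0: uniform over 2 words
```

Under this definition, a model that spreads its probability uniformly over a vocabulary of V words has perplexity V, so lower values indicate a better fit. This is why reaching the perplexity of LDA at its best topic number is a meaningful target.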