A Dirichlet process mixture model that makes effective use of information about known topics is proposed for topic tracking. Prior knowledge of the known topics is incorporated into Gibbs sampling during model inference, which yields the relevance between each new story and the known topics. Experiments show that the method does not need large-scale in-domain training data and significantly improves topic tracking performance with only a few on-topic seed stories.
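To make the idea concrete, the following is a minimal sketch of how seed stories might be folded into a collapsed Gibbs sampler for a Dirichlet process mixture. It is not the authors' exact model. It assumes a Dirichlet-multinomial likelihood over word ids, pins the seed stories permanently to cluster 0 (the known topic), and reports relevance as the fraction of post-burn-in sweeps in which a story is assigned to that cluster. The names `log_predictive` and `track_topic` and the parameters `alpha`, `beta`, `sweeps` and `burn_in` are hypothetical.

```python
import math
import random
from collections import Counter


def log_predictive(doc, counts, total, vocab_size, beta):
    """Log-probability of the words in doc given a cluster's word counts,
    with a symmetric Dirichlet(beta) prior integrated out."""
    logp = 0.0
    added = Counter()  # words of doc already accounted for
    for n, w in enumerate(doc):
        logp += math.log((counts[w] + added[w] + beta)
                         / (total + n + vocab_size * beta))
        added[w] += 1
    return logp


def track_topic(seeds, stories, vocab_size, alpha=1.0, beta=0.1,
                sweeps=200, burn_in=50, rng_seed=0):
    """Return, for each story, the fraction of post-burn-in sweeps in which
    it was assigned to the known topic (cluster 0). Assumed design."""
    if not seeds:
        raise ValueError("at least one seed story is required")
    if burn_in >= sweeps:
        raise ValueError("burn_in must be smaller than sweeps")
    rng = random.Random(rng_seed)

    # Cluster 0 is the known topic; seed stories stay in it and are never resampled.
    topic = {"counts": Counter(), "total": 0, "size": len(seeds)}
    for doc in seeds:
        topic["counts"].update(doc)
        topic["total"] += len(doc)
    clusters = {0: topic}

    assign = [None] * len(stories)
    hits = [0] * len(stories)
    next_id = 1
    for sweep in range(sweeps):
        for i, doc in enumerate(stories):
            # Remove story i from its current cluster before resampling it.
            old = assign[i]
            if old is not None:
                c = clusters[old]
                c["counts"].subtract(doc)
                c["total"] -= len(doc)
                c["size"] -= 1
                if c["size"] == 0:  # cannot happen for cluster 0 (seeds remain)
                    del clusters[old]

            # Chinese restaurant process weights: existing clusters, then a new one.
            ids = list(clusters)
            logw = [math.log(clusters[k]["size"])
                    + log_predictive(doc, clusters[k]["counts"],
                                     clusters[k]["total"], vocab_size, beta)
                    for k in ids]
            ids.append(next_id)
            logw.append(math.log(alpha)
                        + log_predictive(doc, Counter(), 0, vocab_size, beta))

            top = max(logw)  # subtract the maximum for numerical stability
            new = rng.choices(ids, weights=[math.exp(v - top) for v in logw])[0]
            if new == next_id:
                clusters[new] = {"counts": Counter(), "total": 0, "size": 0}
                next_id += 1
            c = clusters[new]
            c["counts"].update(doc)
            c["total"] += len(doc)
            c["size"] += 1
            assign[i] = new
            if sweep >= burn_in and new == 0:
                hits[i] += 1
    return [h / (sweeps - burn_in) for h in hits]
```

Calling `track_topic(seeds, stories, vocab_size)` with each story given as a list of integer word ids returns one relevance score per story. A tracking decision could then be made by thresholding that score, though the threshold is a tuning choice not specified here.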