Compared with other clustering algorithms, model-based clustering has shown distinctive advantages in empirical studies and is attracting growing attention from the research community. This paper reviews research on document clustering based on mixture models. Following the technical workflow of cluster analysis, it surveys three main modules: document modeling, parameter modeling, and model inference. On that basis, it summarizes issues common to the different lines of work, namely feature reduction, semi-supervised clustering, and the systematic integration of the clustering process. Finally, it proposes possible directions for future research in this field.