This paper proposes a generalized finite mixture model approach to unsupervised text clustering. The relevance of each feature to each mixture component is introduced into the mixture model as a set of latent variables, so that model selection, feature selection, and parameter estimation are all carried out within a single unified framework. Experimental results on four large-scale document datasets show that the proposed method performs well in all three respects: model selection, feature selection, and clustering quality.
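
As a rough illustration of treating feature relevance as a latent variable in a mixture model, one common way to write such a likelihood is sketched below. This is an assumed form and not necessarily the formulation used in this work. Every symbol is introduced here for illustration only: $K$ components with weights $\pi_k$, $D$ features, a relevance probability $\rho_{kd}$ of feature $d$ for component $k$, a component-specific density $p(\cdot \mid \theta_{kd})$, and a shared background density $q(\cdot \mid \lambda_d)$ for irrelevant features. The sketch also assumes that features are conditionally independent given the component.

```latex
% Illustrative sketch only: all symbols are assumptions, not taken from the paper.
% Marginalizing a binary relevance indicator for each (component, feature) pair gives
\[
p(\mathbf{x} \mid \Theta)
  = \sum_{k=1}^{K} \pi_k \prod_{d=1}^{D}
    \Bigl[ \rho_{kd}\, p(x_d \mid \theta_{kd})
         + (1 - \rho_{kd})\, q(x_d \mid \lambda_d) \Bigr],
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \rho_{kd} \le 1 .
\]
```

Under this sketch, the three tasks named in the abstract map onto different parts of $\Theta$. Model selection concerns the number of components $K$, feature selection concerns the relevance probabilities $\rho_{kd}$, and parameter estimation concerns $\pi_k$, $\theta_{kd}$, and $\lambda_d$.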