Text clustering is a fundamental and actively studied problem in text mining. This paper presents a new text clustering method that combines concept lattices with complex networks in order to improve clustering quality. The method first computes feature weights for the keywords and reduces the dimensionality of the feature vectors, then maps the keywords into a formal context according to their weights. Next, the similarity between concepts in the formal context is computed with a new similarity formula proposed in this paper; these similarities are used to construct a GN network, and the GN algorithm is applied to cluster the text concepts. Finally, an example verifies the feasibility of the method.
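The pipeline summarized above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: keyword weights are taken to be TF-IDF, dimension reduction is a simple weight threshold, "GN" is read as the Girvan-Newman community detection algorithm, and Jaccard similarity stands in for the paper's own similarity formula, which is not given here. As a further simplification, the graph nodes are the documents (the objects of the formal context) rather than concepts of a fully constructed concept lattice. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (assumed TF-IDF)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: (c / len(d)) * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def formal_context(weights, threshold):
    """Binary context: a document 'has' a keyword if its weight exceeds threshold."""
    return [frozenset(t for t, w in ws.items() if w > threshold) for ws in weights]

def jaccard(a, b):
    """Stand-in similarity; the paper's own formula is not reproduced here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_graph(context, min_sim):
    """Connect two documents when their keyword sets are similar enough."""
    adj = {i: set() for i in range(len(context))}
    for i, j in combinations(range(len(context)), 2):
        if jaccard(context[i], context[j]) >= min_sim:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj):
    """Connected components found by breadth-first search."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style accumulation, unweighted)."""
    bet = Counter()
    for s in adj:
        dist, sigma = {s: 0}, Counter({s: 1})
        preds, order, queue = {v: [] for v in adj}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = Counter()
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return bet

def girvan_newman(adj, k):
    """Remove the highest-betweenness edge until at least k components exist."""
    adj = {v: set(ns) for v, ns in adj.items()}
    while len(components(adj)) < k:
        bet = edge_betweenness(adj)
        if not bet:
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical toy corpus: two topics linked only by the shared keyword "cluster".
docs = [
    ["text", "lattice", "concept", "context"],
    ["text", "lattice", "concept", "order"],
    ["text", "lattice", "concept", "cluster"],
    ["text", "network", "graph", "cluster"],
    ["text", "network", "graph", "edge"],
    ["text", "network", "graph", "node"],
]
context = formal_context(tfidf(docs), threshold=0.1)  # drops the ubiquitous "text"
graph = similarity_graph(context, min_sim=0.15)
clusters = girvan_newman(graph, k=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy run the only link between the two topic groups is the edge between documents 2 and 3, which carries every cross-group shortest path and is therefore removed first. The thresholds (0.1 and 0.15) and the target cluster count are illustrative values, not parameters taken from the paper.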