To help users browse the results returned by search engines and locate valuable Web documents quickly and effectively, this paper proposes a clustering algorithm for Web search results based on conceptual grouping. First, a co-occurrence network of characteristic terms is built, and conceptual grouping is used to mine the semantic relationships among these terms, forming concept clusters related to the query topic. Next, the distance between each Web document and each concept cluster is computed, and the search results are clustered accordingly. Finally, cluster labels are selected by jointly considering the importance of each characteristic term within its cluster and within the whole document set. Experimental results show that the proposed algorithm achieves good clustering performance, clearly outperforms the k-means algorithm, and produces cluster labels that are easy to understand.
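The pipeline summarized above (co-occurrence network, concept clusters, document-to-cluster distance, label selection) can be illustrated with a minimal Python sketch. The abstract does not specify the grouping procedure, the distance measure, or the label score, so everything concrete here is an assumption made for illustration: connected components of a thresholded co-occurrence graph stand in for conceptual grouping, the distance is one minus the share of a document's terms that fall in a cluster, and the label score is the product of in-cluster and collection-wide document frequencies. The function names and the `min_cooccur` parameter are hypothetical, and each document is assumed to be already reduced to a list of characteristic terms with the query term removed.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_network(docs, min_cooccur=2):
    """Build an undirected term graph. An edge links two terms that appear
    together in at least `min_cooccur` documents (assumed threshold)."""
    pair_counts = Counter()
    for terms in docs:
        pair_counts.update(combinations(sorted(set(terms)), 2))
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def concept_groups(graph):
    """Stand-in for conceptual grouping (an assumption, not the paper's
    method): each connected component of the graph is one concept cluster."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        groups.append(component)
    return groups


def distance(terms, group):
    """Assumed distance: 1 minus the fraction of the document's distinct
    terms that belong to the concept cluster."""
    unique = set(terms)
    if not unique:
        return 1.0
    return 1.0 - len(unique & group) / len(unique)


def cluster_documents(docs, groups):
    """Assign each document index to its nearest concept cluster.
    Requires at least one concept cluster."""
    clusters = defaultdict(list)
    for i, terms in enumerate(docs):
        nearest = min(range(len(groups)),
                      key=lambda g: distance(terms, groups[g]))
        clusters[nearest].append(i)
    return dict(clusters)


def select_label(group, member_ids, docs):
    """Assumed label score: document frequency inside the cluster times
    document frequency in the whole result set."""
    def df(term, ids):
        return sum(1 for i in ids if term in docs[i])

    all_ids = range(len(docs))
    return max(sorted(group),
               key=lambda t: df(t, member_ids) * df(t, all_ids))


# Toy result set for an ambiguous query, already reduced to term lists.
docs = [
    ["car", "engine"],
    ["car", "dealer"],
    ["car", "engine", "dealer"],
    ["cat", "habitat"],
    ["cat", "habitat", "prey"],
    ["cat", "prey"],
]
groups = concept_groups(cooccurrence_network(docs))
clusters = cluster_documents(docs, groups)
labels = {g: select_label(groups[g], ids, docs) for g, ids in clusters.items()}
print(clusters)  # {0: [0, 1, 2], 1: [3, 4, 5]}
print(labels)    # {0: 'car', 1: 'cat'}
```

On this toy input the two senses of the query separate into two clusters, each labeled by the term that is frequent both inside the cluster and across the result set. A real system would need term extraction, weighting, and a grouping method closer to the one the paper actually uses.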