Web search results clustering groups the results returned by a search engine into meaningful categories and assigns each category a descriptive label, so that users can quickly find the information they need. This paper divides web search results clustering algorithms into two groups according to the direction of their improvements: classical-clustering-oriented algorithms and label-oriented algorithms. Improvements to the former mainly include optimizing feature selection, optimizing the number of clusters K, and generating overlapping clusters. Improvements to the latter mainly include optimizing the cluster scoring operation, optimizing the cluster merging operation, optimizing data structures, selecting candidate labels, and semantics-based optimization. Building on this review of related work, the paper discusses the open problems facing search results clustering and its future directions.