A Similarity Search Algorithm Based on Document Topology

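ISSN: 1002-8331
Journal: Computer Engineering and Applications (计算机工程与应用)
Date: 0
Pages: 146-150
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] College of Computer Science and Technology, Heilongjiang University, Harbin 150080; [2] Key Laboratory of Computational Biology, Heilongjiang University, Harbin 150080
Funding: National Natural Science Foundation of China (No. 60973081); Scientific and Technological Research General Project of the Heilongjiang Provincial Department of Education (No. 11541263, No. 11551352)
Related project: Research on Key Technologies for Adaptive Chinese Web Opinion Mining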

Keywords: document topology, similarity search, similarity

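Chinese abstract (translated):

Quickly and effectively finding similar documents in a massive document collection is an important and time-consuming problem. Existing document similarity search algorithms first identify a candidate document set and then rank the candidates by relevance to find the most relevant documents. This paper proposes Hub-N, a similarity search algorithm based on document topology. It transforms the document similarity search problem into a graph search problem and applies corresponding pruning techniques, which narrows the range of documents scanned and improves search efficiency. Experiments verify the effectiveness and feasibility of the algorithm.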

English abstract:

Searching a large document collection for similar documents quickly and efficiently is an important and time-consuming problem. Existing algorithms first find a candidate document set and then rank the candidates by a relevance measure to identify the most relevant ones. This paper puts forward Hub-N, a topology-based document similarity search algorithm. It transforms the document similarity search problem into a graph search problem and applies pruning techniques, which reduces the scope of scanned documents and significantly improves retrieval efficiency. Experiments show the algorithm to be effective and feasible.
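
The abstract gives only the outline of the approach: link documents into a graph, search that graph, and prune so that fewer documents are scanned. The Python sketch below illustrates that general idea and is not the paper's Hub-N algorithm, whose details are not given on this page. Everything in it is an assumption made for illustration: cosine similarity over term counts, the `link_threshold` and `prune_below` parameters, the choice of the highest-degree node as the starting "hub", and the toy documents.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(vectors, link_threshold):
    """Link every pair of documents whose similarity reaches link_threshold."""
    graph = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= link_threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def graph_search(query, vectors, graph, k, prune_below):
    """Best-first search from the highest-degree node (assumed hub).

    Returns the top-k document ids and the number of documents scanned.
    """
    hub = max(graph, key=lambda n: len(graph[n]))
    scored = {hub: cosine(query, vectors[hub])}
    frontier = [(-scored[hub], hub)]
    while frontier:
        neg_score, node = heapq.heappop(frontier)
        if -neg_score < prune_below:
            continue  # pruning: do not expand weakly matching documents
        for nb in graph[node]:
            if nb not in scored:
                scored[nb] = cosine(query, vectors[nb])
                heapq.heappush(frontier, (-scored[nb], nb))
    top = sorted(scored, key=scored.get, reverse=True)[:k]
    return top, len(scored)


# Hypothetical toy corpus: three related documents and two unrelated ones.
docs = [
    "graph search pruning algorithm",
    "graph search algorithm index",
    "graph pruning algorithm documents",
    "cooking pasta recipe",
    "pasta recipe sauce",
]
vectors = [Counter(d.split()) for d in docs]
graph = build_graph(vectors, link_threshold=0.5)
top, scanned = graph_search(Counter("graph search".split()), vectors, graph, k=2, prune_below=0.3)
print(sorted(top), scanned, "of", len(docs), "documents scanned")
```

In this toy run the search never reaches the two unrelated documents, because no edge connects them to the cluster being explored. A search this simple also fails when the starting hub is unrelated to the query, so a practical method would need a better choice of entry points.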
