Full-text retrieval improves the efficiency of finding specific information in massive data sets, but traditional retrieval techniques consume substantial system resources. P2P file-sharing software such as Emule and BT can locate files and download them at high speed, but it is weak at parsing Chinese documents in multiple formats and at extracting keywords from them, and its network routing suffers from hot-spot effects. This paper proposes a full-text retrieval system based on a distributed P2P network and describes its overall architecture, key technologies, and implementation. Practice shows that the system effectively addresses these problems.