Traditional text retrieval techniques support only a narrow range of file formats, index large files slowly, and can even run out of memory. To address these problems, this paper studies a multi-format file indexing technique based on the merge factor, built on the Lucene system and related technologies, and uses it to construct a Chinese text information retrieval system. Experimental analysis shows that the system effectively supports retrieval over multiple file formats, that setting the merge factor appropriately improves indexing speed, and that the system is highly reliable.