Lucene is an excellent open-source full-text search framework. Extended in accordance with the framework's specification, it can be embedded cleanly into a search engine. This paper studies Lucene's index structure and principles and improves index-creation efficiency by refining incremental indexing, enlarging the in-memory index buffer, and reducing how often index files are written to disk. A full-text retrieval experiment was designed, and the results show that the method improves the average efficiency of creating an index for 10 000 documents by 19.5% over previous methods, indicating good application prospects.
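
The relationship between buffer size and disk-write frequency can be illustrated with a small model. The sketch below is a toy inverted-index writer in plain Python, not Lucene's API and not the paper's implementation; the class `BufferedIndexer`, its parameter `max_buffered_docs`, and the helper `count_flushes` are hypothetical names introduced only for illustration. It assumes that each flush corresponds to one write of an index segment to disk. In Lucene itself, settings such as `IndexWriterConfig.setRAMBufferSizeMB` and `IndexWriterConfig.setMaxBufferedDocs` play a comparable role, although the abstract does not state which settings were tuned.

```python
from collections import defaultdict


class BufferedIndexer:
    """Toy inverted-index writer that buffers postings in memory.

    Hypothetical illustration only; this is not Lucene's API.
    """

    def __init__(self, max_buffered_docs):
        self.max_buffered_docs = max_buffered_docs
        self.buffer = defaultdict(list)  # term -> list of doc ids
        self.buffered_docs = 0
        self.segments = []               # stands in for index files on disk
        self.flush_count = 0

    def add_document(self, doc_id, text):
        # Record each distinct term of the document in the in-memory buffer.
        for term in set(text.lower().split()):
            self.buffer[term].append(doc_id)
        self.buffered_docs += 1
        # Write a segment only once the buffer is full.
        if self.buffered_docs >= self.max_buffered_docs:
            self.flush()

    def flush(self):
        # Simulate one disk write: move the buffer into a new segment.
        if self.buffered_docs == 0:
            return
        self.segments.append(dict(self.buffer))
        self.buffer = defaultdict(list)
        self.buffered_docs = 0
        self.flush_count += 1

    def close(self):
        # Flush whatever is still buffered so no document is lost.
        self.flush()

    def search(self, term):
        # Collect matching doc ids from every flushed segment.
        hits = []
        for segment in self.segments:
            hits.extend(segment.get(term.lower(), []))
        return sorted(hits)


def count_flushes(num_docs, max_buffered_docs):
    """Index num_docs synthetic documents and return the number of flushes."""
    indexer = BufferedIndexer(max_buffered_docs)
    for doc_id in range(num_docs):
        indexer.add_document(doc_id, f"lucene document number {doc_id}")
    indexer.close()
    return indexer.flush_count


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, count_flushes(10_000, size))
```

In this model, raising `max_buffered_docs` from 10 to 1000 cuts the number of simulated disk writes for 10 000 documents by a factor of 100, at the cost of holding more postings in memory. The model says nothing about actual timing, so it does not reproduce the 19.5% figure reported above.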