This paper analyzes in depth the construction models of existing corpora and the functional modules a corpus should provide, and proposes a corpus construction scheme based on the file system and the Clucene full-text search engine toolkit. Experiments show that Clucene offers a rich set of interfaces and good extensibility, and can be readily extended to handle large quantities of data. These characteristics make the toolkit a promising technical basis for corpus construction.