This paper proposes a scheme for organizing the lexicon in groups under a three-level index structure, and gives a suitable index density interval. To address the expansion of the system's base lexicon, it considers a method for automatic keyword extraction and the addition of small word entries, based on word-frequency statistics and equipped with a filtering function. Extensive simulation experiments show that the method substantially improves the segmentation speed of Chinese text as well as the recall and precision of information retrieval.
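To make the idea of an indexed lexicon concrete, the following is a minimal sketch, not the paper's actual design. The abstract does not specify what the three index levels are, so the layout here (first character, then word length, then the set of entries) is an assumption, as are the names `ThreeLevelLexicon`, `lengths_for`, and `segment`. Forward maximum matching is used as a stand-in segmentation strategy.

```python
from collections import defaultdict


class ThreeLevelLexicon:
    """Hypothetical layout: first character -> word length -> set of words.

    The three levels are an assumption made for illustration; the paper's
    real grouping and index density are not given in the abstract.
    """

    def __init__(self, words=()):
        self._index = defaultdict(lambda: defaultdict(set))
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word under its first character and its length."""
        if word:
            self._index[word[0]][len(word)].add(word)

    def __contains__(self, word):
        if not word:
            return False
        by_length = self._index.get(word[0])
        if by_length is None:
            return False
        return word in by_length.get(len(word), ())

    def lengths_for(self, char):
        """Word lengths stored under a first character, longest first."""
        by_length = self._index.get(char)
        return sorted(by_length, reverse=True) if by_length else []


def segment(text, lexicon):
    """Forward maximum matching: take the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for n in lexicon.lengths_for(text[i]):
            candidate = text[i:i + n]
            if len(candidate) == n and candidate in lexicon:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


lexicon = ThreeLevelLexicon(["中文", "文本", "切词", "中文文本"])
tokens = segment("中文文本切词速度", lexicon)
```

In this sketch the index narrows the search at each position to the word lengths that actually occur for the current first character, so lengths with no candidate entries are never probed. This illustrates one way an index can reduce lookup work during segmentation.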