To address the problem of Web text mining, this paper proposes an improved index-based organization of the Chinese lexicon and a Chinese word segmentation algorithm built on that dictionary structure. It also strengthens ambiguity resolution in order to raise segmentation precision. Extensive simulation experiments show that the method considerably improves the speed of Chinese word segmentation, as well as the recall and precision of the information retrieved.
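
The abstract does not specify the index structure or the matching strategy, so the following is only a minimal sketch of the general idea of dictionary-indexed segmentation, not the authors' method. It assumes a first-character index (each leading character maps to its set of words and the length of its longest word) combined with plain forward maximum matching. The names `build_index` and `segment` are hypothetical, and the ambiguity-resolution step mentioned in the abstract is not modelled.

```python
from collections import defaultdict


def build_index(words):
    """Group dictionary words by first character (assumed index layout).

    For each leading character we keep the set of words that start with it
    and the length of the longest such word, so a lookup only scans
    candidates that can actually match at the current position.
    """
    index = defaultdict(lambda: {"words": set(), "max_len": 1})
    for word in words:
        if not word:
            continue
        entry = index[word[0]]
        entry["words"].add(word)
        entry["max_len"] = max(entry["max_len"], len(word))
    return dict(index)


def segment(text, index):
    """Forward maximum matching over the first-character index."""
    tokens = []
    i = 0
    while i < len(text):
        entry = index.get(text[i])
        match = text[i]  # fall back to a single character
        if entry is not None:
            longest = min(entry["max_len"], len(text) - i)
            # Try the longest candidate first and shrink until a dictionary hit.
            for length in range(longest, 1, -1):
                candidate = text[i:i + length]
                if candidate in entry["words"]:
                    match = candidate
                    break
        tokens.append(match)
        i += len(match)
    return tokens


# Toy dictionary, purely illustrative.
idx = build_index(["中文", "分词", "算法", "中文分词", "文本"])
print(segment("中文分词算法", idx))  # ['中文分词', '算法']
```

Bounding the candidate length by the per-character maximum is what lets an index of this kind speed up matching: positions whose leading character starts only short words never trigger long substring lookups.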