Automatic Chinese word segmentation underlies web text mining and other Chinese information processing applications, and the rapid growth of these applications places higher demands on segmentation technology. This paper presents a new segmentation algorithm, FPLS. Its dictionary uses the first letter of a word's Pinyin as the first-level index and the number of characters in the word as the second-level index. The algorithm applies bidirectional matching and introduces rules to resolve segmentation ambiguities. Compared with existing fast segmentation algorithms, FPLS achieves higher efficiency and higher accuracy.
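The two-level dictionary and the bidirectional matching step can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `PINYIN_INITIAL`, `build_index`, `lookup`, `forward_match`, `backward_match`, and `segment` are hypothetical, and the Pinyin table is a toy stand-in for a full character-to-Pinyin mapping. Keying on the Pinyin initial of a word's first character is an assumption. The abstract does not state the disambiguation rules, so the sketch substitutes a common heuristic: prefer the segmentation with fewer words, then the one with fewer single-character words.

```python
from collections import defaultdict

# Toy table (assumption): Pinyin initial of a word's first character.
# A real system would cover every character that can start a dictionary word.
PINYIN_INITIAL = {"研": "y", "生": "s", "起": "q", "命": "m"}


def build_index(words):
    """Two-level index: Pinyin initial -> word length -> set of words."""
    index = defaultdict(lambda: defaultdict(set))
    for word in words:
        index[PINYIN_INITIAL[word[0]]][len(word)].add(word)
    return index


def lookup(index, candidate):
    """Check a candidate by descending both index levels."""
    letter = PINYIN_INITIAL.get(candidate[0])
    if letter is None:
        return False
    return candidate in index.get(letter, {}).get(len(candidate), ())


def forward_match(index, text, max_len=4):
    """Forward maximum matching: longest dictionary word from the left."""
    result, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            # A single character is always accepted as a fallback.
            if n == 1 or lookup(index, piece):
                result.append(piece)
                i += n
                break
    return result


def backward_match(index, text, max_len=4):
    """Backward maximum matching: longest dictionary word from the right."""
    result, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or lookup(index, piece):
                result.append(piece)
                j -= n
                break
    return result[::-1]


def segment(index, text, max_len=4):
    """Run both directions; apply a stand-in rule when they disagree."""
    fwd = forward_match(index, text, max_len)
    bwd = backward_match(index, text, max_len)
    if fwd == bwd:
        return fwd

    # Stand-in rule (not from the paper): fewer words, then fewer
    # single-character words; remaining ties go to the backward result.
    def cost(seg):
        return (len(seg), sum(len(w) == 1 for w in seg))

    return fwd if cost(fwd) < cost(bwd) else bwd


index = build_index(["研究", "研究生", "生命", "起源"])
print(segment(index, "研究生命起源"))
```

On the input 研究生命起源, forward matching greedily takes 研究生 and strands 命 as a single character, while backward matching yields 研究 / 生命 / 起源. The two results disagree, which triggers the rule step. Restricting each lookup to one initial letter and one word length keeps the candidate set small, which is the kind of narrowing the two-level index is meant to provide.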