This paper describes in detail the design and implementation of a Chinese word segmentation system based on word-frequency statistics. The system applies three statistical measures separately: mutual information, the N-gram statistical model, and the t-test. The paper also compares the results produced by the three measures, analyzes the statistical characteristics of each, and identifies the situations for which each is best suited.
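As a rough illustration of two of the measures named above, the following Python sketch scores adjacent character pairs by mutual information and by a t-score. It is not the system described in this paper: the function name `bigram_scores`, the maximum-likelihood probability estimates, and the collocation-style form of the t-score are all assumptions made for the example, and the paper's own formulas may differ.

```python
import math
from collections import Counter


def bigram_scores(text):
    """Score each adjacent character pair (x, y) in `text`.

    Returns a dict mapping (x, y) to (mutual_information, t_score).
    Assumptions (not taken from the paper): probabilities are plain
    relative frequencies, and the t-score is the collocation-style form
    (p_xy - p_x * p_y) / sqrt(p_xy / n).
    """
    chars = [c for c in text if not c.isspace()]
    n = len(chars)
    unigrams = Counter(chars)
    bigrams = Counter(zip(chars, chars[1:]))

    scores = {}
    for (x, y), f_xy in bigrams.items():
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        p_xy = f_xy / n  # simplification: normalize pair counts by n as well
        # Mutual information: how much more often x and y co-occur
        # than they would if they were independent.
        mi = math.log2(p_xy / (p_x * p_y))
        # t-score: observed minus expected co-occurrence, scaled by an
        # estimate of the standard error.
        t = (p_xy - p_x * p_y) / math.sqrt(p_xy / n)
        scores[(x, y)] = (mi, t)
    return scores


if __name__ == "__main__":
    sample = "中文分词系统中文分词方法中文信息"
    for pair, (mi, t) in sorted(bigram_scores(sample).items(),
                                key=lambda kv: -kv[1][0]):
        print("".join(pair), round(mi, 3), round(t, 3))
```

In a segmenter built on such scores, a high value between two adjacent characters would suggest that they belong to the same word, while a low value would suggest a word boundary between them. The threshold separating the two cases would have to be tuned on real corpus data.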