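This paper proposes a domain-independent statistical model for linear text segmentation. Multiple discriminant analysis is used to define four global criterion functions that evaluate a candidate segmentation as a whole, so that the search finds the segmentation that minimizes within-segment distance and maximizes between-segment distance. A genetic algorithm is used to address the model's high computational complexity. Comparative experiments show that the new model achieves better Pμ evaluation scores than the TextTiling and Dotplotting algorithms.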
This paper proposes a new domain-independent statistical model for linear text segmentation. In this model, four multiple discriminant analysis (MDA) criterion functions are defined to evaluate candidate segmentations globally, selecting the best segmentation on the basis of the smallest within-segment distance, the largest between-segment distance, and segment length. Genetic algorithms (GAs) are used to alleviate the high computational complexity that the model introduces. Comparative experiments show that the methods based on the MDA criterion functions achieve higher Pμ values than the TextTiling and Dotplotting algorithms.
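
To make the idea of a global criterion concrete, the following is a minimal sketch of one scatter-based score of the kind the abstract describes. The abstract does not give the four criterion functions, so the ratio used here (between-segment scatter over within-segment scatter), the sentence-vector representation, and the names `scatter_ratio` and `best_single_boundary` are all illustrative assumptions, not the paper's definitions. Segment length is not modelled, and a brute-force scan over a single boundary stands in for the genetic algorithm search.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(sentences: Sequence[Vector], boundaries: Sequence[int]) -> List[Sequence[Vector]]:
    """Cut the sentence list at the given start indices (sorted, each in 1..n-1)."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(sentences)]
    return [sentences[s:e] for s, e in zip(starts, ends)]


def scatter_ratio(sentences: Sequence[Vector], boundaries: Sequence[int]) -> float:
    """Hypothetical criterion: between-segment scatter / within-segment scatter.

    Larger is better: segments are internally tight and far from each other.
    """
    overall = mean(sentences)
    within = 0.0
    between = 0.0
    for seg in split(sentences, boundaries):
        centre = mean(seg)
        within += sum(sq_dist(v, centre) for v in seg)
        between += len(seg) * sq_dist(centre, overall)
    return between / (within + 1e-9)  # small constant avoids division by zero


def best_single_boundary(sentences: Sequence[Vector]) -> int:
    """Exhaustive search over one boundary (a stand-in for the GA search)."""
    return max(range(1, len(sentences)), key=lambda b: scatter_ratio(sentences, [b]))


# Toy document: two sentences about one topic, then two about another.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
best = best_single_boundary(toy)
```

On the toy input, placing the boundary at the topic change scores higher than placing it one sentence earlier or later. With many possible boundaries the number of candidate segmentations grows exponentially, which is the cost the abstract says the genetic algorithm is meant to alleviate.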