Automatic partitioning of Chinese text into sentence groups is very important for discourse-based statistical machine translation systems. This paper presents an approach to the problem. First, each sentence in a discourse is represented as a feature vector. Second, a special hierarchical clustering algorithm is applied to represent the discourse as a sentence group tree. A local reoccurrence measure is proposed for selecting key phrases and estimating their weights. Experimental results show that the approach is promising.
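
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. The abstract does not specify the "special" hierarchical clustering or the local reoccurrence measure, so this sketch substitutes raw term counts for the key-phrase weights and a simple adjacency-constrained agglomerative merge for the clustering. The function names `cosine` and `build_sentence_group_tree` are hypothetical, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (Counters)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sentence_group_tree(sentences):
    """Build a binary tree over sentences by merging adjacent groups.

    `sentences` is a non-empty list of token lists (assumed pre-segmented).
    Leaves are sentence indices; internal nodes are (left, right) tuples.
    Assumption: feature weights are raw term counts, standing in for the
    paper's local reoccurrence measure, which the abstract does not define.
    """
    if not sentences:
        raise ValueError("at least one sentence is required")
    # Step 1: represent each sentence as a feature vector of term counts.
    groups = [(i, Counter(tokens)) for i, tokens in enumerate(sentences)]
    # Step 2: repeatedly merge the most similar pair of adjacent groups,
    # so that every group always covers a contiguous span of the discourse.
    while len(groups) > 1:
        best = max(
            range(len(groups) - 1),
            key=lambda i: cosine(groups[i][1], groups[i + 1][1]),
        )
        (left_tree, left_vec), (right_tree, right_vec) = groups[best], groups[best + 1]
        groups[best:best + 2] = [((left_tree, right_tree), left_vec + right_vec)]
    return groups[0][0]


# Toy discourse: sentences 0-1 share vocabulary, as do sentences 2-3.
toy = [["a", "b"], ["a", "b", "c"], ["x", "y"], ["x", "y", "z"]]
print(build_sentence_group_tree(toy))  # ((0, 1), (2, 3))
```

Restricting merges to adjacent groups is a design choice of this sketch: it keeps each sentence group contiguous, which matches the intuition that a sentence group is a span of neighbouring sentences. Whether the paper imposes the same constraint is not stated in the abstract.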