Sentence ordering is a key technique in multi-document automatic summarization and directly affects the fluency and readability of the summary. Temporal information processing is the bottleneck that limits the quality of ordering algorithms: because accurate temporal information is hard to obtain, traditional sentence ordering strategies have sidestepped the problem, and none of them achieves stable, high-quality ordering. To address this, this paper starts from temporal information processing and first proposes algorithms for temporal information extraction, semantic computation, and temporal reasoning over Chinese text. Building on these algorithms, and drawing on the idea of traditional majority ordering and on sentence similarity computation, it then proposes a sentence ordering algorithm based on temporal information. Experiments show that the proposed algorithm performs markedly better than the classical majority ordering and chronological ordering algorithms.
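To make the two baselines named above concrete, the following is a minimal sketch of greedy majority ordering over a precedence graph, with temporal precedence added as extra votes. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the function name `majority_order`, the `timestamps` mapping, and the `time_weight` parameter are hypothetical, and the way temporal evidence is combined with source-order votes is one simple choice among many.

```python
from collections import defaultdict


def majority_order(themes, source_orders, timestamps=None, time_weight=1):
    """Order themes greedily by precedence votes (illustrative sketch).

    themes        -- list of theme (sentence cluster) identifiers
    source_orders -- list of lists; each gives the order in which one
                     source document mentions (a subset of) the themes
    timestamps    -- hypothetical optional dict: theme -> comparable time
    time_weight   -- hypothetical number of votes a temporal
                     precedence contributes
    """
    # weight[(a, b)] counts the evidence that a should precede b.
    weight = defaultdict(int)

    # One vote per source document in which a appears before b.
    for order in source_orders:
        pos = {t: i for i, t in enumerate(order)}
        for a in themes:
            for b in themes:
                if a != b and a in pos and b in pos and pos[a] < pos[b]:
                    weight[(a, b)] += 1

    # Assumption: a known earlier time adds extra precedence votes.
    if timestamps:
        for a in themes:
            for b in themes:
                if (a in timestamps and b in timestamps
                        and timestamps[a] < timestamps[b]):
                    weight[(a, b)] += time_weight

    remaining = list(themes)
    result = []
    while remaining:
        # Score = outgoing minus incoming evidence among remaining themes.
        def score(t):
            return sum(weight[(t, o)] - weight[(o, t)]
                       for o in remaining if o != t)

        best = max(remaining, key=score)  # ties: first in input order
        result.append(best)
        remaining.remove(best)
    return result


if __name__ == "__main__":
    docs = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
    print(majority_order(["A", "B", "C"], docs))
```

In this sketch, themes without a timestamp are ordered by source-document votes alone, which mirrors the situation described above in which temporal information is only partially available.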