应用图模型来研究多文档自动摘要是当前研究的一个热点,它以句子为顶点,以句子之间相似度为边的权重构造无向图结构.由于此模型没有充分考虑句子中的词项权重信息以及句子所属的文档信息,针对这个问题,该文提出了一种基于词项—句子—文档的三层图模型,该模型可充分利用句子中的词项权重信息以及句子所属的文档信息来计算句子相似度.在DUC'2003和DUC'2004数据集上的实验结果表明,基于词项—句子—文档三层图模型的方法优于LexRank模型和文档敏感图模型.
Graph model has been widely applied to document summarization by using sentence as the graph nodes, and the similarity between sentences as the weights of edge. However, the knowledge of terms and documents are neglected in this model. In this paper, we propose a tri-layer graph model based on the term, the sentence and the documentto make full use of knowledge when computing the similarity of sentences. The experimental results on the data sets of DUC'2003 and DUC'2004 show that the proposed model outperforms the state-of-the-art LexRank model and Document Sensitive Ranking model.