A concept co-occurrence graph model is proposed and applied to automatic multi-document summarization. The model is built on concept counting. It uses WordNet as a semantic resource to disambiguate polysemous words and to merge related concepts. From the co-occurrence information between concepts, it constructs a concept co-occurrence graph and extracts the subject concepts of the multi-document set. It then builds a vector space model from these subject concepts and computes the importance of each sentence. Because the concepts are generalized well, the model can uncover subjects embedded deep in the document set. Evaluation on the DUC2005 data set shows that the method achieves satisfactory results and can be used in practical applications.
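The pipeline described above can be illustrated with a minimal sketch. Several details here are assumptions, because the abstract does not specify them. First, WordNet disambiguation and concept merging are assumed to have already happened upstream, so each sentence arrives as a list of concept IDs written in synset-name style (the IDs below are hypothetical). Second, subject concepts are ranked by weighted degree in the co-occurrence graph. Third, sentence importance is taken as the cosine similarity between a sentence's concept vector and the subject-concept vector. The helper names `build_cooccurrence_graph`, `subject_concepts`, and `score_sentences` are illustrative and do not come from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_cooccurrence_graph(sentences):
    """Undirected weighted graph: edge weight = number of sentences in which
    two concepts co-occur. Assumes concepts are already disambiguated/merged."""
    graph = defaultdict(Counter)
    for concepts in sentences:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def subject_concepts(graph, k):
    """Assumed ranking criterion: weighted degree (sum of edge weights).
    Ties are broken alphabetically; the top k concepts become the topic vector."""
    degree = {c: sum(nbrs.values()) for c, nbrs in graph.items()}
    ranked = sorted(degree, key=lambda c: (-degree[c], c))
    return {c: degree[c] for c in ranked[:k]}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def score_sentences(sentences, k=3):
    """Score each sentence by similarity to the subject-concept vector."""
    graph = build_cooccurrence_graph(sentences)
    topic = subject_concepts(graph, k)
    scores = []
    for concepts in sentences:
        # Sentence vector: term frequency restricted to subject concepts.
        vec = Counter(c for c in concepts if c in topic)
        scores.append(cosine(vec, topic))
    return scores


# Hypothetical concept IDs standing in for WordNet synsets.
sentences = [
    ["storm.n.01", "flood.n.01", "city.n.01"],
    ["storm.n.01", "flood.n.01"],
    ["election.n.01", "city.n.01"],
    ["storm.n.01", "damage.n.01", "flood.n.01"],
]
scores = score_sentences(sentences, k=3)
```

In this toy input, `flood.n.01` and `storm.n.01` co-occur most often, so they dominate the subject vector. The first sentence, which covers all three subject concepts, scores highest, and the sentence about the election scores lowest. A summarizer built on this sketch would pick sentences in descending score order. The paper's actual weighting and redundancy handling may differ.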