To help readers quickly find content of interest in a large volume of news, this paper proposes a method that extracts topic sentences from a single document with a weighted TextRank algorithm, yielding the key event information of a news report. The method first computes the mutual information of the keywords in each sentence, classifies the sentences of the report as event or non-event sentences, and filters out the non-event sentences. Following the idea of TextRank, it then builds a directed graph over the event sentences and introduces three influence factors (sentence position, sentence similarity, and keyword coverage frequency) to compute the influence weight between sentences. The TextRank model is applied to score every node in the graph, and the top-ranked sentences are selected as the topic sentences of the key event. Experimental results show that the proposed method extracts topic sentences more effectively than methods based on term frequency-inverse document frequency (TF-IDF) and methods based on the news title.
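The ranking step can be illustrated with a minimal Python sketch. The abstract does not give the formula that combines the three influence factors, so everything below is an assumption made for illustration: the linear combination with coefficients `alpha`, `beta`, and `gamma`, the position score `1 - j / n`, the cosine similarity over raw term counts, and the helper names `cosine_similarity`, `edge_weights`, and `weighted_textrank`. The sketch also assumes the input sentences are already tokenized and already filtered to event sentences.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca if t in cb)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edge_weights(sentences, keywords, alpha=1.0, beta=1.0, gamma=1.0):
    """Return w[i][j], the weight of the directed edge i -> j.

    Assumed combination (not taken from the paper): a weighted sum of
    similarity, the position score of the target sentence, and the share
    of keywords the target sentence covers.
    """
    n = len(sentences)
    kw = set(keywords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sim = cosine_similarity(sentences[i], sentences[j])
            if sim == 0.0:
                continue  # no edge between sentences that share no terms
            position = 1.0 - j / n  # earlier target sentences score higher
            coverage = len(kw & set(sentences[j])) / len(kw) if kw else 0.0
            w[i][j] = alpha * sim + beta * position + gamma * coverage
    return w


def weighted_textrank(w, d=0.85, tol=1e-6, max_iter=100):
    """Iterate WS(j) = (1 - d) + d * sum_i w[i][j] / out(i) * WS(i)."""
    n = len(w)
    if n == 0:
        return []
    out_sum = [sum(row) for row in w]  # total outgoing weight of each node
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for j in range(n):
            rank = sum(
                w[i][j] / out_sum[i] * scores[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new_scores.append((1 - d) + d * rank)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy input: three tokenized event sentences and a hypothetical keyword list.
sentences = [
    ["earthquake", "strikes", "city", "monday"],
    ["rescue", "teams", "reach", "city", "after", "earthquake"],
    ["officials", "report", "damage", "city"],
]
keywords = ["earthquake", "city", "rescue"]

weights = edge_weights(sentences, keywords)
scores = weighted_textrank(weights)
best = max(range(len(scores)), key=scores.__getitem__)
print("topic sentence:", " ".join(sentences[best]))
```

Because the position and coverage terms depend only on the target sentence, the weight of the edge i -> j generally differs from that of j -> i, which is what makes the graph directed in this sketch. The coefficients would need tuning on labelled data before any comparison with the TF-IDF or title-based baselines.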