To efficiently extract key information on different topics from massive volumes of microblog posts, microblog opinion summarization has recently become a research hot spot in natural language processing. The baseline method of this paper extracts keywords from microblog sentences using the TF-IDF algorithm, computes an importance score for each microblog from these keywords, and selects the opinion summary directly by score. The naive improved method adds a sentiment classification step to the baseline method and uses the semantic distance between microblog sentences to remove candidate summary sentences that are semantically redundant and of lower importance, and then generates the opinion summary. The method based on a semantic graph optimization algorithm extends the naive improved method: it constructs a complete semantic graph from the importance scores of the microblog sentences and the semantic distances between them, and selects the opinion summary with a graph optimization algorithm. On the test dataset of COAE2016 evaluation Task 1, the naive improved method achieved an average ROUGE-1 of 26.39%, an average ROUGE-2 of 0.68%, and an average ROUGE-SU4 of 5.69% over 10 topics; according to the official evaluation results, it obtained the best performance on 6 of the 9 evaluation metrics. In experiments on the COAE2016 sample dataset, the method based on the semantic graph optimization algorithm improved on the naive improved method by 0.63%, 1.51%, and 2.69% in ROUGE-1, ROUGE-2, and ROUGE-SU4, respectively.
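The scoring and redundancy-removal steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the exact importance formula or the semantic distance, so this sketch assumes a length-normalized sum of TF-IDF weights as the importance score and uses Jaccard distance over tokens as a simple stand-in for semantic distance. The function names, the `min_distance` threshold, and the pre-tokenized input are all hypothetical, and the sentiment classification and graph optimization steps are omitted.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Score each tokenized post by the sum of its TF-IDF weights.

    Term frequency is normalized by post length (an assumption; the
    abstract does not give the exact importance formula).
    """
    n = len(docs)
    # Document frequency: number of posts that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        scores.append(
            sum((cnt / len(doc)) * math.log(n / df[tok]) for tok, cnt in tf.items())
        )
    return scores


def jaccard_distance(a, b):
    """Token-set Jaccard distance, a stand-in for semantic distance."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def summarize(docs, k=2, min_distance=0.5):
    """Greedily pick up to k high-scoring posts, skipping near-duplicates.

    A post is skipped when it lies closer than min_distance to a post
    that was already selected, so the lower-scoring post of a redundant
    pair is the one removed.
    """
    scores = tfidf_scores(docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in ranked:
        if len(selected) == k:
            break
        if all(jaccard_distance(docs[i], docs[j]) >= min_distance for j in selected):
            selected.append(i)
    return selected


if __name__ == "__main__":
    posts = [
        ["phone", "battery", "lasts", "long"],
        ["phone", "battery", "lasts", "very", "long"],
        ["screen", "is", "too", "dim"],
        ["phone", "ok"],
    ]
    # Indices of the selected posts, in descending order of importance.
    print(summarize(posts, k=4))
```

In this toy input the first two posts overlap heavily, so only the higher-scoring one survives. A real system would segment Chinese text into words first and would likely replace the Jaccard stand-in with a distance computed from word or sentence embeddings.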