Event Identification within a Topic Based on Key Terms

ISSN: 1000-1239
Journal: Journal of Computer Research and Development (《计算机研究与发展》)
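Classification: TP311 [Automation and Computer Technology / Computer Software and Theory; Automation and Computer Technology / Computer Science and Technology]
Author affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Funding: National Natural Science Foundation of China project (90604025)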

Keywords: event identification, event relation discovery, term committee, topic detection, news organization

Abstract (Chinese version):

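Every day, various media produce a large volume of news stories, so an automated analysis method is needed to present the news to users in a more clearly organized form. Most existing work partitions news into flat topics; however, a topic is not merely a collection of news stories but consists of a series of inter-related events. Because the events within a topic are often very similar to one another, event detection within a topic has poor precision. To overcome these problems, we propose an event detection and relation discovery method based on event term committees: the core terms of each event are mined first, and these core terms are then used for event detection and relation discovery. Experimental results on two Linguistic Data Consortium (LDC) datasets show that the proposed event detection and relation discovery methods significantly improve on existing methods.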
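The first stage (mining each event's core terms as a tight cluster) can be illustrated with a small sketch. The abstract does not say how the paper forms its term committees, so this is one plausible approach, not the authors' algorithm. It assumes each story is already reduced to a set of terms, and the function name `mine_term_committees` and its parameters `min_cooccur`, `max_df` and `min_size` are hypothetical. Terms that appear in most stories of the topic are dropped first, since those topic-wide terms are exactly what makes events within one topic look alike; a committee is then grown greedily so that every pair of its members co-occurs in at least `min_cooccur` stories.

```python
from collections import Counter
from itertools import combinations


def mine_term_committees(stories, min_cooccur=2, max_df=0.6, min_size=2):
    """Greedily mine tight term clusters ("committees") within one topic.

    Hypothetical sketch. stories: list of sets of terms, one set per news story.
    """
    n = len(stories)
    df = Counter(t for terms in stories for t in terms)
    # Drop topic-wide terms: they occur in most stories and cannot separate events.
    vocab = sorted(t for t, c in df.items() if c / n <= max_df)
    keep = set(vocab)

    # Count how many stories share each unordered pair of kept terms.
    pair_counts = Counter()
    for terms in stories:
        for a, b in combinations(sorted(terms & keep), 2):
            pair_counts[(a, b)] += 1

    def tight(a, b):
        return pair_counts[(min(a, b), max(a, b))] >= min_cooccur

    committees, used = [], set()
    for seed in vocab:
        if seed in used:
            continue
        # Grow a clique: a term joins only if it is tight with every current member.
        cluster = [seed]
        for t in vocab:
            if t not in used and t not in cluster and all(tight(t, c) for c in cluster):
                cluster.append(t)
        if len(cluster) >= min_size:
            committees.append(frozenset(cluster))
            used.update(cluster)
    return committees
```

On a toy topic in which every story mentions `quake`, the `max_df` filter removes that shared term, and the remaining co-occurrence pattern yields separate committees such as {donation, fund} and {rescue, trapped}. The greedy clique rule keeps committees small and precise at the cost of recall; a real system would likely need a softer tightness criterion.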

Abstract (English version):

With the overwhelming volume of news stories created and stored electronically every day, there is an increasing need for techniques that analyze news stories and present them to users in a more meaningful manner. Most previous research focuses on organizing news into flat sets of stories (topics). However, a topic in news is more than a mere collection of stories: it is characterized by a definite structure of inter-related events. Unfortunately, identifying events within a topic is very difficult, because stories about the same topic are usually very similar to each other regardless of the events they belong to. To deal with this problem, we propose two methods, based on event key terms, for identifying events and discovering their relations. For event identification, tight term clusters are first captured as the term committees of potential events; these committees are then used to find the core story set of each potential event; finally, each story is assigned to one of the event core story sets. For event relation discovery, the term committees are also used to improve the calculation of story and event similarity. Experimental results on two Linguistic Data Consortium (LDC) datasets show that the proposed methods for event identification and relation discovery significantly outperform previous methods.
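The three identification steps and the committee-based similarity can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: a story is a list of terms; a story joins an event's core set if it shares at least `min_overlap` terms with that event's committee; committee terms are up-weighted by a hypothetical `boost` factor in term-frequency vectors; and each story is assigned to the core set whose summed vector is most similar by cosine. The helper `related_events` simply links events whose weighted vectors are similar; the paper's actual relation model may differ. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_vector(terms, key_terms, boost):
    """Term-frequency vector in which committee (key) terms are multiplied by `boost`."""
    return {t: c * (boost if t in key_terms else 1.0) for t, c in Counter(terms).items()}


def identify_events(stories, committees, min_overlap=2, boost=3.0):
    """Return (core story ids per event, event index per story, event centroids).

    Hypothetical sketch. stories: list of term lists; committees: non-empty list
    of term sets, one per potential event.
    """
    key_terms = set().union(*committees)
    vecs = [weighted_vector(s, key_terms, boost) for s in stories]

    # Step 1: an event's core stories share >= min_overlap terms with its committee.
    cores = [[i for i, s in enumerate(stories) if len(set(s) & set(c)) >= min_overlap]
             for c in committees]

    # Step 2: summarize each core set by the sum (an unnormalized centroid) of its vectors.
    centroids = []
    for core in cores:
        total = Counter()
        for i in core:
            total.update(vecs[i])
        centroids.append(dict(total))

    # Step 3: assign every story to the most similar core set (ties go to the lower index).
    labels = []
    for v in vecs:
        sims = [cosine(v, c) for c in centroids]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return cores, labels, centroids


def related_events(centroids, threshold=0.2):
    """Link each pair of events whose weighted centroids reach `threshold` cosine similarity."""
    n = len(centroids)
    return [(a, b) for a in range(n) for b in range(a + 1, n)
            if cosine(centroids[a], centroids[b]) >= threshold]


if __name__ == "__main__":
    committees = [{"rescue", "trapped"}, {"donation", "fund"}]
    stories = [
        ["quake", "rescue", "trapped", "team"],
        ["quake", "rescue", "trapped"],
        ["quake", "donation", "fund", "charity"],
        ["quake", "donation", "fund"],
        ["quake", "rescue", "team"],   # too few committee terms to be a core story
        ["quake", "fund", "charity"],  # likewise
    ]
    cores, labels, centroids = identify_events(stories, committees)
    print(cores, labels, related_events(centroids))
```

In this toy topic, stories 4 and 5 are not core stories but are still assigned to the rescue and donation events respectively. Because cosine similarity is scale-invariant, summing vectors instead of averaging them does not change the assignment. The `boost` factor matters for relation discovery: the two event centroids share only the topic term `quake`, and their cosine similarity drops from 4/13 (about 0.31) with `boost=1.0` to 4/77 (about 0.05) with `boost=3.0`, so shared topic vocabulary no longer makes distinct events look strongly related.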
