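Media outlets produce a large volume of news reports every day, so an automated analysis method is needed to present news to users in a more clearly organized form. Most existing work partitions news into flat topics; however, a topic is not merely a collection of news stories but is composed of a series of interrelated events. Because the events within a topic are often very similar to one another, event detection within a topic tends to have poor precision. To overcome these problems, an event detection and relation discovery method based on event term committees is proposed: the core terms of each event are mined first and are then used for event detection and relation discovery. Experimental results on two Linguistic Data Consortium (LDC) datasets show that the proposed method significantly improves on existing methods.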
With the overwhelming volume of news stories created and stored electronically every day, there is an increasing need for techniques that analyze news stories and present them to users in a more meaningful manner. Most previous research focuses on organizing news stories into flat collections (topics). However, a news topic is more than a mere collection of stories: it is characterized by a definite structure of inter-related events. Unfortunately, events within a topic are very difficult to identify, because stories about the same topic are usually very similar to each other irrespective of the events they belong to. To deal with this problem, two methods based on event key terms are proposed, one to identify events and one to discover the relations between them. For event identification, tight term clusters are first captured as the term committees of potential events; the committees are then used to find the core story set of each potential event; finally, each story is assigned to one of the event core story sets. For event relation discovery, the term committees are also used to improve the calculation of story and event similarity. Experimental results on two Linguistic Data Consortium (LDC) datasets show that the proposed methods for event identification and relation discovery significantly outperform previous methods.
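The three identification steps described above (term committees, core story sets, story assignment) can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the greedy clustering, the Jaccard measure, the `min_sim` threshold, the rule that a core story must contain every committee term, and all function names and toy stories are assumptions made only for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def term_committees(stories, min_sim=0.6):
    """Step 1 (assumed greedy variant): group terms that occur in nearly the same stories."""
    occ = defaultdict(set)  # term -> ids of the stories containing it
    for sid, terms in enumerate(stories):
        for t in terms:
            occ[t].add(sid)
    # Terms present in every story describe the topic, not a single event.
    candidates = sorted(t for t in occ if len(occ[t]) < len(stories))
    committees = []
    for t in candidates:
        for c in committees:
            # Tightness: t must be similar to every term already in the committee.
            if all(jaccard(occ[t], occ[m]) >= min_sim for m in c):
                c.append(t)
                break
        else:
            committees.append([t])
    return [c for c in committees if len(c) > 1]  # drop singleton clusters


def core_story_sets(stories, committees):
    """Step 2 (assumed rule): a core story contains every term of the committee."""
    return [[i for i, s in enumerate(stories) if set(c) <= s] for c in committees]


def assign_stories(stories, cores):
    """Step 3: give each story the event whose core stories it resembles most."""
    labels = []
    for terms in stories:
        scores = [
            sum(jaccard(terms, stories[i]) for i in core) / len(core) if core else 0.0
            for core in cores
        ]
        labels.append(max(range(len(cores)), key=scores.__getitem__) if cores else -1)
    return labels


# Toy topic (invented data): five stories about one earthquake, two events.
stories = [
    {"earthquake", "magnitude", "collapse"},
    {"earthquake", "magnitude", "collapse", "city"},
    {"earthquake", "rescue", "survivors"},
    {"earthquake", "rescue", "survivors", "city"},
    {"earthquake", "rescue", "teams"},
]
committees = term_committees(stories)
cores = core_story_sets(stories, committees)
labels = assign_stories(stories, cores)
print(committees)  # [['collapse', 'magnitude'], ['rescue', 'survivors']]
print(labels)      # [0, 0, 1, 1, 1]
```

In this toy topic the term "earthquake" appears in every story and therefore cannot separate events, which mirrors the difficulty noted above; the committee terms are what distinguish the two events, and the last story, which matches no committee completely, is still assigned through its similarity to a core story set.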