东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

一种面向突发事件的文本语料自动标注方法

ISSN号：1003-0077
期刊名称：《中文信息学报》
时间：0
分类：TP391[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术]
作者机构：上海大学计算机工程与科学学院,上海200444
相关基金：国家自然科学基金（61305053）;国家自然科学基金（61273328）

关键词：突发事件, 语料库, 自动标注, emergency events, corpus, automatic, annotation

中文摘要：

事件语料库是研究语义Web中事件知识的抽取、表示、推理和挖掘的基础和关键技术之一。该文以事件作为文本知识单元,在LTP分析的基础上,用序列模式挖掘算法PrefixSpan从现有的小规模语料库中挖掘事件要素的词性规则等,用同义词词林（扩展版）对触发词表进行了扩充,结合自定义的事件要素词典,采用多遍过滤、逐遍完善的思想提出一种针对大规模突发事件语料库构建的自动标注方法,在实验部分不仅与人工标注做了对比,同时与Stanford CoreNLP NER进行了对比,实验效果理想。

英文摘要：

Event-based text corpus is the foundation for the research on detection, representation, reasoning and exploitation of events in the Semantic Web. This paper proposes an automatic-annotation method for event-based texts to construct large-scale emergencies news corpus. Firstly, this paper presents an event structure model as eventbased knowledge unit; Secondly, on the basis of text process by LTP , we apply the PrefixSpan to mine the rules of event elements from small-scale available corpus; Thirdly, by combining a customized dictionary of event elements, the denoters are expanded by Tonyici Cilin （Extended）. In the experiment, the automatic annotation method is compared with manual tagging method and Stanford CoreNLP NER, showing that this method can improve the efficiency of event-based text annotation effectively.

同期刊论文项目