The importance of extracting and processing temporal information in event news text for event research was explained, and the extraction and normalization of temporal expressions in news reports on security events were studied. Current temporal expression extraction is mainly based on the TIMEX2/TIMEX3 annotation standards and machine learning, so the temporal information obtained lacks a fully unified form and is difficult to use directly in scenarios such as the discovery and analysis of public opinion on security events. To address this problem, a method for temporal expression extraction and normalization for news reports on security events was proposed. The method first extracts the different forms of temporal expressions from security event news separately, according to a classification of temporal expressions. It then uses six temporal normalization operators and a time conflict processing operator to output each time in a unified format of year, month, day, hour, minute and second. Experiments showed that the extraction results of the proposed method differ little from those obtained with conditional random fields (CRF), and that the accuracy of temporal normalization was above 90%.
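The extract-then-normalize idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the time categories or the six normalization operators, so the three patterns below (`ABSOLUTE`, `RELATIVE_DAY`, `CLOCK`), the `normalize` function and the `anchor` parameter (standing in for the report's publication time) are all hypothetical. The conflict rule, in which an explicit date overrides a relative word, is likewise only an assumed stand-in for the time conflict processing operator.

```python
import re
from datetime import datetime, timedelta

# Hypothetical categories; the paper's actual classification is not given in the abstract.
ABSOLUTE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")      # e.g. 2015年3月5日
RELATIVE_DAY = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1}    # day offsets from the anchor
CLOCK = re.compile(r"(上午|下午)?(\d{1,2})[时点](?:(\d{1,2})分)?")  # e.g. 下午3时20分


def normalize(text, anchor):
    """Return 'YYYY-MM-DD HH:MM:SS' for the time found in text, or None.

    anchor is an assumed reference time, such as the publication time.
    """
    anchor_day = datetime(anchor.year, anchor.month, anchor.day)
    date = None

    # Assumed conflict rule: an explicit date wins over a relative word.
    m = ABSOLUTE.search(text)
    if m:
        year, month, day = map(int, m.groups())
        date = datetime(year, month, day)
    else:
        for word, offset in RELATIVE_DAY.items():
            if word in text:
                date = anchor_day + timedelta(days=offset)
                break

    clock = CLOCK.search(text)
    if date is None and clock is None:
        return None
    if date is None:
        # A clock time with no date is resolved against the anchor day.
        date = anchor_day

    if clock:
        period, hour, minute = clock.groups()
        hour, minute = int(hour), int(minute or 0)
        if period == "下午" and hour < 12:
            hour += 12
        # Ignore clock values that are out of range rather than raising.
        if hour <= 23 and minute <= 59:
            date = date.replace(hour=hour, minute=minute)

    return date.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    published = datetime(2015, 3, 10, 8, 0)
    print(normalize("2015年3月5日下午3时20分发生爆炸", published))
    print(normalize("昨天上午10点发生火灾", published))
```

Whatever the input form, the result is one fixed-width string, which is the property the abstract identifies as missing from TIMEX2/TIMEX3-style output.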