Temporal relation recognition has become a research focus in natural language processing (NLP) in recent years. This paper identifies temporal relations using two semantic units, temporal segments and topic segments, both of which are coarser-grained than event trigger words. First, temporal segments are recognized in the text using temporal discourse features. Then, topic segments are recognized by combining similarity computation between paragraphs with a support vector machine (SVM) model. Finally, within each topic segment, temporal segments are taken as the objects to be ordered, and a maximum entropy classifier identifies the temporal relations between adjacent temporal segments. In experiments on the Chinese corpus of TempEval-2010, the macro-average precision of temporal relation recognition was 60.09%. The results show that introducing temporal segments effectively reduces unnecessary recognition of temporal relations between events, and that, under the scope constraint of topic segments, the resulting temporal relations are more concise and semantically more coherent.
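For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-average precision over relation labels can be computed. It is not the authors' evaluation script: the function name `macro_precision`, the example labels, and the convention that a label never predicted contributes a precision of 0 are all assumptions made for illustration.

```python
from collections import defaultdict


def macro_precision(gold, predicted):
    """Return (macro-average precision, per-label precision).

    Assumption for this sketch: a label that is never predicted
    gets precision 0.0 rather than being skipped.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labelled instance is required")

    true_pos = defaultdict(int)    # correct predictions per label
    pred_count = defaultdict(int)  # total predictions per label
    for g, p in zip(gold, predicted):
        pred_count[p] += 1
        if g == p:
            true_pos[p] += 1

    per_label = {}
    for label in sorted(set(gold) | set(predicted)):
        n = pred_count[label]
        per_label[label] = true_pos[label] / n if n else 0.0

    # Macro-averaging weights every label equally, regardless of frequency.
    macro = sum(per_label.values()) / len(per_label)
    return macro, per_label


# Hypothetical relation labels, used only to exercise the function.
gold = ["BEFORE", "AFTER", "OVERLAP", "BEFORE", "AFTER", "OVERLAP"]
pred = ["BEFORE", "AFTER", "BEFORE", "BEFORE", "OVERLAP", "OVERLAP"]
macro, per_label = macro_precision(gold, pred)
print(f"macro-average precision: {macro:.4f}")
```

Because every label counts equally in the average, a rare relation type that is classified poorly lowers the score as much as a frequent one would, which is worth keeping in mind when reading a single macro-averaged figure.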