This paper proposes a new method for recognizing Chinese time expressions based on heuristic error-driven learning. The method first uses dependency parsing, starting from time trigger words, to recognize time expressions recursively, which effectively addresses the problem of long-distance dependencies and substantially improves recognition. On this basis, it compares erroneous recognition results with the manual annotation and performs error-driven learning with a heuristic A* search strategy. This reduces the complexity of rule learning and makes it possible to distinguish the validity of each rule and the compatibility among rules, improving system performance by nearly 6%. Finally, the F-measure reaches 77.96% on the closed test set and 77.92% on the open test set.
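The recursive step can be illustrated with a minimal sketch. It assumes a dependency parse is already available as a list of tokens with head indices. The `Token` class, the `expand_from_trigger` function, the `is_time_like` filter, and the toy parse below are all hypothetical and are not taken from the paper, whose actual parser and rules are not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Token:
    index: int  # position in the sentence
    text: str   # surface form
    head: int   # index of the head token, -1 for the root


def expand_from_trigger(
    tokens: List[Token],
    trigger_index: int,
    allowed: Callable[[Token], bool],
) -> List[int]:
    """Collect the trigger word and, recursively, every dependent
    that passes the `allowed` filter (an assumed stand-in for rules)."""
    # Build a head -> dependents map once.
    children: Dict[int, List[int]] = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok.index)

    collected: Set[int] = set()

    def visit(i: int) -> None:
        collected.add(i)
        for child in children.get(i, []):
            if allowed(tokens[child]):
                visit(child)

    visit(trigger_index)
    return sorted(collected)


def is_time_like(tok: Token) -> bool:
    # Hypothetical filter: keep tokens ending in a year/month/day character.
    return tok.text[-1] in "年月日"


# Toy parse (assumed, not produced by a real parser):
# each date unit depends on the unit to its right; 晚上 depends on the verb.
tokens = [
    Token(0, "2008年", 1),
    Token(1, "8月", 2),
    Token(2, "8日", 3),
    Token(3, "晚上", 4),   # treated here as the time trigger word
    Token(4, "开幕", -1),
]

span = expand_from_trigger(tokens, 3, is_time_like)
expression = "".join(tokens[i].text for i in span)
```

Because the expansion follows dependency arcs rather than a fixed window, a modifier several tokens away from the trigger is still reached as long as each step along the chain passes the filter.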