A Person Name Recognition System for the Person Name Disambiguation Task

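Journal: 中文信息学报 (Journal of Chinese Information Processing)
Date: 0
Pages: 17-22
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] Natural Language Processing Laboratory, Northeastern University, Shenyang, Liaoning 110819
Funding: National Natural Science Foundation of China (61073140, 61100089); Fundamental Research Funds for the Central Universities (N110404012); Specialized Research Fund for the Doctoral Program of Higher Education (20100042110031)
Related project: Research on Key Technologies for Text Opinion Orientation Analysis and Mining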

Keywords: Chinese syntactic parsing, shift-reduce parsing, Berkeley parser, uptraining, unlabeled data

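Chinese abstract (translated):

Shift-reduce parsers run in time linear in sentence length, which makes them especially practical for large-scale parsing tasks. However, current shift-reduce parsers are far less accurate than the best parsers in the field, such as the Berkeley parser. This paper studies how to use uptraining and unlabeled data to improve a shift-reduce parser so that its performance comes as close as possible to that of the Berkeley parser. We first apply the Berkeley parser to automatically analyze large-scale unlabeled data, then use the resulting automatically labeled data as additional training data to improve the POS tagger and the shift-reduce parser. Experiments show that uptraining with unlabeled data improves shift-reduce parsing performance by 2.3 percentage points, to 82.4%, which is comparable to the Berkeley parser. The final parser also has a clear speed advantage: it runs 7 times as fast as the Berkeley parser.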
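
The uptraining procedure described in this abstract can be sketched in a few lines. This is a toy illustration only, not the authors' system: `teacher_tag` is a hypothetical stand-in for the slow, accurate Berkeley parser, `UnigramTagger` is a hypothetical stand-in for the fast student model (the POS tagger or shift-reduce parser), and the tiny data sets are invented for the example.

```python
from collections import Counter, defaultdict


class UnigramTagger:
    """Hypothetical fast 'student': tags each word with its most frequent training tag."""

    def __init__(self, default_tag="NN"):
        self.default_tag = default_tag
        self.best_tag = {}

    def train(self, tagged_sentences):
        counts = defaultdict(Counter)
        for sentence in tagged_sentences:
            for word, tag in sentence:
                counts[word][tag] += 1
        # Keep only the most frequent tag seen for each word.
        self.best_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tag(self, words):
        # Unseen words fall back to the default tag.
        return [(w, self.best_tag.get(w, self.default_tag)) for w in words]


def teacher_tag(words):
    """Hypothetical slow, accurate 'teacher' (stand-in for the Berkeley parser)."""
    lexicon = {"dogs": "NN", "cats": "NN", "bark": "VV", "sleep": "VV", "run": "VV"}
    return [(w, lexicon.get(w, "NN")) for w in words]


def uptrain(student, gold, unlabeled, teacher):
    """Train the student on gold data plus teacher-labeled unlabeled data."""
    auto_labeled = [teacher(words) for words in unlabeled]
    student.train(gold + auto_labeled)
    return student


# Invented toy data: one gold sentence and two unlabeled sentences.
gold = [[("dogs", "NN"), ("bark", "VV")]]
unlabeled = [["cats", "sleep"], ["dogs", "run"]]

baseline = UnigramTagger()
baseline.train(gold)
uptrained = uptrain(UnigramTagger(), gold, unlabeled, teacher_tag)

baseline_tags = baseline.tag(["cats", "run"])
uptrained_tags = uptrained.tag(["cats", "run"])
print(baseline_tags)
print(uptrained_tags)
```

In this toy setting the baseline has never seen "run" and falls back to the default tag, while the uptrained student has learned it from the teacher's automatic labels. The student stays as cheap to run as before; only its training data grew.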

English abstract:

In practical applications such as parsing the Web, the shift-reduce parser is often preferred because of its linear time complexity. To bring it closer to publicly available state-of-the-art parsers, this paper adopts the uptraining approach to improve the shift-reduce parser. The basic idea of uptraining is to apply a high-accuracy parser (the Berkeley parser in this paper) to automatically analyze unlabeled data, and then to use the newly labeled data as additional training data for building a POS tagger and the shift-reduce parser. Experimental results on the Penn Chinese Treebank show that the approach improves shift-reduce parsing to 82.4% (an absolute improvement of 2.3%), which is comparable to the Berkeley parser on the same data and outperforms other state-of-the-art parsers.
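
The linear-time property can be made concrete with a small sketch. Assuming strictly binary trees with no unary rules (a simplification; the parser in the paper may use a richer action set), a shift-reduce derivation of an n-word sentence takes exactly n shifts and n-1 reduces, that is, 2n-1 actions. The tree encoding and function names below are hypothetical, chosen only for illustration.

```python
def oracle_actions(tree):
    """Return the shift-reduce action sequence that builds a binary tree.

    A tree is either a word (str) or a tuple (label, left, right).
    """
    if isinstance(tree, str):
        return [("shift", tree)]
    label, left, right = tree
    return oracle_actions(left) + oracle_actions(right) + [("reduce", label)]


def run_actions(actions):
    """Replay actions on a stack; each action does a constant amount of work."""
    stack = []
    for kind, value in actions:
        if kind == "shift":
            stack.append(value)
        else:  # reduce: combine the top two stack items under the given label
            right = stack.pop()
            left = stack.pop()
            stack.append((value, left, right))
    assert len(stack) == 1, "a complete derivation leaves exactly one tree"
    return stack[0]


# Invented example sentence with a hand-written binary tree.
tree = ("S", ("NP", "the", "dog"), ("VP", "chased", ("NP", "the", "cat")))
actions = oracle_actions(tree)
rebuilt = run_actions(actions)
n_words = sum(1 for kind, _ in actions if kind == "shift")
print(len(actions), n_words)  # 9 5
```

Here the action sequence is read off a known tree. In a real parser a classifier chooses each action, but with greedy decoding the number of steps still grows linearly with sentence length.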
