

Chunking Based on a Divide-and-Conquer Strategy

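ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (中文信息学报)
Year: 2012
Pages: 120-128
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Affiliation: [1] Knowledge Engineering Center, Shenyang Aerospace University, Shenyang, Liaoning 110136
Funding: National Natural Science Foundation of China (60842005)
Related project: Research on Optimization Techniques for Feature Transfer in Latent Semantic Analysis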

Keywords: Chinese chunking, divide-and-conquer, complete syntactic parsing, maximal noun phrase, conditional random fields, support vector machines

Abstract (translated from the Chinese):

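The main task of chunking is to identify and delimit chunks, which to some extent simplifies syntactic parsing. To address the difficulties that chunking meets on long sentences, this paper proposes a chunking method based on a divide-and-conquer strategy. The basic idea is to first recognize the maximal noun phrases (MNPs) in a sentence and, based on the recognition result, decompose the sentence into MNP parts and a sentence-frame part. Each kind of analysis unit is then processed with a different model, and the results are combined to complete the chunking of the whole sentence. Because the full sentence is decomposed into smaller chunking units, sentence complexity is reduced. Experiments on the Penn Chinese Treebank (CTB4) data set show an average F1 of 91.79% across chunk types, which outperforms other current chunking methods.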
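The decompose-then-recombine procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the MNP spans are assumed to be given (in the paper they come from a preceding MNP recognition step), the two chunkers are hypothetical lexicon lookups standing in for the trained models (the keywords mention conditional random fields and support vector machines), and the placeholder token `<MNP>` and the rule for discarding frame chunks that cover an MNP are assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[str, int, int]  # (label, start, end), end exclusive
Chunker = Callable[[List[str]], List[Chunk]]

MNP_PLACEHOLDER = "<MNP>"  # hypothetical token standing in for a whole MNP in the frame


def split_sentence(tokens: List[str], mnp_spans: List[Tuple[int, int]]):
    """Divide step: collapse each MNP span (start, end) into one placeholder.

    Returns the sentence frame, a map from frame positions back to original
    (start, end) token spans, and the MNP units with their start offsets.
    Spans are assumed to be non-overlapping.
    """
    frame: List[str] = []
    frame_map: List[Tuple[int, int]] = []
    units: List[Tuple[int, List[str]]] = []
    i = 0
    for start, end in sorted(mnp_spans):
        for j in range(i, start):  # ordinary tokens before this MNP
            frame.append(tokens[j])
            frame_map.append((j, j + 1))
        frame.append(MNP_PLACEHOLDER)
        frame_map.append((start, end))
        units.append((start, tokens[start:end]))
        i = end
    for j in range(i, len(tokens)):  # ordinary tokens after the last MNP
        frame.append(tokens[j])
        frame_map.append((j, j + 1))
    return frame, frame_map, units


def make_lexicon_chunker(lexicon: Dict[str, str]) -> Chunker:
    """Hypothetical stand-in for a trained model: consecutive tokens with the
    same lexicon label form one chunk; unknown tokens stay outside any chunk."""
    def chunk(tokens: List[str]) -> List[Chunk]:
        chunks: List[Chunk] = []
        i = 0
        while i < len(tokens):
            label = lexicon.get(tokens[i])
            if label is None:
                i += 1
                continue
            j = i + 1
            while j < len(tokens) and lexicon.get(tokens[j]) == label:
                j += 1
            chunks.append((label, i, j))
            i = j
        return chunks
    return chunk


def divide_and_conquer_chunk(tokens: List[str], mnp_spans: List[Tuple[int, int]],
                             frame_chunker: Chunker, mnp_chunker: Chunker) -> List[Chunk]:
    """Conquer and combine: chunk the frame and each MNP separately, then merge."""
    frame, frame_map, units = split_sentence(tokens, mnp_spans)
    result: List[Chunk] = []
    for label, fs, fe in frame_chunker(frame):
        if MNP_PLACEHOLDER in frame[fs:fe]:
            continue  # assumption: chunks inside an MNP come only from the MNP model
        result.append((label, frame_map[fs][0], frame_map[fe - 1][1]))
    for start, unit in units:
        for label, s, e in mnp_chunker(unit):
            result.append((label, start + s, start + e))  # shift back to sentence offsets
    return sorted(result, key=lambda c: c[1])


tokens = ["该文", "提出", "了", "一种", "新", "的", "组块", "分析", "方法"]
mnp_spans = [(0, 1), (3, 9)]
frame_chunker = make_lexicon_chunker({"提出": "VP", "了": "VP"})
mnp_chunker = make_lexicon_chunker(
    {"该文": "NP", "一种": "QP", "新": "ADJP", "组块": "NP", "分析": "NP", "方法": "NP"}
)
print(divide_and_conquer_chunk(tokens, mnp_spans, frame_chunker, mnp_chunker))
# → [('NP', 0, 1), ('VP', 1, 3), ('QP', 3, 4), ('ADJP', 4, 5), ('NP', 6, 9)]
```

Collapsing each MNP into a single token shortens the sequence the frame model has to label (here from nine tokens to four), which is where the reduction in sentence complexity comes from.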

Abstract (English):

Chunking includes the identification and labeling of chunks, and it reduces the difficulty of complete syntactic parsing by segmenting a sentence into smaller chunks. To reduce the complexity of chunking long sentences, this paper describes a divide-and-conquer strategy. The basic idea of the method is to first recognize the maximal noun phrases (MNPs) in a full sentence, and then identify the chunks within the MNPs and within the sentence frame that remains once the MNPs are removed. Experiments carried out on the UPenn Chinese Treebank-4 (CTB4) data set show that the best overall F1 score for Chinese chunking is 91.79%, which is higher than the performance of state-of-the-art machine learning models.
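Chunking F1 is conventionally computed at the chunk level: a predicted chunk counts as correct only if both its boundaries and its label match a gold chunk, with precision P = correct / predicted, recall R = correct / gold, and F1 = 2PR / (P + R). The sketch below assumes BIO-encoded tags (`B-NP`, `I-NP`, `O`, ...) and micro-averages over all chunks. The abstract does not state the exact evaluation protocol, and the Chinese abstract describes 91.79% as an average across chunk types, which may be computed differently from this overall figure.

```python
from typing import List, Set, Tuple


def bio_to_chunks(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Convert BIO tags to a set of (label, start, end) spans, end exclusive."""
    chunks: Set[Tuple[str, int, int]] = set()
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a chunk at sentence end
        closes = tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if closes:
            if label is not None:
                chunks.add((label, start, i))
                label, start = None, None
            if tag != "O":
                # B-X starts a chunk; an I-X with a new label is leniently treated as a start too
                label, start = tag[2:], i
    return chunks


def chunk_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match chunks across sentences."""
    correct = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_chunks(g_tags), bio_to_chunks(p_tags)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-NP", "I-NP", "B-VP", "O", "B-NP"]]
pred = [["B-NP", "I-NP", "B-VP", "O", "B-QP"]]
# Two of three predicted chunks match, so precision, recall and F1 are all 2/3 (up to rounding).
print(chunk_f1(gold, pred))
```

A per-type average, as the Chinese abstract's wording suggests, would instead compute F1 separately for each label and take their mean, which weights rare chunk types more heavily than the micro average does.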
