Automatic Labeling of Semantic Roles in Chinese FrameNet

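Journal: 软件学报 (Journal of Software)
Date: 0
Pages: 597-611
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Computer Center, Shanxi University, Taiyuan, Shanxi 030006; [2] School of Mathematical Sciences, Shanxi University, Taiyuan, Shanxi 030006; [3] School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006
Funding: Supported by the National Natural Science Foundation of China under Grant No.60873128; the National High-Tech Research and Development Plan of China (863 Program) under Grant No.2006AA01Z142
Related project: Research on automatic labeling techniques for Chinese frame semantic roles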

Keywords: Chinese FrameNet, semantic role labeling, orthogonal array, feature selection, conditional random fields

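Chinese abstract:

Based on the Chinese FrameNet (CFN) semantic knowledge base developed at Shanxi University, this work casts semantic role labeling as a word-sequence tagging problem via the IOB strategy and uses a conditional random field model to study the automatic labeling of Chinese frame semantic roles. The model takes the word as the basic tagging unit and uses as features the word, its part of speech, its position relative to the target word, the target word, and combinations of these. Several candidate windows are set for each feature, and their combinations form the model's feature templates; a method for selecting a good template is given, based on orthogonal arrays from statistics. All experiments are run on a corpus of 6,692 example sentences from 25 selected frames. For each frame, a separate model is trained on that frame's example sentences, boundary identification and classification of semantic roles are performed simultaneously, and 2-fold cross-validation is used. Given the target word in a sentence and the frame to which the target word belongs, cross-validation over the 25 frames yields precision, recall, and F1 of 74.16%, 52.70%, and 61.62%, respectively.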
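The IOB conversion described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the example sentence, the role names (Buyer, Time, Goods), the span format, and both function names are assumptions made for this sketch, and the signed-offset encoding of the position feature is only one plausible choice.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, role_name), indices over words.
Span = Tuple[int, int, str]


def spans_to_iob(n_words: int, spans: List[Span]) -> List[str]:
    """Turn role spans into one IOB tag per word (B-role, I-role, or O)."""
    tags = ["O"] * n_words
    for start, end, role in spans:
        if not 0 <= start < end <= n_words:
            raise ValueError(f"span out of range: {(start, end, role)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span overlaps an earlier one: {(start, end, role)}")
        tags[start] = "B-" + role          # first word of the role
        for i in range(start + 1, end):
            tags[i] = "I-" + role          # remaining words of the role
    return tags


def relative_positions(n_words: int, target_index: int) -> List[int]:
    """Signed offset of each word from the target word (assumed encoding)."""
    return [i - target_index for i in range(n_words)]


# Illustrative segmented sentence: "I bought a book yesterday"; target word is "买".
words = ["我", "昨天", "买", "了", "一", "本", "书"]
spans = [(0, 1, "Buyer"), (1, 2, "Time"), (4, 7, "Goods")]  # hypothetical roles

tags = spans_to_iob(len(words), spans)
positions = relative_positions(len(words), target_index=2)

for word, pos, tag in zip(words, positions, tags):
    print(word, pos, tag)
```

Each output line is one row of a sequence-labeling training file: the word, a feature column, and the tag the model has to predict. Because the tag encodes both where a role starts and which role it is, a single tagging pass covers boundary identification and classification at once.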

English abstract:

Based on the Chinese FrameNet (CFN) semantic knowledge base developed at Shanxi University, automatic labeling of Chinese FrameNet semantic roles is cast as a word-level sequence tagging problem by applying the IOB (inside/outside/begin) strategy to the example sentences in the CFN corpus, and a Conditional Random Fields (CRF) model is adopted. The basic unit of tagging is the word. The word, its part of speech, its position relative to the target word, the target word, and their combinations are chosen as features. Feature templates are formed by choosing among several candidate window sizes for each feature, and orthogonal arrays from statistics are used to select a good template. All experiments are based on 6,692 example sentences from 25 frames selected from the CFN corpus. A separate model is trained for each frame on its example sentences with 2-fold cross-validation, and the identification and classification of semantic roles are performed simultaneously. With the target word in a sentence and its frame name given, precision, recall, and F1-measure over all 25 frames are 74.16%, 52.70%, and 61.62%, respectively.
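The orthogonal-array idea behind the template selection can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: the abstract does not say which array or which candidate windows were used, so the standard L9(3^4) array, the four factor names, and the three windows per factor are hypothetical. The point is only that 9 balanced runs can stand in for all 3^4 = 81 window combinations.

```python
from itertools import combinations, product

# Standard L9(3^4) orthogonal array, levels written 0..2 (assumed choice of array).
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# Hypothetical candidate windows (left offset, right offset) for each feature.
WINDOWS = {
    "word":        [(0, 0), (-1, 1), (-2, 2)],
    "pos":         [(0, 0), (-1, 1), (-2, 2)],
    "position":    [(0, 0), (-1, 1), (-2, 2)],
    "target_word": [(0, 0), (-1, 1), (-2, 2)],
}


def is_orthogonal(array, levels):
    """True if every pair of columns contains each level pair equally often."""
    expected = set(product(range(levels), repeat=2))
    for a, b in combinations(range(len(array[0])), 2):
        pairs = [(row[a], row[b]) for row in array]
        counts = {pair: pairs.count(pair) for pair in expected}
        if set(pairs) != expected or len(set(counts.values())) != 1:
            return False
    return True


def templates_from_array(array, windows):
    """Map each array row to one template: a window choice per feature."""
    names = list(windows)
    return [
        {name: windows[name][level] for name, level in zip(names, row)}
        for row in array
    ]


templates = templates_from_array(L9, WINDOWS)
print(is_orthogonal(L9, levels=3), len(templates))
for template in templates:
    print(template)
```

In a setup like this, one model would be trained and scored per template, and the average score at each level of each factor would point to the window worth keeping for that feature.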
