Research on Chinese Word Sense Disambiguation with an Ensemble of Multiple Classifiers

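ISSN: 1000-1239
Journal: 计算机研究与发展 (Journal of Computer Research and Development)
Date: 0
Pages: 1354-1361
Language: Chinese
Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; [2] School of Software and Microelectronics, Peking University, Beijing 102600
Funding: National Natural Science Foundation of China (60703063); National Social Science Foundation of China (08CYY016); National High Technology Research and Development Program of China (863 Program) (2007AA012198); National Basic Research Program of China (973 Program) (2004CB318102)
Related project: Research on the automatic construction of a large-scale word-sense-annotated corpus based on distinctive word features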

Keywords: word sense disambiguation, ensemble of classifiers, average, max

Chinese abstract (translated):

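Word sense disambiguation has long been both a hot topic and a hard problem in natural language processing, and ensemble methods are regarded as one of the four major trends in machine learning research. This paper systematically studies the application of nine ensemble learning methods to Chinese word sense disambiguation. The nine methods are the product rule, average, max, min, majority voting, rank-based voting, weighted voting, probability weighting, and best-single-classifier combination; of these, the product rule, average, and max had not previously been applied to word sense disambiguation. Support vector machine (SVM), naive Bayes, and decision tree models are chosen as the three component classifiers. Experiments are run on two different datasets: the first consists of 18 polysemous words selected from a Modern Chinese semantically annotated corpus, and the second is the multilingual Chinese-English lexical sample task of SemEval-2007. The results show that the three ensemble methods introduced to word sense disambiguation for the first time (product, average, and max) perform well: all three achieve higher disambiguation accuracy than the best single classifier, SVM, and they also outperform the other six ensemble methods.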

English abstract:

Word sense disambiguation has long been a central concern in natural language processing, and ensembles of classifiers are one of the four current directions in machine learning research. This paper presents a systematic study of classifier ensembles for Chinese word sense disambiguation. Nine combining strategies are evaluated: product, average, max, min, majority voting, rank-based voting, weighted voting, weighted probability, and best single combining. Of these, product, average, and max had not been applied to word sense disambiguation in previous work. Support vector machine (SVM), naive Bayes, and decision tree are selected as the three component classifiers. Four kinds of features are used in all three classifiers: bag of words, words with position, parts of speech with position, and 2-gram collocations. Experiments are conducted on two datasets: the first consists of 18 ambiguous words selected from a Chinese semantic corpus, and the second is the multilingual Chinese-English lexical sample task at SemEval-2007. The experimental results show that the average, product, and max strategies, applied here for the first time to Chinese word sense disambiguation, exceed the accuracy of the best single classifier (support vector machine) and also outperform the other six combining methods.
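
Illustrative sketch (not from the paper): the abstract names the combining rules but does not give their formulas, so the Python below assumes the common textbook definitions of five of them (product, average, max, min, majority voting) applied to per-classifier sense probabilities. The function name `combine`, the tie-breaking choice, and the example probabilities are all hypothetical; rank-based voting, the weighted variants, and best single combining are omitted because they would need details the abstract does not state.

```python
from collections import Counter
from math import prod

# Score-level rules: each reduces the K classifier probabilities for one sense
# to a single score. These are assumed textbook definitions, not the paper's.
REDUCERS = {
    "product": prod,
    "average": lambda xs: sum(xs) / len(xs),
    "max": max,
    "min": min,
}


def combine(posteriors, rule):
    """Return the index of the sense selected by the given combining rule.

    posteriors[k][j] is classifier k's probability for sense j.
    rule is "product", "average", "max", "min", or "majority".
    Ties are broken in favour of the lowest sense index (an arbitrary choice).
    """
    n_senses = len(posteriors[0])
    if rule == "majority":
        # Each classifier casts one vote for its own highest-scoring sense.
        votes = [max(range(n_senses), key=p.__getitem__) for p in posteriors]
        counts = Counter(votes)
        return max(sorted(counts), key=counts.__getitem__)
    reduce_scores = REDUCERS[rule]
    scores = [reduce_scores([p[j] for p in posteriors]) for j in range(n_senses)]
    return max(range(n_senses), key=scores.__getitem__)


# Hypothetical outputs of three classifiers for one two-sense ambiguous word.
example = [
    [0.9, 0.1],  # e.g. SVM, very confident in sense 0
    [0.4, 0.6],  # e.g. naive Bayes, leaning towards sense 1
    [0.3, 0.7],  # e.g. decision tree, leaning towards sense 1
]

for rule in ("product", "average", "max", "min", "majority"):
    print(rule, combine(example, rule))
```

On these made-up numbers the four score-level rules select sense 0, because one classifier is very confident, while majority voting selects sense 1, because two of the three classifiers prefer it. This shows only how the rules can disagree; it says nothing about which rule is more accurate in the paper's experiments.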
