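Combining Classifiers Based on an Error-Driven Algorithm and Its Application to Question Classification
  • ISSN: 1000-1239
  • Journal: Journal of Computer Research and Development (《计算机研究与发展》)
  • Date: 0
  • Classification: TP391 [Automation and Computer Technology - Computer Application Technology; Automation and Computer Technology - Computer Science and Technology]
  • Author affiliation: [1] Department of Computer Science, Fudan University, Shanghai 200433
  • Funding: National Natural Science Foundation of China (60435020, 60503070)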
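Abstract (translated from the Chinese):

Open-domain question answering (QA) systems can give users relatively concise and accurate results and are attracting growing attention. Question classification, which assigns questions to a number of semantic types, is an important module of a QA system, and its accuracy directly affects the system's performance. To improve classifier performance, ensemble learning is applied to the question classification task, and experiments compare different classification features (lexical, syntactic, synonym sets) and different classifier ensemble methods (error-driven, voting, BP neural network). By using an error-driven ensemble of classifiers, with the rule-based method TBL complementing the statistical method SVM, and by using linguistic knowledge such as synonym sets and noun hypernyms from Wordnet and dependency relations from Minipar as classification features, higher classification accuracy is achieved on a public test set.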

English abstract:

As a very active branch of natural language processing, open-domain question answering (QA) has attracted increasing attention because a QA system can understand questions posed in natural language and thus provide its users with compact and exact results. Question classification (QC), i.e., assigning questions to several semantic categories, is very important for a question answering system and directly affects its performance in selecting correct answers. Its main task is to understand what the user is asking for. In this paper, to investigate automatic question classification, different classification features are compared, such as bag-of-words, bigrams, synsets from Wordnet, and dependency structures from Minipar. Support vector machines (SVM) are evaluated together with ensemble approaches including transformation-based error-driven learning (TBL), voting, and back-propagation artificial neural networks (BP). The question classification algorithm presented in this paper is compared with single-feature SVM classifiers, multi-feature SVM classifiers, and the BP and voting ensemble methods. The method combines multiple SVM classifiers with a TBL algorithm and uses linguistic knowledge such as synsets from Wordnet and dependency structures from Minipar as question representations; it is shown to be more accurate on an open question classification corpus. Using the dependency structure yields a 1.8% improvement over not using it.
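
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how an error-driven (TBL-style) combiner might correct a base classifier using the outputs of other classifiers. The rule template, the function names (`learn_rules`, `classify`), and the toy labels are assumptions made for illustration and are not the authors' implementation; the hard-coded prediction lists stand in for the outputs of SVM classifiers trained on different features.

```python
from typing import List, Tuple

# Hypothetical rule template: "if the current label is `frm` and
# classifier number `clf` predicted `trigger`, relabel the item as `to`".
Rule = Tuple[str, int, str, str]  # (frm, clf, trigger, to)


def errors(current: List[str], gold: List[str]) -> int:
    """Count items whose current label differs from the gold label."""
    return sum(c != g for c, g in zip(current, gold))


def apply_rule(rule: Rule, current: List[str], preds: List[List[str]]) -> List[str]:
    """Return new labels after applying one transformation rule."""
    frm, clf, trigger, to = rule
    return [
        to if cur == frm and p[clf] == trigger else cur
        for cur, p in zip(current, preds)
    ]


def learn_rules(preds: List[List[str]], gold: List[str],
                base: int = 0, max_rules: int = 10) -> List[Rule]:
    """Greedy error-driven learning: start from the base classifier's
    labels and repeatedly add the rule with the largest net error reduction."""
    current = [p[base] for p in preds]
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Candidate rules are proposed only from misclassified items.
        candidates = {
            (cur, k, p[k], g)
            for cur, p, g in zip(current, preds, gold)
            if cur != g
            for k in range(len(p))
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            after = apply_rule(rule, current, preds)
            gain = errors(current, gold) - errors(after, gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule reduces the error count any further
            break
        rules.append(best_rule)
        current = apply_rule(best_rule, current, preds)
    return rules


def classify(preds: List[List[str]], rules: List[Rule], base: int = 0) -> List[str]:
    """Label new items with the base classifier, then apply the rules in order."""
    current = [p[base] for p in preds]
    for rule in rules:
        current = apply_rule(rule, current, preds)
    return current


# Toy data: each row holds the labels from three hypothetical classifiers.
train_preds = [
    ["LOC", "LOC", "LOC"],
    ["HUM", "NUM", "NUM"],
    ["HUM", "HUM", "HUM"],
    ["HUM", "NUM", "HUM"],
    ["NUM", "NUM", "NUM"],
    ["HUM", "HUM", "LOC"],
]
train_gold = ["LOC", "NUM", "HUM", "NUM", "NUM", "HUM"]

rules = learn_rules(train_preds, train_gold)
new_labels = classify([["HUM", "NUM", "LOC"], ["HUM", "HUM", "HUM"]], rules)
```

In this toy setting the learner picks up a single rule that overrides the base classifier's `HUM` label whenever the second classifier says `NUM`. A voting ensemble would instead take the majority label per row; the error-driven variant differs in that its corrections are learned from the base classifier's observed mistakes.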

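Journal information
  • Journal of Computer Research and Development (《计算机研究与发展》)
  • Chinese Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: Xu Zhiwei (徐志伟)
  • Address: Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Telephone: 010-62620696, 62600350
  • International standard serial number: ISSN 1000-1239
  • Domestic unified serial number: CN 11-1777/TP
  • Postal distribution code: 2-654
  • Awards:
  • 2001-2007 Top 100 Outstanding Chinese Academic Journals; 2008 China Quality Sci...; "Double-Effect" journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Russia: Abstract Journal; Netherlands: Abstract and Citation Database; USA: Engineering Index; Japan: Japan Science and Technology Agency database; China: Chinese Science and Technology Core Journals; China: Peking University Core Journals (2000, 2004, 2008, 2011, and 2014 editions)
  • Citation count: 40349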