Recognition and Retrieval of Time-Sensitive Questions in Chinese Question-Answering Systems
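  • ISSN: 1000-1239
  • Journal: Journal of Computer Research and Development (计算机研究与发展)
  • Year: 2013
  • Pages: 2612-2620
  • Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology] TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
  • Author affiliation: [1] Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055
  • Funding: National Natural Science Foundation of China, General Program (61272383, 61173075)
  • Related project: Research on Key Technologies for Autonomous Integration of Network Information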
Chinese abstract (translated):

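Current question-answering systems such as Baidu Zhidao and SoSo WenWen do not consider timeliness requirements during question retrieval, so for time-sensitive questions they cannot return results that meet those requirements. To address this problem, a recognition and retrieval method for time-sensitive questions is designed: first, questions are classified by their timeliness requirements to identify the time-sensitive ones; next, the time range of each time-sensitive question is resolved; finally, the question retrieval results are filtered by the resolved range to obtain results that satisfy the timeliness requirement. Question classification uses lexical, syntactic, and semantic features and is tested with machine learning methods including decision trees, naive Bayes, and SVM. The time range of a question is computed with constructed time-domain expressions. Experiments show that recognition of time-sensitive questions with the C5.0 decision tree reaches a precision of 0.901, and that, compared with systems that do not consider time sensitivity, the mean average precision of retrieval results for time-sensitive questions improves considerably.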
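The three steps described in the abstract (recognize, resolve the time range, filter) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-word table `CUE_WINDOWS`, the `ArchivedQuestion` type, and both function names are hypothetical, and a simple keyword lookup stands in for the trained classifier and the time-domain expressions used in the paper.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple

# Hypothetical cue words mapped to a look-back window in days.
# The paper uses a trained classifier and time-domain expressions instead.
CUE_WINDOWS = {"today": 0, "this week": 7, "recently": 30, "this year": 365}


@dataclass
class ArchivedQuestion:
    """A previously asked question together with its posting date."""
    text: str
    posted: date


def resolve_time_range(query: str, now: date) -> Optional[Tuple[date, date]]:
    """Return (start, end) if the query contains a time cue, else None.

    A None result means the query is treated as not time-sensitive.
    """
    lowered = query.lower()
    for cue, days in CUE_WINDOWS.items():
        if cue in lowered:
            return now - timedelta(days=days), now
    return None


def filter_by_time_range(
    results: Iterable[ArchivedQuestion],
    time_range: Optional[Tuple[date, date]],
) -> List[ArchivedQuestion]:
    """Keep only results posted inside the range; keep all if range is None."""
    if time_range is None:
        return list(results)
    start, end = time_range
    return [r for r in results if start <= r.posted <= end]


now = date(2013, 6, 30)
results = [
    ArchivedQuestion("Which fund performed best?", date(2013, 6, 20)),
    ArchivedQuestion("Which fund performed best?", date(2011, 1, 5)),
]
time_range = resolve_time_range("Which fund performed best recently?", now)
kept = filter_by_time_range(results, time_range)
print([r.posted.isoformat() for r in kept])  # ['2013-06-20']
```

Filtering after retrieval keeps the sketch independent of the underlying search engine, which matches the order of steps in the abstract: the candidates are retrieved first and then pruned by the resolved range.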

English abstract:

Currently, question-answering (Q&A) systems such as Baidu Zhidao and SoSo WenWen can find questions that are semantically relevant to most queries. However, for questions with a time constraint, retrieval performance is much worse than for queries without such a constraint. To solve this problem, an automatic recognition and retrieval method for time-sensitive questions is proposed. First, time-sensitive questions are recognized using classification algorithms; next, the time range of each time-sensitive question is resolved; finally, the question search results are filtered by the resolved time range. To recognize time-sensitive questions, lexical, syntactic, and semantic features are extracted; machine learning methods including decision trees, naive Bayes, and SVM are employed; and the AdaBoost algorithm is adopted to address corpus imbalance. A resolving method is proposed to calculate the question time range. On this basis, a question retrieval prototype system, built from question-answer pairs in the financial domain collected from the Web, is used for validation. Experimental results show that, using the C5.0 decision tree algorithm, the precision of time-sensitive question recognition reaches 0.901; the mean average precision (MAP) of the retrieval results for time-sensitive questions improves by 0.0392 compared with SoSo WenWen and by 0.1956 compared with Baidu Zhidao, relative increases of 74.24% and 197.58%, respectively. The average response time of the question retrieval prototype system is 0.6287 s.
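For readers unfamiliar with the MAP metric reported above, the following Python sketch shows one common way to compute it from ranked relevance judgments. The function names are illustrative, and the sketch assumes average precision is normalized by the number of relevant results actually retrieved; the abstract does not state which variant the authors used.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked result list.

    relevance[i] is True if the result at rank i + 1 is relevant.
    Assumption: normalize by the number of relevant results retrieved.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


# Two hypothetical queries with made-up relevance judgments.
runs = [[True, False, True], [False, True]]
print(mean_average_precision(runs))
```

In the first query the relevant results sit at ranks 1 and 3, giving precisions of 1 and 2/3; in the second, the single relevant result at rank 2 gives 1/2. MAP is the mean of the two per-query averages.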

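Journal information
  • Journal of Computer Research and Development (《计算机研究与发展》)
  • China Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: Xu Zhiwei
  • Address: Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Phone: 010-62620696, 62600350
  • International standard serial number: ISSN 1000-1239
  • Domestic serial number: CN 11-1777/TP
  • Postal distribution code: 2-654
  • Awards:
  • Top 100 Outstanding Chinese Academic Journals (2001-2007); 2008 China Quality Sci...; "Double-Benefit" Journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Abstracts Journal (Russia); Abstract and Citation Database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)
  • Citations: 40349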