当前问答系统如“百度知道”、“SoSo问问”等在问句检索时没有考虑时效性要求,对时间敏感问句不能返回满足时效要求的结果.针对该问题,设计了时间敏感问句的识别和检索方法:首先依据时效要求对问句进行分类,识别出时间敏感问句,然后解析时间敏感问句的时效区间,最后根据解析结果对问句检索结果进行过滤,得到满足时效要求的结果.问句分类采用词法、句法和语义等特征,使用决策树、朴素贝叶斯、SVM等机器学习方法进行测试.问句的时效区间使用构造的时间域表达式计算获得.实验表明,使用C5.0决策树进行时间敏感问句的识别准确率达到0.901;与未考虑时间敏感问题的系统相比,时间敏感问句检索结果平均精度得到较大改善.
Currently, question-answering (Q&A) systems such as Baidu Zhidao, SoSo WenWen, etc., have been able to find out questions semantically relevant to most queries. However, for questions with time constraint, the performance of searching results is much worse than that of the queries without such constraint. To solve this problem, an automatical recognition and retrieval method for time-sensitive questions are proposed. At first, time-sensitive questions is recognized by using classification algorithms; next, time-range of the time-sensitive question is resolved; finally, the question search results are filtered by resolved time-range. To recognize time-sensitive questions, [exical, syntactic and semantic features are extracted; machine learning methods including the decision-tree, naiveBayes and SVM are employed; and AdaBoost algorithm is also adopted to solve the corpus imbalance issue. A resolving method is proposed to calculate question time-range. Based on those, a prototype system of question retrieval is used for validation, which is built from question and answer pairs of financial domain collected from Web. Experimental results show that, lay using the C5.0 decision tree algorithm, the precision of time-sensitive questions recognition reaches 0. 901; the mean average precision(MAP) of the retrieval result for time-sensitive questions is enhanced 0. 039 2 compared with SoSo WenWen, and is enhanced 0. 195 6 compared with Baidu Zhidao, increasing by 74.24% and 197.58% respectively. The average system response time of the question retrieval prototype system is 0. 628 7 s.