In question classification for question answering systems, combining question features helps build efficient question classifiers. To address the feature combination problem in current question classification, this paper proposes a Diversity and Importance based Feature Combination (DIFC) method. DIFC computes the misclassification diversity and the correct-classification diversity between each candidate feature and the current feature combination, together with the importance of the candidate feature itself, and uses these measures to dynamically obtain an optimized feature combination from the candidate feature set. Experiments combining bag-of-words binding features on the HIT (Harbin Institute of Technology) Chinese question set show that, compared with other feature combination methods, DIFC is flexible and efficient and achieves higher accuracy.
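The selection idea can be illustrated with a minimal Python sketch. The abstract does not give the DIFC formulas, so everything below is an assumption made for illustration: each candidate feature is represented by a list of booleans saying whether a classifier built on that feature alone labels each validation question correctly; importance is taken to be single-feature accuracy; the two diversity measures are simple ratios; the score is an unweighted-by-default sum of the three terms; and `evaluate` is a hypothetical callback standing in for retraining and testing a classifier on the current combination. None of these names or definitions come from the paper.

```python
from typing import Callable, Dict, List, Sequence


def error_diversity(combo: Sequence[bool], cand: Sequence[bool]) -> float:
    """Assumed measure: share of the combination's errors that the candidate gets right."""
    on_errors = [c for ok, c in zip(combo, cand) if not ok]
    return sum(on_errors) / len(on_errors) if on_errors else 0.0


def correct_diversity(combo: Sequence[bool], cand: Sequence[bool]) -> float:
    """Assumed measure: share of the combination's correct answers that the candidate gets wrong."""
    on_correct = [c for ok, c in zip(combo, cand) if ok]
    return sum(not c for c in on_correct) / len(on_correct) if on_correct else 0.0


def importance(cand: Sequence[bool]) -> float:
    """Assumed measure: accuracy of the candidate feature used on its own."""
    return sum(cand) / len(cand)


def score(combo: Sequence[bool], cand: Sequence[bool],
          w_err: float = 1.0, w_cor: float = 1.0, w_imp: float = 1.0) -> float:
    """Illustrative score: reward fixing errors and importance, penalise breaking correct answers."""
    return (w_err * error_diversity(combo, cand)
            - w_cor * correct_diversity(combo, cand)
            + w_imp * importance(cand))


def greedy_select(candidates: Dict[str, List[bool]],
                  evaluate: Callable[[List[str]], List[bool]],
                  max_features: int) -> List[str]:
    """Greedily grow a combination, re-scoring the remaining candidates after every addition."""
    remaining = dict(candidates)
    # Seed the combination with the single most important feature.
    first = max(remaining, key=lambda name: importance(remaining[name]))
    chosen = [first]
    del remaining[first]
    while remaining and len(chosen) < max_features:
        combo = evaluate(chosen)  # per-sample correctness of the current combination
        best = max(remaining, key=lambda name: score(combo, remaining[name]))
        chosen.append(best)
        del remaining[best]
    return chosen


# Toy data: per-sample correctness of three hypothetical single-feature classifiers.
toy = {
    "A": [True, True, True, False, False, False],
    "B": [True, True, True, True, False, False],
    "C": [False, False, False, False, True, True],
}


def toy_evaluate(names: List[str]) -> List[bool]:
    """Toy stand-in for retraining: a sample counts as correct if any chosen feature gets it right."""
    return [any(toy[n][i] for n in names) for i in range(len(toy["A"]))]


selected = greedy_select(toy, toy_evaluate, max_features=2)
print(selected)
```

In this toy run the seed is the most accurate feature, and the second pick is the feature that is right exactly where the seed is wrong, even though it is the least accurate on its own. That is the intuition behind weighing diversity alongside importance; the actual weighting and stopping rule used by DIFC would have to be taken from the full paper.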