In software defect prediction, considering a large number of metrics produces datasets with many features, and the redundant and irrelevant features among them can degrade the performance of defect prediction models. This paper proposes a novel two-stage hybrid feature selection (HFS) approach. Specifically, it first applies a feature subset evaluator to remove irrelevant and redundant features from the original feature set, and then applies a feature ranking evaluator to further remove the remaining irrelevant features. In the empirical study, datasets from real development projects serve as the evaluation subjects, and three approaches, NONE, CFS, and CAR, serve as the baselines against which HFS is compared. Across three types of classifiers (decision tree, support vector machine, and nearest neighbors), HFS not only selects smaller feature subsets but also improves the performance of defect prediction models in most cases, especially when a decision tree is used as the classifier.
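The two-stage idea can be illustrated with a minimal sketch. The abstract does not name the concrete evaluators used by HFS, so everything specific here is an assumption: stage 1 uses a CFS-style merit with greedy forward search as the feature subset evaluator, stage 2 ranks the surviving features by absolute Pearson correlation with the label, and the names `hybrid_select`, `keep_ratio`, and `min_gain` are hypothetical.

```python
import math


def pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy / math.sqrt(sxx * syy))


def cfs_merit(subset, rel, red):
    """CFS-style merit: high feature-label relevance, low inter-feature redundancy."""
    k = len(subset)
    mean_rel = sum(rel[i] for i in subset) / k
    if k == 1:
        return mean_rel
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    mean_red = sum(red[i][j] for i, j in pairs) / len(pairs)
    return k * mean_rel / math.sqrt(k + k * (k - 1) * mean_red)


def stage1_subset(columns, labels, min_gain=1e-9):
    """Stage 1 (assumed): greedy forward search that maximizes the subset merit."""
    m = len(columns)
    rel = [pearson(col, labels) for col in columns]
    red = [[pearson(columns[i], columns[j]) for j in range(m)] for i in range(m)]
    selected, best = [], 0.0
    while len(selected) < m:
        merit, idx = max(
            (cfs_merit(selected + [i], rel, red), i)
            for i in range(m) if i not in selected
        )
        if merit - best <= min_gain:
            break  # no candidate improves the merit, so stop growing the subset
        selected.append(idx)
        best = merit
    return selected, rel


def stage2_rank(selected, rel, keep_ratio=0.5):
    """Stage 2 (assumed): rank survivors by relevance and keep the top fraction."""
    ranked = sorted(selected, key=lambda i: rel[i], reverse=True)
    k = max(1, math.ceil(len(ranked) * keep_ratio))
    return ranked[:k]


def hybrid_select(columns, labels, keep_ratio=0.5):
    """Run both stages; `columns` is a list of feature columns (hypothetical API)."""
    selected, rel = stage1_subset(columns, labels)
    return stage2_rank(selected, rel, keep_ratio)


# Toy data (made up for illustration): f0 is relevant, f1 duplicates f0,
# and f2 is unrelated to the defect label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 1.0, 0.7]
f1 = list(f0)
f2 = [1, 0, 1, 0, 1, 0, 1, 0]
result = hybrid_select([f0, f1, f2], labels)
```

On this toy input the sketch keeps only one of the two duplicated columns and drops the unrelated one, which mirrors the claim that redundant and irrelevant features are removed. A real replication would substitute the evaluators and thresholds actually used in the paper.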