Software defect prediction is an important means of improving software development quality and testing efficiency. This paper proposes an ensemble k-NN method for software defect prediction based on software metrics. First, a set of base k-NN predictors is trained iteratively, each on a different bootstrap sample of the data. Next, the base predictors evaluate each software module independently, and their individual outputs are combined into a composite prediction. To decide whether a new software module is defective, an adaptive method for learning the classification threshold is designed: a module whose composite prediction exceeds the threshold is classified as defective, and otherwise as normal. Experiments on the NASA MDP and PROMISE AR standard software defect datasets show that the ensemble k-NN predictor performs considerably better than a widely used baseline defect prediction approach, and they also confirm the effectiveness of software metrics in defect prediction.
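The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not state how the base outputs are combined or how the threshold is learned, so the sketch makes two assumptions that should not be read as the authors' method: base outputs are fused by simple averaging, and the threshold is the value on a coarse grid that maximizes F1 on the training data. All function names, the grid, and the default parameters are hypothetical.

```python
import random


def knn_score(train, query, k):
    """Fraction of defective labels among the k nearest training modules."""
    # train: list of (metrics, label) pairs; label 1 = defective, 0 = normal
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))

    nearest = sorted(train, key=sq_dist)[:k]
    return sum(label for _, label in nearest) / len(nearest)


def fit_ensemble(data, n_estimators=15, k=3, seed=0):
    """Draw one bootstrap sample (with replacement) per base k-NN predictor."""
    rng = random.Random(seed)
    samples = [[rng.choice(data) for _ in data] for _ in range(n_estimators)]
    return samples, k


def ensemble_score(ensemble, query):
    """Combine the base outputs by averaging (assumed fusion rule)."""
    samples, k = ensemble
    return sum(knn_score(s, query, k) for s in samples) / len(samples)


def learn_threshold(ensemble, data):
    """Pick the grid value that maximizes training F1 (assumed criterion)."""
    scores = [ensemble_score(ensemble, x) for x, _ in data]
    labels = [y for _, y in data]

    def f1(t):
        tp = sum(s > t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s > t and y == 0 for s, y in zip(scores, labels))
        fn = sum(s <= t and y == 1 for s, y in zip(scores, labels))
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    grid = [i / 20 for i in range(20)]  # 0.00, 0.05, ..., 0.95
    return max(grid, key=f1)


def predict(ensemble, threshold, query):
    """A module is flagged as defective if its composite score exceeds the threshold."""
    return ensemble_score(ensemble, query) > threshold
```

Here each module is a tuple of numeric metric values paired with a 0/1 label. A typical call sequence would be `ens = fit_ensemble(data)`, then `t = learn_threshold(ens, data)`, then `predict(ens, t, new_module_metrics)`. In practice the metrics would likely need scaling before distances are computed, and the threshold would be better tuned on held-out data than on the training set.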