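To address the problem of how to make sound use of cross-company defect data for prediction when an organization lacks its own historical defect data, this paper proposes a transfer-based ensemble sampling method for cross-company software defect prediction. First, a K-NN filter is applied over the feature attributes shared by the cross-company source defect data and the target defect data to be predicted, so that suitable labeled cross-company defect data are selected. Next, SMOTE oversampling and K-means clustering undersampling are combined to resolve the class imbalance in the selected cross-company defect data. Finally, an ensemble of classifiers is trained by voting on the balanced data, and its classification performance is validated on the target data. Experimental results show that the method maintains a relatively high recall while significantly reducing the false alarm rate, and that it has some practical capability to guide the testing process.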
When within-company historical defect data are lacking, this paper presents a novel algorithm that uses cross-company data to build a software defect predictor. First, a K-NN filter computes the distances between the source data and the target data over their shared attributes and selects the top k nearest source samples as the similar data. Next, SMOTE and K-means clustering are used to balance the similar data. Finally, multiple single classifiers are combined through ensemble learning. Experimental results show that the algorithm achieves a relatively high true positive rate while significantly reducing the false alarm rate, which means that the method has some practical capability to guide the testing process.
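The three steps summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `knn_filter`, `smote`, and `majority_vote` are hypothetical, the source and target rows are assumed to be already restricted to their shared attributes, Euclidean distance is assumed, and the K-means undersampling step and the base classifiers themselves are omitted for brevity.

```python
import math
import random
from collections import Counter


def knn_filter(source, target, k):
    """Keep the union of the k source rows nearest to each target row.

    source: list of (features, label) pairs; target: list of feature lists.
    Both are assumed to contain only the attributes the datasets share.
    """
    selected = set()
    for t in target:
        # Rank source rows by Euclidean distance to this target row.
        ranked = sorted(range(len(source)),
                        key=lambda i: math.dist(source[i][0], t))
        selected.update(ranked[:k])
    return [source[i] for i in sorted(selected)]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a
    random minority row and one of its k nearest minority neighbours.
    Requires at least two minority rows."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: math.dist(m, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def majority_vote(predictions):
    """Combine one predicted label per classifier into an ensemble label."""
    return Counter(predictions).most_common(1)[0][0]
```

In this sketch, `knn_filter` would be run first to pick the cross-company rows, `smote` would then be applied to the defective (minority) rows among them, and `majority_vote` would merge the labels that the separately trained classifiers assign to each target row.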