静态软件缺陷预测是软件工程数据挖掘领域中的一个研究热点.通过分析软件代码或开发过程,设计出与软件缺陷相关的度量元;随后,通过挖掘软件历史仓库来创建缺陷预测数据集,旨在构建出缺陷预测模型,以预测出被测项目内的潜在缺陷程序模块,最终达到优化测试资源分配和提高软件产品质量的目的.对近些年来国内外学者在该研究领域取得的成果进行了系统的总结.首先,给出了研究框架并识别出了影响缺陷预测性能的3个重要影响因素:度量元的设定、缺陷预测模型的构建方法和缺陷预测数据集的相关问题;接着,依次总结了这3个影响因素的已有研究成果;随后,总结了一类特殊的软件缺陷预测问题(即,基于代码修改的缺陷预测)的已有研究工作;最后,对未来研究可能面临的挑战进行了展望.
Static software defect prediction is an active research topic in the domain of software engineering data mining. The phases of the study include designing novel code or process metrics to characterize the faults in the program modules, constructing software defect prediction model based on the training data gathered after mining software historical repositories, using the trained model to predict potential defect-proneness of program modules. The research on software defect prediction can optimize the allocation of testing resources and improve the quality of software. This paper offers a systematic survey of existing research achievements of the domestic and foreign researchers in recent years. First, a research framework is proposed and three key factors (i.e., metrics, model construction approaches, and issues in datasets) influencing the performance of defect prediction are identified. Next, existing research achievements in these three key factors are discussed in sequence. Then, the existing achievements on a special defect prediction issues (i.e., code change based defect prediction) are summarized. Finally a perspective of the future work in this research area is discussed.