Early-stage breast cancer is difficult for human readers to identify on X-ray mammograms. To address this problem, this paper proposes a new detection method based on imbalanced data mining, providing an effective solution for computer-aided early diagnosis of breast cancer. First, we propose cluster boundary sampling (CBS) to resample the data set: a clustering density threshold and a boundary density threshold are used to determine cluster boundaries more rigorously and accurately, and these boundaries guide the resampling. Second, we introduce ensemble learning to mitigate the effect of data imbalance on the SVM classification algorithm. Comparison experiments on the Digital Database for Screening Mammography (DDSM) from the University of Florida show that, relative to the traditional method, applying CBS raises the AUC from 0.577 to 0.717, and adding ensemble learning raises it further to 0.83. The results show that the proposed method can effectively detect abnormal potential calcification points in mammograms and help doctors improve the success rate of early breast cancer diagnosis.
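The abstract reports results as AUC and names ensemble learning as the remedy for class imbalance, but it does not specify the ensemble scheme. The following is a minimal sketch under stated assumptions, not the paper's implementation: `auc` computes the standard rank-based AUC, and `fit_balanced_ensemble` illustrates one common approach, in which each member is trained on all minority samples plus an equally sized random subset of the majority class and the member scores are averaged. To keep the example dependency-free, a hypothetical nearest-centroid scorer stands in for the SVM base learner; all function names are illustrative.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative; ties count as 0.5 (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid_scorer(pos, neg):
    """Stand-in base learner (NOT an SVM, an assumption for illustration):
    higher score means closer to the positive centroid than the negative."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def fit_balanced_ensemble(minority, majority, n_members=5, seed=0):
    """Train each member on all minority samples plus an equally sized
    random draw from the majority class, then average the member scores."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    members = [fit_centroid_scorer(minority, rng.sample(majority, k))
               for _ in range(n_members)]
    return lambda x: sum(m(x) for m in members) / len(members)
```

In practice the centroid scorer would be replaced by an SVM's decision function, and the random majority draw could be replaced by a boundary-aware sampler such as the CBS step the abstract describes.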