Rare category mining is an important research area in data mining with a wide range of applications. To address the shortcomings of traditional rare category detection algorithms, this paper proposes RDACS, a rare category detection algorithm based on cluster separability, which combines density differences with an inter-cluster separability criterion. The algorithm uses feature-weight similarity as the criterion for the separability between a rare-category cluster and its surrounding samples, and relies on active learning to detect the rare categories. Experiments on UCI public datasets and the KDD99 dataset show that, compared with existing algorithms of the same kind, RDACS has a fairly clear advantage in the number of queries required, which can improve efficiency and reduce human error. RDACS thus complements existing rare category detection methods.
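To make the "number of queries" metric concrete, the following is a minimal illustrative sketch of a density-difference, active-learning detection loop. It is not the RDACS algorithm itself: the abstract does not define the feature-weight similarity criterion, so that step is omitted. The scoring rule, the function names, the toy data, and the parameters `radius` and `k` are all assumptions introduced purely for illustration.

```python
import math
import random


def local_density(points, radius):
    """For each point, count how many other points lie within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def density_difference_scores(points, radius, k=5):
    """Assumed score: own density minus the lowest density among the k nearest neighbours."""
    dens = local_density(points, radius)
    scores = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        scores.append(dens[i] - min(dens[j] for j in others[:k]))
    return scores


def count_queries(points, oracle, n_classes, radius, k=5):
    """Query the labelling oracle until every class has been seen at least once.

    Returns the number of queries issued and the set of discovered labels.
    """
    scores = density_difference_scores(points, radius, k)
    queried, blocked, found = set(), set(), set()
    while len(found) < n_classes and len(queried) < len(points):
        candidates = [
            i for i in range(len(points)) if i not in queried and i not in blocked
        ]
        if not candidates:
            # Every remaining point is near an earlier query: lift the block.
            blocked.clear()
            continue
        best = max(candidates, key=lambda i: scores[i])
        queried.add(best)
        found.add(oracle(best))
        # Skip the neighbourhood of the queried point in later rounds.
        blocked.update(
            i for i in range(len(points))
            if math.dist(points[best], points[i]) <= radius
        )
    return len(queried), found


# Hypothetical toy data: a uniform majority class plus one compact rare cluster.
rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
labels = [0] * 200
points += [(rng.gauss(5, 0.05), rng.gauss(5, 0.05)) for _ in range(8)]
labels += [1] * 8

n_queries, found = count_queries(points, lambda i: labels[i], n_classes=2, radius=0.3)
print("queries needed:", n_queries, "classes found:", sorted(found))
```

In this sketch, a lower `n_queries` means fewer requests to the human labeller, which is the sense in which fewer queries can improve efficiency and reduce human error.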