As an important task of data mining, outlier detection has attracted wide attention. In this paper, we discuss the definition and detection of outliers based on rough set theory. We propose a new definition of outliers, rough sequence outliers, together with a corresponding outlier detection algorithm, RSOD. The algorithm uses the rough-set notions of knowledge entropy and attribute significance to construct three kinds of sequences, and it detects outliers by analyzing how the elements of these sequences change. We compare RSOD with existing outlier detection algorithms on standard UCI data sets, and the experimental results show that the proposed method is effective for outlier detection.
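The sketch below computes the two rough-set quantities named above. It makes several assumptions. The information table is complete and categorical. Knowledge entropy takes the Shannon-type form $E(P) = -\sum_i \frac{|X_i|}{|U|}\log_2\frac{|X_i|}{|U|}$ over the equivalence classes $X_i$ of $U/\mathrm{IND}(P)$. The significance of an attribute $a$ in $A$ is taken to be the entropy drop $E(A) - E(A \setminus \{a\})$. The function names and the toy table are hypothetical. The code does not reproduce RSOD's three sequences or its outlier criterion.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def partition_sizes(table: Sequence[Row], attrs: Sequence[int]) -> List[int]:
    """Sizes of the equivalence classes of U/IND(attrs): rows are
    indiscernible when they agree on every attribute index in attrs."""
    counts = Counter(tuple(row[a] for a in attrs) for row in table)
    return list(counts.values())


def knowledge_entropy(table: Sequence[Row], attrs: Sequence[int]) -> float:
    """Shannon-type knowledge entropy of the partition induced by attrs
    (an assumed definition; other entropy forms exist in rough set theory)."""
    n = len(table)
    return -sum((s / n) * log2(s / n) for s in partition_sizes(table, attrs))


def significance(table: Sequence[Row], attrs: Sequence[int], a: int) -> float:
    """Hypothetical entropy-based significance of attribute a within attrs:
    how much entropy is lost when a is removed."""
    rest = [b for b in attrs if b != a]
    return knowledge_entropy(table, attrs) - knowledge_entropy(table, rest)


# Toy information table with two categorical attributes (indices 0 and 1).
U: List[Row] = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]

print(knowledge_entropy(U, [0, 1]))  # 1.5
print(significance(U, [0, 1], 1))    # 0.5
print(significance(U, [0, 1], 0))    # larger than for attribute 1
```

In this toy table, removing attribute 0 merges more classes than removing attribute 1. Attribute 0 therefore scores as more significant under the assumed definition.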