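To address the limited ability of traditional one-class learning models to describe multi-modal or multi-density data, ensemble clustering and clustering stability analysis are introduced into one-class learning. First, determining the number of clusters and determining the cluster distributions are unified in a single enhanced one-class learning framework. Next, the clusters serve as positive and negative classes for one another, and a separate one-class classification model is built for each. Finally, the decision boundaries of these models are fused using the maximum fusion volume method. Taking the classic support vector data description (SVDD) as an example, an ensemble-clustering-based stable SVDD algorithm, ECS-SVDD, is designed. Experimental results on standard UCI datasets and a real-world malware behavior dataset show that ECS-SVDD outperforms a single SVDD and comparable one-class learning methods. The method extends directly to other one-class learning algorithms of the minimum-volume-set type, strengthening their ability to handle multi-modal and multi-density data.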
Conventional one-class learning models perform poorly when data are multi-modal or multi-density. To address this problem, ensemble clustering and clustering stability analysis are introduced into one-class learning. First, identifying the number of clusters and identifying their distributions are unified in a single enhanced one-class learning framework. Then, multiple one-class learning models are constructed to describe the clusters of the target class, with the clusters serving as positive and negative classes for one another. Finally, the decision boundaries of these models are fused using the maximum fusion volume method. Using the classic support vector data description (SVDD) as an instance of a one-class learning algorithm, an ensemble-clustering-based stable SVDD, ECS-SVDD, is proposed. Experimental results on UCI benchmark datasets and a real-world malware detection dataset show that ECS-SVDD outperforms a single SVDD and several related one-class learning algorithms. The proposed method can also be applied directly to other one-class learning algorithms that follow the minimum-volume-set scheme, enhancing their ability to handle multi-modal and multi-density data.
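The cluster-then-describe-then-fuse pipeline summarized above can be illustrated with a minimal sketch. It is not the ECS-SVDD algorithm itself. Plain k-means with a fixed k stands in for ensemble clustering and stability analysis. A centroid-centered enclosing ball stands in for SVDD. Fusion is assumed to be a simple union rule, in which a point is accepted if any per-cluster description accepts it. All function names are hypothetical.

```python
import math

def mean(pts):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means (assumed stand-in for the ensemble clustering step)."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def fit_ball(pts):
    """Centroid-centered enclosing ball (crude stand-in for an SVDD hypersphere)."""
    center = mean(pts)
    radius = max(math.dist(p, center) for p in pts)
    return center, radius

def fit_clusterwise(points, k):
    """Fit one ball per non-empty cluster."""
    labels = kmeans(points, k)
    return [fit_ball([p for p, lab in zip(points, labels) if lab == j])
            for j in range(k) if j in labels]

def accepts(balls, x):
    """Assumed union-style fusion: accept x if any per-cluster ball contains it."""
    return any(math.dist(x, c) <= r for c, r in balls)

# Two well-separated modes of the target class.
blob_a = [(0.5 * i, 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]
blob_b = [(10 + x, 10 + y) for x, y in blob_a]
points = blob_a + blob_b

single = fit_ball(points)            # one description for all the data
balls = fit_clusterwise(points, 2)   # one description per cluster
```

On this toy data the single ball also covers the empty region between the two modes, whereas the fused per-cluster balls reject a point such as `(5.0, 5.0)`. This is the kind of over-generous boundary that a cluster-wise description is meant to avoid.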