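When the target data follow a multi-cluster distribution and a hyperplane-based one-class classifier must incorporate structural information, the influence of each cluster on the hyperplane has to be considered separately. To this end, a structural large margin one-class classifier (SLMOCC) for clustered data is proposed. The algorithm constrains the Mahalanobis distance from each cluster to the hyperplane separately and maximizes the minimum Mahalanobis margin. This keeps the target data in the positive half-space while making full use of the cluster structure of the data. The optimal hyperplane is found through sequential second-order cone programming (SOCP) combined with a line search. To capture the cluster structure, SLMOCC uses agglomerative hierarchical clustering and determines the number of clusters from the knee point. Finally, comparisons with related algorithms on synthetic data and UCI datasets verify the effectiveness of SLMOCC.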
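The per-cluster margin can be illustrated with a minimal sketch, assuming each cluster is summarized by its sample mean μ and covariance Σ and the hyperplane is {x : wᵀx = ρ}. Under that assumption the Mahalanobis distance from the cluster to the hyperplane is (wᵀμ − ρ)/√(wᵀΣw), which is positive when the mean lies in the positive half-space. The helper names (`mean_cov`, `mahalanobis_margin`, `min_margin`) are hypothetical, and the code only evaluates the max-min objective rather than reproducing the paper's solver. Fixing a target margin t turns each constraint wᵀμ_i − ρ ≥ t·√(wᵀΣ_i w) into a second-order cone constraint, which is one plausible reading of how a sequence of SOCPs plus a line search over t could locate the optimum.

```python
import math


def mean_cov(points):
    """Sample mean and (biased, 1/n) covariance of a list of d-dimensional points."""
    n = len(points)
    d = len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    return mu, cov


def mahalanobis_margin(w, rho, mu, cov, eps=1e-12):
    """Signed Mahalanobis distance from a cluster (mu, cov) to {x : w.x = rho}.

    Positive when the cluster mean lies in the half-space w.x > rho.
    eps guards against a degenerate (zero-variance) direction.
    """
    num = sum(wi * mi for wi, mi in zip(w, mu)) - rho
    d = len(w)
    quad = sum(w[a] * cov[a][b] * w[b] for a in range(d) for b in range(d))
    return num / math.sqrt(quad + eps)


def min_margin(w, rho, clusters):
    """Smallest per-cluster margin: the quantity a max-min formulation maximizes."""
    return min(mahalanobis_margin(w, rho, *mean_cov(c)) for c in clusters)


if __name__ == "__main__":
    far = [(2, 0), (4, 0), (3, 1), (3, -1)]        # mean (3, 0), covariance 0.5*I
    near = [(0.5, 5), (2.5, 5), (1.5, 6), (1.5, 4)]  # mean (1.5, 5), covariance 0.5*I
    print(round(min_margin((1.0, 0.0), 1.0, [far, near]), 4))  # 0.7071
```

Here the cluster nearer the hyperplane sets the minimum margin, so maximizing the minimum would move the hyperplane away from that cluster, whereas a single pooled mean and covariance would blur the two clusters together.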
In one-class classifier (OCC) design, taking the structure of the target data into account is one way to improve the generalization ability of the model. However, when the targets follow a multi-cluster distribution, it is more reasonable to consider the structure of each cluster individually rather than treat all of them as a whole. The proposed algorithm, structural large margin OCC (SLMOCC), implements this strategy by constraining the Mahalanobis distance from each cluster to the hyperplane. By maximizing the minimum Mahalanobis margin, SLMOCC finds a more reasonable optimal hyperplane than the alternatives, owing to its finer-grained, per-cluster description of the data. To extract the underlying data structure, this work applies Ward's agglomerative hierarchical clustering either to the input data or to their mappings in kernel space. Experimental results on toy data and UCI benchmark datasets show that SLMOCC outperforms other structural OCCs.
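The clustering step can be sketched as a naive, pure-Python Ward agglomerative clustering in input space only (the kernel-space variant is not shown). Each merge of clusters a and b costs the increase in within-cluster sum of squares, n_a·n_b/(n_a + n_b)·‖μ_a − μ_b‖², and the sketch records these costs. To choose the number of clusters it applies a simple "largest jump in merge cost" knee heuristic. That criterion and the function names `ward` and `knee_k` are assumptions for illustration, not the paper's exact rule.

```python
def _ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (_, mu_a, n_a), (_, mu_b, n_b) = a, b
    sq = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    return n_a * n_b / (n_a + n_b) * sq


def ward(points, k=1):
    """Naive O(n^3) Ward clustering: merge until k clusters remain.

    Returns (clusters, merge_costs); each cluster is a sorted list of point indices.
    """
    # Each cluster is (member indices, mean vector, size).
    clusters = [([i], list(p), 1) for i, p in enumerate(points)]
    costs = []
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                c = _ward_cost(clusters[i], clusters[j])
                if best is None or c < best[0]:
                    best = (c, i, j)
        c, i, j = best
        (idx_a, mu_a, n_a), (idx_b, mu_b, n_b) = clusters[i], clusters[j]
        n = n_a + n_b
        mu = [(n_a * x + n_b * y) / n for x, y in zip(mu_a, mu_b)]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = (idx_a + idx_b, mu, n)
        costs.append(c)
    return [sorted(c[0]) for c in clusters], costs


def knee_k(points):
    """Hypothetical knee rule: stop just before the largest jump in merge cost."""
    n = len(points)
    _, costs = ward(points, k=1)
    if len(costs) < 2:
        return 1
    jumps = [costs[i + 1] - costs[i] for i in range(len(costs) - 1)]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return n - (i + 1)  # clusters left after i + 1 merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    k = knee_k(pts)
    print(k, ward(pts, k)[0])  # 2 [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The brute-force search keeps the merge rule visible. For real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` computes the same hierarchy far more efficiently.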