Subspace clustering can effectively reveal the relationship between clusters and the subspaces they belong to, and it reduces the interference that redundant data and irrelevant features cause when clustering high-dimensional data. Existing subspace clustering methods focus on finding clusters within each subspace and often neglect how the subspaces themselves are partitioned. This paper proposes a subspace clustering method based on the maximum margin between features. Its main idea is to partition the feature space so that as little information as possible is lost, which in turn improves the clustering results. The work has two main parts. First, an objective function for subspace partitioning is established that minimizes the mutual dependence among the resulting subspaces. Second, a feature maximum-margin subspace clustering algorithm, Maximum Margin Subspace Clustering (MMSC), is designed to perform subspace clustering ensemble. Finally, experiments on UCI datasets, NIPS 2013 competition data, and other datasets show that MMSC obtains better clustering results than other subspace clustering algorithms on most of the datasets.
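
The partitioning goal stated above, keeping the dependence between different subspaces as small as possible, can be illustrated with a minimal sketch. This is not the MMSC algorithm: the abstract does not give the objective function or the margin definition, so the sketch assumes absolute Pearson correlation as a stand-in dependence measure and a simple greedy merge as the search strategy. The function names `abs_pearson`, `cross_dependence`, and `partition_features` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation (assumed stand-in for feature dependence)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no linear dependence
    return abs(cov / sqrt(vx * vy))


def cross_dependence(columns, group_a, group_b):
    """Mean pairwise dependence between the features of two groups."""
    total = sum(
        abs_pearson(columns[i], columns[j]) for i in group_a for j in group_b
    )
    return total / (len(group_a) * len(group_b))


def partition_features(columns, n_subspaces):
    """Greedily merge the two most dependent feature groups.

    Merging strongly dependent groups first leaves only weakly dependent
    groups separated, which is the (assumed) proxy for low dependence
    between the final subspaces.
    """
    groups = [[i] for i in range(len(columns))]
    while len(groups) > n_subspaces:
        a, b = max(
            combinations(range(len(groups)), 2),
            key=lambda p: cross_dependence(columns, groups[p[0]], groups[p[1]]),
        )
        groups[a] = sorted(groups[a] + groups[b])
        del groups[b]  # safe because a < b
    return groups


# Toy data, one list per feature: features 0 and 1 move together,
# and so do features 2 and 3.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [0.9, -1.1, 1.0, -0.8, 1.2, -1.0],
]
subspaces = partition_features(columns, 2)
print(subspaces)
```

Each resulting group of feature indices would then be clustered separately and the per-subspace results combined in an ensemble step, which this sketch leaves out.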