Existing subspace clustering methods achieve low clustering accuracy when clusters overlap. To address this problem, an overlapping subspace clustering algorithm based on a probability model (OSCPM) is proposed. First, the high-dimensional data are partitioned into several subspaces using a mixed-norm subspace representation. Then, a probability model from the exponential family of distributions is used to identify the overlapping parts of the clusters within the subspaces, and the data are assigned to the correct subspaces, which yields the clustering result. During parameter estimation, an alternating maximization method is used to find the optimal solution of the objective function. Experiments on artificial datasets and UCI datasets show that OSCPM achieves better clustering performance than the compared algorithms and is suitable for relatively large-scale datasets.
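To make the idea of alternating maximization over overlapping assignments concrete, the following is a minimal sketch and not the OSCPM algorithm itself. It assumes the simplest exponential-family member, a unit-variance Gaussian, and models each point as drawn around the average of the centers of all clusters it belongs to, which is close in spirit to overlapping k-means. The function name `overlapping_cluster`, its parameters, and the toy data are hypothetical and chosen only for illustration; the mixed-norm subspace representation step is not covered.

```python
import itertools
import numpy as np


def overlapping_cluster(X, k, init=None, n_iter=20, seed=0):
    """Illustrative overlapping clustering under a Gaussian assumption.

    Alternates between (1) choosing, for each point, the non-empty set of
    clusters whose averaged center is closest, and (2) refitting the centers
    by least squares with the memberships held fixed. Both steps can only
    increase the unit-variance Gaussian log-likelihood.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    # Every non-empty subset of the k clusters as a 0/1 membership row.
    # This enumeration is exponential in k, so it is only viable for small k.
    subsets = np.array(
        [s for s in itertools.product([0, 1], repeat=k) if any(s)], dtype=float
    )
    # Row-normalised memberships: a point's mean is the average of its centers.
    weights = subsets / subsets.sum(axis=1, keepdims=True)

    Z = None
    for _ in range(n_iter):
        # Step 1: with centers fixed, pick the best membership set per point.
        protos = weights @ centers
        dist = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        best = dist.argmin(axis=1)
        new_Z = subsets[best]
        if Z is not None and np.array_equal(new_Z, Z):
            break  # memberships stopped changing
        Z = new_Z
        # Step 2: with memberships fixed, refit centers by least squares.
        centers, *_ = np.linalg.lstsq(weights[best], X, rcond=None)
    return Z.astype(int), centers


# Toy data: two groups plus two points that lie midway between them.
X = np.array([[0.0, 0.1], [0.0, -0.1],
              [10.0, 0.1], [10.0, -0.1],
              [5.0, 0.1], [5.0, -0.1]])
Z, centers = overlapping_cluster(X, k=2, init=[[0.0, 0.0], [10.0, 0.0]])
print(Z)
print(centers)
```

In this toy run the two midway points are the ones expected to receive both cluster labels. Passing `init` explicitly keeps the example deterministic; with random initialization the result depends on which points are drawn as starting centers.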