To address the problem of clustering high-dimensional data, this paper proposes a semi-supervised clustering algorithm based on marginal Fisher analysis (MFA), denoted MFASSC. The algorithm first applies MFA to the labeled samples to obtain a projection matrix W, and then uses W to project the unlabeled samples into a low-dimensional space. In that space, the constraint-based spherical K-means (PCSKM) algorithm performs semi-supervised clustering on the reduced data. Starting from this first clustering result, dimensionality reduction and clustering are alternated until the algorithm converges. In this way, the supervised information is used to integrate dimensionality reduction with semi-supervised clustering. Experimental results show that MFASSC handles high-dimensional data effectively and also improves clustering performance.
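The alternating procedure summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the helper names `mfa_projection`, `seeded_spherical_kmeans`, and `mfa_ssc` are hypothetical; the neighbor counts `k1` and `k2` and the regularizer `reg` are arbitrary; the penalty graph is built per sample rather than per class; a seeded spherical K-means that keeps labeled samples fixed stands in for PCSKM; and the current cluster assignments are assumed to serve as pseudo-labels when the projection is re-estimated. Class labels are assumed to be integers 0..k-1, with at least one labeled sample per class.

```python
import numpy as np

def mfa_projection(X, y, dim, k1=3, k2=5, reg=1e-3):
    """Simplified MFA. Rows of X are samples; returns a (features x dim) matrix W."""
    n = len(X)
    sq = (X ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    same = y[:, None] == y[None, :]
    S = np.zeros((n, n))    # intrinsic graph: nearest same-class neighbors
    Sp = np.zeros((n, n))   # penalty graph: nearest other-class neighbors
    for i in range(n):
        intra = np.where(same[i])[0]
        intra = intra[intra != i]
        inter = np.where(~same[i])[0]
        for j in intra[np.argsort(d2[i, intra])[:k1]]:
            S[i, j] = S[j, i] = 1.0
        for j in inter[np.argsort(d2[i, inter])[:k2]]:
            Sp[i, j] = Sp[j, i] = 1.0
    L = np.diag(S.sum(1)) - S       # Laplacian of the intrinsic graph
    Lp = np.diag(Sp.sum(1)) - Sp    # Laplacian of the penalty graph
    A = X.T @ L @ X + reg * np.eye(X.shape[1])   # within-class compactness (regularized)
    B = X.T @ Lp @ X                             # between-class separability
    # Whiten with the Cholesky factor of A, then solve a symmetric eigenproblem.
    Ci = np.linalg.inv(np.linalg.cholesky(A))
    M = Ci @ B @ Ci.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)
    return Ci.T @ vecs[:, ::-1][:, :dim]         # directions with the largest ratio

def seeded_spherical_kmeans(Z, k, labeled_idx, labels, n_iter=50):
    """Cosine-similarity K-means; labeled samples always keep their given label."""
    U = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    centers = np.stack([U[labeled_idx[labels == c]].mean(0) for c in range(k)])
    assign = np.full(len(U), -1)
    for _ in range(n_iter):
        centers /= np.maximum(np.linalg.norm(centers, axis=1, keepdims=True), 1e-12)
        new = (U @ centers.T).argmax(1)
        new[labeled_idx] = labels            # enforce the supervision
        if np.array_equal(new, assign):
            break
        assign = new
        for c in range(k):
            centers[c] = U[assign == c].sum(0)   # normalized at the next pass
    return assign

def mfa_ssc(X, labeled_idx, labels, k, dim=2, max_rounds=10):
    """Alternate projection and clustering until the assignments stop changing."""
    W = mfa_projection(X[labeled_idx], labels, dim)   # first pass: labeled data only
    assign = seeded_spherical_kmeans(X @ W, k, labeled_idx, labels)
    for _ in range(max_rounds):
        W = mfa_projection(X, assign, dim)            # assumption: pseudo-labels
        new = seeded_spherical_kmeans(X @ W, k, labeled_idx, labels)
        if np.array_equal(new, assign):
            break
        assign = new
    return W, assign
```

In this sketch, convergence is taken to mean that two consecutive rounds yield identical cluster assignments, with `max_rounds` as a safeguard; the source does not state which stopping criterion is used.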