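The support vector clustering (SVC) algorithm has two phases: training and cluster assignment. Because the adjacency matrix must be computed, the cluster assignment phase consumes far more computing time than the training phase. In this paper, before computing the adjacency matrix, we first use the kernel matrix to partition the data into initial groups and find one representative point in each initial group. Because a representative point shares its cluster label with the initial group it belongs to, the adjacency matrix needs to be computed only over the set of representative points. Once each representative point is assigned a cluster label, the data points in the initial group it represents receive the same label, which effectively reduces the time of cluster assignment. Numerical experiments show that the proposed improved SVC algorithm not only significantly improves the running time of SVC but also improves clustering accuracy to some degree.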
The support vector clustering (SVC) algorithm consists of two phases: training and cluster assignment. Because it requires computing the adjacency matrix, the latter phase consumes much more computing time than the former. To overcome this disadvantage, before calculating the adjacency matrix we first use the kernel matrix to decompose the given data set into a small number of disjoint groups, where each group is represented by a representative point and all of its member points belong to the same cluster. We then label the representative points, which in turn labels all of the data points. This method significantly reduces the time of cluster assignment. Experimental results show that the improved SVC algorithm proposed in this paper outperforms the classical SVC method, not only in computing time but also in clustering precision.
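The cluster-assignment idea summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not state the exact rule for forming the initial groups, so the greedy kernel-threshold rule, the threshold `tau`, the Gaussian kernel width `q`, and all function names here are assumptions made for illustration. The `connected` predicate is likewise a stand-in: in SVC it would test whether the segment between two points stays inside the trained minimal sphere in feature space, which requires the training phase and is not reproduced here.

```python
import numpy as np

def gaussian_kernel_matrix(X, q):
    # K[i, j] = exp(-q * ||x_i - x_j||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-q * sq)

def initial_groups(K, tau):
    # Assumed grouping rule (not from the paper): each still-unassigned
    # point becomes a representative and absorbs every unassigned point
    # whose kernel value with it is at least tau (requires tau <= 1).
    n = K.shape[0]
    group = np.full(n, -1)
    reps = []
    for i in range(n):
        if group[i] != -1:
            continue
        members = (group == -1) & (K[i] >= tau)
        group[members] = len(reps)
        reps.append(i)
    return group, reps

def label_representatives(X, reps, connected):
    # Connected components of the adjacency graph, evaluated only on
    # the representatives instead of on all data points.
    m = len(reps)
    labels = [-1] * m
    current = 0
    for s in range(m):
        if labels[s] != -1:
            continue
        labels[s] = current
        stack = [s]
        while stack:
            a = stack.pop()
            for b in range(m):
                if labels[b] == -1 and connected(X[reps[a]], X[reps[b]]):
                    labels[b] = current
                    stack.append(b)
        current += 1
    return np.array(labels)

def cluster(X, q, tau, connected):
    K = gaussian_kernel_matrix(X, q)
    group, reps = initial_groups(K, tau)
    rep_labels = label_representatives(X, reps, connected)
    # Each point inherits the label of its group's representative.
    return rep_labels[group], reps

# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
# Stand-in adjacency test (assumption): plain Euclidean proximity.
near = lambda a, b: np.linalg.norm(a - b) < 1.0
labels, reps = cluster(X, q=1.0, tau=0.5, connected=near)
```

With `m` representatives out of `n` points, the pairwise adjacency test runs on at most `m * m` pairs rather than `n * n`, which is the source of the time saving the abstract describes when `m` is much smaller than `n`.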