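With image databases growing steadily, giving users a simple and efficient way to search and browse them has become an urgent and challenging problem. Image clustering can help in several ways, such as image data preprocessing, user interface design, and clustering of search results. Among the many clustering algorithms, spectral clustering has drawn wide attention in recent years because it can cluster data with complex distributions and achieves near-globally-optimal performance. However, existing spectral clustering methods such as normalized cut have high computational complexity when clustering newly added data points. This paper proposes a new clustering algorithm, locality preserving clustering. While retaining many of the advantages of nonlinear spectral clustering methods, it has a distinctive mathematical property: it provides an explicit mapping function. This makes efficient clustering possible on both the original data set and newly added data. Experimental results show that locality preserving clustering outperforms both K-means and K-means applied after principal component analysis. The experiments also show that it is comparable to normalized cut in clustering quality while being more efficient.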
Making growing image repositories easy to search and browse is both important and challenging. Image clustering helps in several ways, including image data preprocessing, user interface design, and search result representation. Spectral clustering has been one of the most promising clustering methods of the last few years because it can cluster data with complex structure and is guaranteed to reach a (nearly) global optimum. However, existing spectral clustering algorithms such as normalized cut (NCut) are difficult to apply to data points outside the training set. In this paper, a clustering algorithm named LPC (locality preserving clustering) is proposed that shares many of the data representation properties of nonlinear spectral methods. Unlike those methods, LPC provides an explicit mapping function that is defined everywhere, on both training data points and test data points. Experimental results show that LPC is more accurate than both "direct Kmeans" and "PCA + Kmeans". LPC also produces results comparable to those of NCut while being more efficient.
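To illustrate what an explicit mapping function buys for out-of-sample points, the following is a minimal sketch and not the authors' implementation. The abstract does not say how LPC builds its map; the sketch assumes a linear projection learned from a k-nearest-neighbor graph Laplacian (in the style of locality preserving projections) followed by K-means in the projected space. All function names, parameters, and defaults (`fit_lpc_sketch`, `n_neighbors`, `reg`, and so on) are hypothetical.

```python
import numpy as np

def _assign(Z, centers):
    # Index of the nearest center for every row of Z.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def _kmeans(Z, k, n_iter=50):
    # Deterministic farthest-point initialisation, then Lloyd iterations.
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = _assign(Z, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = Z[labels == j].mean(axis=0)
    return centers

def fit_lpc_sketch(X, n_clusters, n_neighbors=5, n_components=1, reg=1e-6):
    """Learn a linear map from training data X of shape (n_samples, n_features).

    Assumption: the map solves X^T L X a = lambda X^T D X a, where L = D - W
    is the Laplacian of a symmetric 0/1 k-nearest-neighbor graph.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    # Pairwise squared distances; the diagonal is forced to sort first.
    sq = np.sum(Xc ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Xc @ Xc.T)
    np.fill_diagonal(d2, -np.inf)
    nbrs = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]  # skip the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrise the graph
    D = np.diag(W.sum(axis=1))
    L = D - W
    M1 = Xc.T @ L @ Xc
    M2 = Xc.T @ D @ Xc + reg * np.eye(d)  # small ridge term for stability
    # Reduce the generalised eigenproblem to a standard symmetric one.
    C_inv = np.linalg.inv(np.linalg.cholesky(M2))
    S = C_inv @ M1 @ C_inv.T
    _, vecs = np.linalg.eigh((S + S.T) / 2.0)  # eigenvalues in ascending order
    A = C_inv.T @ vecs[:, :n_components]       # directions with smallest eigenvalues
    centers = _kmeans(Xc @ A, n_clusters)
    return {"mean": mean, "A": A, "centers": centers}

def predict_lpc_sketch(model, X):
    # The same explicit map applies to training points and to unseen points.
    Z = (np.asarray(X, dtype=float) - model["mean"]) @ model["A"]
    return _assign(Z, model["centers"])
```

Under these assumptions, a new image feature vector is clustered with one matrix product and a nearest-center lookup via `predict_lpc_sketch(model, X_new)`. A graph-based method without an explicit map would instead have to extend or recompute its embedding to cover the new points.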