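Accurately detecting near-duplicate images is of great significance for redundancy removal and copyright infringement detection. To improve the performance of the uniform-splitting-based external support vector machine clustering algorithm, a near-duplicate image clustering algorithm that combines a greedy tree with the external support vector machine is proposed. The method first uses the external support vector machine to cluster the dataset into two classes, then uses a greedy tree growing algorithm to select the "optimal" class to decompose, and repeats this process until no class can be split further. In addition, to overcome the synonymy problem of image visual words, a probabilistic latent semantic analysis model is used to map co-occurring image visual words to the same direction in the latent semantic space. Experimental results show that, compared with the internal support vector clustering algorithm and the uniform-splitting-based external support vector machine clustering algorithm, the method clearly improves clustering performance.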
Accurate detection of near-duplicate images is important for redundancy removal and copyright infringement detection. To improve the performance of Uniform Splitting based Support Vector Machine External Clustering (US-SVMEC), this paper proposes a near-duplicate image clustering algorithm that combines a Greedy Tree with SVMEC (GT-SVMEC). First, SVMEC is applied to partition the dataset into two clusters. Then, a greedy tree growing algorithm is used to choose the "best" cluster to split. This procedure is repeated until no further improvement can be achieved. In addition, to overcome the problem of visual word synonymy, a Probabilistic Latent Semantic Analysis (PLSA) model is adopted to map co-occurring image visual words to the same direction in the latent semantic space. Experimental results show that, compared with SVM-Internal Clustering (SVMIC) and US-SVMEC, the proposed approach clearly improves clustering performance.
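The split-then-select loop described above can be illustrated with a minimal sketch. It is not the paper's implementation. The binary splitter here is a plain 2-means routine standing in for SVMEC, and the "best" cluster is taken to be the one whose split yields the largest drop in within-cluster squared error. The abstract does not define the selection criterion or the stopping rule, so both are assumptions, as are the names `two_means`, `sse`, `greedy_tree_clustering`, `min_gain`, and `min_size`.

```python
import numpy as np


def two_means(X, n_iter=20):
    """Stand-in binary splitter (plain 2-means). NOT the SVMEC step of the paper."""
    X = np.asarray(X, dtype=float)
    # Deterministic seeding: the point farthest from the mean, then the point farthest from it.
    c0 = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def sse(X):
    """Sum of squared distances to the cluster mean."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def greedy_tree_clustering(X, min_gain=0.01, min_size=2):
    """Grow a binary cluster tree greedily; return the leaves as index arrays.

    Assumed criterion: split the leaf with the largest SSE reduction, and stop
    when the best reduction falls to min_gain * SSE(whole dataset) or below.
    """
    X = np.asarray(X, dtype=float)
    total = sse(X)
    clusters = [np.arange(len(X))]
    while True:
        best = None  # (gain, position in clusters, left indices, right indices)
        for pos, idx in enumerate(clusters):
            if len(idx) < 2 * min_size:
                continue
            labels = two_means(X[idx])
            left, right = idx[labels == 0], idx[labels == 1]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = sse(X[idx]) - sse(X[left]) - sse(X[right])
            if best is None or gain > best[0]:
                best = (gain, pos, left, right)
        if best is None or best[0] <= min_gain * total:
            return clusters
        _, pos, left, right = best
        # Replace the chosen leaf with its two children.
        clusters[pos:pos + 1] = [left, right]
```

Unlike uniform splitting, which divides every cluster at each level, this loop splits only one leaf per iteration, so the resulting tree can be unbalanced. For brevity the sketch recomputes the candidate split of every leaf on each pass; caching the splits of unchanged leaves would avoid the repeated work.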