To address the low efficiency of the SIFT algorithm on large image libraries and the unreasonable ordering of images in its retrieval results, this paper proposes a SIFT algorithm based on distributed parallel computing. On the Spark platform, the K-means algorithm is used to cluster the image feature library, dividing the initial library into several feature clusters, so that each feature can be represented by the unit feature vectors of the clusters. Similarity between images can then be compared efficiently with a multi-feature similarity measure; that is, multiple features replace a single-feature measure in order to optimize the ranking of the retrieved images and improve the user experience. Experimental results show that, compared with the original SIFT algorithm, the improved SIFT algorithm achieves significantly better performance.
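The pipeline summarized above (cluster the feature library, describe each image through its clusters, rank by a multi-feature similarity) can be illustrated with a minimal single-machine sketch. It is not the Spark implementation evaluated in this work: it is pure Python, the 2-D points stand in for 128-D SIFT descriptors, and the farthest-first initialization, the unit-normalized cluster histogram, the cosine similarity, and all names (`kmeans`, `image_signature`, `rank`, the toy image names) are assumptions chosen for illustration.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means with a deterministic farthest-first start (assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def unit(v):
    """Scale `v` to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def image_signature(descriptors, centroids):
    """Unit-length histogram of how many descriptors fall into each cluster."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1.0
    return unit(hist)


def rank(query, library, centroids):
    """Library image names sorted by cosine similarity to the query, best first."""
    q = image_signature(query, centroids)
    scores = {
        name: sum(x * y for x, y in zip(q, image_signature(desc, centroids)))
        for name, desc in library.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy feature library: each image is a list of 2-D stand-in descriptors.
library = {
    "sky": [(0, 0), (0, 1), (1, 0), (10, 10)],
    "brick": [(1, 1), (10, 11), (11, 10), (11, 11)],
    "mixed": [(0, 1), (1, 1), (10, 10), (11, 11)],
}
all_descriptors = [d for desc in library.values() for d in desc]
centroids = kmeans(all_descriptors, k=2)

query = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(rank(query, library, centroids))  # ['sky', 'mixed', 'brick']
```

Because every image is compared through its distribution over all clusters rather than through individual descriptor matches, the ordering reflects how much of the query's feature mix each image shares. In a distributed setting the clustering step would presumably be delegated to the platform's own K-means, which this sketch does not attempt to reproduce.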