Traditional image retrieval methods struggle to process massive data sets. To address this, we propose a retrieval method for massive scene image collections based on improved distributed K-Means feature clustering. We improve the distributed K-Means algorithm by optimising both the selection of initial cluster centres and the iteration procedure, and apply it to the feature clustering of scene images. Exploiting the massive storage capacity and powerful parallel computing capability of the Hadoop distributed platform, we present a storage and retrieval scheme for massive scene images and design Map and Reduce tasks for the distributed parallel processing of three phases: scene image feature extraction, feature clustering and image retrieval. Several groups of experiments show that the proposed method has a gently sloping data scalability curve, achieves a good speedup ratio with an efficiency greater than 0.6, and reaches an average retrieval accuracy of about 88%, making it suitable for retrieving massive scene image data.
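To make the Map and Reduce decomposition of the clustering phase concrete, the following is a minimal single-process sketch of one plain K-Means iteration written as a map step and a reduce step. It is an illustration under stated assumptions, not the improved algorithm itself: the optimised initial-centre selection and iteration procedure are not specified here, and the names `map_assign`, `reduce_update`, `kmeans_iteration` and `parallel_efficiency` are hypothetical. The efficiency helper assumes the usual definition, speedup divided by the number of nodes.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centres):
    """Map step: emit (index of the nearest centre, (point, 1))."""
    nearest = min(range(len(centres)), key=lambda i: dist(point, centres[i]))
    return nearest, (point, 1)


def reduce_update(pairs):
    """Reduce step: average all points emitted for one centre."""
    total, count = None, 0
    for point, n in pairs:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(points, centres):
    """Run one map / shuffle / reduce round and return the updated centres."""
    groups = defaultdict(list)
    for p in points:  # map, then shuffle by key
        key, value = map_assign(p, centres)
        groups[key].append(value)
    # Assumption: a centre with no assigned points keeps its old position.
    return [
        reduce_update(groups[i]) if i in groups else centres[i]
        for i in range(len(centres))
    ]


def parallel_efficiency(t_serial, t_parallel, nodes):
    """Efficiency = speedup / nodes, where speedup = t_serial / t_parallel."""
    return (t_serial / t_parallel) / nodes
```

On a real Hadoop cluster the map step would run on the data splits held by each node and the shuffle would be handled by the framework; here both are simulated with an in-memory dictionary so that the data flow is easy to follow.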