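To make image retrieval more effective, we propose an image retrieval clustering method for Web 2.0 collaborative tagging systems. The algorithm first addresses the inconsistency that arises in tag space from the diversity of tag expressions: it mines lexical relations among tags to perform semantic-level query expansion, which yields an expanded result set of images that may be semantically relevant. It then uses a tag relevance measure to select, from the image result set, the tags that are highly relevant to the query tags, and applies a top-down heuristic graph partitioning algorithm to group the selected tags automatically. Finally, the image result set is clustered according to the tag grouping. To validate the method, we randomly downloaded a large collection of real images, together with their tag metadata, from the tag-based image sharing site Flickr, and ran extensive experiments on our image retrieval prototype system, PivotBrowser. The results show that the clustering algorithm effectively resolves both the inconsistency of tag expressions and the ambiguity of tag queries in tag space, and gives users more satisfactory retrieval.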
In this paper, we propose a novel image clustering algorithm for effective image retrieval in Web 2.0 tag space. Different users may use different tags to describe the same object, which causes inconsistency in tagging. Our algorithm captures semantically similar tags to perform query expansion and retrieves the candidate images that are possibly relevant to the query. The candidate tags are then shortlisted according to their relevance to the query tags, and the shortlisted tags are clustered on the fly using a graph partitioning algorithm. The candidate images are clustered based on the resulting tag clusters. The proposed algorithm is implemented in a prototype system called PivotBrowser. Experimental results on a large-scale collection of images randomly downloaded from Flickr show that our approach effectively addresses the inconsistency and ambiguity problems in tag-space image retrieval and improves user satisfaction.
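The pipeline described above (query expansion, tag shortlisting, tag partitioning, image clustering) can be illustrated with a minimal Python sketch. It is not the PivotBrowser implementation, and every name in it is hypothetical: `IMAGES` and `RELATED` are toy data, `min_count` is an assumed shortlisting threshold, a plain occurrence count stands in for the tag relevance measure, and connected components of the tag co-occurrence graph stand in for the top-down heuristic graph partitioning, whose details the abstract does not give.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy data (hypothetical): image id -> set of tags.
IMAGES = {
    1: {"apple", "fruit", "red"},
    2: {"apple", "fruit", "pie"},
    3: {"apple", "iphone", "mac"},
    4: {"macintosh", "mac", "laptop"},
    5: {"apple", "iphone"},
}
# Hypothetical lexical relations between tags, used for query expansion.
RELATED = {"apple": {"macintosh"}}


def expand_query(query, related):
    """Add lexically related tags to the query tags."""
    expanded = set(query)
    for tag in query:
        expanded |= related.get(tag, set())
    return expanded


def shortlist_tags(images, candidates, expanded, min_count=2):
    """Keep non-query tags that occur in at least min_count candidate images.
    Assumption: a raw count stands in for the paper's relevance measure."""
    counts = Counter(
        tag for i in candidates for tag in images[i] if tag not in expanded
    )
    return {tag for tag, count in counts.items() if count >= min_count}


def partition_tags(images, candidates, tags):
    """Group tags by connected components of their co-occurrence graph.
    Assumption: a simple stand-in for the heuristic graph partitioning."""
    adjacent = {tag: set() for tag in tags}
    for i in candidates:
        for a, b in combinations(sorted(images[i] & tags), 2):
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, groups = set(), []
    for start in sorted(tags):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            tag = stack.pop()
            if tag not in group:
                group.add(tag)
                stack.extend(adjacent[tag] - group)
        seen |= group
        groups.append(frozenset(group))
    return groups


def cluster_images(images, query, related=RELATED, min_count=2):
    """Return {tag cluster (sorted tuple): [image ids]} for a tag query."""
    expanded = expand_query(query, related)
    candidates = sorted(i for i, tags in images.items() if tags & expanded)
    tags = shortlist_tags(images, candidates, expanded, min_count)
    groups = partition_tags(images, candidates, tags)
    clusters = defaultdict(list)
    for i in candidates:
        # Assign the image to the tag group it overlaps most, if any.
        best = max(groups, key=lambda g: len(g & images[i]), default=None)
        if best is not None and best & images[i]:
            clusters[tuple(sorted(best))].append(i)
    return dict(clusters)


print(cluster_images(IMAGES, {"apple"}))
```

On this toy data the ambiguous query `apple` separates into a fruit-related cluster and a computer-related cluster, and image 4, which is tagged `macintosh` but not `apple`, is reached only through query expansion. Connected components are used here only because they are short to write; a real partitioning step would also need to split large, weakly connected tag groups.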