The traditional bag-of-words model for scene classification ignores the contextual information in images and does not account for category differences between image features. To address this problem, this paper proposes a scene classification method that combines multi-direction context features with a spatial pyramid model. First, each image is divided into patches on a regular grid and scale-invariant feature transform (SIFT) features are extracted; each local patch is then combined with its spatially adjacent regions in three directions to form three kinds of context features. Next, the context features of the training images are clustered separately for each category to form visual words, which are concatenated into the final visual vocabulary (codebook), and a visual-word histogram is computed for each image. Finally, spatial pyramid matching is used to build a pyramid histogram of visual words, which is classified with a support vector machine (SVM). The method combines the similarity of image patches in the feature domain with their contextual relations in the spatial domain and distinguishes them by category, yielding a more discriminative visual vocabulary. Experiments on common scene image datasets show that the method achieves better classification performance than traditional methods.
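The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the three neighbor offsets in `DIRECTIONS`, forming a context feature by concatenating a patch descriptor with its neighbor's descriptor, the edge handling at image borders, the tiny k-means routine, and the standard spatial pyramid matching level weights. All function names are hypothetical. The sketch processes one direction at a time and omits SIFT extraction and the SVM step.

```python
import numpy as np

# Assumed neighbor offsets (row, col); the abstract does not name the directions.
DIRECTIONS = [(0, 1), (1, 0), (1, 1)]


def context_features(grid, offset):
    """Concatenate each patch descriptor with its neighbor at `offset`.

    grid: (H, W, D) array of per-patch descriptors (e.g. SIFT with D = 128).
    Returns (H, W, 2 * D); border patches reuse the nearest edge patch.
    """
    dr, dc = offset
    h, w, _ = grid.shape
    padded = np.pad(grid, ((0, dr), (0, dc), (0, 0)), mode="edge")
    neighbor = padded[dr:dr + h, dc:dc + w]
    return np.concatenate([grid, neighbor], axis=2)


def quantize(x, codebook):
    """Index of the nearest codebook row (squared Euclidean) for each row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(x, k, iters=10, seed=0):
    """Tiny k-means for illustration; requires at least k rows in x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = quantize(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def build_codebook(features_by_class, words_per_class):
    """Cluster each category separately, then stack the per-class words."""
    return np.vstack([kmeans(f, words_per_class) for f in features_by_class])


def pyramid_histogram(words, vocab_size, levels=2):
    """Concatenate weighted visual-word histograms over a spatial pyramid.

    words: (H, W) integer array of visual-word indices, one per patch.
    Level l splits the grid into 2**l x 2**l cells.
    """
    h, w = words.shape
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Standard spatial pyramid matching weights (assumed here).
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        for rows in np.array_split(np.arange(h), cells):
            for cols in np.array_split(np.arange(w), cells):
                cell = words[np.ix_(rows, cols)]
                counts = np.bincount(cell.ravel(), minlength=vocab_size)
                parts.append(weight * counts)
    return np.concatenate(parts)
```

For one direction, a classifier input would be obtained by calling `context_features` on each image's descriptor grid, building the codebook from per-category training features with `build_codebook`, mapping patches to words with `quantize`, and passing the reshaped word grid to `pyramid_histogram`. How the vocabularies of the three directions are merged is left open here, since the abstract does not say.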