将概率潜语义分析模型(PLSA)应用于高光谱影像聚类,提出一种基于语义信息的影像聚类方法。首先,利用ISODATA算法获取影像的初次聚类结果,从而形成PLSA模型中的视觉词;其次,利用影像分割算法对高光谱影像进行分割,并将分割体作为PLSA模型的文档;再次,利用多种最佳聚类类别数估计方法确定PLSA模型的潜语义主题的个数;进而估计PLSA模型的参数,获得概率主题内视觉词的概率分布和每个分割体中各概率主题的混合比例;最后利用统计模式识别方法获取每个影像文档中各个视觉词对应的潜语义主题的类型,从而实现影像的层次聚类分析。相关实验结果表明,本文的层次聚类结果较K—MEANS算法、ISODATA算法聚类结果的面向对象特性更明显,其与真实地物的空间分布更接近。
The paper introduces the Probabilistic Latent Semantic Analysis (PLSA) to the image clustering and an effective im- age clustering algorithm using the semantic information from PLSA is proposed which is used for hyperspectral images. Firstly, the ISODATA algorithm is used to obtain the initial clustering result of hyperspeetral image and the clusters of the initial cluste- ring result are considered as the visual words of the PLSA. Secondly, the object-oriented image segmentation algorithm is used to partition the hyperspectral image and segments with relatively pure pixels are regarded as documents in PLSA. Thirdly, a va- riety of identification methods which can estimate the best number of cluster centers is combined to get the number of latent se- mantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated by using PLSA. Finally, the conditional probabilistie of latent semantic topics are distinguished using statistical pat- tern recognition method, the topic type for each visual in each document will be given and the clustering result of hyperspectral image are then achieved. Experimental results show the clusters of the proposed algorithm are better than K-MEANS and ISO- DATA in terms of object-oriented property and the clustering result is closer to the distribution of real spatial distribution of surface.