现有的半监督分类方法由于时间复杂度较高等原因无法用于稍大规模的图像分类.该文根据聚类假设,通过寻找标签在图中进行传播的最主要路径,即最小代价路径,提出了最小代价路径标签传播算法(Minimum Cost Path Label Propagation,MCPLP).该算法通过变形的最小生成树得到无标记样本到标记样本间的最小代价路径,使标记沿着节点间代价最小的路径传播来实现分类,每个节点仅需被传播一次就能得到它们的标记.同时发现本文算法以及其他这类基于图的标签传播半监督分类方法由于构建的稀疏图存在图的连通性问题,导致可能出现标签不能被传播到所有节点,即存在数据不能被分类的情况.我们研究了图的双向不连通问题和图的单向不连通问题(非对称图),提出构建稀疏对称矩阵增强图的连通性以及对未分类数据进行再次分类的方法,解决由连通性带来的数据不能被全部分类的问题.分析及实验结果表明提出的MCPLP算法不仅具有较低的时间复杂度,而且有较高的分类正确率.通过对大规模图像的分类实验,验证了MCPLP算法同样适合于大规模的图像数据分类.
The graph-based semi-supervised classification methods available are not suitable for large-scale images due to their high time complexities.Motivated by cluster assumption,we propose a minimum cost path label propagation (MCPLP)algorithm.The method propagates label by the main path which is called minimum cost path between nodes.The method uses an improved minimum spanning tree algorithm to produce the minimum cost path between unlabeled and labeled nodes.Then propagates label along the minimum cost path,and the nodes can get their label by propagating only once.We find there is connectivity problem comes from sparse graph existing in the proposed and other graph-based label propagation semi-supervised classification methods.This problem may makes label cannot be propagated to some node,which means some data cannot be classified.We research the disconnected problem in any way on graph (unconnected graph)and the problem that the labeled nodes connect the unlabeled nodes in only one direction on graph (Asymmetry graph).In order to solve these problems,we present a sparse symmetric graph construction method to improve the connectivity of the graph and a strategy of re-classify the data in the disconnected regions.Then all of the data can be classified.The analysis and experiments demonstrate that the MCPLP method not only possesses low classification time complexity,but also obtains high classification accuracy.There are some benchmark experiments on the MCPLP to show its higher accuracy and less time cost.The experiments for the large-scalenbsp;image data show that the proposed method is suitable for large-scale data classification.