Multi-view active learning is a technique that achieves a greater reduction of the version space than traditional active learning, and it has been applied to many kinds of large-scale data analysis. This paper addresses shortcomings of existing multi-view active learning algorithms in two areas, hypothesis generation and sampling strategy, and proposes an improvement for each. First, we integrate a boosting-like idea into the multi-view active learning framework: the hypotheses obtained from all past queries are combined by weighted voting, so that the classification hypothesis is strengthened after every query. Second, we present an adaptive hierarchical competition sampling strategy. When the number of contention samples (unlabeled samples on which the views disagree) is large, unsupervised spectral clustering is applied to obtain a coarse description of their distribution in the feature space; within each cluster, the classification uncertainty and the redundancy of the samples are then combined in a quadratic program whose solution yields a reliable batch of samples to query. To validate these improvements, we apply multi-view active learning to image classification, where each view is built on a different image feature and generates its own hypothesis. The experiments show that both proposed improvements increase the performance of multi-view active learning, and that multi-view active learning based on random combinations of these views converges faster and reaches higher scene classification accuracy than classical single-view active learning algorithms.
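The weighted-voting step and the notion of a contention sample can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the AdaBoost-style weight below is an assumption, and the names `hypothesis_weight`, `boosted_predict`, and `contention_samples` are hypothetical. The spectral clustering and quadratic programming stages of the sampling strategy are not covered here.

```python
import math
from collections import defaultdict


def hypothesis_weight(error, eps=1e-9):
    """AdaBoost-style weight for a hypothesis with the given error rate.

    Assumption: the abstract only says "weighted voting"; this formula
    is borrowed from AdaBoost for illustration.
    """
    error = min(max(error, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - error) / error)


def boosted_predict(history, x):
    """Weighted vote over the hypotheses produced by all past queries.

    history: list of (classifier, weight) pairs, one per query round.
    """
    votes = defaultdict(float)
    for classifier, weight in history:
        votes[classifier(x)] += weight
    return max(votes, key=votes.get)


def contention_samples(view_histories, unlabeled):
    """Return the unlabeled samples on which the views disagree.

    Each sample is a tuple holding one feature entry per view.
    """
    contested = []
    for sample in unlabeled:
        labels = {
            boosted_predict(history, sample[view])
            for view, history in enumerate(view_histories)
        }
        if len(labels) > 1:
            contested.append(sample)
    return contested


# Toy data: two views, each with scalar features and threshold classifiers.
view_a = [
    (lambda x: int(x > 0.5), hypothesis_weight(0.1)),  # accurate, heavy vote
    (lambda x: int(x > 0.2), hypothesis_weight(0.4)),  # weaker, light vote
]
view_b = [(lambda x: int(x > 0.0), hypothesis_weight(0.2))]

unlabeled = [(0.9, 1.0), (0.3, 1.0), (0.1, -1.0)]
contested = contention_samples([view_a, view_b], unlabeled)
```

In this toy setting only the second sample is contested: the more accurate hypothesis in the first view outvotes the weaker one and predicts class 0, while the second view predicts class 1. In the strategy described above, such contested samples would be the candidates passed on to clustering and batch selection.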