Advances in Active Learning Algorithms Based on Sampling Strategies (基于采样策略的主动学习算法研究进展)

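ISSN: 1000-1239
Journal: 计算机研究与发展 (Journal of Computer Research and Development)
Date: 2012.6.6
Pages: 1162-1173
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliation: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: National Natural Science Foundation of China (61171185, 60932008, 60832010); China Postdoctoral Science Foundation Special Funded Project (201003446)
Related project: Research on Information Processing Methods for Soybean RNA Structure and Evolution Analysis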

Keywords: machine learning, active learning, sampling strategy, labeling cost, instance selection

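Abstract (translated from the Chinese):

Active learning algorithms select highly informative unlabeled instances and hand them to an expert for labeling. Over repeated cycles the accuracy of the classifier improves step by step, so that strong generalization ability is obtained at the lowest total labeling cost. This technique has attracted the attention of researchers in China and abroad. Focusing on sampling strategies, the paper describes in detail how the learning engine and the sampling engine work in active learning, summarizes the theoretical results on active learning algorithms, and reviews the current state of research and its development trends. First, active learning algorithms are divided into types according to the way the sampling strategy selects instances. Next, active learning algorithms based on different sampling strategies are analyzed and compared in depth, and the application domains, advantages, and disadvantages of each are discussed. Finally, open problems and directions for further research are pointed out.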

English abstract:

In active learning, the classifier is trained by choosing the most informative unlabeled instances for human experts to label. Over repeated cycles the classification accuracy of the model improves, so that a classifier with high generalization capability is obtained while the total labeling cost is minimized. Active learning has attracted wide attention from researchers both at home and abroad and is currently an important research topic. This paper introduces active learning algorithms with particular emphasis on sampling strategies. The iterative processes of the learning engine and the sampling engine are described in detail, and the existing theory of active learning is summarized. Recent work and developments in active learning are discussed, including the approaches and their corresponding sampling strategies. First, active learning algorithms are categorized into three main classes according to the way they select examples. Then the sampling strategies are summarized by analyzing the relationships among them, and their advantages and shortcomings are discussed and compared in depth in the context of real applications. Finally, the remaining open problems and the directions of interest for future research on active learning are outlined.
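
The loop described in the abstract (a sampling engine picks an informative unlabeled instance, an expert labels it, and a learning engine retrains) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`train`, `margin`, `active_learn`), the nearest-centroid classifier, the margin-based uncertainty criterion, and the `oracle` callable standing in for the human expert are all hypothetical. The seed set is assumed to contain at least one example of every class.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labeled):
    """Learning engine (illustrative): one centroid per class."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_class.items()}


def predict(model, point):
    """Assign the class whose centroid is nearest."""
    return min(model, key=lambda label: sq_dist(point, model[label]))


def margin(model, point):
    """Gap between the two nearest centroids; a small gap means high uncertainty.

    Assumes the model holds at least two classes.
    """
    d = sorted(sq_dist(point, c) for c in model.values())
    return d[1] - d[0]


def active_learn(pool, oracle, seed_labeled, budget):
    """Pool-based loop: query the most uncertain instance, label it, retrain."""
    labeled = list(seed_labeled)
    unlabeled = list(pool)
    model = train(labeled)
    for _ in range(budget):
        if not unlabeled:
            break
        # Sampling engine: pick the instance the current model is least sure about.
        query = min(unlabeled, key=lambda p: margin(model, p))
        unlabeled.remove(query)
        # The oracle plays the role of the human expert; each call is one unit of labeling cost.
        labeled.append((query, oracle(query)))
        model = train(labeled)
    return model, labeled


if __name__ == "__main__":
    # Hypothetical toy data: the hidden rule is "class 1 if x > 0".
    demo_pool = [(-0.8, 0.1), (-0.3, 0.5), (0.05, -0.2), (0.4, 0.3), (0.7, -0.6)]
    demo_seed = [((-0.9, 0.0), 0), ((0.9, 0.0), 1)]
    demo_model, demo_labeled = active_learn(
        demo_pool, lambda p: 1 if p[0] > 0 else 0, demo_seed, budget=2
    )
    print("queried:", demo_labeled[len(demo_seed):])
```

The `budget` parameter caps the number of oracle calls, which is one simple way to represent the labeling cost the abstract refers to. Other selection criteria could replace `margin` without changing the surrounding loop.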
