Dongli Research Big Data Discovery System (DRDS)

Data-Driven Parallel Incremental SVM Learning Algorithm Based on the Hadoop Architecture

ISSN: 1001-9081
Journal: 《计算机应用》 (Journal of Computer Applications)
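Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Technology, Tianjin University, Tianjin 300350; [2] Tianjin Key Laboratory of Cognitive Computing and Application (Tianjin University), Tianjin 300350
Funding: National Natural Science Foundation of China (61170177); National 863 Program key project (2015AA020101); National 973 Program project (2013CB32930X).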

Keywords: Hadoop, HBase, Support Vector Machine (SVM), incremental learning, ensemble learning, forgetting factor, controller component

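Chinese abstract (English translation):

To address the difficulty that traditional Support Vector Machine (SVM) algorithms have in handling large-scale training data, a data-driven parallel incremental Adaboost-SVM algorithm based on Hadoop (PIASVM) is proposed. Using an ensemble learning strategy, each local classifier processes one partition of the data, and the local classification results are fused into a combined classifier. During incremental learning, weights characterize the spatial distribution of the samples, the samples are iteratively reweighted, and a forgetting factor governs the selection of newly added samples and the elimination of historical samples. An HBase-based controller component schedules the iterative process, persists intermediate results, and reduces the bandwidth pressure that iteration places on the original MapReduce framework. Results from multiple groups of experiments show that the proposed algorithm achieves good speedup, sizeup, and scaleup, and improves the capacity of SVM to process large-scale data while maintaining classification accuracy.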
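
The ensemble step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact PIASVM combination rule, so the code below assumes the textbook AdaBoost classifier weight, alpha = 0.5 * ln((1 - err) / err), and a weighted vote over labels in {-1, +1}. The threshold functions stand in for per-partition SVMs, and all names and data points are hypothetical.

```python
import math

def classifier_weight(error_rate, eps=1e-10):
    """Textbook AdaBoost weight: alpha = 0.5 * ln((1 - err) / err)."""
    err = min(max(error_rate, eps), 1.0 - eps)  # clamp to avoid log(0) or division by zero
    return 0.5 * math.log((1.0 - err) / err)

def weighted_error(classifier, samples, labels, sample_weights):
    """Weighted fraction of samples that the classifier labels incorrectly."""
    total = sum(sample_weights)
    wrong = sum(w for x, y, w in zip(samples, labels, sample_weights)
                if classifier(x) != y)
    return wrong / total

def combine(classifiers, alphas):
    """Build a combined classifier from the alpha-weighted vote of local classifiers."""
    def predict(x):
        score = sum(a * clf(x) for clf, a in zip(classifiers, alphas))
        return 1 if score >= 0 else -1
    return predict

# Stand-ins for SVMs trained on three data partitions (hypothetical thresholds).
local_classifiers = [
    lambda x: 1 if x[0] > -0.75 else -1,
    lambda x: 1 if x[1] > 0.1 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]

# Hypothetical validation set with uniform sample weights.
validation_x = [(1.0, 1.0), (0.5, 0.2), (-1.0, 0.8), (-0.5, -0.5)]
validation_y = [1, 1, -1, -1]
sample_weights = [0.25] * len(validation_x)

alphas = [
    classifier_weight(weighted_error(clf, validation_x, validation_y, sample_weights))
    for clf in local_classifiers
]
combined = combine(local_classifiers, alphas)
predictions = [combined(x) for x in validation_x]
```

In this toy setup each stand-in classifier misclassifies a different validation point, so the weighted vote recovers all four labels even though no single member does.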

English abstract:

Because the traditional Support Vector Machine (SVM) algorithm has difficulty handling large-scale training data, an efficient data-driven Parallel Incremental Adaboost-SVM (PIASVM) learning algorithm based on Hadoop was proposed. An ensemble system was used so that each classifier processed one partition of the data, and the classification results were then integrated to obtain the combined classifier. Weights were used to describe the spatial distribution properties of the samples, which were iteratively reweighted during the incremental training stage, and a forgetting factor was applied to select new samples and eliminate historical samples. In addition, a controller component based on HBase was used to schedule the iterative procedure, persist the intermediate results, and reduce the bandwidth pressure of iterative MapReduce. The experimental results on multiple data sets demonstrate that the proposed algorithm performs well in speedup, sizeup, and scaleup, and offers high processing capacity for large-scale data while maintaining high accuracy.
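
The three evaluation measures named in the abstract can be sketched as follows. The abstract does not define them, so this assumes the definitions commonly used for parallel data-mining algorithms: speedup compares 1 node with m nodes on the same data, sizeup compares k-times-larger data with the base data on the same cluster, and scaleup compares the base data on 1 node with m-times-larger data on m nodes. The timings are hypothetical, not results from the paper.

```python
def speedup(time_one_node, time_m_nodes):
    """Same data set: run time on 1 node divided by run time on m nodes."""
    return time_one_node / time_m_nodes

def sizeup(time_base_data, time_scaled_data):
    """Same cluster: run time on k-times-larger data divided by run time on base data."""
    return time_scaled_data / time_base_data

def scaleup(time_base, time_scaled_both):
    """Base data on 1 node divided by m-times-larger data on m nodes."""
    return time_base / time_scaled_both

# Hypothetical wall-clock timings in seconds (illustration only).
t_1node_base = 400.0     # 1 node, data set D
t_4nodes_base = 125.0    # 4 nodes, data set D
t_4nodes_4xdata = 440.0  # 4 nodes, data set 4 * D

speedup_4 = speedup(t_1node_base, t_4nodes_base)    # ideal value would be 4
sizeup_4 = sizeup(t_4nodes_base, t_4nodes_4xdata)   # below 4 means better than linear growth
scaleup_4 = scaleup(t_1node_base, t_4nodes_4xdata)  # ideal value would be 1

print(f"speedup={speedup_4:.2f} sizeup={sizeup_4:.2f} scaleup={scaleup_4:.2f}")
```

Under these assumed definitions, a speedup close to the node count and a scaleup close to 1 indicate that the algorithm parallelizes well.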

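Project associated with this journal paper

Research on Discovery and Evaluation Mechanisms for Tissue-Specific Patterns of Gene Expression and Regulation

Journal papers: 25; conference papers: 13

Journal papers from the same project

Constructing gene regulatory networks based on time-series mutual information

A distributed Bayesian network learning algorithm based on model fusion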

Knowledge Enrichment Analysis for Human Tissue-Specific Genes Uncover New Biological Insights

An improved K-anonymity algorithm based on OLA

Mining Topological Structures of PPI Networks for Human Brain Specific Genes

Computational Analyses of Simple Sequence Repeats on Human Tissue Specific Genes Promoters

Comprehensive mining on medical homepage records using Bayesian network approach

Predicting Protein-RNA Binding Sites Using Sequence Statistical Feature of Amino Acids

A survey of PAC-Bayes theory and its applications

An SVM model selection method based on PAC-Bayes bound theory

BAAQ: An infrastructure for application integration and knowledge discovery in Bioinformatics

On the PAC-Bayes Bound Calculation based on Reproducing Kernel Hilbert Space

On the Generalization of PAC-Bayes Bound for SVM Linear Classifier

Hybridizing adaptive biogeography-based optimization with differential evolution for motif discovery

Hybridizing biogeography-based optimization with differential evolution for motif discovery problem

Mapcombine: A lightweight solution to improve the efficiency of iterative MapReduce

A probability based similarity scoring for DNA motifs Comparison

Biogeography-based optimization for motif discovery problem

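An iterative weighted subgraph query algorithm based on semi-Markov random walk

Construction and analysis of a human kidney tissue-specific protein network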

An Improved Algorithm for K-anonymity

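Journal information

《计算机应用》 (Journal of Computer Applications)
PKU Core Journal (2011 edition)

Supervising body: Sichuan Association for Science and Technology
Sponsors: Sichuan Computer Federation; Chengdu Branch, Chinese Academy of Sciences
Editor-in-chief: Zhang Jingzhong
Address: Institute of Computing, Chengdu Branch of the Chinese Academy of Sciences, No. 9, Section 4, Renmin South Road, Chengdu
Postal code: 610041
Email: xzh@joca.cn
Phone: 028-85224283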

International standard serial number: ISSN 1001-9081
Domestic serial number: CN 51-1307/TP
Postal distribution code: 62-110

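Awards:
First Prize, National Outstanding Science and Technology Journals; National Periodical Award nominee; China Periodical Square "Double Award" journal; Chinese Core Journal; China Science and Technology Core Journal

Indexed in domestic and international databases:
Abstracts Journal (Russia); Index Copernicus (Poland); Cambridge Scientific Abstracts (USA); Science Abstracts database (UK); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); PKU Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)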

Citation count: 53679