Application of the Boosting Method in High-Dimensional Data Analysis

Journal: Chinese Journal of Hospital Statistics (中国医院统计), 2011, 18(1): 1-5
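Classification: R331.143 [Medicine and Health: Human Physiology; Medicine and Health: Basic Medicine]
Author affiliation: [1] Department of Statistics, School of Public Health, Harbin Medical University, Harbin, Heilongjiang 150081
Funding: National Natural Science Foundation of China (30872185)
Related project: Statistical feature extraction and data analysis methods for dynamic metabolomic fingerprints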

Keywords: Boosting, high-dimensional data, classification, feature selection

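Chinese abstract (translated):

Objective: With the rise of modern genomics, proteomics, and metabolomics, large volumes of high-dimensional omics data are being generated. A central task in analyzing such data is to classify samples and to select feature markers with biological meaning. This study addresses that task with Boosting, a method currently recognized as performing well, and examines the conditions under which the Boosting algorithm applies to high-dimensional data and how well it performs.

Methods: Through repeated iteration, Boosting combines weak base classifiers (decision trees) into a strong classifier. Simulation experiments were used to study and verify classification performance and variable-importance measurement when a large number of non-differential variables are present, and real gene expression data were used to further assess performance in practice.

Results: In the simulations, the ensemble model built with Boosting and decision trees classified with high accuracy and showed some resistance to interference from noise variables. It also evaluated variable importance effectively while classifying. With all genes retained, classification of real colon cancer gene expression data was highly satisfactory, and the results offered clues for identifying genes involved in colon cancer.

Conclusion: A Boosting ensemble classifier built on decision trees can be applied effectively to discriminant classification and variable-importance evaluation for high-dimensional data. The Boosting algorithm shows many potential advantages on small-sample, noisy, high-dimensional problems. Compared with other methods in current use, for high-dimensional data with complex structure it has distinct characteristics, such as fast computation, broader applicability, and relatively easy software implementation. It is a method worth recommending and studying further.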

English abstract:

Objective: High-dimensional omics data are being generated with the rise of modern genomics, proteomics, and metabonomics experiments. The primary tasks in analyzing such data are classifying samples and selecting biologically significant biomarkers. We adopted boosting, a well-regarded machine learning method, to analyze high-dimensional data, and discussed the conditions for and effects of applying boosting to such data.

Methods: Through multiple iterations, boosting turns a weak classifier (decision trees) into a strong one. The classifier's performance was tested with simulations and real gene expression data.

Results: Simulations showed that models constructed by boosting performed well even as the amount of noise increased. While classifying, boosting also evaluated the importance of variables effectively. With all genes retained, similar results were obtained on real gene expression data from colon cancer, and the features selected by boosting provided important clues for the discovery of pathogenic genes in colon cancer.

Conclusion: Boosting models can be used effectively for classifying high-dimensional data and evaluating the importance of variables. Compared with other methods in current use, boosting shows many potential advantages when dealing with complex high-dimensional data, such as rapid computation, wide applicability, and easy programming. Boosting is therefore a recommended method that merits further study.
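
Illustrative sketch: the approach described in the abstract (boosting decision trees, then checking classification accuracy and variable importance when most variables carry no class difference) can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation. It assumes AdaBoost with one-split decision stumps as the weak learner, a hypothetical simulation with 5 informative and 100 noise variables, and it uses the summed stump weights per variable as a simple stand-in for a variable-importance measure. All function names and parameter values are assumptions made for the example.

```python
import math
import random

def make_data(n, n_signal, n_noise, rng, shift=1.0):
    """Simulate two classes (+1/-1); only the first n_signal variables differ by class."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(shift * label, 1.0) for _ in range(n_signal)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]  # no class difference
        X.append(row)
        y.append(label)
    return X, y

def stump_predict(x, j, thr, pol):
    """One-split tree: predict pol if x[j] > thr, otherwise -pol."""
    return pol if x[j] > thr else -pol

def fit_stump(X, y, w):
    """Find the stump with the lowest weighted error (weights w must sum to 1)."""
    n, p = len(X), len(X[0])
    best = (float("inf"), 0, 0.0, 1)
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        # Threshold below every point: all predicted +1, so errors are the -1 cases.
        err = sum(w[i] for i in range(n) if y[i] == -1)
        for k in range(n - 1):
            i, nxt = order[k], order[k + 1]
            # Raising the threshold past point i flips its prediction to -1.
            err += w[i] if y[i] == 1 else -w[i]
            if X[i][j] == X[nxt][j]:
                continue  # cannot split between tied values
            thr = (X[i][j] + X[nxt][j]) / 2.0
            for e, pol in ((err, 1), (1.0 - err, -1)):
                if e < best[0]:
                    best = (e, j, thr, pol)
    return best

def adaboost(X, y, rounds):
    """Discrete AdaBoost: reweight samples each round and collect weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, j, thr, pol))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

def importance(model, p):
    """Assumed proxy for importance: share of total stump weight per variable."""
    imp = [0.0] * p
    for a, j, _, _ in model:
        imp[j] += a
    total = sum(imp) or 1.0
    return [v / total for v in imp]

rng = random.Random(0)
n_signal, n_noise = 5, 100
X_train, y_train = make_data(60, n_signal, n_noise, rng)
X_test, y_test = make_data(200, n_signal, n_noise, rng)

model = adaboost(X_train, y_train, rounds=30)
acc = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
imp = importance(model, n_signal + n_noise)
top = sorted(range(len(imp)), key=lambda j: -imp[j])[:5]

print("test accuracy:", acc)
print("top variables (indices 0-4 are the informative ones):", top)
```

Varying `n_noise` or `shift` in this sketch gives a rough feel for how accuracy and the importance ranking respond to a growing share of uninformative variables, which is the kind of question the simulation study in the abstract addresses.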

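Project for this journal article

Statistical feature extraction and data analysis methods for dynamic metabolomic fingerprints

Journal articles: 15

Journal articles from the same project

Application of association rule analysis to screening prescriptions in ancient traditional Chinese medicine texts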

One-step synthesis of mesoporous two-line ferrihydrite for effective elimination of arsenic contamin

The toxicity and long-term efficacy of nedaplatin and paclitaxel treatment as neoadjuvant chemothera

Facile synthesis of Ag nanoparticles supported on MWCNTs with favorable stability and their bacteric

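Application of the random forest method in gene expression data analysis and research progress

Visualization methods for kernel principal component cluster analysis of omics data

Concepts and methods for analyzing metabolomic fingerprint data in Chinese herbal medicine research

Application of Bayesian causal network models to cross-sectional survey data

Bayesian estimation of the sensitivity and specificity of diagnostic tests in the absence of a gold standard

Research and development of an electronic data capture (EDC) system for drug clinical trials

Application of the Radviz visualization method in gene expression data analysis

Methods for controlling and estimating the FDR in multiple hypothesis testing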