Comparison of Supervised Clustering Methods for the Analysis of DNA Microarray Expression Data

ISSN: 0253-9772
Journal: 《遗传》 (Hereditas (Beijing))
Date: 0
Classification: S188 [Agricultural Sciences: Basic Agricultural Sciences]
Author affiliations: [1] Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, P.R. China; [2] The School of Public Health, Nantong University, Nantong 226001, P.R. China
Funding: This research was supported by the National Natural Science Foundation of China (30370758) and the Program for New Century Excellent Talents in Universities (NCET) of the Ministry of Education, awarded to Dr. Xu Chenwu (NCET-05-0502).

Authors: XIAO Jing[1,2], WANG Xue-feng[1], YANG Ze-feng[1], XU Chen-wu[1]

Keywords: DNA, gene expression, microarray, supervised clustering, k-nearest-neighbor (KNN), support vector machines (SVMs)

Abstract:

Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), k-nearest-neighbor (KNN), binary support vector machines (SVMs) and multiclass support vector machines (MC-SVMs), were used to classify computer-simulated data and two real microarray expression datasets. False positives (FP), false negatives (FN), true positives (TP), true negatives (TN), clustering accuracy and Matthews' correlation coefficient (MCC) were compared among these methods. The results are as follows: (1) In classifying thousands of gene expression data, the two GMM methods achieve the highest clustering accuracy and the fewest overall FP+FN errors, under the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-II method has higher clustering accuracy than the GMM-I method. (2) In general, the MC-SVMs are more robust and more practical: they are less sensitive to the curse of dimensionality, are second only to the GMM methods in clustering accuracy on thousands of gene expression data, and are more robust than the other techniques when only a small number of high-dimensional gene expression samples is available. (3) Among the MC-SVMs, OVO and DAGSVM perform better on large sample sizes, whereas the five MC-SVM methods perform very similarly on moderate sample sizes. When sample sizes are small, OVR, WW and CS yield better results. It is therefore recommended that at least two candidate methods, chosen on the basis of the features of the real data and the experimental conditions, be applied and compared to obtain a better clustering result.
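The evaluation quantities named in the abstract (FP, FN, TP, TN, clustering accuracy and MCC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the label coding (1 = positive class) and the toy labels are assumptions made here for illustration, and the formulas are the standard textbook definitions of accuracy and Matthews' correlation coefficient for a two-class problem.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a two-class problem (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of samples assigned to the correct class."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews' correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not data from the paper).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(tp, fp, tn, fn)             # 3 1 3 1
    print(accuracy(tp, fp, tn, fn))   # 0.75
    print(mcc(tp, fp, tn, fn))        # 0.5
```

Unlike accuracy, MCC uses all four cells of the confusion table, so it stays informative when the classes are unbalanced, which is one common reason to report both.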

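Related project

Research on methods for constructing quantitative gene maps of threshold traits and their applications

Journal papers: 35; conference papers: 5

Journal papers from the same project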

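A general method for QTL mapping in multiple related populations derived from multi-parent crosses

A maximum-likelihood-based dynamic clustering method and its application

A maximum likelihood method for QTL analysis in four-way cross designs

Bioinformatic analysis of the cystatin gene family in Arabidopsis and rice

Advances in the application of Bayesian statistics to QTL mapping

Joint Analysis Method for Major Genes Controlling Multiple Correlated Quantitative Traits

Advances in research on strategies and methods for the genetic analysis of complex traits

A comparative study of supervised clustering methods for microarray expression profiles

A method for constructing QTL maps of endosperm traits based on single-seed observations

A maximum likelihood method for QTL mapping of endosperm traits based on plant means

Advances in the Research of Strategies and Methods for Analyzing Complex Traits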

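Journal information

《遗传》
Chinese Science and Technology Core Journal

Supervising body: Chinese Academy of Sciences
Sponsor: Genetics Society of China
Editor-in-chief: Zhang Yongqing
Address: Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Yard 1, Beichen West Road, Chaoyang District, Beijing
Postal code: 100101
Email: yczz@genetics.ac.cn
Phone: 010-64807669

International standard serial number (ISSN): 0253-9772
Domestic serial number (CN): 11-1913/R
Postal distribution code: 2-810

Awards:
Chinese Natural Science Core Journal; 《CAJ-CD》 Standards Implementation Excellence Award; in December 2008 received the "China Quality Science and Technology Journal" certificate and Beijing Municipal ...

Indexed in Chinese and international databases:
Chemical Abstracts, online edition (USA); abstracts of the Centre for Agriculture and Bioscience International (UK); abstract and citation database (Netherlands); biomedical retrieval system (USA); biological sciences database (USA); Japan Science and Technology Agency database (Japan); Chinese Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011 and 2014 editions)

Citations: 23270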