Comparison of Supervised Clustering Methods for the Analysis of DNA Microarray Expression Data
  • ISSN: 0253-9772
  • Journal: 《遗传》 (Hereditas, Beijing)
  • Date: 0
  • Classification: S188 [Agricultural Sciences: Basic Agricultural Sciences]
  • Author affiliations: [1] Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University, Yangzhou 225009, P.R. China; [2] The School of Public Health, Nantong University, Nantong 226001, P.R. China
  • Funding: This research was supported by the National Natural Science Foundation of China (30370758) and by the Program for New Century Excellent Talents in Universities (NCET) of the Ministry of Education, awarded to Dr. Xu Chenwu (NCET-05-0502).
English abstract:

Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), k-nearest-neighbor (KNN), binary support vector machines (SVMs) and multiclass support vector machines (MC-SVMs), were used to classify computer-simulated data and two real microarray expression datasets. The methods were compared on false positives (FP), false negatives (FN), true positives (TP), true negatives (TN), clustering accuracy and Matthews' correlation coefficient (MCC). The results are as follows: (1) In classifying thousands of gene expression data, the two GMM methods achieve the highest clustering accuracy and the fewest overall FP+FN errors, under the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-II method is more accurate than the GMM-I method. (2) In general, the MC-SVMs are more robust and more practical: they are less sensitive to the curse of dimensionality, their clustering accuracy on thousands of gene expression data is second only to that of the GMM methods, and they are more robust than the other techniques when only a small number of high-dimensional gene expression samples is available. (3) Among the MC-SVMs, OVO and DAGSVM perform better on large sample sizes, whereas the five MC-SVM methods perform very similarly on moderate sample sizes; OVR, WW and CS yield better results when sample sizes are small. It is therefore recommended that at least two candidate methods, chosen on the basis of the features of the real data and the experimental conditions, be run and compared to obtain better clustering results.
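
The abstract compares methods by FP, FN, TP, TN, clustering accuracy and MCC. As a minimal sketch of how the last two follow from the four counts, the Python below uses the standard definitions of accuracy and Matthews' correlation coefficient. The function name `confusion_metrics`, the example counts, and the choice to report an MCC of 0 when the denominator vanishes are illustrative assumptions, not taken from the paper.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return (accuracy, mcc) computed from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    # Accuracy: fraction of all samples that were classified correctly.
    accuracy = (tp + tn) / total
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal total is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, for illustration only (not results from the paper).
acc, mcc = confusion_metrics(tp=45, tn=40, fp=10, fn=5)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")
```

Unlike accuracy, MCC uses all four counts symmetrically, so it stays informative when the two classes have very different sizes.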

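Journal information
  • 《遗传》
  • China Science and Technology Core Journal
  • Governing body: Chinese Academy of Sciences
  • Sponsor: Genetics Society of China
  • Editor-in-chief: 张永清 (Zhang Yongqing)
  • Address: Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, No. 1 Beichen West Road, Chaoyang District, Beijing
  • Postal code: 100101
  • Email: yczz@genetics.ac.cn
  • Telephone: 010-64807669
  • International Standard Serial Number: ISSN 0253-9772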
  • Domestic unified serial number: CN 11-1913/R
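  • Postal distribution code: 2-810
  • Awards:
  • Core Journal of Chinese Natural Sciences; 《CAJ-CD》 Standards Implementation Excellence Award; awarded the "China Quality Science and Technology Journal" certificate in December 2008, and Beijing Municipal 印...
  • Indexed in domestic and international databases:
  • Chemical Abstracts, web edition (USA); CAB Abstracts (UK); Abstract and Citation Database (Netherlands); Biomedical Retrieval System (USA); Biological Sciences Database (USA); Japan Science and Technology Agency Database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011 and 2014 editions)
  • Times cited: 23270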